License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Test Data for Galaxy tutorial "Clustering 3k PBMCs with Seurat" - SCTransform workflow
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We acquired 10x Visium spatial transcriptomics (ST) data from 9 patients with invasive adenocarcinomas [1–5] to explore the role of the tumour microenvironment (TME) in intratumour heterogeneity (ITH) and drug response in breast cancer. Using a new version of Beyondcell [6], a tool for identifying tumour cell subpopulations with distinct drug response patterns, we predicted sensitivity to more than 1,200 drugs while accounting for the spatial context and the interaction between the tumour and TME compartments. We also used Beyondcell to compute spot-wise functional enrichment scores and to identify niche-specific biological functions.
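Beyondcell's own scoring formula is not described here. As a rough illustration of what a spot-wise signature score means, the Python sketch below (an assumption, not Beyondcell's algorithm) standardises each gene across spots and scores every spot as the mean z-score of a signature's up-regulated genes minus the mean z-score of its down-regulated genes. The function names, gene names and values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_rows(matrix):
    """Standardise each gene (row) across spots; constant genes become all zeros."""
    scaled = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)
        scaled.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return scaled


def signature_score(matrix, genes, up, down=()):
    """Per-spot score: mean z-score of 'up' genes minus mean z-score of 'down' genes.

    matrix: genes x spots expression values (list of lists), rows ordered as `genes`.
    Signature genes missing from `genes` are ignored; an empty set contributes 0.
    """
    z = dict(zip(genes, zscore_rows(matrix)))
    n_spots = len(matrix[0])

    def average(gene_set):
        rows = [z[g] for g in gene_set if g in z]
        if not rows:
            return [0.0] * n_spots
        return [mean(row[j] for row in rows) for j in range(n_spots)]

    return [u - d for u, d in zip(average(up), average(down))]


# Toy example: 3 hypothetical genes x 3 spots (made-up values).
genes = ["GENE_A", "GENE_B", "GENE_C"]
expr = [[1.0, 2.0, 3.0],
        [3.0, 2.0, 1.0],
        [5.0, 5.0, 5.0]]
scores = signature_score(expr, genes, up=["GENE_A"], down=["GENE_B"])
```

Spots where the up-regulated genes are high and the down-regulated genes are low receive positive scores; a real drug-response or functional signature would simply contain more genes in each set.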
Here, you can find:
In the signatures folder:
SSc breast: Collection of drug response signatures, derived from breast cancer cell lines, used to predict sensitivity to more than 1,200 drugs.
Functional signatures: Collection of gene signatures used to compute enrichment in different biological pathways.
In the visium folder:
Visium objects: Processed ST Seurat objects with deconvoluted spots, SCTransform-normalised counts, and clonal composition predicted with SCEVAN [7]. These objects, together with the signatures, were used to compute the Beyondcell objects.
In the single-cell folder:
Single-cell objects: Raw and filtered merged single-cell RNA-seq (scRNA-seq) Seurat objects with unnormalised counts, used as a reference for spot deconvolution.
In the beyondcell folder:
Beyondcell sensitivity objects with prediction scores for all drug response signatures in SSc breast.
Beyondcell functional objects with enrichment scores for all functional signatures.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description adjusted from the sctransform documentation (https://satijalab.org/seurat/articles/sctransform_vignette.html): "The results of sctransform are stored in layers with the “SCT” prefix. SCT_normalized contains the residuals (normalized values) and is used directly as input to PCA. To assist with visualization and interpretation, we also convert Pearson residuals back to ‘corrected’ UMI counts. You can interpret these as the UMI counts we would expect to observe if all cells were sequenced to the same depth. The ‘corrected’ UMI counts are stored in SCT_corrected_UMI. We store log-normalized versions of these corrected counts in SCT_lognorm_corrected_UMI, which are very helpful for visualization. You can use the corrected log-normalized counts for differential expression and integration. However, in principle, it would be most optimal to perform these calculations directly on the residuals (stored in the SCT_normalized slot) themselves."
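A minimal sketch of the residual and "corrected count" idea, under a simplified model: sctransform itself fits a regularized negative binomial regression per gene, whereas the Python code below uses analytic Pearson residuals with a depth-only expected count and a fixed overdispersion `theta = 100` (both assumptions). Residuals are clipped to ±sqrt(number of cells), and corrected counts are obtained by mapping each residual back onto the expected count for a cell sequenced at the median depth. Function names and data are hypothetical.

```python
import math
from statistics import median


def pearson_residuals(counts, theta=100.0):
    """Clipped analytic Pearson residuals for a genes x cells count matrix (list of lists)."""
    n_genes, n_cells = len(counts), len(counts[0])
    gene_sums = [sum(row) for row in counts]
    cell_sums = [sum(counts[g][c] for g in range(n_genes)) for c in range(n_cells)]
    total = sum(gene_sums)
    bound = math.sqrt(n_cells)  # clip range of +/- sqrt(n_cells)
    resid = []
    for g in range(n_genes):
        row = []
        for c in range(n_cells):
            mu = gene_sums[g] * cell_sums[c] / total  # expected count given depth only
            sd = math.sqrt(mu + mu * mu / theta)      # negative binomial standard deviation
            r = (counts[g][c] - mu) / sd if sd > 0 else 0.0
            row.append(max(-bound, min(bound, r)))
        resid.append(row)
    return resid


def corrected_counts(counts, theta=100.0):
    """Map residuals back to counts expected if every cell had the median sequencing depth."""
    resid = pearson_residuals(counts, theta)
    n_cells = len(counts[0])
    cell_sums = [sum(row[c] for row in counts) for c in range(n_cells)]
    total = sum(cell_sums)
    depth = median(cell_sums)
    corrected = []
    for g, row in enumerate(counts):
        mu = sum(row) / total * depth             # expected count at the median depth
        sd = math.sqrt(mu + mu * mu / theta)
        corrected.append([max(0, round(mu + r * sd)) for r in resid[g]])
    return corrected


# Toy 3 genes x 3 cells count matrix (made-up values).
toy_counts = [[1, 2, 3],
              [4, 5, 6],
              [0, 1, 0]]
residuals = pearson_residuals(toy_counts)
corrected = corrected_counts(toy_counts)
```

Because the middle toy cell already has the median depth, its corrected counts equal its raw counts; the other cells are shifted toward what they would look like at that depth.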
License: CC0 1.0, https://spdx.org/licenses/CC0-1.0.html
This collection of datasets comprises results from four single-cell spatial experiments conducted on mouse brains: two spatial transcriptomics experiments and two spatial proteomics experiments. These experiments were performed using the Bruker NanoString CosMx technology on 10 µm coronal brain sections from the following mouse models: (1) 14-month-old male 5xFAD;ApoeCh mice and genotype controls, and (2) 9-month-old PS19;ApoeCh mice and genotype controls. Each dataset is provided as an RDS file that includes raw and corrected counts for the RNA data and mean fluorescence intensity for the protein data, along with comprehensive metadata. Metadata include mouse genotype, sample ID, cell type annotations, sex (for the PS19;ApoeCh dataset), and the X-Y coordinates of each cell. Results from differential gene expression analysis for each cell type between genotypes using MAST are also included as .csv files.
Methods
Sample preparation: Isopentane fresh-frozen brain hemispheres were embedded in optimal cutting temperature (OCT) compound (Tissue-Tek, Sakura Finetek, Torrance, CA), and 10 µm thick coronal sections were prepared using a cryostat (CM1950, Leica Biosystems, Deer Park, IL). Six hemibrains were mounted onto each VWR Superfrost Plus microscope slide (Avantor, 48311-703) and kept at -80°C until fixation. For both the 5xFAD (14 months old, males) and PS19 (9 months old, females and 1 male ApoeCh) models, n=3 mice per genotype (wild-type, ApoeCh HO, 5xFAD HEMI or PS19 HEMI, and 5xFAD HEMI;ApoeCh HO or PS19 HEMI;ApoeCh HO) were used for transcriptomics and proteomics, except for PS19;ApoeCh, for which n=2. The same mice were used for both transcriptomics and proteomics. Tissues were processed according to the NanoString CosMx fresh-frozen slide preparation manual for RNA and protein assays (NanoString University).
Data processing: Spatial transcriptomics datasets were filtered using the AtoMx RNA Quality Control module to flag outlier negative probes (control probes targeting non-existent sequences, used to quantify non-specific hybridization), lowly expressing cells, FOVs, and target genes. Datasets were then normalized and scaled using SCTransform (Seurat 5.0.1) to account for differences in library size across cell types [31]. Principal component analysis (PCA) and uniform manifold approximation and projection (UMAP) were performed to reduce dimensionality and visualize clusters in space. Unsupervised clustering at a resolution of 1.0 yielded 33 clusters for the 5xFAD dataset and 40 clusters for the PS19 dataset. Clusters were manually annotated based on gene expression and spatial location. Spatial proteomics data were filtered using the AtoMx Protein Quality Control module to flag unreliable cells based on segmented cell area, negative probe expression, and overly high or low protein expression. Mean fluorescence intensity data were transformed with the inverse hyperbolic sine (arcsinh) using the AtoMx Protein Normalization module. Cell types were automatically annotated based on marker gene expression using the CELESTA algorithm.
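The arcsinh transform applied to the protein data behaves roughly linearly near zero and logarithmically for large intensities (asinh(z) is close to ln(2z) for large z), which compresses the dynamic range of fluorescence values. The Python sketch below illustrates it; the cofactor of 5 is a hypothetical choice, since the parameters of the AtoMx Protein Normalization module are not stated here.

```python
import math


def arcsinh_transform(mfi, cofactor=5.0):
    """Apply asinh(x / cofactor) to a cells x proteins matrix of mean fluorescence intensities.

    The cofactor value is an assumption for illustration only.
    """
    return [[math.asinh(value / cofactor) for value in row] for row in mfi]


# Toy cells x proteins matrix (made-up intensities).
toy_mfi = [[0.0, 5.0, 50.0],
           [2.0, 500.0, 5000.0]]
transformed = arcsinh_transform(toy_mfi)
```

Larger cofactors keep more of the low-intensity range on the near-linear part of the curve; smaller ones make the transform more log-like over the whole range.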
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Peripheral blood mononuclear cell (PBMC) samples were collected from patients infected with dengue virus (DENV) at four time points: two days and one day before defervescence (febrile phase), at defervescence (critical phase), and at two-week convalescence. The raw and filtered matrix files were generated using Cell Ranger version 3.0.2 (10x Genomics, USA) with the GRCh38 1.2.0 human reference genome. Potential ambient RNA contamination was corrected using SoupX. Low-quality cells, including cells with more than 10% mitochondrial gene expression, and doublets/multiplets were excluded using Seurat and DoubletFinder, respectively. The individual samples were then integrated using the SCTransform method with 3,000 gene features. Principal component analysis (PCA) was performed, and cells were clustered with the Louvain algorithm with multilevel refinement. The gene expression level of each cell was normalized using the LogNormalize method in Seurat. Cell types were annotated using the canonical marker genes described in the original paper (see the related link below).
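As an illustration of the 10% mitochondrial threshold, the Python sketch below computes each cell's percentage of counts from genes whose symbols start with `MT-` (the prefix of human mitochondrial genes in GRCh38 annotations) and keeps cells at or below the cutoff. The data layout (a dict of per-cell gene counts) and the function names are hypothetical; the original analysis used Seurat.

```python
def percent_mito(gene_counts, prefix="MT-"):
    """Percentage of a cell's counts from mitochondrial genes; empty cells return 0."""
    total = sum(gene_counts.values())
    mito = sum(n for gene, n in gene_counts.items() if gene.startswith(prefix))
    return 100.0 * mito / total if total else 0.0


def filter_by_mito(cells, max_percent=10.0):
    """Return the IDs of cells whose mitochondrial percentage does not exceed max_percent."""
    return [cell_id for cell_id, counts in cells.items()
            if percent_mito(counts) <= max_percent]


# Toy per-cell counts (made-up values).
cells = {
    "cell_1": {"MT-CO1": 5, "ACTB": 95},
    "cell_2": {"MT-CO1": 20, "ACTB": 80},
}
kept = filter_by_mito(cells)
```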
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary
10 primary GBM and 8 recurrent GBM samples (14 of 18 matched) profiled using single-nucleus RNA-sequencing (sci-RNA-seq3 protocol).
Data Format
Data are provided as a preprocessed dataset stored in a Seurat object.
Sample processing, sci-RNA-seq3 library generation, and sequencing
Snap-frozen patient pGBM and rGBM tissues were chopped with a razor blade or scissors before nucleus isolation. Nuclei extraction and fixation were performed as previously described (Cao 2019), except that a modified CST lysis buffer [50] plus 1% SUPERase-In RNase Inhibitor (Invitrogen, #AM2696) was used. Lysis time and washing steps were further optimized for human GBM tissue. Nuclei quality was checked with DAPI and Wheat Germ Agglutinin (WGA) staining. Sci-RNA-seq3 libraries were generated as previously described [49] using three-level combinatorial indexing. The final libraries were sequenced on an Illumina NovaSeq as follows: read 1: 34 bp, read 2: ≥69 bp, index 1: 10 bp, index 2: 10 bp.
Demultiplexing and read alignments
Raw sequencing reads were first demultiplexed based on i5/i7 PCR barcodes. FASTQ files were then processed using the sci-RNA-seq3 pipeline. After barcodes and unique molecular identifiers (UMIs) were extracted from read 1 of the FASTQ files, reads were aligned with the STAR short-read aligner (v2.5.2b) to the human genome (hg19) using GENCODE v24 gene annotations. After duplicate reads were removed based on UMI, barcode, chromosome, and alignment position, reads were summarized into a count matrix of M genes × N nuclei.
Filtering, normalization, integration, and dimensional reduction
Raw count matrices were loaded into a Seurat object (version 4.0.1) and filtered to retain cells with (i) 200–9000 recovered genes per cell, (ii) less than 60% mitochondrial content, and (iii) an unmatched rate within 3 median absolute deviations of the median.
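The three filtering criteria can be sketched in Python as follows. This is an illustrative version, not the authors' code: the per-cell field names are hypothetical, the gene-count bounds are treated as inclusive, and the median absolute deviation (MAD) is used unscaled, which is one reading of "within 3 median absolute deviations of the median".

```python
from statistics import median


def within_n_mad(values, n_mad=3.0):
    """Flag values lying within n_mad (unscaled) median absolute deviations of the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return [abs(v - med) <= n_mad * mad for v in values]


def qc_filter(cells):
    """Keep cells passing the gene-count, mitochondrial and unmatched-rate criteria.

    Each cell is a dict with hypothetical keys: n_genes, pct_mito, unmatched_rate.
    """
    mad_ok = within_n_mad([c["unmatched_rate"] for c in cells])
    return [c for c, ok in zip(cells, mad_ok)
            if ok and 200 <= c["n_genes"] <= 9000 and c["pct_mito"] < 60]


# Toy nuclei (made-up values).
cells = [
    {"id": "n1", "n_genes": 1000, "pct_mito": 5, "unmatched_rate": 0.10},
    {"id": "n2", "n_genes": 150, "pct_mito": 5, "unmatched_rate": 0.11},   # too few genes
    {"id": "n3", "n_genes": 2000, "pct_mito": 70, "unmatched_rate": 0.12},  # mito too high
    {"id": "n4", "n_genes": 3000, "pct_mito": 10, "unmatched_rate": 0.10},
    {"id": "n5", "n_genes": 4000, "pct_mito": 10, "unmatched_rate": 0.50},  # outlier rate
]
kept_ids = [c["id"] for c in qc_filter(cells)]
```

Note that if more than half of the cells share the same unmatched rate, the MAD is zero and only cells at exactly the median pass; a scaled MAD or a minimum tolerance would avoid that edge case.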
To normalize the count matrix, we adopted the modeling framework previously described and implemented in sctransform (R package, version 0.3.2). In brief, count data were modelled by regularized negative binomial regression, using sequencing depth as a model covariate to regress out the influence of technical effects, and Pearson residuals were used as the normalized and variance-stabilized biological signal for downstream analysis. Data from each patient were integrated with the reciprocal PCA method (Seurat) using the top 2000 variable features. PCA was performed on the integrated dataset, and the top N components that accounted for 90% of the observed variance were used for UMAP embedding: RunUMAP(max_components = 2, n_neighbours = 50, min_dist = 01, metric = cosine).
Contact
Contact Dr. Nicholas Mikolajewicz regarding any questions about the data or analysis (n.mikolajewicz@utoronto.ca).
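The rule for choosing N in the GBM dataset above can be expressed as the smallest number of leading principal components whose cumulative share of the variance reaches 90%. A minimal Python sketch, assuming the per-component explained variances are already available (the function name and toy values are hypothetical):

```python
def n_components_for_variance(explained_variance, target=0.90):
    """Smallest number of leading PCs whose cumulative variance share reaches target."""
    total = sum(explained_variance)
    cumulative = 0.0
    for k, variance in enumerate(explained_variance, start=1):
        cumulative += variance
        if cumulative / total >= target:
            return k
    return len(explained_variance)


# Toy per-component variances in decreasing order (made-up values).
toy_variances = [5.0, 3.0, 1.0, 0.5, 0.5]
n_pcs = n_components_for_variance(toy_variances)
```

If only a subset of PCs was computed, the proportions are relative to that subset rather than to the total variance of the data, so the resulting N depends on how many components were calculated.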
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mucosal biopsy samples were obtained from patients during routine ileocolonoscopy at the Department of Gastroenterology and Hepatology of the Amsterdam UMC between December 2020 and April 2021. Patients were aged ≥16 years with an established diagnosis of ulcerative colitis (UC) determined through endoscopy and histopathology; disease activity was scored by a trained gastroenterologist using the Mayo Endoscopic Score (MES) and the Ulcerative Colitis Endoscopic Index of Severity (UCEIS). In total, we acquired samples from 10 UC patients: 5 with inflamed UC (MES ≥1) and 5 with non-inflamed UC (MES = 0). Exclusion criteria were ongoing malignancy, a history of colonic dysplasia, or colonic surgery. Control samples were obtained from resection specimens from 4 patients without an established diagnosis of UC (CD, suspected rectal carcinoma, trans-anal total mesorectal excision, or hemicolectomy), sourced from the biobank of the Amsterdam UMC.
Raw reads were aligned to GRCh38 using Cell Ranger (v7.0.0; 10x Genomics) to obtain unique molecular identifier (UMI) counts. Samples were imported separately, processed, and analysed in the R programming environment (v4.2.1) using Seurat (v4.3.0) [43,44]. UMI counts were normalized by SCTransform (v) using default parameters.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
R script for single-cell RNA-seq data analysis. The code includes steps for quality control, normalization using SCTransform, dimensional reduction (PCA and UMAP), clustering, differential gene expression analysis, and visualization of marker genes. Integration workflows were performed to combine control and LT-treated organoid datasets, followed by annotation of epithelial subtypes based on established marker genes. Additional scripts generate figures such as UMAP projections, heatmaps, dot plots, and violin plots.
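A dot plot of marker genes typically encodes two per-cluster quantities for each gene: the percentage of cells in which the gene is detected (dot size) and the average expression (colour). The Python sketch below computes both for a single gene; it is a simplified illustration, not the repository's R script, and Seurat's DotPlot additionally scales the averages across clusters before plotting. Names and values are hypothetical.

```python
from collections import defaultdict


def dot_plot_summary(expression, clusters):
    """Per-cluster (percent of cells expressing, mean expression) for one gene."""
    groups = defaultdict(list)
    for value, label in zip(expression, clusters):
        groups[label].append(value)
    return {
        label: (100.0 * sum(v > 0 for v in values) / len(values),
                sum(values) / len(values))
        for label, values in groups.items()
    }


# Toy normalized expression of one marker gene in six cells (made-up values).
summary = dot_plot_summary([0.0, 2.0, 4.0, 0.0, 0.0, 1.0],
                           ["a", "a", "a", "b", "b", "b"])
```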
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Single-cell protein activity was computed on the SCTransform-scaled, anchor-integrated gene expression signatures across metaCells using the metaVIPER function in the VIPER package (Bioconductor). Briefly, metaVIPER was developed as an adaptation of VIPER to single-cell data. Protein activity is inferred for a given gene expression signature using multiple networks, which are integrated on a protein-by-protein basis using the square of the normalized enrichment score (NES) generated by each individual network. Because a non-relevant network would generate a protein activity score close to zero under the null model, networks that generate more extreme NES values can be interpreted as more accurately matching the given biological context and are therefore weighted more heavily for each protein. VIPER-inferred protein activity was computed on the gene expression signatures of all single cells using the gene expression cluster-based single-cell ARACNe networks, and on the gene expression signatures of the tumor-compartment single cells using the six patient-specific tumor single-cell ARACNe networks. The VIPER matrix includes all significant master regulators (MRs).
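One reading of the metaVIPER integration step is a weighted average of each protein's NES across networks, with each network's weight equal to the square of its NES, so that networks producing near-zero (uninformative) scores contribute little. The Python sketch below implements that reading; it is an assumption for illustration, not the VIPER package's implementation, and the function name and values are hypothetical.

```python
def integrate_nes(nes_by_network):
    """Combine per-network NES values for each protein, weighting each by NES squared.

    nes_by_network: list of dicts mapping protein -> NES, one dict per network.
    Proteins scored as exactly zero by every network get an integrated score of 0.
    """
    proteins = set().union(*nes_by_network)
    integrated = {}
    for protein in proteins:
        scores = [net[protein] for net in nes_by_network if protein in net]
        weights = [s * s for s in scores]
        total_weight = sum(weights)
        integrated[protein] = (sum(w * s for w, s in zip(weights, scores)) / total_weight
                               if total_weight > 0 else 0.0)
    return integrated


# Toy NES values from two hypothetical networks.
networks = [{"TP53": 4.0, "MYC": 0.0},
            {"TP53": 0.5, "MYC": -2.0}]
combined = integrate_nes(networks)
```

Here the network that scores MYC at zero receives zero weight for that protein, so the integrated MYC score comes entirely from the second network.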