Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Test Data for Galaxy tutorial "Clustering 3k PBMCs with Seurat" - SCTransform workflow
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description adjusted from the sctransform documentation (https://satijalab.org/seurat/articles/sctransform_vignette.html): "The results of sctransform are stored in layers with the “SCT” prefix. SCT_normalized contains the residuals (normalized values), and is used directly as input to PCA. To assist with visualization and interpretation, we also convert Pearson residuals back to ‘corrected’ UMI counts. You can interpret these as the UMI counts we would expect to observe if all cells were sequenced to the same depth. The ‘corrected’ UMI counts are stored in SCT_corrected_UMI. We store log-normalized versions of these corrected counts in SCT_lognorm_corrected_UMI, which are very helpful for visualization. You can use the corrected log-normalized counts for differential expression and integration. However, in principle, it would be most optimal to perform these calculations directly on the residuals (stored in the SCT_normalized slot) themselves."
Integration script:
library(Seurat)
library(tidyverse)
library(Matrix)
#cite <- readRDS("C:/Users/alex/sciebo/ALL_NGS/scRNAseq/scRNAseq/Merge AAA mit Cite AAA/Cite_seq_v0.41.rds")
#CD45 <- readRDS("C:/Users/alex/sciebo/ALL_NGS/scRNAseq/scRNAseq/Schrader/Fertige_Analysen/TS_d5_paper/CD45.rds")
AAA <- readRDS("C:/Users/alex/sciebo/AAA_Zhao_v4.rds")
cite <- readRDS("C:/Users/alex/sciebo/CITE_Seq_v0.5.rds")
all4 <- readRDS("C:/Users/alex/sciebo/ALL_NGS/scRNAseq/scRNAseq/Schrader/Fertige_Analysen/Schrader_All4_Rohanalyse/all4_220228.rds")
# Combine the three objects into a named list (avoids shadowing base::c)
pancreas.list <- list(cite = cite, all4 = all4, AAA = AAA)
# Normalize each dataset independently with SCTransform
for (i in seq_along(pancreas.list)) {
  pancreas.list[[i]] <- SCTransform(pancreas.list[[i]], verbose = FALSE)
}
pancreas.features <- SelectIntegrationFeatures(object.list = pancreas.list, nfeatures = 3000)
#options(future.globals.maxSize = 6091289600)
#pancreas.list <- PrepSCTIntegration(object.list = pancreas.list, anchor.features = pancreas.features,
#verbose = FALSE) # future.globals.maxSize was too low; changed it to options(future.globals.maxSize = 1091289600)
#identify anchors
#alternative from tutorial (https://satijalab.org/seurat/articles/integration_introduction.html)
#memory.limit(9999999999)
features <- SelectIntegrationFeatures(object.list = pancreas.list, nfeatures = 3000)
pancreas.list <- PrepSCTIntegration(object.list = pancreas.list, anchor.features = features)
pancreas.anchors <- FindIntegrationAnchors(object.list = pancreas.list, normalization.method = "SCT", anchor.features = pancreas.features, verbose = FALSE)
pancreas.integrated <- IntegrateData(anchorset = pancreas.anchors, normalization.method = "SCT",
verbose = FALSE)
setwd("C:/Users/alex/sciebo/ALL_NGS/scRNAseq/scRNAseq/Schrader/Fertige_Analysen/TS_d5_paper")
saveRDS(pancreas.integrated, file = "integrated_AAA_Cite_AMI.rds")
#saveRDS(cd45, file = "integrated_AAA_Cite_CD45.rds") # disabled: no object named cd45 is created in this script
seurat <- pancreas.integrated
#seurat <- readRDS("C:/Users/alex/sciebo/ALL_NGS/scRNAseq/scRNAseq/Schrader/Fertige_Analysen/TS_d5_paper/integrated_d5_cite.rds")
DefaultAssay(object = seurat) <- "integrated"
seurat <- FindVariableFeatures(seurat, selection.method = "vst", nfeatures = 3000)
seurat <- ScaleData(seurat, verbose = FALSE)
seurat <- RunPCA(seurat, npcs = 30, verbose = FALSE)
seurat <- FindNeighbors(seurat, dims = 1:30)
seurat <- FindClusters(seurat, resolution = 0.5)
seurat <- RunUMAP(seurat, reduction = "pca", dims = 1:30)
DimPlot(seurat, reduction = "umap", split.by = "treatment") + NoLegend()
DimPlot(seurat, label = T, repel = T) + NoLegend()
DefaultAssay(object = seurat) <- "ADT"
adt_marker_integrated <- FindAllMarkers(seurat, logfc.threshold = 0.3)
write.csv(adt_marker_integrated, file = "adt_marker_all4_integrated.csv")
DefaultAssay(object = seurat) <- "RNA"
RNA_marker_integrated <- FindAllMarkers(seurat, logfc.threshold = 0.5)
write.csv(RNA_marker_integrated, file = "RNA_marker_all4_integrated.csv")
DimPlot(seurat, label = T, repel = T, split.by = "tissue") + NoLegend()
FeaturePlot(seurat, features = "Cd40", order = T, label = T)
FeaturePlot(seurat, features = "Ms.CD40", order = T, label = T)
#####
#cleanup:
seurat@meta.data[["sen_score1"]] <- NULL
seurat@meta.data[["sen_score2"]] <- NULL
seurat@meta.data[["sen_score3"]] <- NULL
seurat@meta.data[["sen_score4"]] <- NULL
seurat@meta.data[["sen_score5"]] <- NULL
seurat@meta.data[["sen_score6"]] <- NULL
seurat@meta.data[["sen_score7"]] <- NULL
seurat@meta.data[["pANN_0.25_0.1_1211"]] <- NULL
seurat@meta.data[["DF.classifications_0.25_0.1_1211"]] <- NULL
seurat@meta.data[["DF.classifications_0.25_0.1_466"]] <- NULL
seurat@assays[["prediction.score.celltype"]] <- NULL
seurat@meta.data[["predicted.celltype"]] <- NULL
seurat@meta.data[["DF.classifications_0.25_0.1_184"]] <- NULL
seurat@meta.data[["DF.classifications_0.25_0.1_953"]] <- NULL
seurat@meta.data[["integrated_snn_res.3"]] <- NULL
seurat@meta.data[["RNA_snn_res.3"]] <- NULL
seurat@meta.data[["SingleR"]] <- NULL
seurat@meta.data[["SingleR_fine"]] <- NULL
seurat@meta.data[["ImmGen"]] <- NULL
seurat@meta.data[["ImmGen_fine"]] <- NULL
seurat@meta.data[["percent.mt"]] <- NULL
seurat@meta.data[["nCount_integrated"]] <- NULL
seurat@meta.data[["nFeature_integrated"]] <- NULL
seurat@meta.data[["S.Score"]] <- NULL
seurat@meta.data[["G2M.Score"]] <- NULL
seurat@meta.data[["Phase"]] <- NULL
seurat@meta.data[["sen_score8"]] <- NULL
seurat@meta.data[["sen_score9"]] <- NULL
seurat@meta.data[["sen_score10"]] <- NULL
seurat@meta.data[["sen_score11"]] <- NULL
seurat@meta.data[["sen_score12"]] <- NULL
seurat@meta.data[["sen_score13"]] <- NULL
seurat@meta.data[["sen_score14"]] <- NULL
seurat@meta.data[["sen_score15"]] <- NULL
seurat@meta.data[["sen_score16"]] <- NULL
seurat@meta.data[["sen_score17"]] <- NULL
seurat@meta.data[["sen_score18"]] <- NULL
seurat@meta.data[["sen_score19"]] <- NULL
seurat@meta.data[["pANN_0.25_0.1_184"]] <- NULL
seurat@meta.data[["pANN_0.25_0.1_953"]] <- NULL
seurat@meta.data[["pANN_0.25_0.1_466"]] <- NULL
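The column-by-column cleanup above could also be written by matching column names against patterns. The following is only a sketch on a toy data frame standing in for seurat@meta.data; the helper name drop_columns and the toy values are hypothetical and not part of the original script.

```r
# Toy stand-in for seurat@meta.data (hypothetical values)
meta <- data.frame(
  nCount_RNA = c(1200, 3400),
  sen_score1 = c(0.1, 0.2),
  sen_score2 = c(0.3, 0.4),
  pANN_0.25_0.1_184 = c(0.05, 0.07),
  Phase = c("G1", "S"),
  check.names = FALSE
)

# Hypothetical helper: drop every column whose name matches
# at least one of the given regular expressions
drop_columns <- function(df, patterns) {
  hit <- Reduce(`|`, lapply(patterns, grepl, x = names(df)))
  df[, !hit, drop = FALSE]
}

cleaned <- drop_columns(
  meta,
  c("^sen_score", "^pANN_", "^DF\\.classifications_", "^Phase$")
)
print(names(cleaned))
```

Applied to a real object, the same helper would presumably be used as seurat@meta.data <- drop_columns(seurat@meta.data, patterns); entries in seurat@assays would still need to be removed separately.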
https://spdx.org/licenses/CC0-1.0.html
This collection of datasets comprises results from four single-cell spatial experiments conducted on mouse brains: two spatial transcriptomics experiments and two spatial proteomics experiments. These experiments were performed using the Bruker Nanostring CosMx technology on 10 µm coronal brain sections from the following mouse models: (1) 14-month-old male 5xFAD;ApoeCh mice and genotype controls, and (2) 9-month-old PS19;ApoeCh mice and genotype controls. Each dataset is provided as an RDS file, which includes raw and corrected counts for the RNA data and mean fluorescence intensity for the protein data, along with comprehensive metadata. Metadata include mouse genotype, sample ID, cell type annotations, sex (for the PS19;ApoeCh dataset), and X-Y coordinates of each cell. Results from differential gene expression analysis for each cell type between genotypes using MAST are also included as .csv files.
Methods
Sample preparation: Isopentane fresh-frozen brain hemispheres were embedded in optimal cutting temperature (OCT) compound (Tissue-Tek, Sakura Finetek, Torrance, CA), and 10 µm thick coronal sections were prepared using a cryostat (CM1950, Leica Biosystems, Deer Park, IL). Six hemibrains were mounted onto each VWR Superfrost Plus microscope slide (Avantor, 48311-703) and kept at -80°C until fixation. For both the 5xFAD (14 months old, males) and PS19 (9 months old, females and 1 male ApoeCh) models, n=3 mice per genotype were used for transcriptomics and proteomics, except for PS19;ApoeCh (n=2). The genotypes were wild-type, ApoeCh HO, 5xFAD HEMI or PS19 HEMI, and 5xFAD HEMI;ApoeCh HO or PS19 HEMI;ApoeCh HO. The same mice were used for both transcriptomics and proteomics. Tissues were processed according to the Nanostring CosMx fresh-frozen slide preparation manual for RNA and protein assays (NanoString University).
Data processing: Spatial transcriptomics datasets were filtered using the AtoMx RNA Quality Control module to flag outlier negative probes (control probes targeting non-existent sequences to quantify non-specific hybridization), lowly-expressing cells, FOVs, and target genes. Datasets were then normalized and scaled using Seurat 5.0.1 SCTransform to account for differences in library size across cell types [31]. Principal component analysis (PCA) and uniform manifold approximation and projection (UMAP) analysis were performed to reduce dimensionality and visualize clusters in space. Unsupervised clustering at 1.0 resolution yielded 33 clusters for the 5xFAD dataset and 40 clusters for the PS19 dataset. Clusters were manually annotated based on gene expression and spatial location. Spatial proteomics data were filtered using the AtoMx Protein Quality Control module to flag unreliable cells based on segmented cell area, negative probe expression, and overly high/low protein expression. Mean fluorescence intensity data were hyperbolic arcsine transformed with the AtoMx Protein Normalization module. Cell types were automatically annotated based on marker gene expression using the CELESTA algorithm.
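The hyperbolic arcsine transformation mentioned for the protein data can be illustrated in base R. This is a sketch only: the cofactor value and the toy intensities are assumptions for illustration, and the exact scaling used by the AtoMx Protein Normalization module is not stated in the description.

```r
# Toy mean fluorescence intensities (hypothetical values)
mfi <- c(0, 10, 100, 1000)

# Assumed scaling cofactor; not taken from the dataset description
cofactor <- 5

# asinh behaves linearly near zero and logarithmically for large values,
# which compresses the dynamic range without discarding zeros
transformed <- asinh(mfi / cofactor)
print(round(transformed, 3))
```

Unlike a plain log transform, asinh is defined at zero, which is one common reason it is preferred for intensity data.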
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Single-cell RNA-sequencing dataset of peripheral blood mononuclear cells (PBMCs: T cells, B cells, NK cells and monocytes) extracted from two healthy donors.
Cells labeled as C26 come from a 30-year-old female and cells labeled as C27 come from a 53-year-old male. Cells were isolated from blood using Ficoll. Samples were sequenced using standard 3' v3 chemistry protocols by 10x Genomics. Cellranger v4.0.0 was used for the processing, and reads were aligned to the Ensembl GRCh38 human genome (GRCg38_r98-ensembl_Sept2019). QC metrics were calculated on the count matrix generated by cellranger (filtered_feature_bc_matrix). Cells with fewer than 3 genes per cell, fewer than 500 reads per cell, or more than 20% mitochondrial genes were discarded.
The processing steps were performed with the R package Seurat (https://satijalab.org/seurat/), including sample integration, data normalisation and scaling, dimensional reduction, and clustering. The SCTransform method was adopted for the normalisation and scaling steps. The clustered cells were manually annotated using known cell type markers.
Files content:
- raw_dataset.csv: raw gene counts
- normalized_dataset.csv: normalized gene counts (single cell matrix)
- cell_types.csv: cell types identified from annotated cell clusters
- cell_types_macro.csv: cell macro types
- UMAP_coordinates.csv: 2d cell coordinates computed with UMAP algorithm in Seurat
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary. Single-nucleus RNA sequencing (snRNA-seq; sci-RNA-seq3 protocol) was used to profile the temporal transcriptomic changes in F2L6.13 and F7L6.13-treated H1 hESCs during days 2-5 of in vitro mesoderm induction. The dataset contains 81294 nuclei, with a median of 2097 genes detected per nucleus.
Data format. Data are provided as a preprocessed dataset, stored in a Seurat object.
Sample processing, sci-RNA-seq3 library generation, and sequencing. Cells were harvested with 0.25% trypsin-EDTA and neuron dissociation solution (Tian et al., 2019), respectively. Cell pellets were immediately snap-frozen in liquid nitrogen and then stored at -80°C for sci-RNA-Seq3-based single-nucleus RNA-Seq processing. Samples from all conditions were processed together to minimize batch effects. Nuclei extraction and fixation were performed as previously described (Cao et al., 2019), except for the use of a modified CST lysis buffer (Slyper et al., 2020) plus 1% SUPERase In RNase Inhibitor (AM2696). Nuclei quality was checked with DAPI and Wheat Germ Agglutinin (WGA) staining. Sci-RNA-Seq3 libraries were generated as previously described (Cao et al., 2019) using three-level combinatorial indexing. The final libraries were sequenced on an Illumina NovaSeq 6000 using the following protocol: read 1: 34bp, read 2: 69bp, index 1: 10bp, index 2: 10bp.
Raw sequencing reads were first demultiplexed based on i5/i7 PCR barcodes. FASTQ files were then processed using the sci-RNA-Seq3 pipeline (Cao et al., 2019). After barcodes and UMIs were extracted from the read1 FASTQ files, read alignment was performed using the STAR short-read aligner (v2.5.2b) with the human genome (hg38) and Gencode v25 gene annotations.
After removing duplicate reads based on UMI, barcode, chromosome and alignment position, reads were summarized into a count matrix of M genes x N nuclei.
Filtering. Raw single-cell gene count matrices were loaded into a Seurat object (version 4.0.4) (Butler et al., 2018; Hao et al., 2021; Satija et al., 2015; Stuart et al., 2019) and filtered to retain cells with (i) 200 – 9000 recovered genes per cell, (ii) less than 60% mitochondrial content, and (iii) unmatched rate within 3 median absolute deviations of the median.
Normalization. To normalize expression values, we adopted the modeling framework previously described and implemented in the sctransform R package (version 0.3.2) (Hafemeister & Satija, 2019). In brief, count data were modelled by regularized negative binomial regression, using sequencing depth as a model covariate to regress out the influence of technical effects, and Pearson residuals were used as the normalized and variance-stabilized biological signal for downstream analysis.
Integration. Cells from each treatment condition and differentiation day were integrated in Seurat using the reciprocal principal component analysis-based approach, using the top 3000 variable features.
Dimensional reduction. PCA was applied to normalized and scaled data, and the top components (accounting for 90% of variance observed in the first 50 PCs) were used for UMAP embedding using RunUMAP(max_components = 2, n_neighbours = 50, min_dist = 01, metric = cosine) in Seurat.
Clustering. To identify clusters, we performed Louvain clustering in Seurat using the FindClusters function.
Contact. Contact Dr. Nicholas Mikolajewicz regarding any questions about the data or analysis (n.mikolajewicz@utoronto.ca)
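The Pearson residuals used as normalized values can be illustrated with a toy calculation. This sketch assumes the expected count mu and the dispersion theta have already been estimated by the negative binomial regression; the helper name pearson_residual and all numbers are hypothetical, and the regression and regularization steps performed by sctransform are not reproduced here.

```r
# Pearson residual under a negative binomial model with mean mu and
# dispersion theta, where the variance is mu + mu^2 / theta
pearson_residual <- function(x, mu, theta) {
  (x - mu) / sqrt(mu + mu^2 / theta)
}

# Toy UMI counts for one gene in three cells (hypothetical values);
# mu would normally come from the fitted model given each cell's depth
counts <- c(0, 2, 10)
mu <- c(2, 2, 2)
theta <- 10

res <- pearson_residual(counts, mu, theta)
print(round(res, 3))
```

A count equal to its expectation gives a residual of zero, while counts far above expectation give large positive residuals, which is what makes the residuals usable as a depth-corrected signal.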
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary. 10 primary GBM and 8 recurrent GBM samples (14/18 matched) profiled using single-nucleus RNA-sequencing (sci-RNA-seq3 protocol).
Data format. Data are provided as a preprocessed dataset, stored in a Seurat object.
Sample processing, sci-RNA-seq3 library generation, and sequencing. Snap-frozen patient pGBM and rGBM tissues were chopped with a razor blade or scissors before nucleus isolation. Nuclei extraction and fixation were performed as previously described (Cao 2019), except for the use of a modified CST lysis buffer [50] plus 1% of SUPERase-In RNase Inhibitor (Invitrogen, #AM2696). Lysis time and washing steps were further optimized based on human GBM tissue. Nuclei quality was checked with DAPI and Wheat Germ Agglutinin (WGA) staining. Sci-RNA-seq3 libraries were generated as previously described [49] using three-level combinatorial indexing. The final libraries were sequenced on Illumina NovaSeq as follows: read 1: 34bp, read 2: >=69bp, index 1: 10bp, index 2: 10bp.
Demultiplexing and read alignments. Raw sequencing reads were first demultiplexed based on i5/i7 PCR barcodes. FASTQ files were then processed using the sci-RNA-Seq3 pipeline. After barcodes and unique molecular identifiers (UMIs) were extracted from read 1 of the FASTQ files, read alignment was performed using the STAR short-read aligner (v2.5.2b) with the human genome (hg19) and Gencode v24 gene annotations. After removing duplicate reads based on UMI, barcode, chromosome and alignment position, reads were summarized into a count matrix of M genes × N nuclei.
Filtering, normalization, integration, and dimensional reduction. Raw count matrices were loaded into a Seurat object (version 4.0.1) and filtered to retain cells with (i) 200 – 9000 recovered genes per cell, (ii) less than 60% mitochondrial content, and (iii) unmatched rate within 3 median absolute deviations of the median.
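The third filtering criterion (unmatched rate within 3 median absolute deviations of the median) can be sketched in base R. The helper name within_mad and the toy rates are hypothetical. Note that R's mad() multiplies by 1.4826 by default; whether the original analysis used the scaled or unscaled MAD is not stated, so the default is an assumption here.

```r
# Hypothetical helper: TRUE for values within n_mads MADs of the median.
# mad() uses the default consistency constant 1.4826 (an assumption).
within_mad <- function(x, n_mads = 3) {
  abs(x - median(x)) <= n_mads * mad(x)
}

# Toy per-cell unmatched rates (hypothetical values); the last is an outlier
unmatched_rate <- c(0.10, 0.12, 0.11, 0.09, 0.13, 0.60)

keep <- within_mad(unmatched_rate)
print(keep)
```

The median and MAD are used instead of the mean and standard deviation because they are barely affected by the outliers the filter is meant to remove.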
To normalize the count matrix, we adopted the modeling framework previously described and implemented in sctransform (R package, version 0.3.2). In brief, count data were modelled by regularized negative binomial regression, using sequencing depth as a model covariate to regress out the influence of technical effects, and Pearson residuals were used as the normalized and variance-stabilized biological signal for downstream analysis. Data from each patient were integrated with the reciprocal PCA method (Seurat) using the top 2000 variable features. PCA was performed on the integrated dataset, and the top N components that accounted for 90% of the observed variance were used for UMAP embedding, RunUMAP(max_components = 2, n_neighbours = 50, min_dist = 01, metric = cosine).
Contact. Contact Dr. Nicholas Mikolajewicz regarding any questions about the data or analysis (n.mikolajewicz@utoronto.ca)
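Choosing the top N components that account for 90% of the observed variance can be sketched from the per-component standard deviations. The helper name n_pcs_for_variance and the toy standard deviations are hypothetical; this is one plausible reading of the rule, not the authors' code.

```r
# Hypothetical helper: smallest number of leading components whose
# cumulative share of variance reaches the requested fraction
n_pcs_for_variance <- function(sdev, fraction = 0.9) {
  var_explained <- sdev^2 / sum(sdev^2)
  which(cumsum(var_explained) >= fraction)[1]
}

# Toy standard deviations for five components (hypothetical values)
sdev <- c(4, 3, 2, 1, 1)

n_pcs <- n_pcs_for_variance(sdev)
print(n_pcs)
```

With a Seurat object, the standard deviations of a PCA reduction should be available via Stdev(object, reduction = "pca"); the variance share is then relative to the computed components only, not to the total variance of the data.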
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Using 6-month-old, 16-month-old, and 24-month-old mice of a Parkinson's disease mouse model based on inducible expression of human a-syn constructs, we produced a single-nucleus RNA dataset from tissue cut between 0 mm and -5 mm relative to Bregma. The Chromium 3' Single Cell Library Kit (10x Genomics) was used and sequencing was performed on a NovaSeq 6000.
Paired 150bp snRNA-seq was performed using the 10x Genomics Gene Expression (GEX) 3' protocol with a NovaSeq 6000 sequencer. For the alignment of reads, a custom reference was created by adding the sequences of the S1/S2 transgene and the CamkIIa promoter to the mm10 mouse reference genome. Count matrices were obtained using the cellranger count 7.1 pipeline, including introns. Six samples were mapped using the bwUni2.0 High-Performance Computing infrastructure.
The unfiltered count matrices were loaded into R and corrected for ambient mRNA using SoupX 1.6.0 with default settings, adjusting “tfidfMin” settings between 0.9 and 1.3 depending on the sample. Seurat objects were created for each sample and subsequently merged. Cells were filtered out based on the following criteria: number of unique molecular identifiers (nUMI) < 2500, number of genes (nGene) < 1500, mitochondrial gene percentage > 3%, ribosomal gene percentage > 1.5%, or log10(Genes/nUMI) < 0.85. Subsequently, doublets and sex-doublets were removed using scDblFinder 1.16.0 and cellXY 0.99.0.
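The cell-level thresholds listed above can be sketched as a single logical filter. The toy data frame and its column names are hypothetical. The complexity score is computed here as log10(nGene) / log10(nUMI), which is a common definition of this metric; that reading of "log10(Genes/nUMI)" is an assumption, not something the description confirms.

```r
# Toy per-cell QC metrics (hypothetical values and column names)
cells <- data.frame(
  nUMI     = c(5000, 2000, 8000, 6000, 3000),
  nGene    = c(2500, 1600, 1200, 3000, 1600),
  pct_mito = c(1.0, 0.5, 1.0, 4.0, 0.5),
  pct_ribo = c(0.5, 0.5, 0.5, 0.5, 0.5)
)

# Assumed definition of the complexity score (see lead-in)
cells$complexity <- log10(cells$nGene) / log10(cells$nUMI)

# Keep cells that pass every threshold stated in the text
keep <- with(
  cells,
  nUMI >= 2500 & nGene >= 1500 & pct_mito <= 3 &
    pct_ribo <= 1.5 & complexity >= 0.85
)

filtered <- cells[keep, ]
print(filtered)
```

In this toy example the second, third, and fourth cells fail on nUMI, nGene, and mitochondrial percentage respectively, so two cells remain.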
Normalization was performed using the SCTransform function on 4000 variable features with glmGamPoi method implemented in Seurat 5.0.1, and top 50 embeddings were obtained via scVI (scvi-tools 1.1.1) integration for sex, age, batch, and number of pooled animals. Clustering was done using the Leiden algorithm and visualized with Uniform Manifold Approximation and Projection algorithm (UMAP). Clusters represented by few samples, less than 100 cells, or a single batch, and not of conditions of interest, were removed. Clusters driven by ribosomal or mitochondrial genes, as well as markers of hindbrain and olfactory cell types, were also discarded. The steps from normalization onward were repeated until no further clusters needed removal. Final integration was performed using harmony 1.2.0 with an integration diversity penalty (theta) of 2, followed by final clustering based on the top 30 harmony components and UMAP visualization. Each subsequent clustering for annotation of sub-cell types was computed following the same procedure.
Clusters were annotated in a hierarchical manner using literature, the Mouse Brain Atlas (mousebrain.org), the Human Protein Atlas, and markers identified via the FindConservedMarkers function in Seurat. First, neurons and non-neuronal cells were distinguished using mainly canonical markers, such as but not limited to Rbfox3 (neurons), Mbp (oligodendrocytes), Acsbg1 (astrocytes), Pdgfra (oligodendrocyte precursor cells), Cx3cr1 (microglia), Colec12 (vascular cells), and Ttr (choroid plexus cells). Neurons were further classified into Vglut1, Vglut2, GABA, cholinergic, and dopaminergic neurons. Vglut1 and GABA neurons were further sub-annotated.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We acquired 10x Visium spatial transcriptomics (ST) data from 9 patients with invasive adenocarcinomas [1–5] to explore the role of the tumour microenvironment (TME) in intratumour heterogeneity (ITH) and drug response in breast cancer. By leveraging a new version of Beyondcell [6], a tool for identifying tumour cell subpopulations with distinct drug response patterns, we predicted sensitivity to over 1,200 drugs while accounting for the spatial context and interaction between the tumour and TME compartments. Moreover, we also used Beyondcell to compute spot-wise functional enrichment scores and identify niche-specific biological functions.
Here, you can find:
In the signatures folder:
- SSc breast: Collection of gene signatures used to predict sensitivity to > 1,200 drugs derived from breast cancer cell lines.
- Functional signatures: Collection of gene signatures used to compute enrichment in different biological pathways.
In the visium folder:
- Visium objects: Processed ST Seurat objects with deconvoluted spots, SCTransform-normalised counts, and clonal composition predicted with SCEVAN [7]. These objects, together with the signatures, were used to compute the Beyondcell objects.
In the single-cell folder:
- Single-cell objects: Raw and filtered merged single-cell RNA-seq (scRNA-seq) Seurat objects with unnormalised counts used as a reference for spot deconvolution.
In the beyondcell folder:
- Beyondcell sensitivity objects with prediction scores for all drug response signatures in SSc breast.
- Beyondcell functional objects with enrichment scores for all functional signatures.
Kidney stone disease causes significant morbidity and increases health care utilization. The pathogenesis of stone disease is incompletely understood, due in part to the poor characterization of the cellular and molecular makeup of the human papilla and its alteration with disease. In this work, we characterize the human renal papilla in health and calcium oxalate stone disease using single-nucleus RNA sequencing, spatial transcriptomics and high-resolution large-scale multiplexed 3D and Co-Detection by indexing (CODEX) imaging. We define and localize subtypes of principal cells enriched in the papilla as well as immune and stromal cell populations. We further uncovered an undifferentiated epithelial cell signature in the papilla, particularly during nephrolithiasis.
Frozen 10 µm sections were mounted onto etched frames of the Visium spatial gene expression (VSGE) slides with capture areas according to 10x Genomics protocols (Visium Spatial Protocols - Tissue Preparation Guide, Document Number CG000240 Rev A, 10x Genomics). H&E-stained sections were imaged with a Keyence BZ-X810 microscope equipped with a Nikon 10× CFI Plan Fluor objective at 0.7547 µm/pixel and camera dimensions of 1920x1440. Stained tissues were permeabilized for 12 minutes. mRNA bound to oligonucleotides on the capture areas of the Visium slides was extracted. cDNA libraries were prepared with second strand synthesis and sequenced using the NovaSeq 6000 Sequencing system (Illumina) in the 28 bp + 120 bp paired-end sequencing mode. Samples were mapped using Space Ranger 1.2.0 with the reference genome GRCh38-2020-A. The data were normalized by SCTransform and merged to build a unified UMAP and dataset.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 3: Table S2. Overview of Visium SRT sample sequencing information and 3,000 gene features used for SCTransform.