Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Test Data for Galaxy Tutorial "Clustering 3k PBMCs with Seurat"
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file contains the Seurat cluster markers for the scRNA-seq data.
https://spdx.org/licenses/CC0-1.0.html
Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. 
We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.
Methods
Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (Mus musculus) were obtained from Jackson Laboratories (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute on Aging (NIA) Rodent Aging Colony and were used at 20 months of age. For new scRNAseq experiments, female mice were used in each experiment.
Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of old (20 months) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 and 40 μm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed through incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).
Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 10⁶ cells/ml. Cells were counted manually with a hemocytometer to determine their concentration. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes. After preparation, libraries were sequenced on a NextSeq 500 (Illumina; San Diego, CA) using 75 cycle high output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Details on estimated sequencing saturation and the number of reads per sample are shown in Sup. Data 1.
Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5 mo) C57BL/6J mice were injected with 10 µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and frozen fresh in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). Optimal tissue permeabilization time for 10 µm thick sections was found to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E stained tissue sections were imaged using a Zeiss PALM MicroBeam laser capture microdissection system and the images were stitched and processed using Fiji ImageJ software. cDNA libraries were sequenced on an Illumina NextSeq 500 using 150 cycle high output kits (Read 1=28bp, Read 2=120bp, Index 1=10bp, and Index 2=10bp). Frames around the capture area on the Visium slide were aligned manually and spots covering the tissue were selected using Loupe Browser v4.0.0 software (10x Genomics). Sequencing data were then aligned to the mouse reference genome (mm10) using the spaceranger v1.0.0 pipeline to generate a feature-by-spot-barcode expression matrix (10x Genomics).
Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples which were only available as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).
Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCont and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset individually. BCmvn optimization was used for pK parameterization. Estimated doublet rates were computed by fitting the total number of cells after quality filtering to a linear regression of the expected doublet rates published in the 10x Chromium handbook. Estimated homotypic doublet rates were also accounted for using the modelHomotypic function. The default pN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch correction with three tools independently: Harmony (github.com/immunogenomics/harmony) (v1.0), Scanorama (github.com/brianhie/scanorama) (v1.3), and BBKNN (github.com/Teichlab/bbknn) (v1.3.12). We then used Seurat to process the integrated data. After initial integration, we removed the noisy cluster and re-integrated the data using each of the three batch-correction tools.
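The doublet-rate estimate described above can be sketched in base R. This is a minimal illustration, not the authors' code: the rate table holds approximate values of the kind listed in the 10x Genomics Chromium user guide and should be checked against the guide for the chemistry in use. The helper name estimate_doublet_rate and the example dataset size are hypothetical.

```r
# Approximate multiplet rates by number of recovered cells
# (illustrative values; verify against the 10x user guide).
tenx_rates <- data.frame(
  cells_recovered = c(500, 1000, 2000, 3000, 4000, 5000,
                      6000, 7000, 8000, 9000, 10000),
  doublet_rate    = c(0.004, 0.008, 0.016, 0.023, 0.031, 0.039,
                      0.046, 0.054, 0.061, 0.069, 0.076)
)

# Linear regression of the expected doublet rate on recovered cells.
fit <- lm(doublet_rate ~ cells_recovered, data = tenx_rates)

# Hypothetical helper: predicted doublet rate for a dataset that has
# n_cells cells left after quality filtering.
estimate_doublet_rate <- function(n_cells) {
  unname(predict(fit, newdata = data.frame(cells_recovered = n_cells)))
}

n_cells <- 4200                           # example dataset size (assumption)
rate <- estimate_doublet_rate(n_cells)    # estimated fraction of doublets
n_expected_doublets <- round(rate * n_cells)
```

The resulting count is the kind of value that would be passed on to a doublet caller as the expected number of doublets, before any adjustment for homotypic doublets.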
Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical genes (Fig. S3). Clusters which had similar canonical marker gene expression patterns were merged.
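Choosing the dimensions that account for 95% of the total variance can be illustrated with base R alone. The sketch below runs prcomp on a random toy matrix, which is an assumption made for illustration. With a Seurat object, the per-component standard deviations would presumably come from Stdev(object, reduction = "pca") instead, in which case "total variance" means the variance captured by the computed components.

```r
set.seed(1)

# Toy matrix: 200 cells x 50 features (random data, illustration only).
x <- matrix(rnorm(200 * 50), nrow = 200)
x[, 1:5] <- x[, 1:5] * 5   # give a few features extra variance

pca <- prcomp(x, center = TRUE, scale. = FALSE)

# Fraction of variance explained by each component, and the running total.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# Smallest number of leading components reaching 95% cumulative variance.
n_pcs <- which(cum_var >= 0.95)[1]
dims_to_use <- seq_len(n_pcs)
```

The vector dims_to_use plays the role of the dims argument that a neighbor-graph function would receive.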
Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).
Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were re-labeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myocytes” (bins 11-18). Culture-associated muscle stem cells were ignored and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified with differential gene expression analysis using Seurat (FindAllMarkers, using Wilcoxon Rank Sum Test; results in Sup. Data 2). The resulting genes were filtered based on average log2-fold change (avg_logFC > 1) and the percentage of cells within the cluster which express each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each of the cell types, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Scripts used for analysis of V1 and V2 Datasets.
seurat_v1.R - initialize seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, PCA analysis, clustering, tSNE visualization. Used for v1 datasets.
merge_seurat.R - merge two or more seurat objects into one seurat object. Perform linear regression to remove batch effects from separate objects. Used for v1 datasets.
subcluster_seurat_v1.R - subcluster clusters of interest from Seurat object. Determine variable genes, perform regression and PCA. Used for v1 datasets.
seurat_v2.R - initialize seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, and PCA analysis. Used for v2 datasets.
clustering_markers_v2.R - clustering and tSNE visualization for v2 datasets.
subcluster_seurat_v2.R - subcluster clusters of interest from Seurat object. Determine variable genes, perform regression and PCA analysis. Used for v2 datasets.
seurat_object_analysis_v1_and_v2.R - downstream analysis and plotting functions for seurat object created by seurat_v1.R or seurat_v2.R.
merge_clusters.R - merge clusters that do not meet gene threshold. Used for both v1 and v2 datasets.
prepare_for_monocle_v1.R - subcluster cells of interest and perform linear regression, but not scaling, in order to input normalized, regressed values into monocle with monocle_seurat_input_v1.R.
monocle_seurat_input_v1.R - monocle script using seurat batch corrected values as input for v1 merged timecourse datasets.
monocle_lineage_trace.R - monocle script using nUMI as input for v2 lineage traced dataset.
monocle_object_analysis.R - downstream analysis for monocle object - BEAM and plotting.
CCA_merging_v2.R - script for merging v2 endocrine datasets with canonical correlation analysis and determining the number of CCs to include in downstream analysis.
CCA_alignment_v2.R - script for downstream alignment, clustering, tSNE visualization, and differential gene expression analysis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The provided datasets correspond to the analyses of individual donor single-cell RNA sequencing (scRNA-Seq) datasets, before their integration. The datasets have been saved as Seurat v4.0.5 objects. For clustering, we used the default settings in Seurat 4.0.5 (resolution 0.8) and increased the resolution, where necessary, to separate the epithelium into proximal and distal populations.
The *_clusters.pdf files show the suggested clusters in the individual datasets and the *_indiv_anno1.pdf files show the cell annotations according to the 84 cell states, described in the study with title "Developmental origins of cell heterogeneity in the human lung" (1st preprint version doi: https://doi.org/10.1101/2022.01.11.475631).
The "*_cluster_annotations.csv" files provide information about the suggested annotations of the clusters.
The "*_object_raw_and_log_counts.RData" objects contain the metadata and the UMI-counts [raw and log2(counts+1)] for each donor scRNA-Seq dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
##### CD40 activation and the effect on Neutrophils
# Load necessary libraries for data manipulation, analysis, and visualization
# plyr is loaded before dplyr so that it does not mask dplyr functions
library(plyr)
library(dplyr)
library(Seurat)
library(patchwork)
# Set the working directory to the folder containing the data
setwd("C:/Users/ALL/sciebo - Lang, Alexander (allan101@uni-duesseldorf.de)@uni-duesseldorf.sciebo.de/ALL_NGS/scRNAseq/scRNAseq/05_FGK45 Wirkung auf Neutros - scRNAseq/938-1_cellranger_count/outs")
# Read the M0 dataset from the 10X Genomics format
pbmc.data <- Read10X(data.dir = "filtered_feature_bc_matrix/")
RNA <- pbmc.data$`Gene Expression`
ADT <- pbmc.data$`Antibody Capture`
HST <- pbmc.data$`Multiplexing Capture`
# Load the Matrix package
library(Matrix)
# Hashtag 1, 2 and 3 are marking the organs (heart, blood, spleen)
# Subset the rows based on row names
subsetted_rows <- c("TotalSeq-B0301", "TotalSeq-B0302", "TotalSeq-B0303")
animals_data <- HST[subsetted_rows, , drop = FALSE]
# Hashtags 4, 5, 6 and 7 mark the treatment animals: IgG (B0304, B0305) and FGK45 (B0306, B0307)
subsetted_rows <- c("TotalSeq-B0304", "TotalSeq-B0305", "TotalSeq-B0306", "TotalSeq-B0307")
treatment_data <- HST[subsetted_rows, , drop = FALSE]
# Create a Seurat object and additional assays to combine later
RNA <- CreateSeuratObject(counts = RNA)
ADT <- CreateAssayObject(counts = ADT)
Organ <- CreateAssayObject(counts = animals_data)
Treatment <- CreateAssayObject(counts = treatment_data)
seurat <- RNA
#Add the Assays
seurat[["ADT"]] <- ADT
seurat[["HST_Mice"]] <- Organ
seurat[["HST_Treatment"]] <- Treatment
# Check the antibody (ADT) feature names
rownames(seurat[["ADT"]])
#Cluster cells on the basis of their scRNA-seq profiles
# perform visualization and clustering steps
DefaultAssay(seurat) <- "RNA"
seurat <- NormalizeData(seurat)
seurat <- FindVariableFeatures(seurat)
seurat <- ScaleData(seurat)
seurat <- RunPCA(seurat, verbose = FALSE)
seurat <- FindNeighbors(seurat, dims = 1:30)
seurat <- FindClusters(seurat, resolution = 0.8, verbose = FALSE)
seurat <- RunUMAP(seurat, dims = 1:30)
DimPlot(seurat, label = TRUE)
FeaturePlot(seurat, features = "S100a9", order = T)
# Normalize ADT data
DefaultAssay(seurat) <- "ADT"
seurat <- NormalizeData(seurat, normalization.method = "CLR", margin = 2)
#Demultiplex cells based on Mouse_Hashtag Enrichment
seurat <- NormalizeData(seurat, assay = "HST_Mice", normalization.method = "CLR")
seurat <- HTODemux(seurat, assay = "HST_Mice", positive.quantile = 0.99)
#Visualize demultiplexing results
# Global classification results
table(seurat$HST_Mice_classification.global)
DimPlot(seurat, group.by = "HST_Mice_classification")
#Demultiplex cells based on Treatment_Hashtag Enrichment
seurat <- NormalizeData(seurat, assay = "HST_Treatment", normalization.method = "CLR")
seurat <- HTODemux(seurat, assay = "HST_Treatment", positive.quantile = 0.99)
#Visualize demultiplexing results
# Global classification results
table(seurat$HST_Treatment_classification.global)
DimPlot(seurat, group.by = "HST_Treatment_classification")
Idents(seurat) <- seurat$HST_Treatment_classification
pbmc.singlet <- subset(seurat, idents = "Negative", invert = T)
Idents(pbmc.singlet) <- pbmc.singlet$HST_Mice_classification
pbmc.singlet <- subset(pbmc.singlet, idents = "Negative", invert = T)
DimPlot(pbmc.singlet, group.by = "HST_Treatment_maxID")
# Redo the classification to remove the doublets
pbmc.singlet <- HTODemux(pbmc.singlet, assay = "HST_Treatment", positive.quantile = 0.99)
table(pbmc.singlet$HST_Treatment_classification.global)
DimPlot(pbmc.singlet, group.by = "HST_Treatment_classification")
pbmc.singlet <- subset(pbmc.singlet, idents = "Doublet", invert = T)
pbmc.singlet <- HTODemux(pbmc.singlet, assay = "HST_Mice", positive.quantile = 0.99)
table(pbmc.singlet$HST_Mice_classification.global)
pbmc.singlet <- subset(pbmc.singlet, idents = "Doublet", invert = T)
pbmc.singlet <- HTODemux(pbmc.singlet, assay = "HST_Mice", positive.quantile = 0.60)
pbmc.singlet <- HTODemux(pbmc.singlet, assay = "HST_Treatment", positive.quantile = 0.60)
DimPlot(pbmc.singlet, group.by = "HST_Treatment_maxID")
DimPlot(pbmc.singlet, group.by = "HST_Mice_maxID")
seurat <- pbmc.singlet
seurat$organ <- seurat$HST_Mice_maxID
seurat$mouse <- seurat$HST_Treatment_maxID
seurat$treatment <- seurat$HST_Treatment_maxID
# Relabel hashtag IDs with treatment names (plyr is already loaded above)
seurat$treatment <- revalue(seurat$treatment, c(
"TotalSeq-B0304" = "IgG",
"TotalSeq-B0305" = "IgG",
"TotalSeq-B0306" = "FGK45",
"TotalSeq-B0307" = "FGK45"
))
# Relabel hashtag IDs with organ names
seurat$organ <- revalue(seurat$organ, c(
"TotalSeq-B0301" = "heart",
"TotalSeq-B0302" = "blood",
"TotalSeq-B0303" = "spleen"
))
seurat$mouse <- revalue(seurat$mouse, c(
"TotalSeq-B0304" = "1",
"TotalSeq-B0305" = "2",
"TotalSeq-B0306" = "3",
"TotalSeq-B0307" = "4"
))
# Cluster cells on the basis of their scRNA-seq profiles without doublets
# perform visualization and clustering steps
DefaultAssay(seurat) <- "RNA"
seurat <- NormalizeData(seurat)
seurat <- FindVariableFeatures(seurat)
seurat <- ScaleData(seurat)
seurat <- RunPCA(seurat, verbose = FALSE)
seurat <- FindNeighbors(seurat, dims = 1:30)
seurat <- FindClusters(seurat, resolution = 0.8, verbose = FALSE)
seurat <- RunUMAP(seurat, dims = 1:30)
DimPlot(seurat, label = TRUE)
DefaultAssay(seurat) <- "ADT"
seurat <- NormalizeData(seurat, normalization.method = "CLR", margin = 2)
setwd("C:/Users/ALL/sciebo - Lang, Alexander (allan101@uni-duesseldorf.de)@uni-duesseldorf.sciebo.de/ALL_NGS/scRNAseq/scRNAseq/05_FGK45 Wirkung auf Neutros - scRNAseq/Analyse")
saveRDS(seurat, file = "FGK45_heart_blood_spleen.v0.1.RDS")
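The hashtag relabeling in the script above can also be written without plyr, using a named-vector lookup. This is an alternative sketch, not part of the original script: the helper name relabel is hypothetical, and the lookup tables are copied from the revalue() calls above. Unlike revalue(), which leaves unmapped values unchanged, this helper stops with an error when it meets an ID that has no mapping.

```r
# Lookup tables copied from the revalue() calls in the script above.
treatment_map <- c("TotalSeq-B0304" = "IgG",   "TotalSeq-B0305" = "IgG",
                   "TotalSeq-B0306" = "FGK45", "TotalSeq-B0307" = "FGK45")
organ_map <- c("TotalSeq-B0301" = "heart", "TotalSeq-B0302" = "blood",
               "TotalSeq-B0303" = "spleen")

# Hypothetical helper: translate hashtag IDs through a named vector and
# fail loudly if any ID is missing from the map.
relabel <- function(ids, map) {
  ids <- as.character(ids)
  out <- unname(map[ids])
  if (anyNA(out)) {
    stop("unmapped hashtag ID(s): ",
         paste(unique(ids[is.na(out)]), collapse = ", "))
  }
  out
}

example_ids <- c("TotalSeq-B0306", "TotalSeq-B0304", "TotalSeq-B0307")
treatment_labels <- relabel(example_ids, treatment_map)
# treatment_labels is c("FGK45", "IgG", "FGK45")
```

On the Seurat object this would be applied as, for example, seurat$treatment <- relabel(seurat$HST_Treatment_maxID, treatment_map), assuming the maxID column holds exactly these hashtag names.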
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
##### CD40 inhibition in AMI on d5, seq on d7 and d14
# Load necessary libraries for data manipulation, analysis, and visualization
# plyr is loaded before dplyr so that it does not mask dplyr functions
library(plyr)
library(dplyr)
library(Seurat)
library(patchwork)
# Set the working directory to the folder containing the data
setwd("C:/Users/ALL/sciebo - Lang, Alexander (allan101@uni-duesseldorf.de)@uni-duesseldorf.sciebo.de/ALL_NGS/scRNAseq/scRNAseq/01_TS_d5_paper/03_CD40 inhibition on day 5, seq on day 7 and 14/938-2_cellranger_count/outs")
# Read the M0 dataset from the 10X Genomics format
pbmc.data <- Read10X(data.dir = "filtered_feature_bc_matrix/")
RNA <- pbmc.data$`Gene Expression`
ADT <- pbmc.data$`Antibody Capture`
HST <- pbmc.data$`Multiplexing Capture`
# Load the Matrix package
library(Matrix)
# Hashtag 1, 2 and 3 are marking the mouse replicates per condition
# Subset the rows based on row names
subsetted_rows <- c("TotalSeq-B0301", "TotalSeq-B0302", "TotalSeq-B0303")
animals_data <- HST[subsetted_rows, , drop = FALSE]
# Hashtag 4, 5, 6, 7 are representing DMSO d7, TS d7, DMSO d14 and TS d14
subsetted_rows <- c("TotalSeq-B0304", "TotalSeq-B0305", "TotalSeq-B0306", "TotalSeq-B0307")
treatment_data <- HST[subsetted_rows, , drop = FALSE]
# Create a Seurat object and additional assays to combine later
RNA <- CreateSeuratObject(counts = RNA)
ADT <- CreateAssayObject(counts = ADT)
Mice <- CreateAssayObject(counts = animals_data)
Treatment <- CreateAssayObject(counts = treatment_data)
seurat <- RNA
#Add the Assays
seurat[["ADT"]] <- ADT
seurat[["HST_Mice"]] <- Mice
seurat[["HST_Treatment"]] <- Treatment
# Check the antibody (ADT) feature names
rownames(seurat[["ADT"]])
#Cluster cells on the basis of their scRNA-seq profiles
# perform visualization and clustering steps
DefaultAssay(seurat) <- "RNA"
seurat <- NormalizeData(seurat)
seurat <- FindVariableFeatures(seurat)
seurat <- ScaleData(seurat)
seurat <- RunPCA(seurat, verbose = FALSE)
seurat <- FindNeighbors(seurat, dims = 1:30)
seurat <- FindClusters(seurat, resolution = 0.8, verbose = FALSE)
seurat <- RunUMAP(seurat, dims = 1:30)
DimPlot(seurat, label = TRUE)
FeaturePlot(seurat, features = "Col1a1", order = T)
# Normalize ADT data
DefaultAssay(seurat) <- "ADT"
seurat <- NormalizeData(seurat, normalization.method = "CLR", margin = 2)
#Demultiplex cells based on Mouse_Hashtag Enrichment
seurat <- NormalizeData(seurat, assay = "HST_Mice", normalization.method = "CLR")
seurat <- HTODemux(seurat, assay = "HST_Mice", positive.quantile = 0.60)
#Visualize demultiplexing results
# Global classification results
table(seurat$HST_Mice_classification.global)
DimPlot(seurat, group.by = "HST_Mice_classification")
#Demultiplex cells based on Treatment_Hashtag Enrichment
seurat <- NormalizeData(seurat, assay = "HST_Treatment", normalization.method = "CLR")
seurat <- HTODemux(seurat, assay = "HST_Treatment", positive.quantile = 0.60)
#Visualize demultiplexing results
# Global classification results
table(seurat$HST_Treatment_classification.global)
DimPlot(seurat, group.by = "HST_Treatment_classification")
Idents(seurat) <- seurat$HST_Treatment_classification
pbmc.singlet <- subset(seurat, idents = "Negative", invert = T)
Idents(pbmc.singlet) <- pbmc.singlet$HST_Mice_classification
pbmc.singlet <- subset(pbmc.singlet, idents = "Negative", invert = T)
DimPlot(pbmc.singlet, group.by = "HST_Treatment_maxID")
# Redo the classification to remove the doublets
pbmc.singlet <- HTODemux(pbmc.singlet, assay = "HST_Treatment", positive.quantile = 0.99)
table(pbmc.singlet$HST_Treatment_classification.global)
DimPlot(pbmc.singlet, group.by = "HST_Treatment_classification")
pbmc.singlet <- subset(pbmc.singlet, idents = "Doublet", invert = T)
pbmc.singlet <- HTODemux(pbmc.singlet, assay = "HST_Mice", positive.quantile = 0.99)
table(pbmc.singlet$HST_Mice_classification.global)
pbmc.singlet <- subset(pbmc.singlet, idents = "Doublet", invert = T)
pbmc.singlet <- HTODemux(pbmc.singlet, assay = "HST_Mice", positive.quantile = 0.60)
pbmc.singlet <- HTODemux(pbmc.singlet, assay = "HST_Treatment", positive.quantile = 0.60)
DimPlot(pbmc.singlet, group.by = "HST_Treatment_maxID")
DimPlot(pbmc.singlet, group.by = "HST_Mice_maxID")
seurat <- pbmc.singlet
seurat$mice <- seurat$HST_Mice_maxID
seurat$treatment <- seurat$HST_Treatment_maxID
# Relabel hashtag IDs with treatment names (plyr is already loaded above)
seurat$treatment <- revalue(seurat$treatment, c(
"TotalSeq-B0304" = "DMSO_d7",
"TotalSeq-B0305" = "TS_d7",
"TotalSeq-B0306" = "DMSO_d14",
"TotalSeq-B0307" = "TS_d14"
))
# Relabel hashtag IDs with mouse replicate numbers
seurat$mice <- revalue(seurat$mice, c(
"TotalSeq-B0301" = "1",
"TotalSeq-B0302" = "2",
"TotalSeq-B0303" = "3"
))
# Cluster cells on the basis of their scRNA-seq profiles without doublets
# perform visualization and clustering steps
DefaultAssay(seurat) <- "RNA"
seurat <- NormalizeData(seurat)
seurat <- FindVariableFeatures(seurat)
seurat <- ScaleData(seurat)
seurat <- RunPCA(seurat, verbose = FALSE)
seurat <- FindNeighbors(seurat, dims = 1:30)
seurat <- FindClusters(seurat, resolution = 0.8, verbose = FALSE)
seurat <- RunUMAP(seurat, dims = 1:30)
DimPlot(seurat, label = TRUE)
DefaultAssay(seurat) <- "ADT"
seurat <- NormalizeData(seurat, normalization.method = "CLR", margin = 2)
setwd("C:/Users/ALL/sciebo - Lang, Alexander (allan101@uni-duesseldorf.de)@uni-duesseldorf.sciebo.de/ALL_NGS/scRNAseq/scRNAseq/01_TS_d5_paper/03_CD40 inhibition on day 5, seq on day 7 and 14/Analyse")
saveRDS(seurat, file= "TSd5.v0.1.RDS")
Dataset created in the study "A Spatial Transcriptomics Atlas of the Malaria-infected Liver Indicates a Crucial Role for Lipid Metabolism and Hotspots of Inflammatory Cell Infiltration"
Structure
ST_berghei_liver
contains data generated during stpipeline analysis and imaging on the 2k arrays Spatial Transcriptomics platform, as well as data necessary for and from hepaquery analysis. These samples include 38 sections in total, of which 8 are from mice (n=4) infected with sporozoites for 12h, 5 sections from control mice (n=3) at 12h, 7 sections from mice (n=4) infected with sporozoites for 24h and 4 sections from control mice (n=3) at 24h, as well as 8 samples of mice (n=2) infected with sporozoites for 38h and control mice (n=2) at 38h.
STUtiility_mus_pb_ST.RDS describes seurat object generated using the STUtility package using ST data of the 38 liver sections of which the data is stored in ST_berghei_liver
visium_berghei_liver
contains data generated with the spaceranger pipeline and imaging using the Visium spatial transcriptomics platform. These samples include 8 sections in total, of which 1 was infected with sporozoites for 12h, 1 control section at 12h, 1 section infected with sporozoites for 24h and 1 control section at 24h, as well as 2 sporozoite-infected sections and 2 control sections at 38h.
V10S29-135_B1 contains spaceranger output for section 1 for infected and control sections at 12h post-infection
V10S29-135_C1 contains spaceranger output for section 1 for infected and control sections at 24h post-infection
V10S29-135_D1 contains spaceranger output for section 2 for infected and control sections at 38h post-infection
se_visium.RDS describes seurat object generated using the STUtility package using ST data of the 38 liver sections of which the data is stored in visium_berghei_liver
snSeq_berghei_liver
contains data generated with the cellranger pipeline and imaging using the Visium spatial transcriptomics platform. These samples include single nuclei of 2 infected and control mice after 12h, 2 infected and control mice after 24h, 2 infected and control mice after 38h, and 2 uninfected mice prior to a challenge.
cellranger_cnt_out contains feature count matrix information from cell ranger output
final_merged_curated_annotations_270623.RDS describes seurat object generated using the STUtility package using ST data of the 38 liver sections of which the data is stored in snSeq_berghei_liver.tar.gz
raw images.zip contains raw images for supplementary figures 20-22
adjusted images.zip contains brightness and contrast adjusted images for supplementary figures 20-22
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TLDR
Seurat object of the 16 NPM1-mutated AML samples (n = 83,162 cells).
AML samples
All sixteen peripheral blood and bone marrow samples were obtained from patients with AML at diagnosis (n=15) or relapse after chemotherapy (n=1) with written informed consent according to the Declaration of Helsinki. Mononuclear cells were isolated by Ficoll-Isopaque density gradient centrifugation and cryopreserved in the Leiden University Medical Center (LUMC) Biobank for Hematological Diseases after approval by the LUMC Institutional Review Board (protocol no. B18.047).
Upstream processing pipeline
CellRanger v7.0.0 was run on all samples with the human reference genome hg38. For all QC, Seurat v4 was used [15]. Our QC pipeline had three steps per sample: 1) soft filtering, 2) low quality cluster removal, and 3) doublet detection. In soft filtering, Seurat objects were created with cells expressing at least 200 genes and with genes expressed in at least 3 cells. Then, the standard Seurat command list with default parameters was run to detect low quality clusters. Clusters with >15% mitochondrial and 15% mitochondrial mRNA. We used standard Seurat commands to scale and normalize the data on integrated features. The first 30 principal components were used to create UMAP plots. We used clustree to determine the optimal cluster number, based on FindClusters with resolutions sweeping from 0 to 1.2. We chose res=0.5, as clusters became stable. Next, we merged two clusters (CC5 and CC12) into one GMP-like cluster, as one of these clusters (CC12) had high expression of HSP genes yet still retained its cell-type specific properties.
Note: The file was processed with Seurat v4 but the object is updated for v5. It is uploaded in .qs file format for faster reading. To read the file: qs::qread("path/to/data.qs")
This data is available for research use only and cannot be used for commercial purposes.
For further queries please refer to our paper:
https://spdx.org/licenses/CC0-1.0.html
The distal region of the uterine (Fallopian) tube is commonly associated with high-grade serous carcinoma (HGSC), the predominant and most aggressive form of ovarian or extra-uterine cancer. Specific cell states and lineage dynamics of the adult tubal epithelium (TE) remain insufficiently understood, hindering efforts to determine the cell of origin for HGSC. Here, we report a comprehensive census of cell types and states of the mouse uterine tube. We show that distal TE cells expressing the stem/progenitor cell marker Slc1a3 can differentiate into both secretory (Ovgp1+) and ciliated (Fam183b+) cells. Inactivation of Trp53 and Rb1, whose pathways are commonly altered in HGSC, leads to elimination of targeted Slc1a3+ cells by apoptosis, thereby preventing their malignant transformation. In contrast, pre-ciliated cells (Krt5+, Prom1+, Trp73+) remain cancer-prone and give rise to serous tubal intraepithelial carcinomas and overt HGSC. These findings identify transitional pre-ciliated cells as a previously unrecognized cancer-prone cell state and point to pre-ciliation mechanisms as novel diagnostic and therapeutic targets.
Methods
Single-cell RNA-sequencing library preparation. For TE single cell expression and transcriptome analysis we isolated TE from C57BL6 adult estrous female mice. In 3 independent experiments a total of 62 uterine tubes were collected. Each uterine tube was placed in sterile PBS containing 100 IU ml⁻¹ of penicillin and 100 µg ml⁻¹ streptomycin (Corning, 30-002-Cl), and separated into distal and proximal regions. Tissues from the same region were combined in a 40 µl drop of the same PBS solution, cut open lengthwise, and minced into 1.5-2.5 mm pieces with 25G needles. Minced tissues were transferred with the help of a sterile wide bore 200 µl pipette tip into a 1.8 ml cryo vial containing 1.2 ml A-mTE-D1 (300 IU ml⁻¹ collagenase IV mixed with 100 IU ml⁻¹ hyaluronidase; Stem Cell Technologies, 07912, in DMEM Ham’s F12, Hyclone, SH30023.FS). Tissues were incubated with a loose cap for 1 h at 37°C in a 5% CO2 incubator. During the incubation tubes were taken out 4 times and tissues suspended with a wide bore 200 µl pipette tip. At the end of incubation, the tissue-cell suspension from each tube was transferred into 1 ml TrypLE (Invitrogen, 12604013) pre-warmed to 37°C and suspended 70 times with a 1000 µl pipette tip, 5 ml A-SM [DMEM Ham’s F12 containing 2% fetal bovine serum (FBS)] were added to the mix, and TE cells were pelleted by centrifugation at 300 x g for 10 minutes at 25°C. Pellets were then suspended with 1 ml A-mTE-D2 pre-warmed to 37°C (7 mg ml⁻¹ Dispase II, Worthington NPRO2, and 10 µg ml⁻¹ Deoxyribonuclease I, Stem Cell Technologies, 07900), and mixed 70 times with a 1000 µl pipette tip. 5 ml A-mTE-D2 was added and samples were passed through a 40 µm cell strainer, and pelleted by centrifugation at 300 x g for 7 minutes at +4°C. Pellets were suspended in 100 µl microbeads per 10⁷ total cells or fewer, and dead cells were removed with the Dead Cell Removal Kit (Miltenyi Biotec, 130-090-101) according to the manufacturer’s protocol.
Pelleted live cell fractions were collected in 1.5 ml low binding centrifuge tubes, kept on ice, and suspended in ice cold 50 µl A-Ri-Buffer (5% FBS, 1% GlutaMAX-I, Invitrogen, 35050-079, 9 µM Y-27632, Millipore, 688000, and 100 IU ml-1 penicillin 100 μg ml-1 streptomycin in DMEM Ham’s F12). Cell aliquots were stained with trypan blue for live and dead cell calculation. Live cell preparations with a target cell recovery of 5,000-6,000 were loaded on Chromium controller (10X Genomics, Single Cell 3’ v2 chemistry) to perform single cell partitioning and barcoding using the microfluidic platform device. After preparation of barcoded, next-generation sequencing cDNA libraries samples were sequenced on Illumina NextSeq500 System.
Download and alignment of single-cell RNA sequencing data
For sequence alignment, a custom reference for mm39 was built using the cellranger (v6.1.2, 10x Genomics) mkref function. The mm39.fa soft-masked assembly sequence and the mm39.ncbiRefSeq.gtf (release 109) genome annotation last updated 2020-10-27 were used to form the custom reference. The raw sequencing reads were aligned to the custom reference and quantified using the cellranger count function.
Preprocessing and batch correction
All preprocessing and data analysis was conducted in R (v.4.1.1 (2021-08-10)). The cellranger count outputs were first modified with the autoEstCont and adjustCounts functions from SoupX (v.1.6.1) to output a corrected matrix with the ambient RNA signal (soup) removed (https://github.com/constantAmateur/SoupX). To preprocess the corrected matrices, the Seurat (v.4.1.1) NormalizeData, FindVariableFeatures, ScaleData, RunPCA, FindNeighbors, and RunUMAP functions were used to create a Seurat object for each sample (https://github.com/satijalab/seurat). The number of principal components used to construct a shared nearest-neighbor graph was chosen to account for 95% of the total variance. To detect possible doublets, we used the package DoubletFinder (v.2.0.3) with inputs specific to each Seurat object. DoubletFinder creates artificial doublets and calculates the proportion of artificial k nearest neighbors (pANN) for each cell from a merged dataset of the artificial and actual data. To maximize DoubletFinder’s predictive power, the mean-variance normalized bimodality coefficient (BCMVN) was used to determine the optimal pK value for each dataset. To establish a threshold for pANN values to distinguish between singlets and doublets, the estimated multiplet rates for each sample were calculated by interpolating between the target cell recovery values according to the 10x Chromium user manual. Homotypic doublets were identified using unannotated Seurat clusters in each dataset with the modelHomotypic function. After doublets were identified, all distal and proximal samples were merged separately. Cells with greater than 30% mitochondrial genes, cells with fewer than 750 nCount_RNA, and cells with fewer than 200 nFeature_RNA were removed from the merged datasets. To correct for any batch effects between sample runs, we used the harmony (v.0.1.0) integration method (github.com/immunogenomics/harmony).
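The multiplet-rate interpolation described above can be sketched in Python as follows. This is only an illustration, not the authors' code: the table holds approximate recovery/multiplet-rate pairs assumed here for the example, and they should be checked against the 10x Chromium user manual for the chemistry in use. The function names are hypothetical.

```python
from bisect import bisect_right

# ASSUMPTION: approximate (cells recovered, multiplet rate in %) pairs used for
# illustration only; verify against the 10x Chromium user manual before use.
RATE_TABLE = [
    (500, 0.4), (1000, 0.8), (2000, 1.6), (3000, 2.3), (4000, 3.1),
    (5000, 3.9), (6000, 4.6), (7000, 5.4), (8000, 6.1), (9000, 6.9),
    (10000, 7.6),
]


def estimate_multiplet_rate(target_recovery):
    """Linearly interpolate the multiplet rate (%) for a target cell recovery."""
    recoveries = [x for x, _ in RATE_TABLE]
    if not recoveries[0] <= target_recovery <= recoveries[-1]:
        raise ValueError("target recovery is outside the tabulated range")
    i = bisect_right(recoveries, target_recovery)
    if i == len(recoveries):  # exactly the last tabulated value
        return RATE_TABLE[-1][1]
    (x0, y0), (x1, y1) = RATE_TABLE[i - 1], RATE_TABLE[i]
    return y0 + (y1 - y0) * (target_recovery - x0) / (x1 - x0)


def expected_doublets(n_cells, target_recovery):
    """Expected number of doublets among n_cells at the interpolated rate."""
    return round(n_cells * estimate_multiplet_rate(target_recovery) / 100)
```

For a target recovery between 5,000 and 6,000 cells, as used for these samples, the interpolated rate falls between the two neighbouring table entries; the resulting expected doublet count is the kind of value a doublet caller takes as its expected-doublet input.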
Clustering parameters and annotations
After merging the datasets and batch correction, the dimensions reflecting 95% of the total variance were input into Seurat’s FindNeighbors function with a k.param of 70. Louvain clustering was then conducted using Seurat’s FindClusters with a resolution of 0.7. The resulting 19 clusters were annotated based on the expression of canonical genes and the results of differential gene expression (Wilcoxon Rank Sum test) analysis. One cluster expressing lymphatic and epithelial markers was omitted from later analysis as it only contained 2 cells suspected to be doublets. To better understand the epithelial populations, we reclustered 6 epithelial populations and reapplied harmony batch correction. For this subset, FindNeighbors was run with a k.param of 50 and FindClusters with a resolution of 0.7. The resulting 9 clusters within the epithelial subset were further annotated using differential expression analysis and canonical markers.
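One way to pick the number of dimensions that reflect 95% of the variance is to accumulate the squared standard deviations of the principal components until the target fraction is reached. The Python sketch below assumes that "total variance" means the variance captured by the computed components, which the text does not state explicitly; the function name is hypothetical.

```python
def n_components_for_variance(stdevs, target=0.95):
    """Smallest number of leading PCs whose variance reaches `target`.

    `stdevs` are per-component standard deviations in decreasing order.
    ASSUMPTION: the total is taken over the computed components only.
    """
    variances = [s * s for s in stdevs]
    total = sum(variances)
    if total <= 0:
        raise ValueError("component variances must sum to a positive value")
    cumulative = 0.0
    for n, variance in enumerate(variances, start=1):
        cumulative += variance
        if cumulative / total >= target:
            return n
    return len(variances)


# Toy example: variances 16, 4, 1, 1 give cumulative fractions
# 0.727, 0.909, 0.955, 1.0, so three components are needed for 95%.
print(n_components_for_variance([4.0, 2.0, 1.0, 1.0]))
```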
Pseudotime analysis
Potential of heat diffusion for affinity-based transition embedding (PHATE) is a dimensionality reduction method designed to more accurately visualize continual progressions found in biological data 35. A modified version of Seurat (v4.1.1) was developed to include the ‘RunPHATE’ function for converting a Seurat Object to a PHATE embedding. This was built on the phateR package (v.1.0.7) (https://github.com/scottgigante/seurat/tree/patch/add-PHATE-again). In addition to PHATE, pseudotime values were calculated with Monocle3 (v.1.2.7), which computes trajectories with an origin set by the user 36,55–57. The origin was set to be a progenitor cell state confirmed with lineage tracing experiments.
35. Moon, K. R. et al. Visualizing structure and transitions in high-dimensional biological data. Nat Biotechnol 37, 1482–1492 (2019). doi:10.1038/s41587-019-0336-3
36. Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019). doi:10.1038/s41586-019-0969-x
55. Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol 32, 381–386 (2014). doi:10.1038/nbt.2859
56. Qiu, X. et al. Single-cell mRNA quantification and differential analysis with Census. Nat Methods 14, 309–315 (2017). doi:10.1038/nmeth.4150
57. Qiu, X. et al. Reversed graph embedding resolves complex single-cell trajectories. Nat Methods 14, 979–982 (2017). doi:10.1038/nmeth.4402
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Seurat v3 object
ASSAYS:
ADT: Antibody expression data
RNA: mRNA expression data
DIMENSIONALITY REDUCTION
projAML: Data was projected on the main AML dataset from Cohorts A and B.
pca: pca computed on the 2000 most variable features
umapSimple: umap computed from first 15 principal components
METADATA
patient: Patient
ct: Projected cell type (Triana et al., 2021)
ct_simple: Simplified projected celltype
batch: Cells come either from total bone marrow or CD34+ enrichment using either FACS or MACS
seurat_clusters: unsupervised clusters obtained running the standard Seurat workflow
proj_cluster: Projected cluster from main AML dataset
leukemia_prob: CloneTracer leukemia probability
status: Binarized status (healthy, leukemic, unsure)
clonal_probability: CloneTracer clonal probability
clone: Binarized clone
dormancy_score: score assessing the dormancy of cells based on Zhang 2021 and Cabezas-Wallscheid 2017 gene lists.
data.tar.gz contains all files from the data directory (except for SAM outputs from STAR) associated with the 230926_EJ_Setbp1_AlternativeSplicing GitHub project and includes the following files:

./marvel: - This directory contains rds and Rdata objects that were created using the MARVEL R package
  cell_type_goresults.rds - This is the GO results split by cell type
  marvel_04_split_counts.Rdata - This R data includes all environment objects from MARVEL script 04, and is used for downstream plotting
  normalized_sj_expression.Rds - This object is the normalized splice junction expression
  Setbp1_marvel_aligned.rds - Final prepared MARVEL object before any SJU analyses have been run
  significant_tables.RData - For those who do not want to load multiple massive files, this includes all significant SJU results for each cell type
  sj_usage_cell_type.rds - This data object has splice junction usage calculated for each cell type
  sj_usage_condition.rds - This data object has splice junction usage calculated for each cell type and also split by condition

./seurat: - This directory contains all intermediate and final Seurat single-cell gene expression objects
  annotated_brain_samples.rds - This is the final iteration of the processing in Seurat for a final annotated object. Please use this object for any Seurat or single-cell gene expression analyses.
  clustered_brain_samples.rds - This is the clustered Seurat object, before cell type annotation based on canonical markers.
  filtered_brain_samples_pca.rds - This is the filtered Seurat object, before clustering but after PCA.
  filtered_brain_samples.rds - This is the filtered Seurat object, before PCA.
  integrated_brain_samples.rds - This is the integrated Seurat object, before other steps.

./star: - All files in the STAR directory are outputs from STARsolo, as described in our methods. Each output directory contains the same files, so only one example is included here for brevity. Intermediate SAM files were removed to optimize space.
  J1/ - This directory contains outputs for brain sample J1
  J13/ - This directory contains outputs for brain sample J13
  J15/ - This directory contains outputs for brain sample J15
  J2/ - This directory contains outputs for brain sample J2
  J3/ - This directory contains outputs for brain sample J3
  J4/ - This directory contains outputs for brain sample J4
  K1/ - This directory contains outputs for kidney sample K1
  K2/ - This directory contains outputs for kidney sample K2
  K3/ - This directory contains outputs for kidney sample K3
  K4/ - This directory contains outputs for kidney sample K4
  K5/ - This directory contains outputs for kidney sample K5
  K6/ - This directory contains outputs for kidney sample K6

./star/genome: - This directory contains outputs from running STAR genomeGenerate. Detailed file descriptions available from https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
  chrLength.txt
  chrNameLength.txt
  chrName.txt
  chrStart.txt
  exonGeTrInfo.tab
  exonInfo.tab
  geneInfo.tab
  Genome
  genomeParameters.txt
  Log.out
  SA
  SAindex
  sjdbInfo.txt
  sjdbList.fromGTF.out.tab
  sjdbList.out.tab
  transcriptInfo.tab

./star/J1: - This is the head STAR directory for sample J1. It contains logs, basic QC, and gene and splice junction counts. For more information about the STAR pipeline and its outputs, please refer to the STAR documentation https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
  Log.final.out
  Log.out
  Log.progress.out
  SJ.out.tab
  Solo.out/
  STARgenome/

./star/J1/Solo.out: - This directory contains the outputs used for downstream analysis
  Barcodes.stats
  GeneFull_Ex50pAS/
  SJ/

./star/J1/Solo.out/GeneFull_Ex50pAS: - This directory contains the filtered and raw barcodes, features, and matrix files for gene expression (including introns)
  Features.stats
  filtered/
  raw/
  Summary.csv
  UMIperCellSorted.txt

./star/J1/Solo.out/GeneFull_Ex50pAS/filtered: - This directory contains the filtered tsv and mtx gene expression files required for creating a Seurat object (or other single cell packages)
  barcodes.tsv.gz - This file contains filtered cell barcodes
  features.tsv.gz - This file contains filtered features (genes)
  matrix.mtx.gz - This file contains the filtered cell by gene expression count matrix

./star/J1/Solo.out/GeneFull_Ex50pAS/raw: - This directory contains the unfiltered tsv and mtx gene expression files required for creating a Seurat object (or other single cell packages). Files are the same as previously described for filtered.
  barcodes.tsv
  features.tsv
  matrix.mtx

./star/J1/Solo.out/SJ: - This directory contains the QC and raw barcodes, features, and matrix files for splice junction expression
  Features.stats
  raw/
  Summary.csv

./star/J1/Solo.out/SJ/raw: - This directory contains the raw barcodes, features, and matrix files for splice junction expression
  barcodes.tsv - This file contains filtered cell barcodes
  features.tsv - This file contains filtered features (splice junctions)
  m...
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains processed single-cell RNA-sequencing (scRNA-seq) data from the PBMC 3K experiment.
It includes quality-control (QC) visualizations, cell-level metrics, clustering outputs, and exploratory analysis plots.
The dataset is designed to guide beginners and intermediate users through the essential steps of scRNA-seq preprocessing and analysis.
The PBMC 3K dataset represents human peripheral blood mononuclear cells sequenced using the 10x Genomics platform.
Included QC metrics help identify low-quality cells, doublets, stressed cells, and outliers based on standard thresholds.
The dataset covers filtering based on mitochondrial gene percentage, total UMIs, and number of detected genes.
All plots follow widely accepted scRNA-seq workflows commonly used in tools like Seurat, Scanpy, and SingleCellExperiment.
The QC violin plots illustrate distributions of nFeature_RNA, nCount_RNA, percent.mt, and other metrics used to assess cell quality.
The data also highlights the effect of filtering on overall dataset structure and variability.
Clustering-related files provide a visual understanding of how cells segregate into biologically meaningful groups.
Dimensionality-reduction plots also show patterns such as immune-cell diversity present in PBMC populations.
This dataset is suitable for hands-on learning, tutorial creation, classroom instruction, or benchmarking workflows.
It serves as a ready reference for researchers who wish to practice QC interpretation and cluster inspection.
The dataset allows quick reproduction of PBMC 3K quality-control visualizations without running the entire analysis pipeline.
It provides an accessible introduction to scRNA-seq analysis concepts for students, data scientists, and bioinformaticians.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary: Dendritic cells (DCs) orchestrate innate and adaptive immunity, by translating the sensing of distinct danger signals into the induction of different effector lymphocyte responses, to induce different defense mechanisms suited to face distinct types of threats. Hence, DCs are very plastic, which results from two key characteristics. First, DCs encompass distinct cell types specialized in different functions. Second, each DC type can undergo different activation states, fine-tuning its functions depending on its tissue microenvironment and the pathophysiological context, by adapting the output signals it delivers to the input signals it receives. Hence, to better understand DC biology and harness it in the clinic, we must determine which combinations of DC types and activation states mediate which functions, and how.
To decipher the nature, functions and regulation of DC types and their physiological activation states, one of the methods that can be harnessed most successfully is ex vivo single cell RNA sequencing (scRNAseq). However, for new users of this approach, determining which analytics strategy and computational tools to choose can be quite challenging, considering the rapid evolution and broad burgeoning of the field. In addition, awareness must be raised on the need for specific, robust and tractable strategies to annotate cells for cell type identity and activation states. It is also important to emphasize the necessity of examining whether similar cell activation trajectories are inferred by using different, complementary methods. In this chapter, we take these issues into account to provide a pipeline for scRNAseq analysis and illustrate it with a tutorial reanalyzing a public dataset of mononuclear phagocytes isolated from the lungs of naïve or tumor-bearing mice. We describe this pipeline step-by-step, including data quality controls, dimensionality reduction, cell clustering, cell cluster annotation, inference of the cell activation trajectories and investigation of the underpinning molecular regulation. It is accompanied by a more complete tutorial on GitHub. We anticipate that this method will be helpful for both wet lab and bioinformatics researchers interested in harnessing scRNAseq data for deciphering the biology of DCs or other cell types, and that it will contribute to establishing high standards in the field.
Data:
MDAlab_cDC1_maturation.tar : Docker image used for the analysis
The adult Hydra polyp continually renews all of its cells using three separate stem cell populations, but the genetic pathways enabling this homeostatic tissue maintenance are not well understood. We sequenced 24,985 Hydra single-cell transcriptomes and identified the molecular signatures of a broad spectrum of cell states, from stem cells to terminally differentiated cells. We constructed differentiation trajectories for each cell lineage and identified gene modules and putative regulators expressed along these trajectories, thus creating a comprehensive molecular map of all developmental lineages in the adult animal. In addition, we built a gene expression map of the Hydra nervous system. Our work constitutes a resource for addressing questions regarding the evolution of metazoan developmental processes and nervous system function.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Seurat v3 object
ASSAYS:
AB: Antibody expression data
RNA: mRNA expression data
DIMENSIONALITY REDUCTION
Projected: Data was projected on the reference dataset MOFAUMAP coordinates (Triana et al., 2021)
scanorama: Data was integrated with Scanorama, using the cohort as Batch key
umap: umap computed from Scanorama components
METADATA
patient: Patient
cohort: Cohort
day: Day of sampling (relevant for Cohort B only)
ct: Projected cell type (Triana et al., 2021)
ct_simple: Simplified projected celltype
pseudo_myel: Projected myeloid pseudotime (Triana et al., 2021)
projection_score: Score assessing the quality of projection (Triana et al., 2021)
id: Unsupervised clustering result
leukemia_prob: CloneTracer leukemia probability
status: Binarized status (healthy, leukemic, unsure)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The attached R Scripts supplement our protocol paper currently under editorial review at the Journal of Visualized Experiments. Scope of the article: This protocol describes the general processes and quality control checks necessary for preparing healthy adult single cells for droplet-based, high-throughput single cell RNA-Seq analysis using the 10X Genomics Chromium System. We also describe sequencing parameters, alignment and downstream single-cell bioinformatic analysis.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Serialized R data files (.rds) associated with the inDrop single-cell RNA-seq analysis in Huang et al., 2019. Each file has a single Seurat object containing a subset of clusters from the full processed dataset, which were separated into different objects due to file size limitations. Raw data (UMIFM counts) are included in the corresponding slot in each Seurat object. Seurat objects can be re-merged into a single object containing the full dataset using the MergeSeurat function.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Clinical interventions and inflammatory signaling shape the transcriptional and cellular architecture of the early postnatal lung
These are fully processed, integrated and annotated datasets from:
23 histologically normal early postnatal (0 to 2 year) distal lung specimens: (FullEarlyPostnatalAtlas.RDS).
4 distal lung specimens from patients diagnosed with "Evolving" or "Organized" Bronchopulmonary Dysplasia: (FullBronchopulmonaryDysplasiaAtlas.RDS).
5 distal lung specimens from patients diagnosed with Pulmonary Interstitial Glycogenosis: (FullPulmInstGlycogenosisAtlas.RDS).
3 distal lung specimens from 16, 21 and 23 weeks post-conception: (16_21_23weekAdditionalSamples.RDS).
Code used in analysis of this data is available at: http://github.com/jason-spence-lab/Frum-et-al.-2025a.git
METHODS
Submission of samples for single nucleus RNA-sequencing (snRNA-seq)
Samples were stored in LN2 until preparation for snRNA-seq. Small pieces from each sample were shaved off a single region of the specimen and then minced into small, rice-grain-sized fragments using a No. 1 scalpel in a 6 cm dish on dry ice. Samples were transferred to 1.5 mL tubes and dissociated by pestle, and then nuclei were purified using the 10X Chromium Nuclei Isolation Kit (10X Genomics, Cat#1000493) following the manufacturer’s recommendations, counted using a Countess Automated Cell Counter (v2) and resuspended at 1000 cells/µl in PBS + 1% BSA (Miltenyi, Cat#). The University of Michigan Advanced Genomics Sequencing Core prepared libraries using the Chromium Next GEM Single Cell 3’ GEM, Library and Gel Bead Kit v3.1 (10X Genomics, Cat#PN1000128) targeting 7500 nuclei per specimen. snRNA-seq libraries were sequenced to a projected average read depth of 80,000 reads per nucleus using a NovaSeq 6000 with S4 300 cycle reagents.
Computational Analysis of snRNA-seq data
Ambient RNA Correction
Reads were mapped to human genome (GRCh38-2020-A) and gene expression matrices generated using CellRanger v7. Raw matrices were used as input for CellBender v0.30, which re-called nuclei and corrected for ambient RNA at a false-positive rate of 0.01. CellBender corrected gene expression matrices were imported into Seurat v5 in RStudio v1.4 running R v4.1.
Preprocessing/QC Filtering
Only nuclei within the following thresholds were considered for further analysis: between 500 and 7500 features, more than 1000 unique molecules sequenced, less than 5% mitochondrial RNA reads, and less than 7.5% ribosomal reads.
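The thresholds above can be expressed as a simple per-nucleus predicate. The Python sketch below is illustrative only: the record type and field names are hypothetical, and treating the 500 and 7500 feature bounds as inclusive is an assumption, since the text does not say whether the limits themselves pass.

```python
from dataclasses import dataclass


@dataclass
class NucleusQC:
    """Hypothetical per-nucleus QC record (field names are illustrative)."""
    barcode: str
    n_features: int   # number of detected genes
    n_umis: int       # unique molecules sequenced
    pct_mito: float   # percent mitochondrial reads
    pct_ribo: float   # percent ribosomal reads


def passes_qc(nucleus):
    """Apply the stated thresholds; inclusive feature bounds are an assumption."""
    return (
        500 <= nucleus.n_features <= 7500
        and nucleus.n_umis > 1000
        and nucleus.pct_mito < 5.0
        and nucleus.pct_ribo < 7.5
    )


def filter_nuclei(nuclei):
    """Keep only nuclei that satisfy every threshold."""
    return [n for n in nuclei if passes_qc(n)]
```

A nucleus fails as soon as any single metric falls outside its range, so the filter is the intersection of the four criteria rather than a score.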
Data Integration, Dimensional Reduction, Clustering
For each specimen, data were normalized using Seurat::NormalizeData() and then all specimens were integrated using a mutual nearest neighbors batch correction implemented by SeuratWrappers::RunFastMNN() in SeuratWrappers v0.3.5 using 2000 features. The first 30 dimensions of the mutual nearest neighbors reduction were used to generate a Uniform Manifold Approximation and Projection (UMAP) by Seurat::RunUMAP(). Nearest-neighbor graph construction was performed using Seurat::FindNeighbors(). Louvain clustering was performed at a resolution of 1.0 using Seurat::FindClusters().
Additional QC
Clusters were inspected for mutually exclusive expression of major cell class markers (Epithelial: CDH1, EPCAM; Endothelial: PECAM1; Immune: PTPRC; Mesenchymal: PDGFRA, PDGFRB). Clusters of cells at the center of the UMAP coexpressing markers of multiple major cell classes were removed. We speculate that these data points are doublets or ambient RNA. After removal, Data Integration, Dimensional Reduction and Clustering were performed again.
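The inspection for mutually exclusive marker expression could be approximated programmatically, as in the Python sketch below. The marker lists come from the text; the input format (per-cluster fraction of cells expressing each gene), the 0.25 detection cutoff, and the function names are assumptions made for illustration, and the original inspection may well have been visual.

```python
# Major cell class markers as listed in the text.
MAJOR_CLASS_MARKERS = {
    "Epithelial": ["CDH1", "EPCAM"],
    "Endothelial": ["PECAM1"],
    "Immune": ["PTPRC"],
    "Mesenchymal": ["PDGFRA", "PDGFRB"],
}


def classes_expressed(fraction_expressing, min_fraction=0.25):
    """Classes with at least one marker detected in >= min_fraction of cells.

    ASSUMPTION: min_fraction=0.25 is an illustrative cutoff, not from the source.
    """
    return sorted(
        cell_class
        for cell_class, genes in MAJOR_CLASS_MARKERS.items()
        if any(fraction_expressing.get(g, 0.0) >= min_fraction for g in genes)
    )


def flag_mixed_clusters(cluster_fractions, min_fraction=0.25):
    """Return clusters whose cells co-express markers of more than one class."""
    flagged = {}
    for cluster, fractions in cluster_fractions.items():
        classes = classes_expressed(fractions, min_fraction)
        if len(classes) > 1:
            flagged[cluster] = classes
    return flagged
```

Flagged clusters are candidates for removal only; whether they represent doublets or ambient RNA still needs to be judged from the embedding and the data.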
Cell Type Annotation
Clusters were first annotated as Epithelium, Mesenchyme, Immune or Endothelium based on unique expression of the major cell class markers identified above (Fig. 1B). Each of these annotations was subclustered, using the following top dimensions of the integrated mutual nearest neighbors reduction calculated by SeuratWrappers::RunFastMNN() on the complete dataset as input for Seurat::RunUMAP() and Seurat::FindNeighbors(): Epithelium: 20, Mesenchyme: 25, Immune: 16, Endothelium: 15. Louvain clustering was performed using Seurat::FindClusters() at the following resolutions: Epithelium: 0.8, Mesenchyme: 0.4, Immune: 0.3, Endothelium: 0.15. At this point clusters were annotated to minor cell classes based on known markers (e.g. Airway vs. Alveolar, Lymphoid vs. Myeloid) (Fig. 1C, D). Some of these minor cell classes were further subclustered to achieve a cell type level annotation (Vessels, Lymphoid, Myeloid), while all others were annotated on the cluster structure evident at the first round of subclustering. Cell type annotations (Fig. 1E) were consistent with known markers (Fig. 1H) of cell type identity.
Full Preprint: https://doi.org/10.1101/2025.10.17.683116
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The major areas of interest in single-cell RNA sequencing analysis are identification of existing and novel types of cells, depiction of cells, cell fate prediction, classification of several types of tumor, and investigation of heterogeneity in different cells. Single-cell clustering plays an important role in addressing these questions. Cluster identification in high dimensional single-cell sequencing data is challenging because of the nature of the data, and dimensionality reduction models can help. Here, we introduce a framework for discovering cluster-specific frequent biomarkers in single-cell RNA sequencing data, using dimensionality reduction and hierarchical agglomerative clustering (Louvain). First, we filtered out features detected in few cells and cells with few detected features. Then we created a Seurat object to store data and analysis together and used quality control metrics to discard low quality or dying cells. Afterwards we applied the global-scaling normalization method “LogNormalize” for data normalization. Next, we computed highly variable features (those varying strongly from cell to cell) from our dataset. Then, we applied a linear transformation and a linear dimensionality reduction technique, Principal Component Analysis (PCA), to project the high dimensional data to an optimal low-dimensional space. After identifying fifty “significant” principal components (PCs) based on strong enrichment of low p-value features, we implemented the graph-based clustering algorithm Louvain to cluster the cells on the 10 most significant PCs. We applied our model to a single-cell RNA sequencing dataset for a rare intestinal cell type in mice (NCBI accession ID: GSE62270, 23,630 features and 1872 samples (cells)). We obtained 10 cell clusters with a maximum modularity of 0.885 1.
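For readers unfamiliar with "LogNormalize", the transformation can be sketched for a single cell in Python: each feature count is divided by the cell's total count, multiplied by a scale factor, and passed through log(1 + x). The scale factor of 10,000 is Seurat's default and is assumed here, since the text does not state the value used; the function name is hypothetical.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """LogNormalize one cell: log1p(count / total_count * scale_factor).

    ASSUMPTION: scale_factor=10,000 (Seurat's default); the source does not
    state which value was used.
    """
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale_factor) for c in counts]


# A cell with counts 1, 0 and 3 (total 4): the first feature becomes
# log1p(2500) and the third log1p(7500), while zero counts stay at zero.
normalized = log_normalize([1, 0, 3])
```

Because every cell is rescaled to the same total before the log transform, differences in sequencing depth between cells are largely removed.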
After detecting the cell clusters, we found 3871 cluster-specific biomarkers using Model-based Analysis of Single-cell Transcriptomics (MAST), a statistical tool for differential expression in single-cell sequencing data, with a log2FC threshold of 0.25 and a minimum feature detection of 25%. From these cluster-specific biomarkers, we found 1892 most frequent markers, i.e., overlapping biomarkers. We performed degree hub gene network analysis using Cytoscape and reported the five highest degree genes (Rps4x, Rps18, Rpl13a, Rps12 and Rpl18a). Subsequently, we performed KEGG pathway and Gene Ontology enrichment analysis of cluster markers using the DAVID 6.8 software tool. In summary, our proposed framework that integrated dimensionality reduction and agglomerative hierarchical clustering provides a robust approach to efficiently discover cluster-specific frequent biomarkers, i.e., overlapping biomarkers from single-cell RNA sequencing data.
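The thresholding and overlap steps can be illustrated with a short Python sketch that operates on a table of already-computed test results; it does not implement MAST. Several choices here are assumptions: the record layout, applying the log2FC cutoff to the absolute value, and defining an "overlapping" marker as a gene that is a marker for at least two clusters, which the text does not spell out.

```python
from collections import Counter, namedtuple

# Hypothetical record for one differential-expression result.
MarkerResult = namedtuple("MarkerResult", "cluster gene log2fc pct_detected")


def filter_markers(results, min_log2fc=0.25, min_pct=0.25):
    """Keep results passing the log2FC and detection-fraction thresholds.

    ASSUMPTION: the fold-change cutoff is applied to the absolute value.
    """
    return [
        r for r in results
        if abs(r.log2fc) >= min_log2fc and r.pct_detected >= min_pct
    ]


def overlapping_markers(markers, min_clusters=2):
    """Genes that are markers for at least `min_clusters` distinct clusters.

    ASSUMPTION: this is one possible reading of "overlapping biomarkers".
    """
    clusters_per_gene = Counter()
    for _cluster, gene in {(m.cluster, m.gene) for m in markers}:
        clusters_per_gene[gene] += 1
    return sorted(g for g, n in clusters_per_gene.items() if n >= min_clusters)
```

With results for genes A and B in two clusters, a gene passing both thresholds in both clusters would be reported as overlapping, while a gene passing in only one would remain cluster-specific.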
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Test Data for Galaxy Tutorial "Clustering 3k PBMCs with Seurat"