100+ datasets found

Data from: Large-scale integration of single-cell transcriptomic data...
data.niaid.nih.gov
data-staging.niaid.nih.gov
+2 more
zip
Updated Dec 14, 2021
Cite
David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove (2021). Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration [Dataset]. http://doi.org/10.5061/dryad.t4b8gtj34
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.t4b8gtj34
Dataset updated
Dec 14, 2021
Dataset provided by
Cornell University
Authors
David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove
License
https://spdx.org/licenses/CC0-1.0.html
Description
Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. 
We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.

Methods

Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (Mus musculus) were obtained from Jackson Laboratories (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute on Aging (NIA) Rodent Aging Colony and were used at 20 months of age. For new scRNAseq experiments, female mice were used in each experiment.

Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of old (20 months) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 and 40 μm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed through incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).

Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 10^6 cells/ml. Cells were counted manually with a hemocytometer to determine their concentration. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes. After preparation, libraries were sequenced on a NextSeq 500 (Illumina; San Diego, CA) using 75 cycle high output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Details on estimated sequencing saturation and the number of reads per sample are shown in Sup. Data 1.

Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5 mo) C57BL/6J mice were injected with 10 µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and frozen fresh in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). Optimal tissue permeabilization time for 10 µm thick sections was found to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E stained tissue sections were imaged using a Zeiss PALM MicroBeam laser capture microdissection system, and the images were stitched and processed using Fiji ImageJ software. cDNA libraries were sequenced on an Illumina NextSeq 500 using 150 cycle high output kits (Read 1 = 28 bp, Read 2 = 120 bp, Index 1 = 10 bp, and Index 2 = 10 bp). Frames around the capture area on the Visium slide were aligned manually and spots covering the tissue were selected using Loupe Browser v4.0.0 software (10x Genomics). Sequencing data were then aligned to the mouse reference genome (mm10) using the spaceranger v1.0.0 pipeline to generate a feature-by-spot-barcode expression matrix (10x Genomics).

Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples which were only available as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).

Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCont and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset individually. BCmvn optimization was used for pK parameterization. Estimated doublet rates were computed by fitting the total number of cells after quality filtering to a linear regression of the expected doublet rates published in the 10x Chromium handbook. Estimated homotypic doublet rates were also accounted for using the modelHomotypic function. The default pN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch correction with three tools independently: Harmony (v1.0; github.com/immunogenomics/harmony), Scanorama (v1.3; github.com/brianhie/scanorama), and BBKNN (v1.3.12; github.com/Teichlab/bbknn). We then used Seurat to process the integrated data. After initial integration, we removed the noisy cluster and re-integrated the data using each of the three batch-correction tools.
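The three quality filters quoted above can be illustrated with a minimal sketch. This is not the authors' Seurat code: the CellQC container, the passes_qc helper, and the toy cells are hypothetical, and the mitochondrial share is assumed to be mitochondrial transcripts divided by total transcripts per cell.

```python
from dataclasses import dataclass

# Thresholds quoted in the description above.
MIN_FEATURES = 750
MIN_TRANSCRIPTS = 1000
MAX_MITO_FRACTION = 0.30


@dataclass
class CellQC:
    """Per-cell summary statistics (hypothetical container, not a Seurat object)."""
    barcode: str
    n_features: int     # number of genes detected in the cell
    n_transcripts: int  # total transcript (UMI) count
    n_mito: int         # transcripts assigned to mitochondrial genes


def passes_qc(cell: CellQC) -> bool:
    """Return True if the cell survives all three filters."""
    if cell.n_features < MIN_FEATURES:
        return False
    if cell.n_transcripts < MIN_TRANSCRIPTS:
        return False
    # n_transcripts is at least MIN_TRANSCRIPTS here, so the division is safe.
    mito_fraction = cell.n_mito / cell.n_transcripts
    return mito_fraction <= MAX_MITO_FRACTION


# Toy cells (made-up numbers, for illustration only).
cells = [
    CellQC("AAAC", n_features=1200, n_transcripts=3000, n_mito=150),
    CellQC("AAAG", n_features=600, n_transcripts=2500, n_mito=100),
    CellQC("AACT", n_features=900, n_transcripts=1500, n_mito=600),
]
kept = [c.barcode for c in cells if passes_qc(c)]
print(kept)
```

In this toy input, the second cell fails the feature threshold and the third fails the mitochondrial threshold, so only the first barcode is kept.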

Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical genes (Fig. S3). Clusters which had similar canonical marker gene expression patterns were merged.
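Selecting the "dimensions accounting for 95% of the total variance" can be sketched as a cumulative-variance cutoff. The function name and the example standard deviations below are hypothetical, and "total variance" is assumed here to mean the variance captured by the computed components.

```python
def n_dims_for_variance(stdevs, target=0.95):
    """Smallest number of leading components whose cumulative variance
    share reaches `target`. `stdevs` are per-component standard
    deviations, ordered from largest to smallest."""
    variances = [s * s for s in stdevs]
    total = sum(variances)
    cumulative = 0.0
    for i, v in enumerate(variances, start=1):
        cumulative += v
        if cumulative / total >= target:
            return i
    return len(variances)


# Toy standard deviations (made-up values).
example_stdevs = [6.0, 4.0, 2.0, 1.0, 1.0]
n_dims = n_dims_for_variance(example_stdevs)
print(n_dims)
```

The selected leading dimensions would then be the ones passed on to neighbor-graph construction.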

Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).

Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were re-labeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myocytes” (bins 11-18). Culture-associated muscle stem cells were ignored and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified with differential gene expression analysis using Seurat (FindAllMarkers, using Wilcoxon Rank Sum Test; results in Sup. Data 2). The resulting genes were filtered based on average log2-fold change (avg_logFC > 1) and the percentage of cells within the cluster which express each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each of the cell types, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using
scRNA-seq + scATAC-seq Challenge at NeurIPS 2021
kaggle.com
zip
Updated Sep 16, 2022
Cite
Alexander Chervov (2022). scRNA-seq + scATAC-seq Challenge at NeurIPS 2021 [Dataset]. https://www.kaggle.com/datasets/alexandervc/scrnaseq-scatacseq-challenge-at-neurips-2021
Explore at:
Available download formats: zip (2917180928 bytes)
Dataset updated
Sep 16, 2022
Authors
Alexander Chervov
Description
Context

Dataset from the NeurIPS 2021 challenge, which is similar to the Kaggle 2022 competition "Open Problems - Multimodal Single-Cell Integration: Predict how DNA, RNA & protein measurements co-vary in single cells": https://www.kaggle.com/competitions/open-problems-multimodal

The data consist of single-cell ATAC-seq measurements (https://en.wikipedia.org/wiki/ATAC-seq#Single-cell_ATAC-seq) and single-cell RNA-seq measurements (https://en.wikipedia.org/wiki/Single-cell_transcriptomics#Single-cell_RNA-seq).

Single-cell RNA sequencing data form a matrix in which rows correspond to cells and columns to genes (or vice versa). Each value of the matrix shows how strongly the corresponding gene is expressed in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics
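The cells-by-genes layout described above can be pictured with a tiny example. The gene names, cell labels, counts, and helper functions are made up for illustration and are not taken from the dataset files.

```python
# Rows are cells, columns are genes (toy values, not real data).
genes = ["CD3E", "CD19", "LYZ"]
cells = ["cell_1", "cell_2"]
counts = [
    [5, 0, 1],  # cell_1
    [0, 7, 0],  # cell_2
]


def expression(cell, gene):
    """Look up the count for one gene in one cell."""
    return counts[cells.index(cell)][genes.index(gene)]


def transpose(matrix):
    """Switch to the genes-by-cells orientation (the 'vice versa' case)."""
    return [list(column) for column in zip(*matrix)]


print(expression("cell_2", "CD19"))
print(transpose(counts))
```

Real datasets of this kind are usually stored as sparse matrices, because most entries are zero.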

See tutorials: https://scanpy.readthedocs.io/en/stable/tutorials.html (Scanpy, the main Python package for working with scRNA-seq data) or https://satijalab.org/seurat/ (Seurat, an R package).

(For companion dataset on CITE-seq = scRNA-seq + Proteomics, see: https://www.kaggle.com/datasets/alexandervc/citeseqscrnaseqproteins-challenge-neurips2021)

Particular data

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194122

Expression profiling by high throughput sequencing; Genome binding/occupancy profiling by high throughput sequencing

Summary: Single-cell multiomics data collected from bone marrow mononuclear cells of 12 healthy human donors. Half the samples were measured using the 10X Multiome Gene Expression and Chromatin Accessibility kit and half were measured using the 10X 3' Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support the Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data were then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites and some donors were measured at a single site. In the competition, participants were tasked with challenges including modality prediction, matching profiles from different modalities, and learning a joint embedding from multiple modalities.

Overall design: Single-cell multiomics data collected from bone marrow mononuclear cells of 12 healthy human donors.

Contributor(s): Burkhardt DB, Lücken MD, Lance C, Cannoodt R, Pisco AO, Krishnaswamy S, Theis FJ, Bloom JM

Citation: https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/158f3069a435b314a80bdcb024f8e422-Abstract-round2.html

Related datasets:

Other single-cell RNA-seq datasets can be found on Kaggle at https://www.kaggle.com/alexandervc/datasets or by searching Kaggle for "scRNA-seq".

Inspiration

Single-cell RNA sequencing is an important technology in modern biology; see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6

Also see the review in Nature Methods by P. Kharchenko: "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x

A Google Scholar search for "challenges in single cell rna sequencing" (https://scholar.google.fr/scholar?q=challenges+in+single+cell+rna+sequencing&hl=en&as_sdt=0&as_vis=1&oi=scholart) returns many interesting and highly cited articles, for example:

(Cited 968) Computational and analytical challenges in single-cell transcriptomics Oliver Stegle, Sarah A. Teichmann, John C. Marioni Nat. Rev. Genet., 16 (3) (2015), pp. 133-145 https://www.nature.com/articles/nrg3833
Table1_Influence of single-cell RNA sequencing data integration on the...
frontiersin.figshare.com
docx
Updated Jun 13, 2023
Cite
Tomasz Kujawa; Michał Marczyk; Joanna Polanska (2023). Table1_Influence of single-cell RNA sequencing data integration on the performance of differential gene expression analysis.docx [Dataset]. http://doi.org/10.3389/fgene.2022.1009316.s002
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fgene.2022.1009316.s002
Dataset updated
Jun 13, 2023
Dataset provided by
Frontiers
Authors
Tomasz Kujawa; Michał Marczyk; Joanna Polanska
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Large-scale comprehensive single-cell experiments are often resource-intensive and require the involvement of many laboratories and/or taking measurements at various times. This inevitably leads to batch effects, and systematic variations in the data that might occur due to different technology platforms, reagent lots, or handling personnel. Such technical differences confound biological variations of interest and need to be corrected during the data integration process. Data integration is a challenging task due to the overlapping of biological and technical factors, which makes it difficult to distinguish their individual contribution to the overall observed effect. Moreover, the choice of integration method may impact the downstream analyses, including searching for differentially expressed genes. From the existing data integration methods, we selected only those that return the full expression matrix. We evaluated six methods in terms of their influence on the performance of differential gene expression analysis in two single-cell datasets with the same biological study design that differ only in the way the measurement was done: one dataset manifests strong batch effects due to the measurements of each sample at a different time. Integrated data were visualized using the UMAP method. The evaluation was done both on individual gene level using parametric and non-parametric approaches for finding differentially expressed genes and on gene set level using gene set enrichment analysis. As an evaluation metric, we used two correlation coefficients, Pearson and Spearman, of the obtained test statistics between reference, test, and corrected studies. Visual comparison of UMAP plots highlighted ComBat-seq, limma, and MNN, which reduced batch effects and preserved differences between biological conditions. Most of the tested methods changed the data distribution after integration, which negatively impacts the use of parametric methods for the analysis. 
Two algorithms, MNN and Scanorama, gave very poor results in terms of differential analysis on gene and gene set levels. Finally, we highlight ComBat-seq as it led to the highest correlation of test statistics between reference and corrected dataset among others. Moreover, it does not distort the original distribution of gene expression data, so it can be used in all types of downstream analyses.
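The evaluation metric described above, correlating test statistics between a reference and a corrected study, can be sketched as follows. This is a minimal pure-Python illustration and not the authors' code: the function names and the toy statistics are hypothetical, and Spearman correlation is computed as the Pearson correlation of average ranks.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy per-gene test statistics (made-up values).
reference = [2.1, -0.5, 3.3, 0.2, -1.8]
corrected = [1.9, -0.1, 8.0, 0.4, -2.0]
print(pearson(reference, corrected), spearman(reference, corrected))
```

In this toy case the gene ordering is fully preserved, so the Spearman coefficient is 1 even though one inflated statistic pulls the Pearson coefficient below 1; this is one reason to report both.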
single cell data integration code and dataset
figshare.com
txt
Updated Dec 23, 2022
Cite
luye an (2022). single cell data integration code and dataset [Dataset]. http://doi.org/10.6084/m9.figshare.21498168.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.21498168.v1
Dataset updated
Dec 23, 2022
Dataset provided by
Figshare (http://figshare.com/)
Authors
luye an
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
This folder contains the processed single-cell dataset in RData format, as well as the pipeline (R code) used to analyze immune cell populations.
CITE-seq=scRNA-seq+Proteins: Challenge NeurIPS2021
kaggle.com
zip
Updated Jan 22, 2023
Cite
Alexander Chervov (2023). CITE-seq=scRNA-seq+Proteins: Challenge NeurIPS2021 [Dataset]. https://www.kaggle.com/datasets/alexandervc/citeseqscrnaseqproteins-challenge-neurips2021
Explore at:
Available download formats: zip (646191284 bytes)
Dataset updated
Jan 22, 2023
Authors
Alexander Chervov
Description
Context

Dataset from the NeurIPS 2021 challenge, which is similar to the Kaggle 2022 competition "Open Problems - Multimodal Single-Cell Integration: Predict how DNA, RNA & protein measurements co-vary in single cells": https://www.kaggle.com/competitions/open-problems-multimodal

CITE-seq combines single-cell RNA sequencing with single-cell measurements of CD** proteins (https://en.wikipedia.org/wiki/CITE-Seq). For the companion dataset on scRNA-seq + scATAC-seq, see: https://www.kaggle.com/datasets/alexandervc/scrnaseq-scatacseq-challenge-at-neurips-2021

Single-cell RNA sequencing data form a matrix in which rows correspond to cells and columns to genes (or vice versa). Each value of the matrix shows how strongly the corresponding gene is expressed in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics

See tutorials: https://scanpy.readthedocs.io/en/stable/tutorials.html (Scanpy, the main Python package for working with scRNA-seq data) or https://satijalab.org/seurat/ (Seurat, an R package).

Particular data

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194122

Expression profiling by high throughput sequencing; Genome binding/occupancy profiling by high throughput sequencing

Summary: Single-cell multiomics data collected from bone marrow mononuclear cells of 12 healthy human donors. Half the samples were measured using the 10X Multiome Gene Expression and Chromatin Accessibility kit and half were measured using the 10X 3' Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support the Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data were then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites and some donors were measured at a single site. In the competition, participants were tasked with challenges including modality prediction, matching profiles from different modalities, and learning a joint embedding from multiple modalities.

Overall design: Single-cell multiomics data collected from bone marrow mononuclear cells of 12 healthy human donors.

Contributor(s): Burkhardt DB, Lücken MD, Lance C, Cannoodt R, Pisco AO, Krishnaswamy S, Theis FJ, Bloom JM

Citation: https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/158f3069a435b314a80bdcb024f8e422-Abstract-round2.html

Related datasets:

Other single-cell RNA-seq datasets can be found on Kaggle at https://www.kaggle.com/alexandervc/datasets or by searching Kaggle for "scRNA-seq".

Inspiration

Single-cell RNA sequencing is an important technology in modern biology; see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6

Also see the review in Nature Methods by P. Kharchenko: "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x

A Google Scholar search for "challenges in single cell rna sequencing" (https://scholar.google.fr/scholar?q=challenges+in+single+cell+rna+sequencing&hl=en&as_sdt=0&as_vis=1&oi=scholart) returns many interesting and highly cited articles, for example:

(Cited 968) Computational and analytical challenges in single-cell transcriptomics Oliver Stegle, Sarah A. Teichmann, John C. Marioni Nat. Rev. Genet., 16 (3) (2015), pp. 133-145 https://www.nature.com/articles/nrg3833
CITE-seq = scRNA-seq + Proteins: Human PBMCs 2019
kaggle.com
zip
Updated Sep 11, 2022
Cite
Alexander Chervov (2022). CITE-seq = scRNA-seq + Proteins: Human PBMCs 2019 [Dataset]. https://www.kaggle.com/datasets/alexandervc/citeseq-scrnaseq-proteins-human-pbmcs-2019
Explore at:
Available download formats: zip (44334628 bytes)
Dataset updated
Sep 11, 2022
Authors
Alexander Chervov
Description
Data and Context

The data are results of single-cell RNA sequencing and CD** protein measurements obtained with CITE-seq, a technology that combines single-cell RNA sequencing with protein measurements.

The Kaggle competition https://www.kaggle.com/competitions/open-problems-multimodal/overview uses similar data in one of the subtasks.

Particular data: Paper: Stuart T, Butler A, Hoffman P, Hafemeister C et al. Comprehensive Integration of Single-Cell Data. Cell 2019 Jun 13;177(7):1888-1902.e21. PMID: 31178118 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6687398/

Data: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE128639

Related datasets:

Other single-cell RNA-seq datasets can be found on Kaggle at https://www.kaggle.com/alexandervc/datasets or by searching Kaggle for "scRNA-seq".

Inspiration

Single-cell RNA sequencing is an important technology in modern biology; see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6

Also see the review in Nature Methods by P. Kharchenko: "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x

A Google Scholar search for "challenges in single cell rna sequencing" (https://scholar.google.fr/scholar?q=challenges+in+single+cell+rna+sequencing&hl=en&as_sdt=0&as_vis=1&oi=scholart) returns many interesting and highly cited articles, for example:

(Cited 968) Computational and analytical challenges in single-cell transcriptomics Oliver Stegle, Sarah A. Teichmann, John C. Marioni Nat. Rev. Genet., 16 (3) (2015), pp. 133-145 https://www.nature.com/articles/nrg3833

Challenges in unsupervised clustering of single-cell RNA-seq data https://www.nature.com/articles/s41576-018-0088-9 Review Article 07 January 2019 Vladimir Yu Kiselev, Tallulah S. Andrews & Martin Hemberg Nature Reviews Genetics volume 20, pages 273–282 (2019)

Challenges and emerging directions in single-cell analysis https://link.springer.com/article/10.1186/s13059-017-1218-y Published: 08 May 2017 Guo-Cheng Yuan, Long Cai, Michael Elowitz, Tariq Enver, Guoping Fan, Guoji Guo, Rafael Irizarry, Peter Kharchenko, Junhyong Kim, Stuart Orkin, John Quackenbush, Assieh Saadatpour, Timm Schroeder, Ramesh Shivdasani & Itay Tirosh Genome Biology volume 18, Article number: 84 (2017)

Single-Cell RNA Sequencing in Cancer: Lessons Learned and Emerging Challenges https://www.sciencedirect.com/science/article/pii/S1097276519303569 Molecular Cell Volume 75, Issue 1, 11 July 2019, Pages 7-12
Data Repository: Single-cell mapper (scMappR): using scRNA-seq to infer...
data.niaid.nih.gov
Updated Feb 12, 2021
Cite
Dustin Sokolowski; Mariela Faykoo-Martinez; Lauren Erdman; Huayun Hou; Cadia Chan; Helen Zhu; Melissa M. Holmes; Anna Goldenberg; Michael D Wilson (2021). Data Repository: Single-cell mapper (scMappR): using scRNA-seq to infer cell-type specificities of differentially expressed genes [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4278129
Explore at:
Dataset updated
Feb 12, 2021
Dataset provided by
Department of Medical Biophysics, University of Toronto, Toronto, ON, M5G 1L7, Canada; Princess Margaret Cancer Center, University Health Network, Toronto, ON, M5G 2C1, Canada
Genetics and Genome Biology, SickKids Research Institute, Toronto, ON, M5G 0A4, Canada; Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4, Canada; Vector Institute for Artificial Intelligence, MaRS Centre, Toronto, ON, M5G 1M1; CIFAR, MaRS Centre, Toronto, ON, M5G 1M1
Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada; Department of Psychology, University of Toronto Mississauga, Mississauga, ON, L5L 1C6
Genetics and Genome Biology, SickKids Research Institute, Toronto, ON, M5G 0A4, Canada; Department of Cell and Systems Biology, University of Toronto, Toronto
Genetics and Genome Biology, SickKids Research Institute, Toronto, ON, M5G 0A4, Canada; Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4, Canada
Genetics and Genome Biology, SickKids Research Institute, Toronto, ON, M5G 0A4, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
Authors
Dustin Sokolowski; Mariela Faykoo-Martinez; Lauren Erdman; Huayun Hou; Cadia Chan; Helen Zhu; Melissa M. Holmes; Anna Goldenberg; Michael D Wilson
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data repository for the scMappR manuscript:

Abstract from bioRxiv (https://www.biorxiv.org/content/10.1101/2020.08.24.265298v1.full).

RNA sequencing (RNA-seq) is widely used to identify differentially expressed genes (DEGs) and reveal biological mechanisms underlying complex biological processes. RNA-seq is often performed on heterogeneous samples and the resulting DEGs do not necessarily indicate the cell types where the differential expression occurred. While single-cell RNA-seq (scRNA-seq) methods solve this problem, technical and cost constraints currently limit its widespread use. Here we present single cell Mapper (scMappR), a method that assigns cell-type specificity scores to DEGs obtained from bulk RNA-seq by integrating cell-type expression data generated by scRNA-seq and existing deconvolution methods. After benchmarking scMappR using RNA-seq data obtained from sorted blood cells, we asked if scMappR could reveal known cell-type specific changes that occur during kidney regeneration. We found that scMappR appropriately assigned DEGs to cell-types involved in kidney regeneration, including a relatively small proportion of immune cells. While scMappR can work with any user supplied scRNA-seq data, we curated scRNA-seq expression matrices for ∼100 human and mouse tissues to facilitate its use with bulk RNA-seq data alone. Overall, scMappR is a user-friendly R package that complements traditional differential expression analysis available at CRAN.
Data from: CSS: cluster similarity spectrum integration of single-cell...
data.mendeley.com
Updated Aug 15, 2020
+ more versions
Cite
Zhisong He (2020). CSS: cluster similarity spectrum integration of single-cell genomics data [Dataset]. http://doi.org/10.17632/3kthhpw2pd.2
Explore at:
Unique identifier
https://doi.org/10.17632/3kthhpw2pd.2
Dataset updated
Aug 15, 2020
Authors
Zhisong He
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
It is a major challenge to integrate single-cell sequencing data across experiments, conditions, batches, timepoints and other technical considerations. New computational methods are required that can integrate samples while simultaneously preserving biological information. Here, we propose an unsupervised reference-free data representation, Cluster Similarity Spectrum (CSS), where each cell is represented by its similarities to clusters independently identified across samples. We show that CSS can be used to assess cellular heterogeneity and enable reconstruction of differentiation trajectories from cerebral organoid and other single-cell transcriptomic data, and to integrate data across experimental conditions and human individuals.

The data set presented here includes 1) the Seurat object of the published two-month-old human cerebral organoid scRNA-seq data (Kanton et al. 2019 Nature); 2) the single-cell RNA-seq data of cerebral organoids generated by inDrop; 3) the newly generated single-cell RNA-seq data of cerebral organoids with and without fixation.
Single Cell Insights Into Cancer Transcriptomes: A Five-Part Single-Cell...
qubeshub.org
Updated Nov 16, 2021
Cite
Leigh Samsa*; Melissa Eslinger; Adam Kleinschmit; Amanda Solem; Carlos Goller* (2021). Single Cell Insights Into Cancer Transcriptomes: A Five-Part Single-Cell RNAseq Case Study Lesson [Dataset]. http://doi.org/10.24918/cs.2021.26
Explore at:
Unique identifier
https://doi.org/10.24918/cs.2021.26
Dataset updated
Nov 16, 2021
Dataset provided by
QUBES
Authors
Leigh Samsa*; Melissa Eslinger; Adam Kleinschmit; Amanda Solem; Carlos Goller*
Description
There is a growing need for integration of “Big Data” into undergraduate biology curricula. Transcriptomics is one venue to examine biology from an informatics perspective. RNA sequencing has largely replaced the use of microarrays for whole genome gene expression studies. Recently, single cell RNA sequencing (scRNAseq) has unmasked population heterogeneity, offering unprecedented views into the inner workings of individual cells. scRNAseq is transforming our understanding of development, cellular identity, cell function, and disease. As a ‘Big Data,’ scRNAseq can be intimidating for students to conceptualize and analyze, yet it plays an increasingly important role in modern biology. To address these challenges, we created an engaging case study that guides students through an exploration of scRNAseq technologies. Students work in groups to explore external resources, manipulate authentic data and experience how single cell RNA transcriptomics can be used for personalized cancer treatment. This five-part case study is intended for upper-level life science majors and graduate students in genetics, bioinformatics, molecular biology, cell biology, biochemistry, biology, and medical genomics courses. The case modules can be completed sequentially, or individual parts can be separately adapted. The first module can also be used as a stand-alone exercise in an introductory biology course. Students need an intermediate mastery of Microsoft Excel but do not need programming skills. Assessment includes both students’ self-assessment of their learning as answers to previous questions are used to progress through the case study and instructor assessment of final answers. This case provides a practical exercise in the use of high-throughput data analysis to explore the molecular basis of cancer at the level of single cells.
Additional file 1 of scRNASequest: an ecosystem of scRNA-seq analysis,...
figshare.com
datasetcatalog.nlm.nih.gov
+1more
xlsx
Updated Feb 13, 2024
Kejie Li; Yu H. Sun; Zhengyu Ouyang; Soumya Negi; Zhen Gao; Jing Zhu; Wanli Wang; Yirui Chen; Sarbottam Piya; Wenxing Hu; Maria I. Zavodszky; Hima Yalamanchili; Shaolong Cao; Andrew Gehrke; Mark Sheehan; Dann Huh; Fergal Casey; Xinmin Zhang; Baohong Zhang (2024). Additional file 1 of scRNASequest: an ecosystem of scRNA-seq analysis, visualization, and publishing [Dataset]. http://doi.org/10.6084/m9.figshare.22735488.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.22735488.v1
Dataset updated
Feb 13, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Kejie Li; Yu H. Sun; Zhengyu Ouyang; Soumya Negi; Zhen Gao; Jing Zhu; Wanli Wang; Yirui Chen; Sarbottam Piya; Wenxing Hu; Maria I. Zavodszky; Hima Yalamanchili; Shaolong Cao; Andrew Gehrke; Mark Sheehan; Dann Huh; Fergal Casey; Xinmin Zhang; Baohong Zhang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 1: Supplementary Table S1. Detailed comparison of multiple single-cell RNA-seq data processing workflows.
Benchmarking deep learning methods for biologically conserved single-cell...
zenodo.org
zip
Updated Oct 30, 2025
Chenxin Yi (2025). Benchmarking deep learning methods for biologically conserved single-cell integration. [Dataset]. http://doi.org/10.5281/zenodo.14633468
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.14633468
Dataset updated
Oct 30, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Chenxin Yi
License
https://www.gnu.org/licenses/agpl.txt
Description
scIB-E is a comprehensive deep learning-based benchmarking framework for evaluating single-cell RNA sequencing (scRNA-seq) data integration methods.

Unified Benchmarking Framework:

Evaluates 16 deep-learning single-cell integration methods using a unified variational autoencoder (VAE) framework.

Incorporates batch information, cell-type labels, and combined strategies across three integration levels.

Refined Metrics for Intra-cell-type Variation:

Extends the single-cell integration benchmarking (scIB) metrics by adding new metrics to better capture intra-cell-type biological conservation.

Novel Loss Function:

Introduces Corr-MSE Loss, a correlation-based loss function designed to preserve global cellular relationships and enhance intra-cell-type biological variation.

The preprocessed datasets are available at src/data.
primary mouse RT single cell RNA-seq
datasetcatalog.nlm.nih.gov
figshare.com
Updated Apr 27, 2023
Han, Zhi-Yan; Lobon-Iglesias, Maria-Jesus; Bourdeaut, Franck; Servant, Nicolas; ANDRIANTERANAGNA, Mamy (2023). primary mouse RT single cell RNA-seq [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001044525
Dataset updated
Apr 27, 2023
Authors
Han, Zhi-Yan; Lobon-Iglesias, Maria-Jesus; Bourdeaut, Franck; Servant, Nicolas; ANDRIANTERANAGNA, Mamy
Description
Seurat object stored in RDS file format. Three fresh primary mouse RT tumor samples were prepared and loaded in a 10x Chromium instrument (10x Genomics). Libraries were prepared using a Single Cell 3’ Reagent Kit (V2 chemistry, 10x Genomics) and sequenced on an Illumina HiSeq2500 in paired-end 26x98 bp mode, targeting at least 50,000 reads per cell. Mapping and UMI counting per gene were performed using the cellranger tool (version 3.1.0) and the hg19 reference genome version. Cells with both a low number of genes and a high proportion of mitochondrial RNA were discarded. The minimum number of detected genes was set to the 5th percentile of the distribution of the number of detected genes across all cells, while the maximum proportion of mitochondrial genes was set by visual inspection of the plot of the number of detected genes versus the percentage of mitochondrial genes for each sample. scRNA-seq data integration was performed using the CCA-based method implemented in Seurat version 3. Clustering was conducted using the graph-based modularity-optimization Louvain algorithm implemented in Seurat v3. A resolution of 0.4 (integrated_snn_res.0.4) was chosen for the final result.
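
The percentile-based filter described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' code: the function names `percentile` and `filter_cells` are hypothetical, the mitochondrial cutoff is passed in as a parameter because the description says it was chosen by visual inspection, and "both" is read literally, so a cell is dropped only when it fails both criteria.

```python
import math


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def filter_cells(genes_detected, pct_mito, max_pct_mito, q=5.0):
    """Return indices of cells to keep.

    genes_detected: number of detected genes per cell.
    pct_mito: percentage of mitochondrial reads per cell.
    max_pct_mito: cutoff picked by eye from the QC plot (hypothetical input).
    A cell is discarded only if it has BOTH fewer genes than the q-th
    percentile AND a mitochondrial percentage above max_pct_mito.
    """
    min_genes = percentile(genes_detected, q)
    return [
        i
        for i, (g, m) in enumerate(zip(genes_detected, pct_mito))
        if not (g < min_genes and m > max_pct_mito)
    ]
```

A stricter reading, dropping a cell when either criterion fails, would replace `and` with `or` in the final condition.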
Data from: Integration of spatial and single-cell transcriptomic data...
idr-testing.openmicroscopy.org
Integration of spatial and single-cell transcriptomic data elucidates mouse organogenesis [Dataset]. https://idr-testing.openmicroscopy.org/study/idr0138/
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
seqFISH study of sagittal sections of mouse embryos at 8-10 somite stage. An additional round of hybridisation to capture cell membrane is performed to accurately segment cell boundaries.
Additional file 3 of scRNASequest: an ecosystem of scRNA-seq analysis,...
springernature.figshare.com
datasetcatalog.nlm.nih.gov
xlsx
Updated Feb 13, 2024
+ more versions
Kejie Li; Yu H. Sun; Zhengyu Ouyang; Soumya Negi; Zhen Gao; Jing Zhu; Wanli Wang; Yirui Chen; Sarbottam Piya; Wenxing Hu; Maria I. Zavodszky; Hima Yalamanchili; Shaolong Cao; Andrew Gehrke; Mark Sheehan; Dann Huh; Fergal Casey; Xinmin Zhang; Baohong Zhang (2024). Additional file 3 of scRNASequest: an ecosystem of scRNA-seq analysis, visualization, and publishing [Dataset]. http://doi.org/10.6084/m9.figshare.22735494.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.22735494.v1
Dataset updated
Feb 13, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Kejie Li; Yu H. Sun; Zhengyu Ouyang; Soumya Negi; Zhen Gao; Jing Zhu; Wanli Wang; Yirui Chen; Sarbottam Piya; Wenxing Hu; Maria I. Zavodszky; Hima Yalamanchili; Shaolong Cao; Andrew Gehrke; Mark Sheehan; Dann Huh; Fergal Casey; Xinmin Zhang; Baohong Zhang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 3: Supplementary Table S3. Detailed comparison of multiple single-cell RNA-seq data visualization software.
primary ATRT single cell RNA-seq
datasetcatalog.nlm.nih.gov
figshare.com
Updated Apr 27, 2023
Bourdeaut, Franck; Lobon-Iglesias, Maria-Jesus; ANDRIANTERANAGNA, Mamy; Han, Zhi-Yan; Servant, Nicolas (2023). primary ATRT single cell RNA-seq [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001044566
Dataset updated
Apr 27, 2023
Authors
Bourdeaut, Franck; Lobon-Iglesias, Maria-Jesus; ANDRIANTERANAGNA, Mamy; Han, Zhi-Yan; Servant, Nicolas
Description
Seurat object stored in RDS file format. Three fresh tumor samples (from INI254, INI255 and INI267, respectively) were prepared and loaded in a 10x Chromium instrument (10x Genomics). Libraries were prepared using a Single Cell 3’ Reagent Kit (V2 chemistry, 10x Genomics) and sequenced on an Illumina HiSeq2500 in paired-end 26x98 bp mode, targeting at least 50,000 reads per cell. Mapping and UMI counting per gene were performed using the cellranger tool (version 3.1.0) and the hg19 reference genome version. Cells with both a low number of genes and a high proportion of mitochondrial RNA were discarded. The minimum number of detected genes was set to the 5th percentile of the distribution of the number of detected genes across all cells, while the maximum proportion of mitochondrial genes was set by visual inspection of the plot of the number of detected genes versus the percentage of mitochondrial genes for each sample. scRNA-seq data integration was performed using the CCA-based method implemented in Seurat version 3. Clustering was conducted using the graph-based modularity-optimization Louvain algorithm implemented in Seurat v3. A resolution of 0.2 (integrated_snn_res.0.2) was chosen for the final result.
Single-Cell Data Analysis Software Market Research Report 2033
growthmarketreports.com
csv, pdf, pptx
Updated Aug 29, 2025
Growth Market Reports (2025). Single-Cell Data Analysis Software Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/single-cell-data-analysis-software-market
Available download formats: pptx, csv, pdf
Dataset updated
Aug 29, 2025
Dataset authored and provided by
Growth Market Reports
Time period covered
2024 - 2032
Area covered
Global
Description
Single-Cell Data Analysis Software Market Outlook

According to our latest research, the global single-cell data analysis software market size reached USD 424.5 million in 2024. The market is demonstrating a robust upward trajectory, driven by technological advancements and expanding applications across life sciences. The market is projected to grow at a CAGR of 15.9% from 2025 to 2033, reaching an estimated USD 1,483.4 million by 2033. This impressive growth is primarily fueled by the increasing adoption of single-cell sequencing technologies in genomics, transcriptomics, and proteomics research, as well as the expanding demand from pharmaceutical and biotechnology companies for advanced data analytics solutions.
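
The market size, growth rate and forecast quoted above are linked by the standard compound-growth formula, end = start × (1 + CAGR)^years. The sketch below lets a reader check how the three figures relate; the helper names `project` and `implied_cagr` are hypothetical, and the number of compounding periods is an assumption because the report does not say whether it compounds from 2024 or from 2025.

```python
def project(start_value, cagr, years):
    """Value after compounding start_value at rate cagr for `years` periods."""
    return start_value * (1.0 + cagr) ** years


def implied_cagr(start_value, end_value, years):
    """Constant annual rate that grows start_value into end_value over `years` periods."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the report text (USD million).
size_2024 = 424.5
size_2033 = 1483.4
stated_cagr = 0.159

# Assumption: the period count is not stated, so both plausible readings
# (8 periods from 2025, 9 periods from 2024) are printed side by side.
for periods in (8, 9):
    projected = project(size_2024, stated_cagr, periods)
    rate = implied_cagr(size_2024, size_2033, periods)
    print(periods, round(projected, 1), round(rate, 4))
```

Comparing the printed projection with the quoted 2033 figure shows which reading of the forecast window, if either, the report is likely using.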

One of the primary growth factors for the single-cell data analysis software market is the rapid evolution and adoption of high-throughput single-cell sequencing technologies. Over the past decade, there has been a significant shift from bulk cell analysis to single-cell approaches, allowing researchers to unravel cellular heterogeneity with unprecedented resolution. This transition has generated massive volumes of complex data, necessitating sophisticated software tools for effective analysis, visualization, and interpretation. The need to extract actionable insights from these intricate datasets is compelling both academic and commercial entities to invest in advanced single-cell data analysis software, thus propelling market expansion.

Another major driver is the expanding application scope of single-cell data analysis across various omics fields, including genomics, transcriptomics, proteomics, and epigenomics. The integration of these multi-omics datasets is enabling deeper insights into disease mechanisms, biomarker discovery, and personalized medicine. Pharmaceutical and biotechnology companies are increasingly leveraging single-cell data analysis software to accelerate drug discovery and development processes, optimize clinical trials, and identify novel therapeutic targets. The continuous innovation in algorithms, machine learning, and artificial intelligence is further enhancing the capabilities of these software solutions, making them indispensable tools in modern biomedical research.

Single-cell Analysis is revolutionizing the field of life sciences by providing unprecedented insights into cellular diversity and function. This cutting-edge approach allows researchers to study individual cells in isolation, revealing intricate details about their genetic, transcriptomic, and proteomic profiles. By focusing on single cells, scientists can uncover rare cell types and understand complex biological processes that were previously masked in bulk analyses. The ability to perform Single-cell Analysis is transforming our understanding of diseases, enabling the identification of novel biomarkers and therapeutic targets, and paving the way for personalized medicine.

The surge in government and private funding for single-cell research, coupled with the rising prevalence of chronic and infectious diseases, is also contributing to market growth. Governments worldwide are launching initiatives to support precision medicine and genomics research, fostering collaborations between academic institutions and industry players. This supportive ecosystem is not only stimulating the development of new single-cell technologies but also driving the adoption of specialized data analysis software. Moreover, the increasing awareness of the importance of data reproducibility and standardization is prompting the adoption of advanced software platforms that ensure robust, scalable, and reproducible analysis workflows.

From a regional perspective, North America continues to dominate the single-cell data analysis software market, attributed to its strong research infrastructure, presence of leading biotechnology and pharmaceutical companies, and substantial funding for genomics research. However, the Asia Pacific region is emerging as a significant growth engine, driven by increasing investments in life sciences, growing collaborations between academia and industry, and the rapid adoption of advanced sequencing technologies. Europe also holds a considerable share, supported by robust research activities and supportive regulatory frameworks. The market landscape in Latin America and the Middle East & Africa r
Data_Sheet_1_CBA: Cluster-Guided Batch Alignment for Single Cell RNA-seq.PDF...
frontiersin.figshare.com
pdf
Updated Jun 1, 2023
Wenbo Yu; Ahmed Mahfouz; Marcel J. T. Reinders (2023). Data_Sheet_1_CBA: Cluster-Guided Batch Alignment for Single Cell RNA-seq.PDF [Dataset]. http://doi.org/10.3389/fgene.2021.644211.s001
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fgene.2021.644211.s001
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers
Authors
Wenbo Yu; Ahmed Mahfouz; Marcel J. T. Reinders
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The power of single-cell RNA sequencing (scRNA-seq) in detecting cell heterogeneity or developmental process is becoming more and more evident every day. The granularity of this knowledge is further propelled when combining two batches of scRNA-seq into a single large dataset. This strategy is however hampered by technical differences between these batches. Typically, these batch effects are resolved by matching similar cells across the different batches. Current approaches, however, do not take into account that we can constrain this matching further as cells can also be matched on their cell type identity. We use an auto-encoder to embed two batches in the same space such that cells are matched. To accomplish this, we use a loss function that preserves: (1) cell-cell distances within each of the two batches, as well as (2) cell-cell distances between two batches when the cells are of the same cell-type. The cell-type guidance is unsupervised, i.e., a cell-type is defined as a cluster in the original batch. We evaluated the performance of our cluster-guided batch alignment (CBA) using pancreas and mouse cell atlas datasets, against six state-of-the-art single cell alignment methods: Seurat v3, BBKNN, Scanorama, Harmony, LIGER, and BERMUDA. Compared to other approaches, CBA preserves the cluster separation in the original datasets while still being able to align the two datasets. We confirm that this separation is biologically meaningful by identifying relevant differential expression of genes for these preserved clusters.
Single-Cell Data Analysis Software Market Research Report 2033
dataintelo.com
csv, pdf, pptx
Updated Sep 30, 2025
Dataintelo (2025). Single-Cell Data Analysis Software Market Research Report 2033 [Dataset]. https://dataintelo.com/report/single-cell-data-analysis-software-market
Available download formats: pptx, pdf, csv
Dataset updated
Sep 30, 2025
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Single-Cell Data Analysis Software Market Outlook

According to our latest research, the global Single-Cell Data Analysis Software market size reached USD 498.6 million in 2024, driven by increasing demand for high-resolution cellular analysis in life sciences and healthcare. The market is experiencing robust expansion with a CAGR of 15.2% from 2025 to 2033, and is projected to reach USD 1,522.9 million by 2033. This impressive growth trajectory is primarily attributed to advancements in single-cell sequencing technologies, the proliferation of precision medicine, and the rising adoption of artificial intelligence and machine learning in bioinformatics.

The growth of the Single-Cell Data Analysis Software market is significantly propelled by the rapid evolution of next-generation sequencing (NGS) technologies and the increasing need for comprehensive single-cell analysis in both research and clinical settings. As researchers strive to unravel cellular heterogeneity and gain deeper insights into complex biological systems, the demand for robust data analysis tools has surged. Single-cell data analysis software enables scientists to process, visualize, and interpret large-scale datasets, facilitating the identification of rare cell populations, novel biomarkers, and disease mechanisms. The integration of advanced algorithms and user-friendly interfaces has further enhanced the accessibility and adoption of these solutions across various end-user segments, including academic and research institutes, biotechnology and pharmaceutical companies, and hospitals and clinics.

Another key driver for market growth is the expanding application of single-cell analysis in precision medicine and drug discovery. The ability to analyze gene expression, protein levels, and epigenetic modifications at the single-cell level has revolutionized the understanding of disease pathogenesis and therapeutic response. This has led to a surge in demand for specialized software capable of managing complex, multi-omics datasets and generating actionable insights for personalized treatment strategies. Furthermore, the ongoing trend of integrating artificial intelligence and machine learning in single-cell data analysis is enabling more accurate predictions and faster data processing, thus accelerating the pace of biomedical research and clinical diagnostics.

The increasing collaboration between academia, industry, and government agencies is also contributing to market expansion. Public and private investments in single-cell genomics research are fostering innovation in data analysis software, while strategic partnerships and acquisitions are facilitating the development of comprehensive, end-to-end solutions. Additionally, the growing awareness of the potential of single-cell analysis in oncology, immunology, and regenerative medicine is encouraging the adoption of advanced software platforms worldwide. However, challenges such as data privacy concerns, high implementation costs, and the need for skilled personnel may pose restraints to market growth, particularly in low-resource settings.

From a regional perspective, North America continues to dominate the Single-Cell Data Analysis Software market, owing to its well-established healthcare infrastructure, strong presence of leading biotechnology and pharmaceutical companies, and substantial investments in genomics research. Europe follows closely, supported by robust government funding and a thriving life sciences sector. The Asia Pacific region is emerging as a lucrative market, driven by rising healthcare expenditure, expanding research capabilities, and increasing adoption of advanced technologies in countries such as China, Japan, and India. Latin America and the Middle East & Africa are also witnessing gradual growth, albeit at a slower pace, due to improving healthcare infrastructure and growing awareness of single-cell analysis applications.

Component Analysis

The Single-Cell Data Analysis Software market by component is broadly segmented into software and services, each playing a pivotal role in the overall ecosystem. Software solutions form the backbone of this market, offering a wide array of functionalities such as data preprocessing, quality control, clustering, visualization, and integration of multi-omics data. The increasing complexity and volume of single-cell datasets have driven the development of sophisticated software platforms equipped with advanced analytics, machine learning algorithms, and intuitive user interfaces. These platfo
Features Multimodal Single-Cell Integration
kaggle.com
zip
Updated Nov 9, 2022
Alexander Chervov (2022). Features Multimodal Single-Cell Integration [Dataset]. https://www.kaggle.com/datasets/alexandervc/features-multimodal-singlecell-integration
Available download formats: zip (1109224140 bytes)
Dataset updated
Nov 9, 2022
Authors
Alexander Chervov
Description
Multimodal Single-Cell Integration

Features selected here (version 13 of the notebook): https://www.kaggle.com/code/visualcomments/mmscel-crossvalidation-schemes-features-select/data?scriptVersionId=110474725
Data from: Multimodal integration of single cell ATAC-seq data enables...
zenodo.org
application/gzip, bin +1
Updated Jun 10, 2025
Kewei Xiong (2025). Multimodal integration of single cell ATAC-seq data enables highly accurate delineation of clinically relevant tumor cell subpopulations [Dataset]. http://doi.org/10.5281/zenodo.15621738
Available download formats: bin, csv, application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.15621738
Dataset updated
Jun 10, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Kewei Xiong
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data used for tutorial.

fragments.tsv.gz(.tbi), singlecell.csv, filtered_peak_bc_matrix.h5: scATAC-seq pre-processing and cell annotation

peak.mat.rds: corrected chromatin accessibility profile

cancer.cnv.csv: copy number profile of cancer cells

snv.mat.rds, denoised.mat.rds: raw and denoised SNV matrix

David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove (2021). Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration [Dataset]. http://doi.org/10.5061/dryad.t4b8gtj34

Data from: Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration

Available download formats: zip

Unique identifier

https://doi.org/10.5061/dryad.t4b8gtj34

Dataset updated

Dec 14, 2021

Dataset provided by

Cornell University

Authors

David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove

License

https://spdx.org/licenses/CC0-1.0.html

Description

Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. 
We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.

Methods

Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (Mus musculus) were obtained from Jackson Laboratories (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute of Aging (NIA) Rodent Aging Colony and were used at 20 months of age. For new scRNAseq experiments, female mice were used in each experiment.

Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of old (20 months) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 and 40 μm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed through incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).

Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 10^6 cells/ml. Cells were counted manually with a hemocytometer to determine their concentration. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes. After preparation, libraries were sequenced on a NextSeq 500 (Illumina; San Diego, CA) using 75 cycle high output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Details on estimated sequencing saturation and the number of reads per sample are shown in Sup. Data 1.

Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5 mo) C57BL/6J mice were injected with 10 µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and frozen fresh in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). Optimal tissue permeabilization time for 10 µm thick sections was found to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E stained tissue sections were imaged using a Zeiss PALM MicroBeam laser capture microdissection system, and the images were stitched and processed using Fiji ImageJ software. cDNA libraries were sequenced on an Illumina NextSeq 500 using 150 cycle high output kits (Read 1 = 28 bp, Read 2 = 120 bp, Index 1 = 10 bp, and Index 2 = 10 bp). Frames around the capture area on the Visium slide were aligned manually and spots covering the tissue were selected using Loop Browser v4.0.0 software (10x Genomics). Sequencing data were then aligned to the mouse reference genome (mm10) using the spaceranger v1.0.0 pipeline to generate a feature-by-spot-barcode expression matrix (10x Genomics).

Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples which were only available as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).

Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCounts and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset individually. BCmvn optimization was used for pK parameterization. Estimated doublet rates were computed by fitting the total number of cells after quality filtering to a linear regression of the expected doublet rates published in the 10x Chromium handbook. Estimated homotypic doublet rates were also accounted for using the modelHomotypic function. The default pN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch correction with three tools, independently: Harmony (github.com/immunogenomics/harmony) (v1.0), Scanorama (github.com/brianhie/scanorama) (v1.3), and BBKNN (github.com/Teichlab/bbknn) (v1.3.12). We then used Seurat to process the integrated data. After initial integration, we removed the noisy cluster and re-integrated the data using each of the three batch-correction tools.
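
The three quality thresholds quoted above (750 features, 1000 transcripts, 30% mitochondrial) can be written as a simple per-cell filter. The following is a minimal sketch and not the authors' Seurat code: the function name `qc_keep` and the list-of-lists count matrix are hypothetical, and mitochondrial genes are identified by the `mt-` prefix, which is the usual convention for mouse gene symbols but is an assumption here.

```python
def qc_keep(counts, gene_names, min_features=750, min_counts=1000, max_mito_frac=0.30):
    """Return indices of cells that pass all three QC thresholds.

    counts: one list of raw counts per cell, aligned to gene_names.
    A cell is removed if it has fewer than min_features detected genes,
    fewer than min_counts total transcripts, or more than max_mito_frac
    of its transcripts from mitochondrial genes.
    """
    # Assumption: mouse mitochondrial genes carry the "mt-" prefix.
    is_mito = [name.lower().startswith("mt-") for name in gene_names]
    kept = []
    for i, cell in enumerate(counts):
        total = sum(cell)
        n_features = sum(1 for c in cell if c > 0)
        mito_total = sum(c for c, m in zip(cell, is_mito) if m)
        mito_frac = mito_total / total if total else 0.0
        if n_features >= min_features and total >= min_counts and mito_frac <= max_mito_frac:
            kept.append(i)
    return kept
```

For a real dataset the same logic would normally run on a sparse matrix inside the analysis toolkit; the plain-list version is only meant to make the three conditions explicit.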

Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical genes (Fig. S3). Clusters which had similar canonical marker gene expression patterns were merged.
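
Selecting the "dimensions accounting for 95% of the total variance" amounts to a cumulative sum over per-dimension variances. The sketch below shows one way to do this, assuming the per-dimension standard deviations are available as a list and that "total variance" means the variance across the computed dimensions only; the function name `n_dims_for_variance` is hypothetical.

```python
def n_dims_for_variance(stdev, target=0.95):
    """Smallest number of leading dimensions whose variance share reaches `target`.

    stdev: standard deviation of each embedding dimension, in order.
    Assumption: the total is taken over the supplied dimensions only.
    """
    variances = [s * s for s in stdev]
    total = sum(variances)
    running = 0.0
    for k, v in enumerate(variances, start=1):
        running += v
        if running / total >= target:
            return k
    return len(variances)
```

The returned count would then be used as the upper bound of the dimension range passed to the neighbor-graph step.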

Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).

Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were re-labeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myocytes” (bins 11-18). Culture-associated muscle stem cells were ignored and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified with differential gene expression analysis using Seurat (FindAllMarkers, using Wilcoxon Rank Sum Test; results in Sup. Data 2). The resulting genes were filtered based on average log2-fold change (avg_logFC > 1) and the percentage of cells within the cluster which express each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each of the cell types, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using
