100+ datasets found

Table1_Influence of single-cell RNA sequencing data integration on the...
frontiersin.figshare.com
docx
Updated Jun 13, 2023
Cite
Tomasz Kujawa; Michał Marczyk; Joanna Polanska (2023). Table1_Influence of single-cell RNA sequencing data integration on the performance of differential gene expression analysis.docx [Dataset]. http://doi.org/10.3389/fgene.2022.1009316.s002
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fgene.2022.1009316.s002
Dataset updated
Jun 13, 2023
Dataset provided by
Frontiers
Authors
Tomasz Kujawa; Michał Marczyk; Joanna Polanska
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Large-scale comprehensive single-cell experiments are often resource-intensive and require the involvement of many laboratories and/or taking measurements at various times. This inevitably leads to batch effects, i.e., systematic variations in the data that might occur due to different technology platforms, reagent lots, or handling personnel. Such technical differences confound biological variations of interest and need to be corrected during the data integration process. Data integration is a challenging task due to the overlapping of biological and technical factors, which makes it difficult to distinguish their individual contributions to the overall observed effect. Moreover, the choice of integration method may impact the downstream analyses, including the search for differentially expressed genes. From the existing data integration methods, we selected only those that return the full expression matrix. We evaluated six methods in terms of their influence on the performance of differential gene expression analysis in two single-cell datasets with the same biological study design that differ only in the way the measurement was done: one dataset manifests strong batch effects because each sample was measured at a different time. Integrated data were visualized using the UMAP method. The evaluation was done both at the individual gene level, using parametric and non-parametric approaches for finding differentially expressed genes, and at the gene set level, using gene set enrichment analysis. As an evaluation metric, we used two correlation coefficients, Pearson and Spearman, of the obtained test statistics between reference, test, and corrected studies. Visual comparison of UMAP plots highlighted ComBat-seq, limma, and MNN, which reduced batch effects and preserved differences between biological conditions. Most of the tested methods changed the data distribution after integration, which negatively impacts the use of parametric methods for the analysis.
Two algorithms, MNN and Scanorama, gave very poor results in differential analysis at both the gene and gene set levels. Finally, we highlight ComBat-seq, as it led to the highest correlation of test statistics between the reference and corrected datasets among the tested methods. Moreover, it does not distort the original distribution of gene expression data, so it can be used in all types of downstream analyses.
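The evaluation metric described above (Pearson and Spearman correlation of per-gene test statistics between a reference study and a corrected study) can be illustrated with a minimal sketch. This is not the authors' code; the function names and the example statistics are hypothetical, and only the Python standard library is used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-gene test statistics (same gene order in both lists).
reference_stats = [2.1, -0.5, 3.3, 0.0, -1.8]
corrected_stats = [1.9, -0.2, 2.5, 0.4, -2.6]

print("Pearson:", pearson(reference_stats, corrected_stats))
print("Spearman:", spearman(reference_stats, corrected_stats))
```

A high value of both coefficients would suggest that the integration method preserved the ordering and magnitude of the differential expression signal seen in the reference.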
Data from: Large-scale integration of single-cell transcriptomic data...
data.niaid.nih.gov
dataone.org
+1 more
zip
Updated Dec 14, 2021
Cite
David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove (2021). Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration [Dataset]. http://doi.org/10.5061/dryad.t4b8gtj34
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.t4b8gtj34
Dataset updated
Dec 14, 2021
Dataset provided by
Cornell University
Authors
David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove
License
https://spdx.org/licenses/CC0-1.0.html
Description
Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. 
We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.

Methods

Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (Mus musculus) were obtained from Jackson Laboratories (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute of Aging (NIA) Rodent Aging Colony and were used at 20 months of age. For new scRNAseq experiments, female mice were used in each experiment.

Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of old (20 months) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 and 40 μm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed through incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).

Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 10^6 cells/ml. Cells were counted manually with a hemocytometer to determine their concentration. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes. After preparation, libraries were sequenced on a NextSeq 500 (Illumina; San Diego, CA) using 75 cycle high output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Details on estimated sequencing saturation and the number of reads per sample are shown in Sup. Data 1.

Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5 mo) C57BL/6J mice were injected with 10 µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and frozen fresh in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). Optimal tissue permeabilization time for 10 µm thick sections was found to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E stained tissue sections were imaged using a Zeiss PALM MicroBeam laser capture microdissection system, and the images were stitched and processed using Fiji ImageJ software. cDNA libraries were sequenced on an Illumina NextSeq 500 using 150 cycle high output kits (Read 1=28bp, Read 2=120bp, Index 1=10bp, and Index 2=10bp). Frames around the capture area on the Visium slide were aligned manually and spots covering the tissue were selected using Loop Browser v4.0.0 software (10x Genomics). Sequencing data were then aligned to the mouse reference genome (mm10) using the spaceranger v1.0.0 pipeline to generate a feature-by-spot-barcode expression matrix (10x Genomics).

Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples which were only available as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).

Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCounts and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset individually. BCmvn optimization was used for pK parameterization. Estimated doublet rates were computed by fitting the total number of cells after quality filtering to a linear regression of the expected doublet rates published in the 10x Chromium handbook. Estimated homotypic doublet rates were also accounted for using the modelHomotypic function. The default pN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch correction with three tools independently: Harmony (github.com/immunogenomics/harmony) (v1.0), Scanorama (github.com/brianhie/scanorama) (v1.3), and BBKNN (github.com/Teichlab/bbknn) (v1.3.12). We then used Seurat to process the integrated data. After initial integration, we removed the noisy cluster and re-integrated the data using each of the three batch-correction tools.
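As an illustration of the quality-filtering thresholds stated above (at least 750 features, at least 1000 transcripts, and no more than 30% mitochondrial transcripts), the following minimal sketch applies them to per-cell summaries. The `CellQC` record and the example barcodes are hypothetical; the original workflow used Seurat in R, so this is only a restatement of the rule, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Hypothetical per-cell QC summary."""
    barcode: str
    n_features: int      # number of detected genes
    n_transcripts: int   # number of unique transcripts (UMIs)
    pct_mito: float      # percent of transcripts from mitochondrial genes


# Thresholds taken from the methods text.
MIN_FEATURES = 750
MIN_TRANSCRIPTS = 1000
MAX_PCT_MITO = 30.0


def passes_qc(cell: CellQC) -> bool:
    """A cell is kept only if it violates none of the three criteria."""
    return (
        cell.n_features >= MIN_FEATURES
        and cell.n_transcripts >= MIN_TRANSCRIPTS
        and cell.pct_mito <= MAX_PCT_MITO
    )


cells = [
    CellQC("AAAC", 1200, 3500, 4.2),   # kept
    CellQC("AAAG", 600, 2500, 3.0),    # too few features
    CellQC("AACT", 900, 950, 5.0),     # too few transcripts
    CellQC("AAGT", 1500, 4000, 42.0),  # too much mitochondrial signal
]

kept = [c.barcode for c in cells if passes_qc(c)]
print(kept)
```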

Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical genes (Fig. S3). Clusters which had similar canonical marker gene expression patterns were merged.
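The rule "dimensions accounting for 95% of the total variance" can be sketched as a cumulative-variance cutoff. The sketch below assumes the input is the list of per-dimension standard deviations of the computed embedding and that "total variance" means the sum over those computed dimensions; both are assumptions, and the function name is hypothetical.

```python
def n_dims_for_variance(stdevs, target=0.95):
    """Smallest number of leading dimensions whose cumulative variance
    reaches `target` (a fraction of the summed variance of all dimensions)."""
    variances = [s * s for s in stdevs]
    total = sum(variances)
    cumulative = 0.0
    for n_dims, variance in enumerate(variances, start=1):
        cumulative += variance
        if cumulative / total >= target:
            return n_dims
    return len(variances)


# Hypothetical standard deviations of four embedding dimensions.
example_stdevs = [3.0, 2.0, 1.0, 1.0]
print(n_dims_for_variance(example_stdevs))        # uses the 95% cutoff
print(n_dims_for_variance(example_stdevs, 0.85))  # a looser cutoff
```

The selected number of dimensions would then be the range passed on to graph construction and clustering.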

Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).

Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were re-labeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myocytes” (bins 11-18). Culture-associated muscle stem cells were ignored and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified with differential gene expression analysis using Seurat (FindAllMarkers, using Wilcoxon Rank Sum Test; results in Sup. Data 2). The resulting genes were filtered based on average log2-fold change (avg_logFC > 1) and the percentage of cells within the cluster which express each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each of the cell types, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using
Data_Sheet_1_CBA: Cluster-Guided Batch Alignment for Single Cell RNA-seq.PDF...
frontiersin.figshare.com
pdf
Updated Jun 1, 2023
Cite
Wenbo Yu; Ahmed Mahfouz; Marcel J. T. Reinders (2023). Data_Sheet_1_CBA: Cluster-Guided Batch Alignment for Single Cell RNA-seq.PDF [Dataset]. http://doi.org/10.3389/fgene.2021.644211.s001
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fgene.2021.644211.s001
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers
Authors
Wenbo Yu; Ahmed Mahfouz; Marcel J. T. Reinders
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The power of single-cell RNA sequencing (scRNA-seq) in detecting cell heterogeneity or developmental process is becoming more and more evident every day. The granularity of this knowledge is further propelled when combining two batches of scRNA-seq into a single large dataset. This strategy is however hampered by technical differences between these batches. Typically, these batch effects are resolved by matching similar cells across the different batches. Current approaches, however, do not take into account that we can constrain this matching further as cells can also be matched on their cell type identity. We use an auto-encoder to embed two batches in the same space such that cells are matched. To accomplish this, we use a loss function that preserves: (1) cell-cell distances within each of the two batches, as well as (2) cell-cell distances between two batches when the cells are of the same cell-type. The cell-type guidance is unsupervised, i.e., a cell-type is defined as a cluster in the original batch. We evaluated the performance of our cluster-guided batch alignment (CBA) using pancreas and mouse cell atlas datasets, against six state-of-the-art single cell alignment methods: Seurat v3, BBKNN, Scanorama, Harmony, LIGER, and BERMUDA. Compared to other approaches, CBA preserves the cluster separation in the original datasets while still being able to align the two datasets. We confirm that this separation is biologically meaningful by identifying relevant differential expression of genes for these preserved clusters.
Data from: Benchmarking deep learning methods for biologically conserved...
zenodo.org
zip
Updated Jan 12, 2025
Cite
Chenxin Yi (2025). Benchmarking deep learning methods for biologically conserved single-cell integration. [Dataset]. http://doi.org/10.5281/zenodo.14633468
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.14633468
Dataset updated
Jan 12, 2025
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Chenxin Yi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
scIB-E is a comprehensive deep learning-based benchmarking framework for evaluating single-cell RNA sequencing (scRNA-seq) data integration methods.

Unified Benchmarking Framework:

Evaluates 16 deep-learning single-cell integration methods using a unified variational autoencoder (VAE) framework.

Incorporates batch information, cell-type labels, and combined strategies across three integration levels.

Refined Metrics for Intra-cell-type Variation:

Extends the single-cell integration benchmarking (scIB) metrics by adding new metrics to better capture intra-cell-type biological conservation.

Novel Loss Function:

Introduces Corr-MSE Loss, a correlation-based loss function designed to preserve global cellular relationships and enhance intra-cell-type biological variation.

The preprocessed datasets are available at src/data.
MOESM1 of A benchmark of batch-effect correction methods for single-cell RNA...
springernature.figshare.com
xlsx
Updated Feb 9, 2024
Cite
Hoa Tran; Kok Ang; Marion Chevrier; Xiaomeng Zhang; Nicole Lee; Michelle Goh; Jinmiao Chen (2024). MOESM1 of A benchmark of batch-effect correction methods for single-cell RNA sequencing data [Dataset]. http://doi.org/10.6084/m9.figshare.11636385.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.11636385.v1
Dataset updated
Feb 9, 2024
Dataset provided by
Figshare: http://figshare.com/
Authors
Hoa Tran; Kok Ang; Marion Chevrier; Xiaomeng Zhang; Nicole Lee; Michelle Goh; Jinmiao Chen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 1: Table S1. Detailed description of datasets. The table lists the dataset sources, number of batches, number of cells per batch, and sequencing technology.
MOESM2 of A benchmark of batch-effect correction methods for single-cell RNA...
springernature.figshare.com
xlsx
Updated Feb 9, 2024
Cite
Hoa Tran; Kok Ang; Marion Chevrier; Xiaomeng Zhang; Nicole Lee; Michelle Goh; Jinmiao Chen (2024). MOESM2 of A benchmark of batch-effect correction methods for single-cell RNA sequencing data [Dataset]. http://doi.org/10.6084/m9.figshare.11636391.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.11636391.v1
Dataset updated
Feb 9, 2024
Dataset provided by
Figshare: http://figshare.com/
Authors
Hoa Tran; Kok Ang; Marion Chevrier; Xiaomeng Zhang; Nicole Lee; Michelle Goh; Jinmiao Chen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 2: Table S2. Cell count per cell type. Breakdown of cell count per cell type for each dataset.
Data from: CSS: cluster similarity spectrum integration of single-cell...
data.mendeley.com
Updated Aug 15, 2020
+ more versions
Cite
Zhisong He (2020). CSS: cluster similarity spectrum integration of single-cell genomics data [Dataset]. http://doi.org/10.17632/3kthhpw2pd.2
Unique identifier
https://doi.org/10.17632/3kthhpw2pd.2
Dataset updated
Aug 15, 2020
Authors
Zhisong He
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
It is a major challenge to integrate single-cell sequencing data across experiments, conditions, batches, timepoints and other technical considerations. New computational methods are required that can integrate samples while simultaneously preserving biological information. Here, we propose an unsupervised reference-free data representation, Cluster Similarity Spectrum (CSS), where each cell is represented by its similarities to clusters independently identified across samples. We show that CSS can be used to assess cellular heterogeneity and enable reconstruction of differentiation trajectories from cerebral organoid and other single-cell transcriptomic data, and to integrate data across experimental conditions and human individuals.

The data set presented here includes 1) the Seurat object of the published two-month-old human cerebral organoid scRNA-seq data (Kanton et al. 2019 Nature); 2) the single-cell RNA-seq data of cerebral organoids generated by inDrop; 3) the newly generated single-cell RNA-seq data of cerebral organoids with and without fixation conditions.
Single-cell datasets for temporal gene expression integration
zenodo.org
bin
Updated Aug 12, 2022
Cite
Jolene Ranek; Natalie Stanley; Jeremy Purvis (2022). Single-cell datasets for temporal gene expression integration [Dataset]. http://doi.org/10.5281/zenodo.6587903
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.6587903
Dataset updated
Aug 12, 2022
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Jolene Ranek; Natalie Stanley; Jeremy Purvis
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Contains loom files and preprocessed adata objects to compare methods for temporal gene expression integration. Loom files can be accessed using the 'read' function in Scvelo. Preprocessed adata objects can be accessed using the 'read_h5ad' function in Scanpy.
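The two access routes named above can be sketched as a small dispatcher that picks the reader from the file extension. The function names `reader_name` and `load_dataset` and the example file names are hypothetical; the sketch assumes scVelo and Scanpy are installed and imports them only when a file is actually loaded.

```python
from pathlib import Path


def reader_name(path):
    """Name of the reader the description recommends for this file type."""
    suffix = Path(path).suffix.lower()
    if suffix == ".loom":
        return "scvelo.read"
    if suffix == ".h5ad":
        return "scanpy.read_h5ad"
    raise ValueError(f"Unsupported file type: {suffix!r}")


def load_dataset(path):
    """Load a loom file with scVelo or a preprocessed AnnData with Scanpy."""
    if reader_name(path) == "scvelo.read":
        import scvelo as scv  # imported lazily; assumed to be installed
        return scv.read(str(path))
    import scanpy as sc  # imported lazily; assumed to be installed
    return sc.read_h5ad(str(path))


# Hypothetical file names, used only to show which reader would be chosen.
for name in ["example.loom", "example_preprocessed.h5ad"]:
    print(name, "->", reader_name(name))
```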

The raw single-cell RNA sequencing datasets can be found under the following accession codes.

Mouse embryonic cell cycle dataset from Ref. (https://doi.org/10.1038/nbt.3102) was originally downloaded from ArrayExpress with the accession code E-MTAB-2805.

Hematopoiesis differentiation dataset from Ref. (https://doi.org/10.1182/blood-2016-05-716480) was originally downloaded from the Gene Expression Omnibus with the accession code GSE81682.

NKT cell differentiation dataset from Ref. (https://doi.org/10.1038/ni.3437) was originally downloaded from the Gene Expression Omnibus with the accession code GSE74596.

Hematopoiesis differentiation dataset from Ref. (https://doi.org/10.1038/nature19348) was originally downloaded from the Gene Expression Omnibus with the accession codes GSE70236, GSE70240, and GSE70244.

LPS stimulation dataset from Ref. (https://doi.org/10.1016/j.cels.2017.03.010) was originally downloaded from the Gene Expression Omnibus with the accession code GSE94383.

IFN-gamma stimulation dataset from Ref. (https://doi.org/10.1038/s41587-020-00803-5) was originally downloaded from the Gene Expression Omnibus with the accession code GSE161465.

AML chemotherapy dataset from Ref. (https://doi.org/10.1038/s41591-018-0233-1) was originally downloaded from the Gene Expression Omnibus with the accession code GSE116481.

AML diagnosis/relapse dataset from Ref. (https://doi.org/10.1038/s41375-021-01338-7) was originally downloaded from the Gene Expression Omnibus with the accession code GSE126068.

MS case control PBMC and CSF datasets from Ref. (https://doi.org/10.1038/s41467-019-14118-w) were originally downloaded from the Gene Expression Omnibus with the accession code GSE138266.
Data used in SeuratIntegrate paper
zenodo.org
application/gzip, bin +2
Updated May 23, 2025
Cite
Florian Specque; Macha Nikolski; Domitille Chalopin (2025). Data used in SeuratIntegrate paper [Dataset]. http://doi.org/10.5281/zenodo.15496601
Available download formats: bin, pdf, txt, application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.15496601
Dataset updated
May 23, 2025
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Florian Specque; Macha Nikolski; Domitille Chalopin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository gathers the data and code used to generate hepatocellular carcinoma analyses in the paper presenting SeuratIntegrate. It contains the scripts to reproduce the figures presented in the article. Some figures are also available as pdf files.

To be able to fully reproduce the results from the paper, one should:

download all the files

install R 4.3.3, with the corresponding base R packages (stats, graphics, grDevices, utils, datasets, methods and base)

install R packages listed in the file sessionInfo.txt

install the provided version of SeuratIntegrate. In an R session, run:

remotes::install_local("path/to/SeuratIntegrate_0.4.1.tar.gz")

install (mini)conda if necessary (we used miniconda version 23.11.0)

install the conda environments (if it fails with the *package-list.yml files, use the *package-list-from-history.yml files instead):

conda env create --file SeuratIntegrate_bbknn_package-list.yml
conda env create --file SeuratIntegrate_scanorama_package-list.yml
conda env create --file SeuratIntegrate_scvi-tools_package-list.yml
conda env create --file SeuratIntegrate_trvae_package-list.yml

open an R session to make the conda environments usable by SeuratIntegrate:

library(SeuratIntegrate)
UpdateEnvCache("bbknn", conda.env = "SeuratIntegrate_bbknn", conda.env.is.path = FALSE)
UpdateEnvCache("scanorama", conda.env = "SeuratIntegrate_scanorama", conda.env.is.path = FALSE)
UpdateEnvCache("scvi", conda.env = "SeuratIntegrate_scvi-tools", conda.env.is.path = FALSE)
UpdateEnvCache("trvae", conda.env = "SeuratIntegrate_trvae", conda.env.is.path = FALSE)

Once done, running the code in integrate.R should produce reproducible results. Note that lines 3 to 6 from integrate.R should be adapted to the user's setup.
integrate.R is subdivided into six main parts:

Preparation: lines 1-56

Preprocessing: lines 58-74

Integration: lines 76-121

Processing of integration outputs: lines 126-267

Scoring of integration outputs: lines 269-353

Plotting: lines 380-507

Intermediate SeuratObjects have been saved between steps 3 and 4 and between steps 5 and 6 (liver10k_integrated_object.RDS and liver10k_integrated_scored_object.RDS, respectively). It is possible to start from these intermediate SeuratObjects and skip the preceding steps, provided that the Preparation step is always run first.
MOESM5 of A benchmark of batch-effect correction methods for single-cell RNA...
springernature.figshare.com
xlsx
Updated Feb 9, 2024
Cite
Hoa Tran; Kok Ang; Marion Chevrier; Xiaomeng Zhang; Nicole Lee; Michelle Goh; Jinmiao Chen (2024). MOESM5 of A benchmark of batch-effect correction methods for single-cell RNA sequencing data [Dataset]. http://doi.org/10.6084/m9.figshare.11636412.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.11636412.v1
Dataset updated
Feb 9, 2024
Dataset provided by
Figshare: http://figshare.com/
Authors
Hoa Tran; Kok Ang; Marion Chevrier; Xiaomeng Zhang; Nicole Lee; Michelle Goh; Jinmiao Chen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 5: Table S4. Evaluation metrics. Detailed assessment metric scores and F-score for all methods on all datasets.
Single-cell RNA-seq of the mouse and human lymph node lymphatic vasculature
purl.stanford.edu
Updated Jan 2, 2020
Cite
Menglan Xiang (2020). Single-cell RNA-seq of the mouse and human lymph node lymphatic vasculature [Dataset]. https://purl.stanford.edu/xr811qy1057
Dataset updated
Jan 2, 2020
Authors
Menglan Xiang
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
Single-cell transcriptomics promises to revolutionize our understanding of the vasculature. Emerging computational methods applied to high dimensional single cell data allow integration of results between samples and species, and illuminate the diversity and underlying developmental and architectural organization of cell populations. Here, we illustrate these methods in analysis of mouse lymph node (LN) lymphatic endothelial cells (LEC) at single cell resolution. Clustering identifies five well-delineated subsets, including two medullary sinus subsets not recognized previously as distinct. Nearest neighbor alignments in trajectory space position the major subsets in a sequence that recapitulates known and suggests novel features of LN lymphatic organization, providing a transcriptional map of the lymphatic endothelial niches and of the transitions between them. Differences in gene expression reveal specialized programs for (1) subcapsular ceiling endothelial interactions with the capsule connective tissue and cells, (2) subcapsular floor regulation of lymph borne cell entry into the LN parenchyma and antigen presentation, and (3) medullary subset specialization for pathogen interactions and LN remodeling. LEC of the subcapsular sinus floor and medulla, which represent major sites of cell entry and exit from the LN parenchyma respectively, respond robustly to oxazolone inflammation challenge with enriched signaling pathways that converge on both innate and adaptive immune responses. Integration of mouse and human single-cell profiles reveals a conserved cross-species pattern of lymphatic vascular niches and gene expression, as well as specialized human subsets and genes unique to each species. The examples provided demonstrate the power of single-cell analysis in elucidating endothelial cell heterogeneity, vascular organization and endothelial cell responses. 
We discuss the findings from the perspective of LEC functions in relation to niche formations in the unique stromal and highly immunological environment of the LN.
MOESM6 of A benchmark of batch-effect correction methods for single-cell RNA...
springernature.figshare.com
xlsx
Updated Feb 9, 2024
Cite
Hoa Tran; Kok Ang; Marion Chevrier; Xiaomeng Zhang; Nicole Lee; Michelle Goh; Jinmiao Chen (2024). MOESM6 of A benchmark of batch-effect correction methods for single-cell RNA sequencing data [Dataset]. http://doi.org/10.6084/m9.figshare.11636418.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.11636418.v1
Dataset updated
Feb 9, 2024
Dataset provided by
Figshare: http://figshare.com/
Authors
Hoa Tran; Kok Ang; Marion Chevrier; Xiaomeng Zhang; Nicole Lee; Michelle Goh; Jinmiao Chen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 6: Table S5. Evaluation statistical test. Statistical significance test results of the batch correction method’s assessment metric scores.

Single Cell Analysis Market Analysis, Size, and Forecast 2025-2029: North...

technavio.com

Updated Apr 15, 2025

Technavio (2025). Single Cell Analysis Market Analysis, Size, and Forecast 2025-2029: North America (US and Canada), Europe (France, Germany, Italy, and UK), and APAC (China, India, Japan, and South Korea) [Dataset]. https://www.technavio.com/report/single-cell-analysis-market-industry-analysis

Dataset updated

Apr 15, 2025

Dataset provided by

TechNavio

Authors

Technavio

Time period covered

2021 - 2025

Area covered

Canada, United States, Global

Description

Single Cell Analysis Market Size 2025-2029

The single cell analysis market size is forecast to increase by USD 4.63 billion at a CAGR of 18.2% between 2024 and 2029.
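The arithmetic behind a forecast phrased this way can be sketched as follows. This is a minimal illustration, assuming the USD 4.63 billion increase is measured from a 2024 base and compounds annually over five periods to 2029; the base value itself is not stated in the report, and the function names `implied_base` and `project` are hypothetical.

```python
def implied_base(increase, cagr, years):
    """Starting value implied by an absolute increase at a compound annual growth rate.

    Solves base * ((1 + cagr) ** years - 1) = increase for base.
    """
    growth_factor = (1 + cagr) ** years
    return increase / (growth_factor - 1)


def project(base, cagr, years):
    """Value after compounding `base` at `cagr` for `years` annual periods."""
    return base * (1 + cagr) ** years


# Figures quoted in the report; the 5-year horizon (2024 to 2029) is an assumption.
base_2024 = implied_base(increase=4.63, cagr=0.182, years=5)
size_2029 = project(base_2024, cagr=0.182, years=5)

print(f"implied 2024 base (USD bn): {base_2024:.2f}")
print(f"implied 2029 size (USD bn): {size_2029:.2f}")
```

The result depends entirely on the assumed horizon and compounding convention, so it should be read as a consistency check rather than a figure from the report.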

The market is growing significantly, driven by the increasing prevalence of cancer and the rising incidence of chronic diseases and genetic disorders, which create a need for the more precise and personalized diagnostic and therapeutic approaches that single cell analysis can provide. However, the high cost of single cell analysis products remains a major challenge for market expansion, limiting access to the technology for many healthcare providers and research institutions. Despite this, the market's potential is vast, with opportunities in end-user industries such as pharmaceuticals, biotechnology, and academia. Approaches that combine data from genomics, transcriptomics, proteomics, and metabolomics, among others, can provide valuable insights into cellular function and behavior.
Companies seeking to capitalize on this market's growth should focus on developing cost-effective solutions while maintaining the high-quality standards required for single cell analysis. Additionally, collaborations and partnerships with key opinion leaders and research institutions can help establish market presence and credibility. Overall, the market presents a compelling opportunity for companies to make a significant impact on the healthcare industry by enabling more accurate diagnoses and personalized treatments.

What will be the Size of the Single Cell Analysis Market during the forecast period?

Single-cell analysis, a cutting-edge technology, is revolutionizing the healthcare industry by enabling a more comprehensive knowledge of complex biological systems. This advanced approach allows for the examination of individual cells, providing insights into clinical trial design, tumor microenvironment, and patient stratification. Technologies such as single-cell spatial transcriptomics, microfluidic chips, and droplet microfluidics facilitate the analysis of cell diameter, morphology, immune cell infiltration, and cell cycle phase. Furthermore, single-cell lineage tracing, immune profiling, developmental trajectory analysis, and spatial proteomics offer valuable information on circulating tumor cells and tumor heterogeneity. Single-cell analysis software, genome-wide association studies, and epigenetic analysis contribute to the interpretation of vast amounts of data generated.
Drug response prediction, cell interactions, and biomarker validation are additional applications of this technology. Single-cell analysis services and consulting firms facilitate the implementation of this technology in research and clinical settings. Protein expression profiling, encapsulation, and cell-free DNA analysis through liquid biopsy further expand the scope of single-cell analysis. This technology's potential is vast, offering significant advancements in diagnostics, therapeutics, and fundamental research.

How is this Single Cell Analysis Industry segmented?

The single cell analysis industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

Product

  Consumables
  Instrument


Type

  Human cells
  Animal cells


Technique

  Flow cytometry
  Next-generation sequencing (NGS)
  Polymerase chain reaction (PCR)
  Microscopy
  Mass spectrometry


Application

  Research
  Medical


Geography

  North America

    US
    Canada


  Europe

    France
    Germany
    Italy
    UK


  APAC

    China
    India
    Japan
    South Korea

By Product Insights

The consumables segment is estimated to witness significant growth during the forecast period. The market encompasses various technologies and applications, including cell stress analysis, omics data integration, cellular heterogeneity, cell engineering, single-cell immunophenotyping, single-cell DNA sequencing, cell proliferation assays, systems biology, precision medicine, cellular metabolism, single-cell proteomics, gene editing, imaging cytometry, academic research, mass cytometry, single-cell barcoding, single-cell spatial analysis, microarray analysis, single-cell sequencing, machine learning, biopharmaceutical industry, data visualization, next-generation sequencing, developmental biology, biotechnology industry, clinical diagnostics, cell cycle analysis, high-throughput screening, cell signaling, regenerative medicine, cell line development, cancer research, flow cytometry, drug discovery, stem cell research, cell culture, cell differentiation assays, biomarker discovery, personalized medicine, single-cell RNA sequencing, single-cell methylation analysis, single-cell data analysis, multiplexed analysi

Data from: Integrating multimodal data sets into a mathematical framework to...
data.niaid.nih.gov
Updated May 10, 2021
Johnson K; Howard GR; Morgan D; Brenner EA; Gardner AL; Durrett RE; Mo W; Al’Khafaji A; Sontag ED; Jarrett AM; Yankeelov TE; Brock A (2021). Integrating multimodal data sets into a mathematical framework to describe and predict therapeutic resistance in cancer [Dataset]. https://data.niaid.nih.gov/resources?id=gse154932
Dataset updated
May 10, 2021
Dataset provided by
University of Texas at Austin
Authors
Johnson K; Howard GR; Morgan D; Brenner EA; Gardner AL; Durrett RE; Mo W; Al’Khafaji A; Sontag ED; Jarrett AM; Yankeelov TE; Brock A
Description
A significant challenge in the field of biomedicine is the development of methods to integrate the multitude of dispersed data sets into comprehensive frameworks to be used to generate optimal clinical decisions. Recent technological advances in single cell analysis allow for high-dimensional molecular characterization of cells and populations, but to date, few mathematical models have attempted to integrate measurements from the single cell scale with other data types. Here, we present a framework that actionizes static outputs from a machine learning model and leverages these as measurements of state variables in a dynamic mechanistic model of treatment response. We apply this framework to breast cancer cells to integrate single cell transcriptomic data with longitudinal population-size data. We demonstrate that the explicit inclusion of the transcriptomic information in the parameter estimation is critical for identification of the model parameters and enables accurate prediction of new treatment regimens. Inclusion of the transcriptomic data improves predictive accuracy in new treatment response dynamics, with a concordance correlation coefficient (CCC) of 0.89 compared to CCC = 0.79 without integration of the single cell RNA sequencing (scRNA-seq) data directly into the model calibration. To the best of our knowledge, this is the first work that explicitly integrates single cell clonally-resolved transcriptome datasets with longitudinal treatment response data into a mechanistic mathematical model of drug resistance dynamics. We anticipate this approach to be a first step that demonstrates the feasibility of incorporating multimodal data sets into identifiable mathematical models to develop optimized treatment regimens from data.
Single cell RNA-seq of MDA-MB-231 cell line with chemotherapy treatment.
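The concordance correlation coefficient quoted in the description can be illustrated with a short sketch. It uses Lin's standard definition with population (1/n) variances; the function name and the toy values are assumptions for illustration and are not taken from the dataset or the authors' code.

```python
def concordance_correlation(observed, predicted):
    """Lin's concordance correlation coefficient between two equal-length sequences.

    CCC = 2 * cov / (var_obs + var_pred + (mean_obs - mean_pred) ** 2),
    using population (divide-by-n) variance and covariance.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    var_obs = sum((x - mean_obs) ** 2 for x in observed) / n
    var_pred = sum((y - mean_pred) ** 2 for y in predicted) / n
    cov = sum((x - mean_obs) * (y - mean_pred)
              for x, y in zip(observed, predicted)) / n
    return 2 * cov / (var_obs + var_pred + (mean_obs - mean_pred) ** 2)


# Toy values (hypothetical): a perfectly correlated but shifted prediction
# is penalized, unlike with the Pearson correlation.
measured = [1.0, 2.0, 3.0]
shifted = [2.0, 3.0, 4.0]
print(concordance_correlation(measured, measured))
print(concordance_correlation(measured, shifted))
```

Unlike Pearson's r, the CCC drops below 1 whenever predictions are offset or rescaled relative to the measurements, which is why it suits comparing predicted and observed response dynamics.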
Data from: CellFuse enables multi-modal integration of single-cell and...
zenodo.org
Updated Jul 17, 2025
Abhishek Koladiya (2025). CellFuse enables multi-modal integration of single-cell and spatial proteomics data [Dataset]. http://doi.org/10.5281/zenodo.15858358
Unique identifier
https://doi.org/10.5281/zenodo.15858358
Dataset updated
Jul 17, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Abhishek Koladiya
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jul 19, 2025
Description
Fig 2

Bone marrow (Fig 2B, D, E, F, H, Supplementary Fig 1A, 2,3)

1. Fig 2/BM/Reference/ Fig2_BM_prepare_data.R: Prepare bone marrow for CellFuse

2. Fig 2/BM/ BM_CellFuse_Integration.R: Run CellFuse

3. Fig 2/BM/BM_Running_Benchmark_Methods.R: Run benchmarking methods (Harmony, Seurat, FastMNN)

4. Fig 2/BM/BM_scIB_Benchmarking.ipynb: evaluate performance of CellFuse and other benchmarking methods using scIB framework proposed by Luecken et al.

5. Fig 2/BM/ BM_scIB_prepare_figures.R: Visualize results of scIB framework

6. Fig 2/BM/Sequential_Feature_drop/Prepare_data.R: Prepare data for evaluating sequential feature drop

7. Fig 2/BM/Sequential_Feature_drop/Run_methods.R: Run CellFuse, Harmony, Seurat and FastMNN for sequential feature drop

8. Fig 2/BM/Sequential_Feature_drop/Evaluate_results.R: Evaluate the sequential feature drop results and visualize data.

PBMC (Fig 2G,I, Supplementary Fig 1B and 4)

1. Fig 2/PBMC/Reference/ Fig2_PBMC_prepare_data.R: Prepare PBMC data for CellFuse

2. Fig 2/ PBMC / PBMC_CellFuse_Integration.R: Run CellFuse

3. Fig 2/ PBMC /PBMC_Running_Benchmark_Methods.R: Run benchmarking methods (Harmony, Seurat, FastMNN)

4. Fig 2/ PBMC /PBMC_scIB_Benchmarking.ipynb: evaluate performance of CellFuse and other benchmarking methods using scIB framework proposed by Luecken et al., 2021

5. Fig 2/ PBMC /PBMC_scIB_prepare_figures.R: Visualize results of scIB framework

6. Fig 2/ PBMC/ RunTime_benchmark/Run_Benchmark.R: Prepare data, run benchmarking method and evaluate results.

Fig 3 and Supplementary Fig 5

1. Fig 3/Reference/ Fig3_CyTOF_prepare_data.R: Prepare CyTOF and CITE-Seq data for CellFuse

2. Fig 3/CellFuse_Integration_CyTOF.R: Run CellFuse to remove batch effect and integrate CyTOF data from day 7 post-infusion

3. Fig 3/CellFuse_Integration_CITESeq.R: Run CellFuse to integrate CyTOF and CITE-Seq data

4. Fig 3/CART_Data_visualisation.R: Visualize data

Fig 4

HuBMAP CODEX data (Fig. 4A, B, C, D and Supplementary Fig 6)

1. Fig 4/CODEX_colorectal/Reference/ CODEX_HuBMAP_prepare_data.R: Prepare CODEX data from annotated and unannotated donor

2. Fig 4/ CODEX_colorectal/ CODEX_HuBMAP_CellFuse_Predict.R: Run CellFuse on cells from annotated and unannotated donor

3. Fig 4/ CODEX_colorectal/CODEX_HuBMAP_Data_visualisation.R: Visualize data and prepare figures.

4. Fig 4/ CODEX_colorectal/ CODEX_HuBMAP_Benchmark.R: Benchmarking CellFuse against CELESTA, SVM and Seurat using cells from annotated donors and prepare figures.

a. Astir is a Python package, so run the following Python notebook: Fig 4/ CODEX_colorectal/ Benchmarking/Astir/Astrir.ipynb

5. Fig 4/ CODEX_colorectal/CODEX_HuBMAP_Suppl_figure_heatmap.R: F1score calculation per celltype per Benchmarking methods and heatmap comparing celltypes from annotated and unannotated donors (Supplementary Fig 6)

IMC Breast cancer data (Fig. 4E,F, G and Supplementary Fig 7)

1. Fig 4/ IMC_Breast_Cancer/ IMC_prepare_data.R: Prepare CODEX data from annotated and unannotated donor

2. Fig 4/ IMC_Breast_Cancer/ IMC_CellFuse_Predict.R: Run CellFuse to predict cell types

3. Fig 4/ IMC_Breast_Cancer/ IMC_dat_visualization.R: Visualize data and prepare figures.

Fig 5

1. Fig5/ Reference/ Fig5_CyTOF_Data_prep.R: Prepare CyTOF data from healthy PBMC and healthy colon single cells

2. Fig5/ MIBI_CellFuse_Predict.R: Run CellFuse to predict cells from colon cancer patients

3. Fig5/ MIBI_PostPrediction.R: Visualize data and prepare figures

4. Fig5/ Predicted_Data/ mask_generation.ipynb: After CellFuse prediction, annotate cell types in segmented images. This generates Fig 5C and D.
Single Cell Sequencing Kit Report
datainsightsmarket.com
doc, pdf, ppt
Updated Jan 19, 2025
Data Insights Market (2025). Single Cell Sequencing Kit Report [Dataset]. https://www.datainsightsmarket.com/reports/single-cell-sequencing-kit-212730
Available download formats: pdf, ppt, doc
Dataset updated
Jan 19, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The single-cell sequencing kit market is projected to reach a value of USD 4,321.7 million by 2033, expanding at a CAGR of 15.8% during the forecast period (2025-2033). The growing demand for single-cell sequencing in various research applications, such as cancer research, immunology, and neurology, is driving market growth. The key trends influencing the market include the increasing adoption of single-cell RNA sequencing (scRNA-seq) technologies, the development of novel sample preparation methods, and the integration of single-cell sequencing with other omics technologies, such as genomics, transcriptomics, and proteomics. However, the high cost of single-cell sequencing and the need for specialized expertise in data analysis present challenges to the market's growth. The major players in the market include 10x Genomics, BD, BGI, Singleron Bio, Seekgene, ThunderBio, Tenk Genomics, MobiDrop, BioMarker, Dynamic Biosystems, M20 Genomics, Illumina, QIAGEN, Jingxin Biotechnology, TaKaRa, Bio-Rad, and Mission Bio.
Data from: MOJITOO: a fast and universal method for integration of...
explore.openaire.eu
Updated Mar 12, 2022
Mingbo Cheng; Zhijian Li; Ivan G. Costa (2022). MOJITOO: a fast and universal method for integration of multimodal single cell data [Dataset]. http://doi.org/10.5281/zenodo.6348128
Unique identifier
https://doi.org/10.5281/zenodo.6348128
Dataset updated
Mar 12, 2022
Authors
Mingbo Cheng; Zhijian Li; Ivan G. Costa
Description
MOJITOO benchmarking seurat Robjects.
Input Data for Galaxy tutorial "Batch Correction and Integration" - Seurat...
zenodo.org
Updated Jan 27, 2025
Marisa Loach (2025). Input Data for Galaxy tutorial "Batch Correction and Integration" - Seurat version [Dataset]. http://doi.org/10.5281/zenodo.14747577
Unique identifier
https://doi.org/10.5281/zenodo.14747577
Dataset updated
Jan 27, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Marisa Loach
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data is used for the Seurat version of the batch correction and integration tutorial on the Galaxy Training Network.

The input data was provided by Seurat in the 'Integrative Analysis in Seurat v5' tutorial. The input dataset provided here has been filtered to include only cells for which nFeature_RNA > 1000.

The original dataset was published as: Ding, J., Adiconis, X., Simmons, S.K. et al. Systematic comparison of single-cell and single-nucleus RNA-sequencing methods. Nat Biotechnol 38, 737–746 (2020). https://doi.org/10.1038/s41587-020-0465-8.
pbmc single cell RNA-seq matrix
zenodo.org
csv
Updated May 4, 2021
Samuel Buchet; Francesco Carbone; Morgan Magnin; Mickaël Ménager; Olivier Roux (2021). pbmc single cell RNA-seq matrix [Dataset]. http://doi.org/10.5281/zenodo.4730807
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.4730807
Dataset updated
May 4, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Samuel Buchet; Francesco Carbone; Morgan Magnin; Mickaël Ménager; Olivier Roux
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Single cell RNA-sequencing dataset of peripheral blood mononuclear cells (pbmc: T, B, NK and monocytes) extracted from two healthy donors.

Cells labeled as C26 come from a 30-year-old female and cells labeled as C27 come from a 53-year-old male. Cells were isolated from blood using Ficoll. Samples were sequenced using standard 3' v3 chemistry protocols by 10x Genomics. Cell Ranger v4.0.0 was used for the processing, and reads were aligned to the Ensembl GRCh38 human genome (GRCg38_r98-ensembl_Sept2019). QC metrics were calculated on the count matrix generated by Cell Ranger (filtered_feature_bc_matrix). Cells with fewer than 3 genes per cell, fewer than 500 reads per cell, or more than 20% mitochondrial genes were discarded.
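The QC thresholds described above can be sketched in plain Python. This is an illustrative sketch only, not the authors' pipeline (which used Cell Ranger and Seurat): it assumes each cell is a mapping from gene symbol to count, that the mitochondrial threshold refers to the fraction of counts from genes whose symbol starts with "MT-", and the function name `passes_qc` and the example cells are hypothetical.

```python
def passes_qc(cell_counts, min_genes=3, min_reads=500,
              max_mito_fraction=0.20, mito_prefix="MT-"):
    """Return True if a cell passes the three QC thresholds.

    cell_counts: dict mapping gene symbol -> count for one cell.
    A cell is discarded if it has fewer than `min_genes` detected genes,
    fewer than `min_reads` total counts, or more than `max_mito_fraction`
    of its counts in mitochondrial genes (assumed prefix "MT-").
    """
    total_reads = sum(cell_counts.values())
    detected_genes = sum(1 for count in cell_counts.values() if count > 0)
    if total_reads == 0:
        return False
    if detected_genes < min_genes or total_reads < min_reads:
        return False
    mito_reads = sum(count for gene, count in cell_counts.items()
                     if gene.startswith(mito_prefix))
    return mito_reads / total_reads <= max_mito_fraction


# Hypothetical cells for illustration.
cells = {
    "good_cell": {"CD3E": 300, "MS4A1": 200, "NKG7": 100, "MT-CO1": 50},
    "high_mito": {"CD3E": 300, "MS4A1": 100, "NKG7": 100, "MT-CO1": 200},
    "low_reads": {"CD3E": 10, "MS4A1": 5, "NKG7": 5},
}
kept = [name for name, counts in cells.items() if passes_qc(counts)]
print(kept)
```

In practice the same filter is usually applied to a sparse count matrix rather than per-cell dictionaries; the dictionary form is used here only to keep the thresholds easy to read.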

The processing steps were performed with the R package Seurat (https://satijalab.org/seurat/), including sample integration, data normalisation and scaling, dimensional reduction, and clustering. The SCTransform method was adopted for the normalisation and scaling steps. The clustered cells were manually annotated using known cell type markers.

Files content:

- raw_dataset.csv: raw gene counts

- normalized_dataset.csv: normalized gene counts (single cell matrix)

- cell_types.csv: cell types identified from annotated cell clusters

- cell_types_macro.csv: cell macro types

- UMAP_coordinates.csv: 2d cell coordinates computed with UMAP algorithm in Seurat
Emerging Singlecell Technology Report
datainsightsmarket.com
doc, pdf, ppt
Updated Jun 19, 2025
Data Insights Market (2025). Emerging Singlecell Technology Report [Dataset]. https://www.datainsightsmarket.com/reports/emerging-singlecell-technology-1009139
Available download formats: ppt, pdf, doc
Dataset updated
Jun 19, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The emerging single-cell technology market is experiencing rapid growth, driven by advancements in genomics, proteomics, and bioinformatics. This technology allows researchers to analyze individual cells, providing unprecedented insights into cellular heterogeneity and function across various biological systems. The market's expansion is fueled by increasing demand for personalized medicine, drug discovery, and disease diagnostics. Applications span oncology, immunology, neuroscience, and infectious diseases, with single-cell RNA sequencing (scRNA-seq) currently dominating the market share. The high cost of instrumentation and data analysis remains a barrier to wider adoption, but ongoing technological innovations are driving down costs and improving accessibility. Furthermore, the development of new analytical tools and bioinformatics pipelines is enhancing data interpretation and accelerating research progress. This burgeoning field is attracting significant investment and collaborative efforts from both established players and innovative startups, fostering a competitive yet collaborative landscape. The projected market growth signifies a transformative impact on healthcare and life sciences, promising significant advancements in disease understanding and treatment. The forecast period from 2025 to 2033 anticipates substantial market expansion, propelled by increasing adoption across research institutions, pharmaceutical companies, and biotechnology firms. Key growth drivers include the development of more affordable and user-friendly single-cell technologies, the integration of multi-omics approaches (combining genomics, proteomics, and metabolomics), and expanding collaborations between academia and industry. Competitive pressures are driving innovation in areas like sample preparation, data analysis software, and the development of novel single-cell applications, such as spatial transcriptomics. 
Although challenges such as data complexity and the need for specialized expertise persist, the potential for single-cell technologies to revolutionize biological research and healthcare remains immense. This is reflected in the continuous influx of funding and the emergence of new market participants. By 2033, the market is poised to be significantly larger and more diverse, with a wider range of applications and technological advancements shaping the future of biological research and medicine.

Tomasz Kujawa; Michał Marczyk; Joanna Polanska (2023). Table1_Influence of single-cell RNA sequencing data integration on the performance of differential gene expression analysis.docx [Dataset]. http://doi.org/10.3389/fgene.2022.1009316.s002

Table1_Influence of single-cell RNA sequencing data integration on the performance of differential gene expression analysis.docx

Available download formats: docx

Unique identifier

https://doi.org/10.3389/fgene.2022.1009316.s002

Dataset updated

Jun 13, 2023

Dataset provided by

Frontiers

Authors

Tomasz Kujawa; Michał Marczyk; Joanna Polanska

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Large-scale comprehensive single-cell experiments are often resource-intensive and require the involvement of many laboratories and/or taking measurements at various times. This inevitably leads to batch effects, and systematic variations in the data that might occur due to different technology platforms, reagent lots, or handling personnel. Such technical differences confound biological variations of interest and need to be corrected during the data integration process. Data integration is a challenging task due to the overlapping of biological and technical factors, which makes it difficult to distinguish their individual contribution to the overall observed effect. Moreover, the choice of integration method may impact the downstream analyses, including searching for differentially expressed genes. From the existing data integration methods, we selected only those that return the full expression matrix. We evaluated six methods in terms of their influence on the performance of differential gene expression analysis in two single-cell datasets with the same biological study design that differ only in the way the measurement was done: one dataset manifests strong batch effects due to the measurements of each sample at a different time. Integrated data were visualized using the UMAP method. The evaluation was done both on individual gene level using parametric and non-parametric approaches for finding differentially expressed genes and on gene set level using gene set enrichment analysis. As an evaluation metric, we used two correlation coefficients, Pearson and Spearman, of the obtained test statistics between reference, test, and corrected studies. Visual comparison of UMAP plots highlighted ComBat-seq, limma, and MNN, which reduced batch effects and preserved differences between biological conditions. Most of the tested methods changed the data distribution after integration, which negatively impacts the use of parametric methods for the analysis. 
Two algorithms, MNN and Scanorama, gave very poor results in differential analysis at both the gene and gene set levels. Finally, we highlight ComBat-seq because, among the tested methods, it led to the highest correlation of test statistics between the reference and corrected datasets. Moreover, it does not distort the original distribution of gene expression data, so it can be used in all types of downstream analyses.
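The evaluation metric described above, Pearson and Spearman correlation between per-gene test statistics from two studies, can be sketched as follows. This is a minimal standard-library illustration under stated assumptions: the function names and the toy statistics are hypothetical and do not come from the paper or its data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-gene test statistics from a reference and a corrected study.
reference_stats = [1.0, 2.0, 3.0, 4.0]
corrected_stats = [1.0, 4.0, 9.0, 16.0]
print(pearson(reference_stats, corrected_stats))
print(spearman(reference_stats, corrected_stats))
```

Reporting both coefficients is informative because Spearman depends only on the ordering of genes, while Pearson is also sensitive to how an integration method rescales the statistics.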
