Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The power of single-cell RNA sequencing (scRNA-seq) to detect cell heterogeneity or developmental processes is becoming increasingly evident. Combining two batches of scRNA-seq data into a single large dataset further increases the granularity of this knowledge. This strategy is, however, hampered by technical differences between the batches. Typically, these batch effects are resolved by matching similar cells across the different batches. Current approaches, however, do not take into account that this matching can be constrained further, as cells can also be matched on their cell-type identity. We use an auto-encoder to embed two batches in the same space such that cells are matched. To accomplish this, we use a loss function that preserves (1) cell-cell distances within each of the two batches and (2) cell-cell distances between the two batches when the cells are of the same cell type. The cell-type guidance is unsupervised, i.e., a cell type is defined as a cluster in the original batch. We evaluated the performance of our cluster-guided batch alignment (CBA) on pancreas and mouse cell atlas datasets against six state-of-the-art single-cell alignment methods: Seurat v3, BBKNN, Scanorama, Harmony, LIGER, and BERMUDA. Compared to these approaches, CBA preserves the cluster separation present in the original datasets while still aligning the two datasets. We confirm that this separation is biologically meaningful by identifying relevant differentially expressed genes for the preserved clusters.
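The two-part loss described above can be sketched in NumPy. This is a minimal illustration, not the CBA implementation. Several details are hypothetical: the squared-error form, the `cross_weight` parameter, and the assumption that the cluster labels of the two batches share one label space, so that "same cell type" across batches reduces to a simple label match. In practice, such a loss would be written in an autodiff framework and minimized over the auto-encoder's weights.

```python
import numpy as np


def pairwise_dist(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Euclidean distances between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def cluster_guided_loss(x1, x2, z1, z2, labels1, labels2, cross_weight=1.0):
    """Hypothetical distance-preservation loss in the spirit of CBA.

    x1, x2: original profiles of batch 1 and batch 2 (cells x genes)
    z1, z2: embeddings of the same cells (cells x latent dimensions)
    labels1, labels2: per-cell cluster labels, assumed to share one label space
    """
    # (1) Preserve cell-cell distances within each batch.
    within = (
        np.mean((pairwise_dist(x1, x1) - pairwise_dist(z1, z1)) ** 2)
        + np.mean((pairwise_dist(x2, x2) - pairwise_dist(z2, z2)) ** 2)
    )
    # (2) Preserve between-batch distances, but only for same-cluster pairs.
    same = np.asarray(labels1)[:, None] == np.asarray(labels2)[None, :]
    if same.any():
        gap = pairwise_dist(x1, x2) - pairwise_dist(z1, z2)
        cross = np.mean(gap[same] ** 2)
    else:
        cross = 0.0
    return within + cross_weight * cross
```

With an identity "embedding" (z equal to x), both terms vanish. Any distortion of distances, such as uniform scaling, makes the loss positive.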
Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, in...
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
Large-scale comprehensive single-cell experiments are often resource-intensive and require the involvement of many laboratories and/or measurements taken at various times. This inevitably leads to batch effects, i.e., systematic variations in the data that can arise from different technology platforms, reagent lots, or handling personnel. Such technical differences confound the biological variation of interest and need to be corrected during data integration. Data integration is challenging because biological and technical factors overlap, which makes it difficult to distinguish their individual contributions to the overall observed effect. Moreover, the choice of integration method may affect downstream analyses, including the search for differentially expressed genes. Of the existing data integration methods, we selected only those that return the full expression matrix. We evaluated six methods in terms of their influence on the performance of differential gene expression analysis in two single-cell datasets with the same biological study design that differ only in how the measurements were made: one dataset shows strong batch effects because each sample was measured at a different time. Integrated data were visualized using UMAP. The evaluation was done both at the individual gene level, using parametric and non-parametric approaches for finding differentially expressed genes, and at the gene set level, using gene set enrichment analysis. As evaluation metrics, we used two correlation coefficients, Pearson and Spearman, of the test statistics obtained in the reference, test, and corrected studies. Visual comparison of UMAP plots highlighted ComBat-seq, limma, and MNN, which reduced batch effects and preserved differences between biological conditions. Most of the tested methods changed the data distribution after integration, which negatively impacts the use of parametric methods for the analysis.
Two algorithms, MNN and Scanorama, gave very poor results in differential analysis at both the gene and gene set levels. Finally, we highlight ComBat-seq, which led to the highest correlation of test statistics between the reference and corrected datasets among the methods tested. Moreover, it does not distort the original distribution of gene expression data, so it can be used in all types of downstream analyses.
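The evaluation metric described above compares per-gene test statistics from two analyses. The following is a minimal NumPy sketch, not the authors' code. It assumes the statistics (for example, t-statistics) have already been computed and aligned by gene, and the function names are hypothetical. Spearman's coefficient is computed as Pearson's coefficient on average ranks, so tied statistics share a rank.

```python
import numpy as np


def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    # Replace the ranks of tied values by the mean rank of their tie group.
    _, inverse = np.unique(x, return_inverse=True)
    sums = np.bincount(inverse, weights=ranks)
    counts = np.bincount(inverse)
    return (sums / counts)[inverse]


def compare_test_statistics(stats_ref, stats_other):
    """Pearson and Spearman correlation between two per-gene statistic vectors."""
    a = np.asarray(stats_ref, dtype=float)
    b = np.asarray(stats_other, dtype=float)
    pearson = np.corrcoef(a, b)[0, 1]
    spearman = np.corrcoef(average_ranks(a), average_ranks(b))[0, 1]
    return pearson, spearman
```

Reporting both coefficients is informative. Pearson's coefficient is sensitive to distortions of the statistics' scale, whereas Spearman's coefficient only checks whether genes keep their ordering.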
MIT Licensehttps://opensource.org/licenses/MIT
Contains loom files and preprocessed AnnData (adata) objects for comparing methods for temporal gene expression integration. Loom files can be read using the 'read' function in scVelo; preprocessed adata objects can be read using the 'read_h5ad' function in Scanpy.
The raw single-cell RNA sequencing datasets can be found under the following accession codes.
Mouse embryonic cell cycle dataset from Ref. (https://doi.org/10.1038/nbt.3102) was originally downloaded from ArrayExpress with the accession code E-MTAB-2805.
Hematopoiesis differentiation dataset from Ref. (https://doi.org/10.1182/blood-2016-05-716480) was originally downloaded from the Gene Expression Omnibus with the accession code GSE81682.
NKT cell differentiation dataset from Ref. (https://doi.org/10.1038/ni.3437) was originally downloaded from the Gene Expression Omnibus with the accession code GSE74596.
Hematopoiesis differentiation dataset from Ref. (https://doi.org/10.1038/nature19348) was originally downloaded from the Gene Expression Omnibus with the accession codes GSE70236, GSE70240, and GSE70244.
LPS stimulation dataset from Ref. (https://doi.org/10.1016/j.cels.2017.03.010) was originally downloaded from the Gene Expression Omnibus with the accession code GSE94383.
IFN-gamma stimulation dataset from Ref. (https://doi.org/10.1038/s41587-020-00803-5) was originally downloaded from the Gene Expression Omnibus with the accession code GSE161465.
AML chemotherapy dataset from Ref. (https://doi.org/10.1038/s41591-018-0233-1) was originally downloaded from the Gene Expression Omnibus with the accession code GSE116481.
AML diagnosis/relapse dataset from Ref. (https://doi.org/10.1038/s41375-021-01338-7) was originally downloaded from the Gene Expression Omnibus with the accession code GSE126068.
MS case-control PBMC and CSF datasets from Ref. (https://doi.org/10.1038/s41467-019-14118-w) were originally downloaded from the Gene Expression Omnibus with the accession code GSE138266.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
Additional file 4. Table S3: Skin scRNA-seq datasets selected for atlas construction.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
It is a major challenge to integrate single-cell sequencing data across experiments, conditions, batches, timepoints and other technical considerations. New computational methods are required that can integrate samples while simultaneously preserving biological information. Here, we propose an unsupervised reference-free data representation, Cluster Similarity Spectrum (CSS), where each cell is represented by its similarities to clusters independently identified across samples. We show that CSS can be used to assess cellular heterogeneity and enable reconstruction of differentiation trajectories from cerebral organoid and other single-cell transcriptomic data, and to integrate data across experimental conditions and human individuals.
The data set presented here includes: 1) the Seurat object of the published scRNA-seq data of two-month-old human cerebral organoids (Kanton et al. 2019, Nature); 2) single-cell RNA-seq data of cerebral organoids generated by inDrop; and 3) newly generated single-cell RNA-seq data of cerebral organoids with and without fixation.
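The CSS representation described above can be illustrated with a minimal NumPy sketch. Each cell is represented by its similarities to the average profiles of clusters found separately in each sample. This is an illustration under stated assumptions, not the published implementation. The cluster labels are taken as given (computed per sample beforehand), Pearson correlation is used as the similarity, and each cell's similarities are z-transformed within each sample's block. The published method may differ in its choice of correlation and normalization. All function names are hypothetical.

```python
import numpy as np


def _row_correlation(a, b):
    """Pearson correlation between every row of a and every row of b."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-12)
    return a @ b.T


def cluster_similarity_spectrum(x, sample_ids, cluster_ids):
    """x: cells x genes; sample_ids, cluster_ids: one label per cell.

    Clusters are assumed to have been computed separately within each sample.
    """
    x = np.asarray(x, dtype=float)
    sample_ids = np.asarray(sample_ids)
    cluster_ids = np.asarray(cluster_ids)
    blocks = []
    for sample in np.unique(sample_ids):
        in_sample = sample_ids == sample
        clusters = np.unique(cluster_ids[in_sample])
        # Average expression profile of each cluster found in this sample.
        centroids = np.stack(
            [x[in_sample & (cluster_ids == c)].mean(axis=0) for c in clusters]
        )
        # Similarity of every cell (from any sample) to these cluster profiles.
        sim = _row_correlation(x, centroids)
        # z-transform each cell's similarities within this sample's block.
        sd = sim.std(axis=1, keepdims=True)
        sd[sd == 0] = 1.0
        blocks.append((sim - sim.mean(axis=1, keepdims=True)) / sd)
    return np.hstack(blocks)
```

The resulting matrix has one column per (sample, cluster) pair. Because every cell is compared with the same set of cluster profiles, sample-specific technical offsets in raw expression contribute less to the representation.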
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
Additional file 2. Table S1: Cellxgene datasets used for annotation accuracy evaluation.
There is a growing need to integrate “Big Data” into undergraduate biology curricula. Transcriptomics is one venue for examining biology from an informatics perspective. RNA sequencing has largely replaced microarrays for whole-genome gene expression studies. Recently, single-cell RNA sequencing (scRNAseq) has unmasked population heterogeneity, offering unprecedented views into the inner workings of individual cells. scRNAseq is transforming our understanding of development, cellular identity, cell function, and disease. As a form of ‘Big Data’, scRNAseq can be intimidating for students to conceptualize and analyze, yet it plays an increasingly important role in modern biology. To address these challenges, we created an engaging case study that guides students through an exploration of scRNAseq technologies. Students work in groups to explore external resources, manipulate authentic data, and experience how single-cell transcriptomics can be used for personalized cancer treatment. This five-part case study is intended for upper-level life science majors and graduate students in genetics, bioinformatics, molecular biology, cell biology, biochemistry, biology, and medical genomics courses. The case modules can be completed sequentially, or individual parts can be adapted separately. The first module can also be used as a stand-alone exercise in an introductory biology course. Students need intermediate mastery of Microsoft Excel but do not need programming skills. Assessment includes both students’ self-assessment of their learning, since answers to earlier questions are needed to progress through the case study, and instructor assessment of final answers. This case provides a practical exercise in using high-throughput data analysis to explore the molecular basis of cancer at the level of single cells.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
We propose DCCA for accurately dissecting cellular heterogeneity in joint-profiling multi-omics data from the same individual cells, by transferring representations between the omics modalities.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
scIB-E is a comprehensive deep learning-based benchmarking framework for evaluating single-cell RNA sequencing (scRNA-seq) data integration methods.
Unified Benchmarking Framework:
Refined Metrics for Intra-cell-type Variation:
Novel Loss Function:
The preprocessed datasets are available at src/data.
A significant challenge in biomedicine is developing methods to integrate the multitude of dispersed data sets into comprehensive frameworks that can be used to make optimal clinical decisions. Recent technological advances in single-cell analysis allow high-dimensional molecular characterization of cells and populations, but to date few mathematical models have attempted to integrate measurements from the single-cell scale with other data types. Here, we present a framework that actionizes static outputs from a machine learning model and leverages them as measurements of state variables in a dynamic mechanistic model of treatment response. We apply this framework to breast cancer cells to integrate single-cell transcriptomic data with longitudinal population-size data. We demonstrate that explicitly including the transcriptomic information in parameter estimation is critical for identifying the model parameters and enables accurate prediction of responses to new treatment regimens. Including the transcriptomic data improves the accuracy of predicted responses to new treatments, with a concordance correlation coefficient (CCC) of 0.89, compared with CCC = 0.79 when the single-cell RNA sequencing (scRNA-seq) data are not integrated directly into model calibration. To the best of our knowledge, this is the first work that explicitly integrates single-cell, clonally resolved transcriptome datasets with longitudinal treatment response data in a mechanistic mathematical model of drug resistance dynamics. We anticipate this approach to be a first step toward demonstrating the feasibility of incorporating multimodal data sets into identifiable mathematical models to develop optimized treatment regimens from data.
Single-cell RNA-seq of the MDA-MB-231 cell line with chemotherapy treatment
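The concordance correlation coefficient (CCC) used above to score predictions is Lin's coefficient, $\rho_c = 2 s_{xy} / (s_x^2 + s_y^2 + (\bar{x} - \bar{y})^2)$. A minimal NumPy sketch using population (1/n) moments follows. The function name is hypothetical, and the authors' exact computation may differ, for example by using sample rather than population moments.

```python
import numpy as np


def concordance_correlation(y_true, y_pred):
    """Lin's concordance correlation coefficient (population moments)."""
    x = np.asarray(y_true, dtype=float)
    y = np.asarray(y_pred, dtype=float)
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))
    return 2 * cov / (x.var() + y.var() + (mx - my) ** 2)
```

Unlike Pearson's correlation, CCC penalizes systematic offsets. For observed values `[1, 2, 3]` and predictions `[2, 3, 4]`, Pearson's r is 1, but CCC is 4/7 (about 0.57), because every prediction is off by one.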
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
Single-cell RNA-sequencing dataset of peripheral blood mononuclear cells (PBMCs: T, B, NK cells and monocytes) from two healthy donors.
Cells labeled C26 come from a 30-year-old female and cells labeled C27 from a 53-year-old male. Cells were isolated from blood using Ficoll. Samples were sequenced using the standard 10x Genomics 3' v3 chemistry protocol. Cell Ranger v4.0.0 was used for processing, and reads were aligned to the Ensembl GRCh38 human genome (GRCg38_r98-ensembl_Sept2019). QC metrics were calculated on the count matrix generated by Cell Ranger (filtered_feature_bc_matrix). Cells with fewer than 3 genes, fewer than 500 reads, or more than 20% of counts from mitochondrial genes were discarded.
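The QC thresholds above can be expressed as a simple per-cell mask. The following NumPy sketch is hypothetical and is not the authors' Seurat code. It assumes a dense cells x genes count matrix and identifies mitochondrial genes by the "MT-" symbol prefix used for human genes. It also treats total counts per cell as the "reads" criterion, because a Cell Ranger count matrix stores UMI counts rather than raw reads.

```python
import numpy as np


def qc_keep_mask(counts, gene_names, min_genes=3, min_counts=500,
                 max_pct_mito=20.0, mito_prefix="MT-"):
    """Boolean mask of cells passing the thresholds (cells x genes input)."""
    counts = np.asarray(counts, dtype=float)
    n_genes = (counts > 0).sum(axis=1)      # genes detected per cell
    total = counts.sum(axis=1)              # total counts per cell
    prefix = mito_prefix.upper()
    is_mito = np.array([g.upper().startswith(prefix) for g in gene_names])
    # Percentage of each cell's counts that come from mitochondrial genes.
    pct_mito = 100.0 * counts[:, is_mito].sum(axis=1) / np.maximum(total, 1.0)
    return (
        (n_genes >= min_genes)
        & (total >= min_counts)
        & (pct_mito <= max_pct_mito)
    )
```

A cell is kept only if it passes all three thresholds, so failing any single criterion is enough to discard it.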
The processing steps were performed with the R package Seurat (https://satijalab.org/seurat/), including sample integration, data normalisation and scaling, dimensionality reduction, and clustering. The SCTransform method was used for the normalisation and scaling steps. The clustered cells were manually annotated using known cell-type markers.
Files content:
- raw_dataset.csv: raw gene counts
- normalized_dataset.csv: normalized gene counts (single cell matrix)
- cell_types.csv: cell types identified from annotated cell clusters
- cell_types_macro.csv: cell macro types
- UMAP_coordinates.csv: 2D cell coordinates computed with the UMAP algorithm in Seurat
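The listed files are separate tables that share the cells as a common key. The following standard-library sketch joins the UMAP coordinates with the cell-type labels. It assumes a layout that the description does not state: each CSV has a header row, cell barcodes are in the first column (as when R writes row names), and the UMAP file has the two coordinates in its next two columns. The function names are hypothetical.

```python
import csv
import io


def read_keyed_csv(handle):
    """Read a CSV whose first column holds cell barcodes; skip the header row."""
    reader = csv.reader(handle)
    next(reader)  # header row
    return {row[0]: row[1:] for row in reader if row}


def annotate_umap(umap_handle, types_handle):
    """Join 2D UMAP coordinates with cell-type labels on the cell barcode."""
    coords = read_keyed_csv(umap_handle)
    types = read_keyed_csv(types_handle)
    return {
        cell: (float(xy[0]), float(xy[1]), types[cell][0])
        for cell, xy in coords.items()
        if cell in types
    }


# Tiny in-memory example; with real files, use open(path, newline="").
umap_csv = io.StringIO('"",UMAP_1,UMAP_2\nAAAC-1,0.5,-1.2\nAAAG-1,2.0,3.1\n')
types_csv = io.StringIO('"",cell_type\nAAAC-1,B cell\nAAAG-1,NK\n')
annotated = annotate_umap(umap_csv, types_csv)
```

Cells missing from either file are dropped by the join rather than raising an error.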
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
Additional file 3: Supplementary Table S3. Detailed comparison of multiple single-cell RNA-seq data visualization software.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
This data is used for the Seurat version of the batch correction and integration tutorial on the Galaxy Training Network. The input data was provided by Seurat in the 'Integrative Analysis in Seurat v5' tutorial. The input dataset provided here has been filtered to include only cells for which nFeature_RNA > 1000. The other datasets were produced on Galaxy. The original dataset was published as: Ding, J., Adiconis, X., Simmons, S.K. et al. Systematic comparison of single-cell and single-nucleus RNA-sequencing methods. Nat Biotechnol 38, 737–746 (2020). https://doi.org/10.1038/s41587-020-0465-8.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
Spatial transcriptomics and scRNA-seq datasets used with SIRV for integration and for predicting unspliced and spliced expression of spatially measured genes, from which RNA velocity is inferred in the spatial context.
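Once unspliced (u) and spliced (s) expression are available for each gene, RNA velocity can be estimated. The following NumPy sketch shows the basic steady-state model popularized by velocyto: per gene, a degradation ratio γ is fitted by least squares through the origin on cells at the extremes of spliced expression, and velocity is v = u − γs. This is an illustration of that generic model only. It is not the SIRV procedure and not necessarily the velocity method used downstream in this study. The function name and the `quantile` default are assumptions.

```python
import numpy as np


def steady_state_velocity(unspliced, spliced, quantile=0.05):
    """Steady-state RNA velocity, v = u - gamma * s, per cell and gene.

    unspliced, spliced: cells x genes matrices (e.g. normalized counts).
    gamma is fitted per gene by least squares through the origin, using only
    cells in the lower and upper `quantile` tails of spliced expression.
    """
    u = np.asarray(unspliced, dtype=float)
    s = np.asarray(spliced, dtype=float)
    gamma = np.zeros(s.shape[1])
    for g in range(s.shape[1]):
        lo, hi = np.quantile(s[:, g], [quantile, 1 - quantile])
        tails = (s[:, g] <= lo) | (s[:, g] >= hi)
        denom = np.sum(s[tails, g] ** 2)
        if denom > 0:
            gamma[g] = np.sum(u[tails, g] * s[tails, g]) / denom
    return u - gamma * s, gamma
```

Positive velocity means a gene has more unspliced transcript than the steady-state ratio predicts, which suggests it is being upregulated in that cell.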
This dataset accompanies the study titled "Intestinal inflammation promotes gut commensal-specific CD4 T cell to initiate molecular mimicry-mediated neuroinflammation" and contains: processed bulk T cell receptor sequencing data and scripts for clonotype assembly and overlap analysis using MiXCR; and processed single-cell RNA-sequencing data, reference annotations, code for preprocessing, integration, and analysis, and associated output tables.
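Clonotype overlap analysis asks how many clonotypes two repertoires share. The following plain-Python sketch computes the shared set and the Jaccard index. It is not MiXCR's overlap tooling, which has its own output formats and metrics. It assumes, hypothetically, that clonotypes have already been reduced to comparable keys such as CDR3 amino-acid sequences.

```python
def clonotype_overlap(repertoire_a, repertoire_b):
    """Shared clonotypes and Jaccard index between two repertoires."""
    a, b = set(repertoire_a), set(repertoire_b)
    shared = a & b
    union = a | b
    # Jaccard index: shared clonotypes over all distinct clonotypes.
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard
```

The Jaccard index ignores clone sizes. Abundance-weighted indices are a common alternative when expanded clones matter more than rare ones.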
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
Cells from a 3D human cell-based model comprising tumor cell line-derived spheroids, cancer-associated fibroblasts, and primary monocytes were dissociated into single cells and analyzed using scRNAseq. Four monocyte donors were used in the 3D model, and three monocyte donors were used for 2D differentiation of macrophages.
Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
Single-cell transcriptomics promises to revolutionize our understanding of the vasculature. Emerging computational methods applied to high-dimensional single-cell data allow integration of results between samples and species, and illuminate the diversity and underlying developmental and architectural organization of cell populations. Here, we illustrate these methods in an analysis of mouse lymph node (LN) lymphatic endothelial cells (LEC) at single-cell resolution. Clustering identifies five well-delineated subsets, including two medullary sinus subsets not previously recognized as distinct. Nearest-neighbor alignments in trajectory space position the major subsets in a sequence that recapitulates known features of LN lymphatic organization and suggests novel ones, providing a transcriptional map of the lymphatic endothelial niches and of the transitions between them. Differences in gene expression reveal specialized programs for (1) subcapsular ceiling endothelial interactions with the capsule connective tissue and cells, (2) subcapsular floor regulation of lymph-borne cell entry into the LN parenchyma and antigen presentation, and (3) medullary subset specialization for pathogen interactions and LN remodeling. LEC of the subcapsular sinus floor and medulla, which represent major sites of cell entry into and exit from the LN parenchyma, respectively, respond robustly to oxazolone inflammation challenge with enriched signaling pathways that converge on both innate and adaptive immune responses. Integration of mouse and human single-cell profiles reveals a conserved cross-species pattern of lymphatic vascular niches and gene expression, as well as specialized human subsets and genes unique to each species. The examples provided demonstrate the power of single-cell analysis in elucidating endothelial cell heterogeneity, vascular organization, and endothelial cell responses.
We discuss the findings from the perspective of LEC functions in relation to niche formations in the unique stromal and highly immunological environment of the LN.
https://www.datainsightsmarket.com/privacy-policy
The single-cell RNA sequencing (scRNA-seq) technology market is experiencing robust growth, projected to reach a significant market size driven by advancements in technology and increasing applications across diverse fields. The market's Compound Annual Growth Rate (CAGR) of 10.2% from 2019 to 2024, coupled with a 2025 market size of $144 million, indicates strong future potential. This growth is fueled by the technology's ability to provide unprecedented insights into cellular heterogeneity and gene expression at a single-cell level, revolutionizing biological research and clinical diagnostics. Key drivers include the rising adoption of scRNA-seq in oncology for identifying cancer subtypes and developing personalized therapies, immunology for understanding immune cell responses, and neuroscience for dissecting complex brain functions. Furthermore, ongoing technological advancements, such as the development of more efficient and cost-effective platforms, are expanding the accessibility and affordability of scRNA-seq, further fueling market expansion. The market's competitive landscape is characterized by a mix of established players like Illumina, Thermo Fisher Scientific, and 10x Genomics, along with emerging companies like Dolomite Bio and Pacific Biosciences, which are driving innovation and expanding applications. Looking ahead to 2033, the continued high CAGR suggests a substantial market expansion. The increasing demand for high-throughput scRNA-seq platforms, combined with the growing integration of bioinformatics and data analysis tools, will be crucial drivers. Challenges like data analysis complexity and the high cost of assays might somewhat restrain growth, but ongoing technological advancements are expected to mitigate these hurdles. 
The market segmentation, while not explicitly provided, is likely to be diverse, based on technology (e.g., microfluidic, plate-based), application (e.g., oncology, immunology, neuroscience), and end-user (e.g., academic research, pharmaceutical companies, clinical labs). Regional market share distribution will likely show a significant contribution from North America and Europe initially, followed by increasing adoption in Asia-Pacific and other emerging regions.
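The CAGR quoted above compounds multiplicatively: a value V growing at annual rate r for n years becomes V(1 + r)^n. Conversely, the rate implied by a start value and an end value is (end/start)^(1/n) − 1. The following minimal Python sketch uses hypothetical function names. Note that the source states the 10.2% rate for 2019 to 2024. Applying it to the 2025 figure to project toward 2033 therefore assumes that the rate persists, which the report implies but does not quantify.

```python
def project_value(start_value, annual_rate, years):
    """Value after compounding at a constant annual growth rate."""
    return start_value * (1.0 + annual_rate) ** years


def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by a start and end value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Illustrative only: extrapolates the 2025 estimate (in $ million) to 2033,
# under the assumption that the 10.2% annual rate continues unchanged.
illustrative_2033 = project_value(144.0, 0.102, 2033 - 2025)
```

Because growth compounds, eight years at 10.2% more than doubles the starting value rather than adding 8 × 10.2%.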
10x multiomic scRNA-seq of peripheral blood monocytes from MCG016