CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. 
We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
scIB-E is a comprehensive deep learning-based benchmarking framework for evaluating single-cell RNA sequencing (scRNA-seq) data integration methods.
Unified Benchmarking Framework:
Refined Metrics for Intra-cell-type Variation:
Novel Loss Function:
The preprocessed datasets are available at src/data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
It is a major challenge to integrate single-cell sequencing data across experiments, conditions, batches, timepoints and other technical considerations. New computational methods are required that can integrate samples while simultaneously preserving biological information. Here, we propose an unsupervised reference-free data representation, Cluster Similarity Spectrum (CSS), where each cell is represented by its similarities to clusters independently identified across samples. We show that CSS can be used to assess cellular heterogeneity and enable reconstruction of differentiation trajectories from cerebral organoid and other single-cell transcriptomic data, and to integrate data across experimental conditions and human individuals.
The data set presented here includes: 1) the Seurat object of the published two-month-old human cerebral organoid scRNA-seq data (Kanton et al. 2019, Nature); 2) cerebral organoid single-cell RNA-seq data generated by inDrop; 3) newly generated single-cell RNA-seq data of cerebral organoids with and without fixation.
MIT License https://opensource.org/licenses/MIT
Contains loom files and preprocessed AnnData (adata) objects for comparing methods for temporal gene expression integration. Loom files can be read with the 'read' function in scVelo. Preprocessed AnnData objects can be read with the 'read_h5ad' function in Scanpy.
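As a minimal sketch of the access pattern described above, the helper below picks a reader from the file extension. The names pick_reader and load_dataset are hypothetical and not part of this repository; scvelo.read and scanpy.read_h5ad are the functions named in the description, and both packages are assumed to be installed when load_dataset is called.

```python
import importlib
from pathlib import Path

# File suffix -> (package, reader function), as named in the description.
READERS = {
    ".loom": ("scvelo", "read"),
    ".h5ad": ("scanpy", "read_h5ad"),
}


def pick_reader(path):
    """Return the (package, function) pair for a .loom or .h5ad file."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"unsupported file type: {suffix!r}")
    return READERS[suffix]


def load_dataset(path):
    """Load a file into an AnnData object.

    The package is imported lazily, so only the reader that is
    actually needed has to be installed.
    """
    package, function = pick_reader(path)
    reader = getattr(importlib.import_module(package), function)
    return reader(path)
```

Calling load_dataset on a path ending in .loom or .h5ad would return an AnnData object in both cases, so downstream comparison code can treat the two file types the same way.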
The raw single-cell RNA sequencing datasets can be found under the following accession codes.
Mouse embryonic cell cycle dataset from Ref. (https://doi.org/10.1038/nbt.3102) was originally downloaded from ArrayExpress with the accession code E-MTAB-2805.
Hematopoiesis differentiation dataset from Ref. (https://doi.org/10.1182/blood-2016-05-716480) was originally downloaded from the Gene Expression Omnibus with the accession code GSE81682.
NKT cell differentiation dataset from Ref. (https://doi.org/10.1038/ni.3437) was originally downloaded from the Gene Expression Omnibus with the accession code GSE74596.
Hematopoiesis differentiation dataset from Ref. (https://doi.org/10.1038/nature19348) was originally downloaded from the Gene Expression Omnibus with the accession codes GSE70236, GSE70240, GSE70244.
LPS stimulation dataset from Ref. (https://doi.org/10.1016/j.cels.2017.03.010) was originally downloaded from the Gene Expression Omnibus with the accession code GSE94383.
IFN-gamma stimulation dataset from Ref. (https://doi.org/10.1038/s41587-020-00803-5) was originally downloaded from the Gene Expression Omnibus with the accession code GSE161465.
AML chemotherapy dataset from Ref. (https://doi.org/10.1038/s41591-018-0233-1) was originally downloaded from the Gene Expression Omnibus with the accession code GSE116481.
AML diagnosis/relapse dataset from Ref. (https://doi.org/10.1038/s41375-021-01338-7) was originally downloaded from the Gene Expression Omnibus with the accession code GSE126068.
MS case control PBMC and CSF datasets from Ref. (https://doi.org/10.1038/s41467-019-14118-w) were originally downloaded from the Gene Expression Omnibus with the accession code GSE138266.
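The accession codes listed above can be turned into landing-page links with a small lookup. This is only a sketch: the short dataset labels are shorthand for the entries above, the function name accession_url is hypothetical, and the URL templates reflect the public GEO and ArrayExpress (BioStudies) address patterns as an assumption that may change over time.

```python
# Assumed URL templates for the two repositories named in the list above.
GEO_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={acc}"
ARRAYEXPRESS_URL = "https://www.ebi.ac.uk/biostudies/arrayexpress/studies/{acc}"

# Shorthand label -> accession codes, copied from the list above.
ACCESSIONS = {
    "Mouse embryonic cell cycle": ["E-MTAB-2805"],
    "Hematopoiesis differentiation (blood-2016-05-716480)": ["GSE81682"],
    "NKT cell differentiation": ["GSE74596"],
    "Hematopoiesis differentiation (nature19348)": [
        "GSE70236", "GSE70240", "GSE70244",
    ],
    "LPS stimulation": ["GSE94383"],
    "IFN-gamma stimulation": ["GSE161465"],
    "AML chemotherapy": ["GSE116481"],
    "AML diagnosis/relapse": ["GSE126068"],
    "MS case control PBMC and CSF": ["GSE138266"],
}


def accession_url(accession):
    """Build a landing-page URL from a GEO series or ArrayExpress accession."""
    if accession.startswith("GSE"):
        return GEO_URL.format(acc=accession)
    if accession.startswith("E-"):
        return ARRAYEXPRESS_URL.format(acc=accession)
    raise ValueError(f"unrecognized accession format: {accession!r}")


# Flat list of (label, accession, url) rows for printing or export.
ROWS = [
    (label, acc, accession_url(acc))
    for label, codes in ACCESSIONS.items()
    for acc in codes
]
```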
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Additional file 2: Table S2. Summary of datasets used in the study.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Additional file 1: Supplementary Table S1. Detailed comparison of multiple single-cell RNA-seq data processing workflows.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This repository stores the data used to generate the hepatocellular carcinoma analyses in the paper presenting SeuratIntegrate. It also contains the scripts to reproduce Figure 1 of the article.
To fully reproduce the results from the paper, one should:
remotes::install_local("path/to/SeuratIntegrate_0.4.0.tar.gz")
conda create -n SeuratIntegrate_bbknn --file SeuratIntegrate_bbknn_package-list.txt
conda create -n SeuratIntegrate_scanorama --file SeuratIntegrate_scanorama_package-list.txt
library(SeuratIntegrate)
UpdateEnvCache("bbknn", conda.env = "SeuratIntegrate_bbknn", conda.env.is.path = FALSE)
UpdateEnvCache("scanorama", conda.env = "SeuratIntegrate_scanorama", conda.env.is.path = FALSE)
Once done, the file integrate.R should produce reproducible results. Note that lines 3 to 6 from integrate.R should be adapted to the user's setup.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Quantifying or labeling sample types with high quality is a challenging task and a key step in understanding complex diseases. Reducing noise in the data and ensuring that the extracted intrinsic patterns agree with the primary data structure are important in sample clustering and classification. Here we propose an effective data integration framework named HCI (High-order Correlation Integration), which takes advantage of a high-order correlation matrix combined with pattern fusion analysis (PFA) to extract features from high-dimensional data. On the one hand, the high-order Pearson's correlation coefficient can highlight the latent patterns underlying noisy input datasets and thus improve the accuracy and robustness of the algorithms currently available for sample clustering. On the other hand, PFA can identify intrinsic sample patterns efficiently from different input matrices by optimally adjusting the signal effects. To validate the effectiveness of our new method, we first applied HCI to four single-cell RNA-seq datasets to distinguish cell types, and found that HCI identifies the previously known cell types of single-cell samples from scRNA-seq data with higher accuracy and robustness than other methods under different conditions. Second, we integrated heterogeneous omics data from TCGA and GEO datasets, including bulk RNA-seq data, and HCI outperformed the other methods at identifying distinct cancer subtypes. In an additional case study, we constructed the mRNA-miRNA regulatory network of colorectal cancer based on the feature weights estimated by HCI, where the differentially expressed mRNAs and miRNAs were significantly enriched in well-known functional sets of colorectal cancer, such as KEGG pathways and IPA disease annotations.
Together, these results indicate that HCI is flexible and broadly applicable to sample clustering with different types and organizations of RNA-seq data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Processed PBMC data for integration tutorial in https://github.com/rpmccordlab/SMILE.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Single-cell RNA-sequencing dataset of peripheral blood mononuclear cells (PBMCs: T, B, NK and monocytes) extracted from two healthy donors.
Cells labeled as C26 come from a 30-year-old female and cells labeled as C27 come from a 53-year-old male. Cells were isolated from blood using Ficoll. Samples were sequenced using standard 3' v3 chemistry protocols by 10x Genomics. Cell Ranger v4.0.0 was used for the processing, and reads were aligned to the Ensembl GRCh38 human genome (reference labeled GRCg38_r98-ensembl_Sept2019). QC metrics were calculated on the count matrix generated by Cell Ranger (filtered_feature_bc_matrix). Cells with fewer than 3 genes per cell, fewer than 500 reads per cell, or more than 20% mitochondrial genes were discarded.
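The QC thresholds stated above can be sketched as a simple per-cell filter. This is an illustration only, not the authors' pipeline: the CellQC record, its field names, and the example barcodes are hypothetical, and the sketch assumes a cell is discarded as soon as it fails any one of the three criteria.

```python
from dataclasses import dataclass

# Thresholds taken from the description above.
MIN_GENES = 3          # discard cells with fewer genes than this
MIN_READS = 500        # discard cells with fewer reads than this
MAX_PCT_MITO = 20.0    # discard cells above this mitochondrial percentage


@dataclass
class CellQC:
    """Per-cell QC metrics (hypothetical record layout)."""
    barcode: str
    n_genes: int
    n_reads: int
    pct_mito: float


def passes_qc(cell):
    """True if the cell survives all three filters."""
    return (
        cell.n_genes >= MIN_GENES
        and cell.n_reads >= MIN_READS
        and cell.pct_mito <= MAX_PCT_MITO
    )


def filter_cells(cells):
    """Keep only the cells that pass QC, preserving input order."""
    return [cell for cell in cells if passes_qc(cell)]


# Made-up example cells, one per failure mode plus one that passes.
example_cells = [
    CellQC("AAAC", n_genes=1200, n_reads=4000, pct_mito=5.0),   # kept
    CellQC("AAAG", n_genes=2, n_reads=900, pct_mito=3.0),       # too few genes
    CellQC("AAAT", n_genes=300, n_reads=450, pct_mito=4.0),     # too few reads
    CellQC("AACC", n_genes=800, n_reads=2500, pct_mito=35.0),   # high mito
]
kept = filter_cells(example_cells)
```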
The processing steps were performed with the R package Seurat (https://satijalab.org/seurat/), including sample integration, data normalisation and scaling, dimensional reduction, and clustering. The SCTransform method was used for the normalisation and scaling steps. The clustered cells were manually annotated using known cell type markers.
File contents:
- raw_dataset.csv: raw gene counts
- normalized_dataset.csv: normalized gene counts (single cell matrix)
- cell_types.csv: cell types identified from annotated cell clusters
- cell_types_macro.csv: cell macro types
- UMAP_coordinates.csv: 2d cell coordinates computed with UMAP algorithm in Seurat
Multi-omics datasets, including scRNA-seq, scATAC-seq, and CITE-seq, used for integration with BiCLUM.
This data collection contains spatially resolved single-cell transcriptomics datasets acquired using MERFISH on the developing human heart (13 PCW heart and 15 PCW ventricles) collected by a collaboration of the Chi Lab and the Center for Epigenomics at the University of California, San Diego.
The heart sections were imaged for 238 genes using MERFISH with a 22-bit binary code of Hamming distance 4 and Hamming weight 4. The 22 bits were imaged in 11 hybridization rounds with two-color imaging in each round. The human heart panel comprised 238 genes for MERFISH imaging and 20 genes for sequential, two-color FISH imaging following the MERFISH run.
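To make the code parameters above concrete, the sketch below checks whether a set of barcodes satisfies them: 22 bits, exactly four "on" bits per codeword, and a pairwise Hamming distance of at least 4. The example codewords are made up and are not the panel's actual codebook, and the bit-to-round layout in bit_to_round_and_color is an assumption for illustration. With constant weight 4, a distance of at least 4 means any two codewords share at most two "on" bits.

```python
from itertools import combinations

N_BITS = 22            # code length
WEIGHT = 4             # "on" bits per codeword
MIN_DISTANCE = 4       # minimum pairwise Hamming distance
ROUNDS = 11            # hybridization rounds
COLORS_PER_ROUND = 2   # two-color imaging: 11 * 2 = 22 bits


def from_on_bits(bits):
    """Build a codeword (as an int) from the positions of its "on" bits."""
    word = 0
    for bit in bits:
        word |= 1 << bit
    return word


def hamming_weight(word):
    """Number of "on" bits in a codeword."""
    return bin(word).count("1")


def hamming_distance(a, b):
    """Number of bit positions in which two codewords differ."""
    return bin(a ^ b).count("1")


def is_valid_codebook(words):
    """Check length, constant weight, and minimum pairwise distance."""
    for word in words:
        if word < 0 or word >> N_BITS or hamming_weight(word) != WEIGHT:
            return False
    return all(
        hamming_distance(a, b) >= MIN_DISTANCE
        for a, b in combinations(words, 2)
    )


def bit_to_round_and_color(bit):
    """Assumed layout: bits 0-1 in round 0, bits 2-3 in round 1, and so on."""
    return divmod(bit, COLORS_PER_ROUND)


# Made-up codewords: each pair shares exactly two "on" bits (distance 4).
good_codebook = [
    from_on_bits((0, 1, 2, 3)),
    from_on_bits((0, 1, 4, 5)),
    from_on_bits((2, 3, 4, 5)),
]
# Shares three "on" bits with the first codeword (distance 2), so invalid.
bad_codebook = good_codebook + [from_on_bits((0, 1, 2, 4))]
```

A minimum distance of 4 is what allows a single-bit readout error to be corrected and a two-bit error to be detected rather than decoded as the wrong gene.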
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This dataset repository corresponds to the project Unsupervised neural network for single cell Multi-omics INTegration (UMINT): An application to health and disease.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This data is used for the Seurat version of the batch correction and integration tutorial on the Galaxy Training Network.
The input data was provided by Seurat in the 'Integrative Analysis in Seurat v5' tutorial. The input dataset provided here has been filtered to include only cells for which nFeature_RNA > 1000. The other datasets were produced on Galaxy.
The original dataset was published as: Ding, J., Adiconis, X., Simmons, S.K. et al. Systematic comparison of single-cell and single-nucleus RNA-sequencing methods. Nat Biotechnol 38, 737–746 (2020). https://doi.org/10.1038/s41587-020-0465-8.
IgA nephropathy (IgAN), the most common primary mesangial proliferative glomerulonephritis (MsPGN), represents the main cause of renal failure, while the precise pathogenetic mechanisms have not been fully determined. In this study, we employed multi-module data integration and functional experiments to explore the pathogenic programs underlying IgAN progression. Protein profiling of 21 IgAN samples showing progression and 28 samples without progression revealed that the proteins CXCL12 and complement C3 and the macrophage markers MRC1 and CD163 were negatively correlated with estimated glomerular filtration rate (eGFR) and associated with poor prognosis (30% eGFR decline). Analysis of single-cell RNA-sequencing (scRNA-seq) data revealed that IgAN macrophages expressed high levels of CXCR4, PDGFB, TREM2, TNF, and complement C3, while Monocle pseudotime analysis suggested that these cells were derived from the differentiation of infiltrating blood monocytes. Cross-species intercellular crosstalk analysis in human IgAN and the ddY mouse IgAN model revealed that mesangial cells (MCs) in IgAN expressed high levels of CXCL12, CSF1 and PDGFRB and interacted with macrophages via the CXCL12-CXCR4, PDGFB-PDGFRB, and ITGAX/ITGAM-C3 axes. Interestingly, analysis of the anti-Thy1.1 MsPGN scRNA-seq atlas revealed that an inflammatory MC (iMC) phenotype expressing Pdgfrb, Cxcl12, Csf1, and Il34 was associated with the MsPGN injury process. Functional experiments revealed that specific blockade of the Cxcl12-Cxcr4 pathway significantly attenuated inflammatory injury, fibrosis, and decline of renal function in the MsPGN model. This study provides new insights into IgAN progression and may aid in the refinement of IgAN diagnosis and the optimization of treatment strategies.
Single-cell RNA-seq of OSNs from a SARS-CoV-2-infected hamster (100 pfu, 1 dpi)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Abstract:
The visual signal processing in the retina requires the precise organization of diverse neuronal types working in concert. While single-cell omics studies have identified more than 120 different neuronal subtypes in the mouse retina [1], little is known about their spatial organization. Here, we generated the first single-cell spatial atlas of the mouse retina using multiplexed error-robust fluorescence in situ hybridization (MERFISH). We profiled over 390,000 cells and identified all major cell types and nearly all subtypes through the integration with reference single-cell RNA sequencing (scRNA-seq) data. Our spatial atlas allowed simultaneous examination of nearly all cell subtypes in the retina, revealing 8 previously unknown displaced amacrine cell subtypes and establishing the first connection between the molecular classification of many cell subtypes and their spatial arrangement. Furthermore, we identified spatially dependent differential gene expression between subtypes, suggesting the possibility of functional tuning of neuronal types based on location.
Data description:
1. VZA105a_integrated_368genes.h5ad: This file contains the raw MERFISH count matrix for four samples with 368 gene features. The "sampleid" column represents the unique sample ID, while the "region" column corresponds to the tissue section ID. The "majorclass" and "subclass" columns indicate annotated retinal cell types. Finally, the "center_x" and "center_y" columns provide the coordinates of the cell centers.
2. VA45_integrated.h5ad: This file contains the raw MERFISH count matrix for six samples with 500 gene features. The "sampleid" column represents the unique sample ID, while the "region" column corresponds to the tissue section ID. The "majorclass" and "subclass" columns indicate annotated retinal cell types. Finally, the "center_x" and "center_y" columns provide the coordinates of the cell centers.
3. merfish_impute.h5ad: This file contains the imputed count matrix for ten samples. The "sampleid" column represents the unique sample ID, while the "region" column corresponds to the tissue section ID. The "majorclass" and "subclass" columns indicate annotated retinal cell types. Finally, the "center_x" and "center_y" columns provide the coordinates of the cell centers.
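The per-cell columns described above could be read and summarized roughly as follows. This is a hedged sketch: it assumes the listed columns live in the AnnData obs table, the helper names load_obs_records and count_by_class are hypothetical, the example records and class labels are made up, and the anndata package is only needed when load_obs_records is actually called.

```python
from collections import Counter, defaultdict

# Column names as given in the data description.
OBS_COLUMNS = (
    "sampleid", "region", "majorclass", "subclass", "center_x", "center_y",
)


def load_obs_records(path):
    """Read an .h5ad file and return its per-cell annotations as dicts.

    Assumes the columns above are stored in adata.obs.
    """
    import anndata  # imported lazily; third-party dependency

    adata = anndata.read_h5ad(path)
    return adata.obs[list(OBS_COLUMNS)].to_dict("records")


def count_by_class(records):
    """Count cells per major class within each tissue section (region)."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record["region"]][record["majorclass"]] += 1
    return {region: dict(classes) for region, classes in counts.items()}


# Made-up records with the same fields, to show the summary shape.
example_records = [
    {"sampleid": "s1", "region": "r0", "majorclass": "AC",
     "subclass": "AC_1", "center_x": 10.0, "center_y": 4.5},
    {"sampleid": "s1", "region": "r0", "majorclass": "AC",
     "subclass": "AC_2", "center_x": 12.5, "center_y": 6.0},
    {"sampleid": "s1", "region": "r1", "majorclass": "BC",
     "subclass": "BC_1", "center_x": 3.0, "center_y": 9.0},
]
summary = count_by_class(example_records)
```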
Single-cell RNA-seq of bone marrow monocytes from donor MCG002
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Data used in DIRECT-NET
Unpublished single cell RNAseq data from pan-GI integration study from healthy adult donors (20-70 years old; stomach, duodenum, ileum) and control samples from preterm infants (23-31 PCW; small intestine and colon). Details for sample processing can be found in the manuscript.