7 datasets found

Data from: Large-scale integration of single-cell transcriptomic data...
data.niaid.nih.gov
data-staging.niaid.nih.gov
+2 more
zip
Updated Dec 14, 2021
Cite
David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove (2021). Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration [Dataset]. http://doi.org/10.5061/dryad.t4b8gtj34
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.t4b8gtj34
Dataset updated
Dec 14, 2021
Dataset provided by
Cornell University
Authors
David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove
License
CC0 1.0: https://spdx.org/licenses/CC0-1.0.html
Description
Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. 
We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.

Methods

Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (Mus musculus) were obtained from The Jackson Laboratory (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute on Aging (NIA) Rodent Aging Colony and were used at 20 months of age. Female mice were used in all new scRNAseq experiments.

Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of old (20 months) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 and 40 μm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed through incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).
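The notexin dose per muscle follows directly from the stated injection volume and concentration. A minimal arithmetic sketch (the helper name is illustrative, not part of the original protocol):

```python
def injected_mass_ng(volume_ul: float, conc_ug_per_ml: float) -> float:
    """Mass of toxin delivered, in nanograms.

    1 µg/ml equals 1 ng/µl (numerator and denominator both scale by 1000),
    so mass (ng) = volume (µl) x concentration (ng/µl).
    """
    conc_ng_per_ul = conc_ug_per_ml  # unit identity: 1 µg/ml == 1 ng/µl
    return volume_ul * conc_ng_per_ul


# Values from the methods: 10 µl of 10 µg/ml notexin per TA muscle.
per_muscle_ng = injected_mass_ng(volume_ul=10, conc_ug_per_ml=10)
per_mouse_ng = 2 * per_muscle_ng  # both TA muscles were injected
print(f"{per_muscle_ng:g} ng per muscle, {per_mouse_ng:g} ng per mouse")
```

With these inputs each muscle receives 100 ng (0.1 µg) of notexin.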

Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 10⁶ cells/ml. Cell concentration was determined by manual counting with a hemocytometer. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes. Libraries were then sequenced on a NextSeq 500 (Illumina; San Diego, CA) using 75-cycle high-output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Estimated sequencing saturation and the number of reads per sample are reported in Sup. Data 1.
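A minimal sketch of the counting and resuspension arithmetic, assuming a standard Neubauer hemocytometer (each large square holds 0.1 µl) and a hypothetical 1:1 trypan blue dilution; the counts and volumes below are invented for illustration and are not from the study:

```python
from statistics import mean

# Standard Neubauer chamber (assumption): each large corner square is
# 1 mm x 1 mm with 0.1 mm depth = 0.1 µl = 1e-4 ml, so cells/ml = count x 1e4.
CELLS_PER_ML_PER_COUNT = 1e4


def cells_per_ml(square_counts, dilution_factor=1.0):
    """Concentration from hemocytometer counts averaged over several large squares."""
    return mean(square_counts) * dilution_factor * CELLS_PER_ML_PER_COUNT


def resuspension_volume_ml(total_cells, target_per_ml=1e6):
    """Volume of 0.04% BSA/PBS needed to reach the target concentration."""
    return total_cells / target_per_ml


# Hypothetical example: four squares counted after a 1:1 dilution,
# taken from a 1.5 ml suspension.
conc = cells_per_ml([30, 28, 34, 28], dilution_factor=2)
total = conc * 1.5
print(f"{conc:.3g} cells/ml; resuspend pellet in {resuspension_volume_ml(total):.2f} ml")
```

Here the suspension holds about 9 x 10⁵ cells in total, so resuspending the washed pellet in roughly 0.9 ml gives the 10⁶ cells/ml target.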

Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5-month-old) C57BL/6J mice were injected with 10 µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and fresh-frozen in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). The optimal tissue permeabilization time for 10 µm thick sections was determined to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E-stained tissue sections were imaged using a Zeiss PALM MicroBeam laser capture microdissection system, and the images were stitched and processed using Fiji (ImageJ). cDNA libraries were sequenced on an Illumina NextSeq 500 using 150-cycle high-output kits (Read 1 = 28 bp, Read 2 = 120 bp, Index 1 = 10 bp, and Index 2 = 10 bp). Frames around the capture area on the Visium slide were aligned manually, and spots covering the tissue were selected using Loupe Browser v4.0.0 (10x Genomics). Sequencing data were then aligned to the mouse reference genome (mm10) with the spaceranger v1.0.0 pipeline (10x Genomics) to generate a feature-by-spot-barcode expression matrix.

Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples that were available only as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).
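A sketch of how these three steps could be scripted from Python, assuming the tools are on PATH. The SRR accession, sample name and reference path are placeholders; the flags follow the tools' public documentation, but verify them against the versions you install. FASTQ files from fastq-dump are named like SRR0000000_1.fastq.gz, so they typically need renaming to the bcl2fastq convention that cellranger expects (e.g. sample1_S1_L001_R1_001.fastq.gz); that step, and checking which reads SRA stored for a given run, are not handled here.

```python
import shlex
from pathlib import Path

# Hypothetical reference location; replace with your own mm10 reference directory.
MM10_REFERENCE = Path("/path/to/mm10_reference")


def fastq_dump_cmd(srr_id: str, outdir: Path, threads: int = 4) -> list[str]:
    """parallel-fastq-dump invocation, following the options shown in its README."""
    return ["parallel-fastq-dump", "--sra-id", srr_id, "--threads", str(threads),
            "--outdir", str(outdir), "--split-files", "--gzip"]


def bamtofastq_cmd(bam: Path, outdir: Path) -> list[str]:
    """10x bamtofastq: positional BAM input followed by the output directory."""
    return ["bamtofastq", str(bam), str(outdir)]


def cellranger_count_cmd(sample: str, fastq_dir: Path,
                         transcriptome: Path = MM10_REFERENCE) -> list[str]:
    """cellranger count against an mm10 reference (v3.1.0 in the methods)."""
    return ["cellranger", "count", f"--id={sample}",
            f"--transcriptome={transcriptome}",
            f"--fastqs={fastq_dir}", f"--sample={sample}"]


if __name__ == "__main__":
    # Print the commands only; pass each list to subprocess.run(cmd, check=True) to execute.
    for cmd in (fastq_dump_cmd("SRR0000000", Path("fastq")),
                bamtofastq_cmd(Path("possorted_genome_bam.bam"), Path("bam_fastq")),
                cellranger_count_cmd("sample1", Path("fastq"))):
        print(shlex.join(cmd))
```

Building argument lists rather than shell strings avoids quoting problems when paths contain spaces.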

Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCont and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1,000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset individually. BCmvn optimization was used to select the pK parameter. Expected doublet rates were estimated by fitting a linear regression to the expected doublet rates published in the 10x Chromium handbook and evaluating it at the total number of cells remaining after quality filtering. Estimated homotypic doublet rates were also accounted for using the modelHomotypic function. The default pN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch correction independently with three tools: Harmony (github.com/immunogenomics/harmony) (v1.0), Scanorama (github.com/brianhie/scanorama) (v1.3), and BBKNN (github.com/Teichlab/bbknn) (v1.3.12). We then used Seurat to process the integrated data. After initial integration, we removed the noisy cluster and re-integrated the data using each of the three batch-correction tools.
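The filtering thresholds and the doublet-count adjustment can be sketched in plain Python. This mirrors the logic described above (and the nExp/homotypic adjustment shown in the DoubletFinder vignette) but is not the authors' code; the per-cell values, cluster labels and, in particular, the (cells, doublet rate) table are illustrative placeholders rather than the values in the 10x Chromium handbook:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CellQC:
    barcode: str
    n_features: int    # genes detected (Seurat nFeature_RNA)
    n_counts: int      # UMIs, i.e. "transcripts" (Seurat nCount_RNA)
    percent_mt: float  # % of UMIs from mitochondrial genes


def passes_qc(c: CellQC, min_features=750, min_counts=1000, max_mt=30.0) -> bool:
    """Keep a cell unless it has <750 features, <1000 UMIs, or >30% mitochondrial UMIs."""
    return (c.n_features >= min_features and c.n_counts >= min_counts
            and c.percent_mt <= max_mt)


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def homotypic_proportion(cluster_labels):
    """Sum of squared cluster frequencies (what DoubletFinder's modelHomotypic computes)."""
    n = len(cluster_labels)
    return sum((k / n) ** 2 for k in Counter(cluster_labels).values())


def expected_doublets(n_cells, doublet_rate, homotypic_prop):
    """Expected doublets, reduced by the homotypic share (as in the DoubletFinder vignette)."""
    n_exp = round(doublet_rate * n_cells)
    return round(n_exp * (1 - homotypic_prop))


# ILLUSTRATIVE (cells recovered, doublet rate) pairs, NOT the handbook's table.
table_cells = [1000, 2000, 4000]
table_rates = [0.008, 0.016, 0.032]
a, b = fit_line(table_cells, table_rates)

cells = [CellQC("AAAC-1", 2100, 5400, 4.2), CellQC("AAAG-1", 600, 900, 12.0),
         CellQC("AAAT-1", 1800, 4000, 41.0)]
kept = [c for c in cells if passes_qc(c)]

n_after_qc = 6000                       # hypothetical post-QC cell count
rate = a + b * n_after_qc               # doublet rate predicted by the linear fit
labels = ["A"] * 6 + ["B"] * 3 + ["C"]  # hypothetical cluster labels
print(len(kept), round(rate, 4),
      expected_doublets(n_after_qc, rate, homotypic_proportion(labels)))
```

The homotypic proportion is subtracted because doublets formed from two cells of the same cluster are largely invisible to DoubletFinder's artificial-neighbor approach.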

Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical genes (Fig. S3). Clusters which had similar canonical marker gene expression patterns were merged.
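Choosing the number of dimensions that account for 95% of the variance reduces to a cumulative-sum cutoff. A minimal sketch, assuming the per-dimension variances are available (for a Seurat PCA they could be taken as the squares of the values returned by Stdev()); the methods do not state whether "total variance" means all computed components or total gene variance, and this sketch assumes the former:

```python
from itertools import accumulate


def n_dims_for_variance(variances, threshold=0.95):
    """Smallest number of leading dimensions whose cumulative variance share reaches threshold.

    `variances` should be sorted in decreasing order, as PCA output is.
    """
    total = sum(variances)
    for k, cumulative in enumerate(accumulate(variances), start=1):
        if cumulative / total >= threshold:
            return k
    return len(variances)


print(n_dims_for_variance([50, 25, 10, 6, 4, 3, 2]))  # 5
```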

Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).
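One straightforward reading of "consensus" is unanimous agreement: a cell is kept only if all three integrations assign it the same label. The methods do not define the rule precisely, so treat this sketch (with invented barcodes and labels) as an assumption:

```python
def consensus_labels(harmony: dict, bbknn: dict, scanorama: dict) -> dict:
    """Keep only cells (barcode -> label) annotated identically by all three methods.

    Cells missing from any method, or labeled differently by any method, are dropped.
    """
    shared = harmony.keys() & bbknn.keys() & scanorama.keys()
    return {bc: harmony[bc] for bc in sorted(shared)
            if harmony[bc] == bbknn[bc] == scanorama[bc]}


harmony = {"c1": "MuSC", "c2": "FAP", "c3": "Endothelial"}
bbknn = {"c1": "MuSC", "c2": "FAP", "c3": "Pericyte"}
scanorama = {"c1": "MuSC", "c2": "FAP"}
print(consensus_labels(harmony, bbknn, scanorama))
```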

Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were relabeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myocytes” (bins 11-18). Culture-associated muscle stem cells were ignored, and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified by differential gene expression analysis in Seurat (FindAllMarkers with the Wilcoxon rank-sum test; results in Sup. Data 2). The resulting genes were filtered by average log2 fold change (avg_logFC > 1) and by the fraction of cells within the cluster that express each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each cell type, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using …
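The relabeling and reference-building steps described in this record can be sketched as follows. Several details are assumptions: the equal-width binning of PHATE 1 and the number of bins (20 here) are hypothetical, the FindAllMarkers column pct.1 is assumed to be what the methods call pct.expressed (Seurat v3 names the fold-change column avg_logFC), and the mt-/Rps/Rpl prefix filter is a simplification that also catches a few non-ribosomal genes such as Rps6ka1. The marker rows are invented:

```python
import re
from collections import defaultdict

# Bin ranges and labels from the methods; other bins get no myogenic label here.
BIN_LABELS = [((4, 5), "Quiescent MuSCs"), ((6, 7), "Activated MuSCs"),
              ((8, 10), "Committed Myoblasts"), ((11, 18), "Fusing Myocytes")]
# Simplified filter for mouse mitochondrial (mt-) and ribosomal protein (Rps/Rpl) symbols.
EXCLUDE = re.compile(r"^(mt-|Rp[sl])")


def phate_bin(value, lo, hi, n_bins=20):
    """1-based equal-width bin along PHATE 1 (hypothetical binning scheme)."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    idx = int((value - lo) / (hi - lo) * n_bins) + 1
    return min(max(idx, 1), n_bins)


def myogenic_label(bin_idx):
    """Map a bin index to its myogenic state label, or None outside bins 4-18."""
    for (first, last), label in BIN_LABELS:
        if first <= bin_idx <= last:
            return label
    return None


def select_genes(markers, min_logfc=1.0, min_pct=0.5):
    """Filter FindAllMarkers-style rows, drop mt/ribosomal genes, keep first occurrence."""
    genes = []
    for row in markers:
        gene = row["gene"]
        if (row["avg_logFC"] > min_logfc and row["pct.1"] > min_pct
                and not EXCLUDE.match(gene) and gene not in genes):
            genes.append(gene)
    return genes


def reference_profiles(counts, labels, genes):
    """Mean raw count per gene for each cell type; counts is {cell: {gene: count}}."""
    sums = defaultdict(lambda: [0.0] * len(genes))
    n_cells = defaultdict(int)
    for cell, cell_type in labels.items():
        n_cells[cell_type] += 1
        for i, gene in enumerate(genes):
            sums[cell_type][i] += counts.get(cell, {}).get(gene, 0)
    return {t: [s / n_cells[t] for s in vec] for t, vec in sums.items()}


markers = [
    {"gene": "Pax7", "avg_logFC": 2.1, "pct.1": 0.80},
    {"gene": "Myog", "avg_logFC": 0.6, "pct.1": 0.90},    # fails the fold-change cutoff
    {"gene": "mt-Co1", "avg_logFC": 1.5, "pct.1": 0.99},  # mitochondrial
    {"gene": "Rpl13", "avg_logFC": 1.2, "pct.1": 0.95},   # ribosomal protein
    {"gene": "Pax7", "avg_logFC": 1.4, "pct.1": 0.70},    # duplicate from another cluster
]
genes = select_genes(markers)
label = myogenic_label(phate_bin(0.23, 0.0, 1.0))
print(genes, label)
```

Non-myogenic cell types would enter reference_profiles with their existing annotations; only the myogenic lineage is relabeled by PHATE bin.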
Data Sheet 1_Celline: a flexible tool for one-step retrieval and integrative...
frontiersin.figshare.com
pdf
Updated Dec 11, 2025
+ more versions
Cite
Yuya Sato; Toru Asahi; Kosuke Kataoka (2025). Data Sheet 1_Celline: a flexible tool for one-step retrieval and integrative analysis of public single-cell RNA sequencing data.pdf [Dataset]. http://doi.org/10.3389/fbinf.2025.1684227.s002
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fbinf.2025.1684227.s002
Dataset updated
Dec 11, 2025
Dataset provided by
Frontiers
Authors
Yuya Sato; Toru Asahi; Kosuke Kataoka
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Single-cell RNA sequencing (scRNA-seq) has generated a rapidly expanding collection of public datasets that provide insight into development, disease, and therapy. However, researchers lack an end-to-end solution for seamlessly retrieving, preprocessing, integrating, and analyzing these data, because existing tools address only isolated steps and require manual handling of accessions, metadata, and technical variability (batch effects). In this study, we developed Celline, a Python package that executes an entire workflow using a single-line command per step. Celline automatically gathers raw single-cell RNA-seq data from multiple public repositories and extracts metadata using large language models. It then wraps established tools, including Scrublet for doublet removal, Seurat and Scanpy for quality control and cell-type annotation, Harmony and scVI for batch correction, and Slingshot for trajectory inference, into one-line commands, enabling seamless integrative analyses. To validate the quality of Celline-acquired data and the practical utility of the integrated framework, we applied it to two mouse brain cortex datasets from embryonic days 14.5 and 18. Technical validation demonstrated that Celline successfully retrieved data, standardized metadata, and enabled standard analyses that removed low-quality cells, annotated 11 major cell types, improved integration quality (scIB score +0.22), and completed trajectory analysis. Thus, Celline transforms scattered public scRNA-seq resources into unified, analysis-ready datasets with minimal effort. Its modular design allows pipeline extension, encourages community-driven advances, and accelerates discovery from single-cell data.
Table 1_Celline: a flexible tool for one-step retrieval and integrative...
frontiersin.figshare.com
xlsx
Updated Dec 11, 2025
Cite
Yuya Sato; Toru Asahi; Kosuke Kataoka (2025). Table 1_Celline: a flexible tool for one-step retrieval and integrative analysis of public single-cell RNA sequencing data.xlsx [Dataset]. http://doi.org/10.3389/fbinf.2025.1684227.s008
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/fbinf.2025.1684227.s008
Dataset updated
Dec 11, 2025
Dataset provided by
Frontiers
Authors
Yuya Sato; Toru Asahi; Kosuke Kataoka
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Single-cell RNA sequencing (scRNA-seq) has generated a rapidly expanding collection of public datasets that provide insight into development, disease, and therapy. However, researchers lack an end-to-end solution for seamlessly retrieving, preprocessing, integrating, and analyzing these data, because existing tools address only isolated steps and require manual handling of accessions, metadata, and technical variability (batch effects). In this study, we developed Celline, a Python package that executes an entire workflow using a single-line command per step. Celline automatically gathers raw single-cell RNA-seq data from multiple public repositories and extracts metadata using large language models. It then wraps established tools, including Scrublet for doublet removal, Seurat and Scanpy for quality control and cell-type annotation, Harmony and scVI for batch correction, and Slingshot for trajectory inference, into one-line commands, enabling seamless integrative analyses. To validate the quality of Celline-acquired data and the practical utility of the integrated framework, we applied it to two mouse brain cortex datasets from embryonic days 14.5 and 18. Technical validation demonstrated that Celline successfully retrieved data, standardized metadata, and enabled standard analyses that removed low-quality cells, annotated 11 major cell types, improved integration quality (scIB score +0.22), and completed trajectory analysis. Thus, Celline transforms scattered public scRNA-seq resources into unified, analysis-ready datasets with minimal effort. Its modular design allows pipeline extension, encourages community-driven advances, and accelerates discovery from single-cell data.
EPI-Clone supplementary dataset: Single cell RNA-seq of clonally barcoded...
figshare.com
application/gzip
Updated Nov 26, 2024
Cite
Lars Velten; Michael Scherer; Alejo Rodriguez-Fraticelli; Indranil Singh (2024). EPI-Clone supplementary dataset: Single cell RNA-seq of clonally barcoded hematopoietic progenitors [Dataset]. http://doi.org/10.6084/m9.figshare.24260743.v1
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.24260743.v1
Dataset updated
Nov 26, 2024
Dataset provided by
figshare (http://figshare.com/)
Authors
Lars Velten; Michael Scherer; Alejo Rodriguez-Fraticelli; Indranil Singh
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the dataset supporting the EPI-Clone manuscript: scRNA-seq profiling of hematopoietic stem and progenitor cells (HSPCs) was performed with 10x Genomics 3' profiling. Three experiments are included: two in which HSCs were clonally labeled with the LARRY system, transplanted into a recipient mouse and profiled 4-5 months later (post-transplant hematopoiesis), and one in which HSPCs were profiled directly from an unperturbed mouse. The dataset is a Seurat (v4) object with the following assays, reductions and metadata:

ASSAYS
AB: Antibody expression data
RNA: RNA expression profiles
integrated: Integration of DNA methylation data performed across experimental batches with two batch correction methods: CCA (https://satijalab.org/seurat/reference/runcca) and Harmony (https://portals.broadinstitute.org/harmony/articles/quickstart.html)

DIMENSIONALITY REDUCTION
pca_cca: PCA performed on the integrated data (CCA integration)
umap_cca: UMAP computed on the integrated data (CCA integration)
umap_harmony: UMAP computed on the integrated data (Harmony integration)

METADATA
Experiment: The experiment the cell comes from; values are "LARRY main experiment", "LARRY replicate" and "Native hematopoiesis"
ProcessingBatch: Experiments were processed in several batches
CellType: Cell type annotation
LARRY: Error-corrected LARRY barcode
percent.mt: Percentage of counts from mitochondrial genes
nCount_RNA: Read count for the RNA modality
nFeature_RNA: Number of genes with at least one read
nCount_AB: Read count for the surface protein modality
nFeature_AB: Number of antibodies with at least one read
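A hypothetical example of working with the per-cell metadata outside R: assuming the metadata table has been exported (for instance with write.csv(obj@meta.data, "metadata.csv") in R), cells can be grouped by their LARRY barcode to summarize clone composition. The rows below are invented, and treating "NA" or an empty field as a missing barcode is an assumption:

```python
import csv
import io
from collections import Counter, defaultdict


def clone_composition(rows, experiment=None):
    """Map each LARRY barcode to a Counter of cell types, optionally for one Experiment."""
    comp = defaultdict(Counter)
    for row in rows:
        if experiment is not None and row["Experiment"] != experiment:
            continue
        barcode = row["LARRY"].strip()
        if barcode in ("", "NA"):  # cells without a recovered clonal barcode
            continue
        comp[barcode][row["CellType"]] += 1
    return comp


# Hypothetical excerpt of an exported metadata table.
example = io.StringIO(
    "cell,Experiment,CellType,LARRY\n"
    "c1,LARRY main experiment,HSC,ACGT\n"
    "c2,LARRY main experiment,MPP,ACGT\n"
    "c3,LARRY main experiment,HSC,TTGA\n"
    "c4,Native hematopoiesis,HSC,NA\n"
)
comp = clone_composition(csv.DictReader(example), experiment="LARRY main experiment")
for barcode, types in sorted(comp.items()):
    print(barcode, dict(types))
```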
CellFuse enables Multi-modal Integration of Single-cell and Spatial...
zenodo.org
zip
Updated Dec 29, 2025
Cite
Abhishek Koladiya; Abhishek Koladiya (2025). CellFuse enables Multi-modal Integration of Single-cell and Spatial Proteomics Data for Systems-level Analysis in Cancer [Dataset]. http://doi.org/10.5281/zenodo.18088974
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.18088974
Dataset updated
Dec 29, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Abhishek Koladiya; Abhishek Koladiya
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains code and data to reproduce the figures of the CellFuse manuscript. To start, install the CellFuse package from https://github.com/karadavis-lab/CellFuse and then download this repository.

Fig 2 Bone marrow (Fig 2A, C, D, E, I, Supplementary Fig 1 and 2)

Fig2/BM/Reference/Fig2_BM_prepare_data.R: Prepare bone marrow data for CellFuse

Fig2/BM/BM_CellFuse_Integration.R: Run CellFuse

Fig2/BM/BM_Running_Benchmark_Methods.R: Run benchmarking methods (Harmony, Seurat, FastMNN)

Fig2/BM/BM_scVI_scnorama.ipynb: Run Scanorama and scVI

Fig2/BM/BM_scIB.ipynb: Evaluate methods using scIB and save results

Fig2/BM/BM_Data_visualisation.R: tSNE visualization

Fig2/BM/Sequential_Feature_drop/Prepare_data.R: Prepare data for evaluating sequential feature drop

Fig2/BM/Sequential_Feature_drop/Run_FastMNN_Seurat_Harmony.R: Run CellFuse, Harmony, Seurat and FastMNN for sequential feature drop

Fig2/BM/Sequential_Feature_drop/BM_scVI_scnorama_feature_drop.ipynb: Run scVI and Scanorama for sequential feature drop

Fig2/BM/Sequential_Feature_drop/BM_scIB_feature_drop.ipynb: Evaluate the feature-drop results using scIB and save results

Fig2/BM/Sequential_Feature_drop/BM_scIB_Data_viz.R: Visualize scIB results

PBMC (Fig 2B, F, G, H and Supplementary Fig 3 and 4)

Fig2/PBMC/Reference/Fig2_PBMC_prepare_data.R: Prepare PBMC data for CellFuse

Fig2/PBMC/PBMC_CellFuse_Integration.R: Run CellFuse

Fig2/PBMC/PBMC_Running_Benchmark_Methods.R: Run benchmarking methods (Harmony, Seurat, FastMNN)

Fig2/PBMC/PBMC_scVI_scnorama_feature_drop.ipynb: Run scVI and Scanorama

Fig2/PBMC/PBMC_scIB.ipynb: Evaluate methods using scIB and save results

Fig2/PBMC/PBMC_Data_visualisation.R: tSNE visualization

Fig2/PBMC/RunTime_benchmark/Prepare_data.R: Prepare data

Fig2/PBMC/RunTime_benchmark/run_all_methods.txt.R: Describes how to measure run time and memory usage for each method. It requires the following files: a. cellfuse_run_measure.R, b. fastmnn_run_measure.R, c. seurat_run_measure.R, d. harmony_run_measure.R, e. scanorama_runtime.py, f. scvi_scanvi_runtime.py

Fig2/PBMC/RunTime_benchmark/Runtime_Data_viz.R: Visualize runtime and memory usage data
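The repository's runtime scripts are not reproduced here. As a generic illustration of measuring wall-clock time and peak memory for one method call from Python, a minimal sketch using only the standard library (toy_method is a hypothetical stand-in for an integration run):

```python
import time
import tracemalloc


def measure(fn, *args, **kwargs):
    """Run fn and return (result, wall-clock seconds, peak traced memory in MiB).

    tracemalloc only sees allocations made through Python's allocators, so memory
    used by some native libraries (or by R, for the .R scripts) may be missed.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak / 2**20


def toy_method(n):
    """Stand-in for an integration method; just allocates a list."""
    return [0] * n


result, seconds, peak_mib = measure(toy_method, 1_000_000)
print(f"{seconds:.3f} s, peak {peak_mib:.1f} MiB")
```

For whole-process memory across languages, an external monitor (for example the peak resident set size reported by the operating system) is usually more comparable between R and Python tools.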

Fig 3 Good et al. CART: Fig 3A-F and Supplementary Fig 5, 6A and B

Fig3/Good_et_al/Reference/Fig3_CyTOF_prepare_data.R: Prepare CyTOF and CITE-seq data for CellFuse

Fig3/Good_et_al/CellFuse_Integration_CyTOF.R: Run CellFuse to remove batch effects and integrate CyTOF data from day 7 post-infusion

Fig3/Good_et_al/CellFuse_Integration_CITESeq.R: Run CellFuse to integrate CyTOF and CITE-seq data

Fig3/Good_et_al/CART_Data_visualisation.R: Visualize data

Fig 3 Domizi et al. CART: Fig 3G and H and Supplementary Fig 6C

Fig3/Domizi_et_al/Data_Analysis.R: Contains all code for preprocessing, the CellFuse run and data visualization

Fig 4 HuBMAP CODEX data (Fig. 4A, B, C, D and Supplementary Fig 7)

Fig4/CODEX_colorectal/Reference/CODEX_HuBMAP_prepare_data.R: Prepare CODEX data from annotated and unannotated donors

Fig4/CODEX_colorectal/CODEX_HuBMAP_CellFuse_Predict.R: Run CellFuse on cells from annotated and unannotated donors

Fig4/CODEX_colorectal/CODEX_HuBMAP_Data_visualisation.R: Visualize data and prepare figures

Fig4/CODEX_colorectal/Benchmarking/Astir/Astrir.ipynb: Run Astir

Fig4/CODEX_colorectal/Benchmarking/SpatialAnno.R: Run SpatialAnno

Fig4/CODEX_colorectal/CODEX_HuBMAP_Benchmark.R: Benchmark CellFuse against CELESTA, SVM, SpatialAnno, Astir and Seurat using cells from annotated donors, and prepare figures

Fig4/CODEX_colorectal/CODEX_HuBMAP_Suppl_figure_heatmap.R: Calculate F1 scores per cell type for each benchmarking method, and plot a heatmap comparing cell types from annotated and unannotated donors (Supplementary Fig 7)

IMC breast cancer data (Fig 4E, F, G and Supplementary Fig 7)

Fig4/IMC_Breast_Cancer/IMC_prepare_data.R: Prepare CODEX data from annotated and unannotated donor

Fig4/IMC_Breast_Cancer/IMC_CellFuse_Predict.R: Run CellFuse to predict cell types

Fig4/IMC_Breast_Cancer/IMC_dat_visualization.R: Visualize data and prepare figures

Fig4/IMC_Breast_Cancer/Suppl_Per_Patient_Confusion_Matrix.R: Suppl. Fig 8

Fig4/IMC_Breast_Cancer/Benchmark_random_split.R: Suppl. Fig 9B

Fig4/Concordance.R: Spatial concordance analysis for IMC and CODEX data
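Per-cell-type F1 scores, as used for the CODEX benchmarking heatmap, can be computed from paired true and predicted labels. A minimal sketch with invented labels; the repository's R code may treat unassigned or unknown predictions differently:

```python
from collections import Counter


def per_class_f1(truth, pred):
    """Per-label F1 = 2PR/(P+R) from parallel lists of true and predicted labels."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    tp = Counter(t for t, p in zip(truth, pred) if t == p)
    n_true, n_pred = Counter(truth), Counter(pred)
    scores = {}
    for label in sorted(n_true.keys() | n_pred.keys()):
        precision = tp[label] / n_pred[label] if n_pred[label] else 0.0
        recall = tp[label] / n_true[label] if n_true[label] else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


truth = ["T cell", "T cell", "B cell", "B cell", "NK cell"]
pred = ["T cell", "B cell", "B cell", "B cell", "T cell"]
print(per_class_f1(truth, pred))
```

Per-class scores expose methods that do well on abundant populations but miss rare ones, which a single overall accuracy would hide.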

Fig 5

Fig5/Reference/Fig5_CyTOF_Data_prep.R: Prepare CyTOF data from healthy PBMC and healthy colon single cells

Fig5/MIBI_CellFuse_Predict.R: Run CellFuse to predict cell types in colon cancer patient samples

Fig5/MIBI_PostPrediction.R: Visualize data and prepare figures

Fig5/Predicted_Data/mask_generation.ipynb: Map CellFuse-predicted cell types onto segmented images (generates Fig 5C and D)
Tubuloid kidney organoid - single cell RNA-seq
figshare.com
tar
Updated May 16, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Javier Perales Patón; Rafael Kramann (2022). Tubuloid kidney organoid - single cell RNA-seq [Dataset]. http://doi.org/10.6084/m9.figshare.11786238.v1
Available download formats: tar
Unique identifier
https://doi.org/10.6084/m9.figshare.11786238.v1
Dataset updated
May 16, 2022
Dataset provided by
figshare (http://figshare.com/)
Authors
Javier Perales Patón; Rafael Kramann
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This record contains data derived from processing single-cell and single-nucleus RNA-seq data from several samples (see below). The files are the input and intermediate output files of https://github.com/saezlab/Xu_tubuloid.

Data. The data include:

Binary sparse matrices of the UMI gene-expression quantification from cellranger (filtered feature-barcode matrices). These are TAR archive files named after each sample.

Seurat objects with normalized data, dimensionality-reduction embeddings, clustering and cell-cluster annotation. These are TAR archive files containing the final objects, grouped by sample type: SeuratObjects_[SortedCells | Organoids | Human Kidney Tissue]. The HumanKidneyTissue archive also includes the Seurat object after Harmony integration.

Exported barcode idents from unsupervised clustering and manual annotation ("barcodeIdents*.csv" files).

Label transfer via Symphony mapping of tubuloid cells from each organoid to an integrated reference atlas of human kidney tissue (SymphonyMapped*.csv).
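The internal layout of the TAR archives is not described in this record. Assuming each extracts to a standard cellranger v3 filtered_feature_bc_matrix directory (barcodes.tsv.gz, features.tsv.gz, matrix.mtx.gz), a dependency-free reader could look like the sketch below; in practice Seurat's Read10X or scanpy's read_10x_mtx would normally be used:

```python
import gzip
from pathlib import Path


def read_10x_mtx(directory):
    """Read a cellranger v3-style filtered_feature_bc_matrix directory.

    Returns (features, barcodes, counts): features is a list of (feature_id,
    gene_symbol) tuples, barcodes a list of strings, and counts a dict mapping
    (feature_index, barcode_index) -> UMI count, with 0-based indices.
    """
    d = Path(directory)
    with gzip.open(d / "barcodes.tsv.gz", "rt") as fh:
        barcodes = [line.rstrip("\n") for line in fh if line.strip()]
    with gzip.open(d / "features.tsv.gz", "rt") as fh:
        features = [tuple(line.rstrip("\n").split("\t")[:2]) for line in fh if line.strip()]
    counts = {}
    with gzip.open(d / "matrix.mtx.gz", "rt") as fh:
        size_line_seen = False
        for line in fh:
            if line.startswith("%") or not line.strip():  # MatrixMarket header/comments
                continue
            row, col, value = (int(x) for x in line.split())
            if not size_line_seen:  # first data line: n_rows n_cols n_entries
                if (row, col) != (len(features), len(barcodes)):
                    raise ValueError("matrix dimensions do not match features/barcodes")
                size_line_seen = True
                continue
            counts[(row - 1, col - 1)] = value  # MatrixMarket indices are 1-based
    return features, barcodes, counts
```

Keying counts by feature index rather than gene symbol avoids collisions, since a few gene symbols map to more than one feature ID.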

Samples. The data correspond to the following samples, which were profiled at single-cell resolution:

CK5 early organoid (Healthy). Organoid generated from CD24+ sorted cells from human adult kidney tissue at an early stage.

CK119 late organoid (Healthy). Organoid generated from CD24+ sorted cells from human adult kidney tissue at a late stage.

JX1 late organoid (Healthy). Organoid generated following Hans Clevers' protocol for kidney organoids.

JX2 PKD1-KO organoid (PKD). Organoid generated from CD24+ sorted cells from human adult kidney tissue in which PKD1 was gene-edited to reproduce the PKD phenotype; developed at a late stage.

JX3 PKD2-KO organoid (PKD). Organoid generated from CD24+ sorted cells from human adult kidney tissue in which PKD2 was gene-edited to reproduce the PKD phenotype; developed at a late stage.

CK120 CD13. CD13+ sorted cells from human adult kidney tissue.

CK121 CD24. CD24+ sorted cells from human adult kidney tissue.

In addition, human adult kidney tissue was profiled in the context of ADPKD:

CK224: human specimen with ADPKD (PKD2- genotype).

CK225: human specimen with ADPKD (PKD1- genotype).

ADPKD3: human specimen with ADPKD (ND genotype).

Control1: human specimen with healthy tissue.

Control2: human specimen with healthy tissue.
Table 1_Single-cell/spatial integration reveals an MES2-like glioblastoma...
frontiersin.figshare.com
docx
Updated Oct 29, 2025
+ more versions
Cite
Chonghui Zhang; Lu Tan; Kaijian Zheng; Yifan Xu; Junshan Wan; Jinpeng Wu; Chao Wang; Pin Guo; Yugong Feng (2025). Table 1_Single-cell/spatial integration reveals an MES2-like glioblastoma program orchestrated by immune communication and regulatory networks.docx [Dataset]. http://doi.org/10.3389/fimmu.2025.1699134.s001
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fimmu.2025.1699134.s001
Dataset updated
Oct 29, 2025
Dataset provided by
Frontiers Media (http://www.frontiersin.org/)
Authors
Chonghui Zhang; Lu Tan; Kaijian Zheng; Yifan Xu; Junshan Wan; Jinpeng Wu; Chao Wang; Pin Guo; Yugong Feng
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Glioblastoma (GBM) exhibits marked plasticity and intense microenvironmental crosstalk. We aimed to delineate mesenchymal programs with spatial resolution, clinical relevance, and mechanistic anchors.

Methods: We integrated single-cell RNA-seq, bulk transcriptomes, and Visium spatial data. After rigorous QC and Harmony integration, we annotated 12 cell states using canonical markers, decoupler-based ORA, and AUCell. Tumor boundaries were defined by inferCNV/CopyKAT, and developmental potential by CytoTRACE2 and PHATE. Post-translational modification (PTM) axes were scored from curated gene sets. A cell type-aware GNN linked bulk expression to a patient-similarity graph for survival modeling and gene-level hazard attribution. Network convergence combined bulk WGCNA (TCGA/CGGA), single-cell hdWGCNA, BayesPrism deconvolution, and external GEO validation. Ligand–receptor (LR) signaling was inferred with LIANA+, embedded in a signed causal network, and mapped spatially. ARRDC3 expression was assessed in GBM tissues; U251 gain- and loss-of-function assays evaluated proliferation and migration.

Results: We resolved major GBM states, including two mesenchymal programs (MES1-like, MES2-like). CNV-high regions marked malignant cores, and CytoTRACE2 identified high-potency niches within MES2-like and Proliferation states along non-linear trajectories. PTM landscapes segregated by state; S-nitrosylation, glycosylation, and lactylation were enriched in mesenchymal programs. A GNN risk score stratified overall survival in TCGA (n = 157) and generalized to CGGA-325 (n = 85) and CGGA-693 (n = 140). MES2-like abundance remained an independent adverse predictor (HR = 2.31; 95% CI, 1.04–5.10). MES2-high tumors upregulated EMT, TNFα/NF-κB, JAK/STAT, hypoxia, angiogenesis, and glycolysis programs, and S-nitrosylation was associated with increased hazard. Cross-modal convergence defined a conservative MES2 core enriched for ECM remodeling, collagen modification, focal adhesion, and TGF-β regulation. LR analysis prioritized a TAM-to-MES2 axis (e.g., GRN–TNFRSF1A, ADAM9/10/17–ITGB1, TGFB1–ITGB1/EGFR) converging on a CEBPD-centered module. Spatial mapping localized MES2 hotspots within CNV-defined territories and revealed a TNFRSF1A–CEBPD–ARRDC3 focus at an infiltrative rim. ARRDC3 was upregulated in GBM tissues; in U251 cells, knockdown promoted and overexpression suppressed proliferation and migration, indicating context-dependent roles.

Conclusions: MES2-like GBM is an ECM-driven, stress-adapted state with strong prognostic impact. We nominate CEBPD and TNFRSF1A/ITGB1 as actionable nodes and identify ARRDC3 as a spatially restricted effector with context-dependent tumor-modulatory functions warranting therapeutic exploration.
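Over-representation analysis (ORA) is commonly framed as a one-sided hypergeometric test (equivalently, Fisher's exact test) of whether a query gene list overlaps an annotated gene set more than expected by chance. A minimal sketch with hypothetical counts; decoupler's own ORA implementation, background definition and multiple-testing correction may differ:

```python
from math import comb


def ora_pvalue(n_universe: int, n_set: int, n_query: int, n_overlap: int) -> float:
    """One-sided hypergeometric P(X >= n_overlap).

    n_universe: background genes; n_set: genes in the annotated set;
    n_query: genes in the query list (e.g. markers of one cell state);
    n_overlap: query genes that fall in the set.
    """
    upper = min(n_set, n_query)
    numerator = sum(comb(n_set, x) * comb(n_universe - n_set, n_query - x)
                    for x in range(n_overlap, upper + 1))
    return numerator / comb(n_universe, n_query)  # exact big-int arithmetic, then one division


# Hypothetical numbers: 20,000 background genes, a 200-gene set, and 500 marker
# genes of one state, 20 of which fall in the set (5 would be expected by chance).
print(ora_pvalue(20_000, 200, 500, 20))
```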
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove (2021). Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration [Dataset]. http://doi.org/10.5061/dryad.t4b8gtj34

Data from: Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration

Explore at:

zipAvailable download formats

Unique identifier

https://doi.org/10.5061/dryad.t4b8gtj34

Dataset updated

Dec 14, 2021

Dataset provided by

Cornell University

Authors

David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove

License

https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html

Description

Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. 
We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.

Methods

Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (Mus musculus) were obtained from The Jackson Laboratory (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute on Aging (NIA) Rodent Aging Colony and were used at 20 months of age. Female mice were used for all new scRNAseq experiments.

Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of aged (20-month-old) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed, and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 µm and 40 µm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed by incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).

Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 10⁶ cells/ml. Cells were counted manually with a hemocytometer to determine their concentration. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted and loaded onto the Chromium Single Cell A Chip to target recovery of 6,000 single-cell transcriptomes. Libraries were sequenced on a NextSeq 500 (Illumina; San Diego, CA) using 75-cycle high output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Details on estimated sequencing saturation and the number of reads per sample are shown in Sup. Data 1.
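
Resuspending cells at 10⁶ cells/ml relies on standard hemocytometer arithmetic. The following is a minimal sketch, assuming a Neubauer chamber in which each large square holds 0.1 µl (1 mm × 1 mm × 0.1 mm deep). The dilution factor (for example, a 1:1 mix with a viability stain) is an assumption, since the text does not say whether cells were stained, and the function names and counts are illustrative.

```python
"""Hemocytometer arithmetic for resuspending cells at a target concentration."""
from typing import Sequence

# Neubauer large square: 1 mm x 1 mm x 0.1 mm = 0.1 µl = 1e-4 ml,
# so one cell per square corresponds to 10,000 cells/ml.
CELLS_PER_ML_PER_COUNT = 10_000


def cells_per_ml(square_counts: Sequence[int], dilution_factor: float = 1.0) -> float:
    """Estimate cells/ml from the counts in one or more large squares."""
    if not square_counts:
        raise ValueError("at least one square count is required")
    mean_count = sum(square_counts) / len(square_counts)
    return mean_count * dilution_factor * CELLS_PER_ML_PER_COUNT


def resuspension_volume_ml(total_cells: float, target_per_ml: float = 1e6) -> float:
    """Volume of 0.04% BSA in PBS that brings total_cells to target_per_ml."""
    return total_cells / target_per_ml


if __name__ == "__main__":
    # Hypothetical counts from four corner squares after a 1:1 dilution.
    conc = cells_per_ml([52, 48, 50, 50], dilution_factor=2)
    print(f"{conc:,.0f} cells/ml")  # 1,000,000 cells/ml
```

Averaging several squares reduces counting noise; the result times the remaining volume gives the total cell number to pass to resuspension_volume_ml.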

Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5 months) C57BL/6J mice were injected with 10 µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and frozen fresh in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). The optimal tissue permeabilization time for 10 µm thick sections was found to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E-stained tissue sections were imaged using a Zeiss PALM MicroBeam laser capture microdissection system, and the images were stitched and processed using Fiji (ImageJ). cDNA libraries were sequenced on an Illumina NextSeq 500 using 150-cycle high output kits (Read 1 = 28 bp, Read 2 = 120 bp, Index 1 = 10 bp, and Index 2 = 10 bp). Frames around the capture area on the Visium slide were aligned manually, and spots covering the tissue were selected using Loupe Browser v4.0.0 software (10x Genomics). Sequencing data were then aligned to the mouse reference genome (mm10) using the spaceranger v1.0.0 pipeline (10x Genomics) to generate a feature-by-spot-barcode expression matrix.
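
With Visium v1 chemistry, the 28 bp Read 1 carries a 16 bp spatial barcode followed by a 12 bp UMI, while Read 2 carries the cDNA insert. The sketch below only illustrates that read layout. spaceranger performs the real parsing together with barcode whitelist correction, which is not reproduced here, and the function and type names are hypothetical.

```python
"""Split a Visium Read 1 sequence into its spatial barcode and UMI."""
from typing import NamedTuple

SPATIAL_BARCODE_LEN = 16
UMI_LEN = 12
READ1_LEN = SPATIAL_BARCODE_LEN + UMI_LEN  # 28 cycles, as sequenced here


class Read1Parts(NamedTuple):
    spatial_barcode: str
    umi: str


def split_read1(seq: str) -> Read1Parts:
    """Return the spatial barcode and UMI; any extra trailing bases are ignored."""
    if len(seq) < READ1_LEN:
        raise ValueError(f"Read 1 has {len(seq)} bp; expected at least {READ1_LEN}")
    return Read1Parts(
        spatial_barcode=seq[:SPATIAL_BARCODE_LEN],
        umi=seq[SPATIAL_BARCODE_LEN:READ1_LEN],
    )


if __name__ == "__main__":
    parts = split_read1("ACGT" * 4 + "TTTTGGGGCCCC")
    print(parts.spatial_barcode, parts.umi)  # ACGTACGTACGTACGT TTTTGGGGCCCC
```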

Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples that were only available as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).
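
These download and alignment steps can be chained per sample. The sketch below only assembles command lines and does not execute them. It assumes the option names shown in each tool's README. The sample ID, accession, working directory, and reference path are hypothetical, and the text does not state the exact options the authors used.

```python
"""Assemble (but do not run) download and alignment commands for one sample."""
import shlex
from pathlib import Path
from typing import List

# Hypothetical location of a 10x-provided mm10 reference; substitute your own.
MM10_REFERENCE = Path("/refs/refdata-cellranger-mm10-3.0.0")


def sra_download_cmd(accession: str, outdir: Path, threads: int = 8) -> List[str]:
    """parallel-fastq-dump call, using the flags from the tool's README example."""
    return [
        "parallel-fastq-dump", "--sra-id", accession,
        "--threads", str(threads), "--outdir", str(outdir),
        "--split-files", "--gzip",
    ]


def bam_to_fastq_cmd(bam: Path, outdir: Path) -> List[str]:
    """10x bamtofastq for BAM-only submissions: input BAM, then output folder.
    bamtofastq writes FASTQs into subfolders of outdir."""
    return ["bamtofastq", str(bam), str(outdir)]


def cellranger_count_cmd(sample_id: str, fastq_dir: Path,
                         transcriptome: Path = MM10_REFERENCE) -> List[str]:
    """cellranger count against mm10. FASTQs must follow the bcl2fastq naming
    convention (<sample>_S1_L001_R1_001.fastq.gz); renaming the SRA output
    (<accession>_1.fastq.gz, ...) is not shown here."""
    return [
        "cellranger", "count", f"--id={sample_id}",
        f"--transcriptome={transcriptome}", f"--fastqs={fastq_dir}",
        f"--sample={sample_id}",
    ]


def commands_for_sra_sample(sample_id: str, accession: str,
                            workdir: Path) -> List[List[str]]:
    """Download one SRA run, then count it (renaming step omitted)."""
    fastq_dir = workdir / sample_id / "fastq"
    return [
        sra_download_cmd(accession, fastq_dir),
        cellranger_count_cmd(sample_id, fastq_dir),
    ]


if __name__ == "__main__":
    for cmd in commands_for_sra_sample("d2_rep1", "SRR0000000", Path("work")):
        print(shlex.join(cmd))
```

Returning argument lists rather than shell strings keeps quoting safe. The lists can be passed directly to subprocess.run once the renaming step is in place.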

Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCont and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset individually. BCmvn optimization was used to select the pK parameter. Expected doublet rates were estimated by fitting a linear regression to the expected doublet rates published in the 10x Chromium handbook and evaluating it at the number of cells remaining after quality filtering. Expected homotypic doublet rates were also accounted for using the modelHomotypic function. The default pN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch correction with three tools independently: Harmony (github.com/immunogenomics/harmony) (v1.0), Scanorama (github.com/brianhie/scanorama) (v1.3), and BBKNN (github.com/Teichlab/bbknn) (v1.3.12). We then used Seurat to process the integrated data. After initial integration, we removed a noisy cluster and re-integrated the data using each of the three batch-correction tools.
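
The filtering thresholds and the doublet budget can be sketched in plain Python rather than the R packages the authors used. The thresholds come from the text. The regression points in the usage example are placeholders, so substitute the multiplet-rate table from the 10x Chromium handbook. The cluster labels are hypothetical. The homotypic correction assumes modelHomotypic's model, in which the homotypic fraction is the sum of squared cluster proportions. The sketch does not perform DoubletFinder's pK sweep or its artificial-doublet classification.

```python
"""Per-dataset QC thresholds and DoubletFinder-style doublet expectations."""
from collections import Counter
from dataclasses import dataclass
from typing import Hashable, Sequence, Tuple

MIN_FEATURES = 750    # genes detected per cell
MIN_COUNTS = 1000     # transcripts (UMIs) per cell
MAX_PCT_MITO = 30.0   # percent of UMIs from mitochondrial genes


@dataclass
class CellQC:
    barcode: str
    n_features: int
    n_counts: int
    pct_mito: float


def passes_qc(cell: CellQC) -> bool:
    """Keep cells at or above both minimums and at or below the mito cap."""
    return (cell.n_features >= MIN_FEATURES
            and cell.n_counts >= MIN_COUNTS
            and cell.pct_mito <= MAX_PCT_MITO)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def expected_doublets(n_cells: int, slope: float, intercept: float) -> int:
    """Doublet count implied by the fitted rate at this cell number (nExp)."""
    return round((slope * n_cells + intercept) * n_cells)


def homotypic_adjusted(n_exp: int, cluster_labels: Sequence[Hashable]) -> int:
    """Discount doublets formed within one cluster (sum of squared proportions)."""
    counts = Counter(cluster_labels)
    total = sum(counts.values())
    homotypic = sum((c / total) ** 2 for c in counts.values())
    return round(n_exp * (1 - homotypic))


if __name__ == "__main__":
    # Placeholder (cells, multiplet rate) points; replace with handbook values.
    slope, intercept = fit_line([1000, 5000, 10000], [0.008, 0.04, 0.08])
    cells = [CellQC("AAAC", 1200, 2500, 4.2), CellQC("AAAG", 600, 900, 2.0)]
    kept = [c for c in cells if passes_qc(c)]
    n_exp = expected_doublets(6000, slope, intercept)
    labels = ["a"] * 50 + ["b"] * 30 + ["c"] * 20  # hypothetical clusters
    print(len(kept), n_exp, homotypic_adjusted(n_exp, labels))  # 1 288 179
```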

Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, the dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical marker genes (Fig. S3). Clusters with similar canonical marker gene expression patterns were merged.
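
Choosing the dimensions that account for 95% of the variance amounts to a cumulative-sum cutoff. The following is a minimal sketch, assuming the per-dimension standard deviations are available in decreasing order (for a Seurat PCA they can be read with Stdev()). It also assumes that "total variance" means the variance summed over the computed dimensions, which the text does not specify.

```python
"""Smallest number of leading dimensions explaining a target variance share."""
from itertools import accumulate
from typing import Sequence


def n_dims_for_variance(stdevs: Sequence[float], threshold: float = 0.95) -> int:
    """Return the smallest k whose first k dimensions reach `threshold`
    of the variance summed over all supplied dimensions."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    variances = [s * s for s in stdevs]  # variance = stdev squared
    total = sum(variances)
    for k, cumulative in enumerate(accumulate(variances), start=1):
        if cumulative >= threshold * total:
            return k
    return len(variances)  # guards against floating-point shortfall at 1.0


if __name__ == "__main__":
    print(n_dims_for_variance([3.0, 2.0, 1.0, 0.5]))  # 3
```

In the example, the variances are 9, 4, 1, and 0.25. The first three reach 14 of 14.25 (about 98%), whereas the first two reach only about 91%.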

Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).
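
Subsetting on consensus cell types can be expressed as an intersection over the three annotations. The following is a minimal sketch, assuming each method's clusters have already been mapped to a shared vocabulary of cell-type names. The label strings and the `keep` set below are hypothetical, and this is not the authors' code.

```python
"""Keep cells whose annotation agrees across all integration methods."""
from typing import AbstractSet, Dict, Mapping, Optional


def consensus_labels(*label_maps: Mapping[str, str],
                     keep: Optional[AbstractSet[str]] = None) -> Dict[str, str]:
    """Map barcode -> label for cells labelled identically by every method.

    Cells missing from any method, or labelled differently, are dropped.
    If `keep` is given, only consensus labels in that set are returned.
    """
    if not label_maps:
        return {}
    first, *rest = label_maps
    result: Dict[str, str] = {}
    for barcode, label in first.items():
        if all(other.get(barcode) == label for other in rest):
            if keep is None or label in keep:
                result[barcode] = label
    return result


if __name__ == "__main__":
    harmony = {"c1": "MuSCs", "c2": "Myonuclei", "c3": "FAPs"}
    scanorama = {"c1": "MuSCs", "c2": "Myonuclei", "c3": "Endothelial"}
    bbknn = {"c1": "MuSCs", "c2": "Myonuclei"}
    print(consensus_labels(harmony, scanorama, bbknn, keep={"MuSCs"}))
    # {'c1': 'MuSCs'}
```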

Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were relabeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myocytes” (bins 11-18). Culture-associated muscle stem cells were ignored, and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified by differential gene expression analysis in Seurat (FindAllMarkers, Wilcoxon rank-sum test; results in Sup. Data 2). The resulting genes were filtered by average log2-fold change (avg_logFC > 1) and by the fraction of cells within the cluster expressing each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each cell type, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using
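
The steps for building the deconvolution reference can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline. The bin boundaries and cutoffs are taken from the text. The input layouts (marker rows keyed like Seurat v3 FindAllMarkers output, with `pct.1` standing in for "pct.expressed", and counts as nested dictionaries), the function names, and the toy values in the example are assumptions. The resulting per-cell-type mean profiles are what would be passed to BayesPrism together with the spot counts.

```python
"""Relabel myogenic bins, select reference genes, and average raw counts."""
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping, Optional, Sequence

# PHATE-1 bin ranges (inclusive) -> myogenic labels, as given in the text.
BIN_LABELS = [
    (4, 5, "Quiescent MuSCs"),
    (6, 7, "Activated MuSCs"),
    (8, 10, "Committed Myoblasts"),
    (11, 18, "Fusing Myocytes"),
]


def label_for_bin(bin_index: int) -> Optional[str]:
    """Myogenic label for a PHATE-1 bin; None for bins the text does not assign."""
    for lo, hi, label in BIN_LABELS:
        if lo <= bin_index <= hi:
            return label
    return None


def select_reference_genes(markers: Iterable[Mapping[str, Any]],
                           min_logfc: float = 1.0,
                           min_pct: float = 0.5) -> List[str]:
    """Unique genes passing both cutoffs, minus mitochondrial/ribosomal genes.

    The prefix test is a heuristic: it also drops, e.g., Rps6ka* kinases.
    """
    selected: List[str] = []
    seen = set()
    for row in markers:
        gene = str(row["gene"])
        if gene in seen or gene.startswith(("mt-", "Rps", "Rpl")):
            continue
        if float(row["avg_logFC"]) > min_logfc and float(row["pct.1"]) > min_pct:
            selected.append(gene)
            seen.add(gene)
    return selected


def mean_profiles(counts: Mapping[str, Mapping[str, float]],
                  labels: Mapping[str, str],
                  genes: Sequence[str]) -> Dict[str, List[float]]:
    """Mean raw count of each gene per cell type; unlabelled cells are skipped."""
    sums: Dict[str, List[float]] = defaultdict(lambda: [0.0] * len(genes))
    n_cells: Dict[str, int] = defaultdict(int)
    for cell, label in labels.items():
        row = counts[cell]
        acc = sums[label]
        for i, gene in enumerate(genes):
            acc[i] += row.get(gene, 0.0)
        n_cells[label] += 1
    return {label: [s / n_cells[label] for s in acc] for label, acc in sums.items()}


if __name__ == "__main__":
    markers = [  # toy marker rows
        {"gene": "Pax7", "avg_logFC": 1.8, "pct.1": 0.9},
        {"gene": "mt-Co1", "avg_logFC": 2.0, "pct.1": 1.0},
        {"gene": "Myog", "avg_logFC": 0.7, "pct.1": 0.8},
    ]
    genes = select_reference_genes(markers)
    counts = {"c1": {"Pax7": 4}, "c2": {"Pax7": 2}}
    labels = {"c1": "Quiescent MuSCs", "c2": "Quiescent MuSCs"}
    print(genes, mean_profiles(counts, labels, genes))
    # ['Pax7'] {'Quiescent MuSCs': [3.0]}
```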
