9 datasets found
  1. Data_Sheet_1_CBA: Cluster-Guided Batch Alignment for Single Cell RNA-seq.PDF...

    • frontiersin.figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Wenbo Yu; Ahmed Mahfouz; Marcel J. T. Reinders (2023). Data_Sheet_1_CBA: Cluster-Guided Batch Alignment for Single Cell RNA-seq.PDF [Dataset]. http://doi.org/10.3389/fgene.2021.644211.s001
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Wenbo Yu; Ahmed Mahfouz; Marcel J. T. Reinders
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The power of single-cell RNA sequencing (scRNA-seq) in detecting cell heterogeneity or developmental process is becoming more and more evident every day. The granularity of this knowledge is further propelled when combining two batches of scRNA-seq into a single large dataset. This strategy is however hampered by technical differences between these batches. Typically, these batch effects are resolved by matching similar cells across the different batches. Current approaches, however, do not take into account that we can constrain this matching further as cells can also be matched on their cell type identity. We use an auto-encoder to embed two batches in the same space such that cells are matched. To accomplish this, we use a loss function that preserves: (1) cell-cell distances within each of the two batches, as well as (2) cell-cell distances between two batches when the cells are of the same cell-type. The cell-type guidance is unsupervised, i.e., a cell-type is defined as a cluster in the original batch. We evaluated the performance of our cluster-guided batch alignment (CBA) using pancreas and mouse cell atlas datasets, against six state-of-the-art single cell alignment methods: Seurat v3, BBKNN, Scanorama, Harmony, LIGER, and BERMUDA. Compared to other approaches, CBA preserves the cluster separation in the original datasets while still being able to align the two datasets. We confirm that this separation is biologically meaningful by identifying relevant differential expression of genes for these preserved clusters.
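    The two distance-preservation terms described above can be sketched in plain Python. This is only an illustration: the function names, the squared-error form, the equal weighting of the terms, and the assumption that cluster labels are already matched across batches are not stated in the abstract, and the published method optimizes its own loss through an auto-encoder.

```python
import math


def pairwise_sq_error(orig_a, orig_b, emb_a, emb_b, keep):
    """Mean squared difference between original and embedded distances,
    taken over the (i, j) pairs for which keep(i, j) is true."""
    total, count = 0.0, 0
    for i in range(len(orig_a)):
        for j in range(len(orig_b)):
            if keep(i, j):
                d_orig = math.dist(orig_a[i], orig_b[j])
                d_emb = math.dist(emb_a[i], emb_b[j])
                total += (d_orig - d_emb) ** 2
                count += 1
    return total / count if count else 0.0


def alignment_loss(x1, x2, z1, z2, labels1, labels2):
    """Hypothetical loss: x1/x2 are the original batches, z1/z2 their embeddings.

    Terms (1): distances within each batch.
    Term  (2): distances between batches, only for cells sharing a cluster label.
    Equal weighting of the three terms is an assumption.
    """
    within1 = pairwise_sq_error(x1, x1, z1, z1, lambda i, j: i < j)
    within2 = pairwise_sq_error(x2, x2, z2, z2, lambda i, j: i < j)
    between = pairwise_sq_error(
        x1, x2, z1, z2, lambda i, j: labels1[i] == labels2[j]
    )
    return within1 + within2 + between
```

    An embedding that reproduces every retained distance gives a loss of zero; stretching one batch in the embedding raises only the terms that involve that batch.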

  2. Data from: Large-scale integration of single-cell transcriptomic data...

    • data.niaid.nih.gov
    • dataone.org
    • +1 more
    zip
    Updated Dec 14, 2021
    Cite
    David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove (2021). Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration [Dataset]. http://doi.org/10.5061/dryad.t4b8gtj34
    Available download formats: zip
    Dataset updated
    Dec 14, 2021
    Dataset provided by
    Cornell University
    Authors
    David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. 
    We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.

    Methods

    Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (Mus musculus) were obtained from Jackson Laboratories (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute of Aging (NIA) Rodent Aging Colony and were used at 20 months of age. For new scRNAseq experiments, female mice were used in each experiment.

    Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of old (20 months) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 and 40 μm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed through incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).

    Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 10^6 cells/ml. Cells were counted manually with a hemocytometer to determine their concentration. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes. After preparation, libraries were sequenced on a NextSeq 500 (Illumina; San Diego, CA) using 75 cycle high output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Details on estimated sequencing saturation and the number of reads per sample are shown in Sup. Data 1.

    Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5 mo) C57BL/6J mice were injected with 10 µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and frozen fresh in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). Optimal tissue permeabilization time for 10 µm thick sections was found to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E stained tissue sections were imaged using a Zeiss PALM MicroBeam laser capture microdissection system and the images were stitched and processed using Fiji ImageJ software. cDNA libraries were sequenced on an Illumina NextSeq 500 using 150 cycle high output kits (Read 1=28bp, Read 2=120bp, Index 1=10bp, and Index 2=10bp). Frames around the capture area on the Visium slide were aligned manually and spots covering the tissue were selected using Loop Browser v4.0.0 software (10x Genomics). Sequencing data were then aligned to the mouse reference genome (mm10) using the spaceranger v1.0.0 pipeline to generate a feature-by-spot-barcode expression matrix (10x Genomics).

    Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples which were only available as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).

    Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCounts and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset individually. BCmvn optimization was used for pK parameterization. Estimated doublet rates were computed by fitting the total number of cells after quality filtering to a linear regression of the expected doublet rates published in the 10x Chromium handbook. Estimated homotypic doublet rates were also accounted for using the modelHomotypic function. The default pN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch correction with three tools independently: Harmony (github.com/immunogenomics/harmony) (v1.0), Scanorama (github.com/brianhie/scanorama) (v1.3), and BBKNN (github.com/Teichlab/bbknn) (v1.3.12). We then used Seurat to process the integrated data. After initial integration, we removed the noisy cluster and re-integrated the data using each of the three batch-correction tools.
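    The quality filter stated above (at least 750 features, at least 1000 transcripts, at most 30% mitochondrial transcripts) can be written as a small predicate. This is a sketch in plain Python rather than the authors' Seurat code; the dictionary keys and function names are hypothetical.

```python
def passes_qc(n_features, n_transcripts, pct_mito,
              min_features=750, min_transcripts=1000, max_pct_mito=30.0):
    """Return True for cells that survive the thresholds quoted in the text.

    Cells are removed when they have fewer than `min_features` features,
    fewer than `min_transcripts` transcripts, or more than `max_pct_mito`
    percent mitochondrial transcripts.
    """
    return (n_features >= min_features
            and n_transcripts >= min_transcripts
            and pct_mito <= max_pct_mito)


def filter_cells(cells):
    """Keep only the cells (dicts with hypothetical keys) that pass QC."""
    return [
        cell for cell in cells
        if passes_qc(cell["n_features"], cell["n_transcripts"], cell["pct_mito"])
    ]
```

    In the workflow described here the same thresholds would be applied to per-cell metadata before doublet detection, so that doublet-rate estimates are based on the filtered cell count.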

    Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical genes (Fig. S3). Clusters which had similar canonical marker gene expression patterns were merged.
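    Choosing "dimensions accounting for 95% of the total variance" amounts to a cumulative sum over per-component variances. The sketch below assumes the input is the list of per-component standard deviations in decreasing order and that "total variance" means the variance captured by the computed components; both are assumptions, and the function name is hypothetical.

```python
def n_dims_for_variance(stdevs, fraction=0.95):
    """Smallest number of leading components whose summed variance
    reaches `fraction` of the variance across all supplied components."""
    variances = [s * s for s in stdevs]
    total = sum(variances)
    running = 0.0
    for k, variance in enumerate(variances, start=1):
        running += variance
        if running >= fraction * total:
            return k
    return len(variances)
```

    The returned count would then be the number of embedding dimensions passed on to neighbor-graph construction.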

    Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).

    Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were re-labeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myocytes” (bins 11-18). Culture-associated muscle stem cells were ignored and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified with differential gene expression analysis using Seurat (FindAllMarkers, using Wilcoxon Rank Sum Test; results in Sup. Data 2). The resulting genes were filtered based on average log2-fold change (avg_logFC > 1) and the percentage of cells within the cluster which express each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each of the cell types, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using

  3. EPI-Clone supplementary dataset: Single cell RNA-seq of clonally barcoded...

    • figshare.com
    application/gzip
    Updated Nov 26, 2024
    Cite
    Lars Velten; Michael Scherer; Alejo Rodriguez-Fraticelli; Indranil Singh (2024). EPI-Clone supplementary dataset: Single cell RNA-seq of clonally barcoded hematopoietic progenitors [Dataset]. http://doi.org/10.6084/m9.figshare.24260743.v1
    Available download formats: application/gzip
    Dataset updated
    Nov 26, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Lars Velten; Michael Scherer; Alejo Rodriguez-Fraticelli; Indranil Singh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the dataset supporting the EPI-Clone manuscript: scRNA-seq profiling of hematopoietic stem and progenitor cells (HSPCs) was performed with the 3' 10x Genomics profiling. Three experiments are included: Two where HSCs were clonally labeled with the LARRY system, transplanted to recipient mouse and profiled 4-5 months later (post-transplant hematopoiesis), and one where HSPCs were profiled straight from an unperturbed mouse.

    Dataset is a Seurat (v4) object with the following assays, reductions and metadata:

    ASSAYS
    AB: Antibody expression data
    RNA: RNA expression profiles
    integrated: Integration of DNA methylation data performed across experimental batches with two batch correction methods: CCA (https://satijalab.org/seurat/reference/runcca) and harmony (https://portals.broadinstitute.org/harmony/articles/quickstart.html).

    DIMENSIONALITY REDUCTION
    pca_cca: PCA performed on the integrated data (CCA integration)
    umap_cca: UMAP computed on the integrated data (CCA integration)
    umap_harmony: UMAP computed on the integrated data (Harmony integration)

    METADATA
    Experiment: The experiment that the cell is from, values are "LARRY main experiment", "LARRY replicate" and "Native hematopoiesis"
    ProcessingBatch: Experiments were processed in several batches.
    CellType: Cell type annotation
    LARRY: Error corrected LARRY barcode
    percent.mt: percentage of mitochondrial DNA
    nCount_RNA: Read count for the RNA modality
    nFeature_RNA: Number of RNAs with at least one read
    nCount_AB: Read count for the surface protein modality
    nFeature_AB: Number of ABs with at least one read

  4. Data from: CellFuse enables multi-modal integration of single-cell and...

    • zenodo.org
    Updated Jul 17, 2025
    Cite
    Abhishek Koladiya (2025). CellFuse enables multi-modal integration of single-cell and spatial proteomics data [Dataset]. http://doi.org/10.5281/zenodo.15858358
    Dataset updated
    Jul 17, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Abhishek Koladiya
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jul 19, 2025
    Description

    Fig 2

    Bone marrow (Fig 2B, D, E, F, H, Supplementary Fig 1A, 2,3)

    1. Fig 2/BM/Reference/ Fig2_BM_prepare_data.R: Prepare bone marrow for CellFuse

    2. Fig 2/BM/ BM_CellFuse_Integration.R: Run CellFuse

    3. Fig 2/BM/BM_Running_Benchmark_Methods.R: Run benchmarking methods (Harmony, Seurat, FastMNN)

    4. Fig 2/BM/BM_scIB_Benchmarking.ipynb: evaluate performance of CellFuse and other benchmarking methods using scIB framework proposed by Luecken et al.

    5. Fig 2/BM/ BM_scIB_prepare_figures.R: Visualize results of scIB framework

    6. Fig 2/BM/Sequential_Feature_drop/Prepare_data.R: Prepare data for evaluating sequential feature drop

    7. Fig 2/BM/Sequential_Feature_drop/Run_methods.R: Run CellFuse, Harmony, Seurat and FastMNN for sequential feature drop

    8. Fig 2/BM/Sequential_Feature_drop/Evaluate_results.R: Evaluate results of the sequential feature drop and visualize data.

    PBMC (Fig 2G,I, Supplementary Fig 1B and 4)

    1. Fig 2/PBMC/Reference/ Fig2_PBMC_prepare_data.R: Prepare PBMC data for CellFuse

    2. Fig 2/ PBMC / PBMC_CellFuse_Integration.R: Run CellFuse

    3. Fig 2/ PBMC /PBMC_Running_Benchmark_Methods.R: Run benchmarking methods (Harmony, Seurat, FastMNN)

    4. Fig 2/ PBMC /PBMC_scIB_Benchmarking.ipynb: evaluate performance of CellFuse and other benchmarking methods using scIB framework proposed by Luecken et al., 2021

    5. Fig 2/ PBMC /PBMC_scIB_prepare_figures.R: Visualize results of scIB framework

    6. Fig 2/ PBMC/ RunTime_benchmark/Run_Benchmark.R: Prepare data, run benchmarking method and evaluate results.

    Fig 3 and Supplementary Fig 5

    1. Fig 3/Reference/ Fig3_CyTOF_prepare_data.R: Prepare CyTOF and CITE-Seq data for CellFuse

    2. Fig 3/CellFuse_Integration_CyTOF.R: Run CellFuse to remove batch effect and integrate CyTOF data from day 7 post-infusion

    3. Fig 3/CellFuse_Integration_CITESeq.R: Run CellFuse to integrate CyTOF and CITE-Seq data

    4. Fig 3/CART_Data_visualisation.R: Visualize data

    Fig 4

    HuBMAP CODEX data (Fig. 4A, B, C, D and Supplementary Fig 6)

    1. Fig 4/CODEX_colorectal/Reference/ CODEX_HuBMAP_prepare_data.R: Prepare CODEX data from annotated and unannotated donor

    2. Fig 4/ CODEX_colorectal/ CODEX_HuBMAP_CellFuse_Predict.R: Run CellFuse on cells from annotated and unannotated donor

    3. Fig 4/ CODEX_colorectal/CODEX_HuBMAP_Data_visualisation.R: Visualize data and prepare figures.

    4. Fig 4/ CODEX_colorectal/ CODEX_HuBMAP_Benchmark.R: Benchmarking CellFuse against CELESTA, SVM and Seurat using cells from annotated donors and prepare figures.

    a. Astir is python package so run following python notebook: Fig 4/ CODEX_colorectal/ Benchmarking/Astir/Astrir.ipynb

    5. Fig 4/ CODEX_colorectal/CODEX_HuBMAP_Suppl_figure_heatmap.R: F1score calculation per celltype per Benchmarking methods and heatmap comparing celltypes from annotated and unannotated donors (Supplementary Fig 6)
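    The per-cell-type F1 score mentioned in step 5 can be computed from paired true and predicted labels with the standard definition F1 = 2TP / (2TP + FP + FN). The sketch below is a generic plain-Python illustration; the function name and inputs are hypothetical and are not taken from CODEX_HuBMAP_Suppl_figure_heatmap.R.

```python
def f1_per_class(true_labels, predicted_labels):
    """Return {cell_type: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(true_labels, predicted_labels))
    classes = sorted(set(true_labels) | set(predicted_labels))
    scores = {}
    for cell_type in classes:
        tp = sum(t == cell_type and p == cell_type for t, p in pairs)
        fp = sum(t != cell_type and p == cell_type for t, p in pairs)
        fn = sum(t == cell_type and p != cell_type for t, p in pairs)
        denominator = 2 * tp + fp + fn
        scores[cell_type] = 2 * tp / denominator if denominator else 0.0
    return scores
```

    Running this once per benchmarking method gives one row of scores per method, which is the shape a method-by-cell-type heatmap needs.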

    IMC Breast cancer data (Fig. 4E,F, G and Supplementary Fig 7)

    1. Fig 4/ IMC_Breast_Cancer/ IMC_prepare_data.R: Prepare CODEX data from annotated and unannotated donor

    2. Fig 4/ IMC_Breast_Cancer/ IMC_CellFuse_Predict.R: Run CellFuse to predict cell types

    3. Fig 4/ IMC_Breast_Cancer/ IMC_dat_visualization.R: Visualize data and prepare figures.

    Fig 5

    1. Fig5/ Reference/ Fig5_CyTOF_Data_prep.R: Prepare CyTOF data from healthy PBMC and healthy colon single cells

    2. Fig5/ MIBI_CellFuse_Predict.R: Run CellFuse to predict cells from colon cancer patients

    3. Fig5/ MIBI_PostPrediction.R: Visualize data and prepare figures

    4. Fig5/ Predicted_Data/ mask_generation.ipynb: After CellFuse prediction, annotate cell types in segmented images. This will generate Fig 5C and D

  5. Tubuloid kidney organoid - single cell RNA-seq

    • figshare.com
    tar
    Updated May 16, 2022
    Cite
    Javier Perales Patón; Rafael Kramann (2022). Tubuloid kidney organoid - single cell RNA-seq [Dataset]. http://doi.org/10.6084/m9.figshare.11786238.v1
    Available download formats: tar
    Dataset updated
    May 16, 2022
    Dataset provided by
    figshare
    Authors
    Javier Perales Patón; Rafael Kramann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This record includes data derived from the processing of single-cell and single-nucleus RNA-seq from several samples (see below). These data correspond to the input and intermediate output files from https://github.com/saezlab/Xu_tubuloid .

    Data

    The data include:

    - Binary sparse matrices for the UMI gene expression quantification from cellranger (filtered feature-barcode matrices). These are TAR archive files named after the sample.
    - Seurat objects with normalized data, embeddings of dimensionality reduction, clustering and cell cluster annotation. These are TAR archive files including final objects, grouped by sample type: SeuratObjects_[SortedCells | Organoids | Human Kidney Tissue]. The HumanKidneyTissue archive also includes the Seurat object after Harmony integration.
    - Exported barcode idents from unsupervised clustering and manual annotation ("barcodeIdents*.csv" files).
    - Label transfer via Symphony mapping of tubuloid cells from each organoid to an integrated reference atlas of human kidney tissue (SymphonyMapped*.csv).

    Samples

    The data correspond to the following samples, which were profiled at single-cell resolution:

    - CK5 early organoid (Healthy). Organoid generated from CD24+ sorted cells from human adult kidney tissue at an early stage.
    - CK119 late organoid (Healthy). Organoid generated from CD24+ sorted cells from human adult kidney tissue at a late stage.
    - JX1 late organoid (Healthy). Organoid generated following Hans Clevers' protocol for kidney organoids.
    - JX2 PKD1-KO organoid (PKD). Organoid generated from CD24+ sorted cells from human adult kidney tissue, for which PKD1 was gene-edited to reproduce PKD phenotype, developed at a late stage.
    - JX3 PKD2-KO organoid (PKD). Organoid generated from CD24+ sorted cells from human adult kidney tissue, for which PKD2 was gene-edited to reproduce PKD phenotype, developed at a late stage.
    - CK120 CD13. CD13+ sorted cells from human adult kidney tissue.
    - CK121 CD24. CD24+ sorted cells from human adult kidney tissue.

    In addition, human adult kidney tissue was profiled in the context of ADPKD:

    - CK224: human specimen with ADPKD (PKD2- genotype).
    - CK225: human specimen with ADPKD (PKD1- genotype).
    - ADPKD3: human specimen with ADPKD (ND genotype).
    - Control1: human specimen with healthy tissue.
    - Control2: human specimen with healthy tissue.

  6. Analysis Products: Transcription factor stoichiometry, motif affinity and...

    • zenodo.org
    tsv, zip
    Updated Nov 11, 2023
    Cite
    Surag Nair; Mohamed Ameen; Kevin Wang; Anshul Kundaje (2023). Analysis Products: Transcription factor stoichiometry, motif affinity and syntax regulate single cell chromatin dynamics during fibroblast reprogramming to pluripotency [Dataset]. http://doi.org/10.5281/zenodo.8313962
    Available download formats: zip, tsv
    Dataset updated
    Nov 11, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Surag Nair; Mohamed Ameen; Kevin Wang; Anshul Kundaje
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This record contains analysis products for the paper "Transcription factor stoichiometry, motif affinity and syntax regulate single cell chromatin dynamics during fibroblast reprogramming to pluripotency" by Nair, Ameen et al. Please refer to the READMEs in the directories, which are summarized below.

    The record contains the following files:

    `clusters.tsv`: contains the cluster id, name and colour of clusters in the paper

    scATAC.zip

    Analysis products for the single-cell ATAC-seq data. Contains:

    - `cells.tsv`: list of barcodes that pass QC. Columns include:
      - `barcode`
      - `sample`: (time point)
      - `umap1`
      - `umap2`
      - `cluster`
      - `dpt_pseudotime_fibr_root`: pseudotime values treating a fibroblast cell as root
      - `dpt_pseudotime_xOSK_root`: pseudotime values treating xOSK cell as root
    - `peaks.bed`: list of peaks of 500bp across all cell states. 4th column contains the peak set label. Note that ~5000 peaks are not assigned to any peak set and are marked as NA.
    - `features.tsv`: 50 dimensional representation of each cell
    - `cell_x_peak.mtx.gz`: sparse matrix of fragment counts within peaks. Load using scipy.io.mmread in python or readMM in R. Columns correspond to cells from `cells.tsv` (combine sample + barcode). Rows correspond to peaks in `peaks.bed`
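    The record recommends `scipy.io.mmread` (or `readMM` in R) for loading `cell_x_peak.mtx.gz`. To make the file layout explicit, here is a dependency-free sketch that reads a gzipped Matrix Market coordinate file and sums fragment counts per column (per cell, given the orientation stated above). It assumes the "general" coordinate layout with 1-based indices; the function names are hypothetical, and for real analyses the recommended readers are the better choice.

```python
import gzip


def read_mtx_gz(path):
    """Read a gzipped Matrix Market coordinate file.

    Returns (n_rows, n_cols, entries), where entries maps 0-based
    (row, col) index pairs to values. Rows are peaks and columns are
    cells for the files described in this record.
    """
    entries = {}
    with gzip.open(path, "rt") as handle:
        header = handle.readline()
        if not header.startswith("%%MatrixMarket matrix coordinate"):
            raise ValueError("expected a Matrix Market coordinate file")
        line = handle.readline()
        while line.startswith("%"):  # skip comment lines
            line = handle.readline()
        n_rows, n_cols, n_entries = (int(x) for x in line.split())
        for line in handle:
            if not line.strip():
                continue
            i, j, value = line.split()
            entries[(int(i) - 1, int(j) - 1)] = float(value)
    if len(entries) != n_entries:
        raise ValueError("entry count does not match the size line")
    return n_rows, n_cols, entries


def column_sums(n_cols, entries):
    """Total count per column, i.e. fragments in peaks per cell."""
    sums = [0.0] * n_cols
    for (_, col), value in entries.items():
        sums[col] += value
    return sums
```

    Column `k` of the matrix corresponds to row `k` of `cells.tsv`, so the per-column totals line up with the cell metadata in file order.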

    scATAC_clusters.zip

    Analysis products corresponding to cluster pseudo-bulks of the single-cell ATAC-seq data.

    - `clusters.tsv`: contains the cluster id, name and colour used in the paper
    - `peaks`: contains `overlap_reproducibilty/overlap.optimal_peak` peaks called using ENCODE bulk ATAC-seq pipeline in the narrowPeak format.
    - `fragments`: contains per cluster fragment files

    scATAC_scRNA_integration.zip

    Analysis products from the integration of scATAC with scRNA. Contains:

    - `peak_gene_links_fdr1e-4.tsv`: file with peak gene links passing FDR 1e-4. For analyses in the paper, we filter to peaks with absolute correlation >0.45.
    - `harmony.cca.30.feat.tsv`: 30 dimensional co-embedding for scATAC and scRNA cells obtained by CCA followed by applying Harmony over assay type.
    - `harmony.cca.metadata.tsv`: UMAP coordinates for scATAC and scRNA cells derived from the Harmony CCA embedding. First column contains barcode.
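    The additional filter mentioned for `peak_gene_links_fdr1e-4.tsv` (absolute correlation > 0.45) can be applied while reading the table. The sketch below uses only the standard library; the column name `correlation` is a placeholder, since the record does not list the file's header, and should be replaced with the actual column name.

```python
import csv


def filter_links(handle, min_abs_correlation=0.45, column="correlation"):
    """Return rows of a tab-separated table whose |correlation| exceeds the cutoff.

    `handle` is any open text file or file-like object with a header row.
    `column` is a hypothetical column name; check the real file header.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row for row in reader
        if abs(float(row[column])) > min_abs_correlation
    ]
```

    Taking the absolute value keeps both positively and negatively correlated peak-gene pairs, matching the "absolute correlation" wording above.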

    scRNA.zip

    Analysis products for the single-cell RNA-seq data. Contains:

    - `seurat.rds`: Seurat object that contains expression data (raw counts, normalized, and scaled), reductions (umap, pca), knn graphs, and all associated metadata. Note that the barcode suffix (1-9) corresponds to samples D0, D2, ..., D14, iPSC.
    - `genes.txt`: list of all genes
    - `cells.tsv`: list of barcodes that pass QC across samples. Contains:
      - `barcode_sample`: barcode with index of sample (1-9 corresponding to D0, D2, ..., D14, iPSC)
      - `sample`: sample name (D0, D2, ..., D14, iPSC)
      - `umap1`
      - `umap2`
      - `nCount_RNA`
      - `nFeature_RNA`
      - `cluster`
      - `percent.mt`: percent of mitochondrial transcripts in cell
      - `percent.oskm`: percent of OSKM transcripts in cell
    - `gene_x_cell.mtx.gz`: sparse matrix of gene counts. Load using scipy.io.mmread in python or readMM in R. Columns correspond to cells from `cells.tsv` (barcode suffix contains sample information). Rows correspond to genes in `genes.txt`
    - `pca.tsv`: first 50 PC of each cell
    - `oskm_endo_sendai.tsv`: estimated raw counts (cts, may not be integers) and log(1+ tp10k) normalized expression (norm) for endogenous and exogenous (Sendai derived) counts of POU5F1 (OCT4), SOX2, KLF4 and MYC genes. Rows are consistent with `seurat.rds` and `cells.tsv`
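    The "log(1 + tp10k)" normalization named for `oskm_endo_sendai.tsv` scales a count to transcripts per 10,000 and then applies log(1 + x). A minimal sketch follows, assuming the natural logarithm and that the scaling denominator is the cell's total count; neither detail is stated in the record, and the function name is hypothetical.

```python
import math


def log1p_tp10k(count, total_counts):
    """log(1 + transcripts per 10k) for one gene in one cell.

    Assumes natural log and that `total_counts` is the cell's total count.
    """
    tp10k = count / total_counts * 10_000
    return math.log1p(tp10k)
```

    Because the estimated raw counts in this file may be non-integer, the function accepts floats as well as integers.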

    multiome.zip

    multiome/snATAC:

    These files are derived from the integration of nuclei from multiome (D1M and D2M), with cells from day 2 of scATAC-seq (labeled D2).

    - `cells.tsv`: This is the list of nuclei barcodes that pass QC from multiome AND also cell barcodes from D2 of scATAC-seq. Includes:
      - `barcode`
      - `umap1`: These are the coordinates used for the figures involving multiome in the paper.
      - `umap2`: as for `umap1`
      - `sample`: D1M and D2M correspond to multiome, D2 corresponds to day 2 of scATAC-seq
      - `cluster`: For multiome barcodes, these are labels transferred from scATAC-seq. For D2 scATAC-seq, these are the original cluster labels.
    - `peaks.bed`: This is the same file as scATAC/peaks.bed. List of peaks of 500bp. 4th column contains the peak set label. Note that ~5000 peaks are not assigned to any peak set and are marked as NA.
    - `cell_x_peak.mtx.gz`: sparse matrix of fragment counts within peaks. Load using scipy.io.mmread in python or readMM in R. Columns correspond to cells from `cells.tsv` (combine sample + barcode). Rows correspond to peaks in `peaks.bed`.
    - `features.no.harmony.50d.tsv`: 50 dimensional representation of each cell prior to running Harmony (to correct for batch effect between D2 scATAC and D1M,D2M snMultiome). Rows correspond to cells from `cells.tsv`.
    - `features.harmony.10d.tsv`: 10 dimensional representation of each cell after running Harmony. Rows correspond to cells from `cells.tsv`.

    multiome/snRNA:

    - `seurat.rds`: Seurat object that contains expression data (raw counts, normalized, and scaled), reductions (umap, pca), and associated metadata. Note that the barcode suffix (1, 2) corresponds to samples D1M, D2M. Please use the UMAP/features from snATAC/ for consistency.
    - `genes.txt`: list of all genes (this is different from the list in scRNA analysis)
    - `cells.tsv`: list of barcodes that pass QC across samples. Contains:
      - `barcode_sample`: barcode with index of sample (1,2 corresponding to D1M, D2M respectively)
      - `sample`: sample name (D1M, D2M)
      - `nCount_RNA`
      - `nFeature_RNA`
      - `percent.oskm`: percent of OSKM genes in cell
    - `gene_x_cell.mtx.gz`: sparse matrix of gene counts. Load using scipy.io.mmread in python or readMM in R. Columns correspond to cells from `cells.tsv` (barcode suffix contains sample information). Rows correspond to genes in `genes.txt`

  7. DataSheet_1_Molecular mechanisms regulating natural menopause in the female...

    • frontiersin.figshare.com
    xlsx
    Updated Jul 24, 2023
    Cite
    Quan Liu; Fangqin Wei; Jiannan Wang; Haiyan Liu; Hua Zhang; Min Liu; Kaili Liu; Zheng Ye (2023). DataSheet_1_Molecular mechanisms regulating natural menopause in the female ovary: a study based on transcriptomic data.xlsx [Dataset]. http://doi.org/10.3389/fendo.2023.1004245.s001
    Available download formats: xlsx
    Dataset updated
    Jul 24, 2023
    Dataset provided by
    Frontiers
    Authors
    Quan Liu; Fangqin Wei; Jiannan Wang; Haiyan Liu; Hua Zhang; Min Liu; Kaili Liu; Zheng Ye
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    Natural menopause is an inevitable biological process with significant implications for women's health. However, the molecular mechanisms underlying menopause are not well understood. This study aimed to investigate the molecular and cellular changes occurring in the ovary before and after perimenopause.

    Methods

    Single-cell sequencing data from the GTEx V8 cohort (30-39: 14 individuals; 40-49: 37 individuals; 50-59: 61 individuals) and transcriptome sequencing data from ovarian tissue were analyzed. Seurat was used for single-cell sequencing data analysis, while harmony was employed for data integration. Cell differentiation trajectories were inferred using CytoTrace. CIBERSORTX assessed cell infiltration scores in ovarian tissue. WGCNA evaluated co-expression network characteristics in pre- and post-perimenopausal ovarian tissue. Functional enrichment analysis of co-expression modules was conducted using ClusterprofileR and Metascape. DESeq2 performed differential expression analysis. Master regulator analysis and signaling pathway activity analysis were carried out using MsViper and Progeny, respectively. Machine learning models were constructed using Orange3.

    Results

    We identified the differentiation trajectory of follicular cells in the ovary as ARID5B+ Granulosa -> JUN+ Granulosa -> KRT18+ Granulosa -> MT-CO2+ Granulosa -> GSTA1+ Granulosa -> HMGB1+ Granulosa. Genes driving Granulosa differentiation, including RBP1, TMSB10, SERPINE2, and TMSB4X, were enriched in ATP-dependent activity regulation pathways. Genes involved in maintaining the Granulosa state, such as DCN, ARID5B, EIF1, and HSP90AB1, were enriched in the response to unfolded protein and chaperone-mediated protein complex assembly pathways. Increased contents of terminally differentiated HMGB1+ Granulosa and GSTA1+ Granulosa were observed in the ovaries of individuals aged 50-69. Signaling pathway activity analysis indicated a gradual decrease in TGFb and MAPK pathway activity with menopause progression, while p53 pathway activity increased. Master regulator analysis revealed significant activation of transcription factors FOXR1, OTX2, MYBL2, HNF1A, and FOXN4 in the 30-39 age group, and GLI1, SMAD1, SMAD7, APP, and EGR1 in the 40-49 age group. Additionally, a diagnostic model based on 16 transcription factors (Logistic Regression L2) achieved reliable performance in determining ovarian status before and after perimenopause.

    Conclusion

    This study provides insights into the molecular and cellular mechanisms underlying natural menopause in the ovary. The findings contribute to our understanding of perimenopausal changes and offer a foundation for health management strategies for women during this transition.

  8. Visium Spatial and snRNA data of Brain section from Parkinson Mouse Model...

    • zenodo.org
    bin, csv, zip
    Updated Jun 5, 2025
    Cite
    Jaehyun Lee; Jaehyun Lee (2025). Visium Spatial and snRNA data of Brain section from Parkinson Mouse Model based on inducible expression of human a-syn constructs: 20-months + snRNA 23 months dataset [Dataset]. http://doi.org/10.5281/zenodo.14988055
    Explore at:
    Available download formats: csv, bin, zip
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Jaehyun Lee; Jaehyun Lee
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Using 23-month-old mice from a Parkinson mouse model based on inducible expression of human a-syn constructs, we produced a single-nucleus RNA dataset from tissue cut between 0 mm and -5 mm Bregma. The Chromium 3' Single Cell Library Kit (10x Genomics) was used and sequencing was performed on a NovaSeq 6000. From the same model, we also profiled 20-month-old mice with the Visium Spatial V1 platform (10x Genomics), again sequenced on a NovaSeq 6000. Both runs used paired-end 150 bp (PE150) reads.

    snRNA pipeline: For the alignment of reads, a custom reference was created by adding the sequences of the V1S/SV2 transgene and the Camk2a promoter to the mm10 mouse reference genome. Count matrices generated by cellranger count 7.1 were loaded into an AnnData object and processed using the Python-based framework Scanpy 1.10.2. Integration with R, where needed, was facilitated through the rpy2 package. Raw count matrices were corrected for ambient RNA contamination using SoupX 1.6.2. To remove potential doublets, scDblFinder 1.18.0 was employed with a fixed seed (123). Nuclei with nUMI and nGenes values exceeding three median absolute deviations (MADs) from the median were excluded. Genes detected in fewer than five nuclei across the dataset were excluded. The resulting dataset was normalized via scanpy.pp.normalize_total and scanpy.pp.log1p. Highly variable genes were identified using the function scanpy.pp.highly_variable_genes with the Seurat v3 flavor, selecting the top 4,000 genes. Dimensionality reduction was performed using principal component analysis (PCA), and batch effects were corrected using the Python implementation of Harmony via the function scanpy.external.pp.harmony_integrate. Harmony embeddings were then used to construct a k-nearest neighbor (kNN) graph with scanpy.pp.neighbors. Clustering was performed using Leiden clustering with standard parameters via the function scanpy.tl.leiden. Clusters were annotated using literature, mousebrain.org, and markers identified via the FindConservedMarkers function in Seurat. First, neurons and non-neuronal cells were distinguished using mainly canonical markers, such as but not limited to Rbfox3 (neurons), Mbp (oligodendrocytes), Acsbg1 (astrocytes), Pdgfra (oligodendrocyte precursor cells), Inpp5d (microglia), Colec12 (vascular cells), and Ttr (choroid plexus cells). Neurons were further classified into Vglut1 (Slc17a7), Vglut2 (Slc17a6), GABA (Gad2), cholinergic (Scube1), and dopaminergic (Th) neurons. Vglut1 and GABA neurons were further annotated into subtypes based on subclustering and FindConservedMarkers markers.
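
    The MAD-based nucleus filter described in the snRNA pipeline can be sketched in plain Python as below. This is only an illustration under stated assumptions: the function names, the toy values, the use of the unscaled MAD on raw (not log-transformed) counts, and the choice to drop a nucleus when either metric is an outlier are all hypothetical and are not taken from the dataset authors' code.

```python
from statistics import median


def outside_mads(values, n_mads=3.0):
    """Flag each value lying more than n_mads MADs from the median.

    Assumption: the MAD is unscaled and computed on the raw values.
    """
    center = median(values)
    mad = median(abs(v - center) for v in values)
    return [abs(v - center) > n_mads * mad for v in values]


def filter_nuclei(n_umi, n_genes, n_mads=3.0):
    """Return indices of nuclei that are not outliers on nUMI or nGenes."""
    bad_umi = outside_mads(n_umi, n_mads)
    bad_genes = outside_mads(n_genes, n_mads)
    return [i for i in range(len(n_umi)) if not (bad_umi[i] or bad_genes[i])]


# Hypothetical per-nucleus QC values, for illustration only.
n_umi = [1000, 1100, 950, 1050, 9000, 1020]
n_genes = [500, 520, 480, 510, 505, 30]
kept = filter_nuclei(n_umi, n_genes)
print(kept)
```

    In this toy input, the nucleus with 9000 UMIs and the nucleus with 30 detected genes are the ones excluded.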

    Visium spatial pipeline: Sequences were fiducially aligned to spots using Loupe Browser version 8. All aligned sequences were mapped to the mouse genome mm39 using spaceranger count 3.0.1 with a custom reference that added the promoter and transgene sequences (Camk2aTTA, V1S/SV2). We filtered each sample of the Visium Spatial dataset using MAD-based thresholds on the number of reads (nUMI), the number of genes (nGene), and the percentage of mitochondrial genes (percent.mt). A spot was filtered out if it lay more than 3 MADs from the median in at least two of these metrics. Filtered samples were merged into one Seurat 5.1.0 object, and normalized counts were obtained with Seurat's SCTransform function. Integration was performed using Harmony 1.2.0 on 50 PCA embeddings, and clustering was done using Leiden clustering based on 30 Harmony embeddings. Integrated clusters were visualized using UMAP. Samples that were not successfully integrated (based on similarity measures of the Harmony embeddings) and that showed high percent.mt or low nUMI levels compared to other samples were removed from subsequent analysis. A final integration and clustering were performed after filtering. Regions were first annotated based on a clustering at resolution 0.1 to obtain high-level region annotations (Cortex, Hippocampus, Subcortex). Each high-level region was then annotated further using either finer clustering resolutions or subclustering. Marker genes from mousebrain.org and the literature were used in combination with the Allen Mouse Brain Atlas to obtain anatomically relevant annotations.
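
    The "at least two metrics" rule for Visium spots can be illustrated with a small plain-Python sketch. The names `mad_flags` and `keep_spots`, the toy values, and the use of an unscaled, two-sided MAD threshold are assumptions made for illustration; the description does not state how the authors computed the MAD or whether the threshold was one-sided.

```python
from statistics import median


def mad_flags(values, n_mads=3.0):
    """Flag values more than n_mads (unscaled) MADs away from the median."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    return [abs(v - center) > n_mads * mad for v in values]


def keep_spots(metrics, n_mads=3.0, min_flagged=2):
    """Keep a spot unless it is an outlier in at least min_flagged metrics.

    metrics maps a metric name to a list with one value per spot.
    """
    flags = [mad_flags(vals, n_mads) for vals in metrics.values()]
    n_spots = len(flags[0])
    return [sum(f[i] for f in flags) < min_flagged for i in range(n_spots)]


# Hypothetical per-spot QC values for one sample, for illustration only.
metrics = {
    "nUMI": [5000, 5200, 4900, 5100, 300, 5050, 20000],
    "nGene": [2000, 2100, 1950, 2050, 150, 2020, 2040],
    "percent.mt": [5.0, 5.5, 4.8, 5.2, 40.0, 5.1, 5.3],
}
keep = keep_spots(metrics)
print(keep)
```

    Here the fifth spot is an outlier in all three metrics and is dropped, while the last spot is an outlier only in nUMI and is therefore kept.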

  9. Data Sheet 1_Single-cell transcriptome and multi-omics integration reveal...

    • frontiersin.figshare.com
    docx
    Updated Jun 25, 2025
    Cite
    Yushun Wu; Jing Liu; Wenying Yu; Xiaoding Wang; Jian Li; Weiquan Zeng (2025). Data Sheet 1_Single-cell transcriptome and multi-omics integration reveal ferroptosis-driven immune microenvironment remodeling in knee osteoarthritis.docx [Dataset]. http://doi.org/10.3389/fimmu.2025.1608378.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    Jun 25, 2025
    Dataset provided by
    Frontiers
    Authors
    Yushun Wu; Jing Liu; Wenying Yu; Xiaoding Wang; Jian Li; Weiquan Zeng
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Knee osteoarthritis (KOA) is a chronic inflammatory joint disorder marked by cartilage degradation and immune microenvironment dysregulation. While transcriptomic studies have identified key pathways in KOA, the interplay between ferroptosis (an iron-dependent cell death mechanism) and immune dysfunction at single-cell resolution remains unexplored. This study integrates single-cell and bulk transcriptomics to dissect ferroptosis-driven immune remodeling and identify diagnostic biomarkers in KOA.

    Methods: We analyzed scRNA-seq data (GSE255460, n = 11) and bulk RNA-seq cohorts (GSE114007: 20 KOA/18 controls; GSE246425: 8 KOA/4 controls). Single-cell data were processed via Seurat (QC: mitochondrial genes >3 MAD; normalization: LogNormalize; batch correction: Harmony) and annotated using CellMarker/PanglaoDB. CellChat decoded intercellular communication, SCENIC reconstructed transcriptional networks, and Monocle2 mapped pseudotime trajectories. Immune infiltration (CIBERSORT) and a LASSO-SVM diagnostic model were validated by ROC curves. Functional enrichment (GSEA/GSVA) and immunometabolic profiling were performed.

    Results: Twelve chondrocyte clusters were identified, including ferroptosis-active homeostasis chondrocytes (HomC) (p < 0.01), which exhibited 491 DEGs linked to lipid peroxidation. HomC orchestrated synovitis via FGF signaling (ligand-receptor pairs: FGF1-FGFR1), amplifying ECM degradation and inflammatory cascades (CellChat). SCENIC revealed 10 HomC-specific regulons (e.g., SREBF1, YY1) driving matrix metalloproteinase activation. A 7-gene diagnostic panel (IFT88, MIEF2, ABCC10, etc.) achieved AUC = 1.0 (training) and 0.78 (validation). Immune profiling showed reduced resting mast cells (p = 0.003) and monocytes (p = 0.02), with ABCC10 correlating negatively with CD8+ T cells (r = -0.65) and M1 macrophages. GSEA/GSVA implicated HIF-1, NF-κB, and oxidative phosphorylation pathways in KOA progression. Pseudotime analysis revealed fibrotic transitions (COL1A1↑, TNC↑) in late-stage KOA cells.

    Conclusion: This study establishes ferroptosis as one of the key drivers of immune-metabolic dysfunction in KOA, with HomC acting as a hub for FGF-mediated synovitis and ECM remodeling. The diagnostic model and regulon network (SREBF1/YY1) offer translational tools for early detection, while impaired mast cell homeostasis highlights novel immunotherapeutic targets. Our findings bridge ferroptosis, immune dysregulation, and metabolic stress, advancing precision strategies for KOA management.

