60 datasets found

Data from: Large-scale integration of single-cell transcriptomic data...
data.niaid.nih.gov
data-staging.niaid.nih.gov
zip
Updated Dec 14, 2021
Cite
David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove (2021). Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration [Dataset]. http://doi.org/10.5061/dryad.t4b8gtj34
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.t4b8gtj34
Dataset updated
Dec 14, 2021
Dataset provided by
Cornell University
Authors
David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove
License
https://spdx.org/licenses/CC0-1.0.html
Description
Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. 
We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.

Methods

Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (Mus musculus) were obtained from Jackson Laboratories (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute on Aging (NIA) Rodent Aging Colony and were used at 20 months of age. For new scRNAseq experiments, female mice were used in each experiment.

Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of old (20 months) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 and 40 μm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed through incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).

Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 10^6 cells/ml. Cells were counted manually with a hemocytometer to determine their concentration. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes. After preparation, libraries were sequenced on a NextSeq 500 (Illumina; San Diego, CA) using 75 cycle high output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Details on estimated sequencing saturation and the number of reads per sample are shown in Sup. Data 1.

Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5 mo) C57BL/6J mice were injected with 10 µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and frozen fresh in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). Optimal tissue permeabilization time for 10 µm thick sections was found to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E stained tissue sections were imaged using a Zeiss PALM MicroBeam laser capture microdissection system, and the images were stitched and processed using Fiji ImageJ software. cDNA libraries were sequenced on an Illumina NextSeq 500 using 150 cycle high output kits (Read 1 = 28 bp, Read 2 = 120 bp, Index 1 = 10 bp, and Index 2 = 10 bp). Frames around the capture area on the Visium slide were aligned manually and spots covering the tissue were selected using Loupe Browser v4.0.0 software (10x Genomics). Sequencing data was then aligned to the mouse reference genome (mm10) using the spaceranger v1.0.0 pipeline to generate a feature-by-spot-barcode expression matrix (10x Genomics).

Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples which were only available as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).

Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCounts and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset individually. BCmvn optimization was used for pK parameterization. Estimated doublet rates were computed by fitting the total number of cells after quality filtering to a linear regression of the expected doublet rates published in the 10x Chromium handbook. Estimated homotypic doublet rates were also accounted for using the modelHomotypic function. The default pN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch correction with three tools, independently: Harmony (github.com/immunogenomics/harmony) (v1.0), Scanorama (github.com/brianhie/scanorama) (v1.3), and BBKNN (github.com/Teichlab/bbknn) (v1.3.12). We then used Seurat to process the integrated data. After initial integration, we removed the noisy cluster and re-integrated the data using each of the three batch-correction tools.
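The quality thresholds stated above (750 features, 1000 transcripts, 30% mitochondrial transcripts) can be illustrated with a minimal sketch in plain Python. This is not the authors' Seurat code: the `CellQC` record, its field names, and the `passes_qc` helper are hypothetical, and the example cells are made up.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Hypothetical per-cell summary used only for this illustration."""
    barcode: str
    n_features: int     # number of genes detected in the cell
    n_transcripts: int  # total transcript (UMI) count for the cell
    pct_mito: float     # percent of transcripts from mitochondrial genes


# Thresholds taken from the text above.
MIN_FEATURES = 750
MIN_TRANSCRIPTS = 1000
MAX_PCT_MITO = 30.0


def passes_qc(cell):
    """Keep a cell unless it falls below either count threshold or exceeds the mito threshold."""
    return (cell.n_features >= MIN_FEATURES
            and cell.n_transcripts >= MIN_TRANSCRIPTS
            and cell.pct_mito <= MAX_PCT_MITO)


# Made-up example cells.
cells = [
    CellQC("AAAC", 1200, 3500, 4.2),
    CellQC("AAAG", 600, 2500, 3.0),    # too few features
    CellQC("AAAT", 900, 800, 2.0),     # too few transcripts
    CellQC("AACA", 2000, 6000, 45.0),  # too many mitochondrial transcripts
]
kept = [c.barcode for c in cells if passes_qc(c)]
```

Only the first example cell survives all three filters; the other three each fail exactly one criterion.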

Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical genes (Fig. S3). Clusters which had similar canonical marker gene expression patterns were merged.
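Choosing the "dimensions accounting for 95% of the total variance" can be sketched as a cumulative sum over per-component variances. The snippet below is an illustration only: the helper name `dims_for_variance` is hypothetical, the standard deviations are made up, and it assumes "total variance" refers to the variance captured by the computed components.

```python
def dims_for_variance(stdevs, fraction=0.95):
    """Return the smallest number of leading components whose summed
    variance reaches `fraction` of the total across all given components."""
    variances = [s * s for s in stdevs]
    total = sum(variances)
    running = 0.0
    for i, v in enumerate(variances, start=1):
        running += v
        if running >= fraction * total:
            return i
    return len(variances)


# Made-up standard deviations for five components, in decreasing order.
example_stdevs = [6.0, 4.0, 2.0, 1.0, 1.0]
n_dims = dims_for_variance(example_stdevs)
```

With these toy values the variances are 36, 16, 4, 1, and 1, so the first three components are enough to pass the 95% mark.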

Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).

Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were re-labeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myocytes” (bins 11-18). Culture-associated muscle stem cells were ignored and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified with differential gene expression analysis using Seurat (FindAllMarkers, using Wilcoxon Rank Sum Test; results in Sup. Data 2). The resulting genes were filtered based on average log2-fold change (avg_logFC > 1) and the percentage of cells within the cluster which express each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each of the cell types, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using
cellCounts
researchdata.edu.au
opal.latrobe.edu.au
Updated Dec 19, 2022
Cite
Yang Liao; Wei Shi; Lisa Mielke; Dinesh Raghu; Bhupinder Pal (2022). cellCounts [Dataset]. http://doi.org/10.26181/21588276.V3
Unique identifier
https://doi.org/10.26181/21588276.V3
Dataset updated
Dec 19, 2022
Dataset provided by
La Trobe University
Authors
Yang Liao; Wei Shi; Lisa Mielke; Dinesh Raghu; Bhupinder Pal
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This page includes the data and code necessary to reproduce the results of the following paper:

Yang Liao, Dinesh Raghu, Bhupinder Pal, Lisa Mielke and Wei Shi. cellCounts: fast and accurate quantification of 10x Chromium single-cell RNA sequencing data. Under review.

A Linux computer running CentOS 7 (or later) or Ubuntu 20.04 (or later) is recommended for this analysis. The computer should have >2 TB of disk space and >64 GB of RAM. The following software packages need to be installed before running the analysis. Software executables generated after installation should be included in the $PATH environment variable.

R (v4.0.0 or newer) https://www.r-project.org/

Rsubread (v2.12.2 or newer) http://bioconductor.org/packages/3.16/bioc/html/Rsubread.html

CellRanger (v6.0.1) https://support.10xgenomics.com/single-cell-gene-expression/software/overview/welcome

STARsolo (v2.7.10a) https://github.com/alexdobin/STAR

sra-tools (v2.10.0 or newer) https://github.com/ncbi/sra-tools

Seurat (v3.0.0 or newer) https://satijalab.org/seurat/

edgeR (v3.30.0 or newer) https://bioconductor.org/packages/edgeR/

limma (v3.44.0 or newer) https://bioconductor.org/packages/limma/

mltools (v0.3.5 or newer) https://cran.r-project.org/web/packages/mltools/index.html

Reference packages generated by 10x Genomics are also required for this analysis and they can be downloaded from the following link (2020-A version for individual human and mouse reference packages should be selected):

https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest

Once these prerequisites are in place, you can simply run the shell script ‘test-all-new.bash’ to perform all the analyses carried out in the paper. This script will automatically download the mixture scRNA-seq data from the SRA database, and it will output a text file called ‘test-all.log’ that contains all the screen outputs and speed/accuracy results of CellRanger, STARsolo and cellCounts.
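Before launching the script, it may help to confirm that the required executables are on $PATH and that enough disk space is free. The following is a minimal sketch using only the Python standard library; it is not part of the deposited code. The executable names in `REQUIRED` are assumptions about how the tools are installed locally, the helper names are hypothetical, and the RAM requirement is not checked because the standard library offers no portable call for it.

```python
import shutil


def missing_executables(names):
    """Return the names that cannot be found on the current $PATH."""
    return [n for n in names if shutil.which(n) is None]


def has_enough_disk(path=".", required_bytes=2 * 1000**4):
    """Check whether the filesystem holding `path` has more than
    `required_bytes` free (2 TB by default, per the stated requirement)."""
    return shutil.disk_usage(path).free > required_bytes


# Assumed executable names; adjust them to match the local installation.
REQUIRED = ["R", "cellranger", "STAR", "fasterq-dump"]

missing = missing_executables(REQUIRED)
disk_ok = has_enough_disk(".")
print("Missing executables:", missing)
print("More than 2 TB free:", disk_ok)
```

An empty `missing` list and `disk_ok` set to `True` suggest the environment meets these two requirements; the R packages still need to be checked from within R.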
scRNA-seq Human Pluripotent Stem Cells Messmer2019
kaggle.com
zip
Updated May 1, 2022
Cite
Alexander Chervov (2022). scRNA-seq Human Pluripotent Stem Cells Messmer2019 [Dataset]. https://www.kaggle.com/datasets/alexandervc/scrnaseq-human-pluripotent-stem-cells-messmer2019
Available download formats: zip (57267380 bytes)
Dataset updated
May 1, 2022
Authors
Alexander Chervov
Description
Remark: for cell cycle analysis - see paper https://arxiv.org/abs/2208.05229 "Computational challenges of cell cycle analysis using single cell transcriptomics" Alexander Chervov, Andrei Zinovyev

Data and Context

Data: results of single-cell RNA sequencing, i.e. rows correspond to cells and columns to genes (or vice versa). Each value of the matrix shows how strongly the corresponding gene is expressed in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics
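The matrix layout described above can be shown with a toy example. This is only a sketch with made-up gene names and counts, and the helper names `transpose` and `expression` are hypothetical; the actual files in this dataset may use either orientation.

```python
# Toy count matrix: rows are cells, columns are genes (all values are made up).
genes = ["GeneA", "GeneB", "GeneC"]
cells = ["cell_1", "cell_2"]
counts = [
    [5, 0, 2],  # cell_1
    [0, 3, 1],  # cell_2
]


def transpose(matrix):
    """Swap orientation between cells-by-genes and genes-by-cells."""
    return [list(col) for col in zip(*matrix)]


def expression(matrix, row_names, col_names, cell, gene):
    """Look up the count for one gene in one cell of a cells-by-genes matrix."""
    return matrix[row_names.index(cell)][col_names.index(gene)]


genes_by_cells = transpose(counts)
value = expression(counts, cells, genes, "cell_2", "GeneB")
```

Checking the matrix shape against the number of cell and gene labels is a quick way to tell which orientation a given file uses.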

Particular data: https://pubmed.ncbi.nlm.nih.gov/30673604/ Cell Rep. 2019 Jan 22;26(4):815-824.e4. doi: 10.1016/j.celrep.2018.12.099. Transcriptional Heterogeneity in Naive and Primed Human Pluripotent Stem Cells at Single-Cell Resolution. Tobias Messmer, Ferdinand von Meyenn, Aurora Savino, Fátima Santos, Hisham Mohammed, Aaron Tin Long Lun, John C Marioni, Wolf Reik

Data in two variants: 1) scRNA-seq count matrix, downloaded from database of R-package "scRNAseq", see script: https://www.kaggle.com/alexandervc/rpackage-scrnaseq-downloads-datasets 2) Directly uploaded from E-MTAB-6819 https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-6819/

Related datasets:

Other single cell RNA seq datasets can be found on kaggle: Look here: https://www.kaggle.com/alexandervc/datasets Or search kaggle for "scRNA-seq"

Inspiration

Single cell RNA sequencing is an important technology in modern biology, see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6

Also see the review in Nature Methods by P. Kharchenko: "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x

Search scholar.google "challenges in single cell rna sequencing" https://scholar.google.fr/scholar?q=challenges+in+single+cell+rna+sequencing&hl=en&as_sdt=0&as_vis=1&oi=scholart gives many interesting and highly cited articles

(Cited 968) Computational and analytical challenges in single-cell transcriptomics Oliver Stegle, Sarah A. Teichmann, John C. Marioni Nat. Rev. Genet., 16 (3) (2015), pp. 133-145 https://www.nature.com/articles/nrg3833

Challenges in unsupervised clustering of single-cell RNA-seq data https://www.nature.com/articles/s41576-018-0088-9 Review Article 07 January 2019 Vladimir Yu Kiselev, Tallulah S. Andrews & Martin Hemberg Nature Reviews Genetics volume 20, pages 273–282 (2019)

Challenges and emerging directions in single-cell analysis https://link.springer.com/article/10.1186/s13059-017-1218-y Published: 08 May 2017 Guo-Cheng Yuan, Long Cai, Michael Elowitz, Tariq Enver, Guoping Fan, Guoji Guo, Rafael Irizarry, Peter Kharchenko, Junhyong Kim, Stuart Orkin, John Quackenbush, Assieh Saadatpour, Timm Schroeder, Ramesh Shivdasani & Itay Tirosh Genome Biology volume 18, Article number: 84 (2017)

Single-Cell RNA Sequencing in Cancer: Lessons Learned and Emerging Challenges https://www.sciencedirect.com/science/article/pii/S1097276519303569 Molecular Cell Volume 75, Issue 1, 11 July 2019, Pages 7-12
Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset
data.niaid.nih.gov
Updated Nov 20, 2023
Cite
Hsu, Jonathan; Stoop, Allart (2023). Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10011621
Dataset updated
Nov 20, 2023
Authors
Hsu, Jonathan; Stoop, Allart
Description
Table of Contents

1. Main Description
2. File Descriptions
3. Linked Files
4. Installation and Instructions

1. Main Description

This is the Zenodo repository for the manuscript titled "A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity.". The code included in the file titled marengo_code_for_paper_jan_2023.R was used to generate the figures from the single-cell RNA sequencing data. The following libraries are required for script execution:

Seurat, scRepertoire, ggplot2, stringr, dplyr, ggridges, ggrepel, ComplexHeatmap

File Descriptions

The code can be downloaded and opened in RStudio. The "marengo_code_for_paper_jan_2023.R" file contains all the code needed to reproduce the figures in the paper. The "Marengo_newID_March242023.rds" file is available at the following address: https://zenodo.org/badge/DOI/10.5281/zenodo.7566113.svg (Zenodo DOI: 10.5281/zenodo.7566113). The "all_res_deg_for_heat_updated_march2023.txt" file contains the unfiltered results from the DGE analysis, also used to create the heatmap with DGE and volcano plots. The "genes_for_heatmap_fig5F.xlsx" file contains the genes included in the heatmap in figure 5F.

Linked Files

This repository contains code for the analysis of a single cell RNA-seq dataset. The dataset contains raw FASTQ files, as well as the aligned files that were deposited in GEO. The "Rdata" or "Rds" file was deposited in Zenodo. Provided below are descriptions of the linked datasets:

Gene Expression Omnibus (GEO) ID: GSE223311 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223311)

Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment. Description: This submission contains the "matrix.mtx", "barcodes.tsv", and "genes.tsv" files for each replicate and condition, corresponding to the aligned files for single cell sequencing data. Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

Sequence read archive (SRA) repository ID: SRX19088718 and SRX19088719

Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment. Description: This submission contains the raw sequencing (.fastq.gz) files, which are gzip-compressed text files. Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

Zenodo DOI: 10.5281/zenodo.7566113 (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ)

Title: A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity. Description: This submission contains the "Rdata" or ".Rds" file, which is an R object file. This is a necessary file to use the code. Submission type: Restricted Access. In order to gain access to the repository, you must contact the author.
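The "matrix.mtx", "barcodes.tsv", and "genes.tsv" triplet mentioned for the GEO submission is a sparse matrix in Matrix Market coordinate format plus two label files. The sketch below parses a tiny in-memory stand-in with the standard library only. It assumes the common 10x-style layout in which rows are genes and columns are barcodes, uses made-up values, and simplifies genes.tsv to a single column of symbols; the `read_mtx` helper is hypothetical and is not part of the deposited R code.

```python
import io


def read_mtx(handle):
    """Parse a Matrix Market coordinate file into
    (n_rows, n_cols, {(row, col): value}) using 0-based indices."""
    header = handle.readline()
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("not a Matrix Market coordinate file")
    line = handle.readline()
    while line.startswith("%"):  # skip comment lines
        line = handle.readline()
    n_rows, n_cols, n_entries = (int(x) for x in line.split())
    entries = {}
    for _ in range(n_entries):
        r, c, v = handle.readline().split()
        entries[(int(r) - 1, int(c) - 1)] = int(v)  # file indices are 1-based
    return n_rows, n_cols, entries


# Tiny in-memory stand-ins for matrix.mtx, genes.tsv and barcodes.tsv.
mtx_text = """%%MatrixMarket matrix coordinate integer general
%
3 2 3
1 1 4
3 1 1
2 2 7
"""
genes = ["Cd4", "Cd8a", "Actb"]
barcodes = ["AAAC-1", "AAAG-1"]

n_genes, n_cells, entries = read_mtx(io.StringIO(mtx_text))

# Regroup the sparse entries as one {gene: count} dictionary per barcode.
per_cell = {bc: {} for bc in barcodes}
for (g, c), v in entries.items():
    per_cell[barcodes[c]][genes[g]] = v
```

In practice the R script would load these files through Seurat; the point here is only to show how the three files relate to each other.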

Installation and Instructions

The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:

Ensure you have R version 4.1.2 or higher for compatibility.

Although it is not essential, you can use RStudio (version 2022.12.0+353) for accessing and executing the code.

Download the "Rdata" or ".Rds" file from Zenodo (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ) (Zenodo DOI: 10.5281/zenodo.7566113).

Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.

Set your working directory to where the following files are located:

marengo_code_for_paper_jan_2023.R
Install_Packages.R
Marengo_newID_March242023.rds
genes_for_heatmap_fig5F.xlsx
all_res_deg_for_heat_updated_march2023.txt

You can use the following code to set the working directory in R:

setwd(directory)

Open the file titled "Install_Packages.R" and execute it in your R IDE. This script will attempt to install all the necessary packages and their dependencies in order to set up an environment where the code in "marengo_code_for_paper_jan_2023.R" can be executed.

Once the "Install_Packages.R" script has been successfully executed, restart RStudio or your IDE of choice.

Open the file "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice.

Execute the commands in "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice to generate the plots.
Repository for the single cell RNA sequencing data analysis for the human...
explore.openaire.eu
Updated Aug 26, 2023
Cite
Jonathan; Andrew; Pierre; Allart; Adrian (2023). Repository for the single cell RNA sequencing data analysis for the human manuscript. [Dataset]. http://doi.org/10.5281/zenodo.8286134
Unique identifier
https://doi.org/10.5281/zenodo.8286134
Dataset updated
Aug 26, 2023
Authors
Jonathan; Andrew; Pierre; Allart; Adrian
Description
This is the GitHub repository for the single cell RNA sequencing data analysis for the human manuscript. The following essential libraries are required for script execution:

Seurat, scRepertoire, ggplot2, dplyr, ggridges, ggrepel, ComplexHeatmap

Linked File:
--------------------------------------

This repository contains code for the analysis of a single cell RNA-seq dataset. The dataset contains raw FASTQ files, as well as the aligned files that were deposited in GEO. Provided below are descriptions of the linked datasets:

1. Gene Expression Omnibus (GEO) ID: GSE229626
- Title: Gene expression profile at single cell level of human T cells stimulated via antibodies against the T Cell Receptor (TCR)
- Description: This submission contains the matrix.mtx, barcodes.tsv, and genes.tsv files for each replicate and condition, corresponding to the aligned files for single cell sequencing data.
- Submission type: Private. In order to gain access to the repository, you must use a "reviewer token" (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

2. Sequence read archive (SRA) repository
- Title: Gene expression profile at single cell level of human T cells stimulated via antibodies against the T Cell Receptor (TCR)
- Description: This submission contains the raw sequencing (.fastq.gz) files, which are gzip-compressed text files.
- Submission type: Private. In order to gain access to the repository, you must use a "reviewer token" (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

Please note that since the GSE submission is private, the raw data deposited at SRA may not be accessible until the embargo on GSE229626 has been lifted.

Installation and Instructions
--------------------------------------

The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:

> Ensure you have R version 4.1.2 or higher for compatibility.
> Although it is not essential, you can use RStudio (version 2022.12.0+353) for accessing and executing the code.

The following code can be used to set the working directory in R:

> setwd(directory)

Steps:

1. Download the "Human_code_April2023.R" and "Install_Packages.R" R scripts, and the processed data from GSE229626.
2. Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.
3. Set your working directory to where the following files are located:
- Human_code_April2023.R
- Install_Packages.R
4. Open the file titled Install_Packages.R and execute it in your R IDE. This script will attempt to install all the necessary packages and their dependencies.
5. Open the Human_code_April2023.R R script and execute commands as necessary.
scRNA-seq Kolodziejczyk et al. (2015)
kaggle.com
zip
Updated Apr 30, 2022
Cite
Alexander Chervov (2022). scRNA-seq Kolodziejczyk et al. (2015) [Dataset]. https://www.kaggle.com/datasets/alexandervc/scrnaseq-kolodziejczyk-et-al-2015
Available download formats: zip (13439744 bytes)
Dataset updated
Apr 30, 2022
Authors
Alexander Chervov
Description
Remark: for cell cycle analysis - see paper https://arxiv.org/abs/2208.05229 "Computational challenges of cell cycle analysis using single cell transcriptomics" Alexander Chervov, Andrei Zinovyev

Data and Context

Data: results of single-cell RNA sequencing, i.e. rows correspond to cells and columns to genes (or vice versa). Each value of the matrix shows how strongly the corresponding gene is expressed in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics

Particular data: Data from the paper: Kolodziejczyk, A. A., J. K. Kim, J. C. Tsang, T. Ilicic, J. Henriksson, K. N. Natarajan, A. C. Tuck, et al. 2015. “Single cell RNA-Sequencing of pluripotent states unlocks modular transcriptional variation.” Cell Stem Cell 17 (4): 471–85. https://pubmed.ncbi.nlm.nih.gov/26431182/

scRNA-seq count matrix, downloaded from database of R-package "scRNAseq", see script: https://www.kaggle.com/alexandervc/rpackage-scrnaseq-downloads-datasets

Related datasets:

Other single cell RNA seq datasets can be found on kaggle: Look here: https://www.kaggle.com/alexandervc/datasets Or search kaggle for "scRNA-seq"

Inspiration

Single cell RNA sequencing is an important technology in modern biology, see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6

Also see the review in Nature Methods by P. Kharchenko: "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x

Search scholar.google "challenges in single cell rna sequencing" https://scholar.google.fr/scholar?q=challenges+in+single+cell+rna+sequencing&hl=en&as_sdt=0&as_vis=1&oi=scholart gives many interesting and highly cited articles

(Cited 968) Computational and analytical challenges in single-cell transcriptomics Oliver Stegle, Sarah A. Teichmann, John C. Marioni Nat. Rev. Genet., 16 (3) (2015), pp. 133-145 https://www.nature.com/articles/nrg3833

Challenges in unsupervised clustering of single-cell RNA-seq data https://www.nature.com/articles/s41576-018-0088-9 Review Article Published: 07 January 2019 Vladimir Yu Kiselev, Tallulah S. Andrews & Martin Hemberg Nature Reviews Genetics volume 20, pages 273–282 (2019)

Challenges and emerging directions in single-cell analysis https://link.springer.com/article/10.1186/s13059-017-1218-y Published: 08 May 2017 Guo-Cheng Yuan, Long Cai, Michael Elowitz, Tariq Enver, Guoping Fan, Guoji Guo, Rafael Irizarry, Peter Kharchenko, Junhyong Kim, Stuart Orkin, John Quackenbush, Assieh Saadatpour, Timm Schroeder, Ramesh Shivdasani & Itay Tirosh Genome Biology volume 18, Article number: 84 (2017)

Single-Cell RNA Sequencing in Cancer: Lessons Learned and Emerging Challenges https://www.sciencedirect.com/science/article/pii/S1097276519303569 Molecular Cell Volume 75, Issue 1, 11 July 2019, Pages 7-12
Breast Cancer Single-Cell RNA-Seq Dataset
ega-archive.org
Cite
Breast Cancer Single-Cell RNA-Seq Dataset [Dataset]. https://ega-archive.org/datasets/EGAD00001007495
License
https://ega-archive.org/dacs/EGAC00001001974
Description
Single-cell RNA-Sequencing of 26 primary breast cancers from the Wu et al. (2021) study. Data was generated using the Chromium controller (10X Genomics) and sequenced on the NextSeq 500 platform.
scPerturb Single-Cell Perturbation Data: RNA and protein h5ad files
plus.figshare.com
hdf
Updated Sep 29, 2023
Cite
Stefan Peidli; Tessa D. Green; Ciyue Shen; Torsten Gross; Joseph Min; Samuele Garda; Bo Yuan; Linus J. Schumacher; Jake P. Taylor-King; Debora S. Marks; Augustin Luna; Nils Blüthgen; Chris Sander (2023). scPerturb Single-Cell Perturbation Data: RNA and protein h5ad files [Dataset]. http://doi.org/10.25452/figshare.plus.24160713.v1
Available download formats: hdf
Unique identifier
https://doi.org/10.25452/figshare.plus.24160713.v1
Dataset updated
Sep 29, 2023
Dataset provided by
Figshare+
Authors
Stefan Peidli; Tessa D. Green; Ciyue Shen; Torsten Gross; Joseph Min; Samuele Garda; Bo Yuan; Linus J. Schumacher; Jake P. Taylor-King; Debora S. Marks; Augustin Luna; Nils Blüthgen; Chris Sander
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This repository contains the scRNA-seq and protein datasets from the scperturb database as h5ad files (saved with scanpy v1.9.1). In order to facilitate development and benchmarking of computational methods in systems biology, we collected a set of 44 publicly available single-cell perturbation-response datasets with molecular readouts, including transcriptomics, proteomics and epigenomics. We applied uniform quality control pipelines and harmonized feature annotations. The resulting information resource enables efficient development and testing of computational analysis methods, and facilitates direct comparison and integration across datasets. In addition, we describe E-statistics for perturbation effect quantification and significance testing, and demonstrate E-distance as a general distance measure for single-cell data, available as both a Python (scperturb on PyPI) and an R (scperturbR on CRAN) package. See the associated publication for info on how the data was handled. We also have an interactive table (Data Explorer) on our website with metadata per dataset.
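To give a feel for the E-distance mentioned above, here is a minimal sketch of the standard energy distance between two samples, written with the Python standard library. It is an assumption-laden illustration, not the scperturb implementation: it uses plain Euclidean distance on made-up 2-D points, and the package may differ in the distance metric, the embedding space, and the estimator. The function names are hypothetical.

```python
import math


def mean_pairwise(a, b):
    """Mean Euclidean distance over all pairs (x, y) with x in a and y in b."""
    total = sum(math.dist(x, y) for x in a for y in b)
    return total / (len(a) * len(b))


def energy_distance(x, y):
    """Sample estimate of 2*E|X-Y| - E|X-X'| - E|Y-Y'|."""
    return 2 * mean_pairwise(x, y) - mean_pairwise(x, x) - mean_pairwise(y, y)


# Made-up 2-D points standing in for cells in a low-dimensional embedding.
control = [(0.0, 0.0), (0.0, 1.0)]
perturbed = [(3.0, 0.0), (3.0, 1.0)]

same = energy_distance(control, control)       # identical samples
shifted = energy_distance(control, perturbed)  # shifted sample
```

The distance is zero when a sample is compared with itself and grows as the perturbed cells move away from the controls, which is the property that makes it usable as a perturbation-effect score.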
Protocol data (R version)
figshare.com
application/gzip
Updated Oct 16, 2020
Cite
Jesse Gillis (2020). Protocol data (R version) [Dataset]. http://doi.org/10.6084/m9.figshare.13020569.v2
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.13020569.v2
Dataset updated
Oct 16, 2020
Dataset provided by
Figshare (http://figshare.com/)
Authors
Jesse Gillis
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We published 3 protocols illustrating how MetaNeighbor can be used to quantify cell type replicability across single cell transcriptomic datasets. The data files included here are needed to run the R version of the protocols available on Github (https://github.com/gillislab/MetaNeighbor-Protocol) in RMarkdown (.Rmd) and Jupyter (.ipynb) notebook format. To run the protocols, download the protocols on Github, download the data on Figshare, place the data and protocol files in the same directory, then run the notebooks in Rstudio or Jupyter. The scripts used to generate the data are included in the Github directory. Briefly:
- full_biccn_hvg.rds: a single cell transcriptomic dataset published by the Brain Initiative Cell Census Network (in SingleCellExperiment format). It combines data from 7 datasets obtained in the mouse primary motor cortex (https://www.biorxiv.org/content/10.1101/2020.02.29.970558v2). Note that this dataset only contains highly variable genes.
- biccn_hvgs.txt: highly variable genes from the BICCN dataset described above (computed with the MetaNeighbor library).
- biccn_gaba.rds: same dataset as full_biccn_hvg.rds, but restricted to GABAergic neurons. The dataset contains all genes common to the 7 BICCN datasets (not just highly variable genes).
- go_mouse.rds: gene ontology annotations, stored as a list of gene symbols (one element per gene set).
- functional_aurocs.txt: results of the MetaNeighbor functional analysis in protocol 3.
Queryable single-cell RNA-seq (10x Genomics) datasets of Human and Mouse...
data.mendeley.com
Updated Nov 7, 2018
Cite
Brian Hermann (2018). Queryable single-cell RNA-seq (10x Genomics) datasets of Human and Mouse spermatogenic cells [Dataset]. http://doi.org/10.17632/kxd5f8vpt4.1
Explore at:
Unique identifier
https://doi.org/10.17632/kxd5f8vpt4.1
Dataset updated
Nov 7, 2018
Authors
Brian Hermann
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
To reveal distinct transcriptomes associated with various spermatogenic cells in both mouse and human testes, including spermatogonial stem cells (SSCs) and all of their subsequent progeny, we used the 10x Genomics Chromium (a commercialized Drop-Seq variant) to perform single-cell RNA-seq on various cell populations. Raw data and analyzed data (gene expression matrices) are deposited into the NIH GEO database. Here we include queryable, annotated and interactive files that can be used to compare single-cell transcriptomes.

Spermatogonia from immature (P6) and adult Id4-Egfp transgenic mice were used. The GFP-bright and dim phenotypes exhibit distinct fates when assayed by transplantation, with ID4-EGFPbright cells highly enriched for SSCs, and ID4-EGFPdim cells enriched for progenitors. Corresponding human spermatogonia were enriched from human testicular tissue by multi-parameter FACS. For both human and mouse, StaPut gravity sedimentation enriched for meiotic spermatocytes and post-meiotic spermatids and we profiled unselected steady-state spermatogenic cells.

The data from these experiments are stored in Loupe Cell Browser files (.cloupe) which are generated during analysis of 10x Genomics Single-cell data and can be opened and queried with the Loupe Cell Browser (10X Genomics). This software can be downloaded for free from https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest. It is important to note that the companion manuscript for these data used additional analyses that are not represented in these files.

The following datasets are available:

Unselected or sorted P6 ID4-EGFP+ spermatogonia (sorted separately as EGFP-bright or EGFP-dim) were used for this study. Data are from 13094 cells and can be found in the following file: P6 Mouse Spermatogonia.cloupe (aggregate of three datasets, P6 ID4-EGFP bright/dim/unselected)

Unselected or sorted Adult ID4-EGFP+ spermatogonia (sorted separately as EGFP-bright or EGFP-dim), three replicate preparations of steady-state unselected spermatogenic cells, and StaPut-enriched adult spermatocytes and spermatids were used for this study. Data are from 17491 cells and can be found in the following files: Adult Mouse Sorted Spermatogonia.cloupe (Aggregated Ad Spg- ID4-EGFP bright/dim/CD9bright); Mouse Unselected Spermatogenic cells.cloupe (3 replicates of steady-state spermatogenic cells); Mouse StaPut Spermatocytes.cloupe; Mouse StaPut Spermatids.cloupe

Sorted adult Human spermatogonia, three replicates of steady-state unselected spermatogenic cells, and StaPut-enriched adult spermatocytes and spermatids were used. Data are from 32727 cells and can be found in the following files: Human Sorted Spermatogonia.cloupe (3 replicates); Human Unselected Spermatogenic Cells.cloupe (3 replicates of steady-state spermatogenic cells); Human StaPut Spermatocytes.cloupe (2 replicates); Human StaPut Spermatids.cloupe (2 replicates)
gene-expression-single-cell-mouse
huggingface.co
Updated Jun 17, 2025
+ more versions
Cite
2025 Longevity x AI Hackathon (2025). gene-expression-single-cell-mouse [Dataset]. https://huggingface.co/datasets/longevity-db/gene-expression-single-cell-mouse
Explore at:
Dataset updated
Jun 17, 2025
Dataset authored and provided by
2025 Longevity x AI Hackathon
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
A single-cell transcriptomic atlas characterizes ageing tissues in the mouse

Code to download and process this dataset is available at: https://github.com/seanome/2025-longevity-x-ai-hackathon. The dataset structure is originally from AnnData. Descriptions of each data file are below.

Data Files

This dataset contains multiple parquet files, one for each sheet in the original Excel file: gene-expression-single-cell-mouse_*.parquet - Data files containing gene expression and… See the full description on the dataset page: https://huggingface.co/datasets/longevity-db/gene-expression-single-cell-mouse.
Data from "Single-cell integration and multi-modal profiling reveals...
zenodo.org
bin, xz
Updated Nov 13, 2025
Cite
Valentin Marteau; Niloofar Nemati; Kristina Handler; Deeksha Raju; Alexander Kirchmair; Dietmar Rieder; Erika Kvalem Soto; Georgios Fotakis; Glenn De Lange; Sandro Carollo; Nina Boeck; Alessia Rossi; Sophia Daum; Alexandra Scheiber; Arno Amann; Andreas Seeber; Elisabeth Gasser; Steffen Ormanns; Michael Günther; Agnieszka Martowicz; Zuzana Loncova; Giorgia Lamberti; Anne Krogsdam; Michela Carlet; Lena Horvath; Marie Theres Eling; Hassan Fazilaty; Tomas Valenta; Gregor Sturm; Sieghart Sopper; Andreas Pircher; Patrizia Stoitzner; Peter J. Wild; Patrick Welker; Pascal J. May; Paul Ziegler; Markus Tschurtschenthaler; Daniel Neureiter; Florian Huemer; Richard Greil; Lukas Weiss; Marieke Ijsselsteijn; Noel F.C.C. de Miranda; Dominik Wolf; Isabelle C. Arnold; Stefan Salcher; Zlatko Trajanoski (2025). Data from "Single-cell integration and multi-modal profiling reveals phenotypes and spatial organization of neutrophils in colorectal cancer" [Dataset]. http://doi.org/10.5281/zenodo.16631519
Explore at:
Available download formats: xz, bin
Unique identifier
https://doi.org/10.5281/zenodo.16631519
Dataset updated
Nov 13, 2025
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Valentin Marteau; Niloofar Nemati; Kristina Handler; Deeksha Raju; Alexander Kirchmair; Dietmar Rieder; Erika Kvalem Soto; Georgios Fotakis; Glenn De Lange; Sandro Carollo; Nina Boeck; Alessia Rossi; Sophia Daum; Alexandra Scheiber; Arno Amann; Andreas Seeber; Elisabeth Gasser; Steffen Ormanns; Michael Günther; Agnieszka Martowicz; Zuzana Loncova; Giorgia Lamberti; Anne Krogsdam; Michela Carlet; Lena Horvath; Marie Theres Eling; Hassan Fazilaty; Tomas Valenta; Gregor Sturm; Sieghart Sopper; Andreas Pircher; Patrizia Stoitzner; Peter J. Wild; Patrick Welker; Pascal J. May; Paul Ziegler; Markus Tschurtschenthaler; Daniel Neureiter; Florian Huemer; Richard Greil; Lukas Weiss; Marieke Ijsselsteijn; Noel F.C.C. de Miranda; Dominik Wolf; Isabelle C. Arnold; Stefan Salcher; Zlatko Trajanoski
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This archive provides all datasets needed to reproduce the single‐cell data integration detailed in the paper

Single-cell integration and multi-modal profiling reveals phenotypes and spatial organization of neutrophils in colorectal cancer

DOI: 10.1101/2024.08.26.609563

The archive comprises the following files:

MUI_Innsbruck-adata.h5ad: In-house scRNA-seq dataset from CRC cohort I (n = 12) comprising matched peripheral blood, adjacent normal, and tumor samples generated using the BD Rhapsody platform.

input_datasets.tar.xz: Preprocessed input datasets in .h5ad format required to build the CRC scRNA-seq atlas.

crc_atlas_scanvi_model.tar.xz: Pretrained scArches reference model and matching .h5ad file (highly variable genes only), enabling projection of external data onto the CRC atlas.

downstream_analyses_de_analysis.tar.xz: DESeq2-based differential expression analyses on pseudobulked data by cell type for various matched comparisons within the CRC atlas. Includes RDS files, result TSV tables, and short summaries for each comparison.

The CRC atlas is publicly available for download and interactive exploration through a cell-x-gene instance with standardized metadata, which allows custom analyses of the atlas. For more information, check out the project website and our GitHub repository.
scRNA-seq "Tabula sapiens" - human, 500 000+ cells
kaggle.com
zip
Updated Feb 5, 2022
+ more versions
Cite
Alexander Chervov (2022). scRNA-seq "Tabula sapiens" - human, 500 000+ cells [Dataset]. https://www.kaggle.com/datasets/alexandervc/scrnaseq-tabula-sapiens-human-500-000-cells
Explore at:
Available download formats: zip (14395870367 bytes)
Dataset updated
Feb 5, 2022
Authors
Alexander Chervov
Description
Remark 1: for cell cycle analysis - see paper https://arxiv.org/abs/2208.05229 "Computational challenges of cell cycle analysis using single cell transcriptomics" Alexander Chervov, Andrei Zinovyev

Remark 2: For the second part of the data, see https://www.kaggle.com/alexandervc/scrnaseq-tabula-sapiens-human-part-2

Data and Context

Data: results of single cell RNA sequencing, i.e. rows correspond to cells and columns to genes (or vice versa). Each value of the matrix shows how strongly the corresponding gene is expressed in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics

Particular data: "Tabula Sapiens" project: https://tabula-sapiens-portal.ds.czbiohub.org/ Data section for download: https://figshare.com/articles/dataset/Tabula_Sapiens_release_1_0/14267219 Paper: https://www.science.org/doi/10.1126/science.abl4896 https://www.biorxiv.org/content/10.1101/2021.07.19.452956v2

Tabula Sapiens is a benchmark, first-draft human cell atlas of nearly 500,000 cells from 24 organs of 15 normal human subjects. This work is the product of the Tabula Sapiens Consortium. Special thanks to the Chan Zuckerberg Initiative for funding this project and to the CZI Science Technology team for creating cellxgene, the tool that makes the visualization of this research possible.

See also tutorials:

Course at Sanger's institute https://scrnaseq-course.cog.sanger.ac.uk/website/tabula-muris.html

Course at CZ-hub: https://chanzuckerberg.github.io/scRNA-python-workshop/intro/about

On kaggle - copies of the notebooks and data from the course above https://www.kaggle.com/aayush9753/singlecell-rnaseq-data-from-mouse-brain

Inspiration

Single cell RNA sequencing is an important technology in modern biology, see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6

Also see the review by P. Kharchenko in Nature Methods: "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x
Spotiphy enables single-cell spatial whole transcriptomics across the entire...
zenodo.org
bin, csv, jpeg
Updated Dec 29, 2024
Cite
Jiyuan Yang; Ziqian Zheng; Jiyang Yu (2024). Spotiphy enables single-cell spatial whole transcriptomics across the entire section [Dataset]. http://doi.org/10.5281/zenodo.10520022
Explore at:
Available download formats: bin, jpeg, csv
Unique identifier
https://doi.org/10.5281/zenodo.10520022
Dataset updated
Dec 29, 2024
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Jiyuan Yang; Ziqian Zheng; Jiyang Yu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Spatial transcriptomics (ST) has advanced our understanding of tissue regionalization by enabling the visualization of gene expression within whole tissue sections, but the approach remains dogged by the challenge of achieving single-cell resolution without sacrificing whole genome coverage. Here we present Spotiphy (Spot imager with pseudo single-cell resolution histology), a novel computational toolkit that transforms sequencing-based ST data into single-cell-resolved whole-transcriptome images. In evaluations with Alzheimer’s disease (AD) and normal mouse brains, Spotiphy delivers the most precise cellular compositions. For the first time, Spotiphy reveals novel astrocyte regional specification in mouse brains. It distinguishes sub-populations of DAM (Disease-Associated Microglia) located in different AD mouse brain regions. Spotiphy also identifies multiple spatial domains as well as changes in the patterns of tumor-tumor microenvironment interactions using human breast ST data. Spotiphy enables visualization of cell localization and gene expression in tissue sections, offering key insights into the function of complex biological systems.
Comparison of ScRDAVis and other popular single cell data analysis tools.
figshare.com
xls
Updated Nov 18, 2025
Cite
Sankarasubramanian Jagadesan; Chittibabu Guda (2025). Comparison of ScRDAVis and other popular single cell data analysis tools. [Dataset]. http://doi.org/10.1371/journal.pcbi.1013721.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pcbi.1013721.t001
Dataset updated
Nov 18, 2025
Dataset provided by
PLOS: http://plos.org/
Authors
Sankarasubramanian Jagadesan; Chittibabu Guda
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Comparison of ScRDAVis and other popular single cell data analysis tools.
Data from: A single-cell immune atlas of primary and secondary lymphoid...
agdatacommons.nal.usda.gov
hdf
Updated Sep 8, 2025
+ more versions
Cite
Jayne Wiarda; Muskan Kapoor; Sathesh K. Sivasankaran; Kristen A. Byrne; Crystal L. Loving; Christopher K. Tuggle (2025). Data from: A single-cell immune atlas of primary and secondary lymphoid organs in pigs [Dataset]. http://doi.org/10.15482/USDA.ADC/29492726.v1
Explore at:
Available download formats: hdf
Unique identifier
https://doi.org/10.15482/USDA.ADC/29492726.v1
Dataset updated
Sep 8, 2025
Dataset provided by
Ag Data Commons
Authors
Jayne Wiarda; Muskan Kapoor; Sathesh K. Sivasankaran; Kristen A. Byrne; Crystal L. Loving; Christopher K. Tuggle
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Data objects used for analysis in "A single-cell immune atlas of primary and secondary lymphoid organs in pigs" by Wiarda et al. Data objects include .cloupe files for interactive query in Loupe Cell Browser (10X Genomics), .rds files to download for use with a Shiny app for data query, and .h5seurat files for computational query of data. Briefly, cells were isolated from bone marrow, thymus, lymph node, and spleen of two pigs and processed for single-cell RNA sequencing. Single-cell RNA sequencing data was analyzed to identify cell types in each tissue and perform comparisons across tissues and across datasets.
PIAS: an interactive visualization platform for integrative analysis of...
figshare.com
zip
Updated May 31, 2021
Cite
Zhu Sheng; Yibo Zhuang; Lishan Ye; Feng Zeng; Xiaohui Wu; Guoli Ji (2021). PIAS: an interactive visualization platform for integrative analysis of multi-source single-cell RNA-seq datasets [Dataset]. http://doi.org/10.6084/m9.figshare.13205924.v4
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.13205924.v4
Dataset updated
May 31, 2021
Dataset provided by
Figshare: http://figshare.com/
Authors
Zhu Sheng; Yibo Zhuang; Lishan Ye; Feng Zeng; Xiaohui Wu; Guoli Ji
License
https://www.gnu.org/licenses/gpl-3.0.html
Description
Download and unzip the RData data and place it under the path www/task/public of PIAS. PIAS is a web-based interactive platform for integrative analysis of multi-source single-cell RNA-seq datasets. Unlike many other single-cell RNA-seq analysis platforms or pipelines, which mainly focus on preprocessing or analysis of a single scRNA-seq dataset, PIAS integrates multi-source datasets and incorporates various metrics for comprehensively evaluating the result of data integration. Moreover, PIAS provides rich functions for data preprocessing, comprehensive analyses and visualization, including gene name transfer, quality control, normalization, highly variable gene identification, batch-effect removal, dimensionality reduction, clustering, differential expression analysis, cluster annotation, enrichment analysis, and single-cell trajectory construction. Users can freely choose to perform desired functions, visualize results, and transfer data through interactive operations with PIAS.
Datasets accompanying scANANSE
zenodo.org
data.niaid.nih.gov
application/gzip, bin
Updated Mar 13, 2023
Cite
J.A. Arts; J.G.A. Smits (2023). Datasets accompanying scANANSE [Dataset]. http://doi.org/10.5281/zenodo.7446267
Explore at:
Available download formats: application/gzip, bin
Unique identifier
https://doi.org/10.5281/zenodo.7446267
Dataset updated
Mar 13, 2023
Dataset provided by
Zenodo: http://zenodo.org/
Authors
J.A. Arts; J.G.A. Smits
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The preprocessed Seurat object and the two Scanpy objects that can be used to run the scANANSE pipeline.

Seurat object: preprocessed_PBMC.Rds

Scanpy objects: rna_PBMC.h5ad, atac_PBMC.h5ad

Additional raw data, used to construct the preprocessed objects, supplemented from:

https://cf.10xgenomics.com/samples/cell-arc/1.0.0/pbmc_granulocyte_sorted_10k/pbmc_granulocyte_sorted_10k_filtered_feature_bc_matrix.h5, https://cf.10xgenomics.com/samples/cell-arc/1.0.0/pbmc_granulocyte_sorted_10k/pbmc_granulocyte_sorted_10k_atac_fragments.tsv.gz, https://cf.10xgenomics.com/samples/cell-arc/1.0.0/pbmc_granulocyte_sorted_10k/pbmc_granulocyte_sorted_10k_atac_fragments.tsv.gz.tbi,
https://atlas.fredhutch.org/data/nygc/multimodal/pbmc_multimodal.h5seurat
allen_brain.h5ad
figshare.com
hdf
Updated Jun 6, 2023
Cite
Daniel Dimitrov (2023). allen_brain.h5ad [Dataset]. http://doi.org/10.6084/m9.figshare.20338089.v4
Explore at:
Available download formats: hdf
Unique identifier
https://doi.org/10.6084/m9.figshare.20338089.v4
Dataset updated
Jun 6, 2023
Dataset provided by
Figshare: http://figshare.com/
Authors
Daniel Dimitrov
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
AnnData format of the adult mouse brain atlas generated by researchers at the Allen Institute (Tasic et al.), together with inferred cell type colocalization information, as described in Dimitrov et al., 2022.

Tasic, B. et al. Adult mouse cortical cell taxonomy revealed by single cell transcriptomics. Nat. Neurosci. 19, 335–346 (2016).

Dimitrov, D., Türei, D., Garrido-Rodriguez, M., Burmedi, P.L., Nagai, J.S., Boys, C., Ramirez Flores, R.O., Kim, H., Szalai, B., Costa, I.G. and Valdeolivas, A., 2022. Comparison of methods and resources for cell-cell communication inference from single-cell RNA-Seq data. Nature Communications, 13(1), pp.1-13.
Bulk-RNA-sequencing-and-single-nuclei-transcriptomics-and-epigenomics-of-brain-tissue-from-mice-flown-on-the-RR-10-mission...
osdr.nasa.gov
data.nasa.gov
Updated Jul 21, 2025
+ more versions
Cite
Lauren Sanders; Eduardo Almeida; Sylvain Costes; Samrawit Gebre; Yi-Chun Chen; Valery Boyko; San-Huei Lai Polo; Kristen Peach; Amanda Saravia-Butler; Jonathan Oribello (2025). Bulk-RNA-sequencing-and-single-nuclei-transcriptomics-and-epigenomics-of-brain-tissue-from-mice-flown-on-the-RR-10-mission [Dataset]. https://osdr.nasa.gov/bio/repo/data/studies/OSD-612
Explore at:
Dataset updated
Jul 21, 2025
Dataset provided by
NASA: http://nasa.gov/
Authors
Lauren Sanders; Eduardo Almeida; Sylvain Costes; Samrawit Gebre; Yi-Chun Chen; Valery Boyko; San-Huei Lai Polo; Kristen Peach; Amanda Saravia-Butler; Jonathan Oribello
License
Attribution 1.0 (CC BY 1.0): https://creativecommons.org/licenses/by/1.0/
License information was derived automatically
Description
The objective of the Rodent Research-10 mission (RR-10) was to investigate how spaceflight affects the cellular and molecular mechanisms of normal bone tissue regeneration in space. To this end, ten (10) 14-15 weeks-old female B6129SF2/J Wild Type (WT), and ten (10) 14-15 weeks-old female B6;129S2-Cdkn1atm1Tyj/J (p21-null) mice received a pre-flight subcutaneous injection of the bone marker (Alizarin Red), and were then delivered to the ISS aboard SpaceX-21. At 7 days before euthanasia, all 20 mice received an intraperitoneal (IP) injection with a bone formation marker (Calcein). At 48 +/- 2 hours before euthanasia, all 20 mice received an IP injection with a second dose of Calcein as well as a cell proliferation marker (BrdU). Then, following 28-29 days in microgravity, the Flight mice were euthanized. Following removal of hindlimbs, carcasses were wrapped in aluminum foil, preserved in the CryoChiller, and stored at -80 C or colder until return to Earth. In addition to the Flight group, three ground control groups were also part of the study: Basal (representing the pre-launch state), Vivarium (standard vivarium housing for the same duration of time as flight), and Ground (flight habitat in the International Space Station Environment Simulator, ISSES). Twenty mice (10 of each strain) were included in each of these control groups (except Vivarium which included 12 of each strain). These were treated, euthanized and processed on the same schedule and in the same manner as the flight samples. This study includes bulk RNA sequencing data from left cerebral hemispheres from 4 WT flight animals and 5 WT ground control animals, and single nuclei transcriptomics and epigenomics data from left cerebral hemispheres from 5 WT flight animals, and 5 WT ground control animals.

Cite

David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove (2021). Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration [Dataset]. http://doi.org/10.5061/dryad.t4b8gtj34

Data from: Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration

Explore at:

Available download formats: zip

Unique identifier

https://doi.org/10.5061/dryad.t4b8gtj34

Dataset updated

Dec 14, 2021

Dataset provided by

Cornell University

Authors

David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove

License

https://spdx.org/licenses/CC0-1.0.html

Description

Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. 
We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.

Methods

Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (Mus musculus) were obtained from Jackson Laboratories (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute on Aging (NIA) Rodent Aging Colony and were used at 20 months of age. For new scRNAseq experiments, female mice were used in each experiment.

Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of old (20 months) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 and 40 μm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed through incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).

Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 10^6 cells/ml. Cells were counted manually with a hemocytometer to determine their concentration. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes. After preparation, libraries were sequenced on a NextSeq 500 (Illumina; San Diego, CA) using 75 cycle high output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Details on estimated sequencing saturation and the number of reads per sample are shown in Sup. Data 1.

Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5 mo) C57BL/6J mice were injected with 10 µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and frozen fresh in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). Optimal tissue permeabilization time for 10 µm thick sections was found to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E stained tissue sections were imaged using a Zeiss PALM MicroBeam laser capture microdissection system, and the images were stitched and processed using Fiji ImageJ software. cDNA libraries were sequenced on an Illumina NextSeq 500 using 150 cycle high output kits (Read 1 = 28 bp, Read 2 = 120 bp, Index 1 = 10 bp, and Index 2 = 10 bp). Frames around the capture area on the Visium slide were aligned manually and spots covering the tissue were selected using Loupe Browser v4.0.0 software (10x Genomics). Sequencing data were then aligned to the mouse reference genome (mm10) using the spaceranger v1.0.0 pipeline (10x Genomics) to generate a feature-by-spot-barcode expression matrix.

Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples which were only available as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).

Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCounts and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset individually. BCmvn optimization was used for pK parameterization. Estimated doublet rates were computed by fitting the total number of cells after quality filtering to a linear regression of the expected doublet rates published in the 10x Chromium handbook. Estimated homotypic doublet rates were also accounted for using the modelHomotypic function. The default pN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch correction with three tools, independently: Harmony (github.com/immunogenomics/harmony) (v1.0), Scanorama (github.com/brianhie/scanorama) (v1.3), and BBKNN (github.com/Teichlab/bbknn) (v1.3.12). We then used Seurat to process the integrated data. After initial integration, we removed the noisy cluster and re-integrated the data using each of the three batch-correction tools.
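The quality thresholds and the doublet-rate regression described above can be sketched in NumPy. This is an illustrative re-expression, not the authors' Seurat/DoubletFinder code: the function names are hypothetical, and the table of recovered cells versus multiplet rate holds placeholder values that only approximate the kind of table published by 10x Genomics and should be checked against the actual handbook before use.

```python
import numpy as np


def qc_keep_mask(counts, is_mito, min_features=750,
                 min_transcripts=1000, max_mito_frac=0.30):
    """Boolean mask of cells passing the three thresholds.

    counts  : cells x genes matrix of raw UMI counts
    is_mito : boolean vector marking mitochondrial genes
    """
    counts = np.asarray(counts)
    is_mito = np.asarray(is_mito, dtype=bool)
    n_features = (counts > 0).sum(axis=1)      # genes detected per cell
    n_transcripts = counts.sum(axis=1)         # UMIs per cell
    mito = counts[:, is_mito].sum(axis=1)
    mito_frac = np.divide(mito, n_transcripts,
                          out=np.zeros(counts.shape[0]),
                          where=n_transcripts > 0)
    return ((n_features >= min_features)
            & (n_transcripts >= min_transcripts)
            & (mito_frac <= max_mito_frac))


# ASSUMPTION: placeholder (cells recovered, multiplet rate) pairs.
RECOVERED = np.array([500, 1000, 2000, 3000, 4000, 5000,
                      6000, 7000, 8000, 9000, 10000])
RATE = np.array([0.004, 0.008, 0.016, 0.023, 0.031, 0.039,
                 0.046, 0.054, 0.061, 0.069, 0.076])
SLOPE, INTERCEPT = np.polyfit(RECOVERED, RATE, 1)


def expected_doublet_rate(n_cells):
    """Doublet rate predicted by the linear fit for n_cells retained cells."""
    return SLOPE * n_cells + INTERCEPT
```

In DoubletFinder the expected number of doublets is typically this rate multiplied by the cell count, then reduced by the homotypic proportion returned by modelHomotypic; that adjustment is omitted here for brevity.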

Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical genes (Fig. S3). Clusters which had similar canonical marker gene expression patterns were merged.
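One way to pick the "dimensions accounting for 95% of the total variance" is from the per-dimension standard deviations of the embedding. The sketch below is an assumption about how that selection could be done, not the authors' code: it treats the total variance as the sum over the computed dimensions only, and the function name and example values are hypothetical.

```python
import numpy as np


def n_dims_for_variance(stdev, threshold=0.95):
    """Smallest number of leading dimensions whose cumulative variance
    fraction reaches the threshold.

    stdev : per-dimension standard deviations, sorted in decreasing order
            (for example, the values reported for a PCA-like reduction).
    """
    variance = np.asarray(stdev, dtype=float) ** 2
    cumulative = np.cumsum(variance) / variance.sum()
    # First index where the cumulative fraction is >= threshold.
    first = int(np.searchsorted(cumulative, threshold))
    return min(first + 1, len(variance))


# Hypothetical standard deviations for a four-dimensional embedding.
n_dims = n_dims_for_variance([3.0, 2.0, 1.0, 1.0])
```

In this toy case the cumulative fractions are 0.60, 0.87, 0.93, and 1.00, so all four dimensions are needed to reach 95%.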

Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).

Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were re-labeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myocytes” (bins 11-18). Culture-associated muscle stem cells were ignored, and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified with differential gene expression analysis using Seurat (FindAllMarkers, using the Wilcoxon Rank Sum Test; results in Sup. Data 2). The resulting genes were filtered based on average log2-fold change (avg_logFC > 1) and the percentage of cells within the cluster that express each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each of the cell types, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using
