40 datasets found

Chan Zuckerberg CELLxGENE Collection
bioregistry.io
Updated May 7, 2025
Cite
(2025). Chan Zuckerberg CELLxGENE Collection [Dataset]. https://bioregistry.io/cellxgene.collection
Explore at:
Dataset updated
May 7, 2025
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Assigns identifiers to collections of datasets indexed by CELLxGENE.

CELLxGENE is an interactive data visualization and exploration tool developed by the Chan Zuckerberg Initiative that enables researchers to analyze and share single-cell genomics datasets. It provides a user-friendly interface for biologists and computational scientists to interrogate gene expression patterns across different cell types.
CZ CELLxGENE Discover
dknet.org
scicrunch.org
+1 more
Updated Jan 17, 2024
Cite
(2024). CZ CELLxGENE Discover [Dataset]. http://identifiers.org/RRID:SCR_024894/resolver?q=&i=rrid
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_024894 https://identifiers.org/RRID:SCR_024894/resolver?q=&i=rrid
Dataset updated
Jan 17, 2024
Description
Portal for finding and downloading data sets published on CELLxGENE. It lets users download and visually explore data to understand the functionality of human tissues at the cellular level, and is optimized for finding, exploring, and reusing single-cell data. The Collections page lists the collections hosted on CELLxGENE Discover, along with metadata that define tissue, assay, disease, organism, and cell count for each collection. Once you find a published dataset of interest on CELLxGENE Discover, you can click the explore button below the dataset description to explore the cells of that dataset using the CELLxGENE Explorer.
CellxGene-100K
kaggle.com
zip
Updated Apr 5, 2025
+ more versions
Cite
Darien Schettler (2025). CellxGene-100K [Dataset]. https://www.kaggle.com/datasets/dschettler8845/cellxgene-100k
Explore at:
Available download formats: zip (5611138387 bytes)
Dataset updated
Apr 5, 2025
Authors
Darien Schettler
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
CellXGene-100K

GCS LINK: gs://kds-6860773353013302b6e19605df3e5195ee14d269d4d746edb218f8ff

DATASET OVERVIEW

A curated dataset of approximately 700,000 healthy human single cells (approx. 100,000 per tissue) sourced from the CellXGene Census, covering seven major tissues: heart, blood, brain, lung, kidney, intestine, and pancreas.

This is 1 of 4 datasets focusing on providing progressively larger, ready-to-use collections of healthy human single-cell RNA sequencing data in the H5AD format.

The goal is to offer standardized benchmarks/datasets derived from CellXGene for exploring fundamental scRNA-seq analysis, understanding multi-tissue cellular composition, developing and testing computational models, and evaluating method scalability across different orders of magnitude.

This dataset provides a focused collection of single-cell transcriptomic profiles representing healthy human tissues, curated from the latest (Jan 2025) stable release of the CZ CELLxGENE Discover Census (CellXGene). It includes data exclusively from Homo sapiens cells annotated as 'normal' or 'healthy' and in 'cell' suspension.

With its somewhat manageable size (approx. 700k total cells), this dataset serves as an excellent middle ground for exploration, model development, and scaling to larger use-cases.
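
The selection described above (Homo sapiens cells annotated as 'normal' or 'healthy', in 'cell' suspension, roughly 100,000 per tissue) can be sketched as a filter followed by a capped per-tissue sample. This is a minimal, standard-library illustration on plain dictionaries. The field names (`organism`, `disease`, `suspension_type`, `tissue`) and the helper `sample_healthy_cells` are assumptions made for the example and are not taken from the dataset author's pipeline.

```python
import random
from collections import defaultdict

# The seven tissues named in the dataset overview.
TISSUES = {"heart", "blood", "brain", "lung", "kidney", "intestine", "pancreas"}
HEALTHY_LABELS = {"normal", "healthy"}


def sample_healthy_cells(cells, per_tissue, seed=0):
    """Keep healthy human 'cell'-suspension records, at most per_tissue per tissue.

    `cells` is an iterable of dicts; the key names are hypothetical.
    """
    by_tissue = defaultdict(list)
    for cell in cells:
        if (
            cell["organism"] == "Homo sapiens"
            and cell["disease"] in HEALTHY_LABELS
            and cell["suspension_type"] == "cell"
            and cell["tissue"] in TISSUES
        ):
            by_tissue[cell["tissue"]].append(cell)

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return {
        tissue: rng.sample(group, min(per_tissue, len(group)))
        for tissue, group in by_tissue.items()
    }


# Toy records standing in for per-cell metadata rows (made-up values).
toy_cells = [
    {"organism": "Homo sapiens", "disease": "normal", "suspension_type": "cell", "tissue": "lung"},
    {"organism": "Homo sapiens", "disease": "normal", "suspension_type": "cell", "tissue": "lung"},
    {"organism": "Homo sapiens", "disease": "healthy", "suspension_type": "cell", "tissue": "lung"},
    {"organism": "Homo sapiens", "disease": "normal", "suspension_type": "nucleus", "tissue": "lung"},
    {"organism": "Homo sapiens", "disease": "dilated cardiomyopathy", "suspension_type": "cell", "tissue": "heart"},
    {"organism": "Homo sapiens", "disease": "normal", "suspension_type": "cell", "tissue": "heart"},
    {"organism": "Mus musculus", "disease": "normal", "suspension_type": "cell", "tissue": "brain"},
]

sampled = sample_healthy_cells(toy_cells, per_tissue=2)
for tissue, group in sorted(sampled.items()):
    print(tissue, len(group))
```

With the real data, `per_tissue` would be on the order of 100,000; the toy call uses 2 only so the cap is visible.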
Chan Zuckerberg CELLxGENE Dataset
bioregistry.io
Updated May 7, 2025
Cite
(2025). Chan Zuckerberg CELLxGENE Dataset [Dataset]. https://bioregistry.io/cellxgene.dataset
Explore at:
Dataset updated
May 7, 2025
Description
Assigns identifiers to datasets indexed by CELLxGENE, such as those resulting from scRNA-seq experiments.
Pretrained checkpoints of models by scCompass and CELLxGENE--scGPT
scidb.cn
Updated Mar 14, 2025
+ more versions
Cite
Pengfei Wang (2025). Pretrained checkpoints of models by scCompass and CELLxGENE--scGPT [Dataset]. http://doi.org/10.57760/sciencedb.22054
Explore at:
Unique identifier
https://doi.org/10.57760/sciencedb.22054
Dataset updated
Mar 14, 2025
Dataset provided by
Science Data Bank
Authors
Pengfei Wang
License
https://mit-license.org
Description
This project uses the scCompass and CELLxGENE datasets at data scales of 100K, 200K, 500K, 1M, 2M, and 5M to pre-train the scGPT model.
Cellxgene VIP snRNA-seq demo dataset for visualization and DE analysis
data.niaid.nih.gov
data-staging.niaid.nih.gov
Updated Apr 9, 2022
Cite
KEJIE LI; Zhengyu Ouyang (2022). Cellxgene VIP snRNA-seq demo dataset for visualization and DE analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6425901
Explore at:
Dataset updated
Apr 9, 2022
Dataset provided by
BioinfoRx
Biogen
Authors
KEJIE LI; Zhengyu Ouyang
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The H5ad file can be used as demo input for Cellxgene VIP. The dataset was reprocessed from the raw fastq files of the Schirmer et al. Nature 2019 paper. Details for reproducing the h5ad file can be found in https://github.com/interactivereport/cellxgene_VIP/blob/master/notebook/MS_Nature_Rowitch_snRNAseq.ipynb. Two rds files are also included; they are the input files for the sample differential expression (DE) analysis scripts (glmmTMB and Nebula).
10X Genomics Human Visium Spatial Transcriptomics Demo Dataset for Cellxgene...
data-staging.niaid.nih.gov
zenodo.org
Updated Dec 8, 2021
Cite
Li, Kejie (2021). 10X Genomics Human Visium Spatial Transcriptomics Demo Dataset for Cellxgene VIP [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_5524882
Explore at:
Dataset updated
Dec 8, 2021
Dataset provided by
Biogen
Authors
Li, Kejie
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Four Visium Spatial Transcriptomics datasets downloaded from the 10X Genomics data site and organized for use as Cellxgene VIP input:

10X_demo_data_Breast_Cancer_Block_A_Section_1
10X_demo_data_Breast_Cancer_Block_A_Section_2
10X_demo_data_Human_Heart
10X_demo_data_Human_Lymph_Node
ScCompass and CELLxGENE Training Datasets--scGPT
scidb.cn
Updated Mar 14, 2025
+ more versions
Cite
Pengfei Wang (2025). ScCompass and CELLxGENE Training Datasets--scGPT [Dataset]. http://doi.org/10.57760/sciencedb.22043
Explore at:
Unique identifier
https://doi.org/10.57760/sciencedb.22043
Dataset updated
Mar 14, 2025
Dataset provided by
Science Data Bank
Authors
Pengfei Wang
License
https://mit-license.org
Description
ScCompass and CELLxGENE Training Datasets: Human and Mouse for scGPT.
scdrs.cellxgene
figshare.com
hdf
Updated Sep 6, 2021
Cite
Martin Jinye Zhang (2021). scdrs.cellxgene [Dataset]. http://doi.org/10.6084/m9.figshare.15065061.v1
Explore at:
Available download formats: hdf
Unique identifier
https://doi.org/10.6084/m9.figshare.15065061.v1
Dataset updated
Sep 6, 2021
Dataset provided by
Figshare http://figshare.com/
Authors
Martin Jinye Zhang
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
h5ad objects for cellxgene visualization of scDRS results:
scdrs_tmsfacs_thin.h5ad: scDRS results for the TMS FACS data of 110,096 cells (gene count matrix removed to save space)
scdrs_demo.h5ad: demo scDRS results for 3 TMS FACS cell types and 3 diseases (gene count matrix removed to save space)
Cell_Gene_Expression_Metadata
kaggle.com
zip
Updated Sep 24, 2025
Cite
Kazi Aishikuzzaman (2025). Cell_Gene_Expression_Metadata [Dataset]. https://www.kaggle.com/datasets/kaziaishikuzzaman/cell-gene-expression-metadata
Explore at:
Available download formats: zip (845887409 bytes)
Dataset updated
Sep 24, 2025
Authors
Kazi Aishikuzzaman
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Overview

This dataset contains comprehensive metadata from single-cell gene expression studies, providing researchers with structured information about cellular phenotypes, experimental conditions, and sample characteristics. The data is particularly valuable for bioinformatics research, machine learning applications in genomics, and comparative studies across different cell types and conditions.

Dataset Description

The dataset comprises metadata associated with single-cell RNA sequencing (scRNA-seq) experiments, including:

Cell Type Information: Classification of different cell types and subtypes
Experimental Metadata: Details about experimental conditions, protocols, and methodologies
Sample Characteristics: Information about biological samples, including tissue origin, developmental stages, and treatment conditions
Quality Metrics: Data quality indicators and filtering parameters
Annotation Details: Standardized cell type annotations and biological classifications

Data Source and Licensing

This dataset is derived from publicly available single-cell gene expression data, potentially sourced from:

CELLxGENE Data Portal (https://cellxgene.cziscience.com/)
Gene Expression Omnibus (GEO)
European Bioinformatics Institute (EBI)
Other public genomics repositories

License: Creative Commons CC BY 4.0 (or specify the actual license)
✅ Commercial use allowed
✅ Modification allowed
✅ Distribution allowed
✅ Private use allowed
❗ Attribution required

Research Applications

Cell Type Discovery: Identify novel cell types and subtypes
Comparative Genomics: Study cellular differences across conditions, tissues, or species
Disease Research: Investigate cellular changes in disease states
Developmental Biology: Analyze cellular differentiation and development patterns

Machine Learning Applications

Classification Tasks: Predict cell types from gene expression data
Clustering Analysis: Discover cellular subpopulations and states
Dimensionality Reduction: Apply PCA, t-SNE, UMAP for visualization
Biomarker Discovery: Identify genes characteristic of specific cell types

Educational Use

Teaching bioinformatics and computational biology concepts.
Demonstrating single-cell analysis workflows.
Training in data preprocessing and quality control.

Data Quality and Preprocessing

Quality Control: Metadata has been curated and standardized
Missing Values: [Specify how missing values are handled]
Standardization: Cell type annotations follow established ontologies (e.g., Cell Ontology)
Validation: Data has been cross-referenced with original publications

Usage Guidelines

Getting Started

Load the metadata files using pandas or your preferred data analysis tool.
Explore the cell type distributions and experimental conditions.
Filter data based on quality metrics as needed.
Join with corresponding gene expression data for comprehensive analysis.

Best Practices

Always cite original data sources and publications.
Consider batch effects when combining data from different experiments.
Validate findings with independent datasets when possible.
Follow established bioinformatics workflows for single-cell analysis.

Citation and Acknowledgments

If you use this dataset in your research, please cite this dataset:

[Kazi Aishikuzzaman]. (2024). Cell Gene Expression Metadata. Kaggle. https://www.kaggle.com/datasets/kaziaishikuzzaman/cell-gene-expression-metadata

File Structure

dataset/
  metadata_summary.csv # Main metadata file
  cell_type_annotations.csv # Detailed cell type information
  experimental_conditions.csv # Experiment-specific metadata
  quality_metrics.csv # Data quality indicators
  README.txt # Detailed file descriptions

Technical Specifications

File Encoding: UTF-8
Separator: Comma-separated values (CSV)
Missing Values: Represented as 'NA' or empty cells
Data Types: Mixed (categorical, numerical, text)
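
Given the stated file conventions (UTF-8, comma-separated, missing values written as 'NA' or left empty), loading one of the metadata files might look like the sketch below. It uses only Python's standard `csv` module. The column names in the sample text and the helper `read_metadata` are hypothetical, since the listing does not document the actual columns.

```python
import csv
import io

# Markers that the listing says represent missing values.
MISSING_MARKERS = {"", "NA"}


def read_metadata(handle):
    """Read comma-separated metadata, mapping 'NA' or empty cells to None."""
    reader = csv.DictReader(handle)
    return [
        {key: (None if value in MISSING_MARKERS else value) for key, value in row.items()}
        for row in reader
    ]


# Made-up sample text; a real file would be opened with
# open(path, encoding="utf-8", newline="") instead of io.StringIO.
sample = io.StringIO(
    "cell_id,cell_type,tissue\n"
    "c1,T cell,blood\n"
    "c2,NA,lung\n"
    "c3,neuron,\n"
)
rows = read_metadata(sample)
print(rows[1])  # cell_type is None for the 'NA' entry
```

Normalizing both markers to a single `None` value up front keeps later filtering and joins from treating 'NA' as a real category.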

Contact and Support

For questions about this dataset:

Kaggle Profile: @kaziaishikuzzaman
Dataset Issues: Use Kaggle's discussion section
Collaboration: Open to research collaborations and improvements

Version History

v1.0: Initial release with comprehensive metadata collection
[Future versions]: Updates and additional annotations as available

Related Datasets

Consider exploring these complementary datasets:

Single-cell gene expression data (companion to this metadata)
Cell atlas datasets from major consortiums
Disease-specific single-cell studies
Multi-omics datasets with matching cell types

Keywords: single-cell, RNA-seq, genomics, cell types, metadata, bioinformatics, machine learning, computational biology
Category: Biology > Genomics
Mouse Brain snRNASeq Demo Dataset for Cellxgene VIP
data.niaid.nih.gov
Updated Jun 10, 2022
Cite
Li, KEJIE; Sheehan, Mark; Zhang, Baohong (2022). Mouse Brain snRNASeq Demo Dataset for Cellxgene VIP [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6626455
Explore at:
Dataset updated
Jun 10, 2022
Dataset provided by
Biogen http://biogen.com/
Authors
Li, KEJIE; Sheehan, Mark; Zhang, Baohong
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
snRNASeq data generated at Biogen from 3 control mouse brains. Three brain regions were sampled from each brain.

Animal IDs: 1, 4 and 7

Brain region codes: W: WhiteMatter; H: Hippo; G: GreyMatter

The standard 10X mm10 (3.0.0) reference was used with cellranger 5.0.0 and --include-introns on.

Single-Cell RNA Data Portal for Alzheimer's Disease

zenodo.org

zip

Updated Apr 30, 2025

Cite

Theodoros Siozos; Christos Petrou; ATHANASIOS BALOMENOS; Yannis Kopsinis (2025). Single-Cell RNA Data Portal for Alzheimer's Disease [Dataset]. http://doi.org/10.5281/zenodo.15295744

Explore at:

Available download formats: zip

Unique identifier

https://doi.org/10.5281/zenodo.15295744

Dataset updated

Apr 30, 2025

Dataset provided by

Zenodo http://zenodo.org/

Authors

Theodoros Siozos; Christos Petrou; ATHANASIOS BALOMENOS; Yannis Kopsinis

License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Single-Cell RNA Data Portal for Alzheimer's Disease

The Single-Cell Alzheimer's Disease Data Portal is an aggregated data portal created as part of the Enfield EU-funded program for the single-cell Generative Pretrained Transformer (scGPT-AD) model research. The data portal contains data from the ssREAD data portal, along with single-cell AD data from the latest studies (dharsini et al, pan et al, rexach et al). The data from the individual studies were accessed through the cellXgene data portal, a vast portal for single-cell data. The data have been uploaded in two separate .zip files (part1, part2).

The single-cell data follow the Annotated Data format. The core data for each sample is the gene-expression matrix, which gives the expression level of each gene in each cell. Additionally, the dataset contains the `.obs` attribute, which includes core cell metadata for each sample (cell type, brain region, braak stage, donor age, disease condition, donor gender, etc.), along with the gene names, accessed via the `.var` attribute.

The source data have been processed to create a unified data portal ready to be used as training dataset for a Transformer model. The main processing steps were:

convert ssREAD data from `.qsave` format to `.h5ad` format that aligns with the AnnData framework
discard some unprocessable data samples
standardize metadata column names
process categorical data to create a unified namespace (e.g.: merge `microglia` and `microgrial` cell type names into one)
standardize all gene names to be upper-cased
discard dimensionality reduction and clustering attributes, to make a lightweight version of the data portal, since they are not meant to be used in Transformer model training
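
Three of the steps listed above (standardizing metadata column names, merging variant category labels, and upper-casing gene names) can be illustrated with a small standard-library sketch. The alias table, the column map, and all function names below are hypothetical; only the `microglia`/`microgrial` merge and the upper-casing rule come from the description. Lower-casing labels before the lookup is an added assumption, not something the description states.

```python
# Hypothetical alias table: variant spelling -> canonical label.
CELL_TYPE_ALIASES = {"microgrial": "microglia"}

# Hypothetical mapping from source-specific column names to unified ones.
COLUMN_MAP = {"celltype": "cell_type", "Braak": "braak_stage"}


def unify_cell_type(label):
    """Map variant cell type labels onto one canonical name."""
    key = label.strip().lower()  # assumption: labels are compared case-insensitively
    return CELL_TYPE_ALIASES.get(key, key)


def standardize_gene_names(names):
    """Upper-case every gene name, as the processing steps describe."""
    return [name.upper() for name in names]


def rename_columns(record, column_map):
    """Rename metadata keys, leaving unmapped keys unchanged."""
    return {column_map.get(key, key): value for key, value in record.items()}


record = rename_columns({"celltype": "Microgrial", "Braak": "IV"}, COLUMN_MAP)
record["cell_type"] = unify_cell_type(record["cell_type"])
print(record)
print(standardize_gene_names(["Apoe", "trem2"]))
```

Keeping the aliases in a plain dictionary makes the unified namespace easy to review and extend as new data sources are merged.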

Aggregated Data Statistics

Total Cells	2.3M
AD Cells	1.2M
Control Cells	1.1M
Unique Genes	91k
Donors	166

Characteristics of Dataset grouped by Data Source

Columns: Data Source, Unique Genes, Total Cells, AD Cells, Control Cells, Donors, Cell Type Label, Brain Region, Tissue Type, Braak Stage, Donors Id, Donor Gender, Donor Age

rexach et al: 30k, 217k, 118k, 99k, ✅, ✘, ✅, ✘, ✅
pan et al: 61k, 43k, 11k, 32k, ✅
dharsini et al: 61k, 425k, 311k, 114k, ✅
ssREAD: 62k, 2.42M, 1.14M, 1.28M, 135, ✅, ✘, ✅

tabula-muris-senis-bladder-smartseq2
huggingface.co
Updated Dec 26, 2025
Cite
2025 Longevity x AI Hackathon (2025). tabula-muris-senis-bladder-smartseq2 [Dataset]. https://huggingface.co/datasets/longevity-db/tabula-muris-senis-bladder-smartseq2
Explore at:
Dataset updated
Dec 26, 2025
Dataset authored and provided by
2025 Longevity x AI Hackathon
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
Bladder Tissue from Tabula Muris Senis

Tabula Muris Senis is a mammalian aging single-cell gene expression dataset, downloaded from https://cellxgene.cziscience.com/collections/0b9d8a04-bb9d-44da-aa27-705bb65b54eb. This dataset represents the Bladder tissue, using the SmartSeq2 full-length mRNA library preparation method for single cells. Code to download and process this dataset is available in: https://github.com/seanome/2025-longevity-x-ai-hackathon

Ageing is characterized by a… See the full description on the dataset page: https://huggingface.co/datasets/longevity-db/tabula-muris-senis-bladder-smartseq2.
Single Cell Analysis Software Report
datainsightsmarket.com
doc, pdf, ppt
Updated Jun 25, 2025
Cite
Data Insights Market (2025). Single Cell Analysis Software Report [Dataset]. https://www.datainsightsmarket.com/reports/single-cell-analysis-software-1963380
Explore at:
Available download formats: ppt, pdf, doc
Dataset updated
Jun 25, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2026 - 2034
Area covered
Global
Variables measured
Market Size
Description
Discover the booming single-cell analysis software market! Our in-depth report reveals key trends, growth drivers, leading companies (Cellenics, BioTuring Browser, 10x Genomics Loupe Browser, etc.), and future projections through 2033. Learn about market segmentation and regional analysis to gain a competitive edge.
Human Retina Cell Atlas reference model
zenodo.org
bin, csv
Updated Nov 8, 2024
Cite
Jin Li; Rui Chen (2024). Human Retina Cell Atlas reference model [Dataset]. http://doi.org/10.5281/zenodo.14014720
Explore at:
Available download formats: bin, csv
Unique identifier
https://doi.org/10.5281/zenodo.14014720
Dataset updated
Nov 8, 2024
Dataset provided by
Zenodo http://zenodo.org/
Authors
Jin Li; Rui Chen
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset hosts files needed to reproduce the Human Retina Cell Atlas (HRCA) reference model using scArches. The HRCA data can be accessed through several interactive browsers, including HCA Data Portal, CELLxGENE, UCSC Cell Browser, and the Broad Single Cell Portal. Please use these browsers for atlas exploration and visualization. For more information on HRCA, please refer to the HRCA paper (Li et al., bioRxiv 2023) and the Github repository at https://github.com/RCHENLAB/HRCA_reproducibility. This dataset has been used in the tutorial for the HRCA reference model at https://github.com/RCHENLAB/HRCA_reproducibility/tree/main/scArches.

Data description:

1. HRCA_snRNA_allcells_rawcounts.h5ad

This file contains the cell-by-gene count matrix for over 3.1 million single nuclei and more than 36,000 gene features of the HRCA. Gene features are represented by gene symbols. Please refer to the interactive browsers for atlas exploration, where gene features are mapped to Ensembl IDs. In the cell metadata, "sampleid" indicates sample batches of cells, and "celltype" specifies 123 retina cell types.

2. model.pt

This file is the trained reference model using scArches, incorporating 10,000 highly variable features from the full count matrix. It can be directly used for cell type annotation of new retina samples.

3. HRCA_snRNA_allcells_rawcounts_latent.h5ad

This file contains the embeddings of all 3.1 million reference single nuclei generated by the trained reference model using scArches. These embeddings can be used to compare with the embeddings of query data for exploration.

4. HRCA_reference_model_gene_id_and_symbol.csv

This file contains the mapping of Ensembl IDs to gene symbols for the 10,000 features used in the reference model. This mapping can be used to convert the gene features in a query .h5ad file from gene IDs to gene symbols, allowing cell type labels to be predicted using the trained reference model, which uses gene symbols as gene features.

5. query.h5ad

This file contains a cell-by-gene count matrix for a query dataset, designed to support reproducibility in the HRCA reference model tutorial. The "majorclass" column includes pre-annotated major cell classes. Additional details on the tutorial are available at https://github.com/RCHENLAB/HRCA_reproducibility/tree/main/scArches.

6. query_latent.h5ad

This file contains the embeddings of the query data against the trained reference model. These embeddings can be compared with the reference data embeddings for exploration and visualization.
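
The ID-to-symbol conversion described for file 4 can be sketched with the standard library alone. The column names `gene_id` and `gene_symbol`, the helper functions, and the toy mapping text are assumptions for illustration; the actual header of HRCA_reference_model_gene_id_and_symbol.csv is not given in the listing, and the two genes shown are well-known examples rather than entries confirmed to be in that file.

```python
import csv
import io


def load_id_to_symbol(handle, id_column="gene_id", symbol_column="gene_symbol"):
    """Build an Ensembl-ID -> gene-symbol dict from a CSV (column names assumed)."""
    reader = csv.DictReader(handle)
    return {row[id_column]: row[symbol_column] for row in reader}


def convert_gene_ids(gene_ids, id_to_symbol):
    """Return positions of query genes present in the mapping, and their symbols.

    The positions can be used to subset the query matrix columns so that only
    features known to the reference model are kept.
    """
    kept_positions, symbols = [], []
    for position, gene_id in enumerate(gene_ids):
        symbol = id_to_symbol.get(gene_id)
        if symbol is not None:
            kept_positions.append(position)
            symbols.append(symbol)
    return kept_positions, symbols


# Toy mapping text standing in for the real CSV file.
toy_csv = "gene_id,gene_symbol\nENSG00000141510,TP53\nENSG00000012048,BRCA1\n"
mapping = load_id_to_symbol(io.StringIO(toy_csv))

# The middle ID is a placeholder that is absent from the mapping.
query_ids = ["ENSG00000141510", "ENSG00000000000", "ENSG00000012048"]
kept, symbols = convert_gene_ids(query_ids, mapping)
print(kept, symbols)
```

Returning the kept positions alongside the symbols lets the caller drop unmapped genes from the count matrix in the same order.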
scRNA-seq "Tabula sapiens" - human, Part 2
kaggle.com
zip
Updated Feb 5, 2022
+ more versions
Cite
Alexander Chervov (2022). scRNA-seq "Tabula sapiens" - human, Part 2 [Dataset]. https://www.kaggle.com/datasets/alexandervc/scrnaseq-tabula-sapiens-human-part-2
Explore at:
Available download formats: zip (7504637468 bytes)
Dataset updated
Feb 5, 2022
Authors
Alexander Chervov
Description
Remark 1: for cell cycle analysis - see paper https://arxiv.org/abs/2208.05229 "Computational challenges of cell cycle analysis using single cell transcriptomics" Alexander Chervov, Andrei Zinovyev

Remark 2: For the first part of the data, see https://www.kaggle.com/alexandervc/scrnaseq-tabula-sapiens-human-500-000-cells

Data and Context

The data are results of single-cell RNA sequencing: rows correspond to cells and columns to genes (or vice versa). Each value of the matrix shows how strongly the corresponding gene is expressed in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics

Particular data: "Tabula Sapiens" project: https://tabula-sapiens-portal.ds.czbiohub.org/ Data section for download: https://figshare.com/articles/dataset/Tabula_Sapiens_release_1_0/14267219 Paper: https://www.science.org/doi/10.1126/science.abl4896 https://www.biorxiv.org/content/10.1101/2021.07.19.452956v2

Tabula Sapiens is a benchmark, first-draft human cell atlas of nearly 500,000 cells from 24 organs of 15 normal human subjects. This work is the product of the Tabula Sapiens Consortium. Special thanks to the Chan Zuckerberg Initiative for funding this project and to the CZI Science Technology team for creating cellxgene, the tool that makes the visualization of this research possible.

See also tutorials:

Course at the Sanger Institute: https://scrnaseq-course.cog.sanger.ac.uk/website/tabula-muris.html

Course at CZ-hub: https://chanzuckerberg.github.io/scRNA-python-workshop/intro/about

On kaggle - copies of the notebooks and data from the course above https://www.kaggle.com/aayush9753/singlecell-rnaseq-data-from-mouse-brain

Inspiration

Single-cell RNA sequencing is an important technology in modern biology; see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6

See also the review: Nature. P. Kharchenko: "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x
Single-cell atlas of human kidneys in health, chronic kidney disease and...
datasetcatalog.nlm.nih.gov
Updated Jan 25, 2023
Cite
Jafree, Daniyal J; Long, David A; Stewart, Benjamin J; Clatworthy, Menna R (2023). Single-cell atlas of human kidneys in health, chronic kidney disease and transplant rejection [Dataset]. http://doi.org/10.5281/zenodo.7566982
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.7566982
Dataset updated
Jan 25, 2023
Authors
Jafree, Daniyal J; Long, David A; Stewart, Benjamin J; Clatworthy, Menna R
Description
This single-cell RNA-sequencing (scRNA-seq) dataset comprises two files: an RData file (combined_data.RData), which can be loaded into RStudio to generate a Seurat object, and an h5ad object (annotated_combined_adata_full.h5ad) for downstream analysis in Scanpy or cellxgene. The dataset contains previously published data and five new samples derived from kidney allografts undergoing graft nephrectomies. Overall, 217,411 human kidney cells are included, including 151,038 ‘control’ cells from living donor biopsies or non-tumorous regions of tumour nephrectomies and 66,373 cells from diseased samples, including chronic kidney disease and different aetiologies of transplant rejection. For full information on generation of the dataset, please see the associated preprint, which has been uploaded to bioRxiv and is available at: https://www.biorxiv.org/content/10.1101/2022.10.28.514222v2. The code used for scRNA-seq analysis is available at: https://github.com/daniyal-jafree1995/
Stack-CellxGene45M
huggingface.co
Updated Jan 9, 2026
Cite
Arc Institute (2026). Stack-CellxGene45M [Dataset]. https://huggingface.co/datasets/arcinstitute/Stack-CellxGene45M
Explore at:
Dataset updated
Jan 9, 2026
Dataset authored and provided by
Arc Institute
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CellxGene 45M Collection

A curated subset of CellxGene (~45M cells) used to align the Stack model after pretraining on full human scBaseCount.

Selection Criteria

≥ 50,000 cells per dataset
≥ 5 donors per dataset

Cell Type Annotations

Author-annotated coarse-grained cell type labels were heuristically identified and transferred to adata.obs["author_cell_type"].
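
The two selection criteria above (at least 50,000 cells and at least 5 donors per dataset) amount to a simple per-dataset filter. The sketch below assumes each dataset is represented by a list holding one donor ID per cell; that representation, the function name `select_datasets`, and the toy dataset IDs are hypothetical and not taken from the Stack pipeline.

```python
def select_datasets(donor_per_cell, min_cells=50_000, min_donors=5):
    """Return IDs of datasets meeting both the cell-count and donor-count thresholds.

    `donor_per_cell` maps a dataset ID to a list with one donor ID per cell.
    """
    selected = []
    for dataset_id, donors in donor_per_cell.items():
        n_cells = len(donors)
        n_donors = len(set(donors))
        if n_cells >= min_cells and n_donors >= min_donors:
            selected.append(dataset_id)
    return selected


# Toy datasets with made-up donor assignments.
toy = {
    "ds_a": [f"donor{i % 6}" for i in range(60_000)],  # 60,000 cells, 6 donors
    "ds_b": [f"donor{i % 3}" for i in range(80_000)],  # enough cells, too few donors
    "ds_c": [f"donor{i % 8}" for i in range(10_000)],  # enough donors, too few cells
}
selected = select_datasets(toy)
print(selected)
```

Both conditions must hold at once, so a large dataset from a handful of donors is excluded just as a small multi-donor one is.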
Single cell and spatial analysis of immune-hot and immune-cold tumours...
zenodo.org
bin
Updated Dec 6, 2024
Cite
Benjamin Jenkins; Gareth Thomas (2024). Single cell and spatial analysis of immune-hot and immune-cold tumours identifies fibroblast subtypes associated with distinct immunological niches and positive immunotherapy response | scRNA-Seq data [Dataset]. http://doi.org/10.5281/zenodo.14284357
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.14284357
Dataset updated
Dec 6, 2024
Dataset provided by
Zenodo http://zenodo.org/
Authors
Benjamin Jenkins; Gareth Thomas
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains scRNA-Seq data related to the Jenkins et al. 2024 study "Single cell and spatial analysis of immune-hot and immune-cold tumours identifies fibroblast subtypes associated with distinct immunological niches and positive immunotherapy response".

HNSCC_fibroblasts_integ_srt.RDS - Seurat object containing fibroblasts from integrated analysis of EPG dataset (https://cellxgene.cziscience.com/collections/3c34e6f1-6827-47dd-8e19-9edcd461893f) with GSE164690 - Relating to Figure 2.

PCFA_srt_obj.RDS - Seurat object containing Pan-Cancer Fibroblast Atlas (PCFA) - Relating to Figures 5-7.
single-cell-lung-zarr
huggingface.co
Updated Feb 4, 2026
Cite
Fahad alghanim (2026). single-cell-lung-zarr [Dataset]. https://huggingface.co/datasets/KokosDev/single-cell-lung-zarr
Explore at:
Dataset updated
Feb 4, 2026
Authors
Fahad alghanim
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
Single-cell lung (CellxGene Census) — Zarr

This dataset was exported from the CellxGene Census as a chunked + compressed Zarr store intended for easy streaming access.

Source: CellxGene Census API
Organism: Homo sapiens
Filter: tissue_general == 'lung' and is_primary_data == True
Shape: 100,000 cells × 61,497 genes
Zarr path: lung.zarr

Compression

Uncompressed (dense float32): 22.91 GB
Compressed Zarr: ~307 MB (322 MB on Hub)
Compression ratio: ~76× (Blosc zstd on… See the full description on the dataset page: https://huggingface.co/datasets/KokosDev/single-cell-lung-zarr.
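
The size figures above can be checked with a few lines of arithmetic. The sketch assumes that the quoted "22.91 GB" and "~307 MB" are binary units (GiB and MiB) and that "322 MB on Hub" is the same compressed size in decimal megabytes; that reading is an interpretation, not something the listing states, although it does reproduce all three quoted numbers.

```python
n_cells, n_genes = 100_000, 61_497
bytes_per_value = 4  # float32

# Dense, uncompressed size of the full cell-by-gene matrix.
dense_bytes = n_cells * n_genes * bytes_per_value
dense_gib = dense_bytes / 2**30

# Assumption: "~307 MB" is 307 MiB; expressed in decimal MB for comparison.
compressed_bytes = 307 * 2**20
compressed_mb_decimal = compressed_bytes / 10**6

ratio = dense_bytes / compressed_bytes

print(f"{dense_gib:.2f} GiB dense, {compressed_mb_decimal:.0f} MB compressed, ~{ratio:.0f}x")
# prints "22.91 GiB dense, 322 MB compressed, ~76x"
```

The large ratio is plausible because single-cell count matrices are mostly zeros, which compress very well when stored densely.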

2 scholarly articles cite the Chan Zuckerberg CELLxGENE Collection dataset.