4 datasets found
  1. Single-Cell RNA Data Portal for Alzheimer's Disease

    • zenodo.org
    zip
    Updated Apr 30, 2025
    Cite: Theodoros Siozos; Christos Petrou; ATHANASIOS BALOMENOS; Yannis Kopsinis (2025). Single-Cell RNA Data Portal for Alzheimer's Disease [Dataset]. http://doi.org/10.5281/zenodo.15295744
    Available download formats: zip
    Dataset updated: Apr 30, 2025
    Dataset provided by: Zenodo (http://zenodo.org/)
    Authors: Theodoros Siozos; Christos Petrou; ATHANASIOS BALOMENOS; Yannis Kopsinis
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Single-Cell RNA Data Portal for Alzheimer's Disease

    The Single-Cell Alzheimer's Disease Data Portal is an aggregated data portal created as part of the Enfield EU-funded program for research on the single-cell Generative Pretrained Transformer (scGPT-AD) model. The data portal contains data from the ssREAD data portal, along with single-cell AD data from recent studies (dharsini et al, pan et al, rexach et al). The data from the individual studies were accessed through the cellXgene data portal, a large portal for single-cell data. The data have been uploaded in two separate .zip files (part1, part2).

    The single-cell data follow the Annotated Data (AnnData) format. The core data for each sample is the gene-expression matrix, which records the expression level of each gene in each cell. Additionally, each sample has an `.obs` attribute, which holds the core metadata for each cell (cell type, brain region, Braak stage, donor age, disease condition, donor gender, etc.), and a `.var` attribute, through which the gene names are accessed.
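
    The layout described above can be pictured with a small, standard-library-only sketch. It is a toy stand-in rather than the `anndata` package itself (where the corresponding attributes are `adata.X`, `adata.obs` and `adata.var`). The class name, the metadata keys (`cell_type`, `disease_condition`) and all values are assumptions made for illustration, not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass
class ToyAnnotatedData:
    """Toy stand-in for an annotated-data container (not the anndata library)."""

    X: list    # expression matrix: one row per cell, one column per gene
    obs: list  # per-cell metadata dicts, aligned with the rows of X
    var: list  # gene names, aligned with the columns of X

    def __post_init__(self):
        # The three parts only make sense if their dimensions agree.
        if len(self.X) != len(self.obs):
            raise ValueError("X needs exactly one row per entry in obs")
        if any(len(row) != len(self.var) for row in self.X):
            raise ValueError("each row of X needs one value per gene in var")

    def expression(self, cell_index, gene):
        """Return the expression value of `gene` in the cell at `cell_index`."""
        return self.X[cell_index][self.var.index(gene)]


# Hypothetical two-cell, three-gene sample (values invented for illustration).
sample = ToyAnnotatedData(
    X=[[0.0, 3.0, 1.0], [2.0, 0.0, 5.0]],
    obs=[
        {"cell_type": "microglia", "disease_condition": "AD"},
        {"cell_type": "astrocyte", "disease_condition": "control"},
    ],
    var=["APOE", "TREM2", "GFAP"],
)

# Select cells by metadata, then read their expression values.
ad_cells = [i for i, meta in enumerate(sample.obs)
            if meta["disease_condition"] == "AD"]
trem2_in_ad = [sample.expression(i, "TREM2") for i in ad_cells]
```

    The point of the sketch is the alignment: row `i` of the matrix and entry `i` of the cell metadata describe the same cell, and column `j` and gene name `j` describe the same gene.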

    The source data have been processed to create a unified data portal ready to be used as a training dataset for a Transformer model. The main processing steps were:

    • convert ssREAD data from `.qsave` format to `.h5ad` format that aligns with the AnnData framework
    • discard some unprocessable data samples
    • standardize metadata column names
    • process categorical data to create a unified namespace (e.g.: merge `microglia` and `microgrial` cell type names into one)
    • standardize all gene names to be upper-cased
    • discard dimensionality reduction and clustering attributes, to make a lightweight version of the data portal, since they are not meant to be used in Transformer model training
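
    A minimal sketch of what the harmonization steps listed above might look like, using plain Python dictionaries instead of real `.h5ad` files. The alias tables, column names, attribute keys and function names are all hypothetical; the portal's actual processing code is not shown in this listing.

```python
# Hypothetical alias tables; the real mappings used by the portal are not published here.
COLUMN_ALIASES = {"celltype": "cell_type", "Cell.Type": "cell_type", "braak": "braak_stage"}
CELL_TYPE_ALIASES = {"microgrial": "microglia"}
DERIVED_KEYS = ("pca", "umap", "clusters")  # assumed names for reduction/clustering results


def standardize_columns(record):
    """Rename metadata columns to one shared naming scheme."""
    return {COLUMN_ALIASES.get(key, key): value for key, value in record.items()}


def unify_cell_type(record):
    """Map variant or misspelled cell type labels onto one canonical label."""
    out = dict(record)
    if "cell_type" in out:
        out["cell_type"] = CELL_TYPE_ALIASES.get(out["cell_type"], out["cell_type"])
    return out


def uppercase_genes(gene_names):
    """Upper-case every gene name."""
    return [name.upper() for name in gene_names]


def drop_derived(attributes, derived=DERIVED_KEYS):
    """Discard dimensionality-reduction and clustering attributes."""
    return {key: value for key, value in attributes.items() if key not in derived}


# Hypothetical input record for one cell.
raw_cell = {"celltype": "microgrial", "braak": "IV"}
harmonized = unify_cell_type(standardize_columns(raw_cell))
genes = uppercase_genes(["Apoe", "trem2"])
slim = drop_derived({"X": "matrix", "umap": "embedding", "clusters": "labels"})
```

    Column renaming runs before label merging so that the cell type lookup only has to deal with one column name.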

    Aggregated Data Statistics

    Total Cells    AD Cells    Control Cells    Unique Genes    Donors
    2.3M           1.2M        1.1M             91k             166

    Characteristics of Dataset grouped by Data Source

    Data Source       Unique Genes    Total Cells    AD Cells    Control Cells    Donors
    rexach et al      30k             217k           118k        99k              20
    pan et al         61k             43k            11k         32k              7
    dharsini et al    61k             425k           311k        114k             46
    ssREAD            62k             2.42M          1.14M       1.28M            135

    Remaining table columns (no values listed): Cell Type Label, Brain Region, Tissue Type, Braak Stage, Donors Id, Donor Gender, Donor Age

  2. Tabula sapiens filtered data

    • zenodo.org
    bin
    Updated Feb 2, 2023
    Cite: Can Ergen (2023). Tabula sapiens filtered data [Dataset]. http://doi.org/10.5281/zenodo.7587774
    Available download formats: bin
    Dataset updated: Feb 2, 2023
    Dataset provided by: Zenodo (http://zenodo.org/)
    Authors: Can Ergen
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This folder contains pre-filtered, per-tissue files of Tabula sapiens that were used to generate the scvi models stored in scvi-hub. The data was filtered because of inconsistencies in the cell-type resolution across donors. For the gene-filtered, minified data used by the models, refer to the pre-processed files (adata objects) provided with the trained scvi models.

    This is preprocessed data downloaded from https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5. Please refer to their data usage guide before reusing the data.

  3. Kuppe snRNA-seq Human Heart 2022 control

    • figshare.com
    hdf
    Updated May 30, 2023
    Cite: Single-cell best practices (2023). Kuppe snRNA-seq Human Heart 2022 control [Dataset]. http://doi.org/10.6084/m9.figshare.22133015.v2
    Available download formats: hdf
    Dataset updated: May 30, 2023
    Dataset provided by: figshare
    Authors: Single-cell best practices
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset published by Kuppe et al. 2022. It contains data on human cardiac remodelling after myocardial infarction, obtained by single-cell gene expression, chromatin accessibility and spatial transcriptomic profiling of multiple physiological zones in myocardium from patients with myocardial infarction and from controls.

    The dataset available under this link contains the single-cell gene expression part from all control patients.

    Citation: Kuppe, C., Ramirez Flores, R.O., Li, Z. et al. Spatial multi-omic map of human myocardial infarction. Nature 608, 766–777 (2022).
    Manuscript link: https://www.nature.com/articles/s41586-022-05060-x
    Original data link: https://cellxgene.cziscience.com/collections/8191c283-0816-424b-9b61-c3e1d6258a77

  4. Tabula sapiens scvi-tools models for scvi hub

    • zenodo.org
    application/gzip, bin
    Updated Dec 6, 2024
    Cite: Can Ergen (2024). Tabula sapiens scvi-tools models for scvi hub [Dataset]. http://doi.org/10.5281/zenodo.14286626
    Available download formats: bin, application/gzip
    Dataset updated: Dec 6, 2024
    Dataset provided by: Zenodo (http://zenodo.org/)
    Authors: Can Ergen
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically
    Time period covered: Dec 5, 2024
    Description

    These are pre-trained models and AnnData datasets based on Tabula sapiens. The models were subsequently uploaded to scvi-hub, and this repository exists so that the models on Hugging Face can be restored.

    This is preprocessed data downloaded from https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5. Please refer to their data usage guide before reusing the data.
