100+ datasets found
  1. Data from: Reference transcriptomics of porcine peripheral immune cells...

    • agdatacommons.nal.usda.gov
    • datasets.ai
    • +1 more
    zip
    Updated Nov 21, 2025
    Cite
    Juber Herrera-Uribe; Jayne Wiarda; Sathesh K. Sivasankaran; Lance Daharsh; Haibo Liu; Kristen A. Byrne; Timothy P. L. Smith; Joan K. Lunney; Crystal L. Loving; Christopher K. Tuggle (2025). Data from: Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing [Dataset]. http://doi.org/10.15482/USDA.ADC/1522411
    Available download formats: zip
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    Ag Data Commons
    Authors
    Juber Herrera-Uribe; Jayne Wiarda; Sathesh K. Sivasankaran; Lance Daharsh; Haibo Liu; Kristen A. Byrne; Timothy P. L. Smith; Joan K. Lunney; Crystal L. Loving; Christopher K. Tuggle
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains files reconstructing single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) in order to provide a reference annotation of porcine immune cell transcriptomics at enhanced, single-cell resolution. Analysis of single-cell data allowed identification of 36 cell clusters that were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. Files may be used to reconstruct the data as presented in the manuscript, allowing for individual query by other users. Scripts for original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464).

    Resources in this dataset:

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format.
    File Name: PBMC7_AllCells.zip
    Resource Description: Zipped folder containing PBMC counts matrix, gene names, and cell IDs. Files are as follows:

    • matrix of gene counts* (matrix.mtx.gz)
    • gene names (features.tsv.gz)
    • cell IDs (barcodes.tsv.gz)

    *The ‘raw’ count matrix contains gene counts obtained following ambient RNA removal. Ambient RNA removal was configured to calculate non-integer count estimates, so most gene counts in this matrix are non-integer values, but they should still be treated as raw/unnormalized data that requires further normalization/transformation. Data can be read into R using the function Read10X().

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata.
    File Name: PBMC7_AllCells_meta.csv
    Resource Description: .csv file containing metadata for cells included in the final dataset. Metadata columns include:

    • nCount_RNA = the number of transcripts detected in a cell
    • nFeature_RNA = the number of genes detected in a cell
    • Loupe = cell barcodes; correspond to the cell IDs found in the .h5Seurat and 10X formatted objects for all cells
    • prcntMito = percent mitochondrial reads in a cell
    • Scrublet = doublet probability score assigned to a cell
    • seurat_clusters = cluster ID assigned to a cell
    • PaperIDs = sample ID for a cell
    • celltypes = cell type ID assigned to a cell

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates.
    File Name: PBMC7_AllCells_PCAcoord.csv
    Resource Description: .csv file containing first 100 PCA coordinates for cells.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates.
    File Name: PBMC7_AllCells_tSNEcoord.csv
    Resource Description: .csv file containing t-SNE coordinates for all cells.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates.
    File Name: PBMC7_AllCells_UMAPcoord.csv
    Resource Description: .csv file containing UMAP coordinates for all cells.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates.
    File Name: PBMC7_CD4only_tSNEcoord.csv
    Resource Description: .csv file containing t-SNE coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells UMAP Coordinates.
    File Name: PBMC7_CD4only_UMAPcoord.csv
    Resource Description: .csv file containing UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells UMAP Coordinates.
    File Name: PBMC7_GDonly_UMAPcoord.csv
    Resource Description: .csv file containing UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates.
    File Name: PBMC7_GDonly_tSNEcoord.csv
    Resource Description: .csv file containing t-SNE coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information.
    File Name: UnfilteredGeneInfo.txt
    Resource Description: .txt file containing gene nomenclature information used to assign gene names in the dataset. 'Name' column corresponds to the name assigned to a feature in the dataset.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat.
    File Name: PBMC7.tar
    Resource Description: .h5Seurat object of all cells in PBMC dataset. File needs to be untarred, then read into R using function LoadH5Seurat().
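
    The 10X-format resource (PBMC7_AllCells.zip) described above can also be inspected outside R. The following is a minimal Python sketch, assuming the file follows the usual Matrix Market coordinate layout (genes as rows, cells as columns, 1-based indices). The helper names `read_10x_triplets` and `total_counts_per_cell` are hypothetical, and values are parsed as floats because the description states that counts are non-integer after ambient RNA removal.

```python
import gzip


def read_10x_triplets(mtx_path):
    """Parse a gzipped Matrix Market coordinate file into a sparse dict.

    Assumes rows are genes, columns are cells, and indices are 1-based.
    Values are kept as floats because the counts may be non-integer.
    """
    counts = {}
    shape = None
    with gzip.open(mtx_path, "rt") as handle:
        for line in handle:
            if line.startswith("%"):  # header and comment lines
                continue
            fields = line.split()
            if not fields:
                continue
            if shape is None:
                # First non-comment line: n_genes, n_cells, n_nonzero
                shape = (int(fields[0]), int(fields[1]), int(fields[2]))
                continue
            gene_idx, cell_idx = int(fields[0]) - 1, int(fields[1]) - 1
            counts[(gene_idx, cell_idx)] = float(fields[2])
    return shape, counts


def total_counts_per_cell(shape, counts):
    """Sum the counts in each column (one total per cell)."""
    totals = [0.0] * shape[1]
    for (_, cell_idx), value in counts.items():
        totals[cell_idx] += value
    return totals
```

    Row and column indices returned here would line up with the order of entries in features.tsv.gz and barcodes.tsv.gz, respectively, under the same layout assumption.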

  2. SCimilarity Tutorial Data

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    bin
    Updated Sep 4, 2024
    Cite
    Graham Heimberg; Tony Kuo; Nathaniel Diamant; Omar Salem; Héctor Corrada Bravo; Jason Vander Heiden (2024). SCimilarity Tutorial Data [Dataset]. http://doi.org/10.5281/zenodo.13685881
    Available download formats: bin
    Dataset updated
    Sep 4, 2024
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Graham Heimberg; Tony Kuo; Nathaniel Diamant; Omar Salem; Héctor Corrada Bravo; Jason Vander Heiden
    Description

    SCimilarity is a unifying representation of single-cell expression profiles that quantifies similarity between expression states and generalizes to represent new studies without additional training. This enables a novel cell search capability, which sifts through millions of profiles to find cells similar to a query cell state and allows researchers to quickly and systematically leverage massive public scRNA-seq atlases to learn about a cell state of interest.
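
    As a rough illustration of the cell search idea described above, the sketch below ranks reference cells by similarity to a query cell. It assumes cells have already been mapped to fixed-length embedding vectors and uses cosine similarity as a stand-in metric. The toy vectors, cell IDs, and the function `nearest_cells` are hypothetical and do not reflect the SCimilarity model or its API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_cells(query, reference, k=2):
    """Return the k reference cell IDs most similar to the query embedding."""
    scored = [(cosine_similarity(query, vec), cell_id)
              for cell_id, vec in reference.items()]
    scored.sort(reverse=True)
    return [cell_id for _, cell_id in scored[:k]]


# Toy embeddings; real embeddings would come from a trained model.
reference = {
    "cell_A": [1.0, 0.0, 0.0],
    "cell_B": [0.9, 0.1, 0.0],
    "cell_C": [0.0, 1.0, 0.0],
}
hits = nearest_cells([1.0, 0.05, 0.0], reference, k=2)
print(hits)
```

    A search over millions of profiles would need an approximate nearest-neighbor index rather than this exhaustive loop; the sketch only shows the ranking step.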

    This repository contains public datasets for SCimilarity tutorials, specifically:

    1. A subsample of single-cell data from Adams, et al. Science Advances, 2020 (GSE136831) as an AnnData object in h5ad format.

    Terms of GSE136831:

    Used with permission. Research developed by TLC4PF and the Yale School of Medicine led by Dr. Naftali Kaminski. © 2023 Pulmonary Fibrosis Cell Atlas website and associated content. All rights reserved. Please see the project website for more information: www.IPFCellAtlas.com

    In addition, please cite https://www.science.org/doi/10.1126/sciadv.aba1983, and for a description of the website creation methodology please cite https://doi.org/10.1152/ajplung.00451.2020.

  3. Single-Cell RNA Data Portal for Alzheimer's Disease

    • zenodo.org
    zip
    Updated Apr 30, 2025
    Cite
    Theodoros Siozos; Christos Petrou; ATHANASIOS BALOMENOS; Yannis Kopsinis (2025). Single-Cell RNA Data Portal for Alzheimer's Disease [Dataset]. http://doi.org/10.5281/zenodo.15295744
    Available download formats: zip
    Dataset updated
    Apr 30, 2025
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Theodoros Siozos; Christos Petrou; ATHANASIOS BALOMENOS; Yannis Kopsinis
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Single-Cell RNA Data Portal for Alzheimer's Disease

    The single cell Alzheimer's Disease Data Portal is an aggregated data portal created as part of the Enfield EU Funded program for the single-cell Generative Pretrained Transformer (scGPT-AD) model research. The data portal contains data from the ssREAD data portal, along with single-cell AD data from the latest studies (dharsini et al, pan et al, rexach et al). The data from the individual studies were accessed through the cellXgene data portal, a vast portal for single cell data. The data have been uploaded in two separate .zip files (part1, part2).

    The single cell data follow the Annotated Data format. The core data for each sample is the gene-expression matrix, which gives the expression level of each gene in a single cell. Additionally, the dataset contains the `.obs` attribute, which includes core cell metadata for each sample (cell type, brain region, braak stage, donor age, disease condition, donor gender, etc.), along with the gene names, accessed via the `.var` attribute.

    The source data have been processed to create a unified data portal ready to be used as training dataset for a Transformer model. The main processing steps were:

    • convert ssREAD data from `.qsave` format to `.h5ad` format that aligns with the AnnData framework
    • discard some unprocessable data samples
    • standardize metadata column names
    • process categorical data to create a unified namespace (e.g.: merge `microglia` and `microgrial` cell type names into one)
    • standardize all gene names to be upper-cased
    • discard dimensionality reduction and clustering attributes, to make a lightweight version of the data portal, since they are not meant to be used in Transformer model training
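
    The metadata and gene-name steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the portal's actual processing code: the function names, the lower-casing of cell type labels, and the column-renaming map are hypothetical, and only the `microglia`/`microgrial` merge comes from the description.

```python
def standardize_gene_names(gene_names):
    """Upper-case every gene name, mirroring the gene-name step."""
    return [name.upper() for name in gene_names]


# Hypothetical synonym table; only the microglia example is from the source.
CELL_TYPE_SYNONYMS = {"microgrial": "microglia"}


def unify_cell_types(labels, synonyms=CELL_TYPE_SYNONYMS):
    """Map variant cell type labels onto one shared name.

    Labels are stripped and lower-cased first (an assumption) so that
    spelling variants compare consistently.
    """
    cleaned = []
    for label in labels:
        key = label.strip().lower()
        cleaned.append(synonyms.get(key, key))
    return cleaned


def rename_columns(record, column_map):
    """Rename metadata keys according to column_map, keeping other keys."""
    return {column_map.get(key, key): value for key, value in record.items()}
```

    For example, `rename_columns({"Braak": "IV"}, {"Braak": "braak_stage"})` would yield a record keyed by `braak_stage`; the target column names here are placeholders.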

    Aggregated Data Statistics

    Total Cells     2.3M
    AD Cells        1.2M
    Control Cells   1.1M
    Unique Genes    91k
    Donors          166

    Characteristics of Dataset grouped by Data Source

    Data Source      Unique Genes   Total Cells   AD Cells   Control Cells   Donors
    rexach et al     30k            217k          118k       99k             20
    pan et al        61k            43k           11k        32k             7
    dharsini et al   61k            425k          311k       114k            46
    ssREAD           62k            2.42M         1.14M      1.28M           135

    Remaining columns (per-source values not shown): Cell Type Label, Brain Region, Tissue Type, Braak Stage, Donors Id, Donor Gender, Donor Age

  4. Ageing_Exercise_Single_Cell

    • figshare.com
    application/gzip
    Updated Jul 3, 2024
    Cite
    Solal Chauquet (2024). Ageing_Exercise_Single_Cell [Dataset]. http://doi.org/10.6084/m9.figshare.21959516.v1
    Available download formats: application/gzip
    Dataset updated
    Jul 3, 2024
    Dataset provided by
    Figshare http://figshare.com/
    Authors
    Solal Chauquet
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Single-cell RNA-seq dataset in RDS format, readable with the R programming language.

  5. 10X single-cell RNA sequencing of bone marrow cells from MDS-RS patients and...

    • researchdata.se
    Updated Nov 6, 2023
    Cite
    Pedro Luis Moura; Eva Hellström-Lindberg (2023). 10X single-cell RNA sequencing of bone marrow cells from MDS-RS patients and healthy donors [Dataset]. http://doi.org/10.48723/nq2a-1e03
    Available download formats: (1107), (5665)
    Dataset updated
    Nov 6, 2023
    Dataset provided by
    Karolinska Institutet
    Authors
    Pedro Luis Moura; Eva Hellström-Lindberg
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    This dataset consists of single-cell RNA sequencing data of bone marrow cells (CD34+ stem cells, GPA+ erythroblasts, ring sideroblasts and mononuclear cells) obtained from multiple healthy bone marrow donors and MDS-RS patients. The objective of this data collection was to assess several parameters on how the bone marrow of MDS-RS patients differs from that of healthy donors.

    This dataset includes raw sequencing data in .fastq format, processed count matrices and associated pseudonymized metadata.

    Processing: All samples were loaded onto Chromium Single Cell Chips (10x Genomics, CA, USA) at a target capture rate of 10,000 cells per sample. Single cell libraries were prepared using Chromium Next GEM Single Cell 3ʹ Kits v3.1 (10x Genomics) as per the manufacturer’s instructions, except 1µl additive ADT primers were added to the initial cDNA PCR amplification buffer and ADT libraries prepared as described in the Total-Seq B protocol (BioLegend) from the initial cDNA SPRI clean up. Libraries were pooled and sequenced on an Illumina NovaSeq 6000 (Illumina). Read pseudoalignment was performed against the GRCh38.p13 human genome assembly through kallisto v0.46.1 and bustools v0.40.0 was used for barcode and UMI counting.

    The dataset consists of 2 folders:
    • Processed_Count_Matrices
    • Raw_FASTQ

    And one xlsx file:
    • Sample_key.xlsx

    The folder Processed_Count_Matrices contains 1 rds file, 1 tsv file, 9 mtx files, and 18 txt files. The folder Raw_FASTQ contains 27 GNU zipped fastq files, and 5 txt files.

    The documentation file File_list_10x.txt contains a full list of the files in the dataset.

    The total size of the dataset is approximately 21 GB.

  6. Multiple Single Cell RNA Expressions ARCHS4

    • kaggle.com
    zip
    Updated Jul 25, 2021
    Cite
    Alexander Chervov (2021). Multiple Single Cell RNA Expressions ARCHS4 [Dataset]. https://www.kaggle.com/alexandervc/multiple-single-cell-rna-expressions-archs4
    Available download formats: zip (23319014182 bytes)
    Dataset updated
    Jul 25, 2021
    Authors
    Alexander Chervov
    Description

    Remark: for cell cycle analysis - see paper https://arxiv.org/abs/2208.05229 "Computational challenges of cell cycle analysis using single cell transcriptomics" Alexander Chervov, Andrei Zinovyev

    Context

    Dataset is downloaded from https://amp.pharm.mssm.edu/archs4/download.html The methods are described in Nature Communications paper: https://www.nature.com/articles/s41467-018-03751-6

    The ARCHS4 data provides user-friendly access to multiple gene expression data from the GEO database. (https://www.ncbi.nlm.nih.gov/geo/ ). While in GEO database most of data is stored in raw formats, ARCHS4 provides prepared count matrix expression data. While GEO contains data stored separately for each research paper, ARCHS4 collects all the information in one single matrix. One may consult the main site for further information.

    Main data files are in H5 (HDF5, Hierarchical Data Format) file format https://en.wikipedia.org/wiki/Hierarchical_Data_Format. They contain expression data, as well as annotation data and further meta-information. There are several other auxiliary files, such as a TSNE 3D projection (in CSV format) and gene correlation matrices for human and mouse in feather format.

    Content

    The main file (for human): human_matrix.h5 - contains data matrix - which is 238522 samples times 35238 genes, as well as, various meta information: gene names, samples information (tissue, etc), references to GEO database id where all the details can be found.
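
    To give a sense of scale for the matrix dimensions quoted above, here is a small back-of-the-envelope calculation. It is only a sketch: the 4 bytes per value is an assumption (32-bit integer counts), and the actual on-disk layout and compression of human_matrix.h5 are not described here.

```python
N_SAMPLES = 238_522   # samples, from the description
N_GENES = 35_238      # genes, from the description
BYTES_PER_VALUE = 4   # assumption: counts stored as 32-bit integers


def dense_size_bytes(n_rows, n_cols, bytes_per_value):
    """Memory needed to hold the full matrix densely."""
    return n_rows * n_cols * bytes_per_value


n_values = N_SAMPLES * N_GENES
size_gib = dense_size_bytes(N_SAMPLES, N_GENES, BYTES_PER_VALUE) / 2**30
print(f"{n_values:,} values, about {size_gib:.1f} GiB if held densely in memory")
```

    Under that assumption the full matrix would not fit in memory on many workstations, so reading subsets of samples or genes at a time is likely the more practical approach.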

    There is also similar data for mouse, csv files with TSNE images, correlation matrices for genes.

    Acknowledgements

    The ARCHS4 project is by :

    'Alexander Lachmann', 'alexander.lachmann@mssm.edu', update: '2020-02-06'

  7. Single cell sequencing data from: The AML cellular state space unveils NPM1...

    • researchdata.se
    • figshare.scilifelab.se
    Updated Oct 7, 2025
    Cite
    Henrik Lilljebjörn; Thoas Fioretos (2025). Single cell sequencing data from: The AML cellular state space unveils NPM1 immune evasion subtypes with distinct clinical outcomes [Dataset]. http://doi.org/10.17044/SCILIFELAB.23715648
    Dataset updated
    Oct 7, 2025
    Dataset provided by
    Lund University
    Authors
    Henrik Lilljebjörn; Thoas Fioretos
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    This dataset contains 10X single cell 3' RNA sequencing gene expression data from 38 AML samples from the subtypes NPM1 (n=12), AML-MR (n=11), TP53 (n=7), CBFB::MYH11 (n=3), RUNX1::RUNX1T1 (n=3), AML without class defining mutations (n=1), and AML meeting the criteria for two subtypes (n=1). In addition, reference samples from normal bone marrow mononuclear cells (n=5) and CD34 sorted cells (n=3) are included. The single cell libraries were constructed from viably frozen cells from bone marrow (n=29+8) or peripheral blood (n=9) using the Chromium Single Cell 3' Library & Gel Bead Kit v3 (10X genomics) and sequenced on a Novaseq 6000 or NextSeq 500.

    Data is available in h5 format for each sample, with raw count output from Cellranger, or as a processed Seurat object with scaled expression data, dimension reductions, and metadata.

  8. Single-cell analysed data

    • data.ncl.ac.uk
    zip
    Updated Jun 24, 2025
    Cite
    Ioana Nicorescu (2025). Single-cell analysed data [Dataset]. http://doi.org/10.25405/data.ncl.28359179.v1
    Available download formats: zip
    Dataset updated
    Jun 24, 2025
    Dataset provided by
    Newcastle University
    Authors
    Ioana Nicorescu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This folder contains the following files and datasets:

    Flow Cytometry Data
    • Individual FCS files - Raw data files obtained following segmentation
    • Analysis file (pre-transformation) - Data analysis file before transformation, compatible with FCS Express
    • Analysis file (post-transformation) - Data analysis file after transformation, compatible with FCS Express
    • DNS format files - Processed files analyzed following data transformation

    Statistical Analysis and Figures
    • Manuscript figures - All figures from the manuscript in GraphPad Prism format, accessible with Numbers, including statistical test results

    Data Extraction and Spatial Analysis
    • Cluster percentages - Excel file containing individual cluster percentages extracted from the analysis file
    • Spatial neighborhood data - Excel file with all data used as starting point for spatial neighborhood map generation
    • Spatial interaction maps - ZIP archive containing heatmaps showing spatial interactions between individual clusters

    Please see the collection for related records https://doi.org/10.25405/data.ncl.c.7890872

  9. FedscGen: privacy-aware federated batch effect correction of single-cell RNA...

    • zenodo.org
    bin
    Updated Jun 30, 2025
    Cite
    Mohammad Bakhtiari (2025). FedscGen: privacy-aware federated batch effect correction of single-cell RNA sequencing data -- Preprocessed datasets [Dataset]. http://doi.org/10.5281/zenodo.11489844
    Available download formats: bin
    Dataset updated
    Jun 30, 2025
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Mohammad Bakhtiari
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jun 5, 2024
    Description

    This dataset accompanies the publication "FedscGen: Privacy-Aware Federated Batch Effect Correction of Single-Cell RNA Sequencing Data" and includes eight single-cell RNA sequencing (scRNA-seq) datasets used to benchmark the FedscGen and scGen methods. The datasets are provided in .h5ad format and include comprehensive metadata necessary for replication and further analysis.

    Datasets

    We analyze various datasets to compare FedscGen against scGen (centralized) in terms of batch correction. For simplicity, we refer to the dataset by abbreviations:

    1. Cell Line (CL):

      • Derived from the 293t_jurkat experiment with three batches: Zheng et al., 2017.
    2. Human Dendritic Cells (HDC):

      • scRNA-seq data of human dendritic cells across two batches: Villani et al., 2017.
    3. Human Pancreas (HP):

      • Consolidated data from five sources with 14,767 cells each: Baron et al., 2016; Muraro et al., 2016; Segerstolpe et al., 2016; Wang et al., 2016; Xin et al., 2016.
    4. Mouse Brain (MB):

      • Merged datasets with 691,600 and 141,606 cells: Saunders et al., 2018; Rosenberg et al., 2018.
    5. Mouse Cell Atlas (MCA):

      • Data focusing on 11 cell types from various organs: Han et al., 2018; The Tabula Muris Consortium, 2018.
    6. Mouse Hematopoietic Stem and Progenitor Cells (MHSPC):

      • Data from SMART-seq2 and MARS-seq protocols: Nestorowa et al., 2016; Paul et al., 2015.
    7. Mouse Retina (MR):

      • Data from two unassociated laboratories with 26,830 and 44,808 cells: Macosko et al., 2015; Shekhar et al., 2016.
    8. PBMC (human Peripheral Blood Mononuclear Cell):

      • scRNA-seq data with two batches: Zheng et al., 2017.

    Usage Notes: Each dataset is provided in .h5ad format, compatible with common single-cell analysis tools such as Scanpy. Detailed metadata is included within each file.

    Keywords: Single-cell RNA sequencing, scRNA-seq, Batch effect correction, Privacy-aware, Federated learning, scGen, FedscGen, Clinical multi-center studies, Genomics, Bioinformatics

    Contact: For questions or further information, please contact Mohammad Bakhtiari at mohammad.bakhtiari@uni-hamburg.de.

    License: Creative Commons Attribution 4.0 International (CC BY 4.0)

  10. Additional file 3 of Pooling across cells to normalize single-cell RNA...

    • springernature.figshare.com
    txt
    Updated Jun 9, 2023
    Cite
    Aaron L. Lun; Karsten Bach; John Marioni (2023). Additional file 3 of Pooling across cells to normalize single-cell RNA sequencing data with many zero counts [Dataset]. http://doi.org/10.6084/m9.figshare.c.3629252_D2.v1
    Available download formats: txt
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    Figshare http://figshare.com/
    Authors
    Aaron L. Lun; Karsten Bach; John Marioni
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Enriched GO terms for library size normalization. This file is in a tab-separated format and contains the top 200 GO terms that were enriched in the set of DE genes unique to library size normalization. The fields are the same as described for Additional file 2. (13 KB PDF)

  11. gtex-single-cell-rnaseq

    • huggingface.co
    Updated Nov 22, 2025
    Cite
    Lviv Polytechnic National University – Department of Artificial Intelligence Systems (2025). gtex-single-cell-rnaseq [Dataset]. https://huggingface.co/datasets/ai-department-lpnu/gtex-single-cell-rnaseq
    Dataset updated
    Nov 22, 2025
    Dataset authored and provided by
    Lviv Polytechnic National University – Department of Artificial Intelligence Systems
    Description

    GTEx Single-Cell RNA-seq Dataset

    This repository provides tools to create a Hugging Face dataset from GTEx single-nucleus RNA-seq data, transforming the hierarchical H5AD format into a flat, ML-ready structure.

      Overview

      Data Source

    The data comes from GTEx's snRNA-seq atlas:

    Source: GTEx Portal
    Publication: Eraslan et al., Science 2022 - "Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function"
    Content: 209,126… See the full description on the dataset page: https://huggingface.co/datasets/ai-department-lpnu/gtex-single-cell-rnaseq.

  12. utility: Collection of Tumor-Infiltrating Lymphocyte Single-Cell Experiments...

    • zenodo.org
    Updated Jan 9, 2026
    Cite
    Nicholas Borcherding (2026). utility: Collection of Tumor-Infiltrating Lymphocyte Single-Cell Experiments with TCR [Dataset]. http://doi.org/10.5281/zenodo.17977149
    Dataset updated
    Jan 9, 2026
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Nicholas Borcherding
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 11, 2025
    Description

    uTILity is a comprehensive, harmonized collection of publicly available single-cell RNA sequencing data from tumor-infiltrating T cells (TILs) with paired T cell receptor (TCR) sequencing. This resource aggregates data from 28 published studies spanning 13 tissue types, 420 unique patients, and over 2.6 million cells, with 1.8 million cells having associated TCR information.

    Data Processing

    All datasets were uniformly processed using the following pipeline:

    1. Quality Control: Cells with >10% mitochondrial genes and/or 2.5× standard deviation from the mean number of features were excluded. Doublets were identified using scDblFinder.
    2. Annotation: Automated cell type annotation was performed using:
      • SingleR with Human Primary Cell Atlas (HPCA) and Monaco reference datasets
      • Azimuth with the PBMC reference (providing L1, L2, and L3 annotations)
    3. TCR Integration: T cell receptor data was processed using scRepertoire, with clonotypes assigned based on CDR3 amino acid sequences and gene usage.
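
    The quality-control rule in step 1 can be sketched as a simple filter. This is an illustrative reading of the stated thresholds, not the uTILity pipeline itself: the dictionary keys `id`, `pct_mito`, and `n_features` are hypothetical, and the use of the sample standard deviation is an assumption.

```python
import statistics


def qc_filter(cells, max_pct_mito=10.0, n_sd=2.5):
    """Return IDs of cells passing both QC thresholds.

    cells: list of dicts with hypothetical keys 'id', 'pct_mito',
    and 'n_features'. A cell is excluded if its mitochondrial percentage
    exceeds max_pct_mito or its feature count lies more than n_sd
    standard deviations from the mean feature count.
    """
    n_features = [cell["n_features"] for cell in cells]
    mean = statistics.mean(n_features)
    sd = statistics.stdev(n_features)  # sample SD; an assumption
    low, high = mean - n_sd * sd, mean + n_sd * sd
    return [
        cell["id"]
        for cell in cells
        if cell["pct_mito"] <= max_pct_mito and low <= cell["n_features"] <= high
    ]
```

    Doublet detection (scDblFinder in the description) is a separate step and is not covered by this sketch.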

    Contents

    This archive contains:

    • Seurat Objects (.rds): Fully processed R objects with gene expression, cell metadata, dimensional reductions, and TCR annotations
    • AnnData Files (.h5ad): Python-compatible exports for use with scanpy, scvi-tools, and related ecosystems
    • Processed Data: Intermediate files and per-cohort objects for users who wish to work with individual studies

    Cancer Types Represented

    Breast, Colorectal, Lung, Melanoma, Renal, Ovarian, HNSCC, Esophageal, Biliary, Endometrial, Merkel Cell, and multi-cancer cohorts.

    Tissue Types

    Tumor, Normal adjacent tissue, Peripheral blood, Lymph node, Metastatic lesions, and Juxtatumoral tissue.

    Usage

    This data is intended for researchers studying tumor immunology, T cell biology, and computational methods for single-cell analysis. Users can leverage the harmonized annotations and TCR data for:

    • Pan-cancer T cell phenotype analysis
    • TCR repertoire studies across cancer types
    • Benchmarking integration and annotation methods
    • Training and validating machine learning models

    For analysis code and the processing pipeline, see the associated GitHub repository.

    File Formats

    .h5ad (Hierarchical Data Format) AnnData objects compatible with the Python single-cell ecosystem.

    • X: Raw count matrix (sparse CSR)
    • obs: Cell metadata
    • var: Gene metadata
    • obsm: Embeddings (PCA, UMAP, HARMONY, etc.)

    Load in Python with:

    import scanpy as sc
    adata = sc.read_h5ad("adata.h5ad")

    Load in R with:

    library(Seurat)
    # Seurat objects in this archive are .rds files; the file name below is a placeholder.
    obj <- readRDS("seurat_object.rds")

    Metadata Columns

    See metadata_headers.txt in the GitHub repository for complete descriptions: https://github.com/ncborcherding/utility/blob/main/summary/metadata_headers.txt

    Key columns:

    • orig.ident: Sample identifier (tumor type + tissue)
    • predicted.celltype.l1/l2/l3: Azimuth annotations
    • Monaco.labels / HPCA.labels: SingleR annotations
    • CTaa: Clonotype by CDR3 amino acid sequence
    • clonalFrequency: Clone count within sample
    • clonalProportion: Clone proportion within sample
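
    The relationship between CTaa, clonalFrequency, and clonalProportion can be illustrated with a short sketch. It follows the column descriptions above (clone count and clone proportion within a sample) and is not the scRepertoire implementation; the function name `clone_stats` and the input layout are hypothetical.

```python
from collections import Counter, defaultdict


def clone_stats(cells):
    """Compute per-sample clone frequency and proportion.

    cells: iterable of (sample_id, ctaa) pairs, one per cell with TCR data.
    Returns {(sample_id, ctaa): (frequency, proportion)}.
    """
    per_sample = defaultdict(Counter)
    for sample_id, ctaa in cells:
        per_sample[sample_id][ctaa] += 1

    stats = {}
    for sample_id, counts in per_sample.items():
        total = sum(counts.values())
        for ctaa, n in counts.items():
            stats[(sample_id, ctaa)] = (n, n / total)
    return stats
```

    Here the denominator is the number of TCR-bearing cells in the sample, which is an assumption about how the proportion is normalized.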

    SUGGESTED CITATION FORMAT

    Borcherding, N. (2025). uTILity: Comprehensive Single-Cell Tumor-Infiltrating Lymphocyte Data with Paired TCR Sequencing (Version 1.0.0) [Dataset]. Zenodo. https://doi.org/10.5281/zenodo.10211240

  13. Single cell data

    • resodate.org
    Updated Feb 2, 2023
    Cite
    Stefan Bohn; Lorenz Hexemer; Zixin Huang; Laura Strohmaier; Sonja Lenhardt; Stefan Legewie; Alexander Loewer (2023). Single cell data [Dataset]. https://resodate.org/resources/aHR0cHM6Ly90dWRhdGFsaWIudWxiLnR1LWRhcm1zdGFkdC5kZS9oYW5kbGUvdHVkYXRhbGliLzM3MjUuMg==
    Dataset updated
    Feb 2, 2023
    Dataset provided by
    Technische Universität Darmstadt
    TUdatalib
    Authors
    Stefan Bohn; Lorenz Hexemer; Zixin Huang; Laura Strohmaier; Sonja Lenhardt; Stefan Legewie; Alexander Loewer
    Description

    Time-resolved analysis of nuclear-to-cytoplasmic SMAD2 ratio in individual cells. For some datasets, data regarding motility and cell death is included as well.

    Data is provided in CSV format and generally organized in time points (rows) and individual cells (columns). For each experiment, several files are provided:

    • _data.csv - nuc/cyt SMAD2 ratio
    • _conditions.csv - labeling of experimental conditions
    • _map.csv - vector mapping individual cells to experimental conditions; numbering follows the order given in the corresponding _conditions.csv file
    • _timeLine.csv - time points for measurements, given in minutes
    • _motility.csv - distance moved per time point, given in µm/h
    • _division.csv - number of divisions for each cell
    • _fractiondead.csv - fraction of dead cells per field of view (note that this data is not resolved at the single-cell level)
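
    The way the _map.csv and _conditions.csv files relate to the cell columns can be sketched as follows. This is a minimal illustration under assumptions: that _conditions.csv holds one label per entry, that _map.csv holds one integer per cell, and that the numbering is 1-based (as is common for MATLAB-generated data). The function name `group_cells_by_condition` is hypothetical.

```python
import csv
import io


def group_cells_by_condition(conditions_text, map_text):
    """Map each condition label to a list of 0-based cell column indices.

    conditions_text: contents of a _conditions.csv file.
    map_text: contents of the matching _map.csv file, assumed to hold
    one 1-based condition number per cell.
    """
    conditions = [
        value.strip()
        for row in csv.reader(io.StringIO(conditions_text))
        for value in row
        if value.strip()
    ]
    mapping = [
        int(float(value))
        for row in csv.reader(io.StringIO(map_text))
        for value in row
        if value.strip()
    ]
    groups = {label: [] for label in conditions}
    for cell_idx, condition_number in enumerate(mapping):
        groups[conditions[condition_number - 1]].append(cell_idx)
    return groups
```

    The resulting indices could then be used to select the matching columns of the corresponding _data.csv file, where each column is one cell.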

    The MATLAB script "ReproduceFigures.m" reproduces most data panels from the publication and should help guide you through the data. Effect sizes need to be calculated separately using the function "permTest.m" and the parameters given in the publication.
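As a minimal sketch of how the files described above fit together, the Python snippet below groups the columns of a _data.csv-style table by experimental condition using a _map.csv-style vector and averages each group per time point. The tiny in-memory tables, the condition names, the absence of header rows, and the 1-based condition numbering are all assumptions made for illustration; check the actual files (or ReproduceFigures.m) before relying on any of them.

```python
import csv
import io
from statistics import mean

# Hypothetical miniature stand-ins for the real files (assumed: no header rows).
DATA_CSV = "1.0,2.0,3.0,4.0\n1.2,2.4,3.1,4.4\n"  # rows = time points, columns = cells
CONDITIONS_CSV = "control\ntreated\n"             # one (made-up) condition label per line
MAP_CSV = "1,1,2,2\n"                             # condition number per cell (assumed 1-based)
TIMELINE_CSV = "0\n5\n"                           # measurement times in minutes


def read_rows(text):
    """Parse CSV text into a list of non-empty rows."""
    return [row for row in csv.reader(io.StringIO(text)) if row]


def mean_trace_per_condition(data_text, conditions_text, map_text):
    """Average the per-cell values of each condition at every time point."""
    data = [[float(v) for v in row] for row in read_rows(data_text)]
    conditions = [row[0] for row in read_rows(conditions_text)]
    cell_to_condition = [int(v) for row in read_rows(map_text) for v in row]
    traces = {}
    for index, name in enumerate(conditions, start=1):
        # Columns (cells) that the map assigns to this condition.
        columns = [c for c, cond in enumerate(cell_to_condition) if cond == index]
        traces[name] = [mean(row[c] for c in columns) for row in data]
    return traces


TRACES = mean_trace_per_condition(DATA_CSV, CONDITIONS_CSV, MAP_CSV)
TIMES = [float(row[0]) for row in read_rows(TIMELINE_CSV)]
for name, trace in TRACES.items():
    print(name, list(zip(TIMES, trace)))
```

With real files, the string constants would be replaced by the contents of the matching _data.csv, _conditions.csv, _map.csv and _timeLine.csv of one experiment.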

  14. CellxGene-1K

    • kaggle.com
    zip
    Updated Apr 5, 2025
    + more versions
    Cite
    Darien Schettler (2025). CellxGene-1K [Dataset]. https://www.kaggle.com/datasets/dschettler8845/cellxgene-1k/data
    Explore at:
    Available download formats: zip (64274758 bytes)
    Dataset updated
    Apr 5, 2025
    Authors
    Darien Schettler
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    CellXGene-1K

    GCS PATH: gs://kds-2dfa91b267e9146f17786893547814ae5688af7ddeab756631a60ffa


    DATASET OVERVIEW

    A curated dataset of approximately 7,000 healthy human single cells (approx. 1,000 per tissue) sourced from the CellXGene Census, covering seven major tissues: heart, blood, brain, lung, kidney, intestine, and pancreas.

    This is 1 of 4 datasets focusing on providing progressively larger, ready-to-use collections of healthy human single-cell RNA sequencing data in the H5AD format.

    The goal is to offer standardized benchmarks/datasets derived from CellXGene for exploring fundamental scRNA-seq analysis, understanding multi-tissue cellular composition, developing and testing computational models, and evaluating method scalability across different orders of magnitude.

    With its manageable size (approx. 7k total cells), this specific dataset serves as an excellent starting point for exploration, initial model development, or educational purposes.

    Additional Information

    This dataset provides a focused collection of single-cell transcriptomic profiles representing healthy human tissues, curated from the latest (Jan 2025) stable release of the comprehensive CZ CELLxGENE Discover Census (CellXGene).

    It includes data exclusively from Homo sapiens cells annotated as 'normal' or 'healthy' and in 'cell' suspension. The dataset is specifically balanced to contain approximately 1,000 cells from each of the following seven vital tissues: heart, blood, brain, lung, kidney, intestine, and pancreas.

    With a total size of roughly 7,000 cells, this collection offers a manageable yet diverse snapshot of baseline cellular states across different organ systems. It is well-suited for comparative analyses of healthy cell types and gene expression signatures across these tissues, for benchmarking computational analysis tools on a multi-tissue dataset, or for educational exploration of single-cell data principles. This subset provides a representative sample while reducing the computational burden associated with analyzing the full CellXGene Census.
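The description does not say how the per-tissue balance was achieved, so the following is only a sketch of one plausible approach: drawing the same number of cells from each tissue without replacement. The function name, the (cell_id, tissue) input shape, and the toy data are hypothetical and are not taken from the dataset or from the CELLxGENE Census API.

```python
import random
from collections import defaultdict

TISSUES = ["heart", "blood", "brain", "lung", "kidney", "intestine", "pancreas"]


def balanced_sample(cells, per_tissue, seed=0):
    """Draw up to `per_tissue` cells from each tissue without replacement.

    `cells` is an iterable of (cell_id, tissue) pairs; both fields are
    hypothetical stand-ins for whatever identifiers the real metadata uses.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_tissue = defaultdict(list)
    for cell_id, tissue in cells:
        by_tissue[tissue].append(cell_id)
    sample = {}
    for tissue in sorted(by_tissue):
        ids = by_tissue[tissue]
        # Take everything if a tissue has fewer cells than requested.
        sample[tissue] = rng.sample(ids, min(per_tissue, len(ids)))
    return sample


# Toy input: 50 synthetic cells per tissue, then keep 10 of each.
TOY_CELLS = [(f"{t}_{i}", t) for t in TISSUES for i in range(50)]
SAMPLE = balanced_sample(TOY_CELLS, per_tissue=10)
print({tissue: len(ids) for tissue, ids in SAMPLE.items()})
```

For a dataset like the one described, `per_tissue` would be about 1,000; capping at the number of available cells is why the totals are only approximately balanced when a tissue is under-represented.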

  15. Single-cell Spatial Transcriptomics Data with Paired RNAseq for TISSUE...

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    application/gzip, zip
    Updated Jan 8, 2024
    Cite
    Eric Sun; Eric Sun (2024). Single-cell Spatial Transcriptomics Data with Paired RNAseq for TISSUE spatial gene expression prediction [Dataset]. http://doi.org/10.5281/zenodo.8259942
    Explore at:
    Available download formats: application/gzip, zip
    Dataset updated
    Jan 8, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Eric Sun; Eric Sun
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset folders from "TISSUE: uncertainty-calibrated prediction of single-cell spatial transcriptomics improves downstream analyses". If using the processed data or TISSUE algorithm, please cite: https://doi.org/10.1101/2023.04.25.538326.

    The dataset directory is compressed in tar gzip format. The top level contains one folder per dataset, named after the dataset; each folder holds the relevant data files, which include:

    - Spatial_count.txt --- a tab-delimited file containing spatial transcriptomics counts matrix

    - scRNA_count.txt --- a tab-delimited file containing RNAseq counts matrix

    - Locations.txt --- a tab-delimited file containing the (x,y) spatial coordinates of cells in the spatial transcriptomics data

    - Metadata.txt --- for some datasets, this is a comma-separated file containing the metadata table for the spatial transcriptomics data

    These files are formatted and organized to be read into AnnData objects using the native loading functions in the TISSUE package (https://github.com/sunericd/TISSUE). Some folders will also have additional accessory files such as gene lists corresponding to some experiments present in our manuscript and/or adjacency matrix objects.
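To illustrate the file layout only, here is a minimal standard-library sketch that pairs a Spatial_count.txt-style table with a Locations.txt-style table. It is not the TISSUE loader, which remains the intended route for building AnnData objects. The sketch assumes that both files carry one header line, that count rows are cells and count columns are genes, and that the location rows are in the same cell order; the gene names and values are made up.

```python
import csv
import io

# Hypothetical file contents; real files would be read from a dataset folder.
SPATIAL_COUNT_TXT = "GeneA\tGeneB\n3\t0\n1\t5\n"
LOCATIONS_TXT = "x\ty\n10.5\t20.0\n11.0\t18.5\n"


def read_tab_table(text):
    """Return (header, rows) from tab-delimited text with one header line."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    rows = [[float(v) for v in row] for row in reader if row]
    return header, rows


def load_spatial(counts_text, locations_text):
    """Pair each cell's counts with its (x, y) coordinate, row by row."""
    genes, counts = read_tab_table(counts_text)
    _, coords = read_tab_table(locations_text)
    if len(counts) != len(coords):
        raise ValueError("counts and locations must describe the same cells")
    return [
        {"xy": tuple(xy), "counts": dict(zip(genes, row))}
        for row, xy in zip(counts, coords)
    ]


CELLS = load_spatial(SPATIAL_COUNT_TXT, LOCATIONS_TXT)
print(CELLS[0])
```

The row-count check is the one safeguard worth keeping in any real loader, since the counts and coordinates are linked only by row order under these assumptions.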

    Also included are the two simulated spatial transcriptomics datasets that we generated using SRTsim.

    The SVZ folders contain our processed MERFISH spatial transcriptomics dataset on the adult mouse subventricular zone. Refer to the SVZFullFinal folder for the full dataset with TISSUE-informed cell labels. All other folders are processed data accessed from publicly available sources. The identity of numbered folders can be found in the Data Availability statement of the benchmarking paper from which they were retrieved: https://doi.org/10.1038/s41592-022-01480-9

    "svz_merfish_data.zip" includes the raw MERFISH dataset on the adult mouse subventricular zone.

  16. E

    Processed Chromium Single Cell GEX, CSP and VDJ data from intestinal plasma...

    • ega-archive.org
    Updated Apr 18, 2024
    Cite
    (2024). Processed Chromium Single Cell GEX, CSP and VDJ data from intestinal plasma cells of untreated celiac disease patients [Dataset]. https://ega-archive.org/datasets/EGAD50000000339
    Explore at:
    Dataset updated
    Apr 18, 2024
    License

    https://ega-archive.org/dacs/EGAC50000000162

    Description

    The dataset contains processed sequencing data from Chromium Single Cell 5’ gene expression, human B cell VDJ and feature barcode (CSP) sequencing from transglutaminase 2-specific and other small intestinal plasma cells isolated from four untreated celiac disease patients. The raw sequencing data has been processed with Cell Ranger v.6.0.2 with the multi and aggr functions using the pre-built Cell Ranger references GRCh38 version 2020-A for gene expression and GRCh38-alts-ensembl-5.0.0 for V(D)J analysis. The dataset consists of a gene expression and antibody capture expression matrix (cell barcodes and feature names in tsv.gz file, expression matrix in mtx.gz file) and VDJ sequences in AIRR format (csv file). A metadata file (csv file) details cells passing our custom quality control based on number of detected genes, UMIs, mitochondrial genes, immunoglobulin genes and a productively rearranged immunoglobulin heavy chain of the IgA isotype.

  17. e

    Single cell RNA-sequencing of Spike-ins and mESC using STRT-Seq on C1 System...

    • ebi.ac.uk
    Updated Mar 14, 2017
    + more versions
    Cite
    Guy Emerton; Valentine Svensson (2017). Single cell RNA-sequencing of Spike-ins and mESC using STRT-Seq on C1 System [Dataset]. https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-5482/
    Explore at:
    Dataset updated
    Mar 14, 2017
    Authors
    Guy Emerton; Valentine Svensson
    Description

    In this study, we assess technical differences between commonly used single-cell RNA-sequencing (scRNA-seq) methods. We perform scRNA-seq on a homogeneous population of mouse embryonic stem cells along with two kinds of control spike-in molecules to assess the sensitivity and accuracy of these methods. In this dataset, we apply the STRT-seq method on the Fluidigm C1 system and generate single-cell libraries using the Nextera XT kit. Please note that the sample-data relationship format (SDRF) file for this submission contains only a high-level representation of all sample, library and run information, not per-cell information. For metadata at the level of individual cells, please refer to the supplementary file single_cells_list.txt, which is included as part of this ArrayExpress submission.

  18. r

    Smart-seq3 and Smart-seq3xpress single-cell RNA sequencing of bone marrow...

    • researchdata.se
    Updated Nov 6, 2023
    + more versions
    Cite
    Pedro Luis Moura; Eva Hellström-Lindberg (2023). Smart-seq3 and Smart-seq3xpress single-cell RNA sequencing of bone marrow cells from MDS-RS patients [Dataset]. http://doi.org/10.48723/0f0c-p816
    Explore at:
    Available download formats: (825), (1600), (830)
    Dataset updated
    Nov 6, 2023
    Dataset provided by
    Karolinska Institutet
    Authors
    Pedro Luis Moura; Eva Hellström-Lindberg
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    This dataset consists of Smart-seq3 single-cell RNA sequencing data of purified RS from the bone marrow and peripheral blood of 2 MDS-RS patients; and Smart-seq3xpress single-cell RNA sequencing data of FACS-sorted hematopoietic stem cells (HSC), multipotent progenitors (MPP), megakaryocyte-erythroid progenitors (MEP) and erythroblasts from 1 MDS-RS patient. The objective of this data collection was to assess several parameters on how the bone marrow of MDS-RS patients differs from that of healthy donors.

    This dataset includes raw sequencing data in .fastq format, processed count matrices and associated pseudonymized metadata.

    Processing: In brief, cells were sorted into 384-well plates containing 3 µL Vapor-Lock (Qiagen) and 0.3 µL lysis buffer consisting of 0.125 µM OligodT30VN (5'-Biotin-ACGAGCATCAGCAGCATACGAT30VN-3'; IDT) adjusted to reverse transcription (RT) volume, 0.5 mM dNTPs/each adjusted to RT volume, 0.1% Triton X-100, 5% PEG8000 adjusted to RT volume, and 0.4 U RNase Inhibitor (Takara Bio, 40 U/µL). After cell sorting, plates were briefly centrifuged before storage at -80 °C. Before RT, plates were denatured at 72 °C for 10 min, followed by addition of 0.1 µL of RT mix: 25 mM Tris-HCl pH 8.4 (Fisher Scientific), 30 mM NaCl (Ambion), 1 mM GTP (Thermo Fisher Scientific), 2.5 mM MgCl2 (Ambion), 8 mM DTT (Thermo Fisher Scientific), 0.25 U/µL RNase Inhibitor (Takara Bio), 0.75 µM Template Switching Oligo (TSO) (5′-Biotin-AGAGACAGATTGCGCAATGNNNNNNNNWWrGrGrG-3′; IDT) and 2 U/µL of Maxima H Minus reverse transcriptase (Thermo Fisher Scientific). Plates were quickly centrifuged after dispensing to ensure that the lysis and RT volumes merged. RT was incubated at 42 °C for 90 minutes, followed by ten cycles of 50 °C for 2 minutes and 42 °C for 2 minutes. After RT, 0.6 µL PCR mix was dispensed to each well, containing the following: 1× SeqAmp PCR buffer (Takara Bio), 0.025 U/µL of SeqAmp polymerase (Takara Bio) and 0.5 µM Smartseq3 forward and reverse primer. Plates were quickly spun down before being incubated as follows: 1 minute at 95 °C for initial denaturation, then 14 cycles of 10 seconds at 98 °C, 30 seconds at 65 °C and 2–6 minutes at 68 °C. Final elongation was performed for 10 minutes at 72 °C.

    The dataset consists of 2 folders:
    - SS3_FACS_PB-BM_RS
    - SS3xpress_FACS_HSC_MPP_MEP_EB

    The folder SS3_FACS_PB-BM_RS contains 1 rds file, 3 txt files, and 1 compressed folder (tar.gz) with fastq files. The folder SS3xpress_FACS_HSC_MPP_MEP_EB contains 1 rds file, 7 txt files, and 2 GNU zipped fastq files.

    The documentation file File_list_SS3_SS3xpress.txt contains a full list of the files in the dataset.

  19. Scanpy Pipeline GSE145926 HDF5 Ingestion Plotly

    • kaggle.com
    zip
    Updated Dec 4, 2025
    Cite
    Dr. Nagendra (2025). Scanpy Pipeline GSE145926 HDF5 Ingestion Plotly [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/scanpy-pipeline-gse145926-hdf5-ingestion-plotly
    Explore at:
    Available download formats: zip (4663836 bytes)
    Dataset updated
    Dec 4, 2025
    Authors
    Dr. Nagendra
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset contains single-cell RNA sequencing (scRNA-seq) data processed using the Scanpy pipeline.

    It focuses on the GSE145926 dataset from publicly available sources.

    The data has been ingested and stored in HDF5 format for easy access and manipulation.

    It includes pre-processed expression matrices suitable for downstream analysis.

    The dataset enables exploratory analysis using Plotly interactive visualizations.

    It allows researchers to examine gene expression patterns at single-cell resolution.

    Includes metadata annotations for cell types and experimental conditions.

    Facilitates differential expression analysis and cell clustering investigations.

    Supports visualization of key immune markers such as CD3E across cell populations.

    Designed for bioinformaticians, computational biologists, and immunology researchers.

    Provides an end-to-end demonstration of Scanpy workflow in Python.

    Enables reproducibility and further expansion for custom analyses.
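The description does not list the pipeline's exact steps, so the snippet below is only a plain-Python sketch of two computations that a Scanpy-style workflow typically performs, library-size normalization followed by a log1p transform, plus a per-group summary of the CD3E marker mentioned above. It does not call Scanpy. The toy counts, the other gene names, the cell-type labels, and the target of 10,000 counts per cell are assumptions chosen for illustration.

```python
import math
from collections import defaultdict

# Toy counts: one dict per cell. Gene names other than CD3E and all cell-type
# labels are made up for illustration.
TOY_CELLS = [
    {"cell_type": "T cell", "counts": {"CD3E": 8, "LYZ": 0, "ACTB": 92}},
    {"cell_type": "T cell", "counts": {"CD3E": 4, "LYZ": 1, "ACTB": 45}},
    {"cell_type": "Macrophage", "counts": {"CD3E": 0, "LYZ": 30, "ACTB": 70}},
]


def normalize_log1p(counts, target_sum=10_000):
    """Scale one cell's counts to `target_sum` (an assumed value), then log1p."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {g: math.log1p(v / total * target_sum) for g, v in counts.items()}


def mean_marker_by_group(cells, marker):
    """Mean normalized expression of `marker` within each cell-type label."""
    values = defaultdict(list)
    for cell in cells:
        normalized = normalize_log1p(cell["counts"])
        values[cell["cell_type"]].append(normalized.get(marker, 0.0))
    return {group: sum(v) / len(v) for group, v in values.items()}


CD3E_MEANS = mean_marker_by_group(TOY_CELLS, "CD3E")
print(CD3E_MEANS)
```

Normalizing before averaging matters here: the two toy T cells have different raw CD3E counts but the same fraction of their library, so they contribute equally to the group mean.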

  20. MOESM11 of Benchmarking principal component analysis for large-scale...

    • springernature.figshare.com
    application/x-gzip
    Updated May 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Koki Tsuyuzaki; Hiroyuki Sato; Kenta Sato; Itoshi Nikaido (2023). MOESM11 of Benchmarking principal component analysis for large-scale single-cell RNA-sequencing [Dataset]. http://doi.org/10.6084/m9.figshare.11662101.v1
    Explore at:
    Available download formats: application/x-gzip
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Figshare (http://figshare.com/)
    Authors
    Koki Tsuyuzaki; Hiroyuki Sato; Kenta Sato; Itoshi Nikaido
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 11. Pair plots of all the PCA implementations (Brain).

Cite
Juber Herrera-Uribe; Jayne Wiarda; Sathesh K. Sivasankaran; Lance Daharsh; Haibo Liu; Kristen A. Byrne; Timothy P. L. Smith; Joan K. Lunney; Crystal L. Loving; Christopher K. Tuggle (2025). Data from: Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing [Dataset]. http://doi.org/10.15482/USDA.ADC/1522411

Data from: Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing

Related Article
Explore at:
Available download formats: zip
Dataset updated
Nov 21, 2025
Dataset provided by
Ag Data Commons
Authors
Juber Herrera-Uribe; Jayne Wiarda; Sathesh K. Sivasankaran; Lance Daharsh; Haibo Liu; Kristen A. Byrne; Timothy P. L. Smith; Joan K. Lunney; Crystal L. Loving; Christopher K. Tuggle
License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically

Description

This dataset contains files reconstructing single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) in order to provide a reference annotation of porcine immune cell transcriptomics at enhanced, single-cell resolution. Analysis of single-cell data allowed identification of 36 cell clusters that were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. Files may be used to reconstruct the data as presented in the manuscript, allowing for individual query by other users. Scripts for original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464).

Resources in this dataset:

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format.
File Name: PBMC7_AllCells.zip
Resource Description: Zipped folder containing PBMC counts matrix, gene names, and cell IDs. Files are as follows:

• matrix of gene counts* (matrix.mtx.gz)
• gene names (features.tsv.gz)
• cell IDs (barcodes.tsv.gz)

*The 'raw' count matrix actually contains gene counts obtained after ambient RNA removal. Ambient RNA removal was run with non-integer count estimation, so most gene counts in this matrix are non-integer values; they should still be treated as raw/unnormalized data that requires further normalization/transformation. Data can be read into R using the function Read10X().

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata.
File Name: PBMC7_AllCells_meta.csv
Resource Description: .csv file containing metadata for cells included in the final dataset. Metadata columns include:

• nCount_RNA = the number of transcripts detected in a cell
• nFeature_RNA = the number of genes detected in a cell
• Loupe = cell barcodes; correspond to the cell IDs found in the .h5Seurat and 10X formatted objects for all cells
• prcntMito = percent mitochondrial reads in a cell
• Scrublet = doublet probability score assigned to a cell
• seurat_clusters = cluster ID assigned to a cell
• PaperIDs = sample ID for a cell
• celltypes = cell type ID assigned to a cell

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates.
File Name: PBMC7_AllCells_PCAcoord.csv
Resource Description: .csv file containing first 100 PCA coordinates for cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates.
File Name: PBMC7_AllCells_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for all cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates.
File Name: PBMC7_AllCells_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for all cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates.
File Name: PBMC7_CD4only_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells UMAP Coordinates.
File Name: PBMC7_CD4only_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells UMAP Coordinates.
File Name: PBMC7_GDonly_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates.
File Name: PBMC7_GDonly_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information.
File Name: UnfilteredGeneInfo.txt
Resource Description: .txt file containing gene nomenclature information used to assign gene names in the dataset. 'Name' column corresponds to the name assigned to a feature in the dataset.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat.
File Name: PBMC7.tar
Resource Description: .h5Seurat object of all cells in PBMC dataset. File needs to be untarred, then read into R using function LoadH5Seurat().
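The subset-and-reassign step described for the CD4 and gamma delta T cell resources can be sketched as follows. This is a standard-library Python illustration of the bookkeeping only, not the authors' R/Seurat workflow: it keeps the cells whose seurat_clusters value is in the listed cluster set and attaches the published coordinates by barcode. The toy rows, the column order, the celltypes labels, and the coordinate column names (barcode, tSNE_1, tSNE_2) are assumptions; only Loupe, seurat_clusters and celltypes are column names given in the description.

```python
import csv
import io

CD4_CLUSTERS = {"0", "3", "4", "28"}  # cluster IDs listed for CD4 T cells above

# Toy stand-ins for PBMC7_AllCells_meta.csv and PBMC7_CD4only_tSNEcoord.csv.
# Column order and the coordinate column names are assumptions.
META_CSV = (
    "Loupe,seurat_clusters,celltypes\n"
    "AAAC-1,0,CD4 T\n"
    "AAAG-1,7,B\n"
    "AACC-1,28,CD4 T\n"
)
COORD_CSV = (
    "barcode,tSNE_1,tSNE_2\n"
    "AAAC-1,1.5,-2.0\n"
    "AACC-1,0.25,3.0\n"
)


def subset_with_coords(meta_text, coord_text, clusters):
    """Keep cells in `clusters` and look up their coordinates by barcode."""
    coords = {
        row["barcode"]: (float(row["tSNE_1"]), float(row["tSNE_2"]))
        for row in csv.DictReader(io.StringIO(coord_text))
    }
    subset = {}
    for row in csv.DictReader(io.StringIO(meta_text)):
        if row["seurat_clusters"] in clusters:
            # Raises KeyError if a selected cell has no published coordinate.
            subset[row["Loupe"]] = coords[row["Loupe"]]
    return subset


CD4_TSNE = subset_with_coords(META_CSV, COORD_CSV, CD4_CLUSTERS)
print(CD4_TSNE)
```

Swapping in the cluster set {6, 21, 24, 31} and the matching coordinate file would give the gamma delta T cell subset in the same way.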
