Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains files reconstructing single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) to provide a reference annotation of porcine immune cell transcriptomics at single-cell resolution. Analysis of the single-cell data identified 36 cell clusters that were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. Files may be used to reconstruct the data as presented in the manuscript, allowing for individual query by other users. Scripts for original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464).
Resources in this dataset:
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format.
File Name: PBMC7_AllCells.zip
Resource Description: Zipped folder containing PBMC counts matrix, gene names, and cell IDs. Files are as follows:
- matrix of gene counts* (matrix.mtx.gz)
- gene names (features.tsv.gz)
- cell IDs (barcodes.tsv.gz)
*The 'raw' count matrix contains gene counts obtained after ambient RNA removal. Ambient RNA removal was configured to calculate non-integer count estimates, so most gene counts in this matrix are non-integer values. They should still be treated as raw/unnormalized data that requires further normalization/transformation. Data can be read into R using the function Read10X().
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata.
File Name: PBMC7_AllCells_meta.csv
Resource Description: .csv file containing metadata for cells included in the final dataset. Metadata columns include:
- nCount_RNA = the number of transcripts detected in a cell
- nFeature_RNA = the number of genes detected in a cell
- Loupe = cell barcodes; correspond to the cell IDs found in the .h5Seurat and 10X formatted objects for all cells
- prcntMito = percent mitochondrial reads in a cell
- Scrublet = doublet probability score assigned to a cell
- seurat_clusters = cluster ID assigned to a cell
- PaperIDs = sample ID for a cell
- celltypes = cell type ID assigned to a cell
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates.
File Name: PBMC7_AllCells_PCAcoord.csv
Resource Description: .csv file containing first 100 PCA coordinates for cells.
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates.
File Name: PBMC7_AllCells_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for all cells.
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates.
File Name: PBMC7_AllCells_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for all cells.
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates.
File Name: PBMC7_CD4only_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells UMAP Coordinates.
File Name: PBMC7_CD4only_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells UMAP Coordinates.
File Name: PBMC7_GDonly_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates.
File Name: PBMC7_GDonly_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information.
File Name: UnfilteredGeneInfo.txt
Resource Description: .txt file containing gene nomenclature information used to assign gene names in the dataset. 'Name' column corresponds to the name assigned to a feature in the dataset.
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat.
File Name: PBMC7.tar
Resource Description: .h5Seurat object of all cells in PBMC dataset. File needs to be untarred, then read into R using function LoadH5Seurat().
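The re-assignment of published coordinates to a CD4-only subset can be sketched outside R as a join on cell barcode. This is a minimal illustration, not the authors' workflow: the `Loupe` and `seurat_clusters` columns come from the metadata description above, while the coordinate column names (`barcode`, `UMAP_1`, `UMAP_2`), the barcodes, and the cell-type labels in the toy data are assumptions made for the example.

```python
import csv
import io

# Cluster IDs listed above for CD4 T cells; kept as strings because
# csv.DictReader returns every field as text.
CD4_CLUSTERS = {"0", "3", "4", "28"}


def attach_coords(meta_rows, coord_rows, clusters,
                  coord_key="barcode", dims=("UMAP_1", "UMAP_2")):
    """Keep metadata rows whose cluster is in `clusters` and add coordinates.

    `coord_key` and `dims` are hypothetical column names for the coordinate
    file; `Loupe` and `seurat_clusters` are taken from the metadata listing.
    """
    coords = {row[coord_key]: tuple(float(row[d]) for d in dims)
              for row in coord_rows}
    merged = []
    for row in meta_rows:
        barcode = row["Loupe"]
        if row["seurat_clusters"] in clusters and barcode in coords:
            merged.append({**row, **dict(zip(dims, coords[barcode]))})
    return merged


# Toy stand-ins for PBMC7_AllCells_meta.csv and PBMC7_CD4only_UMAPcoord.csv.
meta_csv = """Loupe,seurat_clusters,celltypes
AAAC-1,0,CD4 T cells
AAAG-1,7,B cells
AAAT-1,28,CD4 T cells
"""
coord_csv = """barcode,UMAP_1,UMAP_2
AAAC-1,1.5,-2.0
AAAT-1,0.25,3.0
"""
meta_rows = list(csv.DictReader(io.StringIO(meta_csv)))
coord_rows = list(csv.DictReader(io.StringIO(coord_csv)))
cd4 = attach_coords(meta_rows, coord_rows, CD4_CLUSTERS)
print([(r["Loupe"], r["UMAP_1"], r["UMAP_2"]) for r in cd4])
```

The same join applies to the gamma delta T cell files by swapping in clusters 6, 21, 24, and 31.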
SCimilarity is a unifying representation of single-cell expression profiles that quantifies similarity between expression states and generalizes to represent new studies without additional training. This enables a novel cell search capability, which sifts through millions of profiles to find cells similar to a query cell state and allows researchers to quickly and systematically leverage massive public scRNA-seq atlases to learn about a cell state of interest.
This repository contains public datasets for SCimilarity tutorials, specifically:
Terms of GSE136831:
Used with permission. Research developed by TLC4PF and the Yale School of Medicine led by Dr. Naftali Kaminski. © 2023 Pulmonary Fibrosis Cell Atlas website and associated content. All rights reserved. Please see the project website for more information: www.IPFCellAtlas.com
In addition, please cite https://www.science.org/doi/10.1126/sciadv.aba1983, and for a description of the website creation methodology please cite https://doi.org/10.1152/ajplung.00451.2020.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The single cell Alzheimer's Disease Data Portal is an aggregated data portal created as part of the Enfield EU Funded program for the single-cell Generative Pretrained Transformer (scGPT-AD) model research. The data portal contains data from the ssREAD data portal, along with single-cell AD data from the latest studies (dharsini et al, pan et al, rexach et al). The data from the individual studies were accessed through the cellXgene data portal, a large portal for single-cell data. The data have been uploaded in two separate .zip files (part1, part2).
The single-cell data follow the Annotated Data (AnnData) format. The core data for each sample is the gene-expression matrix, which records the expression level of each gene in each cell. Additionally, the `.obs` attribute holds core cell metadata for each sample (cell type, brain region, Braak stage, donor age, disease condition, donor gender, etc.), and the gene names are accessed via the `.var` attribute.
The source data have been processed to create a unified data portal ready to be used as a training dataset for a Transformer model. The main processing steps were:
| Statistic | Value |
|---|---|
| Total Cells | 2.3M |
| AD Cells | 1.2M |
| Control Cells | 1.1M |
| Unique Genes | 91k |
| Donors | 166 |

| Data Source | Unique Genes | Total Cells | AD Cells | Control Cells | Donors | Cell Type Label | Brain Region | Tissue Type | Braak Stage | Donor ID | Donor Gender | Donor Age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rexach et al | 30k | 217k | 118k | 99k | 20 | ✅ | ✘ | ✅ | ✘ | ✅ | ✅ | ✅ |
| pan et al | 61k | 43k | 11k | 32k | 7 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| dharsini et al | 61k | 425k | 311k | 114k | 46 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| ssREAD | 62k | 2.42M | 1.14M | 1.28M | 135 | ✅ | ✅ | ✘ | ✅ | ✅ | ✅ | ✅ |
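When an analysis needs specific metadata fields, the per-source availability marks in the coverage table can be encoded as a small lookup. The sketch below transcribes the ✅/✘ marks from that table; the snake_case field names and the helper `sources_with` are labels chosen for this example and are not part of the portal.

```python
# Metadata availability per source, transcribed from the coverage table.
# Field names are illustrative labels, not keys used by the portal itself.
FIELDS = ["cell_type", "brain_region", "tissue_type", "braak_stage",
          "donor_id", "donor_gender", "donor_age"]

COVERAGE = {
    "rexach et al":   [True, False, True, False, True, True, True],
    "pan et al":      [True, True, True, True, True, True, True],
    "dharsini et al": [True, True, True, True, True, True, True],
    "ssREAD":         [True, True, False, True, True, True, True],
}


def sources_with(required):
    """Return the sources that provide every field named in `required`."""
    idx = [FIELDS.index(field) for field in required]
    return [name for name, flags in COVERAGE.items()
            if all(flags[i] for i in idx)]


print(sources_with(["brain_region", "braak_stage"]))
# ['pan et al', 'dharsini et al', 'ssREAD']
print(sources_with(FIELDS))
# ['pan et al', 'dharsini et al']
```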
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Single-cell RNA-seq dataset in RDS format, readable with the R programming language (for example via readRDS()).
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset consists of single-cell RNA sequencing data of bone marrow cells (CD34+ stem cells, GPA+ erythroblasts, ring sideroblasts and mononuclear cells) obtained from multiple healthy bone marrow donors and MDS-RS patients. The objective of this data collection was to assess several parameters on how the bone marrow of MDS-RS patients differs from that of healthy donors.
This dataset includes raw sequencing data in .fastq format, processed count matrices and associated pseudonymized metadata.
Processing: All samples were loaded onto Chromium Single Cell Chips (10x Genomics, CA, USA) at a target capture rate of 10,000 cells per sample. Single cell libraries were prepared using Chromium Next GEM Single Cell 3ʹ Kits v3.1 (10x Genomics) as per the manufacturer’s instructions, except 1µl additive ADT primers were added to the initial cDNA PCR amplification buffer and ADT libraries prepared as described in the Total-Seq B protocol (BioLegend) from the initial cDNA SPRI clean up. Libraries were pooled and sequenced on an Illumina NovaSeq 6000 (Illumina). Read pseudoalignment was performed against the GRCh38.p13 human genome assembly through kallisto v0.46.1 and bustools v0.40.0 was used for barcode and UMI counting.
The dataset consists of 2 folders:
- Processed_Count_Matrices
- Raw_FASTQ
And one xlsx file:
- Sample_key.xlsx
The folder Processed_Count_Matrices contains 1 rds file, 1 tsv file, 9 mtx files, and 18 txt files. The folder Raw_FASTQ contains 27 GNU zipped fastq files, and 5 txt files.
The documentation file File_list_10x.txt contains a full list of the files in the dataset.
The total size of the dataset is approximately 21 GB.
Remark: for cell cycle analysis - see paper https://arxiv.org/abs/2208.05229 "Computational challenges of cell cycle analysis using single cell transcriptomics" Alexander Chervov, Andrei Zinovyev
Dataset is downloaded from https://amp.pharm.mssm.edu/archs4/download.html The methods are described in Nature Communications paper: https://www.nature.com/articles/s41467-018-03751-6
ARCHS4 provides user-friendly access to a large body of gene expression data from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). While most data in GEO are stored in raw formats, ARCHS4 provides prepared count-matrix expression data. And while GEO stores data separately for each research paper, ARCHS4 collects all the information in a single matrix. Consult the main site for further information.
The main data files are in H5 (HDF5, Hierarchical Data Format) file format: https://en.wikipedia.org/wiki/Hierarchical_Data_Format. They contain expression data as well as annotation data and further meta-information. There are several other auxiliary files, such as a 3D t-SNE projection (in CSV format) and gene correlation matrices for human and mouse in feather format.
The main file for human, human_matrix.h5, contains the data matrix (238522 samples by 35238 genes) as well as various meta-information: gene names, sample information (tissue, etc.), and references to the GEO database IDs where all the details can be found.
There is also similar data for mouse, along with CSV files with t-SNE projections and correlation matrices for genes.
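The stated matrix dimensions imply that loading the full human matrix densely may not fit in memory on many machines, which is one reason to read it in blocks of samples. The sketch below only does the arithmetic and generates block boundaries; the 4-byte value size and the block size of 10,000 are assumptions for illustration, and the actual storage type and axis order inside human_matrix.h5 should be checked in the file itself.

```python
N_SAMPLES = 238522  # sample count from the description
N_GENES = 35238     # gene count from the description


def dense_size_gib(n_rows, n_cols, bytes_per_value=4):
    """Memory needed to hold the matrix densely, in GiB.

    bytes_per_value=4 (e.g. 32-bit integers) is an assumption.
    """
    return n_rows * n_cols * bytes_per_value / 2**30


def row_blocks(n_rows, block_size):
    """Yield (start, stop) index pairs that cover all rows in order."""
    for start in range(0, n_rows, block_size):
        yield start, min(start + block_size, n_rows)


n_entries = N_SAMPLES * N_GENES
print(n_entries)  # 8405038236
print(round(dense_size_gib(N_SAMPLES, N_GENES), 1))

blocks = list(row_blocks(N_SAMPLES, 10_000))
print(len(blocks), blocks[-1])  # 24 (230000, 238522)
```

Each `(start, stop)` pair can then be used to slice one block of samples at a time with whichever HDF5 reader is in use.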
The ARCHS4 project is by :
'Alexander Lachmann', 'alexander.lachmann@mssm.edu', update: '2020-02-06'
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset contains 10X single cell 3' RNA sequencing gene expression data from 38 AML samples from the subtypes NPM1 (n=12), AML-MR (n=11), TP53 (n=7), CBFB::MYH11 (n=3), RUNX1::RUNX1T1 (n=3), AML without class defining mutations (n=1), and AML meeting the criteria for two subtypes (n=1). In addition, reference samples from normal bone marrow mononuclear cells (n=5) and CD34 sorted cells (n=3) are included. The single cell libraries were constructed from viably frozen cells from bone marrow (n=29+8) or peripheral blood (n=9) using the Chromium Single Cell 3' Library & Gel Bead Kit v3 (10X genomics) and sequenced on a Novaseq 6000 or NextSeq 500.
Data is available in h5 format for each sample, with raw count output from Cellranger, or as a processed Seurat object with scaled expression data, dimension reductions, and metadata.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This folder contains the following files and datasets:
Flow Cytometry Data
- Individual FCS files - Raw data files obtained following segmentation
- Analysis file (pre-transformation) - Data analysis file before transformation, compatible with FCS Express
- Analysis file (post-transformation) - Data analysis file after transformation, compatible with FCS Express
- DNS format files - Processed files analyzed following data transformation
Statistical Analysis and Figures
- Manuscript figures - All figures from the manuscript in GraphPad Prism format, accessible with Numbers, including statistical test results
Data Extraction and Spatial Analysis
- Cluster percentages - Excel file containing individual cluster percentages extracted from the analysis file
- Spatial neighborhood data - Excel file with all data used as starting point for spatial neighborhood map generation
- Spatial interaction maps - ZIP archive containing heatmaps showing spatial interactions between individual clusters
Please see the collection for related records: https://doi.org/10.25405/data.ncl.c.7890872
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset accompanies the publication "FedscGen: Privacy-Aware Federated Batch Effect Correction of Single-Cell RNA Sequencing Data" and includes eight single-cell RNA sequencing (scRNA-seq) datasets used to benchmark the FedscGen and scGen methods. The datasets are provided in .h5ad format and include comprehensive metadata necessary for replication and further analysis.
We analyze various datasets to compare FedscGen against scGen (centralized) in terms of batch correction. For simplicity, we refer to the dataset by abbreviations:
Cell Line (CL):
Human Dendritic Cells (HDC):
Human Pancreas (HP):
Mouse Brain (MB):
Mouse Cell Atlas (MCA):
Mouse Hematopoietic Stem and Progenitor Cells (MHSPC):
Mouse Retina (MR):
PBMC (human Peripheral Blood Mononuclear Cell):
Usage Notes: Each dataset is provided in .h5ad format, compatible with common single-cell analysis tools such as Scanpy. Detailed metadata is included within each file.
Keywords: Single-cell RNA sequencing, scRNA-seq, Batch effect correction, Privacy-aware, Federated learning, scGen, FedscGen, Clinical multi-center studies, Genomics, Bioinformatics
Contact: For questions or further information, please contact Mohammad Bakhtiari at mohammad.bakhtiari@uni-hamburg.de.
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Enriched GO terms for library size normalization. This file is in a tab-separated format and contains the top 200 GO terms that were enriched in the set of DE genes unique to library size normalization. The fields are the same as described for Additional file 2. (13 KB PDF)
GTEx Single-Cell RNA-seq Dataset
This repository provides tools to create a Hugging Face dataset from GTEx single-nucleus RNA-seq data, transforming the hierarchical H5AD format into a flat, ML-ready structure.
Overview
Data Source
The data comes from GTEx's snRNA-seq atlas:
Source: GTEx Portal Publication: Eraslan et al., Science 2022 - "Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function" Content: 209,126… See the full description on the dataset page: https://huggingface.co/datasets/ai-department-lpnu/gtex-single-cell-rnaseq.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
uTILity is a comprehensive, harmonized collection of publicly available single-cell RNA sequencing data from tumor-infiltrating T cells (TILs) with paired T cell receptor (TCR) sequencing. This resource aggregates data from 28 published studies spanning 13 tissue types, 420 unique patients, and over 2.6 million cells, with 1.8 million cells having associated TCR information.
All datasets were uniformly processed using the following pipeline:
This archive contains:
Breast, Colorectal, Lung, Melanoma, Renal, Ovarian, HNSCC, Esophageal, Biliary, Endometrial, Merkel Cell, and multi-cancer cohorts.
Tumor, Normal adjacent tissue, Peripheral blood, Lymph node, Metastatic lesions, and Juxtatumoral tissue.
This data is intended for researchers studying tumor immunology, T cell biology, and computational methods for single-cell analysis. Users can leverage the harmonized annotations and TCR data for:
For analysis code and the processing pipeline, see the associated GitHub repository.
.h5ad (Hierarchical Data Format) AnnData objects compatible with the Python single-cell ecosystem.
Load in Python with:
import scanpy as sc
adata = sc.read_h5ad("adata.h5ad")
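Load in R (readRDS() cannot parse .h5ad files; the lines below are one option, assuming the Bioconductor package zellkonverter is installed):
library(Seurat)
library(zellkonverter)
sce <- readH5AD("adata.h5ad")  # SingleCellExperiment; main matrix stored in assay "X"
obj <- as.Seurat(sce, counts = "X", data = NULL)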
See metadata_headers.txt in the GitHub repository for complete descriptions: https://github.com/ncborcherding/utility/blob/main/summary/metadata_headers.txt
Key columns:
Borcherding, N. (2025). uTILity: Comprehensive Single-Cell Tumor-Infiltrating Lymphocyte Data with Paired TCR Sequencing (Version 1.0.0) [Dataset]. Zenodo. https://doi.org/10.5281/zenodo.10211240
Time-resolved analysis of nuclear-to-cytoplasmic SMAD2 ratio in individual cells. For some datasets, data regarding motility and cell death is included as well.
Data is provided in CSV format and generally organized in time points (rows) and individual cells (columns). For each experiment, several files are provided:
- _data.csv - nuc/cyt SMAD2 ratio
- _conditions.csv - labeling of experimental conditions
- _map.csv - vector mapping individual cells to experimental conditions; numbering follows the order given in the corresponding _conditions.csv file
- _timeLine.csv - time points for measurements, given in minutes
- _motility.csv - distance moved per time point, given in µm/h
- _division.csv - number of divisions for each cell
- _fractiondead.csv - fraction of dead cells per field of view (note that this data is not resolved at the single-cell level)
The MATLAB script "ReproduceFigures.m" reproduces most data panels from the publication and should help guide you through the data. Effect sizes need to be calculated separately using the function "permTest.m" and the parameters given in the publication.
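For readers working outside MATLAB, the relationship between the _data.csv, _map.csv, and _conditions.csv files can be sketched as a group-by over cells at each time point. This is a hedged illustration using in-memory toy values rather than the real files: the condition labels are invented, and the assumption that the mapping vector is 1-based (as is usual for MATLAB-derived data) should be verified against the actual files.

```python
from statistics import mean


def mean_by_condition(data, cell_to_condition, conditions, one_based=True):
    """Average the per-cell ratio within each condition at every time point.

    data: rows are time points, columns are cells (as in _data.csv).
    cell_to_condition: one condition index per cell (as in _map.csv).
    conditions: labels in file order (as in _conditions.csv).
    one_based: assumption that the map counts conditions from 1.
    """
    offset = 1 if one_based else 0
    result = {name: [] for name in conditions}
    for row in data:
        groups = {name: [] for name in conditions}
        for value, cond_idx in zip(row, cell_to_condition):
            groups[conditions[cond_idx - offset]].append(value)
        for name, values in groups.items():
            result[name].append(mean(values) if values else None)
    return result


# Toy values: 2 time points, 4 cells, 2 hypothetical conditions.
conditions = ["condition_A", "condition_B"]
cell_map = [1, 1, 2, 2]
data = [[1.0, 1.2, 1.0, 1.4],
        [1.0, 1.0, 2.0, 3.0]]
averages = mean_by_condition(data, cell_map, conditions)
print(averages)
```

Missing measurements, if the real files contain any, would need to be filtered out before averaging.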
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
GCS PATH:
gs://kds-2dfa91b267e9146f17786893547814ae5688af7ddeab756631a60ffa
A curated dataset of approximately 7,000 healthy human single cells (approx. 1,000 per tissue) sourced from the CellXGene Census, covering seven major tissues: heart, blood, brain, lung, kidney, intestine, and pancreas.
This is 1 of 4 datasets focusing on providing progressively larger, ready-to-use collections of healthy human single-cell RNA sequencing data in the H5AD format.
The goal is to offer standardized benchmarks/datasets derived from CellXGene for exploring fundamental scRNA-seq analysis, understanding multi-tissue cellular composition, developing and testing computational models, and evaluating method scalability across different orders of magnitude.
With its manageable size (approx. 7k total cells), this specific dataset serves as an excellent starting point for exploration, initial model development, or educational purposes.
This dataset provides a focused collection of single-cell transcriptomic profiles representing healthy human tissues, curated from the latest (Jan 2025) stable release of the comprehensive CZ CELLxGENE Discover Census (CellXGene).
It includes data exclusively from Homo sapiens cells annotated as 'normal' or 'healthy' and in 'cell' suspension. The dataset is specifically balanced to contain approximately 1,000 cells from each of the following seven vital tissues: heart, blood, brain, lung, kidney, intestine, and pancreas.
With a total size of roughly 7,000 cells, this collection offers a manageable yet diverse snapshot of baseline cellular states across different organ systems. It is well-suited for comparative analyses of healthy cell types and gene expression signatures across these tissues, for benchmarking computational analysis tools on a multi-tissue dataset, or for educational exploration of single-cell data principles. This subset provides a representative sample while reducing the computational burden associated with analyzing the full CellXGene Census.
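The tissue balancing described here amounts to stratified subsampling: group cells by tissue, then draw the same number from each group. The sketch below shows that idea on toy records; it is not the curators' actual procedure, and the record layout (`id`, `tissue`), the seed, and the group sizes are assumptions for the example.

```python
import random
from collections import defaultdict

TISSUES = ["heart", "blood", "brain", "lung", "kidney", "intestine", "pancreas"]


def balanced_sample(cells, per_group, key="tissue", seed=0):
    """Draw up to `per_group` cells from each group without replacement."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_group = defaultdict(list)
    for cell in cells:
        by_group[cell[key]].append(cell)
    sampled = []
    for group in sorted(by_group):
        members = by_group[group]
        sampled.extend(rng.sample(members, min(per_group, len(members))))
    return sampled


# Toy pool with a different number of cells per tissue (20, 25, 30, ...).
pool = [{"id": f"{tissue}_{i}", "tissue": tissue}
        for n, tissue in enumerate(TISSUES)
        for i in range(20 + 5 * n)]

subset = balanced_sample(pool, per_group=10)
counts = {t: sum(c["tissue"] == t for c in subset) for t in TISSUES}
print(len(subset), counts)
```

With `per_group=1000` and seven tissues, the same function would yield roughly the 7,000-cell size described above, provided each tissue has at least 1,000 eligible cells.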
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset folders from "TISSUE: uncertainty-calibrated prediction of single-cell spatial transcriptomics improves downstream analyses". If using the processed data or TISSUE algorithm, please cite: https://doi.org/10.1101/2023.04.25.538326.
The directory of datasets is compressed in tar gzip format. The top level contains one folder per dataset, named after the dataset, and each folder holds the relevant data files, which include:
- Spatial_count.txt --- a tab-delimited file containing spatial transcriptomics counts matrix
- scRNA_count.txt --- a tab-delimited file containing RNAseq counts matrix
- Locations.txt --- a tab-delimited file containing the (x,y) spatial coordinates of cells in the spatial transcriptomics data
- Metadata.txt --- for some datasets, this is a comma-separated file containing the metadata table for the spatial transcriptomics data
These files are formatted and organized to be read into AnnData objects using the native loading functions in the TISSUE package (https://github.com/sunericd/TISSUE). Some folders will also have additional accessory files such as gene lists corresponding to some experiments present in our manuscript and/or adjacency matrix objects.
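Before handing a folder to the TISSUE loaders, it can be useful to confirm that the spatial counts and the coordinates describe the same cells. The following sketch parses tab-delimited text with the standard library and pairs each cell's coordinates with its counts. It assumes that both files start with a header row, that rows of Spatial_count.txt are cells and columns are genes, and that Locations.txt has exactly two columns; the gene names and values are toy data, and the TISSUE package's own loading functions remain the intended route.

```python
import csv
import io


def read_tsv(text):
    """Split tab-delimited text into (header, data_rows)."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    return rows[0], rows[1:]


def pair_counts_with_locations(count_text, location_text):
    """Return one record per cell, raising if the two files disagree in length.

    Assumes row i of the counts file and row i of the locations file
    refer to the same cell.
    """
    genes, counts = read_tsv(count_text)
    _, locations = read_tsv(location_text)
    if len(counts) != len(locations):
        raise ValueError(
            f"{len(counts)} count rows but {len(locations)} location rows")
    cells = []
    for (x, y), row in zip(locations, counts):
        cells.append({
            "xy": (float(x), float(y)),
            "counts": dict(zip(genes, (float(v) for v in row))),
        })
    return cells


# Toy stand-ins for Spatial_count.txt and Locations.txt.
spatial_count_txt = "GeneA\tGeneB\n3\t0\n1\t5\n"
locations_txt = "x\ty\n10.0\t20.5\n11.0\t22.0\n"
cells = pair_counts_with_locations(spatial_count_txt, locations_txt)
print(cells[0])
```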
Also included are the two simulated spatial transcriptomics datasets that we generated using SRTsim.
The SVZ folders contain our processed MERFISH spatial transcriptomics dataset on the adult mouse subventricular zone. Refer to the SVZFullFinal folder for the full dataset with TISSUE-informed cell labels. All other folders are processed data accessed from publicly available sources. The identity of numbered folders can be found in the Data Availability statement of the benchmarking paper from which they were retrieved: https://doi.org/10.1038/s41592-022-01480-9
"svz_merfish_data.zip" includes the raw MERFISH dataset on the adult mouse subventricular zone.
https://ega-archive.org/dacs/EGAC50000000162
The dataset contains processed sequencing data from Chromium Single Cell 5’ gene expression, human B cell VDJ and feature barcode (CSP) sequencing from transglutaminase 2-specific and other small intestinal plasma cells isolated from four untreated celiac disease patients. The raw sequencing data has been processed with Cell Ranger v.6.0.2 with the multi and aggr functions using the pre-built Cell Ranger references GRCh38 version 2020-A for gene expression and GRCh38-alts-ensembl-5.0.0 for V(D)J analysis. The dataset consists of a gene expression and antibody capture expression matrix (cell barcodes and feature names in tsv.gz file, expression matrix in mtx.gz file) and VDJ sequences in AIRR format (csv file). A metadata file (csv file) details cells passing our custom quality control based on number of detected genes, UMIs, mitochondrial genes, immunoglobulin genes and a productively rearranged immunoglobulin heavy chain of the IgA isotype.
In this study, we assess technical differences between commonly used single-cell RNA-Sequencing (scRNA-Seq) methods. We perform scRNA-seq on a homogeneous population of mouse embryonic stem cells along with two kinds of control spike-in molecules to assess the sensitivity and accuracy of these specific methods. In this dataset, we perform the STRT-seq method on the Fluidigm C1 system and generate single-cell libraries using the Nextera XT kit. Please note the sample-data relationship format (SDRF) file for this submission contains only a high-level representation of all sample, library and run information, and not per cell. For metadata at the level of individual cells, please refer to the supplementary file called single_cells_list.txt, which is included as part of this ArrayExpress submission.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset consists of Smart-seq3 single-cell RNA sequencing data of purified RS from the bone marrow and peripheral blood of 2 MDS-RS patients; and Smart-seq3xpress single-cell RNA sequencing data of FACS-sorted hematopoietic stem cells (HSC), multipotent progenitors (MPP), megakaryocyte-erythroid progenitors (MEP) and erythroblasts from 1 MDS-RS patient. The objective of this data collection was to assess several parameters on how the bone marrow of MDS-RS patients differs from that of healthy donors.
This dataset includes raw sequencing data in .fastq format, processed count matrices and associated pseudonymized metadata.
Processing: In brief, cells were sorted into 384-well plates containing 3 µL Vapor-Lock (Qiagen) and 0.3 µL lysis buffer consisting of 0.125 µM OligodT30VN (5'-Biotin-ACGAGCATCAGCAGCATACGAT30VN-3'; IDT) adjusted to reverse transcription (RT) volume, 0.5 mM dNTPs/each adjusted to RT volume, 0.1% Triton X-100, 5% PEG8000 adjusted to RT volume, 0.4u RNase Inhibitor (Takara Bio, 40 U/µL). After cell sorting, plates were briefly centrifuged before storage at -80 °C. Before RT, plates were denatured at 72 °C for 10 min followed by addition of 0.1 µL of RT mix: 25 mM Tris-HCl pH 8.4 (Fisher Scientific), 30 mM NaCl (Ambion), 1 mM GTP (Thermo Fisher Scientific), 2.5 mM MgCl2 (Ambion), 8 mM DTT (Thermo Fisher Scientific), 0.25 U/µl RNase Inhibitor (Takara Bio), 0.75 µM Template Switching Oligo (TSO) (5′-Biotin-AGAGACAGATTGCGCAATGNNNNNNNNWWrGrGrG-3′; IDT) and 2 U/µl of Maxima H Minus reverse transcriptase (Thermo Fisher Scientific). Plates were quickly centrifuged after dispensing to ensure merging of lysis and RT volumes. RT was incubated at 42 °C for 90 minutes, followed by ten cycles of 50 °C for 2 minutes and 42 °C for 2 minutes. After RT, 0.6 µL PCR mix was dispensed to each well, containing 1× SeqAmp PCR buffer (Takara Bio), 0.025 U/µl of SeqAmp polymerase (Takara Bio) and 0.5 µM Smartseq3 forward and reverse primer. Plates were quickly spun down before being incubated as follows: 1 minute at 95 °C for initial denaturation, 14 cycles of 10 seconds at 98 °C, 30 seconds at 65 °C and 2–6 minutes at 68 °C. Final elongation was performed for 10 minutes at 72 °C.
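As a quick planning aid, the hold times stated in the protocol can be summed to estimate how long the RT and PCR programs run. The sketch below uses only the durations given in the text and ignores ramping between temperatures, so real instrument run times will be somewhat longer; the function and parameter names are invented for this example.

```python
def rt_minutes(initial_min=90, cycles=10, step_a_min=2, step_b_min=2):
    """RT program: 90 min at 42 °C, then ten cycles of 2 min + 2 min."""
    return initial_min + cycles * (step_a_min + step_b_min)


def pcr_minutes(extension_min, cycles=14, initial_denat_s=60,
                denat_s=10, anneal_s=30, final_elongation_min=10):
    """PCR program with a per-cycle extension of `extension_min` minutes.

    The protocol gives the extension as a 2-6 minute range, so it is
    left as a required argument here.
    """
    cycle_s = denat_s + anneal_s + extension_min * 60
    return (initial_denat_s + cycles * cycle_s) / 60 + final_elongation_min


print(rt_minutes())                # 130
print(round(pcr_minutes(2), 1))    # shortest stated extension
print(round(pcr_minutes(6), 1))    # longest stated extension
```

Under these assumptions the RT program holds for 130 minutes, and the PCR program holds for roughly 48 to 104 minutes depending on the extension time chosen.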
The dataset consists of 2 folders:
- SS3_FACS_PB-BM_RS
- SS3xpress_FACS_HSC_MPP_MEP_EB
The folder SS3_FACS_PB-BM_RS contains 1 rds file, 3 txt files, and 1 compressed folder (tar.gz) with fastq files. The folder SS3xpress_FACS_HSC_MPP_MEP_EB contains 1 rds file, 7 txt files, and 2 GNU zipped fastq files.
The documentation file File_list_SS3_SS3xpress.txt contains a full list of the files in the dataset.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains single-cell RNA sequencing (scRNA-seq) data processed using the Scanpy pipeline.
It focuses on the GSE145926 dataset from publicly available sources.
The data has been ingested and stored in HDF5 format for easy access and manipulation.
It includes pre-processed expression matrices suitable for downstream analysis.
The dataset enables exploratory analysis using Plotly interactive visualizations.
It allows researchers to examine gene expression patterns at single-cell resolution.
Includes metadata annotations for cell types and experimental conditions.
Facilitates differential expression analysis and cell clustering investigations.
Supports visualization of key immune markers such as CD3E across cell populations.
Designed for bioinformaticians, computational biologists, and immunology researchers.
Provides an end-to-end demonstration of Scanpy workflow in Python.
Enables reproducibility and further expansion for custom analyses.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 11: Pair plots of all the PCA implementations (Brain).