100+ datasets found
  1. Raw and processed (filtered and annotated) scRNAseq data

    • figshare.com
    zip
    Updated Jun 12, 2023
    Cite
    Gabrielle Leclercq-Cohen; Sabrina Danilin; Llucia Alberti-Servera; Stephan Schmeing; Hélène Haegel; Sina Nassiri; Marina Bacac (2023). Raw and processed (filtered and annotated) scRNAseq data [Dataset]. http://doi.org/10.6084/m9.figshare.23499192.v1
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Gabrielle Leclercq-Cohen; Sabrina Danilin; Llucia Alberti-Servera; Stephan Schmeing; Hélène Haegel; Sina Nassiri; Marina Bacac
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Single cell RNA-seq data generated and reported as part of the manuscript entitled "Dissecting the mechanisms underlying the Cytokine Release Syndrome (CRS) mediated by T Cell Bispecific Antibodies" by Leclercq-Cohen et al. 2023. Raw and processed (filtered and annotated) data are provided as AnnData objects which can be directly ingested to reproduce the findings of the paper or for ab initio data reuse:

    1- raw.zip provides concatenated raw/unfiltered counts for the 20 samples in the standard Market Exchange (MEX) format.
    2- 230330_sw_besca2_LowFil_raw.h5ad contains filtered cells and raw counts in the HDF5 format.
    3- 221124_sw_besca2_LowFil.annotated.h5ad contains filtered cells and log normalized counts, along with cell type annotation, in the HDF5 format.

    scRNAseq data generation: Whole blood from 4 donors was treated with 0.2 μg/mL CD20-TCB, or incubated in the absence of CD20-TCB. At baseline (before addition of TCB) and at assay endpoints (2, 4, 6, and 20 hrs), blood was collected for total leukocyte isolation using EasySep™ red blood cell depletion reagent (Stemcell). Cells were counted and processed for single cell RNA sequencing using the BD Rhapsody platform. To load several samples on a single BD Rhapsody cartridge, sample cells were labelled with sample tags (BD Human Single-Cell Multiplexing Kit) following the manufacturer’s protocol prior to pooling. Briefly, 1 × 10^6 cells from each sample were re-suspended in 180 μL FBS Stain Buffer (BD, PharMingen), and sample tags were added to the respective samples and incubated for 20 min at RT. After incubation, 2 successive washes were performed by addition of 2 mL stain buffer and centrifugation for 5 min at 300 g. Cells were then re-suspended in 620 μL cold BD Sample Buffer, stained with 3.1 μL of both 2 mM Calcein AM (Thermo Fisher Scientific) and 0.3 mM Draq7 (BD Biosciences) and finally counted on the BD Rhapsody scanner. Samples were then diluted and/or pooled equally in 650 μL cold BD Sample Buffer. The BD Rhapsody cartridges were then loaded with up to 40 000 – 50 000 cells. Single cells were isolated using Single-Cell Capture and cDNA Synthesis with the BD Rhapsody Express Single-Cell Analysis System according to the manufacturer’s recommendations (BD Biosciences). cDNA libraries were prepared using the Whole Transcriptome Analysis Amplification Kit following the BD Rhapsody System mRNA Whole Transcriptome Analysis (WTA) and Sample Tag Library Preparation Protocol (BD Biosciences). Indexed WTA and sample tag libraries were quantified and quality controlled on the Qubit Fluorometer using the Qubit dsDNA HS Assay, and on the Agilent 2100 Bioanalyzer system using the Agilent High Sensitivity DNA Kit. Sequencing was performed on a NovaSeq 6000 (Illumina) in paired-end mode (64-8-58) with NovaSeq 6000 S2 v1 or NovaSeq 6000 SP v1.5 reagent kits (100 cycles).

    scRNAseq data analysis: Sequencing data was processed using the BD Rhapsody Analysis pipeline (v 1.0, https://www.bd.com/documents/guides/user-guides/GMX_BD-Rhapsody-genomics-informatics_UG_EN.pdf) on the Seven Bridges Genomics platform. Briefly, read pairs with low sequencing quality were first removed and the cell label and UMI identified for further quality check and filtering. Valid reads were then mapped to the human reference genome (GRCh38-PhiX-gencodev29) using the aligner Bowtie2 v2.2.9, and reads with the same cell label, same UMI sequence and same gene were collapsed into a single raw molecule while undergoing further error correction and quality checks. Cell labels were filtered with a multi-step algorithm to distinguish those associated with putative cells from those associated with noise. After determining the putative cells, each cell was assigned to the sample of origin through the sample tag (only for cartridges with multiplex loading). Finally, the single-cell gene expression matrices were generated and a metrics summary was provided.

    After pre-processing with BD’s pipeline, the count matrices and metadata of each sample were aggregated into a single adata object and loaded into the besca v2.3 pipeline for the single cell RNA sequencing analysis (43). First, we filtered low quality cells with fewer than 200 genes, fewer than 500 counts or more than 30% of mitochondrial reads. This permissive filtering was used in order to preserve the neutrophils. We further excluded potential multiplets (cells with more than 5,000 genes or 20,000 counts), and genes expressed in fewer than 30 cells. Normalization, log-transformed UMI counts per 10,000 reads [log(CP10K+1)], was applied before downstream analysis. After normalization, technical variance was removed by regressing out the effects of total UMI counts and percentage of mitochondrial reads, and gene expression was scaled. The 2,507 most variable genes (having a minimum mean expression of 0.0125, a maximum mean expression of 3 and a minimum dispersion of 0.5) were used for principal component analysis. Finally, the first 50 PCs were used as input for calculating the 10 nearest neighbours and the neighbourhood graph was then embedded into the two-dimensional space using the UMAP algorithm at a resolution of 2. Cell type annotation was performed using the Sig-annot semi-automated besca module, which is a signature-based hierarchical cell annotation method. The signatures, configuration and nomenclature files used can be found at https://github.com/bedapub/besca/tree/master/besca/datasets. For more details, please refer to the publication.
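
    The cell-level thresholds and the log(CP10K+1) normalization described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the besca implementation: the function names are hypothetical, the natural logarithm is assumed, and the gene-level filter (genes expressed in fewer than 30 cells) is left out because it needs the whole matrix.

```python
import math

# Thresholds quoted in the dataset description.
MIN_GENES, MIN_COUNTS, MAX_MITO_FRACTION = 200, 500, 0.30
MAX_GENES, MAX_COUNTS = 5_000, 20_000


def passes_qc(n_genes, n_counts, mito_fraction):
    """Return True if a cell survives the low-quality and multiplet filters."""
    low_quality = (
        n_genes < MIN_GENES
        or n_counts < MIN_COUNTS
        or mito_fraction > MAX_MITO_FRACTION
    )
    multiplet = n_genes > MAX_GENES or n_counts > MAX_COUNTS
    return not (low_quality or multiplet)


def log_cp10k(counts):
    """Scale one cell's UMI counts to 10,000 total, then apply log(x + 1).

    Natural log is an assumption; the description does not state the base.
    """
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c * 10_000 / total) for c in counts]
```

    For example, passes_qc(1500, 4000, 0.05) keeps the cell, while a cell with 150 detected genes or 35% mitochondrial reads is dropped.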

  2. Data from: Reference transcriptomics of porcine peripheral immune cells...

    • s.cnmilf.com
    • agdatacommons.nal.usda.gov
    • +3 more
    Updated Jun 5, 2025
    Cite
    Agricultural Research Service (2025). Data from: Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/data-from-reference-transcriptomics-of-porcine-peripheral-immune-cells-created-through-bul-e667c
    Dataset provided by
    Agricultural Research Service
    Description

    This dataset contains files reconstructing single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) in order to provide a reference annotation of porcine immune cell transcriptomics at enhanced, single-cell resolution. Analysis of single-cell data allowed identification of 36 cell clusters that were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. Files may be used to reconstruct the data as presented in the manuscript, allowing for individual query by other users. Scripts for original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464).

    Resources in this dataset:

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format.
      File Name: PBMC7_AllCells.zip
      Resource Description: Zipped folder containing PBMC counts matrix, gene names, and cell IDs. Files are as follows: matrix of gene counts* (matrix.mtx.gz), gene names (features.tsv.gz), cell IDs (barcodes.tsv.gz). *The ‘raw’ count matrix is actually gene counts obtained following ambient RNA removal. During ambient RNA removal, we specified to calculate non-integer count estimations, so most gene counts are actually non-integer values in this matrix but should still be treated as raw/unnormalized data that requires further normalization/transformation. Data can be read into R using the function Read10X().

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata.
      File Name: PBMC7_AllCells_meta.csv
      Resource Description: .csv file containing metadata for cells included in the final dataset. Metadata columns include:
      - nCount_RNA = the number of transcripts detected in a cell
      - nFeature_RNA = the number of genes detected in a cell
      - Loupe = cell barcodes; correspond to the cell IDs found in the .h5Seurat and 10X formatted objects for all cells
      - prcntMito = percent mitochondrial reads in a cell
      - Scrublet = doublet probability score assigned to a cell
      - seurat_clusters = cluster ID assigned to a cell
      - PaperIDs = sample ID for a cell
      - celltypes = cell type ID assigned to a cell

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates.
      File Name: PBMC7_AllCells_PCAcoord.csv
      Resource Description: .csv file containing first 100 PCA coordinates for cells.

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates.
      File Name: PBMC7_AllCells_tSNEcoord.csv
      Resource Description: .csv file containing t-SNE coordinates for all cells.

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates.
      File Name: PBMC7_AllCells_UMAPcoord.csv
      Resource Description: .csv file containing UMAP coordinates for all cells.

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates.
      File Name: PBMC7_CD4only_tSNEcoord.csv
      Resource Description: .csv file containing t-SNE coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells UMAP Coordinates.
      File Name: PBMC7_CD4only_UMAPcoord.csv
      Resource Description: .csv file containing UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells UMAP Coordinates.
      File Name: PBMC7_GDonly_UMAPcoord.csv
      Resource Description: .csv file containing UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates.
      File Name: PBMC7_GDonly_tSNEcoord.csv
      Resource Description: .csv file containing t-SNE coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information.
      File Name: UnfilteredGeneInfo.txt
      Resource Description: .txt file containing gene nomenclature information used to assign gene names in the dataset. 'Name' column corresponds to the name assigned to a feature in the dataset.

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat.
      File Name: PBMC7.tar
      Resource Description: .h5Seurat object of all cells in PBMC dataset. File needs to be untarred, then read into R using function LoadH5Seurat().
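
    As a rough illustration of how the CD4 T cell subset could be picked out of the metadata table before re-assigning coordinates, the sketch below filters rows by the cluster IDs quoted above. It assumes the metadata .csv exposes the Loupe and seurat_clusters columns as ordinary named columns; the real file layout may differ (for example, barcodes stored as row names), and the example rows are made up.

```python
import csv
import io

# Cluster IDs listed for CD4 T cells in the resource description.
CD4_CLUSTERS = {"0", "3", "4", "28"}


def cd4_barcodes(meta_csv_text):
    """Return barcodes (Loupe column) of cells assigned to a CD4 T cell cluster."""
    reader = csv.DictReader(io.StringIO(meta_csv_text))
    return [row["Loupe"] for row in reader if row["seurat_clusters"] in CD4_CLUSTERS]


# Made-up rows standing in for PBMC7_AllCells_meta.csv.
example_meta = """Loupe,seurat_clusters,celltypes
AAAC-1,0,CD4 T
AAAG-1,6,GD T
AACT-1,28,CD4 T
"""

print(cd4_barcodes(example_meta))  # ['AAAC-1', 'AACT-1']
```

    The same pattern would apply to the gamma delta T cell clusters (6, 21, 24, 31) by swapping the set of cluster IDs.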

  3. Single-cell RNA-Seq and TCR-Seq analysis of PD-1+ CD8+ T-cells responding to...

    • zenodo.org
    bin, csv, zip
    Updated Oct 24, 2024
    Cite
    Bertram Bengsch; Sagar; Zhen Zhang (2024). Single-cell RNA-Seq and TCR-Seq analysis of PD-1+ CD8+ T-cells responding to anti-PD-1 and anti-PD-1/CTLA-4 immunotherapy in melanoma [Dataset]. http://doi.org/10.5281/zenodo.13971562
    Dataset provided by
    Zenodo
    Authors
    Bertram Bengsch; Sagar; Zhen Zhang
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset details the scRNASeq and TCR-Seq analysis of sorted PD-1+ CD8+ T cells from patients with melanoma treated with checkpoint therapy (anti-PD-1 monotherapy and anti-PD-1 & anti-CTLA-4 combination therapy) at baseline and after the first cycle of therapy. A major publication using this dataset is accessible here: (reference)

    *experimental design

    Single-cell RNA sequencing was performed using 10x Genomics with feature barcoding technology to multiplex cell samples from different patients undergoing mono or dual therapy so that they could be loaded in one well to reduce costs and minimize technical variability. Hashtag oligomers (oligos) were obtained purified and already oligo-conjugated in TotalSeq-C format from BioLegend. Cells were thawed and counted, and 20 million cells per patient and time point were used for staining. Cells were stained with barcoded antibodies together with a staining solution containing antibodies against CD3, CD4, CD8, PD-1/IgG4 and fixable viability dye (eBioscience) prior to FACS sorting. Barcoded antibody concentrations used were 0.5 µg per million cells, as recommended by the manufacturer (BioLegend) for flow cytometry applications. After staining, cells were washed twice in PBS containing 2% BSA and 0.01% Tween 20, followed by centrifugation (300 × g, 5 min at 4 °C) and supernatant exchange. After the final wash, cells were resuspended in PBS and filtered through 40 µm cell strainers before sorting. Sorted cells were counted and approximately 75,000 cells were processed through the 10x Genomics single-cell V(D)J workflow according to the manufacturer’s instructions. Gene expression, hashing and TCR libraries were pooled to desired quantities to obtain sequencing depths of 15,000 reads per cell for gene expression libraries and 5,000 reads per cell for hashing and TCR libraries. Libraries were sequenced on a NovaSeq 6000 flow cell in a 2X100 paired-end format.

    *extract protocol

    PBMCs were thawed and counted, and 20 million cells per patient and time point were used for staining. Cells were stained with barcoded antibodies together with a staining solution containing antibodies against CD3, CD4, CD8, PD-1/IgG4 and fixable viability dye (eBioscience) prior to FACS sorting. Barcoded antibody concentrations used were 0.5 µg per million cells, as recommended by the manufacturer (BioLegend) for flow cytometry applications. After staining, cells were washed twice in PBS containing 2% BSA and 0.01% Tween 20, followed by centrifugation (300 × g, 5 min at 4 °C) and supernatant exchange. After the final wash, cells were resuspended in PBS and filtered through 40 µm cell strainers before sorting. Sorted cells were counted and approximately 75,000 cells were processed through the 10x Genomics single-cell V(D)J workflow according to the manufacturer’s instructions.

    *library construction protocol

    Sorted cells were counted and approximately 75,000 cells were processed through 10x Genomics single-cell V(D)J workflow according to the manufacturer’s instructions. Gene expression, hashing and TCR libraries were pooled to desired quantities to obtain the sequencing depths of 15,000 reads per cell for gene expression libraries and 5,000 reads per cell for hashing and TCR libraries. Libraries were sequenced on a NovaSeq 6000 flow cell in a 2X100 paired-end format.

    *library strategy

    scRNA-seq and scTCR-seq

    *data processing step

    Pre-processing of sequencing results to generate count matrices (gene expression and HTO barcode counts) was performed using the 10x Genomics Cell Ranger pipeline.

    Further processing was done with Seurat (cell and gene filtering, hashtag identification, clustering, differential gene expression analysis based on gene expression).

    *genome build/assembly

    Alignment was performed using prebuilt Cell Ranger human reference GRCh38.

    *processed data files format and content

    RNA counts and HTO counts are in sparse matrix format and TCR clonotypes are in csv format.

    Datasets were merged and analyzed by Seurat and the analyzed objects are in rds format.

    file name                                        file checksum
    PD1CD8_160421_filtered_feature_bc_matrix.zip     da2e006d2b39485fd8cf8701742c6d77
    PD1CD8_190421_filtered_feature_bc_matrix.zip     e125fc5031899bba71e1171888d78205
    PD1CD8_160421_filtered_contig_annotations.csv    927241805d507204fbe9ef7045d0ccf4
    PD1CD8_190421_filtered_contig_annotations.csv    8ca544d27f06e66592b567d3ab86551e
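
    A downloaded file can be compared against the checksums listed above with a few lines of Python. The listing does not name the checksum algorithm; the 32-hex-digit values are consistent with MD5, which is assumed here. The helper names are hypothetical.

```python
import hashlib

# Expected digests copied from the listing (assumed to be MD5).
EXPECTED = {
    "PD1CD8_160421_filtered_feature_bc_matrix.zip": "da2e006d2b39485fd8cf8701742c6d77",
    "PD1CD8_190421_filtered_feature_bc_matrix.zip": "e125fc5031899bba71e1171888d78205",
    "PD1CD8_160421_filtered_contig_annotations.csv": "927241805d507204fbe9ef7045d0ccf4",
    "PD1CD8_190421_filtered_contig_annotations.csv": "8ca544d27f06e66592b567d3ab86551e",
}


def md5_hex(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, name):
    """Return True when the file at `path` matches the listed checksum for `name`."""
    return md5_hex(path) == EXPECTED[name]
```

    Calling verify() with the local path of a download and its listed file name returns False if the file is truncated or corrupted.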

    *processed data file                             antibodies/tags
    PD1CD8_160421_filtered_feature_bc_matrix.zip     none
    PD1CD8_160421_filtered_feature_bc_matrix.zip     TotalSeq™-C0251 anti-human Hashtag 1 Antibody - (HASH_1) - M1_base_monotherapy
                                                     TotalSeq™-C0252 anti-human Hashtag 2 Antibody - (HASH_2) - M1_post_monotherapy
                                                     TotalSeq™-C0253 anti-human Hashtag 3 Antibody - (HASH_3) - C1_base_combined_therapy
                                                     TotalSeq™-C0254 anti-human Hashtag 4 Antibody - (HASH_4) - C1_post_combined_therapy
                                                     TotalSeq™-C0255 anti-human Hashtag 5 Antibody - (HASH_5) - C2_base_combined_therapy
                                                     TotalSeq™-C0256 anti-human Hashtag 6 Antibody - (HASH_6) - C2_post_combined_therapy
    PD1CD8_160421_filtered_contig_annotations.csv    none
    PD1CD8_190421_filtered_feature_bc_matrix.zip     none
    PD1CD8_190421_filtered_feature_bc_matrix.zip     TotalSeq™-C0251 anti-human Hashtag 1 Antibody - (HASH_1) - M2_base_monotherapy
                                                     TotalSeq™-C0252 anti-human Hashtag 2 Antibody - (HASH_2) - M2_post_monotherapy
                                                     TotalSeq™-C0253 anti-human Hashtag 3 Antibody - (HASH_3) - M3_base_monotherapy
                                                     TotalSeq™-C0254 anti-human Hashtag 4 Antibody - (HASH_4) - M3_post_monotherapy
                                                     TotalSeq™-C0255 anti-human Hashtag 5 Antibody - (HASH_5) - C3_base_combined_therapy
                                                     TotalSeq™-C0256 anti-human Hashtag 6 Antibody - (HASH_6) - C3_post_combined_therapy
    PD1CD8_190421_filtered_contig_annotations.csv    none

  4. Data from: A Single-Cell Tumor Immune Atlas for Precision Oncology

    • zenodo.org
    bin, csv
    Updated Mar 31, 2022
    + more versions
    Cite
    Paula Nieto (2022). A Single-Cell Tumor Immune Atlas for Precision Oncology [Dataset]. http://doi.org/10.5281/zenodo.4263972
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Paula Nieto
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Preprint version of the Single-Cell Tumor Immune Atlas

    This upload contains:

    • TICAtlas.rds: an rds file containing a Seurat object with the whole Atlas (317111 cells, RNA and integrated assays, PCA and UMAP reductions)
    • TICAtlas.h5ad: an h5ad file with the whole Atlas (317111 cells, RNA assay, PCA and UMAP)
    • TICAtlas_RNA.rds: an rds file containing a Seurat object of the whole Atlas but only the RNA assay (317111 cells, UMAP embedding)
    • TICAtlas_downsampled_1000.rds: an rds file containing a downsampled version of the Seurat object of the whole Atlas (24834 cells, RNA and integrated assay, PCA and UMAP reductions)
    • TICAtlas_downsampled_1000.h5ad: an h5ad file containing a downsampled version of the whole Atlas (24834 cells, RNA assay, PCA and UMAP reductions)
    • TICAtlas_metadata.csv: a comma-separated text file with the metadata for each of the cells

    For the h5ad files, the .X slot contains the normalized data, while the .raw.X slot contains the raw counts as they were in the original datasets.

    All the files contain the following patient/sample metadata variables:

    • patient: assigned patient identifiers
    • gender: the patient's gender (male/female/unknown)
    • source: dataset of origin
    • subtype: cancer type (abbreviations as indicated in the preprint)
    • cluster_kmeans_k6: patients clusters, NA if filtered out
    • cell_type: annotated cell type for each of the cells

    If you have any issues with the metadata you can use the TICAtlas_metadata.csv file.

    For more information, read our preprint and check our GitHub.

    h5ad files can be read with Python using Scanpy, rds files can be read in R using Seurat. For format conversion between AnnData and Seurat we recommend SeuratDisk. For other single-cell data formats you can use sceasy.

  5. Test Files for ShinyCell R Package

    • figshare.com
    hdf
    Updated Jun 3, 2023
    Cite
    John Ouyang (2023). Test Files for ShinyCell R Package [Dataset]. http://doi.org/10.6084/m9.figshare.14229350.v1
    Dataset provided by
    figshare
    Authors
    John Ouyang
    License

    https://www.gnu.org/licenses/gpl-3.0.html

    Description

    Test Files for ShinyCell R Package containing single-cell datasets in various common single-cell data formats i.e. Seurat rda, SCE rds, loom file, h5ad file and plain text gene expression.

  6. Single-Cell RNA Data Portal for Alzheimer's Disease

    • zenodo.org
    • explore.openaire.eu
    zip
    Updated Mar 4, 2025
    Cite
    Theodoros Siozos; Christos Petrou; ATHANASIOS BALOMENOS; Yannis Kopsinis (2025). Single-Cell RNA Data Portal for Alzheimer's Disease [Dataset]. http://doi.org/10.5281/zenodo.14900198
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Theodoros Siozos; Christos Petrou; ATHANASIOS BALOMENOS; Yannis Kopsinis
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Single-Cell RNA Data Portal for Alzheimer's Disease

    The single cell Alzheimer's Disease Data Portal is an aggregated data portal created as part of the Enfield EU Funded program for the single-cell Generative Pretrained Transformer (scGPT-AD) model research. The data portal contains data from the ssREAD data portal, along with single-cell AD data from the latest studies (dharsini et al, pan et al, rexach et al). The data from the individual studies were accessed through the cellXgene data portal, a vast portal for single cell data. The data have been uploaded in two separate .zip files (part1, part2).

    The single cell data follow the Annotated Data format. The core data for each sample is the gene-expression matrix, which gives the expression level of each gene in each single cell. Additionally, the dataset contains the `.obs` attribute, which includes core cell metadata for each sample (cell type, brain region, braak stage, donor age, disease condition, donor gender, etc.), along with the gene names, accessed via the `.var` attribute.

    The source data have been processed to create a unified data portal ready to be used as training dataset for a Transformer model. The main processing steps were:

    • convert ssREAD data from `.qsave` format to `.h5ad` format that aligns with the AnnData framework
    • discard some unprocessable data samples
    • standardize metadata column names
    • process categorical data to create a unified namespace (e.g.: merge `microglia` and `microgrial` cell type names into one)
    • discard dimensionality reduction and clustering attributes, to make a lightweight version of the data portal, since they are not meant to be used in Transformer model training
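
    The metadata standardization steps listed above (uniform column names and a unified label namespace) could look roughly like the following. This is a sketch under stated assumptions, not the portal's actual code: the alias tables and function names are hypothetical, and only the `microglia`/`microgrial` merge comes from the description.

```python
# Hypothetical alias tables; only the microglia/microgrial pair is from the description.
CELL_TYPE_ALIASES = {"microgrial": "microglia"}
COLUMN_ALIASES = {"cell type": "cell_type", "celltype": "cell_type"}


def standardize_column(name):
    """Map a raw metadata column name to a lowercase, underscore-separated form."""
    key = name.strip().lower()
    return COLUMN_ALIASES.get(key, key.replace(" ", "_"))


def unify_cell_type(label):
    """Collapse spelling variants of a cell type label into one canonical name."""
    key = label.strip().lower()
    return CELL_TYPE_ALIASES.get(key, key)


def standardize_record(record):
    """Apply column and label standardization to one cell's metadata dict."""
    out = {standardize_column(k): v for k, v in record.items()}
    if "cell_type" in out:
        out["cell_type"] = unify_cell_type(out["cell_type"])
    return out
```

    For instance, a record {"Cell Type": "Microgrial", "Donor Age": "81"} becomes {"cell_type": "microglia", "donor_age": "81"}.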

    Aggregated Data Statistics

    Total Cells      2.3M
    AD Cells         1.2M
    Control Cells    1.1M
    Unique Genes     107k
    Donors           166

    Characteristics of Dataset grouped by Data Source

    Data Source      Unique Genes   Total Cells   AD Cells   Control Cells   Donors
    rexach et al     30k            217k          118k       99k             20
    pan et al        61k            43k           11k        32k             7
    dharsini et al   61k            425k          311k       114k            46
    ssREAD           62k            2.42M         1.14M      1.28M           135

    Additional columns in the source table (per-source values not captured in this listing): Cell Type Label, Brain Region, Tissue Type, Braak Stage, Donors Id, Donor Gender, Donor Age

  7. 10X single-cell RNA sequencing of bone marrow cells from MDS-RS patients and...

    • researchdata.se
    Updated Nov 6, 2023
    Cite
    Pedro Luis Moura; Eva Hellström-Lindberg (2023). 10X single-cell RNA sequencing of bone marrow cells from MDS-RS patients and healthy donors [Dataset]. http://doi.org/10.48723/nq2a-1e03
    Dataset provided by
    Karolinska Institutet
    Authors
    Pedro Luis Moura; Eva Hellström-Lindberg
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    This dataset consists of single-cell RNA sequencing data of bone marrow cells (CD34+ stem cells, GPA+ erythroblasts, ring sideroblasts and mononuclear cells) obtained from multiple healthy bone marrow donors and MDS-RS patients. The objective of this data collection was to assess several parameters on how the bone marrow of MDS-RS patients differs from that of healthy donors.

    This dataset includes raw sequencing data in .fastq format, processed count matrices and associated pseudonymized metadata.

    Processing: All samples were loaded onto Chromium Single Cell Chips (10x Genomics, CA, USA) at a target capture rate of 10,000 cells per sample. Single cell libraries were prepared using Chromium Next GEM Single Cell 3ʹ Kits v3.1 (10x Genomics) as per the manufacturer’s instructions, except that 1 µl additive ADT primers were added to the initial cDNA PCR amplification buffer and ADT libraries were prepared as described in the Total-Seq B protocol (BioLegend) from the initial cDNA SPRI clean up. Libraries were pooled and sequenced on an Illumina NovaSeq 6000 (Illumina). Read pseudoalignment was performed against the GRCh38.p13 human genome assembly with kallisto v0.46.1, and bustools v0.40.0 was used for barcode and UMI counting.

    The dataset consists of 2 folders:
    - Processed_Count_Matrices
    - Raw_FASTQ

    And one xlsx file:
    - Sample_key.xlsx

    The folder Processed_Count_Matrices contains 1 rds file, 1 tsv file, 9 mtx files, and 18 txt files. The folder Raw_FASTQ contains 27 GNU zipped fastq files, and 5 txt files.

    The documentation file File_list_10x.txt contains a full list of the files in the dataset.

    The total size of the dataset is approximately 21 GB.

  8. Single-cell Spatial Transcriptomics Data with Paired RNAseq for TISSUE...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 8, 2024
    Cite
    Sun, Eric (2024). Single-cell Spatial Transcriptomics Data with Paired RNAseq for TISSUE spatial gene expression prediction [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8259941
    Dataset authored and provided by
    Sun, Eric
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset folders from "TISSUE: uncertainty-calibrated prediction of single-cell spatial transcriptomics improves downstream analyses". If using the processed data or TISSUE algorithm, please cite: https://doi.org/10.1101/2023.04.25.538326.

    The dataset directories are compressed in tar gzip format. The top level contains folders with dataset names, and within each of those folders are the relevant data files, which include:

    • Spatial_count.txt --- a tab-delimited file containing spatial transcriptomics counts matrix

    • scRNA_count.txt --- a tab-delimited file containing RNAseq counts matrix

    • Locations.txt --- a tab-delimited file containing the (x,y) spatial coordinates of cells in the spatial transcriptomics data

    • Metadata.txt --- for some datasets, this is a comma-separated file containing the metadata table for the spatial transcriptomics data

    These files are formatted and organized to be read into AnnData objects using the native loading functions in the TISSUE package (https://github.com/sunericd/TISSUE). Some folders will also have additional accessory files such as gene lists corresponding to some experiments present in our manuscript and/or adjacency matrix objects.
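
    The TISSUE loading functions are the intended way to read these files. Purely to illustrate the tab-delimited layout described above, the sketch below parses a Locations.txt-style table with the standard library. It assumes a header row followed by numeric (x, y) values; the function name and example contents are made up.

```python
import csv
import io


def read_tab_delimited(text):
    """Parse tab-delimited text with a header row into (header, rows of floats)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    rows = [[float(value) for value in row] for row in reader if row]
    return header, rows


# Made-up contents standing in for a Locations.txt file.
example_locations = "x\ty\n10.5\t20.0\n11.0\t19.5\n"
header, coords = read_tab_delimited(example_locations)
print(header, coords)
```

    Count matrices such as Spatial_count.txt are much larger, so a dedicated loader is the better choice for real analyses.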

    Also included are the two simulated spatial transcriptomics datasets that we generated using SRTsim.

    The SVZ folders contain our processed MERFISH spatial transcriptomics dataset on the adult mouse subventricular zone. Refer to the SVZFullFinal folder for the full dataset with TISSUE-informed cell labels. All other folders are processed data accessed from publicly available sources. The identity of numbered folders can be found in the Data Availability statement of the benchmarking paper from which they were retrieved: https://doi.org/10.1038/s41592-022-01480-9

    "svz_merfish_data.zip" includes the raw MERFISH dataset on the adult mouse subventricular zone.

  9. Single-cell analysed data

    • data.ncl.ac.uk
    zip
    Updated Jun 24, 2025
    Cite
    Ioana Nicorescu (2025). Single-cell analysed data [Dataset]. http://doi.org/10.25405/data.ncl.28359179.v1
    Dataset provided by
    Newcastle University
    Authors
    Ioana Nicorescu
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This folder contains the following files and datasets:

    Flow Cytometry Data

    • Individual FCS files - Raw data files obtained following segmentation
    • Analysis file (pre-transformation) - Data analysis file before transformation, compatible with FCS Express
    • Analysis file (post-transformation) - Data analysis file after transformation, compatible with FCS Express
    • DNS format files - Processed files analyzed following data transformation

    Statistical Analysis and Figures

    • Manuscript figures - All figures from the manuscript in GraphPad Prism format, accessible with Numbers, including statistical test results

    Data Extraction and Spatial Analysis

    • Cluster percentages - Excel file containing individual cluster percentages extracted from the analysis file
    • Spatial neighborhood data - Excel file with all data used as starting point for spatial neighborhood map generation
    • Spatial interaction maps - ZIP archive containing heatmaps showing spatial interactions between individual clusters

    Please see the collection for related records: https://doi.org/10.25405/data.ncl.c.7890872

  10. FedscGen: privacy-aware federated batch effect correction of single-cell RNA...

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Jun 30, 2025
    Mohammad Bakhtiari; Mohammad Bakhtiari (2025). FedscGen: privacy-aware federated batch effect correction of single-cell RNA sequencing data -- Preprocessed datasets [Dataset]. http://doi.org/10.5281/zenodo.11489844
    Explore at:
    Available download formats: bin
    Dataset updated
    Jun 30, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mohammad Bakhtiari; Mohammad Bakhtiari
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jun 5, 2024
    Description

    This dataset accompanies the publication "FedscGen: Privacy-Aware Federated Batch Effect Correction of Single-Cell RNA Sequencing Data" and includes eight single-cell RNA sequencing (scRNA-seq) datasets used to benchmark the FedscGen and scGen methods. The datasets are provided in .h5ad format and include comprehensive metadata necessary for replication and further analysis.

    Datasets

    We analyze various datasets to compare FedscGen against scGen (centralized) in terms of batch correction. For simplicity, we refer to the dataset by abbreviations:

    1. Cell Line (CL):

      • Derived from the 293t_jurkat experiment with three batches: Zheng et al., 2017.
    2. Human Dendritic Cells (HDC):

      • scRNA-seq data of human dendritic cells across two batches: Villani et al., 2017.
    3. Human Pancreas (HP):

      • Consolidated data from five sources with 14,767 cells each: Baron et al., 2016; Muraro et al., 2016; Segerstolpe et al., 2016; Wang et al., 2016; Xin et al., 2016.
    4. Mouse Brain (MB):

      • Merged datasets with 691,600 and 141,606 cells: Saunders et al., 2018; Rosenberg et al., 2018.
    5. Mouse Cell Atlas (MCA):

      • Data focusing on 11 cell types from various organs: Han et al., 2018; The Tabula Muris Consortium, 2018.
    6. Mouse Hematopoietic Stem and Progenitor Cells (MHSPC):

      • Data from SMART-seq2 and MARS-seq protocols: Nestorowa et al., 2016; Paul et al., 2015.
    7. Mouse Retina (MR):

      • Data from two unassociated laboratories with 26,830 and 44,808 cells: Macosko et al., 2015; Shekhar et al., 2016.
    8. PBMC (human Peripheral Blood Mononuclear Cell):

      • scRNA-seq data with two batches: Zheng et al., 2017.

    Usage Notes: Each dataset is provided in .h5ad format, compatible with common single-cell analysis tools such as Scanpy. Detailed metadata is included within each file.

    Keywords: Single-cell RNA sequencing, scRNA-seq, Batch effect correction, Privacy-aware, Federated learning, scGen, FedscGen, Clinical multi-center studies, Genomics, Bioinformatics

    Contact: For questions or further information, please contact Mohammad Bakhtiari at mohammad.bakhtiari@uni-hamburg.de.

    License: Creative Commons Attribution 4.0 International (CC BY 4.0)

  11. SCimilarity Tutorial Data

    • data.niaid.nih.gov
    • zenodo.org
    Updated Sep 4, 2024
    Graham Heimberg (2024). SCimilarity Tutorial Data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8242082
    Explore at:
    Dataset updated
    Sep 4, 2024
    Dataset provided by
    Graham Heimberg
    Héctor Corrada Bravo
    Jason Vander Heiden
    Tony Kuo
    Nathaniel Diamant
    Omar Salem
    Description

    SCimilarity is a unifying representation of single-cell expression profiles that quantifies similarity between expression states and generalizes to represent new studies without additional training. This enables a novel cell search capability, which sifts through millions of profiles to find cells similar to a query cell state and allows researchers to quickly and systematically leverage massive public scRNA-seq atlases to learn about a cell state of interest.

    This repository contains public datasets for SCimilarity tutorials, specifically:

    A subsample of single-cell data from Adams, et al. Science Advances, 2020 (GSE136831) as an AnnData object in h5ad format.

    Terms of GSE136831:

    Used with permission. Research developed by TLC4PF and the Yale School of Medicine led by Dr. Naftali Kaminski. © 2023 Pulmonary Fibrosis Cell Atlas website and associated content. All rights reserved. Please see the project website for more information: www.IPFCellAtlas.com

    In addition, please cite https://www.science.org/doi/10.1126/sciadv.aba1983 and, for a description of the website creation methodology, https://doi.org/10.1152/ajplung.00451.2020.

  12. Ageing_Exercise_Single_Cell

    • figshare.com
    application/gzip
    Updated Jul 3, 2024
    Solal Chauquet (2024). Ageing_Exercise_Single_Cell [Dataset]. http://doi.org/10.6084/m9.figshare.21959516.v1
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jul 3, 2024
    Dataset provided by
    figshare
    Authors
    Solal Chauquet
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Single-cell RNA-seq dataset in RDS format, readable with the R programming language.

  13. Multiple Single Cell RNA Expressions ARCHS4

    • kaggle.com
    zip
    Updated Jun 26, 2021
    Alexander (2021). Multiple Single Cell RNA Expressions ARCHS4 [Dataset]. https://www.kaggle.com/alexandervc/multiple-single-cell-rna-expressions-archs4
    Explore at:
    Available download formats: zip (23088130184 bytes)
    Dataset updated
    Jun 26, 2021
    Authors
    Alexander
    Description

    Context

    The dataset was downloaded from https://amp.pharm.mssm.edu/archs4/download.html. The methods are described in a Nature Communications paper: https://www.nature.com/articles/s41467-018-03751-6

    ARCHS4 provides user-friendly access to gene expression data from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). While most data in GEO is stored in raw formats, ARCHS4 provides prepared count-matrix expression data. While GEO stores data separately for each research paper, ARCHS4 collects all the information in a single matrix. Consult the main site for further information.

    The main data files are in H5 (HDF5, Hierarchical Data Format) file format: https://en.wikipedia.org/wiki/Hierarchical_Data_Format. They contain expression data as well as annotation data and further meta-information. There are several other auxiliary files, such as a t-SNE 3D projection (in CSV format) and gene correlation matrices for human and mouse in feather format.

    Content

    The main file for human, human_matrix.h5, contains the data matrix (238522 samples by 35238 genes) as well as various meta-information: gene names, sample information (tissue, etc.), and references to the GEO database IDs where all the details can be found.

    Similar data are provided for mouse, along with CSV files with t-SNE images and correlation matrices for genes.

    Acknowledgements

    The ARCHS4 project is by:

    'Alexander Lachmann', 'alexander.lachmann@mssm.edu', update: '2020-02-06'

  14. Single-cell omics data for COVID-19 patients

    • ega-archive.org
    • omicsdi.org
    Updated Sep 5, 2022
    (2022). Single-cell omics data for COVID-19 patients [Dataset]. https://ega-archive.org/datasets/EGAD00001009331
    Explore at:
    Dataset updated
    Sep 5, 2022
    License

    https://ega-archive.org/dacs/EGAC00001002844

    Description

    Single-cell RNA-seq, single-cell ATAC-seq, and genotypes used in the analysis for the study "Altered and allele-specific open chromatin landscape reveal epigenetic and genetic regulators of innate immunity in COVID-19". The RNA-seq and ATAC-seq data are raw data in FASTQ format, while the genotypes are in VCF format and were filtered and imputed (more details are available in the main text of the study).

  15. Protocol data (R version)

    • figshare.com
    application/gzip
    Updated Oct 16, 2020
    Jesse Gillis (2020). Protocol data (R version) [Dataset]. http://doi.org/10.6084/m9.figshare.13020569.v2
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Oct 16, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    figshare
    Authors
    Jesse Gillis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We published 3 protocols illustrating how MetaNeighbor can be used to quantify cell type replicability across single cell transcriptomic datasets. The data files included here are needed to run the R version of the protocols available on GitHub (https://github.com/gillislab/MetaNeighbor-Protocol) in RMarkdown (.Rmd) and Jupyter (.ipynb) notebook format. To run the protocols, download the protocols from GitHub, download the data from Figshare, place the data and protocol files in the same directory, then run the notebooks in RStudio or Jupyter. The scripts used to generate the data are included in the GitHub directory. Briefly:

    • full_biccn_hvg.rds contains a single cell transcriptomic dataset published by the Brain Initiative Cell Census Network (in SingleCellExperiment format). It combines data from 7 datasets obtained in the mouse primary motor cortex (https://www.biorxiv.org/content/10.1101/2020.02.29.970558v2). Note that this dataset only contains highly variable genes.

    • biccn_hvgs.txt: highly variable genes from the BICCN dataset described above (computed with the MetaNeighbor library).

    • biccn_gaba.rds: same dataset as full_biccn_hvg.rds, but restricted to GABAergic neurons. The dataset contains all genes common to the 7 BICCN datasets (not just highly variable genes).

    • go_mouse.rds: gene ontology annotations, stored as a list of gene symbols (one element per gene set).

    • functional_aurocs.txt: results of the MetaNeighbor functional analysis in protocol 3.

  16. MOESM11 of Benchmarking principal component analysis for large-scale...

    • springernature.figshare.com
    application/x-gzip
    Updated May 30, 2023
    Koki Tsuyuzaki; Hiroyuki Sato; Kenta Sato; Itoshi Nikaido (2023). MOESM11 of Benchmarking principal component analysis for large-scale single-cell RNA-sequencing [Dataset]. http://doi.org/10.6084/m9.figshare.11662101.v1
    Explore at:
    Available download formats: application/x-gzip
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Authors
    Koki Tsuyuzaki; Hiroyuki Sato; Kenta Sato; Itoshi Nikaido
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 11: Pair plots of all the PCA (Brain) implementations.

  17. Processed Chromium Single Cell GEX, CSP and VDJ data from intestinal plasma...

    • ega-archive.org
    Updated Apr 18, 2024
    (2024). Processed Chromium Single Cell GEX, CSP and VDJ data from intestinal plasma cells of untreated celiac disease patients [Dataset]. https://ega-archive.org/datasets/EGAD50000000339
    Explore at:
    Dataset updated
    Apr 18, 2024
    License

    https://ega-archive.org/dacs/EGAC50000000162

    Description

    The dataset contains processed sequencing data from Chromium Single Cell 5’ gene expression, human B cell VDJ and feature barcode (CSP) sequencing from transglutaminase 2-specific and other small intestinal plasma cells isolated from four untreated celiac disease patients. The raw sequencing data has been processed with Cell Ranger v.6.0.2 with the multi and aggr functions using the pre-built Cell Ranger references GRCh38 version 2020-A for gene expression and GRCh38-alts-ensembl-5.0.0 for V(D)J analysis. The dataset consists of a gene expression and antibody capture expression matrix (cell barcodes and feature names in tsv.gz file, expression matrix in mtx.gz file) and VDJ sequences in AIRR format (csv file). A metadata file (csv file) details cells passing our custom quality control based on number of detected genes, UMIs, mitochondrial genes, immunoglobulin genes and a productively rearranged immunoglobulin heavy chain of the IgA isotype.
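    As a rough illustration of how a gzip-compressed expression matrix in Matrix Market (.mtx.gz) coordinate format can be read, here is a minimal standard-library sketch. The helper name `read_mtx` and the demo contents are hypothetical, integer counts are assumed, and the orientation (features as rows, cell barcodes as columns) follows the usual Cell Ranger convention rather than anything verified against this dataset.

```python
import gzip
import io


def read_mtx(handle):
    """Parse a Matrix Market coordinate file into (shape, {(row, col): value})."""
    lines = (line.strip() for line in handle)
    # Skip the banner and comment lines, which all start with '%'.
    body = (line for line in lines if line and not line.startswith("%"))
    # First non-comment line: number of rows, columns, and stored entries.
    n_rows, n_cols, n_entries = (int(tok) for tok in next(body).split())
    entries = {}
    for line in body:
        row, col, value = line.split()
        # Matrix Market indices are 1-based; convert to 0-based.
        # Assumes integer counts.
        entries[(int(row) - 1, int(col) - 1)] = int(value)
    if len(entries) != n_entries:
        raise ValueError("entry count does not match the size line")
    return (n_rows, n_cols), entries


# In-memory stand-in for a matrix.mtx.gz file (3 features x 2 cells).
demo_text = (
    "%%MatrixMarket matrix coordinate integer general\n"
    "%\n"
    "3 2 3\n"
    "1 1 4\n"
    "3 1 1\n"
    "2 2 7\n"
)
demo_gz = io.BytesIO(gzip.compress(demo_text.encode("utf-8")))
with gzip.open(demo_gz, mode="rt", encoding="utf-8") as fh:
    shape, entries = read_mtx(fh)
```

    For real data, a dedicated reader such as `scipy.io.mmread` would normally be preferred. The sketch only shows what the file format contains.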

  18. Processed snRNA-seq data from "Divergent single cell transcriptome and...

    • data.niaid.nih.gov
    Updated Jul 29, 2023
    Alexey Kozlenkov (2023). Processed snRNA-seq data from "Divergent single cell transcriptome and epigenome alterations in ALS and FTD patients with C9orf72 mutation" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8190316
    Explore at:
    Dataset updated
    Jul 29, 2023
    Dataset provided by
    Mahammad Gardashli
    Junhao Li
    Eran A Mukamel
    Veronique V. Belzil
    Luc J. Pregent
    Alexey Kozlenkov
    Manoj K Jaiswal
    Erica Engelberg-Cook
    Jinyoung Jung
    Ping Zhou
    Dennis W. Dickson
    Stella Dracheva
    Jo-Fan Chien
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Processed snRNA-seq data from "Divergent single cell transcriptome and epigenome alterations in ALS and FTD patients with C9orf72 mutation". All nuclei passed QC and were corrected for background noise using CellBender. Files are R objects saved in RDS (R Data Serialization) format. This repo contains one Seurat v4 object and one gene-by-cell raw RNA count matrix in sparse matrix format (dgCMatrix).

  19. Supplementary data for "Scalable eQTL mapping using single-nucleus...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 14, 2025
    Schneeberger, Korbinian (2025). Supplementary data for "Scalable eQTL mapping using single-nucleus RNA-sequencing of recombined gametes from a small number of individuals" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_14864053
    Explore at:
    Dataset updated
    Feb 14, 2025
    Dataset provided by
    Schneeberger, Korbinian
    Parker, Matthew
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We developed a rapid expression quantitative trait locus (eQTL) analysis approach using single-cell or single-nucleus RNA sequencing (snRNAseq) of gametes from a small number of heterozygous individuals. This archive contains supplementary datasets for the paper "Scalable eQTL mapping using single-nucleus RNA-sequencing of recombined gametes from a small number of individuals" in Excel spreadsheet format.

  20. Single cell sequencing data from: The cellular state space of AML unveils...

    • figshare.scilifelab.se
    • researchdata.se
    • +1more
    hdf
    Updated Jul 15, 2025
    Henrik Lilljebjörn; Thoas Fioretos (2025). Single cell sequencing data from: The cellular state space of AML unveils novel NPM1 subtypes with distinct clinical outcomes and immune evasion properties [Dataset]. http://doi.org/10.17044/scilifelab.23715648.v1
    Explore at:
    Available download formats: hdf
    Dataset updated
    Jul 15, 2025
    Dataset provided by
    Lund University
    Authors
    Henrik Lilljebjörn; Thoas Fioretos
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    This dataset contains 10x single cell 3' RNA sequencing gene expression data from 38 AML samples from the subtypes NPM1 (n=12), AML-MR (n=11), TP53 (n=7), CBFB::MYH11 (n=3), RUNX1::RUNX1T1 (n=3), AML without class defining mutations (n=1), and AML meeting the criteria for two subtypes (n=1). In addition, reference samples from normal bone marrow mononuclear cells (n=5) and CD34 sorted cells (n=3) are included. The single cell libraries were constructed from viably frozen cells from bone marrow (n=29+8) or peripheral blood (n=9) using the Chromium Single Cell 3' Library & Gel Bead Kit v3 (10x Genomics) and sequenced on a NovaSeq 6000 or NextSeq 500. Data is available in h5 format for each sample, with raw count output from Cell Ranger, or as a processed Seurat object with scaled expression data, dimension reductions, and metadata.

Gabrielle Leclercq-Cohen; Sabrina Danilin; Llucia Alberti-Servera; Stephan Schmeing; Hélène Haegel; Sina Nassiri; Marina Bacac (2023). Raw and processed (filtered and annotated) scRNAseq data [Dataset]. http://doi.org/10.6084/m9.figshare.23499192.v1

Raw and processed (filtered and annotated) scRNAseq data

Explore at:
Available download formats: zip
Dataset updated
Jun 12, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Gabrielle Leclercq-Cohen; Sabrina Danilin; Llucia Alberti-Servera; Stephan Schmeing; Hélène Haegel; Sina Nassiri; Marina Bacac
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Single cell RNA-seq data generated and reported as part of the manuscript entitled "Dissecting the mechanisms underlying the Cytokine Release Syndrome (CRS) mediated by T Cell Bispecific Antibodies" by Leclercq-Cohen et al 2023. Raw and processed (filtered and annotated) data are provided as AnnData objects which can be directly ingested to reproduce the findings of the paper or for ab initio data reuse:

1. raw.zip provides concatenated raw/unfiltered counts for the 20 samples in the standard Market Exchange (MEX) format.
2. 230330_sw_besca2_LowFil_raw.h5ad contains filtered cells and raw counts in the HDF5 format.
3. 221124_sw_besca2_LowFil.annotated.h5ad contains filtered cells and log normalized counts, along with cell type annotation in the HDF5 format.

scRNAseq data generation: Whole blood from 4 donors was treated with 0.2 μg/mL CD20-TCB, or incubated in the absence of CD20-TCB. At baseline (before addition of TCB) and assay endpoints (2, 4, 6, and 20 hrs), blood was collected for total leukocyte isolation using EasySep™ red blood cell depletion reagent (Stemcell). Briefly, cells were counted and processed for single cell RNA sequencing using the BD Rhapsody platform. To load several samples on a single BD Rhapsody cartridge, sample cells were labelled with sample tags (BD Human Single-Cell Multiplexing Kit) following the manufacturer’s protocol prior to pooling. Briefly, 1 × 10^6 cells from each sample were re-suspended in 180 μL FBS Stain Buffer (BD, PharMingen) and sample tags were added to the respective samples and incubated for 20 min at RT. After incubation, 2 successive washes were performed by addition of 2 mL stain buffer and centrifugation for 5 min at 300 g. Cells were then re-suspended in 620 μL cold BD Sample Buffer, stained with 3.1 μL of both 2 mM Calcein AM (Thermo Fisher Scientific) and 0.3 mM Draq7 (BD Biosciences) and finally counted on the BD Rhapsody scanner. Samples were then diluted and/or pooled equally in 650 μL cold BD Sample Buffer. The BD Rhapsody cartridges were then loaded with up to 40 000 – 50 000 cells. Single cells were isolated using Single-Cell Capture and cDNA Synthesis with the BD Rhapsody Express Single-Cell Analysis System according to the manufacturer’s recommendations (BD Biosciences). cDNA libraries were prepared using the Whole Transcriptome Analysis Amplification Kit following the BD Rhapsody System mRNA Whole Transcriptome Analysis (WTA) and Sample Tag Library Preparation Protocol (BD Biosciences). Indexed WTA and sample tags libraries were quantified and quality controlled on the Qubit Fluorometer using the Qubit dsDNA HS Assay, and on the Agilent 2100 Bioanalyzer system using the Agilent High Sensitivity DNA Kit.
Sequencing was performed on a Novaseq 6000 (Illumina) in paired-end mode (64-8-58) with Novaseq6000 S2 v1 or Novaseq6000 SP v1.5 reagents kits (100 cycles). scRNAseq data analysis: Sequencing data was processed using the BD Rhapsody Analysis pipeline (v 1.0 https://www.bd.com/documents/guides/user-guides/GMX_BD-Rhapsody-genomics-informatics_UG_EN.pdf) on the Seven Bridges Genomics platform. Briefly, read pairs with low sequencing quality were first removed and the cell label and UMI identified for further quality check and filtering. Valid reads were then mapped to the human reference genome (GRCh38-PhiX-gencodev29) using the aligner Bowtie2 v2.2.9, and reads with the same cell label, same UMI sequence and same gene were collapsed into a single raw molecule while undergoing further error correction and quality checks. Cell labels were filtered with a multi-step algorithm to distinguish those associated with putative cells from those associated with noise. After determining the putative cells, each cell was assigned to the sample of origin through the sample tag (only for cartridges with multiplex loading). Finally, the single-cell gene expression matrices were generated and a metrics summary was provided. After pre-processing with BD’s pipeline, the count matrices and metadata of each sample were aggregated into a single adata object and loaded into the besca v2.3 pipeline for the single cell RNA sequencing analysis (43). First, we filtered low quality cells with fewer than 200 genes, fewer than 500 counts or more than 30% of mitochondrial reads. This permissive filtering was used in order to preserve the neutrophils. We further excluded potential multiplets (cells with more than 5,000 genes or 20,000 counts), and genes expressed in fewer than 30 cells. Normalization to log-transformed UMI counts per 10,000 reads [log(CP10K+1)] was applied before downstream analysis.
After normalization, technical variance was removed by regressing out the effects of total UMI counts and percentage of mitochondrial reads, and gene expression was scaled. The 2,507 most variable genes (having a minimum mean expression of 0.0125, a maximum mean expression of 3 and a minimum dispersion of 0.5) were used for principal component analysis. Finally, the first 50 PCs were used as input for calculating the 10 nearest neighbours and the neighbourhood graph was then embedded into the two-dimensional space using the UMAP algorithm at a resolution of 2. Cell type annotation was performed using the Sig-annot semi-automated besca module, which is a signature-based hierarchical cell annotation method. The signatures, configuration and nomenclature files used can be found at https://github.com/bedapub/besca/tree/master/besca/datasets. For more details, please refer to the publication.
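
The cell-level filtering thresholds and the log(CP10K+1) normalization described above can be sketched in plain Python. This is not the besca pipeline itself, only an illustration of the stated rules. The function names, the example cells, and the use of the natural logarithm are assumptions, and the gene-level filter (genes expressed in fewer than 30 cells) is omitted for brevity.

```python
import math

# Thresholds quoted in the description above.
MIN_GENES, MIN_COUNTS, MAX_MITO_FRACTION = 200, 500, 0.30
MAX_GENES, MAX_COUNTS = 5_000, 20_000


def passes_qc(n_genes, n_counts, mito_fraction):
    """Return True if a cell survives the low-quality and multiplet filters."""
    low_quality = (
        n_genes < MIN_GENES
        or n_counts < MIN_COUNTS
        or mito_fraction > MAX_MITO_FRACTION
    )
    multiplet = n_genes > MAX_GENES or n_counts > MAX_COUNTS
    return not (low_quality or multiplet)


def log_cp10k(counts):
    """Scale one cell's counts to 10,000 total, then apply log(x + 1)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    # Natural log is an assumption; the description does not state the base.
    return [math.log1p(c / total * 10_000) for c in counts]


# Hypothetical cells: (detected genes, total counts, mitochondrial fraction).
cells = {
    "kept": (350, 900, 0.05),
    "too_few_genes": (150, 900, 0.05),
    "high_mito": (350, 900, 0.45),
    "multiplet": (5200, 18_000, 0.05),
}
qc = {name: passes_qc(*stats) for name, stats in cells.items()}
normalized = log_cp10k([2, 0, 8])
```

In this sketch only the first hypothetical cell passes, and a zero count stays at zero after normalization, which preserves the sparsity of the count matrix.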
