Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Single cell RNA-seq data generated and reported as part of the manuscript entitled "Dissecting the mechanisms underlying the Cytokine Release Syndrome (CRS) mediated by T Cell Bispecific Antibodies" by Leclercq-Cohen et al 2023. Raw and processed (filtered and annotated) data are provided as AnnData objects which can be directly ingested to reproduce the findings of the paper or for ab initio data reuse:
1- raw.zip provides concatenated raw/unfiltered counts for the 20 samples in the standard Market Exchange (MEX) format.
2- 230330_sw_besca2_LowFil_raw.h5ad contains filtered cells and raw counts in the HDF5 format.
3- 221124_sw_besca2_LowFil.annotated.h5ad contains filtered cells and log normalized counts, along with cell type annotation in the HDF5 format.
scRNAseq data generation: Whole blood from 4 donors was treated with 0.2 μg/mL CD20-TCB, or incubated in the absence of CD20-TCB. At baseline (before addition of TCB) and assay endpoints (2, 4, 6, and 20 hrs), blood was collected for total leukocyte isolation using EasySep™ red blood cell depletion reagent (Stemcell). Briefly, cells were counted and processed for single cell RNA sequencing using the BD Rhapsody platform. To load several samples on a single BD Rhapsody cartridge, sample cells were labelled with sample tags (BD Human Single-Cell Multiplexing Kit) following the manufacturer’s protocol prior to pooling. Briefly, 1×10^6 cells from each sample were resuspended in 180 μL FBS Stain Buffer (BD, PharMingen) and sample tags were added to the respective samples and incubated for 20 min at RT. After incubation, 2 successive washes were performed by addition of 2 mL stain buffer and centrifugation for 5 min at 300 g. Cells were then resuspended in 620 μL cold BD Sample Buffer, stained with 3.1 μL of both 2 mM Calcein AM (Thermo Fisher Scientific) and 0.3 mM Draq7 (BD Biosciences) and finally counted on the BD Rhapsody scanner. Samples were then diluted and/or pooled equally in 650 μL cold BD Sample Buffer. The BD Rhapsody cartridges were then loaded with up to 40 000 – 50 000 cells. Single cells were isolated using Single-Cell Capture and cDNA Synthesis with the BD Rhapsody Express Single-Cell Analysis System according to the manufacturer’s recommendations (BD Biosciences). cDNA libraries were prepared using the Whole Transcriptome Analysis Amplification Kit following the BD Rhapsody System mRNA Whole Transcriptome Analysis (WTA) and Sample Tag Library Preparation Protocol (BD Biosciences). Indexed WTA and sample tag libraries were quantified and quality controlled on the Qubit Fluorometer using the Qubit dsDNA HS Assay, and on the Agilent 2100 Bioanalyzer system using the Agilent High Sensitivity DNA Kit. Sequencing was performed on a NovaSeq 6000 (Illumina) in paired-end mode (64-8-58) with NovaSeq 6000 S2 v1 or NovaSeq 6000 SP v1.5 reagent kits (100 cycles).
scRNAseq data analysis: Sequencing data was processed using the BD Rhapsody Analysis pipeline (v 1.0 https://www.bd.com/documents/guides/user-guides/GMX_BD-Rhapsody-genomics-informatics_UG_EN.pdf) on the Seven Bridges Genomics platform. Briefly, read pairs with low sequencing quality were first removed and the cell label and UMI identified for further quality check and filtering. Valid reads were then mapped to the human reference genome (GRCh38-PhiX-gencodev29) using the aligner Bowtie2 v2.2.9, and reads with the same cell label, same UMI sequence and same gene were collapsed into a single raw molecule while undergoing further error correction and quality checks. Cell labels were filtered with a multi-step algorithm to distinguish those associated with putative cells from those associated with noise. After determining the putative cells, each cell was assigned to the sample of origin through the sample tag (only for cartridges with multiplex loading). Finally, the single-cell gene expression matrices were generated and a metrics summary was provided. After pre-processing with BD’s pipeline, the count matrices and metadata of each sample were aggregated into a single adata object and loaded into the besca v2.3 pipeline for the single cell RNA sequencing analysis (43). First, we filtered low quality cells with fewer than 200 genes, fewer than 500 counts or more than 30% of mitochondrial reads. This permissive filtering was used in order to preserve the neutrophils. We further excluded potential multiplets (cells with more than 5,000 genes or 20,000 counts), and genes expressed in fewer than 30 cells. Normalization, log-transformed UMI counts per 10,000 reads [log(CP10K+1)], was applied before downstream analysis. After normalization, technical variance was removed by regressing out the effects of total UMI counts and percentage of mitochondrial reads, and gene expression was scaled. The 2,507 most variable genes (having a minimum mean expression of 0.0125, a maximum mean expression of 3 and a minimum dispersion of 0.5) were used for principal component analysis. Finally, the first 50 PCs were used as input for calculating the 10 nearest neighbours and the neighbourhood graph was then embedded into the two-dimensional space using the UMAP algorithm at a resolution of 2. Cell type annotation was performed using the Sig-annot semi-automated besca module, which is a signature-based hierarchical cell annotation method. The signatures, configuration and nomenclature files used can be found at https://github.com/bedapub/besca/tree/master/besca/datasets. For more details, please refer to the publication.
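The cell-level filters and the log(CP10K+1) normalization described above can be sketched in plain Python. This is only an illustration of the stated thresholds, not the besca implementation: the function names are hypothetical, and the use of the natural logarithm is an assumption, since the description does not state the base.

```python
import math

# Thresholds quoted in the description above.
MIN_GENES, MIN_COUNTS, MAX_PCT_MITO = 200, 500, 30.0
MAX_GENES, MAX_COUNTS = 5000, 20000


def passes_qc(n_genes, n_counts, pct_mito):
    """Return True if a cell survives the low-quality and multiplet filters."""
    low_quality = (
        n_genes < MIN_GENES or n_counts < MIN_COUNTS or pct_mito > MAX_PCT_MITO
    )
    multiplet = n_genes > MAX_GENES or n_counts > MAX_COUNTS
    return not (low_quality or multiplet)


def log_cp10k(counts):
    """Normalize one cell's UMI counts ({gene: count}) to log(CP10K + 1).

    Natural log is an assumption; the source only writes log(CP10K+1).
    """
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: math.log(c * 10_000 / total + 1) for gene, c in counts.items()}
```

With these permissive lower bounds, a cell with 250 detected genes and 600 counts is retained, which is the kind of low-complexity profile the text says was kept in order to preserve neutrophils.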
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains files reconstructing single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) in order to provide a reference annotation of porcine immune cell transcriptomics at enhanced, single-cell resolution. Analysis of single-cell data allowed identification of 36 cell clusters that were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. Files may be used to reconstruct the data as presented in the manuscript, allowing for individual query by other users. Scripts for original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464).

Resources in this dataset:

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format.
File Name: PBMC7_AllCells.zip
Resource Description: Zipped folder containing PBMC counts matrix, gene names, and cell IDs. Files are as follows:
- matrix of gene counts* (matrix.mtx.gz)
- gene names (features.tsv.gz)
- cell IDs (barcodes.tsv.gz)
*The ‘raw’ count matrix is actually gene counts obtained following ambient RNA removal. During ambient RNA removal, we specified to calculate non-integer count estimations, so most gene counts are actually non-integer values in this matrix but should still be treated as raw/unnormalized data that requires further normalization/transformation. Data can be read into R using the function Read10X().

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata.
File Name: PBMC7_AllCells_meta.csv
Resource Description: .csv file containing metadata for cells included in the final dataset. Metadata columns include:
- nCount_RNA = the number of transcripts detected in a cell
- nFeature_RNA = the number of genes detected in a cell
- Loupe = cell barcodes; correspond to the cell IDs found in the .h5Seurat and 10X formatted objects for all cells
- prcntMito = percent mitochondrial reads in a cell
- Scrublet = doublet probability score assigned to a cell
- seurat_clusters = cluster ID assigned to a cell
- PaperIDs = sample ID for a cell
- celltypes = cell type ID assigned to a cell

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates.
File Name: PBMC7_AllCells_PCAcoord.csv
Resource Description: .csv file containing first 100 PCA coordinates for cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates.
File Name: PBMC7_AllCells_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for all cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates.
File Name: PBMC7_AllCells_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for all cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates.
File Name: PBMC7_CD4only_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells UMAP Coordinates.
File Name: PBMC7_CD4only_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells UMAP Coordinates.
File Name: PBMC7_GDonly_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates.
File Name: PBMC7_GDonly_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information.
File Name: UnfilteredGeneInfo.txt
Resource Description: .txt file containing gene nomenclature information used to assign gene names in the dataset. 'Name' column corresponds to the name assigned to a feature in the dataset.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat.
File Name: PBMC7.tar
Resource Description: .h5Seurat object of all cells in PBMC dataset. File needs to be untarred, then read into R using function LoadH5Seurat().
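For readers working outside R, the 10X-format count matrix is a gzipped Matrix Market coordinate file, and it can be parsed with the Python standard library alone. The sketch below is a minimal reader under that assumption; the function names and the file path in the docstring are hypothetical. Values are parsed as floats because the counts in this dataset are non-integer after ambient RNA removal.

```python
import gzip
import io


def read_mtx(handle):
    """Parse a Matrix Market coordinate file.

    Returns (n_rows, n_cols, {(row, col): value}) with 0-based indices.
    In 10X output, rows are genes and columns are cell barcodes.
    """
    entries = {}
    shape = None
    for line in handle:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip the banner and comment lines
        fields = line.split()
        if shape is None:
            # First data line: number of rows, columns, stored entries.
            shape = (int(fields[0]), int(fields[1]))
            continue
        row, col, value = int(fields[0]), int(fields[1]), float(fields[2])
        entries[(row - 1, col - 1)] = value  # file indices are 1-based
    return shape[0], shape[1], entries


def read_mtx_gz(path):
    """Open a gzipped matrix file (hypothetical path, e.g. 'matrix.mtx.gz')."""
    with gzip.open(path, "rt") as handle:
        return read_mtx(handle)


# Tiny in-memory example with non-integer counts.
demo = io.StringIO(
    "%%MatrixMarket matrix coordinate real general\n"
    "3 2 2\n"
    "1 1 0.75\n"
    "3 2 2.5\n"
)
n_genes, n_cells, entries = read_mtx(demo)
```

Row and column positions map onto the lines of features.tsv.gz and barcodes.tsv.gz respectively, so gene names and cell IDs can be attached by index.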
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These are count matrix data from all four experiments. Experiment ID and cell population names are indicated in the file names. To read the count matrix data, please follow the instructions at https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/output/matrices
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Gene expression (counts) from scRNA-seq of co-cultured cancer and immune cells treated with trifluridine (FTD) or DMSO control, assayed at two time points (12 h and 72 h).
HCT116 cells were seeded in 6-well Nunc plates (50,000 cells/3 mL/well) and precultured for 24 h before PBMCs were added at a 1:8 ratio. Co-cultures were treated with DMSO vehicle (0.1%) or FTD (3mM) for 12 h or 72 h. The MACS Dead Cell Removal Kit (Miltenyi Biotec, Gladbach, DEU) was used according to the manufacturer’s instructions on cells treated for 72 h to increase the viability of the samples before RNA-sequencing. Samples treated for 12 h were not subjected to Dead Cell Removal because their viability was already sufficient. All samples were washed in PBS with 0.04% BSA (2 x 1 mL). Chromium Next GEM Single Cell 3’ library preparation and RNA-sequencing were performed by the SNP&SEQ Technology Platform (National Genomics Infrastructure (NGI), Science for Life Laboratory, Uppsala University, Sweden).
This data set contains processed data using Cell Ranger toolkit version 5.0.1 provided by 10x Genomics, for demultiplexing, aligning reads to the human reference genome GRCh38, and generating gene-cell unique molecular identifiers
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset includes single-cell RNA sequencing (scRNA-seq) data from primary AML (acute myeloid leukemia) samples. Libraries were produced using the 10X Genomics Chromium Single Cell 3ʹ Reagent Kits v3 and sequenced on an Illumina Novaseq 6000 system (Illumina). The dataset is available as raw sequencing reads (fastq; restricted access) or as an annotated matrix of scRNA count data (h5ad). Published in: Sandén et al, Nature Cancer, 2025: https://www.nature.com/articles/s43018-025-01054-6
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Count matrices and meta data tables from simulated and real world immune cell single-cell RNA-seq experiments.
All files are in Rds format and can be read by R using "readRDS()".
Single cells were isolated and then processed using the 10X Genomics Single Cell 3' v3 kit according to the manufacturer's instructions. Libraries were sequenced on the Illumina NovaSeq 6000 instrument (RRID:SCR_016387). Raw sequencing data were processed using the Cell Ranger (CR) (v6.0.0) pipeline (RRID:SCR_017344) to generate fastq files. Fastq files were aligned and quantified, generating feature-barcode count matrices. Gene-barcode matrices containing Unique Molecular Identifier (UMI) counts were filtered using CR's cell detection algorithm. Downstream analyses were performed mainly using the Seurat (v5.0.0) single-cell analysis R package (RRID:SCR_016341). Eight single-cell RNA-seq samples were individually read into a Seurat object (RRID:SCR_016341) to examine feature number, mitochondrial percentage, and read count distributions within each sample. Cells with fewer than 500 features, more than 7500 features, or >15% mitochondrial content were filtered out. After normalizing an...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains count matrices and per-cell metadata tables for RNA sequencing of 39778 single nuclei from healthy primary lung samples of 12 lung adenocarcinoma patients as well as 17451 single human bronchiole epithelial cells from 4 donors. All samples were processed using the 10X Genomics Chromium platform with v2 chemistry and sequenced with one sample per lane on an Illumina HiSeq4000. Reads were aligned to the hg19 reference genome version 1.2.0 obtained from 10X Genomics. Data processing was performed using Seurat3. The metadata table includes patient ID, sex, age, smoking status, and cell type, as well as QC statistics (number of genes, number of cells, ratio of mitochondrial reads).
Quality control analysis of fastq raw data was performed using FastQC [93]. Reads were aligned to the reference genome (hg38) using STAR, and reads were quantified using HTSeq-count with Gencode annotation v38; raw counts are provided in Table 2a. Differential expression analysis was performed with DESeq. For differential gene expression analyses, the cutoffs for significance were fold change >1.5 and adjusted p-value <0.05; results are provided in Table 2b. Single cell data: After samples were demultiplexed, individual fastq files were subjected to barcode processing and UMI counting using Cell Ranger v2.1.0 (https://support.10xgenomics.com). Each individual library was processed using the cellranger count function to generate a gene-barcode matrix for each library, with reads aligned to the human reference genome (hg38). Cell barcodes and UMIs associated with the aligned reads were subjected to correction and filtering using an estimation of 3000 recovered cells (--expect-cells 3000). The resulting gene-cell UMI count matrices for each sample were then concatenated into one matrix using the “cellranger aggr” pipeline, and files are provided as compressed files in Table 2c. (ZIP)
https://spdx.org/licenses/CC0-1.0.html
Arsenic exposure via drinking water is a serious environmental health concern. Epidemiological studies suggest a strong association between prenatal arsenic exposure and subsequent childhood respiratory infections, as well as morbidity from respiratory diseases in adulthood, long after systemic clearance of arsenic. We investigated the impact of exclusive prenatal arsenic exposure on the inflammatory immune response and respiratory health after an adult influenza A (IAV) lung infection. C57BL/6J mice were exposed to 100 ppb sodium arsenite in utero, and subsequently infected with IAV (H1N1) after maturation to adulthood. Assessment of lung tissue and bronchoalveolar lavage fluid (BALF) at various time points post IAV infection reveals greater lung damage and inflammation in arsenic exposed mice versus control mice. Single-cell RNA sequencing analysis of immune cells harvested from IAV infected lungs suggests that the enhanced inflammatory response is mediated by dysregulation of innate immune function of monocyte derived macrophages, neutrophils, NK cells, and alveolar macrophages. Our results suggest that prenatal arsenic exposure results in lasting effects on the adult host innate immune response to IAV infection, long after exposure to arsenic, leading to greater immunopathology. This study provides the first direct evidence that exclusive prenatal exposure to arsenic in drinking water causes predisposition to a hyperinflammatory response to IAV infection in adult mice, which is associated with significant lung damage.
Methods
Whole lung homogenate preparation for single cell RNA sequencing (scRNA-seq).
Lungs were perfused with PBS via the right ventricle, harvested, and mechanically dissociated prior to straining through 70- and 30-µm filters to obtain a single-cell suspension. Dead cells were removed (annexin V EasySep kit, StemCell Technologies, Vancouver, Canada), and samples were enriched for cells of hematopoietic origin by magnetic separation using anti-CD45-conjugated microbeads (Miltenyi, Auburn, CA). Single-cell suspensions of 6 samples were loaded on a Chromium Single Cell system (10X Genomics) to generate barcoded single-cell gel beads in emulsion, and scRNA-seq libraries were prepared using Single Cell 3’ Version 2 chemistry. Libraries were multiplexed and sequenced on 4 lanes of a Nextseq 500 sequencer (Illumina) with 3 sequencing runs. Demultiplexing and barcode processing of raw sequencing data was conducted using Cell Ranger v. 3.0.1 (10X Genomics; Dartmouth Genomics Shared Resource Core). Reads were aligned to mouse (GRCm38) and influenza A virus (A/PR8/34, genome build GCF_000865725.1) genomes to generate unique molecular index (UMI) count matrices. Gene expression data have been deposited in the NCBI GEO database and are available at accession # GSE142047.
Preprocessing of single cell RNA sequencing (scRNA-seq) data
Count matrices produced using Cell Ranger were analyzed in the R statistical working environment (version 3.6.1). Preliminary visualization and quality analysis were conducted using scran (v 1.14.3, Lun et al., 2016) and Scater (v. 1.14.1, McCarthy et al., 2017) to identify thresholds for cell quality and feature filtering. Sample matrices were imported into Seurat (v. 3.1.1, Stuart et al., 2019) and the percentage of mitochondrial, hemoglobin, and influenza A viral transcripts was calculated per cell. Cells with < 1000 or > 20,000 unique molecular identifiers (UMIs: low quality and doublets), fewer than 300 features (low quality), greater than 10% of reads mapped to mitochondrial genes (dying) or greater than 1% of reads mapped to hemoglobin genes (red blood cells) were filtered from further analysis. Total cells per sample after filtering ranged from 1895 to 2482; no significant difference in the number of cells was observed in arsenic vs. control. Data were then normalized using SCTransform (Hafemeister et al., 2019) and variable features identified for each sample. Integration anchors between samples were identified using canonical correlation analysis (CCA) and mutual nearest neighbors (MNNs), as implemented in Seurat V3 (Stuart et al., 2019), and used to integrate samples into a shared space for further comparison. This process enables identification of shared populations of cells between samples, even in the presence of technical or biological differences, while also allowing for non-overlapping populations that are unique to individual samples.
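The cell-level thresholds in this preprocessing step can be written as a small rule table. The sketch below is illustrative only and is not the authors' Seurat code: the function name is hypothetical, and it reports the reason each cell would be dropped using the labels given in parentheses in the text.

```python
def qc_flags(n_umis, n_features, pct_mito, pct_hb):
    """Return the reasons a cell would be filtered (empty list means keep)."""
    flags = []
    if n_umis < 1000 or n_features < 300:
        flags.append("low quality")
    if n_umis > 20_000:
        flags.append("possible doublet")
    if pct_mito > 10:
        flags.append("dying")
    if pct_hb > 1:
        flags.append("red blood cell")
    return flags


kept = qc_flags(5000, 1200, 3.0, 0.1)
dropped = qc_flags(800, 250, 12.0, 0.0)
```

Returning the reasons rather than a single boolean makes it easy to tabulate how many cells each rule removes per sample before settling on thresholds.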
Clustering and reference-based cell identity labeling of single immune cells from IAV-infected lung with scRNA-seq
Principal components were identified from the integrated dataset and were used for Uniform Manifold Approximation and Projection (UMAP) visualization of the data in two-dimensional space. A shared-nearest-neighbor (SNN) graph was constructed using default parameters, and clusters identified using the SLM algorithm in Seurat at a range of resolutions (0.2-2). The first 30 principal components were used to identify 22 cell clusters ranging in size from 25 to 2310 cells. Gene markers for clusters were identified with the findMarkers function in scran. To label individual cells with cell type identities, we used the singleR package (v. 3.1.1) to compare gene expression profiles of individual cells with expression data from curated, FACS-sorted leukocyte samples in the Immgen compendium (Aran D. et al., 2019; Heng et al., 2008). We manually updated the Immgen reference annotation with 263 sample group labels for fine-grain analysis and 25 CD45+ cell type identities based on markers used to sort Immgen samples (Guilliams et al., 2014). The reference annotation is provided in Table S2; cells that were not labeled confidently after label pruning were assigned “Unknown”.
Differential gene expression by immune cells
Differential gene expression within individual cell types was performed by pooling raw count data from cells of each cell type on a per-sample basis to create a pseudo-bulk count table for each cell type. Differential expression analysis was only performed on cell types that were sufficiently represented (>10 cells) in each sample. In droplet-based scRNA-seq, ambient RNA from lysed cells is incorporated into droplets, and can result in spurious identification of these genes in cell types where they aren’t actually expressed. We therefore used a method developed by Young and Behjati (Young et al., 2018) to estimate the contribution of ambient RNA for each gene, and identified genes in each cell type that were estimated to be > 25% ambient-derived. These genes were excluded from analysis in a cell-type specific manner. Genes expressed in less than 5 percent of cells were also excluded from analysis. Differential expression analysis was then performed in Limma (limma-voom with quality weights) following a standard protocol for bulk RNA-seq (Law et al., 2014). Significant genes were identified using MA/QC criteria of P < .05, log2FC >1.
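The pseudo-bulk construction described above (summing raw counts per cell type within each sample, and keeping only cell types with more than 10 cells in every sample) can be sketched as follows. This is a plain-Python illustration under assumed input shapes, not the authors' R code; the function name and the tuple layout are hypothetical.

```python
from collections import Counter, defaultdict

MIN_CELLS = 10  # a cell type needs more than this many cells in every sample


def pseudo_bulk(cells, samples):
    """Aggregate raw counts into per-sample pseudo-bulk profiles.

    cells: iterable of (sample, cell_type, {gene: raw_count}) tuples.
    Returns {cell_type: {sample: Counter of summed raw counts}} for cell
    types represented by > MIN_CELLS cells in each listed sample.
    """
    n_cells = defaultdict(Counter)  # cell_type -> sample -> number of cells
    sums = defaultdict(lambda: defaultdict(Counter))  # cell_type -> sample -> gene sums
    for sample, cell_type, counts in cells:
        n_cells[cell_type][sample] += 1
        sums[cell_type][sample].update(counts)  # adds counts gene by gene
    return {
        ct: {s: sums[ct][s] for s in samples}
        for ct in sums
        if all(n_cells[ct][s] > MIN_CELLS for s in samples)
    }


cells = (
    [("s1", "NK", {"Gzmb": 2})] * 11
    + [("s2", "NK", {"Gzmb": 1})] * 11
    + [("s1", "B", {"Cd79a": 1})] * 3
)
result = pseudo_bulk(cells, ["s1", "s2"])
```

Each resulting per-cell-type table has one column per sample, which is the shape that bulk RNA-seq tools such as limma-voom expect.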
Analysis of arsenic effect on immune cell gene expression by scRNA-seq.
Sample-wide effects of arsenic on gene expression were identified by pooling raw count data from all cells per sample to create a count table for pseudo-bulk gene expression analysis. Genes with less than 20 counts in any sample, or less than 60 total counts were excluded from analysis. Differential expression analysis was performed using limma-voom as described above.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset consists of single-cell RNA sequencing data of bone marrow cells (CD34+ stem cells, GPA+ erythroblasts, ring sideroblasts and mononuclear cells) obtained from multiple healthy bone marrow donors and MDS-RS patients. The objective of this data collection was to assess several parameters on how the bone marrow of MDS-RS patients differs from that of healthy donors.
This dataset includes raw sequencing data in .fastq format, processed count matrices and associated pseudonymized metadata.
Processing: All samples were loaded onto Chromium Single Cell Chips (10x Genomics, CA, USA) at a target capture rate of 10,000 cells per sample. Single cell libraries were prepared using Chromium Next GEM Single Cell 3ʹ Kits v3.1 (10x Genomics) as per the manufacturer’s instructions, except 1µl additive ADT primers were added to the initial cDNA PCR amplification buffer and ADT libraries prepared as described in the Total-Seq B protocol (BioLegend) from the initial cDNA SPRI clean up. Libraries were pooled and sequenced on an Illumina NovaSeq 6000 (Illumina). Read pseudoalignment was performed against the GRCh38.p13 human genome assembly through kallisto v0.46.1 and bustools v0.40.0 was used for barcode and UMI counting.
The dataset consists of 2 folders:
- Processed_Count_Matrices
- Raw_FASTQ
And one xlsx file:
- Sample_key.xlsx
The folder Processed_Count_Matrices contains 1 rds file, 1 tsv file, 9 mtx files, and 18 txt files. The folder Raw_FASTQ contains 27 GNU zipped fastq files, and 5 txt files.
The documentation file File_list_10x.txt contains a full list of the files in the dataset.
The total size of the dataset is approximately 21 GB.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
The original intent of assembling a data set of publicly-available tumor-infiltrating T cells (TILs) with paired TCR sequencing was to expand and improve the scRepertoire R package. However, after some discussion, we decided to release the data set for everyone. A complete summary of the sequencing runs and the sample information can be found in the meta data of the Seurat object. This repository contains the code for the initial processing and annotating of the data set (we are calling this version 0.0.1). This involves several steps: 1) loading the respective GE data, 2) harmonizing the data by sample and cohort information, 3) iterating through automatic annotation, 4) unifying annotation via manual inspection and enrichment analysis, and 5) adding the TCR information.
Methods
Single-Cell Data Processing
The filtered gene matrices output by the Cell Ranger align function from individual sequencing runs (10x Genomics, Pleasanton, CA) were loaded into the R global environment. For each sequencing run, a unique prefix was added to the cell barcodes to prevent issues with duplicate barcodes. The results were then ported into individual Seurat objects (citation), where the cells with > 10% mitochondrial genes and/or 2.5x natural log distribution of counts were excluded for quality control purposes. At the individual sequencing run level, doublets were estimated using the scDblFinder (v1.4.0) R package. All the sequencing runs across experiments were merged into a single Seurat object using the merge() function. All the data was then normalized using the default settings and 2,000 variable genes were identified using the "vst" method. Next, the data was scaled with the default settings and principal components were calculated for 40 components. Data was integrated using the harmony (v1.0.0) R package (citation) using both cohort and sample information to correct for batch effect with up to 20 iterations. The UMAP was created using the RunUMAP() function in Seurat, using 20 dimensions of the harmony calculations.
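The barcode-prefixing step matters because the same 10x barcode sequence can occur in more than one sequencing run. The sketch below shows the idea in Python with made-up run identifiers and barcodes; the function name and the underscore separator are assumptions, not taken from the repository code.

```python
def prefix_barcodes(runs):
    """Give every cell a name that is unique across runs.

    runs: {run_id: [barcode, ...]}
    Returns a flat list of 'run_id_barcode' cell names.
    """
    merged = [
        f"{run_id}_{barcode}"
        for run_id, barcodes in runs.items()
        for barcode in barcodes
    ]
    if len(merged) != len(set(merged)):
        raise ValueError("duplicate cell names remain after prefixing")
    return merged


# Hypothetical runs that share the barcode 'AAAC-1'.
runs = {"runA": ["AAAC-1", "AAAG-1"], "runB": ["AAAC-1"]}
names = prefix_barcodes(runs)
```

Without the prefix, merging the two runs would produce two cells named AAAC-1, and any later join against TCR contig tables keyed by barcode would become ambiguous.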
Annotation of Cells
Automatic annotation was performed using the singleR (v1.4.1) R package (citation) with the HPCA (citation) and DICE (citation) data sets as references and the fine label discriminators. Individual sequencing runs were subsetted to run through the singleR algorithm in order to reduce memory demands. The outputs of all the singleR analyses were collated and appended to the meta data of the Seurat object. Likewise, the ProjecTILs (v0.4.1) R package (citation) was used for automatic annotation as a partially orthogonal approach. Consensus annotation was derived from all 3 databases (HPCA, DICE, ProjecTILs) using a majority approach. No annotation designation was assigned to cells that returned NA for both singleR and ProjecTILs. Cells were designated as mixed when singleR identified non-T cells and ProjecTILs identified T cells. Cell type designations with fewer than 100 cells in the entire cohort were reduced to "other". Automated annotations were checked manually using canonical marker genes and gene enrichment analysis performed using the UCell (v1.0.0) R package (citation).
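A majority vote across the three annotation sources, followed by collapsing rare labels, can be sketched as below. This is a simplified illustration and not the repository's logic: the function names are hypothetical, None stands in for NA, and the handling of ties (calling them "mixed") is an assumption that is broader than the T cell versus non-T cell rule described in the text.

```python
from collections import Counter


def consensus(hpca, dice, projectils):
    """Majority label across three annotations; None stands for NA."""
    votes = Counter(
        label for label in (hpca, dice, projectils) if label is not None
    )
    if not votes:
        return "no annotation"
    label, n = votes.most_common(1)[0]
    # Keep a label backed by two or more sources, or the only label available.
    if n >= 2 or sum(votes.values()) == 1:
        return label
    return "mixed"  # assumption: disagreement without a majority


def collapse_rare(labels, min_cells=100):
    """Replace labels carried by fewer than min_cells cells with 'other'."""
    sizes = Counter(labels)
    return [lab if sizes[lab] >= min_cells else "other" for lab in labels]


labels = collapse_rare(["CD8 T"] * 100 + ["pDC"] * 5)
```

Collapsing is applied after the vote so that the 100-cell cutoff is evaluated on the final consensus labels across the whole cohort.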
Addition of TCR data
The filtered contig annotation T cell receptor (TCR) data for available sequencing runs were loaded into the R global environment. Individual contigs were combined using the combineTCR() function of the scRepertoire (v1.3.2) R package (citation). Clonotypes were assigned to barcodes and, where multiple duplicate chains were present for an individual cell, these were filtered to select the top-expressing contig by read count. The clonotype data was then added to the Seurat object, with proportion across individual patients being used to calculate frequency.
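Selecting the top-expressing contig when a cell has duplicate chains amounts to a group-wise maximum over (barcode, chain) pairs. The sketch below illustrates that idea only; it is not the scRepertoire implementation, the function name is hypothetical, and the dictionary keys are illustrative field names.

```python
def top_contigs(contigs):
    """Keep, for each (barcode, chain) pair, the contig with the most reads.

    contigs: iterable of dicts with 'barcode', 'chain', 'reads' and 'cdr3' keys.
    Returns {(barcode, chain): contig}.
    """
    best = {}
    for contig in contigs:
        key = (contig["barcode"], contig["chain"])
        if key not in best or contig["reads"] > best[key]["reads"]:
            best[key] = contig
    return best


# Made-up contigs: cell c1 has two TRB chains and one TRA chain.
rows = [
    {"barcode": "c1", "chain": "TRB", "reads": 40, "cdr3": "CASSA"},
    {"barcode": "c1", "chain": "TRB", "reads": 95, "cdr3": "CASSB"},
    {"barcode": "c1", "chain": "TRA", "reads": 12, "cdr3": "CAVA"},
]
best = top_contigs(rows)
```

On a tie in read count this sketch keeps the contig seen first; a real pipeline may break ties differently.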
Citations
As of right now, there is no citation associated with the assembled data set. However if using the data, please find the corresponding manuscript for each data set in the meta.data of the single-cell object. In addition, if using the processed data, feel free to modify the language in the methods section (above) and please cite the appropriate manuscripts of the software or references that were used.
Itemized List of the Software Used
Itemized List of Reference Data Used
Future Directions
There are areas that we are actively hoping to develop to make the data set more useful. If you have other suggestions, please reach out using the contact information below.
Contact
Questions, comments, suggestions, please feel free to contact Nick Borcherding via this repository, email, or using twitter.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The scDART-seq data used in the study "Statistical Modeling of Single-Cell Epitranscriptomics Enabled Trajectory and Regulatory Inference of RNA Methylation" was obtained from both the SMART-seq2 and 10x Genomics platforms.
For the SMART-seq2 dataset, the aim was to profile the m6A epitranscriptome in 1,382 HEK293T cells, consisting of 991 cells with m6A modifications (APOBEC1-YTH) and 391 negative control cells (APOBEC1-YTHmut). The negative control cells, identified by their mutated YTH domain, were unable to induce m6A-associated signals, thereby serving as a means to estimate background noise. After data extraction, 510,554 candidate m6A sites were retained for further analysis. The dataset also includes two case studies using SMART-seq2 data. These case studies utilized the Odds Ratio (OR) results from SigRM for nine trajectory-related genes (MCM6, PCNA, and SLBP as markers for the G1 phase; RRM2, MCM5, and DTL associated with the S phase; and TOP2A, CCNB1, and AURKA for the G2/M phase) as well as the expression levels of five genes related to m6A modification (the well-acknowledged m6A writers METTL3, METTL14, and WTAP, and the erasers FTO and ALKBH5).
For the 10x Genomics data, the dataset contains read count information for two replicates (frequency_rep_1_processed.rds, frequency_rep_3_processed.rds), and data from 2,000 single cells from another replicate, including 1,000 test cells and 1,000 control cells. The read counts for these cells are found in frequency_all_processed.rds, and expression data is provided in expression_TPM.rds. A total of 17,733 candidate m6A sites were identified, with further details available in SNP_all_processed.rds.
The code used can be found in the code files.The supplementary file contains the results of the case study 2.The files associated with this study include:SNP File (SNP_all.rds):Data Details: Stored in RDS format, containing a GRanges object.Content: Each entry in the GRanges object represents a specific genomic region corresponding to an m6A modification site detected through scDART-seq.seqnames: Factor Rle object containing chromosome or genomic sequence names.ranges: IRanges object containing genomic intervals (start and end positions).strand: Factor Rle object indicating the strand (directionality) of the genomic region.mcols: DataFrame object containing optional metadata columns, such as quality scores, coverage depth, and mutation data.seqinfo: Seqinfo object providing information about the genomic sequences present in the GRanges object.Purpose: Provides detailed genomic information about identified m6A modification sites, facilitating further analysis of their distribution, characteristics, and genomic context in HEK293T cells.Frequency File (frequency_all.rds):Data Details: Also stored in RDS format, consisting of a list.Content: Each item in the list represents a single-cell, containing counts of methylated and unmethylated reads for corresponding m6A modification sites detected in scDART-seq data.Purpose: Offers quantitative data on the abundance of methylated and unmethylated sequences at each m6A site across individual cells, enabling investigation of m6A modification patterns at a single-cell level.Expression TPM File (expression_TPM.rds):Data Details: Stored as an RDS file, comprising a list.Content: Each item in the list represents a single-cell, with corresponding TPM values for gene expression.Purpose: Provides information on gene expression levels across individual cells, facilitating examination of potential correlations between m6A modification patterns and gene expression profiles in scDART-seq data from HEK293T cells.gene Information File 
(gene_informations.rds):Data Details: Stored as an RDS file, comprising a data frame.Content: Includes information such as Gene ID, Gene Name, Reference, Strand, Start position, End position, and Coverage.Purpose: Offers additional details about gene expression data, aiding in the interpretation and analysis of gene expression profiles in conjunction with m6A modification patterns.
This dataset contains the raw read counts and phased SNP counts for every single cell in the sequencing datasets of breast cancer patient S0 from “Characterizing allele- and haplotype-specific copy numbers in single cells with CHISEL” [Zaccaria & Raphael, 2020]. These data enable the full reproduction of all the results in the related manuscript for breast cancer patient S0. Specifically, the data are provided in two files for every dataset DAT of patient S0 with the following format:
DAT.raw_read_counts.bed.gz is a multi-cell BED file containing the raw read counts in the following fields:
- CHROMOSOME: the name of a human chromosome
- START: the starting genomic position of a genomic bin in the chromosome
- END: the ending genomic position of the genomic bin in the chromosome
- CELL: the cell barcode that uniquely identifies a cell
- NORMAL: the raw read count for the specified bin from a matched-normal sample
- COUNT: the raw read count for the specified bin in the specified cell
- RDR: the estimated read-depth ratio for the specified bin in the specified cell
DAT.phased_snps_counts.pos.gz is a multi-cell POS file containing the phased SNP counts in the following fields:
- CHROMOSOME: the name of a human chromosome
- POS: the genomic position in the chromosome of a germline SNP
- CELL: the cell barcode that uniquely identifies a cell
- COUNT_HAPLOTYPE_A: the count of reads that cover the SNP and that belong to haplotype A in the specified cell
- COUNT_HAPLOTYPE_B: the count of reads that cover the SNP and that belong to haplotype B in the specified cell
All the files have been compressed using standard gzip.
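The field list for DAT.raw_read_counts.bed.gz can be read with the Python standard library alone. The following is a minimal sketch, assuming the file is tab-separated, has no header row (or only a header commented with "#"), and stores the fields in the order listed above. The names BinCount, read_raw_read_counts and open_gzip_text are hypothetical helpers for illustration and are not part of CHISEL.

```python
import csv
import gzip
from typing import Iterator, NamedTuple


class BinCount(NamedTuple):
    """One row of the multi-cell BED file: a genomic bin in one cell."""
    chromosome: str
    start: int
    end: int
    cell: str
    normal: int
    count: int
    rdr: float


def read_raw_read_counts(handle) -> Iterator[BinCount]:
    """Yield one BinCount per row from an open text handle.

    Assumes tab-separated columns in the order
    CHROMOSOME, START, END, CELL, NORMAL, COUNT, RDR.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and an optional commented header
        chrom, start, end, cell, normal, count, rdr = row[:7]
        yield BinCount(chrom, int(start), int(end), cell,
                       int(normal), int(count), float(rdr))


def open_gzip_text(path: str):
    """Open a gzip-compressed file in text mode for the reader above."""
    return gzip.open(path, "rt", newline="")
```

A typical use would be `with open_gzip_text("DAT.raw_read_counts.bed.gz") as fh: rows = list(read_raw_read_counts(fh))`, where DAT stands for the dataset name as in the description. The same pattern applies to the POS file with its five fields.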
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The 10x Chromium single-cell RNA sequencing technology is a powerful gene expression profiling platform, capable of profiling the expression of thousands of genes in tens of thousands of cells simultaneously. This platform can produce hundreds of millions of reads in a single experiment, and this massive data volume makes it very challenging to quantify the expression levels of genes in individual cells. Here we present cellCounts, a new tool for efficient and accurate quantification of 10x Chromium data. cellCounts employs the seed-and-vote strategy to align reads to a reference genome, collapses reads to UMIs (Unique Molecular Identifiers) and then assigns UMIs to genes based on the featureCounts program. Using multiple real datasets, we showed that cellCounts is ~3 times faster than cellRanger, a popular quantification program developed by 10x. Using simulation and real datasets with built-in ground truth, we demonstrated that cellCounts is markedly more accurate than cellRanger. cellCounts is implemented in R, making it easy to integrate with other R programs for analysing Chromium data.
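The idea of collapsing reads to UMIs can be illustrated with a toy sketch. This is not the cellCounts implementation: it assumes reads have already been aligned and assigned to a gene, and it ignores sequencing errors in barcodes and UMIs. The function name count_umis is hypothetical.

```python
from collections import defaultdict


def count_umis(reads):
    """Count distinct UMIs per (cell barcode, gene).

    reads: iterable of (cell_barcode, umi, gene) tuples, one per aligned read.
    Reads sharing the same cell barcode, UMI and gene are treated as PCR
    duplicates of one molecule and are counted once.
    """
    seen = defaultdict(set)
    for cell, umi, gene in reads:
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}
```

Given three reads for one gene in one cell, two of which share a UMI, the function reports two molecules for that cell and gene.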
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset of the publication "Single-cell RNA-seq using UltraMarathonRT"
Included is the following data:
The raw fastq files for uMRT K562, uMRT UHRR and Smartseq K562 are found under the following Zenodo:
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset includes single-cell RNA sequencing (scRNA-seq) data from co-cultures of primary T cells and the HNT-34 AML (acute myeloid leukemia) cell line after treatment with a SLAMF6 antibody or an isotype-matched control antibody. Libraries were produced using the 10X Genomics Chromium GEM-X Single Cell 5ʹ Reagent Kits v3 and sequenced on an Illumina NovaSeq 6000 system. The dataset is available as raw sequencing reads (fastq; restricted access) or as an annotated matrix of scRNA count data (h5ad). Published in: Sandén et al, Nature Cancer, 2025: https://www.nature.com/articles/s43018-025-01054-6
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supplemental Data 1 is single-cell response to rapamycin count data first sequenced in this work and deposited in GEO with accession GSE242556. It is a 173348 rows × 5847 columns TSV.GZ file where the first row is a header, the first 5843 columns are integer gene counts, and the final 4 columns ('Gene', 'Replicate', 'Pool', and 'Experiment') are cell-specific metadata.
Supplemental Data 2 is bulk response to rapamycin count data first sequenced in this work. It is a 33 rows × 5847 columns TSV.GZ file where the first row is a header, the first 5843 columns are integer gene counts, and the final 4 columns ('Oligo', 'Time', 'Replicate', and 'Sample_barcode') are sample-specific metadata.
Supplemental Data 3 is single-cell count data published as GSE125162 and re-analyzed with the pipeline used for single-cell quantification in this work. It is a 65068 rows × 5850 columns TSV.GZ file where the first row is a header, the first 5843 columns are integer gene counts, and the final 7 columns ('Condition', 'Sample', 'Genotype_Group', 'Genotype_Individual', 'Genotype', 'Replicate', 'Cell_Barcode') are cell-specific metadata.
Supplemental Data 4 is the four deep learning models trained in this work. It is a TAR.GZ file containing the final biophysical transcription/decay model, the pre-trained decay model, the velocity prediction model, and the count prediction model. Each model file is an h5 file containing a pytorch model that can be loaded with supirfactor_dynamical.read().
Supplemental Data 5 is the prior knowledge network used to constrain the models for TF interpretability. It is a 1574 rows × 204 columns [Genes x TFs] TSV.GZ file where the first row is a header with TF names, the first column is an index of gene names, and TF-gene interactions are indicated by non-zero values in the matrix. There are 2799 TF-gene interactions.
Supplemental Table 6 is the oligonucleotide sequences used in this work. It is a TSV file with a header row.
Supplemental Table 7 is the yeast strains used in this work. It is a TSV file with a header row.
Supplemental Table 8 is gene metadata used in this work (e.g. Ribosomal Protein gene labels). It is a TSV file with a header row.
Supplemental Table 9 is FY4/5 growth curve data generated in this work. It is a 20 rows × 7 columns TSV file where the first row is a header with replicate IDs, the first column is an index of times in minutes, and values are cell densities in YPD culture, in units of 10^6 cells/mL.
Supplemental Data 10 is a TAR.GZ file containing the yeast SacCer3 genome, modified to add UTR sequences, that was used to generate transcripts for kallisto pseudoalignment in this work.
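The count tables described above (Supplemental Data 1 to 3) share one layout: a header row, integer gene-count columns first, and a fixed number of metadata columns last. The following is a minimal sketch of how that layout could be split, assuming tab-separated text with no extra index column. The function name split_counts_and_metadata is hypothetical. For the full 173348-row file a NumPy or pandas based reader would be far more memory-efficient than the plain lists used here.

```python
import csv
import gzip


def split_counts_and_metadata(handle, n_meta):
    """Split a header + rows table into gene counts and trailing metadata.

    handle: open text handle (for a TSV.GZ file, gzip.open(path, "rt")).
    n_meta: number of trailing metadata columns, e.g. 4 for Supplemental
            Data 1 and 2, or 7 for Supplemental Data 3.
    Returns (gene_names, counts, metadata) where counts is a list of integer
    rows and metadata is a list of dicts keyed by metadata column name.
    """
    if n_meta <= 0:
        raise ValueError("n_meta must be a positive integer")
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    gene_names, meta_names = header[:-n_meta], header[-n_meta:]
    counts, metadata = [], []
    for row in reader:
        counts.append([int(value) for value in row[:-n_meta]])
        metadata.append(dict(zip(meta_names, row[-n_meta:])))
    return gene_names, counts, metadata
```

Metadata values are kept as strings, because the descriptions above do not state the type of each metadata column.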
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Gene-count tables for FACS sorted cells sequenced with Smart-Seq2 from 20 organs of 7 mice. Cells are grouped by tissue of origin.
Includes data for 53,760 cells, 44,879 of which passed a QC cutoff of at least 500 genes and 50,000 reads.
Cell annotations using the Cell Ontology [1] controlled vocabulary are in a separate csv.
This differs from v1 by renaming "Brain_Neurons" --> "Brain_Non-microglia" to be consistent with the manuscript.
Update 2018-09-20: Updated annotations to latest manuscript version
Update 2018-02-16: Separated Diaphragm cells from Muscle cells, and Aorta cells from Heart cells.
Update 2018-02-20: Aorta and Heart erroneously contained Diaphragm and Muscle data, and have now been corrected.
Update 2018-03-09: Renamed tissues for nomenclature standards:
- "Colon" --> "Large_Intestine"
- "Muscle" --> "Limb_Muscle"
- "Mammary" --> "Mammary_Gland"
- "Brain_Microglia" --> "Brain_Myeloid"
- "Brain_Non-microglia" --> "Brain_Non-Myeloid"
Update 2018-03-22: Renamed subtissues:
- tissue: Heart, subtissue: ? --> tissue: Heart, subtissue: Unknown
- tissue: Skin, subtissue: NA --> tissue: Skin, subtissue: Telogen
Update 2018-03-23: Removed row numbers in first column of metadata_FACS.csv
Update 2018-03-27: Added tissue tSNEs and cluster ids
[1] http://purl.obolibrary.org/obo/cl.owl
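The QC cutoff quoted above (at least 500 genes and 50,000 reads) can be expressed as a small per-cell check. This is a sketch under two assumptions that the description does not spell out: "genes" means genes with a non-zero count in the cell, and "reads" means the sum of that cell's gene counts. The function name passes_qc is hypothetical.

```python
def passes_qc(gene_counts, min_genes=500, min_reads=50_000):
    """Return True if a cell meets both QC thresholds.

    gene_counts: iterable of per-gene read counts for one cell.
    A gene is counted as detected when its count is greater than zero
    (assumption); total reads is the sum of all gene counts (assumption).
    """
    counts = list(gene_counts)
    n_genes = sum(1 for c in counts if c > 0)
    n_reads = sum(counts)
    return n_genes >= min_genes and n_reads >= min_reads
```

Applying this check to every column (cell) of a gene-count table and keeping the cells that return True would reproduce a filter of this kind.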
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Single-cell RNA sequencing data from lacrimal glands isolated from E16 and P4 mice. These are the raw read count matrices, alongside the cell barcodes and gene lists that index the sparse matrix. These results are published in Development: Defining epithelial cell dynamics and lineage relationships in the developing lacrimal gland.
Abstract:
The tear-producing lacrimal gland is a tubular organ that protects and lubricates the ocular surface. While the lacrimal gland possesses many features that make it an excellent model to understand tubulogenesis, the cell types and lineage relationships that drive lacrimal gland formation are unclear. Using single cell sequencing and other molecular tools, we reveal novel cell identities and epithelial lineage dynamics that underlie lacrimal gland development. We show that the lacrimal gland from its earliest developmental stages is composed of multiple subpopulations of immune, epithelial, and mesenchymal cell lineages. The epithelial lineage exhibits the most substantial cellular changes, transitioning through a series of unique transcriptional states to become terminally differentiated acinar, ductal and myoepithelial cells. Furthermore, lineage tracing in postnatal and adult glands provides the first direct evidence of unipotent KRT5+ epithelial cells in the lacrimal gland. Finally, we show conservation of developmental markers between the developing mouse and human lacrimal gland, supporting the use of mice to understand human development. Together, our data reveal critical features of lacrimal gland development that have broad implications for understanding epithelial organogenesis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Single cell RNA-seq data generated and reported as part of the manuscript entitled "Dissecting the mechanisms underlying the Cytokine Release Syndrome (CRS) mediated by T Cell Bispecific Antibodies" by Leclercq-Cohen et al. 2023. Raw and processed (filtered and annotated) data are provided as AnnData objects which can be directly ingested to reproduce the findings of the paper or for ab initio data reuse:
1- raw.zip provides concatenated raw/unfiltered counts for the 20 samples in the standard Market Exchange (MEX) format.
2- 230330_sw_besca2_LowFil_raw.h5ad contains filtered cells and raw counts in the HDF5 format.
3- 221124_sw_besca2_LowFil.annotated.h5ad contains filtered cells and log normalized counts, along with cell type annotation in the HDF5 format.
scRNAseq data generation: Whole blood from 4 donors was treated with 0.2 μg/mL CD20-TCB, or incubated in the absence of CD20-TCB. At baseline (before addition of TCB) and assay endpoints (2, 4, 6, and 20 hrs), blood was collected for total leukocyte isolation using EasySep™ red blood cell depletion reagent (Stemcell). Briefly, cells were counted and processed for single cell RNA sequencing using the BD Rhapsody platform. To load several samples on a single BD Rhapsody cartridge, sample cells were labelled with sample tags (BD Human Single-Cell Multiplexing Kit) following the manufacturer’s protocol prior to pooling. Briefly, 1x10^6 cells from each sample were re-suspended in 180 μL FBS Stain Buffer (BD, PharMingen) and sample tags were added to the respective samples and incubated for 20 min at RT. After incubation, 2 successive washes were performed by addition of 2 mL stain buffer and centrifugation for 5 min at 300 g. Cells were then re-suspended in 620 μL cold BD Sample Buffer, stained with 3.1 μL of both 2 mM Calcein AM (Thermo Fisher Scientific) and 0.3 mM Draq7 (BD Biosciences) and finally counted on the BD Rhapsody scanner. Samples were then diluted and/or pooled equally in 650 μL cold BD Sample Buffer. The BD Rhapsody cartridges were then loaded with up to 40 000 – 50 000 cells. Single cells were isolated using Single-Cell Capture and cDNA Synthesis with the BD Rhapsody Express Single-Cell Analysis System according to the manufacturer’s recommendations (BD Biosciences). cDNA libraries were prepared using the Whole Transcriptome Analysis Amplification Kit following the BD Rhapsody System mRNA Whole Transcriptome Analysis (WTA) and Sample Tag Library Preparation Protocol (BD Biosciences). Indexed WTA and sample tags libraries were quantified and quality controlled on the Qubit Fluorometer using the Qubit dsDNA HS Assay, and on the Agilent 2100 Bioanalyzer system using the Agilent High Sensitivity DNA Kit.
Sequencing was performed on a NovaSeq 6000 (Illumina) in paired-end mode (64-8-58) with NovaSeq 6000 S2 v1 or NovaSeq 6000 SP v1.5 reagent kits (100 cycles).
scRNAseq data analysis: Sequencing data was processed using the BD Rhapsody Analysis pipeline (v 1.0 https://www.bd.com/documents/guides/user-guides/GMX_BD-Rhapsody-genomics-informatics_UG_EN.pdf) on the Seven Bridges Genomics platform. Briefly, read pairs with low sequencing quality were first removed and the cell label and UMI identified for further quality check and filtering. Valid reads were then mapped to the human reference genome (GRCh38-PhiX-gencodev29) using the aligner Bowtie2 v2.2.9, and reads with the same cell label, same UMI sequence and same gene were collapsed into a single raw molecule while undergoing further error correction and quality checks. Cell labels were filtered with a multi-step algorithm to distinguish those associated with putative cells from those associated with noise. After determining the putative cells, each cell was assigned to the sample of origin through the sample tag (only for cartridges with multiplex loading). Finally, the single-cell gene expression matrices were generated and a metrics summary was provided. After pre-processing with BD’s pipeline, the count matrices and metadata of each sample were aggregated into a single adata object and loaded into the besca v2.3 pipeline for the single cell RNA sequencing analysis (43). First, we filtered low quality cells with fewer than 200 genes, fewer than 500 counts or more than 30% of mitochondrial reads. This permissive filtering was used in order to preserve the neutrophils. We further excluded potential multiplets (cells with more than 5,000 genes or 20,000 counts), and genes expressed in fewer than 30 cells. Normalization (log-transformed UMI counts per 10,000 reads [log(CP10K+1)]) was applied before downstream analysis.
After normalization, technical variance was removed by regressing out the effects of total UMI counts and percentage of mitochondrial reads, and gene expression was scaled. The 2,507 most variable genes (having a minimum mean expression of 0.0125, a maximum mean expression of 3 and a minimum dispersion of 0.5) were used for principal component analysis. Finally, the first 50 PCs were used as input for calculating the 10 nearest neighbours and the neighbourhood graph was then embedded into the two-dimensional space using the UMAP algorithm at a resolution of 2. Cell type annotation was performed using the Sig-annot semi-automated besca module, which is a signature-based hierarchical cell annotation method. The signatures, configuration and nomenclature files used can be found at https://github.com/bedapub/besca/tree/master/besca/datasets. For more details, please refer to the publication.
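The cell-level filtering thresholds and the log(CP10K+1) normalization described above can be sketched in a few lines. This is an illustration of the stated thresholds only, not the besca pipeline itself. The function names keep_cell and log_cp10k are hypothetical, and the natural logarithm is an assumption, since the description does not state the log base.

```python
import math


def keep_cell(n_genes, n_counts, pct_mito):
    """Apply the stated per-cell filters.

    Low-quality cells: fewer than 200 genes, fewer than 500 counts,
    or more than 30% mitochondrial reads.
    Potential multiplets: more than 5,000 genes or more than 20,000 counts.
    """
    low_quality = n_genes < 200 or n_counts < 500 or pct_mito > 30.0
    multiplet = n_genes > 5000 or n_counts > 20000
    return not (low_quality or multiplet)


def log_cp10k(counts):
    """Normalize one cell's UMI counts to log(CP10K + 1).

    Each count is scaled to counts per 10,000 total counts in the cell,
    then transformed with the natural log of (value + 1) (assumed base).
    """
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log(c / total * 10_000 + 1.0) for c in counts]
```

The gene-level filter (genes expressed in fewer than 30 cells) and the later steps (regression, scaling, PCA, neighbours, UMAP) operate on the whole matrix and are not covered by this per-cell sketch.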