Example ScRNAseq Dataset 2 for Learning Web-based Tools
zenodo.org
data.niaid.nih.gov
application/gzip
Updated Jun 29, 2023
Cite
Sagnik Yarlagadda; Todd D Giorgio (2023). Example ScRNAseq Dataset 2 for Learning Web-based Tools [Dataset]. http://doi.org/10.5281/zenodo.8084706
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.8084706
Dataset updated
Jun 29, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Sagnik Yarlagadda; Todd D Giorgio
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is one of the three example ScRNAseq datasets used to follow the guided example analyses within "A Guide to Single-Cell RNA Sequencing Analysis Using Web-based Tools for Non-Bioinformaticians" in the FEBS Journal. This dataset can be downloaded and imported into a variety of web-based tools and used as a learning device to gain more familiarity with the tools. As described in the paper, this dataset represents the negative control (carrier only).
Raw and processed (filtered and annotated) scRNAseq data
figshare.com
zip
Updated Jun 12, 2023
Cite
Gabrielle Leclercq-Cohen; Sabrina Danilin; Llucia Alberti-Servera; Stephan Schmeing; Hélène Haegel; Sina Nassiri; Marina Bacac (2023). Raw and processed (filtered and annotated) scRNAseq data [Dataset]. http://doi.org/10.6084/m9.figshare.23499192.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.23499192.v1
Dataset updated
Jun 12, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Gabrielle Leclercq-Cohen; Sabrina Danilin; Llucia Alberti-Servera; Stephan Schmeing; Hélène Haegel; Sina Nassiri; Marina Bacac
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Single cell RNA-seq data generated and reported as part of the manuscript entitled "Dissecting the mechanisms underlying the Cytokine Release Syndrome (CRS) mediated by T Cell Bispecific Antibodies" by Leclercq-Cohen et al. 2023. Raw and processed (filtered and annotated) data are provided as AnnData objects which can be directly ingested to reproduce the findings of the paper or for ab initio data reuse:
1- raw.zip provides concatenated raw/unfiltered counts for the 20 samples in the standard Market Exchange (MEX) format.
2- 230330_sw_besca2_LowFil_raw.h5ad contains filtered cells and raw counts in the HDF5 format.
3- 221124_sw_besca2_LowFil.annotated.h5ad contains filtered cells and log normalized counts, along with cell type annotation in the HDF5 format.

scRNAseq data generation: Whole blood from 4 donors was treated with 0.2 μg/mL CD20-TCB, or incubated in the absence of CD20-TCB. At baseline (before addition of TCB) and assay endpoints (2, 4, 6, and 20 hrs), blood was collected for total leukocyte isolation using EasySep™ red blood cell depletion reagent (Stemcell). Cells were counted and processed for single cell RNA sequencing using the BD Rhapsody platform. To load several samples on a single BD Rhapsody cartridge, sample cells were labelled with sample tags (BD Human Single-Cell Multiplexing Kit) following the manufacturer’s protocol prior to pooling. Briefly, 1 x 10^6 cells from each sample were resuspended in 180 μL FBS Stain Buffer (BD, PharMingen) and sample tags were added to the respective samples and incubated for 20 min at RT. After incubation, 2 successive washes were performed by addition of 2 mL stain buffer and centrifugation for 5 min at 300 g. Cells were then resuspended in 620 μL cold BD Sample Buffer, stained with 3.1 μL of both 2 mM Calcein AM (Thermo Fisher Scientific) and 0.3 mM Draq7 (BD Biosciences) and finally counted on the BD Rhapsody scanner. Samples were then diluted and/or pooled equally in 650 μL cold BD Sample Buffer. The BD Rhapsody cartridges were then loaded with up to 40,000–50,000 cells. Single cells were isolated using Single-Cell Capture and cDNA Synthesis with the BD Rhapsody Express Single-Cell Analysis System according to the manufacturer’s recommendations (BD Biosciences). cDNA libraries were prepared using the Whole Transcriptome Analysis Amplification Kit following the BD Rhapsody System mRNA Whole Transcriptome Analysis (WTA) and Sample Tag Library Preparation Protocol (BD Biosciences). Indexed WTA and sample tag libraries were quantified and quality controlled on the Qubit Fluorometer using the Qubit dsDNA HS Assay, and on the Agilent 2100 Bioanalyzer system using the Agilent High Sensitivity DNA Kit. Sequencing was performed on a NovaSeq 6000 (Illumina) in paired-end mode (64-8-58) with NovaSeq 6000 S2 v1 or NovaSeq 6000 SP v1.5 reagent kits (100 cycles).

scRNAseq data analysis: Sequencing data was processed using the BD Rhapsody Analysis pipeline (v 1.0 https://www.bd.com/documents/guides/user-guides/GMX_BD-Rhapsody-genomics-informatics_UG_EN.pdf) on the Seven Bridges Genomics platform. Briefly, read pairs with low sequencing quality were first removed and the cell label and UMI identified for further quality check and filtering. Valid reads were then mapped to the human reference genome (GRCh38-PhiX-gencodev29) using the aligner Bowtie2 v2.2.9, and reads with the same cell label, same UMI sequence and same gene were collapsed into a single raw molecule while undergoing further error correction and quality checks. Cell labels were filtered with a multi-step algorithm to distinguish those associated with putative cells from those associated with noise. After determining the putative cells, each cell was assigned to the sample of origin through the sample tag (only for cartridges with multiplex loading). Finally, the single-cell gene expression matrices were generated and a metrics summary was provided. After pre-processing with BD’s pipeline, the count matrices and metadata of each sample were aggregated into a single adata object and loaded into the besca v2.3 pipeline for the single cell RNA sequencing analysis (43). First, we filtered low quality cells with fewer than 200 genes, fewer than 500 counts or more than 30% of mitochondrial reads. This permissive filtering was used in order to preserve the neutrophils. We further excluded potential multiplets (cells with more than 5,000 genes or 20,000 counts), and genes expressed in fewer than 30 cells. Normalization, log-transformed UMI counts per 10,000 reads [log(CP10K+1)], was applied before downstream analysis. After normalization, technical variance was removed by regressing out the effects of total UMI counts and percentage of mitochondrial reads, and gene expression was scaled. The 2,507 most variable genes (having a minimum mean expression of 0.0125, a maximum mean expression of 3 and a minimum dispersion of 0.5) were used for principal component analysis. Finally, the first 50 PCs were used as input for calculating the 10 nearest neighbours and the neighbourhood graph was then embedded into the two-dimensional space using the UMAP algorithm at a resolution of 2. Cell type annotation was performed using the Sig-annot semi-automated besca module, which is a signature-based hierarchical cell annotation method. The signatures, configuration and nomenclature files used can be found at https://github.com/bedapub/besca/tree/master/besca/datasets. For more details, please refer to the publication.
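
The cell and gene filters and the log(CP10K+1) normalization described above can be sketched in plain NumPy. This is only an illustration, not the besca pipeline itself: the function name, the argument names, the order of the filters, and the choice to normalize by post-filter totals are assumptions, and the sketch uses UMI counts where the description speaks of mitochondrial reads.

```python
import numpy as np

def qc_and_normalize(counts, is_mito, min_genes=200, min_counts=500,
                     max_pct_mito=30.0, max_genes=5000, max_counts=20000,
                     min_cells=30):
    """Hypothetical helper. counts: cells x genes raw UMI matrix;
    is_mito: boolean mask over genes marking mitochondrial genes."""
    counts = np.asarray(counts, dtype=float)
    n_genes = (counts > 0).sum(axis=1)    # genes detected per cell
    n_counts = counts.sum(axis=1)         # total UMI counts per cell
    pct_mito = 100.0 * counts[:, is_mito].sum(axis=1) / np.maximum(n_counts, 1.0)

    # Low-quality cells and potential multiplets, thresholds as quoted above.
    keep_cells = (
        (n_genes >= min_genes) & (n_counts >= min_counts)
        & (pct_mito <= max_pct_mito)
        & (n_genes <= max_genes) & (n_counts <= max_counts)
    )
    filtered = counts[keep_cells]

    # Drop genes expressed in fewer than min_cells of the remaining cells
    # (assumption: the gene filter is applied after the cell filter).
    keep_genes = (filtered > 0).sum(axis=0) >= min_cells
    filtered = filtered[:, keep_genes]

    # log(CP10K + 1): scale each cell to 10,000 counts, then log1p.
    totals = np.maximum(filtered.sum(axis=1, keepdims=True), 1.0)
    return np.log1p(filtered / totals * 1e4), keep_cells, keep_genes
```

The thresholds are keyword arguments so that the same sketch can be tried with other cut-offs.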
Data from: Large-scale integration of single-cell transcriptomic data...
data.niaid.nih.gov
dataone.org
+1more
zip
Updated Dec 14, 2021
Cite
David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove (2021). Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration [Dataset]. http://doi.org/10.5061/dryad.t4b8gtj34
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.t4b8gtj34
Dataset updated
Dec 14, 2021
Dataset provided by
Cornell University
Authors
David McKellar; Iwijn De Vlaminck; Benjamin Cosgrove
License
https://spdx.org/licenses/CC0-1.0.html
Description
Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. 
We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.

Methods

Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (Mus musculus) were obtained from Jackson Laboratories (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute of Aging (NIA) Rodent Aging Colony and were used at 20 months of age. For new scRNAseq experiments, female mice were used in each experiment.

Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of old (20 months) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 and 40 μm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed through incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).

Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 10^6 cells/ml. Cells were counted manually with a hemocytometer to determine their concentration. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes. After preparation, libraries were sequenced on a NextSeq 500 (Illumina; San Diego, CA) using 75 cycle high output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Details on estimated sequencing saturation and the number of reads per sample are shown in Sup. Data 1.

Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5 mo) C57BL/6J mice were injected with 10 µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and frozen fresh in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). Optimal tissue permeabilization time for 10 µm thick sections was found to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E stained tissue sections were imaged using a Zeiss PALM MicroBeam laser capture microdissection system and the images were stitched and processed using Fiji ImageJ software. cDNA libraries were sequenced on an Illumina NextSeq 500 using 150 cycle high output kits (Read 1=28bp, Read 2=120bp, Index 1=10bp, and Index 2=10bp). Frames around the capture area on the Visium slide were aligned manually and spots covering the tissue were selected using Loop Browser v4.0.0 software (10x Genomics). Sequencing data was then aligned to the mouse reference genome (mm10) using the spaceranger v1.0.0 pipeline to generate a feature-by-spot-barcode expression matrix (10x Genomics).

Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples which were only available as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).

Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCounts and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset, individually. BCmvn optimization was used for pK parameterization. Estimated doublet rates were computed by fitting the total number of cells after quality filtering to a linear regression of the expected doublet rates published in the 10x Chromium handbook. Estimated homotypic doublet rates were also accounted for using the modelHomotypic function. The default pN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch-correction with three tools, independently: Harmony (github.com/immunogenomics/harmony) (v1.0), Scanorama (github.com/brianhie/scanorama) (v1.3), and BBKNN (github.com/Teichlab/bbknn) (v1.3.12). We then used Seurat to process the integrated data. After initial integration, we removed the noisy cluster and re-integrated the data using each of the three batch-correction tools.

Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical genes (Fig. S3). Clusters which had similar canonical marker gene expression patterns were merged.

Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).
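
Both the clustering and the pseudotime steps above select "dimensions accounting for 95% of the total variance". One way this selection could be computed from the per-dimension standard deviations of a reduction is sketched below. The function name is hypothetical, and the sketch assumes that "total variance" means the variance captured by the computed dimensions; the description does not say how the authors derived it.

```python
import numpy as np

def n_dims_for_variance(stdev, threshold=0.95):
    """Hypothetical helper: smallest number of leading dimensions whose
    cumulative variance reaches `threshold` of the summed variance."""
    var = np.asarray(stdev, dtype=float) ** 2
    cum_frac = np.cumsum(var) / var.sum()
    # searchsorted returns the first index where cum_frac >= threshold.
    n = int(np.searchsorted(cum_frac, threshold)) + 1
    return min(n, len(var))  # guard against floating-point round-off
```

For example, standard deviations of 3, 2, 1 and 1 give cumulative fractions 0.60, 0.87, 0.93 and 1.00, so all four dimensions would be retained.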

Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were re-labeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myocytes” (bins 11-18). Culture-associated muscle stem cells were ignored and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified with differential gene expression analysis using Seurat (FindAllMarkers, using Wilcoxon Rank Sum Test; results in Sup. Data 2). The resulting genes were filtered based on average log2-fold change (avg_logFC > 1) and the percentage of cells within the cluster which express each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each of the cell types, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using
Single-cell RNA sequencing data on primary samples from: Aberrant expression...
figshare.scilifelab.se
researchdata.se
+1more
hdf
Updated Jul 1, 2025
Cite
Carl Sandén; Henrik Lilljebjörn; Thoas Fioretos (2025). Single-cell RNA sequencing data on primary samples from: Aberrant expression of SLAMF6 constitutes a targetable immune escape mechanism in acute myeloid leukemia [Dataset]. http://doi.org/10.17044/scilifelab.28263911.v2
Available download formats: hdf
Unique identifier
https://doi.org/10.17044/scilifelab.28263911.v2
Dataset updated
Jul 1, 2025
Dataset provided by
Lund University
Authors
Carl Sandén; Henrik Lilljebjörn; Thoas Fioretos
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
This dataset includes single-cell RNA sequencing (scRNA-seq) data from primary AML (acute myeloid leukemia) samples. Libraries were produced using the 10X Genomics Chromium Single Cell 3ʹ Reagent Kits v3 and sequenced on an Illumina Novaseq 6000 system (Illumina). The dataset is available as raw sequencing reads (fastq; restricted access) or as an annotated matrix of scRNA count data (h5ad).
Example data for our Single-cell RNA sequencing (scRNA-seq) Differential...
zenodo.org
bin
Updated Feb 21, 2024
Cite
Stephan Reichl (2024). Example data for our Single-cell RNA sequencing (scRNA-seq) Differential Expression Analysis & Visualization Snakemake Workflow [Dataset]. http://doi.org/10.5281/zenodo.10688824
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.10688824
Dataset updated
Feb 21, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Stephan Reichl
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
As an example data set for our Single-cell RNA sequencing (scRNA-seq) Differential Expression Analysis & Visualization Snakemake Workflow, we selected a scRNA-seq data set consisting of 15 CRC samples from Lee et al (2020) Lineage-dependent gene expression programs influence the immune landscape of colorectal cancer. Nature Genetics. Downloaded from the Weizmann Institute - Curated Cancer Cell Atlas (3CA) - Colorectal Cancer section.
- samples/patients: 15
- cells: 21657
- features (genes): 22276
- preprocessed using the compatible MR.PARETO module for scRNA-seq data processing & visualization.
Data from: Reference transcriptomics of porcine peripheral immune cells...
s.cnmilf.com
agdatacommons.nal.usda.gov
+3more
Updated Jun 5, 2025
Cite
Agricultural Research Service (2025). Data from: Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/data-from-reference-transcriptomics-of-porcine-peripheral-immune-cells-created-through-bul-e667c
Dataset updated
Jun 5, 2025
Dataset provided by
Agricultural Research Service
Description
This dataset contains files reconstructing single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) in order to provide a reference annotation of porcine immune cell transcriptomics at enhanced, single-cell resolution. Analysis of single-cell data allowed identification of 36 cell clusters that were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. Files may be used to reconstruct the data as presented in the manuscript, allowing for individual query by other users. Scripts for original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464).

Resources in this dataset:

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format.
File Name: PBMC7_AllCells.zip
Resource Description: Zipped folder containing PBMC counts matrix, gene names, and cell IDs. Files are as follows:
- matrix of gene counts* (matrix.mtx.gx)
- gene names (features.tsv.gz)
- cell IDs (barcodes.tsv.gz)
*The ‘raw’ count matrix is actually gene counts obtained following ambient RNA removal. During ambient RNA removal, we specified to calculate non-integer count estimations, so most gene counts are actually non-integer values in this matrix but should still be treated as raw/unnormalized data that requires further normalization/transformation. Data can be read into R using the function Read10X().

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata.
File Name: PBMC7_AllCells_meta.csv
Resource Description: .csv file containing metadata for cells included in the final dataset. Metadata columns include:
- nCount_RNA = the number of transcripts detected in a cell
- nFeature_RNA = the number of genes detected in a cell
- Loupe = cell barcodes; correspond to the cell IDs found in the .h5Seurat and 10X formatted objects for all cells
- prcntMito = percent mitochondrial reads in a cell
- Scrublet = doublet probability score assigned to a cell
- seurat_clusters = cluster ID assigned to a cell
- PaperIDs = sample ID for a cell
- celltypes = cell type ID assigned to a cell

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates.
File Name: PBMC7_AllCells_PCAcoord.csv
Resource Description: .csv file containing first 100 PCA coordinates for cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates.
File Name: PBMC7_AllCells_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for all cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates.
File Name: PBMC7_AllCells_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for all cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates.
File Name: PBMC7_CD4only_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells UMAP Coordinates.
File Name: PBMC7_CD4only_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells UMAP Coordinates.
File Name: PBMC7_GDonly_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates.
File Name: PBMC7_GDonly_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information.
File Name: UnfilteredGeneInfo.txt
Resource Description: .txt file containing gene nomenclature information used to assign gene names in the dataset. 'Name' column corresponds to the name assigned to a feature in the dataset.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat.
File Name: PBMC7.tar
Resource Description: .h5Seurat object of all cells in PBMC dataset. File needs to be untarred, then read into R using function LoadH5Seurat().
Data from: Single cell RNA-seq analysis reveals that prenatal arsenic...
data.niaid.nih.gov
datadryad.org
+1more
zip
Updated Jun 1, 2020
Cite
Britton Goodale; Kevin Hsu; Kenneth Ely; Thomas Hampton; Bruce Stanton; Richard Enelow (2020). Single cell RNA-seq analysis reveals that prenatal arsenic exposure results in long-term, adverse effects on immune gene expression in response to Influenza A infection [Dataset]. http://doi.org/10.5061/dryad.vt4b8gtp6
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.vt4b8gtp6
Dataset updated
Jun 1, 2020
Dataset provided by
Dartmouth College
Dartmouth–Hitchcock Medical Center
Authors
Britton Goodale; Kevin Hsu; Kenneth Ely; Thomas Hampton; Bruce Stanton; Richard Enelow
License
https://spdx.org/licenses/CC0-1.0.html
Description
Arsenic exposure via drinking water is a serious environmental health concern. Epidemiological studies suggest a strong association between prenatal arsenic exposure and subsequent childhood respiratory infections, as well as morbidity from respiratory diseases in adulthood, long after systemic clearance of arsenic. We investigated the impact of exclusive prenatal arsenic exposure on the inflammatory immune response and respiratory health after an adult influenza A (IAV) lung infection. C57BL/6J mice were exposed to 100 ppb sodium arsenite in utero, and subsequently infected with IAV (H1N1) after maturation to adulthood. Assessment of lung tissue and bronchoalveolar lavage fluid (BALF) at various time points post IAV infection reveals greater lung damage and inflammation in arsenic exposed mice versus control mice. Single-cell RNA sequencing analysis of immune cells harvested from IAV infected lungs suggests that the enhanced inflammatory response is mediated by dysregulation of innate immune function of monocyte derived macrophages, neutrophils, NK cells, and alveolar macrophages. Our results suggest that prenatal arsenic exposure results in lasting effects on the adult host innate immune response to IAV infection, long after exposure to arsenic, leading to greater immunopathology. This study provides the first direct evidence that exclusive prenatal exposure to arsenic in drinking water causes predisposition to a hyperinflammatory response to IAV infection in adult mice, which is associated with significant lung damage.

Methods

Whole lung homogenate preparation for single cell RNA sequencing (scRNA-seq)

Lungs were perfused with PBS via the right ventricle, harvested, and mechanically dissociated prior to straining through 70- and 30-µm filters to obtain a single-cell suspension. Dead cells were removed (annexin V EasySep kit, StemCell Technologies, Vancouver, Canada), and samples were enriched for cells of hematopoietic origin by magnetic separation using anti-CD45-conjugated microbeads (Miltenyi, Auburn, CA). Single-cell suspensions of 6 samples were loaded on a Chromium Single Cell system (10X Genomics) to generate barcoded single-cell gel beads in emulsion, and scRNA-seq libraries were prepared using Single Cell 3’ Version 2 chemistry. Libraries were multiplexed and sequenced on 4 lanes of a NextSeq 500 sequencer (Illumina) with 3 sequencing runs. Demultiplexing and barcode processing of raw sequencing data was conducted using Cell Ranger v. 3.0.1 (10X Genomics; Dartmouth Genomics Shared Resource Core). Reads were aligned to mouse (GRCm38) and influenza A virus (A/PR8/34, genome build GCF_000865725.1) genomes to generate unique molecular index (UMI) count matrices. Gene expression data have been deposited in the NCBI GEO database and are available at accession # GSE142047.

Preprocessing of single cell RNA sequencing (scRNA-seq) data

Count matrices produced using Cell Ranger were analyzed in the R statistical working environment (version 3.6.1). Preliminary visualization and quality analysis were conducted using scran (v 1.14.3, Lun et al., 2016) and Scater (v. 1.14.1, McCarthy et al., 2017) to identify thresholds for cell quality and feature filtering. Sample matrices were imported into Seurat (v. 3.1.1, Stuart et al., 2019) and the percentage of mitochondrial, hemoglobin, and influenza A viral transcripts calculated per cell. Cells with < 1000 or > 20,000 unique molecular identifiers (UMIs: low quality and doublets), fewer than 300 features (low quality), greater than 10% of reads mapped to mitochondrial genes (dying) or greater than 1% of reads mapped to hemoglobin genes (red blood cells) were filtered from further analysis. Total cells per sample after filtering ranged from 1895 to 2482; no significant difference in the number of cells was observed in arsenic vs. control. Data were then normalized using SCTransform (Hafemeister et al., 2019) and variable features identified for each sample. Integration anchors between samples were identified using canonical correlation analysis (CCA) and mutual nearest neighbors (MNNs), as implemented in Seurat V3 (Stuart et al., 2019) and used to integrate samples into a shared space for further comparison. This process enables identification of shared populations of cells between samples, even in the presence of technical or biological differences, while also allowing for non-overlapping populations that are unique to individual samples.
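
To make the filtering criteria above concrete, the sketch below flags cells per criterion so that the number of cells failing each one can be inspected before removal. It is a NumPy illustration, not the authors' R code: the function name is hypothetical, the gene-name prefixes used to find mitochondrial ("mt-") and hemoglobin ("Hba", "Hbb") genes are assumptions about mouse gene symbols, and percentages are computed from UMI counts rather than reads.

```python
import numpy as np

def flag_cells(counts, gene_names, mito_prefix="mt-", hb_prefixes=("Hba", "Hbb")):
    """Hypothetical helper. counts: cells x genes UMI matrix.
    Returns a keep mask and a dict of per-criterion failure flags."""
    counts = np.asarray(counts, dtype=float)
    names = np.asarray(gene_names, dtype=str)

    is_mito = np.char.startswith(names, mito_prefix)  # assumed naming
    is_hb = np.zeros(len(names), dtype=bool)
    for prefix in hb_prefixes:                        # assumed naming
        is_hb |= np.char.startswith(names, prefix)

    total = counts.sum(axis=1)
    safe_total = np.maximum(total, 1.0)
    flags = {
        "low_umi": total < 1000,
        "high_umi": total > 20000,
        "low_features": (counts > 0).sum(axis=1) < 300,
        "high_mito": 100 * counts[:, is_mito].sum(axis=1) / safe_total > 10,
        "high_hb": 100 * counts[:, is_hb].sum(axis=1) / safe_total > 1,
    }
    # A cell is kept only if it fails none of the criteria.
    keep = ~np.logical_or.reduce(list(flags.values()))
    return keep, flags
```

Summing each entry of `flags` gives the number of cells that would be removed for that reason.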

Clustering and reference-based cell identity labeling of single immune cells from IAV-infected lung with scRNA-seq

Principal components were identified from the integrated dataset and were used for Uniform Manifold Approximation and Projection (UMAP) visualization of the data in two-dimensional space. A shared-nearest-neighbor (SNN) graph was constructed using default parameters, and clusters were identified using the SLM algorithm in Seurat at a range of resolutions (0.2-2). The first 30 principal components were used to identify 22 cell clusters ranging in size from 25 to 2310 cells. Gene markers for clusters were identified with the findMarkers function in scran. To label individual cells with cell type identities, we used the SingleR package (v. 3.1.1) to compare gene expression profiles of individual cells with expression data from curated, FACS-sorted leukocyte samples in the ImmGen compendium (Aran et al., 2019; Heng et al., 2008). We manually updated the ImmGen reference annotation with 263 sample group labels for fine-grain analysis and 25 CD45+ cell type identities based on markers used to sort ImmGen samples (Guilliams et al., 2014). The reference annotation is provided in Table S2. Cells that were not labeled confidently after label pruning were assigned “Unknown”.

Differential gene expression by immune cells

Differential gene expression within individual cell types was assessed by pooling raw count data from cells of each cell type on a per-sample basis to create a pseudo-bulk count table for each cell type. Differential expression analysis was only performed on cell types that were sufficiently represented (>10 cells) in each sample. In droplet-based scRNA-seq, ambient RNA from lysed cells is incorporated into droplets, which can result in spurious detection of these genes in cell types where they are not expressed. We therefore used a method developed by Young and Behjati (Young et al., 2018) to estimate the contribution of ambient RNA for each gene, and identified genes in each cell type that were estimated to be > 25% ambient-derived. These genes were excluded from analysis in a cell-type-specific manner. Genes expressed in fewer than 5% of cells were also excluded from analysis. Differential expression analysis was then performed in limma (limma-voom with quality weights) following a standard protocol for bulk RNA-seq (Law et al., 2014). Significant genes were identified using MA/QC criteria of P < .05, log2FC > 1.
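The pseudo-bulk step described above amounts to summing raw counts over all cells of one type within one sample, then keeping only cell types with more than 10 cells in every sample. A minimal Python sketch follows; the function name `pseudo_bulk`, the input layout, and the toy data are hypothetical, and the ambient-RNA filter and the limma-voom modelling are not reproduced here.

```python
from collections import defaultdict


def pseudo_bulk(cells, min_cells=10):
    """Sum raw counts per (cell type, sample).

    cells: iterable of (sample, cell_type, {gene: raw_count}) tuples.
    Returns {cell_type: {sample: {gene: summed_count}}}, keeping only cell
    types with more than `min_cells` cells in every sample.
    """
    counts = defaultdict(lambda: defaultdict(int))  # (cell_type, sample) -> gene -> sum
    n_cells = defaultdict(int)                      # (cell_type, sample) -> cell count
    samples = set()

    for sample, cell_type, gene_counts in cells:
        samples.add(sample)
        n_cells[(cell_type, sample)] += 1
        for gene, count in gene_counts.items():
            counts[(cell_type, sample)][gene] += count

    cell_types = {cell_type for cell_type, _ in n_cells}
    tables = {}
    for cell_type in cell_types:
        # A cell type must be sufficiently represented in each sample.
        if all(n_cells[(cell_type, s)] > min_cells for s in samples):
            tables[cell_type] = {s: dict(counts[(cell_type, s)]) for s in samples}
    return tables


# Toy data: T cells are well represented in both samples, B cells are not.
cells = []
for sample in ("ctrl_1", "arsenic_1"):
    cells += [(sample, "T cell", {"Cd3e": 2, "Actb": 5})] * 11
cells += [("ctrl_1", "B cell", {"Cd79a": 3})] * 11
cells += [("arsenic_1", "B cell", {"Cd79a": 3})] * 3

tables = pseudo_bulk(cells)
print(sorted(tables))
print(tables["T cell"]["ctrl_1"])
```

Each resulting table has one column per sample, which is the shape a bulk RNA-seq workflow such as limma-voom expects.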

Analysis of arsenic effect on immune cell gene expression by scRNA-seq

Sample-wide effects of arsenic on gene expression were identified by pooling raw count data from all cells per sample to create a count table for pseudo-bulk gene expression analysis. Genes with fewer than 20 counts in any sample, or fewer than 60 total counts, were excluded from analysis. Differential expression analysis was performed using limma-voom as described above.
Data from: COVID-19 severity correlates with airway epithelium-immune cell...
figshare.com
application/gzip
Updated May 30, 2023
Robert Lorenz Chua; Soeren Lukassen; Saskia Trump; Bianca P. Hennig; Daniel Wendisch; Fabian Pott; Olivia Debnath; Loreen Thürmann; Florian Kurth; Maria Theresa Völker; Julia Kazmierski; Bernd Timmermann; Sven Twardziok; Stefan Schneider; Felix Machleidt; Holger Müller-Redetzky; Melanie Maier; Alexander Krannich; Sein Schmidt; Felix Balzer; Johannes Liebig; Jennifer Loske; Norbert Suttorp; Jürgen Eils; Naveed Ishaque; Uwe Gerd Liebert; Christof von Kalle; Andreas Hocke; Martin Witzenrath; Christine Goffinet; Christian Drosten; Sven Laudi; Irina Lehmann; Christian Conrad; Leif-Erik Sander; Roland Eils (2023). COVID-19 severity correlates with airway epithelium-immune cell interactions identified by single-cell analysis [Dataset]. http://doi.org/10.6084/m9.figshare.12436517.v2
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.12436517.v2
Dataset updated
May 30, 2023
Dataset provided by
figshare
Authors
Robert Lorenz Chua; Soeren Lukassen; Saskia Trump; Bianca P. Hennig; Daniel Wendisch; Fabian Pott; Olivia Debnath; Loreen Thürmann; Florian Kurth; Maria Theresa Völker; Julia Kazmierski; Bernd Timmermann; Sven Twardziok; Stefan Schneider; Felix Machleidt; Holger Müller-Redetzky; Melanie Maier; Alexander Krannich; Sein Schmidt; Felix Balzer; Johannes Liebig; Jennifer Loske; Norbert Suttorp; Jürgen Eils; Naveed Ishaque; Uwe Gerd Liebert; Christof von Kalle; Andreas Hocke; Martin Witzenrath; Christine Goffinet; Christian Drosten; Sven Laudi; Irina Lehmann; Christian Conrad; Leif-Erik Sander; Roland Eils
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Single-cell RNA-Seq of airway samples of COVID-19 patients and healthy controls

This dataset comprises single-cell RNA-Seq data of nasopharyngeal, protected specimen brush, and bronchial lavage samples of 19 COVID-19 patients (eight moderate and eleven critical according to the WHO classification) and five healthy controls, for a total of 36 samples. An in-depth description is presented in the manuscript "Cross-talk between the airway epithelium and activated immune cells defines severity in COVID-19" (https://www.medrxiv.org/content/10.1101/2020.04.29.20084327v1). The data is uploaded as two .rds files of Seurat objects that can be imported into R. The _main file contains all samples from the nasopharynx, while the _loc file contains data from nasopharyngeal, protected specimen brush, and bronchial lavage samples of two patients. A quantification of viral RNA reads (as CPM, in total over cells and background) is provided as an .xlsx file. Please note that these values may differ from viral load estimates obtained from diagnostic procedures and may be less accurate. Raw count values (cellranger output) are provided in the file count_matrices_NBT.tar.
Sample scRNA-seq data
figshare.com
hdf
Updated Nov 29, 2022
Leonardo Garma (2022). Sample scRNA-seq data [Dataset]. http://doi.org/10.6084/m9.figshare.21356205.v1
Available download formats: hdf
Unique identifier
https://doi.org/10.6084/m9.figshare.21356205.v1
Dataset updated
Nov 29, 2022
Dataset provided by
Figshare (http://figshare.com/)
Authors
Leonardo Garma
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Single cell RNA-seq subset from Lee et al.'s (https://doi.org/10.1016/j.neuron.2020.06.021) dataset
Single cell RNAseq of PBMC from bladder cancer patients
ega-archive.org
Updated Nov 21, 2018
(2018). Single cell RNAseq of PBMC from bladder cancer patients [Dataset]. https://ega-archive.org/datasets/EGAD00001005481
Dataset updated
Nov 21, 2018
License
https://ega-archive.org/dacs/EGAC00001001380
Description
This dataset contains single cell RNA sequencing data of PBMC samples from 10 bladder cancer patients. cDNAs and single cell RNA libraries were prepared following manufacturer’s user guide (10x Genomics). Each library was sequenced in HiSeq4000 (Illumina) to achieve ~300 million reads following manufacturer’s sequencing specification.
Single-cell RNA-Seq of human primary lung and bronchial epithelium cells
figshare.com
data.mendeley.com
txt
Updated May 31, 2023
Soeren Lukassen; Robert Lorenz Chua; Timo Trefzer; Nicolas C. Kahn; Marc A. Schneider; Thomas Muley; Hauke Winter; Michael Meister; Carmen Veith; Agnes W. Boots; Bianca P. Hennig; Michael Kreuter; Christian Conrad; Roland Eils (2023). Single-cell RNA-Seq of human primary lung and bronchial epithelium cells [Dataset]. http://doi.org/10.6084/m9.figshare.11981034.v2
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.11981034.v2
Dataset updated
May 31, 2023
Dataset provided by
figshare
Authors
Soeren Lukassen; Robert Lorenz Chua; Timo Trefzer; Nicolas C. Kahn; Marc A. Schneider; Thomas Muley; Hauke Winter; Michael Meister; Carmen Veith; Agnes W. Boots; Bianca P. Hennig; Michael Kreuter; Christian Conrad; Roland Eils
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains count matrices and per-cell metadata tables for RNA sequencing of 39778 single nuclei from healthy primary lung samples of 12 lung adenocarcinoma patients as well as 17451 single human bronchiole epithelial cells from 4 donors. All samples were processed using the 10X Genomics Chromium platform with v2 chemistry and sequenced with one sample per lane on an Illumina HiSeq4000. Reads were aligned to the hg19 reference genome version 1.2.0 obtained from 10X Genomics. Data processing was performed using Seurat3. The metadata table includes patient ID, sex, age, smoking status, and cell type, as well as QC statistics (number of genes, number of cells, ratio of mitochondrial reads).
Single cell RNA-sequencing of treatment naïve PDAC patient samples
ega-archive.org
Single cell RNA-sequencing of treatment naïve PDAC patient samples [Dataset]. https://ega-archive.org/datasets/EGAD00001008961
License
https://ega-archive.org/dacs/EGAC00001002724
Description
Single cell RNA-sequencing of treatment naïve PDAC patient samples. The dataset comprises 10 samples, sequenced using the 10X Genomics Chromium platform with 3' chemistry. The submitted FASTQ files represent the index files (I1), Read 1 (R1), and Read 2 (R2).
Single Cell Smart-Seq 3 RNA-Seq and Bulk Exome Seq from Breast Cancer...
figshare.scilifelab.se
researchdata.se
Updated Jan 15, 2025
Seong-Hwan Jun; Hosein Toosi; Jeff Mold; Camilla Engblom; Xinsong Chen; Ciara O’Flanagan; Michael Hagemann-Jensen; Rickard Sandberg; Johan Hartman; Samuel Aparicio; Andrew Roth; Jens Lagergren (2025). Single Cell Smart-Seq 3 RNA-Seq and Bulk Exome Seq from Breast Cancer Patients [Dataset]. http://doi.org/10.17044/scilifelab.15082398.v1
Unique identifier
https://doi.org/10.17044/scilifelab.15082398.v1
Dataset updated
Jan 15, 2025
Dataset provided by
KTH Royal Institute of Technology
Authors
Seong-Hwan Jun; Hosein Toosi; Jeff Mold; Camilla Engblom; Xinsong Chen; Ciara O’Flanagan; Michael Hagemann-Jensen; Rickard Sandberg; Johan Hartman; Samuel Aparicio; Andrew Roth; Jens Lagergren
License
https://www.scilifelab.se/data/restricted-access/
Description
Data Set Description

Single cell RNA sequencing (Smart-Seq3) and whole exome sequencing from multiple regions of individual tumors from breast cancer patients, and also single cell RNA-seq for two ovarian cancer cell lines.

The dataset contains raw sequencing data for various high-throughput molecular tests performed on two sample types: tumor samples from two breast cancer patients and cell lines derived from high-grade serous carcinoma patients. The breast cancer data come from two patients: patient 1 (BCSA1) has two tumor regions (A-B) and patient 2 (BCSA2) has five regions (A-E). For a normal sample and each region from each patient, whole exome sequencing was performed using the Twist Biosciences Human Exome Kit by the SNP&SEQ Technology platform, SciLifeLab, National Genomics Infrastructure Uppsala, Sweden. Also for each patient, EPCAM+ CD45- sorted cells from all the regions were sorted to a 384-well plate, and Smart-Seq3 libraries were prepared at Karolinska Institutet and sequenced at National Genomics Infrastructure Uppsala, Sweden.

The HGSOC cell-line data come from the OV2295R2 and TOV2295R cell lines described in Laks et al., Cell 2019 Nov 14; 179(5): 1207–1221.e22, doi: 10.1016/j.cell.2019.10.026. The cell line Smart-Seq3 libraries were prepared from two 384-well plates at Karolinska Institutet and sequenced at National Genomics Infrastructure Uppsala, Sweden.

Terms for access

This dataset is to be used for research on intratumor heterogeneity and subclonal evolution of tumors. To apply for conditional access to the dataset in this publication, please contact datacentre@scilifelab.se.
Sample scRNA-seq Data for Cell Type Annotation
mllmcelltype.com
csv
Updated Jun 29, 2025
mLLMCelltype (2025). Sample scRNA-seq Data for Cell Type Annotation [Dataset]. https://www.mllmcelltype.com/
Available download formats: csv
Dataset updated
Jun 29, 2025
Dataset authored and provided by
mLLMCelltype
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Example single-cell RNA sequencing dataset with marker genes for testing cell type annotation
single-cell RNA-Seq samples of CRC patients
ega-archive.org
single-cell RNA-Seq samples of CRC patients [Dataset]. https://ega-archive.org/datasets/EGAD00001009634
License
https://ega-archive.org/dacs/EGAC00001002914
Description
The dataset contains samples from 11 CRC patients (2 samples per patient, tumor and adjacent normal tissue, 22 samples in total). The dataset consists of paired-end FASTQ files from 10x single-cell RNA-Seq.
Bulk RNA-seq dataset of human embryonic stem cells (hESCs) differentiation
zenodo.org
data.niaid.nih.gov
csv
Updated Jun 7, 2023
Tamar Hashimshony; Yael Mandel-Gutfreund (2023). Bulk RNA-seq dataset of human embryonic stem cells (hESCs) differentiation [Dataset]. http://doi.org/10.5281/zenodo.8009633
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.8009633
Dataset updated
Jun 7, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Tamar Hashimshony; Yael Mandel-Gutfreund
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset consists of 31 bulk samples obtained from human embryonic stem cells undergoing undirected differentiation. The samples were collected at 16 time points on days 0 and 7-21 of differentiation. The file contains the raw read counts.

More details can be found in:

SPIRAL: Significant Process InfeRence ALgorithm for single cell RNA-sequencing and spatial transcriptomics
Hadas Biran, Tamar Hashimshony, Tamar Lahav, Or Efrat, Yael Mandel-Gutfreund, and Zohar Yakhini

https://doi.org/10.1101/2022.05.24.493189
Example ScRNAseq Dataset 1 for Learning Web-based Tools
data.niaid.nih.gov
Updated Jun 28, 2023
Giorgio, Todd D (2023). Example ScRNAseq Dataset 1 for Learning Web-based Tools [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8084641
Dataset updated
Jun 28, 2023
Dataset provided by
Giorgio, Todd D
Yarlagadda, Sagnik
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is one of the three example ScRNAseq datasets used to follow the guided example analyses within "A Guide to Single-Cell RNA Sequencing Analysis Using Web-based Tools for Non-Bioinformaticians" in the FEBS Journal. This dataset can be downloaded and imported into a variety of web-based tools and used as a learning device to gain more familiarity with the tools. As described in the paper, this dataset represents the untreated control.
Table_1_High-Order Correlation Integration for Single-Cell or Bulk RNA-seq...
frontiersin.figshare.com
zip
Updated Jun 6, 2023
Hui Tang; Tao Zeng; Luonan Chen (2023). Table_1_High-Order Correlation Integration for Single-Cell or Bulk RNA-seq Data Analysis.docx [Dataset]. http://doi.org/10.3389/fgene.2019.00371.s001
Available download formats: zip
Unique identifier
https://doi.org/10.3389/fgene.2019.00371.s001
Dataset updated
Jun 6, 2023
Dataset provided by
Frontiers
Authors
Hui Tang; Tao Zeng; Luonan Chen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Quantifying or labeling the sample type with high quality is a challenging task and a key step for understanding complex diseases. Reducing noise in the data and ensuring that the extracted intrinsic patterns agree with the primary data structure are important in sample clustering and classification. Here we propose an effective data integration framework named HCI (High-order Correlation Integration), which takes advantage of a high-order correlation matrix incorporated with pattern fusion analysis (PFA) to realize high-dimensional data feature extraction. On the one hand, the high-order Pearson's correlation coefficient can highlight the latent patterns underlying noisy input datasets and thus improve the accuracy and robustness of the algorithms currently available for sample clustering. On the other hand, PFA can identify intrinsic sample patterns efficiently from different input matrices by optimally adjusting the signal effects. To validate the effectiveness of our new method, we first applied HCI to four single-cell RNA-seq datasets to distinguish the cell types, and we found that HCI is capable of identifying the previously known cell types of single-cell samples from scRNA-seq data with higher accuracy and robustness than other methods under different conditions. Second, we integrated heterogeneous omics data from TCGA and GEO datasets, including bulk RNA-seq data, where HCI outperformed the other methods at identifying distinct cancer subtypes. In an additional case study, we also constructed the mRNA-miRNA regulatory network of colorectal cancer based on the feature weights estimated from HCI, where the differentially expressed mRNAs and miRNAs were significantly enriched in well-known functional sets of colorectal cancer, such as KEGG pathways and IPA disease annotations.
All these results support that HCI has extensive flexibility and applicability for sample clustering with different types and organizations of RNA-seq data.
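The description does not spell out how the high-order correlation matrix is built. One common reading, assumed here, is that the sample-by-sample Pearson correlation matrix is itself treated as data and correlated again, so that two samples are similar when their correlation profiles against all other samples are similar. The sketch below illustrates only that idea; the function names and the toy data are hypothetical, and pattern fusion analysis is not included.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(rows):
    """Pairwise Pearson correlations between the rows of a matrix."""
    return [[pearson(r, s) for s in rows] for r in rows]


def high_order_correlation(samples, order=2):
    """Apply the row-wise correlation `order` times.

    order=1 gives the usual sample-by-sample correlation matrix;
    order=2 correlates the rows of that matrix with each other.
    """
    matrix = samples
    for _ in range(order):
        matrix = correlation_matrix(matrix)
    return matrix


# Toy data: samples 0 and 1 rise across features, samples 2 and 3 fall.
samples = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.5],
    [3.0, 2.0, 1.0],
    [6.0, 4.0, 2.5],
]

second_order = high_order_correlation(samples, order=2)
for row in second_order:
    print([round(value, 3) for value in row])
```

In this toy case the second-order matrix keeps the two groups clearly separated, which is the behaviour the abstract attributes to high-order correlation on noisy data.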
Data Repository: Single-cell mapper (scMappR): using scRNA-seq to infer...
zenodo.org
data.niaid.nih.gov
bin
Updated Feb 12, 2021
Dustin Sokolowski; Mariela Faykoo-Martinez; Lauren Erdman; Huayun Hou; Cadia Chan; Helen Zhu; Melissa M. Holmes; Anna Goldenberg; Michael D Wilson (2021). Data Repository: Single-cell mapper (scMappR): using scRNA-seq to infer cell-type specificities of differentially expressed genes [Dataset]. http://doi.org/10.1101/2020.08.24.265298
Available download formats: bin
Unique identifier
https://doi.org/10.1101/2020.08.24.265298
Dataset updated
Feb 12, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Dustin Sokolowski; Mariela Faykoo-Martinez; Lauren Erdman; Huayun Hou; Cadia Chan; Helen Zhu; Melissa M. Holmes; Anna Goldenberg; Michael D Wilson
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data repository for the scMappR manuscript:

Abstract from bioRxiv (https://www.biorxiv.org/content/10.1101/2020.08.24.265298v1.full).

RNA sequencing (RNA-seq) is widely used to identify differentially expressed genes (DEGs) and reveal biological mechanisms underlying complex biological processes. RNA-seq is often performed on heterogeneous samples and the resulting DEGs do not necessarily indicate the cell types where the differential expression occurred. While single-cell RNA-seq (scRNA-seq) methods solve this problem, technical and cost constraints currently limit its widespread use. Here we present single cell Mapper (scMappR), a method that assigns cell-type specificity scores to DEGs obtained from bulk RNA-seq by integrating cell-type expression data generated by scRNA-seq and existing deconvolution methods. After benchmarking scMappR using RNA-seq data obtained from sorted blood cells, we asked if scMappR could reveal known cell-type specific changes that occur during kidney regeneration. We found that scMappR appropriately assigned DEGs to cell-types involved in kidney regeneration, including a relatively small proportion of immune cells. While scMappR can work with any user supplied scRNA-seq data, we curated scRNA-seq expression matrices for ∼100 human and mouse tissues to facilitate its use with bulk RNA-seq data alone. Overall, scMappR is a user-friendly R package that complements traditional differential expression analysis available at CRAN.
Alternate gene annotations for rat, macaque, and marmoset for single cell...
kilthub.cmu.edu
zip
Updated May 30, 2023
BaDoi Phan; Andreas Pfenning (2023). Alternate gene annotations for rat, macaque, and marmoset for single cell RNA and ATAC analyses [Dataset]. http://doi.org/10.1184/R1/21176401.v1
Available download formats: zip
Unique identifier
https://doi.org/10.1184/R1/21176401.v1
Dataset updated
May 30, 2023
Dataset provided by
Carnegie Mellon University
Authors
BaDoi Phan; Andreas Pfenning
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Custom genome and gene annotations for single cell ATAC and RNA-seq analyses by BaDoi Phan (badoi dot phan at pitt dot edu)

This Kilthub upload is a clone of the github repository where this project may be updated or corrected in the future: https://github.com/pfenninglab/custom_ArchR_genomes_and_annotations

Premise: Not all single-cell ATAC-seq work in biomedical molecular epigenetics is done in human and mouse, where high quality genomes and gene annotations exist. For other species that are still highly relevant to the study of health and disease, here are some ArchR annotations that make it less frustrating to analyze snATAC-seq data with ArchR.

Strategy for better gene annotations: We can use the property that related mammalian species tend to have orthologous gene elements (TSS, exons, genes). For example, the house mouse (Mus musculus) is a median of 15.4 MY diverged from the Norway rat (Rattus norvegicus), according to TimeTree. Humans are a median of 28.9 MY diverged from rhesus macaques. To borrow the higher quality and more complete gene annotations, we can use a gene-aware method of lifting gene annotations from one genome to another, liftoff (Shumate and Salzberg, 2021). For the source of "high quality" gene annotation, we use the NCBI RefSeq annotations for hg38/GRCh38 and mm10/GRCm38 downloaded from the UCSC Genome Browser.

For single cell RNA-seq, He, Kleyman et al. 2021 Current Biology (https://pubmed.ncbi.nlm.nih.gov/34727523/) found that using a regular liftOver of the human NCBI RefSeq annotation to rheMac10 recovered a higher number of UMI counts assigned to genes. This is likely due to incomplete annotations in either the rheMac8 or rheMac10 genomes for the 3' UTRs that are usually targeted by common single cell/nucleus RNA-seq technologies. This allows more reads that would otherwise be found "outside" a gene because of incomplete 3' UTRs in a target species to be appropriately attributed to that gene, using the orthologs of that gene from a more complete annotation in a related species. Furthermore, complex splicing is better measured in humans, so more regions annotated as "intergenic" by the rheMac10 annotations became "intronic" and could be mapped with an annotation lifted over from human. For this reason, we create alternate annotations for the rhesus macaque, marmoset, and rat genomes, borrowing orthology as identified with the newer liftoff method from more complete human or mouse annotations.

Similarly, for single cell ATAC-seq, a more complete map of genes and transcription start sites (TSS) enables aggregate metrics like a "gene score" to better calculate gene-based measures for co-clustering with single cell RNA-seq datasets. A more complete annotation would be able to accurately discern single cell open chromatin regions and not falsely report exonic regions or alternate promoters that were missed from primary transcriptomic data in monkey, marmoset, or rat but can be bioinformatically inferred.

Lastly, work by the ENCODE Consortium on the large human and mouse epigenomic datasets has found that certain regions of the genome in these species have artifactual signals and need to be excluded from epigenomic analyses (Amemiya et al., 2021). These regions were pulled for human and mouse from here and mapped to the target genomes below using liftOver, for simplicity.

List of resources by file name

Surprisingly, all these files are small enough to put on GitHub for a couple of custom genomes. The files are organized as follows:

- *.gtf.gz and *.gff3.gz: the gzipped annotation from the higher quality annotations to the target genome using liftoff
- *liftOver*blacklist.v2.bed: the ENCODE regions to exclude from epigenomic analyses mapped to the target genome using liftOver
- *ArchRGenome.R: the Rscript used to make the custom ArchR annotations
- *ArchR_annotations.rda: the R Data object that contains the geneAnnotation and objects to use with ArchR::createArrowFiles()

List of species/genomes/source files

For most of these files, the genome fasta sequences were grabbed from the UCSC Genome Browser at https://hgdownload.soe.ucsc.edu/goldenPath/${GENOME_VERSION}/, where ${GENOME_VERSION} is any of the versions below except mCalJac1. Some of these genomes were updated from the Vertebrate Genome Project (VGP), which seeks to create complete rather than draft genome assemblies of all mammals on the planet (Rhie et al. 2021). These genomes are marked VGP, with the VGP name given where there is an alternate naming scheme. The VGP is pretty cool and they make good genome assemblies.

rn6: rat genome v6, BCM-Baylor version

rn7: rat genome also called VGP mRatBN7.2

rheMac8: rhesus macaque v8

rheMac10: rhesus macaque v10

mCalJac1: marmoset VGP genome, fasta from the maternal assembly here
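As a small illustration of the `${GENOME_VERSION}` pattern quoted above, the directory URL for each UCSC-hosted genome can be built as sketched below. The helper name `ucsc_directory_url` is hypothetical, and the sketch stops at the directory level because the individual file names under each directory are not given in the description.

```python
# URL pattern quoted in the description; {genome} stands in for ${GENOME_VERSION}.
BASE_URL = "https://hgdownload.soe.ucsc.edu/goldenPath/{genome}/"

# Genome versions listed above that are fetched from UCSC.
# mCalJac1 is excluded because its fasta comes from the VGP maternal assembly.
UCSC_GENOMES = ["rn6", "rn7", "rheMac8", "rheMac10"]


def ucsc_directory_url(genome: str) -> str:
    """Return the UCSC goldenPath directory URL for a listed genome version."""
    if genome not in UCSC_GENOMES:
        raise ValueError(f"{genome} is not one of the UCSC-hosted genomes listed here")
    return BASE_URL.format(genome=genome)


for genome in UCSC_GENOMES:
    print(genome, ucsc_directory_url(genome))
```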
