CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
"*.csv" files contain the single-cell gene expression values (log2(TPM+1)) for all genes in each cell from melanoma and head and neck squamous cell carcinoma (HNSCC) tumors. The cell type and tumor of origin for each cell are also included in the "*.csv" files. "MalignantCellSubtypes.xlsx" defines the tumor subtype. "CCLE_RNAseq_rsem_genes_tpm_20180929.zip" was downloaded from the CCLE database.
This online resource provides supplementary items used to analyze data in the work "Single-cell RNA sequencing characterization of Holstein cattle blood and milk immune cells during a chronic Staphylococcus aureus mastitis infection" by Wiarda et al. Cells were collected from milk and blood of three cattle with chronic mastitis infections caused by experimental Staphylococcus aureus challenge. Isolated cells were processed for single-cell RNA sequencing, resulting in a dataset of 35,338 cells distributed across 62 cell clusters. Cell clusters were classified as granulocytes, monocyte/macrophage/conventional dendritic cells, B cells/antibody-secreting cells, T cells/innate lymphoid cells, plasmacytoid dendritic cells, and non-immune cells. A data subset consisting of 30 granulocyte clusters was also created. Data objects of the total cell and granulocyte datasets are included here (.h5seurat files), as well as results of pairwise differential gene expression of all cell clusters (resulting in over 4.3 million differentially expressed genes) and a data object containing cell neighborhoods used for differential abundance testing.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Single cell RNA-seq data generated and reported as part of the manuscript entitled "Dissecting the mechanisms underlying the Cytokine Release Syndrome (CRS) mediated by T Cell Bispecific Antibodies" by Leclercq-Cohen et al. 2023. Raw and processed (filtered and annotated) data are provided as AnnData objects, which can be directly ingested to reproduce the findings of the paper or for ab initio data reuse:
1. raw.zip provides concatenated raw/unfiltered counts for the 20 samples in the standard Market Exchange Format (MEX).
2. 230330_sw_besca2_LowFil_raw.h5ad contains filtered cells and raw counts in the HDF5 format.
3. 221124_sw_besca2_LowFil.annotated.h5ad contains filtered cells and log normalized counts, along with cell type annotation, in the HDF5 format.
scRNAseq data generation: Whole blood from 4 donors was treated with 0.2 μg/mL CD20-TCB or incubated in the absence of CD20-TCB. At baseline (before addition of TCB) and at assay endpoints (2, 4, 6, and 20 hrs), blood was collected for total leukocyte isolation using EasySep™ red blood cell depletion reagent (Stemcell). Cells were counted and processed for single cell RNA sequencing using the BD Rhapsody platform. To load several samples on a single BD Rhapsody cartridge, sample cells were labelled with sample tags (BD Human Single-Cell Multiplexing Kit) following the manufacturer’s protocol prior to pooling. Briefly, 1×10^6 cells from each sample were resuspended in 180 μL FBS Stain Buffer (BD, PharMingen), and sample tags were added to the respective samples and incubated for 20 min at RT. After incubation, 2 successive washes were performed by addition of 2 mL stain buffer and centrifugation for 5 min at 300 g. Cells were then resuspended in 620 μL cold BD Sample Buffer, stained with 3.1 μL of both 2 mM Calcein AM (Thermo Fisher Scientific) and 0.3 mM Draq7 (BD Biosciences), and finally counted on the BD Rhapsody scanner. Samples were then diluted and/or pooled equally in 650 μL cold BD Sample Buffer. The BD Rhapsody cartridges were then loaded with up to 40 000 – 50 000 cells. Single cells were isolated using Single-Cell Capture and cDNA Synthesis with the BD Rhapsody Express Single-Cell Analysis System according to the manufacturer’s recommendations (BD Biosciences). cDNA libraries were prepared using the Whole Transcriptome Analysis Amplification Kit following the BD Rhapsody System mRNA Whole Transcriptome Analysis (WTA) and Sample Tag Library Preparation Protocol (BD Biosciences). Indexed WTA and sample tag libraries were quantified and quality controlled on the Qubit Fluorometer using the Qubit dsDNA HS Assay, and on the Agilent 2100 Bioanalyzer system using the Agilent High Sensitivity DNA Kit.
Sequencing was performed on a NovaSeq 6000 (Illumina) in paired-end mode (64-8-58) with NovaSeq 6000 S2 v1 or NovaSeq 6000 SP v1.5 reagent kits (100 cycles).
scRNAseq data analysis: Sequencing data was processed using the BD Rhapsody Analysis pipeline (v 1.0 https://www.bd.com/documents/guides/user-guides/GMX_BD-Rhapsody-genomics-informatics_UG_EN.pdf) on the Seven Bridges Genomics platform. Briefly, read pairs with low sequencing quality were first removed, and the cell label and UMI were identified for further quality checks and filtering. Valid reads were then mapped to the human reference genome (GRCh38-PhiX-gencodev29) using the aligner Bowtie2 v2.2.9, and reads with the same cell label, same UMI sequence and same gene were collapsed into a single raw molecule while undergoing further error correction and quality checks. Cell labels were filtered with a multi-step algorithm to distinguish those associated with putative cells from those associated with noise. After determining the putative cells, each cell was assigned to its sample of origin through the sample tag (only for cartridges with multiplex loading). Finally, the single-cell gene expression matrices were generated and a metrics summary was provided. After pre-processing with BD’s pipeline, the count matrices and metadata of each sample were aggregated into a single adata object and loaded into the besca v2.3 pipeline for the single cell RNA sequencing analysis (43). First, we filtered out low quality cells with fewer than 200 genes, fewer than 500 counts or more than 30% mitochondrial reads. This permissive filtering was used in order to preserve the neutrophils. We further excluded potential multiplets (cells with more than 5,000 genes or 20,000 counts) and genes expressed in fewer than 30 cells. Counts were normalized to log-transformed UMI counts per 10,000 reads [log(CP10K+1)] before downstream analysis.
After normalization, technical variance was removed by regressing out the effects of total UMI counts and percentage of mitochondrial reads, and gene expression was scaled. The 2,507 most variable genes (having a minimum mean expression of 0.0125, a maximum mean expression of 3 and a minimum dispersion of 0.5) were used for principal component analysis. Finally, the first 50 PCs were used as input for calculating the 10 nearest neighbours, and the neighbourhood graph was then embedded into two-dimensional space using the UMAP algorithm at a resolution of 2. Cell type annotation was performed using the Sig-annot semi-automated besca module, which is a signature-based hierarchical cell annotation method. The signature, configuration and nomenclature files used can be found at https://github.com/bedapub/besca/tree/master/besca/datasets. For more details, please refer to the publication.
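The log(CP10K+1) normalization described above can be illustrated with a minimal sketch. It assumes the natural logarithm and per-cell scaling by total UMI counts; the function name `log_cp10k` and the example counts are hypothetical and are not taken from the besca pipeline.

```python
import math


def log_cp10k(counts):
    """Scale one cell's UMI counts to counts per 10,000 and apply log(x + 1).

    Assumption: natural log, with scaling by the cell's own total UMI count.
    """
    total = sum(counts)
    if total == 0:
        # A cell with no counts stays all-zero instead of dividing by zero.
        return [0.0 for _ in counts]
    return [math.log1p(c * 10_000 / total) for c in counts]


# Hypothetical UMI counts for four genes in one cell (total = 20).
cell = [0, 3, 1, 16]
normalized = log_cp10k(cell)
print([round(v, 3) for v in normalized])
```

Zero counts remain zero after the transform, which preserves the sparsity of the matrix.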
This dataset contains files reconstructing single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) in order to provide a reference annotation of porcine immune cell transcriptomics at enhanced, single-cell resolution. Analysis of single-cell data allowed identification of 36 cell clusters that were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. Files may be used to reconstruct the data as presented in the manuscript, allowing for individual query by other users. Scripts for original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464).

Resources in this dataset:

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format.
File Name: PBMC7_AllCells.zip
Resource Description: Zipped folder containing PBMC counts matrix, gene names, and cell IDs. Files are as follows:
- matrix of gene counts* (matrix.mtx.gz)
- gene names (features.tsv.gz)
- cell IDs (barcodes.tsv.gz)
*The ‘raw’ count matrix is actually gene counts obtained following ambient RNA removal. During ambient RNA removal, we specified that non-integer count estimates be calculated, so most gene counts in this matrix are non-integer values but should still be treated as raw/unnormalized data that requires further normalization/transformation. Data can be read into R using the function Read10X().

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata.
File Name: PBMC7_AllCells_meta.csv
Resource Description: .csv file containing metadata for cells included in the final dataset. Metadata columns include:
- nCount_RNA = the number of transcripts detected in a cell
- nFeature_RNA = the number of genes detected in a cell
- Loupe = cell barcodes; correspond to the cell IDs found in the .h5Seurat and 10X formatted objects for all cells
- prcntMito = percent mitochondrial reads in a cell
- Scrublet = doublet probability score assigned to a cell
- seurat_clusters = cluster ID assigned to a cell
- PaperIDs = sample ID for a cell
- celltypes = cell type ID assigned to a cell

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates.
File Name: PBMC7_AllCells_PCAcoord.csv
Resource Description: .csv file containing first 100 PCA coordinates for cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates.
File Name: PBMC7_AllCells_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for all cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates.
File Name: PBMC7_AllCells_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for all cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates.
File Name: PBMC7_CD4only_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells UMAP Coordinates.
File Name: PBMC7_CD4only_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells UMAP Coordinates.
File Name: PBMC7_GDonly_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates.
File Name: PBMC7_GDonly_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information.
File Name: UnfilteredGeneInfo.txt
Resource Description: .txt file containing gene nomenclature information used to assign gene names in the dataset. 'Name' column corresponds to the name assigned to a feature in the dataset.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat.
File Name: PBMC7.tar
Resource Description: .h5Seurat object of all cells in PBMC dataset. File needs to be untarred, then read into R using function LoadH5Seurat().
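As a rough illustration of how the metadata file could be used to pick out a subset such as the CD4 T cells (clusters 0, 3, 4, 28), the sketch below filters rows by the `seurat_clusters` column and collects the matching `Loupe` barcodes. The inline rows and barcodes are hypothetical stand-ins for PBMC7_AllCells_meta.csv, and the helper `select_barcodes` is not part of the dataset or of Seurat.

```python
import csv
import io

# Cluster IDs listed in the resource descriptions for CD4 T cells.
CD4_CLUSTERS = {"0", "3", "4", "28"}

# Hypothetical excerpt standing in for PBMC7_AllCells_meta.csv;
# only two of the documented columns are shown.
meta_csv = io.StringIO(
    "Loupe,seurat_clusters\n"
    "AAACCTGA-1,0\n"
    "AAACGGGT-1,6\n"
    "AAAGATGC-1,28\n"
    "AAAGCAAT-1,12\n"
)


def select_barcodes(handle, clusters):
    """Return barcodes whose cluster ID is in `clusters`, in file order."""
    reader = csv.DictReader(handle)
    return [row["Loupe"] for row in reader if row["seurat_clusters"] in clusters]


cd4_barcodes = select_barcodes(meta_csv, CD4_CLUSTERS)
print(cd4_barcodes)
```

The resulting barcode list could then be matched against the first column of a coordinates file such as PBMC7_CD4only_UMAPcoord.csv, assuming that file is keyed by the same barcodes.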
This is the Zenodo repository for the manuscript titled "A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity". The code included in the file titled marengo_code_for_paper_jan_2023.R was used to generate the figures from the single-cell RNA sequencing data.
The following libraries are required for script execution:
Seurat, scRepertoire, ggplot2, stringr, dplyr, ggridges, ggrepel, ComplexHeatmap
The code can be downloaded and opened in RStudio. The "marengo_code_for_paper_jan_2023.R" file contains all the code needed to reproduce the figures in the paper. The "Marengo_newID_March242023.rds" file is available at the following address: https://zenodo.org/badge/DOI/10.5281/zenodo.7566113.svg (Zenodo DOI: 10.5281/zenodo.7566113). The "all_res_deg_for_heat_updated_march2023.txt" file contains the unfiltered results from the DGE analysis, which were also used to create the DGE heatmap and volcano plots. The "genes_for_heatmap_fig5F.xlsx" file contains the genes included in the heatmap in figure 5F.
This repository contains code for the analysis of a single cell RNA-seq dataset. The dataset contains raw FASTQ files as well as the aligned files that were deposited in GEO. The "Rdata" or "Rds" file was deposited in Zenodo. Provided below are descriptions of the linked datasets:
Gene Expression Omnibus (GEO) ID: GSE223311 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223311)
Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment. Description: This submission contains the "matrix.mtx", "barcodes.tsv", and "genes.tsv" files for each replicate and condition, corresponding to the aligned files for single cell sequencing data. Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).
Sequence read archive (SRA) repository ID: SRX19088718 and SRX19088719
Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
Description: This submission contains the raw sequencing (.fastq.gz) files, which are gzip-compressed text files.
Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).
Zenodo DOI: 10.5281/zenodo.7566113 (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ)
Title: A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity. Description: This submission contains the "Rdata" or ".Rds" file, which is an R object file. This file is required to run the code. Submission type: Restricted Access. In order to gain access to the repository, you must contact the author.
The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:
Ensure you have R version 4.1.2 or higher for compatibility.
Although it is not essential, you can use RStudio (version 2022.12.0+353) for accessing and executing the code.
marengo_code_for_paper_jan_2023.R
Install_Packages.R
Marengo_newID_March242023.rds
genes_for_heatmap_fig5F.xlsx
all_res_deg_for_heat_updated_march2023.txt
You can use the following code to set the working directory in R:
setwd(directory)
This is the GitHub repository for the single cell RNA sequencing data analysis for the human manuscript.

The following essential libraries are required for script execution:
- Seurat
- scRepertoire
- ggplot2
- dplyr
- ggridges
- ggrepel
- ComplexHeatmap

Linked File:
--------------------------------------
This repository contains code for the analysis of a single cell RNA-seq dataset. The dataset contains raw FASTQ files as well as the aligned files that were deposited in GEO. Provided below are descriptions of the linked datasets:

1. Gene Expression Omnibus (GEO) ID: GSE229626
   - Title: Gene expression profile at single cell level of human T cells stimulated via antibodies against the T Cell Receptor (TCR)
   - Description: This submission contains the matrix.mtx, barcodes.tsv, and genes.tsv files for each replicate and condition, corresponding to the aligned files for single cell sequencing data.
   - Submission type: Private. In order to gain access to the repository, you must use a "reviewer token" (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).
2. Sequence read archive (SRA) repository
   - Title: Gene expression profile at single cell level of human T cells stimulated via antibodies against the T Cell Receptor (TCR)
   - Description: This submission contains the "raw sequencing" (.fastq.gz) files, which are gzip-compressed text files.
   - Submission type: Private. In order to gain access to the repository, you must use a "reviewer token" (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

Please note that since the GSE submission is private, the raw data deposited at SRA may not be accessible until the embargo on GSE229626 has been lifted.

Installation and Instructions
--------------------------------------
The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:
> Ensure you have R version 4.1.2 or higher for compatibility.
> Although it is not essential, you can use RStudio (version 2022.12.0+353) for accessing and executing the code.

The following code can be used to set the working directory in R:
> setwd(directory)

Steps:
1. Download the "Human_code_April2023.R" and "Install_Packages.R" R scripts, and the processed data from GSE229626.
2. Open "RStudio" (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.
3. Set your working directory to where the following files are located:
   - Human_code_April2023.R
   - Install_Packages.R
4. Open the file titled Install_Packages.R and execute it in the R IDE. This script will attempt to install all the necessary packages and their dependencies.
5. Open the Human_code_April2023.R script and execute commands as necessary.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Scripts and data for the paper: Consequences and opportunities arising due to sparser single-cell RNA-seq datasets
With the number of cells measured in single-cell RNA sequencing (scRNA-seq) datasets increasing exponentially and concurrent increased sparsity due to more zero counts being measured for many genes, we demonstrate here that downstream analyses on binary-based gene expression give similar results as count-based analyses. Moreover, a binary representation scales up to ~ 50-fold more cells that can be analyzed using the same computational resources. We also highlight the possibilities provided by binarized scRNA-seq data. Development of specialized tools for bit-aware implementations of downstream analytical tasks will enable a more fine-grained resolution of biological heterogeneity.
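To make the idea of a binary representation concrete, the following is a minimal sketch, not the implementation used in the paper. It assumes binarization means "count > 0", packs the resulting bits eight per byte, and compares two cells with a Jaccard index as one possible bit-aware operation. All function names and the example counts are hypothetical.

```python
def binarize(counts):
    """Map each count to 1 if the gene was detected (count > 0), else 0."""
    return [1 if c > 0 else 0 for c in counts]


def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, eight genes per byte (LSB first)."""
    packed = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def jaccard(packed_a, packed_b):
    """Jaccard similarity of two equally long bit-packed cells."""
    shared = sum(bin(x & y).count("1") for x, y in zip(packed_a, packed_b))
    either = sum(bin(x | y).count("1") for x, y in zip(packed_a, packed_b))
    return shared / either if either else 0.0


# Hypothetical counts for nine genes in two cells.
cell_a = [0, 5, 0, 2, 0, 0, 1, 0, 3]
cell_b = [1, 4, 0, 0, 0, 0, 2, 0, 0]

packed_a = pack_bits(binarize(cell_a))
packed_b = pack_bits(binarize(cell_b))
similarity = jaccard(packed_a, packed_b)
print(len(packed_a), similarity)
```

Each gene occupies a single bit here, so nine genes fit in two bytes per cell, and the similarity is computed with bitwise AND/OR instead of arithmetic on counts.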
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data repository for the scMappR manuscript:
Abstract from bioRxiv (https://www.biorxiv.org/content/10.1101/2020.08.24.265298v1.full).
RNA sequencing (RNA-seq) is widely used to identify differentially expressed genes (DEGs) and reveal biological mechanisms underlying complex biological processes. RNA-seq is often performed on heterogeneous samples and the resulting DEGs do not necessarily indicate the cell types where the differential expression occurred. While single-cell RNA-seq (scRNA-seq) methods solve this problem, technical and cost constraints currently limit its widespread use. Here we present single cell Mapper (scMappR), a method that assigns cell-type specificity scores to DEGs obtained from bulk RNA-seq by integrating cell-type expression data generated by scRNA-seq and existing deconvolution methods. After benchmarking scMappR using RNA-seq data obtained from sorted blood cells, we asked if scMappR could reveal known cell-type specific changes that occur during kidney regeneration. We found that scMappR appropriately assigned DEGs to cell-types involved in kidney regeneration, including a relatively small proportion of immune cells. While scMappR can work with any user supplied scRNA-seq data, we curated scRNA-seq expression matrices for ∼100 human and mouse tissues to facilitate its use with bulk RNA-seq data alone. Overall, scMappR is a user-friendly R package that complements traditional differential expression analysis available at CRAN.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset includes single-cell RNA sequencing (scRNA-seq) data from primary AML (acute myeloid leukemia) samples. Libraries were produced using the 10X Genomics Chromium Single Cell 3ʹ Reagent Kits v3 and sequenced on an Illumina NovaSeq 6000 system (Illumina). The dataset is available as raw sequencing reads (fastq; restricted access) or as an annotated matrix of scRNA count data (h5ad).
https://spdx.org/licenses/CC0-1.0.html
Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair, and do not inform the spatial context that is important for myogenic differentiation. Here, we demonstrate how large-scale integration of single-cell and spatial transcriptomic data can overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal muscle by integration, consensus annotation, and analysis of 23 newly collected scRNAseq datasets and 88 publicly available single-cell (scRNAseq) and single-nucleus (snRNAseq) RNA-sequencing datasets. The resulting dataset includes more than 365,000 cells and spans a wide range of ages, injury, and repair conditions. Together, these data enabled identification of the predominant cell types in skeletal muscle, and resolved cell subtypes, including endothelial subtypes distinguished by vessel-type of origin, fibro/adipogenic progenitors defined by functional roles, and many distinct immune populations. The representation of different experimental conditions and the depth of transcriptome coverage enabled robust profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of myogenesis, from stem cell quiescence to myofiber maturation and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual datasets. We performed spatial RNA sequencing of mouse muscle at three time points after injury and used the integrated dataset as a reference to achieve a high-resolution, local deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury response. 
We provide a public web tool to enable interactive exploration and visualization of the data. Our work supports the utility of large-scale integration of single-cell transcriptomic data as a tool for biological discovery.
Methods
Mice. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines. Adult C57BL/6J mice (Mus musculus) were obtained from Jackson Laboratories (#000664; Bar Harbor, ME) and were used at 4-7 months of age. Aged C57BL/6J mice were obtained from the National Institute of Aging (NIA) Rodent Aging Colony and were used at 20 months of age. For new scRNAseq experiments, female mice were used in each experiment.
Mouse injuries and single-cell isolation. To induce muscle injury, both tibialis anterior (TA) muscles of old (20 months) C57BL/6J mice were injected with 10 µl of notexin (10 µg/ml; Latoxan; France). At 0, 1, 2, 3.5, 5, or 7 days post-injury (dpi), mice were sacrificed and TA muscles were collected and processed independently to generate single-cell suspensions. Muscles were digested with 8 mg/ml Collagenase D (Roche; Switzerland) and 10 U/ml Dispase II (Roche; Switzerland), followed by manual dissociation to generate cell suspensions. Cell suspensions were sequentially filtered through 100 and 40 μm filters (Corning Cellgro #431752 and #431750) to remove debris. Erythrocytes were removed through incubation in erythrocyte lysis buffer (IBI Scientific #89135-030).
Single-cell RNA-sequencing library preparation. After digestion, single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 10^6 cells/ml. Cells were counted manually with a hemocytometer to determine their concentration. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, PN-1000075; Pleasanton, CA) following the manufacturer’s protocol. Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes. After preparation, libraries were sequenced on a NextSeq 500 (Illumina; San Diego, CA) using 75 cycle high output kits (Index 1 = 8, Read 1 = 26, and Read 2 = 58). Details on estimated sequencing saturation and the number of reads per sample are shown in Sup. Data 1.
Spatial RNA sequencing library preparation. Tibialis anterior muscles of adult (5 mo) C57BL/6J mice were injected with 10 µl notexin (10 µg/ml) at 2, 5, and 7 days prior to collection. Upon collection, tibialis anterior muscles were isolated, embedded in OCT, and frozen fresh in liquid nitrogen. Spatially tagged cDNA libraries were built using the Visium Spatial Gene Expression 3’ Library Construction v1 Kit (10x Genomics, PN-1000187; Pleasanton, CA) (Fig. S7). Optimal tissue permeabilization time for 10 µm thick sections was found to be 15 minutes using the 10x Genomics Visium Tissue Optimization Kit (PN-1000193). H&E stained tissue sections were imaged using a Zeiss PALM MicroBeam laser capture microdissection system, and the images were stitched and processed using Fiji ImageJ software. cDNA libraries were sequenced on an Illumina NextSeq 500 using 150 cycle high output kits (Read 1=28bp, Read 2=120bp, Index 1=10bp, and Index 2=10bp). Frames around the capture area on the Visium slide were aligned manually, and spots covering the tissue were selected using Loupe Browser v4.0.0 software (10x Genomics). Sequencing data was then aligned to the mouse reference genome (mm10) using the spaceranger v1.0.0 pipeline to generate a feature-by-spot-barcode expression matrix (10x Genomics).
Download and alignment of single-cell RNA sequencing data. For all samples available via SRA, parallel-fastq-dump (github.com/rvalieris/parallel-fastq-dump) was used to download raw .fastq files. Samples which were only available as .bam files were converted to .fastq format using bamtofastq from 10x Genomics (github.com/10XGenomics/bamtofastq). Raw reads were aligned to the mm10 reference using cellranger (v3.1.0).
Preprocessing and batch correction of single-cell RNA sequencing datasets. First, ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCounts and adjustCounts; github.com/constantAmateur/SoupX). Samples were then preprocessed using the standard Seurat (v3.2.1) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat). Cells with fewer than 750 features, fewer than 1000 transcripts, or more than 30% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0) was used to identify putative doublets in each dataset individually. BCmvn optimization was used for pK parameterization. Estimated doublet rates were computed by fitting the total number of cells after quality filtering to a linear regression of the expected doublet rates published in the 10x Chromium handbook. Estimated homotypic doublet rates were also accounted for using the modelHomotypic function. The default pN value (0.25) was used. Putative doublets were then removed from each individual dataset. After preprocessing and quality filtering, we merged the datasets and performed batch correction with three tools independently: Harmony (github.com/immunogenomics/harmony) (v1.0), Scanorama (github.com/brianhie/scanorama) (v1.3), and BBKNN (github.com/Teichlab/bbknn) (v1.3.12). We then used Seurat to process the integrated data. After initial integration, we removed the noisy cluster and re-integrated the data using each of the three batch-correction tools.
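The quality thresholds stated above (at least 750 features, at least 1000 transcripts, and no more than 30% mitochondrial transcripts) can be expressed as a simple per-cell predicate. This is only an illustrative sketch: the original analysis was done with Seurat in R, and the `CellQC` record, the `passes_qc` helper, and the example cells are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    barcode: str
    n_features: int      # number of genes detected
    n_transcripts: int   # number of unique transcripts (UMIs)
    pct_mito: float      # percent of transcripts from mitochondrial genes


def passes_qc(cell, min_features=750, min_transcripts=1000, max_pct_mito=30.0):
    """Keep a cell only if it clears all three thresholds from the text."""
    return (
        cell.n_features >= min_features
        and cell.n_transcripts >= min_transcripts
        and cell.pct_mito <= max_pct_mito
    )


# Hypothetical cells illustrating each failure mode.
cells = [
    CellQC("cell_ok", 1200, 3500, 8.0),
    CellQC("cell_few_features", 600, 2000, 5.0),
    CellQC("cell_few_transcripts", 800, 900, 5.0),
    CellQC("cell_high_mito", 1500, 4000, 42.0),
]

kept = [c.barcode for c in cells if passes_qc(c)]
print(kept)
```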
Cell type annotation. Cell types were determined for each integration method independently. For Harmony and Scanorama, dimensions accounting for 95% of the total variance were used to generate SNN graphs (Seurat::FindNeighbors). Louvain clustering was then performed on the output graphs (including the corrected graph output by BBKNN) using Seurat::FindClusters. A clustering resolution of 1.2 was used for Harmony (25 initial clusters), BBKNN (28 initial clusters), and Scanorama (38 initial clusters). Cell types were determined based on expression of canonical genes (Fig. S3). Clusters which had similar canonical marker gene expression patterns were merged.
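One way to read "dimensions accounting for 95% of the total variance" is as the smallest number of leading dimensions whose cumulative variance reaches that fraction. The sketch below implements that reading. It assumes the per-dimension variances are already sorted in decreasing order and that "total variance" means the sum over the computed dimensions; the helper name and the example values are hypothetical.

```python
def n_dims_for_variance(variances, fraction=0.95):
    """Return the smallest k such that the first k variances reach `fraction`
    of their total. Assumes `variances` is sorted from largest to smallest."""
    total = sum(variances)
    running = 0.0
    for k, variance in enumerate(variances, start=1):
        running += variance
        if running / total >= fraction:
            return k
    return len(variances)


# Hypothetical per-dimension variances (sum = 100).
variances = [50.0, 25.0, 12.0, 9.0, 3.0, 1.0]
n_dims = n_dims_for_variance(variances)
print(n_dims)
```

With these values the cumulative fractions are 0.50, 0.75, 0.87 and 0.96, so the first four dimensions would be retained.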
Pseudotime workflow. Cells were subset based on the consensus cell types between all three integration methods. Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.4) (github.com/KrishnaswamyLab/phateR).
Deconvolution of spatial RNA sequencing spots. Spot deconvolution was performed using the deconvolution module in BayesPrism (previously known as “Tumor microEnvironment Deconvolution”, TED, v1.0; github.com/Danko-Lab/TED). First, myogenic cells were re-labeled, according to binning along the first PHATE dimension, as “Quiescent MuSCs” (bins 4-5), “Activated MuSCs” (bins 6-7), “Committed Myoblasts” (bins 8-10), and “Fusing Myocytes” (bins 11-18). Culture-associated muscle stem cells were ignored, and myonuclei labels were retained as “Myonuclei (Type IIb)” and “Myonuclei (Type IIx)”. Next, highly and differentially expressed genes across the 25 groups of cells were identified with differential gene expression analysis using Seurat (FindAllMarkers, using Wilcoxon Rank Sum Test; results in Sup. Data 2). The resulting genes were filtered based on average log2-fold change (avg_logFC > 1) and the percentage of cells within the cluster which express each gene (pct.expressed > 0.5), yielding 1,069 genes. Mitochondrial and ribosomal protein genes were also removed from this list, in line with recommendations in the BayesPrism vignette. For each of the cell types, mean raw counts were calculated across the 1,069 genes to generate a gene expression profile for BayesPrism. Raw counts for each spot were then passed to the run.Ted function, using
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We developed a single-cell transcriptomics pipeline for high-throughput pharmacotranscriptomic screening. We explored the transcriptional landscape of three HGSOC models (JHOS2, a representative cell line; PDC2 and PDC3, two patient-derived samples) after treating their cells for 24 hours with 45 drugs representing 13 distinct classes of mechanism of action. Our work establishes a new precision oncology framework for the study of molecular mechanisms activated by a broad array of drug responses in cancer.
.
├── 3D UMAPs/ → Interactive 3D UMAPs of cells treated with the 45 drugs used for multiplexed scRNA-seq. Related to Figure 4. Coordinates: x = UMAP 1; y = UMAP 2; z = UMAP 3. Legend: green = PDC1; blue = PDC2; red = JHOS2.
│   ├── DMSO_3D_UMAP_Dini.et.al.html → 3D UMAP of untreated cells.
│   └── drug_3D_UMAP_Dini.et.al.html → 3D UMAP of cells treated with (drug).
├── QC_plots/ → Diagnostic plots. Related to Figures 2–4.
│   ├── model_QC_violin_plot_2023.pdf → Violin plots of the QC metrics used to filter the data.
│   ├── model_col_HTO or model_row_HTO before and after filt → Heatmaps of the row or column HTO expression in each cell.
│   └── model_counts_histogram_2023.pdf → Histogram of the distribution of the total counts per cell after filtering for high-quality cells.
├── scRNAseq/ → scRNA-seq data. Related to Figures 2–4.
│   ├── AllData_subsampled_DGE_edgeR.csv.gz → Differential gene expression analyses results between treated and untreated cells via pseudobulk of aggregate subsamples, for each of the three models. Related to Figure 3.
│   └── All_vs_all_RNAclusters_DEG_signif.txt → Differential gene expression analysis results (p.adj < 0.05) of FindAllMarkers for the Leiden/RNA clusters.
├── PDCs.transcript.counts.tsv → Bulk RNA-seq count data for PDCs 1–3 processed by Kallisto. Related to Figure S6.
└── PDCs.transcript.TPM.tsv → Bulk RNA-seq TPM data for PDCs 1–3 processed by Kallisto. Related to Figure S6.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
List of tumor microenvironment scRNA-seq datasets included in TMExplorer.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This page includes the data and code necessary to reproduce the results of the following paper: Yang Liao, Dinesh Raghu, Bhupinder Pal, Lisa Mielke and Wei Shi. cellCounts: fast and accurate quantification of 10x Chromium single-cell RNA sequencing data. Under review. A Linux computer running an operating system of CentOS 7 (or later) or Ubuntu 20.04 (or later) is recommended for running this analysis. The computer should have >2 TB of disk space and >64 GB of RAM. The following software packages need to be installed before running the analysis. Software executables generated after installation should be included in the $PATH environment variable.
R (v4.0.0 or newer) https://www.r-project.org/
Rsubread (v2.12.2 or newer) http://bioconductor.org/packages/3.16/bioc/html/Rsubread.html
CellRanger (v6.0.1) https://support.10xgenomics.com/single-cell-gene-expression/software/overview/welcome
STARsolo (v2.7.10a) https://github.com/alexdobin/STAR
sra-tools (v2.10.0 or newer) https://github.com/ncbi/sra-tools
Seurat (v3.0.0 or newer) https://satijalab.org/seurat/
edgeR (v3.30.0 or newer) https://bioconductor.org/packages/edgeR/
limma (v3.44.0 or newer) https://bioconductor.org/packages/limma/
mltools (v0.3.5 or newer) https://cran.r-project.org/web/packages/mltools/index.html
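Before launching the analysis, it may help to confirm that the command-line tools listed above can be found on $PATH. The following is a minimal Python sketch; the executable names (`Rscript`, `cellranger`, `STAR`, `fasterq-dump`) are assumptions about a typical installation and are not specified by the authors, and the R packages would still need to be checked from within R.

```python
import shutil

# Assumed executable names for R, CellRanger, STARsolo and sra-tools.
REQUIRED_EXECUTABLES = ["Rscript", "cellranger", "STAR", "fasterq-dump"]


def find_missing(executables, which=shutil.which):
    """Return the executables that cannot be located on the current PATH."""
    return [name for name in executables if which(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_EXECUTABLES)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required executables were found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is used.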
Reference packages generated by 10x Genomics are also required for this analysis and they can be downloaded from the following link (2020-A version for individual human and mouse reference packages should be selected): https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest After all these are done, you can simply run the shell script ‘test-all-new.bash’ to perform all the analyses carried out in the paper. This script will automatically download the mixture scRNA-seq data from the SRA database, and it will output a text file called ‘test-all.log’ that contains all the screen outputs and speed/accuracy results of CellRanger, STARsolo and cellCounts.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Lung endothelial cells from three published scRNA-seq datasets (GSE122960, GSE149878, GSE171668) of healthy subjects and COVID-19 patients were collected for integrative analysis. The endothelial cells were classified into three subgroups according to their distinct expression of IL7R, DKK2, and EDNRB. For differential gene expression analysis, counts per million of the aggregated UMIs in each group were compared using the Wilcoxon rank-sum test.
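The "counts per million of aggregated UMIs" quantity can be sketched as below. This is an illustrative Python example only: the function name and the toy counts are hypothetical, and how the authors defined the aggregation units that enter the Wilcoxon rank-sum test is not stated in this record.

```python
def pseudobulk_cpm(cell_counts):
    """Sum UMI counts over a group of cells, then scale the summed counts
    to counts per million (CPM)."""
    totals = {}
    for counts in cell_counts:
        for gene, n in counts.items():
            totals[gene] = totals.get(gene, 0) + n
    library_size = sum(totals.values())
    if library_size == 0:
        raise ValueError("no UMIs in this group")
    return {gene: n * 1e6 / library_size for gene, n in totals.items()}


# Toy UMI counts for two cells in one group (made-up numbers).
example_group = [
    {"IL7R": 3, "DKK2": 1},
    {"IL7R": 1, "EDNRB": 5},
]
print(pseudobulk_cpm(example_group))
```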
ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
This repository contains metadata and single-cell data used to generate figures in the manuscript entitled: "Post-infusion Treg-like CAR T cells identify patients resistant to CD19-CAR therapy". Included here: CSV files containing patient cohort metadata, summary statistics and quantitative PCR results; FCS files for flow and mass cytometry data; processed Seurat object for single-cell sequencing data. Raw single-cell sequencing data, cellranger alignment results, and metadata are available through the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo; GEO accession number: GSE168940). With questions, please reach out to Zinaida Good (zinaida@stanford.edu) or Crystal L. Mackall (cmackall@stanford.edu).
Files pertaining to data analyses performed and presented in the preprint, 'Porcine intestinal innate lymphoid cells and lymphocyte spatial context revealed through single-cell RNA sequencing' by Wiarda et al. 2022 are provided in this dataset. Single cell suspensions enriched for lymphocytes were obtained from ileum of two seven-week-old pigs and subjected to single-cell RNA sequencing (scRNA-seq). Peripheral blood mononuclear cells (PBMCs) were collected and processed for scRNA-seq in parallel. scRNA-seq was performed to provide transcriptomic profiles of lymphocytes in porcine ileum, with 31,983 cells annotated into 26 cell types. Deeper interrogation of data revealed previously undescribed cells in porcine intestine, including SELLhi γδ T cells, group 1 and group 3 innate lymphoid cells (ILCs), and four subsets of B cells. Single-cell transcriptomes in ileum were compared to those in porcine blood, and subsets of activated lymphocytes were detected in ileum but not periphery. Comparison to scRNA-seq human and murine ileum data revealed a general consensus of ileal lymphocytes across species. Lymphocyte spatial context in porcine ileum was conferred through differential tissue dissection prior to scRNA-seq. Antibody-secreting cells, B cells, follicular CD4 αβ T cells, and cycling T/ILCs were enriched in ileum with Peyer’s patches, while non-cycling γδ T, CD8 αβ T, and group 1 ILCs were enriched in ileum without Peyer’s patches. Data files included herein are .h5seurat files of the various cell subsets included in analyses of the manuscript. Files may be used to reconstruct different analyses and perform further data query. Scripts for original data analyses are found at https://github.com/USDA-FSEPRU/scRNAseq_Porcine_Ileum_PBMC. Raw data are available at GEO accession GSE196388. 
Data are available for online query at https://singlecell.broadinstitute.org/single_cell/study/SCP1921/intestinal-single-cell-atlas-reveals-novel-lymphocytes-in-pigs-with-similarities-to-human-cells.
Resources in this dataset:
Resource Title: Ileum_AllCells. File Name: Ileum_AllCells.tar. Resource Description: .h5seurat object of all the cells derived from ileum samples. Untar into .h5seurat file before use.
Resource Title: GutBlood_IntegratedILCs. File Name: GutBlood_IntegratedILCs.tar. Resource Description: .h5seurat object of ILCs derived from both ileum and PBMC samples. Untar into .h5seurat file before use.
Resource Title: Ileum_Bonly. File Name: Ileum_Bonly.tar. Resource Description: .h5seurat object of B cells and antibody-secreting cells derived from ileum samples. Untar into .h5seurat file before use.
Resource Title: Ileum_CD4Tonly. File Name: Ileum_CD4Tonly.tar. Resource Description: .h5seurat object of non-naive CD4 ab T cells derived from ileum samples. Untar into .h5seurat file before use.
Resource Title: Ileum_gdCD8Tonly. File Name: Ileum_gdCD8Tonly.tar. Resource Description: .h5seurat object of gd and CD8 ab T cells derived from ileum samples. Untar into .h5seurat file before use.
Resource Title: Ileum_ILConly. File Name: Ileum_ILConly.tar. Resource Description: .h5seurat object of innate lymphoid cells (ILCs) derived from ileum samples. Untar into .h5seurat file before use.
Resource Title: Ileum_MyeloidOnly. File Name: Ileum_MyeloidOnly.tar. Resource Description: .h5seurat object of myeloid lineage leukocytes derived from ileum samples. Untar into .h5seurat file before use.
Resource Title: Ileum_NonImmuneOnly. File Name: Ileum_NonImmuneOnly.tar. Resource Description: .h5seurat object of non-immune cells derived from ileum samples. Untar into .h5seurat file before use.
Resource Title: Ileum_TILConly. File Name: Ileum_TILConly.tar. Resource Description: .h5seurat object of all T cells and innate lymphoid cells (ILCs) derived from ileum samples. Untar into .h5seurat file before use.
Resource Title: PBMC_AllCells. File Name: PBMC_AllCells.tar. Resource Description: .h5seurat object of all cells derived from PBMC samples. Untar into .h5seurat file before use.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Gene-count tables for FACS sorted cells sequenced with Smart-Seq2 from 20 organs of 7 mice. Cells are grouped by tissue of origin. Includes data for 53,760 cells, 44,879 of which passed a QC cutoff of at least 500 genes and 50,000 reads. Cell annotations using the Cell Ontology [1] controlled vocabulary are in a separate csv. This differs from v1 by renaming "Brain_Neurons" --> "Brain_Non-microglia" to be consistent with the manuscript.
Update 2018-09-20: Updated annotations to latest manuscript version
Update 2018-02-16: Separated Diaphragm cells from Muscle cells, and Aorta cells from Heart cells.
Update 2018-02-20: Aorta and Heart erroneously contained Diaphragm and Muscle data, and have now been corrected.
Update 2018-03-09: Renamed tissues for nomenclature standards: "Colon" --> "Large_Intestine"; "Muscle" --> "Limb_Muscle"; "Mammary" --> "Mammary_Gland"; "Brain_Microglia" --> "Brain_Myeloid"; "Brain_Non-microglia" --> "Brain_Non-Myeloid"
Update 2018-03-22: Renamed subtissues:
- tissue: Heart, subtissue: ? --> tissue: Heart, subtissue: Unknown
- tissue: Skin, subtissue: NA --> tissue: Skin, subtissue: Telogen
Update 2018-03-23: Removed row numbers in first column of metadata_FACS.csv
Update 2018-03-27: Added tissue tSNEs and cluster ids
[1] http://purl.obolibrary.org/obo/cl.owl
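The QC cutoff stated in this record (at least 500 genes and 50,000 reads per cell) can be expressed as a simple filter. This is a minimal Python sketch; the cell identifiers and per-cell numbers are made up, and the cutoffs are assumed to be inclusive because the record says "at least".

```python
def passes_qc(n_genes, n_reads, min_genes=500, min_reads=50_000):
    """Return True if a cell meets both the gene and read cutoffs."""
    return n_genes >= min_genes and n_reads >= min_reads


# Hypothetical per-cell summaries: (detected genes, total reads).
example_cells = {
    "cell_A": (2300, 610_000),
    "cell_B": (480, 120_000),   # too few genes
    "cell_C": (900, 42_000),    # too few reads
    "cell_D": (500, 50_000),    # exactly at both cutoffs
}

kept = [cell for cell, (genes, reads) in example_cells.items()
        if passes_qc(genes, reads)]
print(kept)
```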
https://www.scilifelab.se/data/restricted-access/
Meta data record for processed VDJ data from the publication "Multi-modal single cell sequencing of B cells in primary Sjögren’s Syndrome". Abstract: Primary Sjögren’s syndrome (pSS) is an autoimmune disease characterized by lymphocytic infiltration in the salivary and lacrimal glands, B cell activation, SSA/SSB autoantibodies and an increased risk of B cell lymphoma. By generating sorted B cell single-cell gene expression and BCR libraries from 24 pSS patients stratified by SSA/SSB antibodies and four healthy controls, we defined 16 B cell subtypes. Interferon response genes were upregulated in pSS across all B cell subtypes, with the highest levels in pSS with both SSAB antibodies. The SSAB group showed a higher proportion of naïve B cells and lower proportion of memory B cells compared with controls. Memory B cells from SSAB patients were not class switched and expressed unmutated VDJ sequences. IGHV1-69 repertoire frequencies were higher in pSS patients than controls and 1287 clonotypes were unique for pSS. The present study describes molecular differences which may enable stratification of pSS patients at improved resolution.
Repository content:
10X Genomics 5' VDJ (v1.1) BCR data from B cells from Primary Sjögren's Syndrome (pSS) patients and healthy controls.
Output from cellranger (5.0.1) for all samples where targeted VDJ libraries were successfully created and sequenced (23/24 samples). Files to be used as input to, for instance, the immcantation workflow or the Bioconductor R package scRepertoire. Data is available upon reasonable request.
Processed data files included for each sample:
*_filtered_contig_annotations.csv
*_filtered_contig.fasta
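As a starting point for working with the `*_filtered_contig_annotations.csv` files, the sketch below counts productive contigs per chain using only the Python standard library. It assumes the files carry `chain` and `productive` columns as in standard cellranger vdj output, and it compares the `productive` value case-insensitively because the exact spelling of the boolean is an assumption here. The example table is made up and contains only the columns used.

```python
import csv
import io
from collections import Counter


def productive_chain_counts(handle):
    """Count productive contigs per chain in a contig-annotation CSV."""
    counts = Counter()
    for row in csv.DictReader(handle):
        if row["productive"].strip().lower() == "true":
            counts[row["chain"]] += 1
    return counts


# Tiny made-up table standing in for a real annotation file.
example = io.StringIO(
    "barcode,chain,productive\n"
    "AAAC-1,IGH,true\n"
    "AAAC-1,IGK,true\n"
    "AAAG-1,IGH,false\n"
    "AAAT-1,IGL,True\n"
)
counts = productive_chain_counts(example)
print(dict(counts))
```

For a real file, the same function could be called on `open("C001_B_filtered_contig_annotations.csv", newline="")`.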
Command used to generate the files:
cellranger vdj \
  --id="${sample}B" \
  --sample=${bcrsamples} \
  --fastqs=${fqdir} \
  --reference="$CELLRANGER_VDJ_DATA/refdata-cellranger-vdj-GRCh38-alts-ensembl-5.0.0" \
  --localcores=16 \
  --localmem=112
List of files:
C001_B_filtered_contig.fasta 3.1M
C001_B_filtered_contig_annotations.csv 1.0M
C002_B_filtered_contig.fasta 9.8M
C002_B_filtered_contig_annotations.csv 3.2M
C003_B_filtered_contig.fasta 9.8M
C003_B_filtered_contig_annotations.csv 3.2M
C004_B_filtered_contig.fasta 12M
C004_B_filtered_contig_annotations.csv 3.8M
P001_B_filtered_contig.fasta 7.4M
P001_B_filtered_contig_annotations.csv 2.5M
P002_B_filtered_contig.fasta 9.1M
P002_B_filtered_contig_annotations.csv 3.1M
P003_B_filtered_contig.fasta 3.7M
P003_B_filtered_contig_annotations.csv 1.2M
P004_B_filtered_contig.fasta 11M
P004_B_filtered_contig_annotations.csv 3.6M
P005_B_filtered_contig.fasta 10M
P005_B_filtered_contig_annotations.csv 3.4M
P006_B_filtered_contig.fasta 13M
P006_B_filtered_contig_annotations.csv 4.4M
P007a_B_filtered_contig.fasta 4.9M
P007a_B_filtered_contig_annotations.csv 1.6M
P007b_B_filtered_contig.fasta 8.0M
P007b_B_filtered_contig_annotations.csv 2.7M
P008a_B_filtered_contig.fasta 8.8M
P008a_B_filtered_contig_annotations.csv 2.8M
P008b_B_filtered_contig.fasta 12M
P008b_B_filtered_contig_annotations.csv 3.8M
P009_B_filtered_contig.fasta 7.5M
P009_B_filtered_contig_annotations.csv 2.5M
P010_B_filtered_contig.fasta 15M
P010_B_filtered_contig_annotations.csv 5.0M
P011_B_filtered_contig.fasta 9.7M
P011_B_filtered_contig_annotations.csv 3.3M
P012_B_filtered_contig.fasta 12M
P012_B_filtered_contig_annotations.csv 4.1M
P013_B_filtered_contig.fasta 9.1M
P013_B_filtered_contig_annotations.csv 3.0M
P014_B_filtered_contig.fasta 11M
P014_B_filtered_contig_annotations.csv 3.8M
P015_B_filtered_contig.fasta 11M
P015_B_filtered_contig_annotations.csv 3.7M
P016_B_filtered_contig.fasta 13M
P016_B_filtered_contig_annotations.csv 4.2M
P017_B_filtered_contig.fasta 7.2M
P017_B_filtered_contig_annotations.csv 2.4M
P018_B_filtered_contig.fasta 15M
P018_B_filtered_contig_annotations.csv 4.9M
P019_B_filtered_contig.fasta 20M
P019_B_filtered_contig_annotations.csv 6.5M
P020_B_filtered_contig.fasta 10M
P020_B_filtered_contig_annotations.csv 3.5M
P021_B_filtered_contig.fasta 7.8M
P021_B_filtered_contig_annotations.csv 2.6M
P022_B_filtered_contig.fasta 9.4M
P022_B_filtered_contig_annotations.csv 3.1M
P023_B_filtered_contig.fasta 7.2M
P023_B_filtered_contig_annotations.csv 2.4M
_CHECKSUMS.txt 3.8K
_README.txt 637B
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset accompanies the publication "FedscGen: Privacy-Aware Federated Batch Effect Correction of Single-Cell RNA Sequencing Data" and includes eight single-cell RNA sequencing (scRNA-seq) datasets used to benchmark the FedscGen and scGen methods. The datasets are provided in .h5ad format and include comprehensive metadata necessary for replication and further analysis.
We analyze various datasets to compare FedscGen against scGen (centralized) in terms of batch correction. For simplicity, we refer to the datasets by abbreviations:
Cell Line (CL):
Human Dendritic Cells (HDC):
Human Pancreas (HP):
Mouse Brain (MB):
Mouse Cell Atlas (MCA):
Mouse Hematopoietic Stem and Progenitor Cells (MHSPC):
Mouse Retina (MR):
PBMC (human Peripheral Blood Mononuclear Cell):
Usage Notes: Each dataset is provided in .h5ad format, compatible with common single-cell analysis tools such as Scanpy. Detailed metadata is included within each file.
Keywords: Single-cell RNA sequencing, scRNA-seq, Batch effect correction, Privacy-aware, Federated learning, scGen, FedscGen, Clinical multi-center studies, Genomics, Bioinformatics
Contact: For questions or further information, please contact Mohammad Bakhtiari at mohammad.bakhtiari@uni-hamburg.de.
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
There is a growing need for integration of “Big Data” into undergraduate biology curricula. Transcriptomics is one avenue to examine biology from an informatics perspective. RNA sequencing has largely replaced the use of microarrays for whole genome gene expression studies. Recently, single cell RNA sequencing (scRNAseq) has unmasked population heterogeneity, offering unprecedented views into the inner workings of individual cells. scRNAseq is transforming our understanding of development, cellular identity, cell function, and disease. As a ‘Big Data’ approach, scRNAseq can be intimidating for students to conceptualize and analyze, yet it plays an increasingly important role in modern biology. To address these challenges, we created an engaging case study that guides students through an exploration of scRNAseq technologies. Students work in groups to explore external resources, manipulate authentic data and experience how single cell RNA transcriptomics can be used for personalized cancer treatment. This five-part case study is intended for upper-level life science majors and graduate students in genetics, bioinformatics, molecular biology, cell biology, biochemistry, biology, and medical genomics courses. The case modules can be completed sequentially, or individual parts can be separately adapted. The first module can also be used as a stand-alone exercise in an introductory biology course. Students need an intermediate mastery of Microsoft Excel but do not need programming skills. Assessment includes both students’ self-assessment of their learning, as answers to previous questions are used to progress through the case study, and instructor assessment of final answers. This case provides a practical exercise in the use of high-throughput data analysis to explore the molecular basis of cancer at the level of single cells.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
"*.csv" files contain the single cell gene expression values (log2(tpm+1)) for all genes in each cell from melanoma and squamous cell carcinoma of head and neck (HNSCC) tumors. The cell type and origin of tumor for each cell is also included in "*.csv" files. The "MalignantCellSubtypes.xlsx" defines the tumor subtype. "CCLE_RNAseq_rsem_genes_tpm_20180929.zip" is downloaded from CCLE database.