Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data normalization is vital to single-cell sequencing, addressing limitations presented by low input material and various forms of bias or noise present in the sequencing process. Several such normalization methods exist, some of which rely on spike-in genes, molecules added in known quantities to serve as a basis for a normalization model. Depending on the available information and the type of data, some methods may offer advantages over others. We compare the effectiveness of seven available normalization methods designed specifically for single-cell sequencing using two real data sets containing spike-in genes and one simulation study. Additionally, we test the methods that do not depend on spike-in genes using a real data set with three distinct cell-cycle states and a real data set from the 10X Genomics GemCode platform with multiple cell types represented. We demonstrate the differences in effectiveness among the featured methods using visualization and classification assessment and conclude which methods are preferable for normalizing a given type of data for downstream analysis, such as classification or differential analysis. Computational time is also compared for all methods.
Normalization of RNA-sequencing data is essential for accurate downstream inference, but the assumptions upon which most methods are based do not hold in the single-cell setting. Consequently, applying existing normalization methods to single-cell RNA-seq data introduces artifacts that bias downstream analyses. To address this, we introduce SCnorm for accurate and efficient normalization of scRNA-seq data. A total of 183 single cells (92 H1 cells, 91 H9 cells), sequenced twice, were used to evaluate SCnorm in normalizing single-cell RNA-seq experiments. A total of 48 bulk H1 samples were used to compare bulk and single-cell properties. For single-cell RNA-seq, the identical single-cell indexed and fragmented cDNA were pooled at 96 cells per lane or at 24 cells per lane to test the effects of sequencing depth, resulting in approximately 1 million and 4 million mapped reads per cell in the two pooling groups, respectively.
This dataset contains files reconstructing single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) in order to provide a reference annotation of porcine immune cell transcriptomics at enhanced, single-cell resolution. Analysis of single-cell data allowed identification of 36 cell clusters that were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. Files may be used to reconstruct the data as presented in the manuscript, allowing for individual query by other users. Scripts for original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464).
Resources in this dataset:
- Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format. File Name: PBMC7_AllCells.zip. Resource Description: Zipped folder containing PBMC counts matrix, gene names, and cell IDs. Files are as follows: matrix of gene counts* (matrix.mtx.gx); gene names (features.tsv.gz); cell IDs (barcodes.tsv.gz). *The ‘raw’ count matrix is actually gene counts obtained following ambient RNA removal. During ambient RNA removal, we specified to calculate non-integer count estimations, so most gene counts are actually non-integer values in this matrix but should still be treated as raw/unnormalized data that requires further normalization/transformation. Data can be read into R using the function Read10X().
- Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata. File Name: PBMC7_AllCells_meta.csv. Resource Description: .csv file containing metadata for cells included in the final dataset. Metadata columns include: nCount_RNA = the number of transcripts detected in a cell; nFeature_RNA = the number of genes detected in a cell; Loupe = cell barcodes, which correspond to the cell IDs found in the .h5Seurat and 10X formatted objects for all cells; prcntMito = percent mitochondrial reads in a cell; Scrublet = doublet probability score assigned to a cell; seurat_clusters = cluster ID assigned to a cell; PaperIDs = sample ID for a cell; celltypes = cell type ID assigned to a cell.
- Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates. File Name: PBMC7_AllCells_PCAcoord.csv. Resource Description: .csv file containing first 100 PCA coordinates for cells.
- Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates. File Name: PBMC7_AllCells_tSNEcoord.csv. Resource Description: .csv file containing t-SNE coordinates for all cells.
- Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates. File Name: PBMC7_AllCells_UMAPcoord.csv. Resource Description: .csv file containing UMAP coordinates for all cells.
- Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates. File Name: PBMC7_CD4only_tSNEcoord.csv. Resource Description: .csv file containing t-SNE coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.
- Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells UMAP Coordinates. File Name: PBMC7_CD4only_UMAPcoord.csv. Resource Description: .csv file containing UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.
- Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells UMAP Coordinates. File Name: PBMC7_GDonly_UMAPcoord.csv. Resource Description: .csv file containing UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.
- Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates. File Name: PBMC7_GDonly_tSNEcoord.csv. Resource Description: .csv file containing t-SNE coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.
- Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information. File Name: UnfilteredGeneInfo.txt. Resource Description: .txt file containing gene nomenclature information used to assign gene names in the dataset. 'Name' column corresponds to the name assigned to a feature in the dataset.
- Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat. File Name: PBMC7.tar. Resource Description: .h5Seurat object of all cells in PBMC dataset. File needs to be untarred, then read into R using function LoadH5Seurat().
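For readers working outside R, the 10X-format folder described above can also be parsed with a few lines of standard-library Python. The following is a minimal sketch, not the authors' workflow: the function names (`read_tsv`, `read_mtx`, `read_10x_folder`) are hypothetical, the matrix file name is passed in as a parameter because it should be checked against the unzipped archive, and the code assumes the usual MatrixMarket coordinate layout (genes as rows, cells as columns, 1-based indices). Values are parsed as floats because the counts are non-integer after ambient RNA removal.

```python
import gzip
from pathlib import Path


def read_tsv(path):
    """Return each non-empty line of a gzipped TSV file as a list of fields."""
    with gzip.open(path, "rt") as handle:
        return [line.rstrip("\n").split("\t") for line in handle if line.strip()]


def read_mtx(path):
    """Parse a gzipped MatrixMarket coordinate file into ((rows, cols), {(row, col): value})."""
    entries = {}
    with gzip.open(path, "rt") as handle:
        # Drop blank lines plus the banner and comment lines, which start with '%'.
        body = (line for line in handle if line.strip() and not line.startswith("%"))
        # The first remaining line holds the dimensions and the number of entries.
        n_rows, n_cols, _n_entries = (int(x) for x in next(body).split())
        for line in body:
            row, col, value = line.split()
            # Indices are 1-based in the file; counts may be non-integer.
            entries[(int(row) - 1, int(col) - 1)] = float(value)
    return (n_rows, n_cols), entries


def read_10x_folder(folder, matrix_name="matrix.mtx.gz"):
    """Load matrix, features, and barcodes from one 10X-style folder (hypothetical helper)."""
    folder = Path(folder)
    shape, counts = read_mtx(folder / matrix_name)
    features = read_tsv(folder / "features.tsv.gz")
    barcodes = [row[0] for row in read_tsv(folder / "barcodes.tsv.gz")]
    if shape != (len(features), len(barcodes)):
        raise ValueError("matrix dimensions do not match features/barcodes")
    return {"shape": shape, "features": features, "barcodes": barcodes, "counts": counts}
```

The sparse dictionary keeps the sketch dependency-free; for a dataset of this size a sparse-matrix library would be the more practical choice.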
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Single-cell RNA-sequencing dataset of peripheral blood mononuclear cells (PBMCs: T, B, NK cells and monocytes) extracted from two healthy donors.
Cells labeled as C26 come from a 30-year-old female and cells labeled as C27 come from a 53-year-old male. Cells were isolated from blood using Ficoll. Samples were sequenced using standard 3' v3 chemistry protocols by 10x Genomics. Cellranger v4.0.0 was used for the processing, and reads were aligned to the Ensembl GRCg38 human genome (GRCg38_r98-ensembl_Sept2019). QC metrics were calculated on the count matrix generated by cellranger (filtered_feature_bc_matrix). Cells with fewer than 3 genes per cell, fewer than 500 reads per cell, or more than 20% mitochondrial genes were discarded.
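The QC thresholds above can be illustrated with a small standard-library sketch. It is not the authors' code: the function names are hypothetical, the per-cell input is assumed to be a plain gene-to-count dictionary, the mitochondrial threshold is interpreted as the percentage of a cell's counts coming from mitochondrial genes, and those genes are assumed to be recognizable by the "MT-" prefix that is conventional for human gene symbols.

```python
def qc_metrics(counts, mito_prefix="MT-"):
    """Compute per-cell QC metrics from a {gene_symbol: count} dictionary."""
    total = sum(counts.values())
    n_genes = sum(1 for c in counts.values() if c > 0)
    # Assumption: mitochondrial genes are identified by their symbol prefix.
    mito = sum(c for gene, c in counts.items() if gene.startswith(mito_prefix))
    pct_mito = 100.0 * mito / total if total else 0.0
    return {"n_reads": total, "n_genes": n_genes, "pct_mito": pct_mito}


def passes_qc(metrics, min_genes=3, min_reads=500, max_pct_mito=20.0):
    """Keep a cell unless it falls below either minimum or exceeds the mitochondrial cap."""
    return (
        metrics["n_genes"] >= min_genes
        and metrics["n_reads"] >= min_reads
        and metrics["pct_mito"] <= max_pct_mito
    )


# Example: a cell with 600 counts over 3 genes, about 8.3% of them mitochondrial, is kept.
example_cell = {"MT-CO1": 50, "ACTB": 450, "CD3E": 100}
print(passes_qc(qc_metrics(example_cell)))
```

A cell is discarded as soon as any one of the three criteria fails, which is how the thresholds are read here.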
The processing steps were performed with the R package Seurat (https://satijalab.org/seurat/), including sample integration, data normalisation and scaling, dimensional reduction, and clustering. The SCTransform method was adopted for the normalisation and scaling steps. The clustered cells were manually annotated using known cell type markers.
Files content:
- raw_dataset.csv: raw gene counts
- normalized_dataset.csv: normalized gene counts (single cell matrix)
- cell_types.csv: cell types identified from annotated cell clusters
- cell_types_macro.csv: cell macro types
- UMAP_coordinates.csv: 2d cell coordinates computed with UMAP algorithm in Seurat
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Enriched GO terms for library size normalization. This file is in a tab-separated format and contains the top 200 GO terms that were enriched in the set of DE genes unique to library size normalization. The fields are the same as described for Additional file 2. (13 KB PDF)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data normalization is a crucial step in gene expression analysis, as it ensures the validity of downstream analyses. Although many metrics have been designed to evaluate the existing normalization methods, different metrics, or the same metric applied to different datasets, yield inconsistent results, particularly for single-cell RNA sequencing (scRNA-seq) data. In the worst case, a method evaluated as the best by one metric is evaluated as the poorest by another metric, or a method evaluated as the best using one dataset is evaluated as the poorest using another dataset. This raises an open question: principles need to be established to guide the evaluation of normalization methods. In this study, we propose the principle that a normalization method evaluated as the best by one metric should also be evaluated as the best by another metric (the consistency of metrics) and that a method evaluated as the best using scRNA-seq data should also be evaluated as the best using bulk RNA-seq data or microarray data (the consistency of datasets). We then designed a new metric named Area Under normalized CV threshold Curve (AUCVC) and applied it together with another metric, mSCC, to evaluate 14 commonly used normalization methods using both scRNA-seq data and bulk RNA-seq data, satisfying the consistency of metrics and the consistency of datasets. Our findings pave the way to guide future studies in the normalization of gene expression data and its evaluation. The raw gene expression data, normalization methods, and evaluation metrics used in this study have been included in an R package named NormExpression. NormExpression provides a framework and a fast and simple way for researchers to select the best method for the normalization of their gene expression data based on the evaluation of different methods (particularly some data-driven methods or their own methods) under the principle of the consistency of metrics and the consistency of datasets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Single-cell RNA-seq dataset from sorted IL-10+, TCRb+, CD4+ mouse small intestine lamina propria cells in naive or Giardia intestinalis-infected animals at 7 d.p.i. Data analyses and results are described in Sardinha-Silva et al., Nature Microbiology, 2025: "Giardia intestinalis-induced Type 2 mucosal immunity attenuates bystander intestinal inflammation". Data are Seurat objects in RDS format. Potential doublets, low-quality cells and dying cells were filtered out (excluded cells with <800 genes detected, cells with >5000 genes detected and cells with mitochondrial gene expression > 10%). Data normalization, scaling and integration were performed using Seurat v 4.4.0.
Full filtered dataset in the "alineGiardia.combined_v4.rds" file. Related R code is found in "giardia_mouse_integration.R".
T cells of interest only in the "T.seurat.rds" file. Related R code is in "TcellSubsets_sc_analysis.R".
Dataset was also mapped to a reference dataset by Kiner et al., Nature Immunology, 2021. The post-mapping data is found in the "refmap_kiner.seurat.rds" file. Related R code is in "referenceMapping_Kineretal2021.R".
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Enriched GO terms for deconvolution. This file is in a tab-separated format and contains the top 200 GO terms that were enriched in the set of DE genes unique to deconvolution. The identifier and name of each term is shown along with the total number of genes associated with the term, the number of associated genes that are also DE, the expected number under the null hypothesis, and the Fisher p value. (13 KB PDF)
(1) qPCR Gene Expression Data The THP-1 cell line was sub-cloned and one clone (#5) was selected for its ability to differentiate relatively homogeneously in response to phorbol 12-myristate-13-acetate (PMA) (Sigma). THP-1.5 was used for all subsequent experiments. THP-1.5 cells were cultured in RPMI, 10% FBS, Penicillin/Streptomycin, 10mM HEPES, 1mM Sodium Pyruvate, 50uM 2-Mercaptoethanol. THP-1.5 were treated with 30ng/ml PMA over a time-course of 96h. Total cell lysates were harvested in TRIzol reagent at 1, 2, 4, 6, 12, 24, 48, 72, 96 hours, including an undifferentiated control. Undifferentiated cells were harvested in TRIzol reagent at the beginning of the LPS time-course. One biological replicate was prepared for each time point. Total RNA was purified from TRIzol lysates according to manufacturer’s instructions. Gene-specific primer pairs were designed using Primer3 software, with an optimal primer size of 20 bases, amplification size of 140bp, and annealing temperature of 60°C. Primer sequences were designed for 2,396 candidate genes including four potential controls: GAPDH, beta actin (ACTB), beta-2-microglobulin (B2M), phosphoglycerate kinase 1 (PGK1). The RNA samples were reverse transcribed to produce cDNA and then subjected to quantitative PCR using SYBR Green (Molecular Probes) using the ABI Prism 7900HT system (Applied Biosystems, Foster City, CA, USA) with a 384-well amplification plate; genes for each sample were assayed in triplicate. Reactions were carried out in 20μL volumes in 384-well plates; each reaction contained: 0.5 U of HotStar Taq DNA polymerase (Qiagen) and the manufacturer’s 1× amplification buffer adjusted to a final concentration of 1mM MgCl2, 160μM dNTPs, 1/38000 SYBR Green I (Molecular Probes), 7% DMSO, 0.4% ROX Reference Dye (Invitrogen), 300 nM of each primer (forward and reverse), and 2μL of 40-fold diluted first-strand cDNA synthesis reaction mixture (12.5ng total RNA equivalent). 
Polymerase activation at 95ºC for 15 min was followed by 40 cycles of 15 s at 94ºC, 30 s at 60ºC, and 30 s at 72ºC. The dissociation curve analysis, which evaluates each PCR product to be amplified from single cDNA, was carried out in accordance with the manufacturer’s protocol. Expression levels were reported as Ct values. The large number of genes assayed and the replicate measures required that samples be distributed across multiple amplification plates, with an average of twelve plates per sample. Because it was envisioned that GAPDH would serve as a single-gene normalization control, this gene was included on each plate. All primer pairs were replicated in triplicate. Raw qPCR expression measures were quantified using Applied Biosystems SDS software and reported as Ct values. The Ct value represents the number of cycles or rounds of amplification required for the fluorescence of a gene or primer pair to surpass an arbitrary threshold. The magnitude of the Ct value is inversely related to the expression level, so that a gene expressed at a high level will have a low Ct value and vice versa. Replicate Ct values were combined by averaging, with additional quality control constraints imposed by a standard filtering method developed by the RIKEN group for the preprocessing of their qPCR data. Briefly, this method entails: 1. Sort the triplicate Ct values in ascending order: Ct1, Ct2, Ct3. Calculate differences between consecutive Ct values: difference1 = Ct2 – Ct1 and difference2 = Ct3 – Ct2. 2. Four regions are defined (where Region4 overrides the other regions): Region1: difference ≦ 0.2, Region2: 0.2 < difference ≦ 1.0, Region3: 1.0 < difference, Region4: one of the Ct values in the difference calculation is 40. If difference1 and difference2 fall in the same region, then the three replicate Ct values are averaged to give a final representative measure. 
If difference1 and difference2 are in different regions, then the two replicate Ct values that are in the small number region are averaged instead. This particular filtering method is specific to the data set we used here and does not represent a part of the normalization procedure itself; alternate methods of filtering can be applied if appropriate prior to normalization. Moreover, while the presentation in this manuscript has used Ct values as an example, any measure of transcript abundance, including those corrected for primer efficiency, can be used as input to our data-driven methods. (2) Quantile Normalization Algorithm Quantile normalization proceeds in two stages. First, if samples are distributed across multiple plates, normalization is applied to all of the genes assayed for each sample to remove plate-to-plate effects by enforcing the same quantile distribution on each plate. Then, an overall quantile normalization is applied between samples, assuring that each sample has the same distribution of expression values as all of the other samples to be compared. A similar approach using quantile normalization has been previously described in the context of microarray normalization. Briefly, our method entails the following steps: i) qPCR data from a single RNA sample are stored in a matrix M of dimension k (maximum number of genes or primer pairs on a plate) rows by p (number of plates) columns. Plates with differing numbers of genes are made equivalent by padding plates with missing values to constrain M to a rectangular structure. ii) Each column is sorted into ascending order and stored in matrix M’. The sorted columns correspond to the quantile distribution of each plate. The missing values are placed at the end of each ordered column. All calculations in quantile normalization are performed on non-missing values. iii) The average quantile distribution is calculated by taking the average of each row in M’. 
Each column in M’ is replaced by this average quantile distribution and rearranged to have the same ordering as the original row order in M. This gives the within-sample normalized data from one RNA sample. iv) Steps analogous to 1 – 3 are repeated for each sample. Between-sample normalization is performed by storing the within-normalized data as a new matrix N of dimension k (total number of genes, in our example k = 2,396) rows by n (number of samples) columns. Steps 2 and 3 are then applied to this matrix. (3) Rank-Invariant Set Normalization Algorithm We describe an extension of this method for use on qPCR data with any number of experimental conditions or samples in which we identify a set of stably-expressed genes from within the measured expression data and then use these to adjust expression between samples. Briefly, i) qPCR data from all samples are stored in matrix R of dimension g (total number of genes or primer pairs used for all plates) rows by s (total number of samples). ii) We first select gene sets that are rank-invariant across a single sample compared to a common reference. The reference may be chosen in a variety of ways, depending on the experimental design and aims of the experiment. As described in Tseng et al., the reference may be designated as a particular sample from the experiment (e.g. time zero in a time course experiment), the average or median of all samples, or selecting the sample which is closest to the average or median of all samples. Genes are considered to be rank-invariant if they retain their ordering or rank with respect to expression across the experimental sample versus the common reference sample. We collect sets of rank-invariant genes for all of the s pairwise comparisons, relative to a common reference. We take the intersection of all s sets to obtain the final set of rank-invariant genes that is used for normalization. iii) Let αj represent the average expression value of the rank-invariant genes in sample j. 
(α1, …, αs) then represents the vector of rank-invariant average expression values for all conditions 1 to s. iv) We calculate the scale f
The THP-1 cell line was sub-cloned and one clone (#5) was selected for its ability to differentiate relatively homogeneously in response to phorbol 12-myristate-13-acetate (PMA) (Sigma). THP-1.5 was used for all subsequent experiments. THP-1.5 cells were cultured in RPMI, 10% FBS, Penicillin/Streptomycin, 10mM HEPES, 1mM Sodium Pyruvate, 50uM 2-Mercaptoethanol. THP-1.5 were treated with 30ng/ml PMA over a time-course of 96h. Total cell lysates were harvested in TRIzol reagent at 1, 2, 4, 6, 12, 24, 48, 72, 96 hours, including an undifferentiated control. Total RNA was purified from TRIzol lysates according to manufacturer’s instructions. The RNA samples were reverse transcribed to produce cDNA and then subjected to quantitative PCR using SYBR Green (Molecular Probes) using the ABI Prism 7900HT system (Applied Biosystems, Foster City, CA, USA) with a 384-well amplification plate; genes for each sample were assayed in triplicate.
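The quantile normalization steps in section (2) of the description can be sketched in plain Python. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the function name `quantile_normalize` is hypothetical, the matrix is passed as a list of columns (one per plate or per sample), missing values from padding are represented as `None`, and tied values are simply assigned ranks in sort order.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length columns; None marks a missing value."""
    # Step ii: for each column, order the row indices of non-missing values by value.
    # Missing values are left out, which is equivalent to placing them at the end.
    ranked = [
        sorted((i for i, v in enumerate(col) if v is not None), key=lambda i: col[i])
        for col in columns
    ]

    # Step iii: average each row of the sorted matrix over non-missing values only.
    depth = max((len(order) for order in ranked), default=0)
    mean_quantiles = []
    for r in range(depth):
        values = [col[order[r]] for col, order in zip(columns, ranked) if r < len(order)]
        mean_quantiles.append(sum(values) / len(values))

    # Replace every value by the average quantile of its rank, in the original row order.
    normalized = []
    for col, order in zip(columns, ranked):
        new_col = [None] * len(col)
        for r, i in enumerate(order):
            new_col[i] = mean_quantiles[r]
        normalized.append(new_col)
    return normalized


# Two plates with three genes each: both end up sharing the quantiles 1.5, 3.5, 5.5.
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```

Under the two-stage scheme described above, the same function would be applied first to the plates of one sample and then to a genes-by-samples matrix assembled from the within-sample results.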
We performed CODEX (co-detection by indexing) multiplexed imaging on 64 sections of the human intestine (~16 mm2) from 8 donors (B004, B005, B006, B008, B009, B010, B011, and B012) using a panel of 57 oligonucleotide-barcoded antibodies. Subsequently, images underwent standard CODEX image processing (tile stitching, drift compensation, cycle concatenation, background subtraction, deconvolution, and determination of best focal plane), single cell segmentation, and column marker z-normalization by tissue. The outputs of this process were data frames of 2.6 million cells with 57 antibody fluorescence values quantified from each marker. Each cell has its cell type, cellular neighborhood, community of neighborhoods, and tissue unit defined with x, y coordinates representing pixel location in the original image. This is from a total of 25 cell types, 20 multicellular neighborhoods, 10 communities of neighborhoods, and 3 tissue segments that could be used to understand the cellular interactio...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Single-cell RNA-seq dataset from sorted CD11bInt, F4/80Hi, CD64+ mouse liver cells in naive or Leishmania infantum-infected animals at 42 d.p.i. Data analyses and results are described in the manuscript "Kupffer cell and recruited macrophage heterogeneity orchestrate granuloma maturation and hepatic immunity in visceral leishmaniasis". Data files are Seurat objects in RDS format. Potential doublets, low-quality cells and dying cells were filtered out (excluded cells with <1000 genes detected, cells with >6000 genes detected, cells with mitochondrial gene expression > 10% and cells with <5000 transcript molecules). Data normalization, scaling and integration were performed using Seurat.
Filtered dataset containing all KCs and macrophages is in the "pessenda_KC_Macro_seurat" file.
Our data was then mapped to a reference dataset by Remmerie et al. (DOI: 10.1016/j.immuni.2020.08.004) for annotation consistent with the literature. The reference mapped object can be found in the "pessenda_refmap_KC_Macro_seurat" file.
Background & Aims: Pancreatic ductal adenocarcinomas (PDAC) are characterized by fibrosis and an abundance of cancer-associated fibroblasts (CAFs). We investigated strategies to disrupt interactions among CAFs, the immune system, and cancer cells, focusing on adhesion molecule cadherin 11 (CDH11), which has been associated with other fibrotic disorders and is expressed by activated fibroblasts. Methods: We compared levels of CDH11 mRNA in human pancreatitis and pancreatic cancer tissues and cells with those in normal pancreas, and measured levels of CDH11 protein in human and mouse pancreatic lesions and normal tissues. We crossed p48-Cre;LSL-KrasG12D/+;LSL-Trp53R172H/+ (KPC) mice with CDH11-knockout mice and measured survival times of offspring. Pancreata were collected and analyzed by histology, immunohistochemistry, and (single-cell) RNA sequencing; RNA and proteins were identified by imaging mass cytometry. Some mice were given injections of PD1 antibody or gemcitabine and survival was monitored. Pancreatic cancer cells from KPC mice were subcutaneously injected into Cdh11+/+ and Cdh11–/– mice and tumor growth was monitored. Pancreatic cancer cells (mT3) from KPC mice (C57BL/6) were subcutaneously injected into Cdh11+/+ (C57BL/6J) mice, the mice were given injections of antibody against CDH11, gemcitabine, or small molecule inhibitor of CDH11 (SD133), and tumor growth was monitored. Results: Levels of CDH11 mRNA and protein were significantly higher in CAFs than in pancreatic cancer epithelial cells, human or mouse pancreatic cancer cell lines, or immune cells. KPC/Cdh11+/– and KPC/Cdh11–/– mice survived significantly longer than KPC/Cdh11+/+ mice. Markers of stromal activation entirely surrounded pancreatic intraepithelial neoplasias in KPC/Cdh11+/+ mice and incompletely in KPC/Cdh11+/– and KPC/Cdh11–/– mice, whose lesions also contained fewer FOXP3+ cells in the tumor center. 
Compared with pancreatic tumors in KPC/Cdh11+/+ mice, tumors of KPC/Cdh11+/– mice had increased markers of antigen processing and presentation; more lymphocytes and associated cytokines; decreased extracellular matrix components; and reductions in markers and cytokines associated with immunosuppression. Administration of the PD1 antibody did not prolong survival of KPC mice with 0, 1, or 2 alleles of Cdh11. Gemcitabine extended survival only of KPC/Cdh11+/– and KPC/Cdh11–/– mice or reduced subcutaneous tumor growth in mT3 engrafted Cdh11+/+ mice given in combination with the CDH11 antibody. A small molecule inhibitor of CDH11 reduced growth of pre-established mT3 subcutaneous tumors only if T and B cells were present in mice. Conclusions: Knockout or inhibition of CDH11, which is expressed by CAFs in the pancreatic tumor stroma, reduces growth of pancreatic tumors, increases their response to gemcitabine, and significantly extends survival of mice. CDH11 promotes immunosuppression and extracellular matrix deposition, and might be developed as a therapeutic target for pancreatic cancer.
The mT3 tumor was generated by injecting 25,000 mT3 cells (derived from a PDAC of a KPC C57BL/6 mouse) subcutaneously into the back flank of 10-week-old female C57BL/6 mice in a 1:1 suspension of Matrigel (Cat# 354234, Corning) and PBS. At 3 weeks post injection, the tumor was dissected and processed as described before to obtain single cell suspensions. Subsequently, immune cells and blood cells were removed by CD45+ magnetic bead-based depletion (Cat# 130-052-301, Miltenyi Biotech) and ACK lysis buffer (Cat# A1049201, Gibco), respectively, following manufacturer’s guidelines. Remaining cells were prepared for single cell sequencing using Chromium Single Cell 3ʹ GEM, Library & Gel Bead Kit v3 (Cat# 1000075, 10X Genomics) on a 10X Genomics Chromium Controller following the manufacturer’s protocol and sequenced using an Illumina NextSeq 500 sequencer. 
The Cell Ranger Single-Cell Software Suite (10X Genomics) was used to perform sample demultiplexing, barcode processing, and single-cell 3′ gene counting. Sequencing data was aligned to the mouse reference genome (mm10) using “cellranger mkfastq” with default parameters. Unique molecular identifier (UMI) counts were generated using “cellranger count”. Further analysis was performed in R using the Seurat package. Briefly, cells with fewer than 500 detected genes per cell and genes that were expressed by fewer than 5 cells were filtered out. Subsequently, cells with >7800 genes were filtered out to remove noise from droplets containing more than one cell. Dead cells were excluded by retaining cells with <5% mitochondrial reads. The data was subsequently normalized by employing a global-scaling normalization method “LogNormalize” followed by identification of 2,000 most variable genes in the dataset, data scaling and subsequently dimensionality reduction by principal component analysis (PCA) using the 2000 variable genes. Then, a gra...
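The cell filtering and "LogNormalize" step described above can be sketched as follows. This is a hedged illustration in Python, not the authors' R/Seurat code: the function names are hypothetical, and the scale factor of 10,000 is Seurat's documented default, which the description does not state explicitly. LogNormalize divides each gene's count by the cell's total count, multiplies by the scale factor, and applies a natural-log transform of one plus that value.

```python
import math


def keep_cell(n_genes, pct_mito, min_genes=500, max_genes=7800, max_pct_mito=5.0):
    """Apply the cell-level thresholds quoted in the description."""
    # Cells with fewer than 500 or more than 7800 detected genes are removed,
    # and only cells with less than 5% mitochondrial reads are retained.
    return min_genes <= n_genes <= max_genes and pct_mito < max_pct_mito


def log_normalize(counts, scale_factor=10000.0):
    """Global-scaling normalization of one cell's {gene: UMI count} dictionary."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    # log1p(x) computes ln(1 + x), matching the LogNormalize definition.
    return {gene: math.log1p(c / total * scale_factor) for gene, c in counts.items()}


# Example: a cell with two genes holding 1 and 3 UMIs is scaled to 2500 and 7500
# counts per 10,000 before the log transform.
print(log_normalize({"GeneA": 1, "GeneB": 3}))
```

The gene-level filter mentioned in the description (genes expressed in fewer than 5 cells) would be applied across the whole matrix before this per-cell step and is omitted here for brevity.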
NGS-Based RNA-Seq Market Size 2024-2028
The NGS-based RNA-seq market size is forecast to increase by USD 6.66 billion, at a CAGR of 20.52% between 2023 and 2028.
The market is witnessing significant growth, driven by the increased adoption of next-generation sequencing (NGS) methods for RNA-Seq analysis. The advanced capabilities of NGS techniques, such as high-throughput, cost-effectiveness, and improved accuracy, have made them the preferred choice for researchers and clinicians in various fields, including genomics, transcriptomics, and personalized medicine. However, the market faces challenges, primarily from the lack of clinical validation on direct-to-consumer genetic tests. As the use of NGS technology in consumer applications expands, ensuring the accuracy and reliability of results becomes crucial.
The absence of standardized protocols and regulatory oversight in this area poses a significant challenge to market growth and trust. Companies seeking to capitalize on market opportunities must focus on addressing these challenges through collaborations, partnerships, and investments in research and development to ensure the clinical validity and reliability of their NGS-based RNA-Seq offerings.
What will be the Size of the NGS-based RNA-Seq market during the forecast period?
The market continues to evolve, driven by advancements in NGS technology and its applications across various sectors. Spatial transcriptomics, a novel approach to studying gene expression in its spatial context, is gaining traction in disease research and precision medicine. Splice junction detection, a critical component of RNA-seq data analysis, enhances the accuracy of gene expression profiling and differential gene expression studies. Cloud computing plays a pivotal role in handling the massive amounts of data generated by NGS platforms, enabling real-time data analysis and storage. Enrichment analysis, gene ontology, and pathway analysis facilitate the interpretation of RNA-seq data, while data normalization and quality control ensure the reliability of results.
Precision medicine and personalized therapy are key applications of RNA-seq, with single-cell RNA-seq offering unprecedented insights into the complexities of gene expression at the single-cell level. Read alignment and variant calling are essential steps in RNA-seq data analysis, while bioinformatics pipelines and RNA-seq software streamline the process. NGS technology is revolutionizing drug discovery by enabling the identification of biomarkers and gene fusion detection in various diseases, including cancer and neurological disorders. RNA-seq is also finding applications in infectious diseases, microbiome analysis, environmental monitoring, agricultural genomics, and forensic science. Sequencing costs are decreasing, making RNA-seq more accessible to researchers and clinicians.
The ongoing development of sequencing platforms, library preparation, and sample preparation kits continues to drive innovation in the field. The dynamic nature of the market ensures that it remains a vibrant and evolving field, with ongoing research and development in areas such as data visualization, clinical trials, and sequencing depth.
How is this NGS-based RNA-Seq industry segmented?
The NGS-based RNA-seq industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.
End-user
  Academic and research centers
  Clinical research
  Pharma companies
  Hospitals
Technology
  Sequencing by synthesis
  Ion semiconductor sequencing
  Single-molecule real-time sequencing
  Others
Geography
  North America
    US
  Europe
    Germany
    UK
  APAC
    China
    Singapore
  Rest of World (ROW)
By End-user Insights
The academic and research centers segment is estimated to witness significant growth during the forecast period.
The global next-generation sequencing (NGS) market for RNA sequencing (RNA-Seq) is primarily driven by academic and research institutions, including those from universities, research institutes, government entities, biotechnology organizations, and pharmaceutical companies. These institutions utilize NGS technology for various research applications, such as whole-genome sequencing, epigenetics, and emerging fields like agrigenomics and animal research, to enhance crop yield and nutritional composition. NGS-based RNA-Seq plays a pivotal role in translational research, with significant investments from both private and public organizations fueling its growth. The technology is instrumental in disease research, enabling the identification
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These R scripts are used for scRNA-seq data integration and pseudo-time analysis.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The study aims to investigate how the cellular origin of lung adenocarcinoma (LUAD), specifically whether it arises from alveolar type I (AT1) or alveolar type II (AT2) cells, influences the tumor immune microenvironment (TIME), immune cell composition, and metastatic potential. The hypothesis is that AT1- and AT2-derived LUADs exhibit distinct immune landscapes and functional pathways, impacting tumor progression and therapeutic response.
Data Description:
Supplemental File 1: Myeloid_Annotated.RDS (and .zip). Description: Annotated single-nucleus RNA sequencing (snRNA-seq) data focused on myeloid cells from AT1- and AT2-derived LUAD samples.
Supplemental File 2: R code for snRNA-seq analyses (R file and .zip). Description: R scripts for preprocessing, clustering, and differential expression analysis of snRNA-seq data.
Supplemental File 3: Trajectory analysis (ipynb and .zip). Description: Jupyter notebooks for trajectory inference to trace cell differentiation paths and lineage relationships.
Supplemental Files 4-5: CCCObj_in_AT1LUAD.RDS/.zip and CCCObj_in_AT2LUAD.RDS/.zip. Description: Cell-cell communication (CCC) analysis objects for AT1- and AT2-derived LUAD, respectively.
Supplemental File 6: CCC_analysis via LIANA.Rmd and .zip. Description: LIANA analysis scripts for cell-cell communication using snRNA-seq data.
Supplemental File 7: STSeq_LUAD_xzcompressed.Rds. Description: Spatial transcriptomics (ST) data for LUAD samples, capturing gene expression with spatial context.
Supplemental File 8: TIME Visium Analysis (R file and .zip). Description: R scripts for Visium spatial transcriptomics analysis, including data normalization and spatial clustering.
Supplemental Tables:
Supplemental Table 1: Overall Cell Composition (.pdf and .xlsx). Description: Quantitative breakdown of overall cell populations within LUAD samples.
Supplemental Table 2: Myeloid Cell Composition (.pdf and .xlsx). Description: Detailed cell-type composition focusing specifically on myeloid populations.
Supplemental Table 3: Myeloid Cell Composition per Mouse ID (.pdf and .xlsx). Description: Myeloid cell counts stratified by individual mouse IDs, providing insights into sample variability.
Supplemental Table 4: FDR-Corrected MP DEGs_AT1 vs. AT2 (.pdf). Description: Differentially expressed genes (DEGs) between AT1- and AT2-derived LUAD, corrected for false discovery rate (FDR).
Supplemental Table 5: PANTHER Pathways for MP DEGs_AT1 vs. AT2 (.pdf and .xlsx). Description: Pathway analysis results for DEGs, highlighting enriched biological processes and signaling pathways.
Notable Findings and Key Insights:
AT1-derived LUAD exhibits a more immunoreactive TIME, with increased T cell infiltration and reduced immunosuppressive MDSCs, compared to AT2-derived LUAD.
Spatial transcriptomics reveals distinct localization patterns of immune cells, suggesting differential immune cell recruitment based on tumor cell origin.
A Transcriptome Database for Astrocytes, Neurons, and Oligodendrocytes: A New Resource for Understanding Brain Development and Function
Understanding the cell-cell interactions that control CNS development and function has long been limited by the lack of methods to cleanly separate astrocytes, neurons, and oligodendrocytes. Here we describe the first method for the isolation and purification of developing and mature astrocytes from mouse forebrain. This method takes advantage of the expression of S100β by astrocytes. We used fluorescence-activated cell sorting (FACS) to isolate EGFP-positive cells from transgenic mice that express EGFP under the control of an S100β promoter. By depletion of astrocytes and oligodendrocytes we obtained purified populations of neurons, while by panning with oligodendrocyte-specific antibodies we obtained purified populations of oligodendrocytes. Using GeneChip Arrays we then created a transcriptome database of the expression levels of over 20,000 genes by gene profiling these three main CNS neural cell types at postnatal ages day 1 to 30. This database provides the first global characterization of the genes expressed by mammalian astrocytes in vivo and is the first direct comparison between the astrocyte, neuron, and oligodendrocyte transcriptomes. We demonstrate that Aldh1L1, a highly expressed astrocyte gene, is a highly specific antigenic marker for astrocytes with a substantially broader, and therefore potentially more useful, pattern of astrocyte expression than the traditional astrocyte marker GFAP. This transcriptome database of acutely isolated and highly pure populations of astrocytes, neurons, and oligodendrocytes is a resource for the neuroscience community, providing improved cell type specific markers and a better understanding of neural development, function, and disease.
We acutely purified mouse astrocytes from early postnatal ages (P1) to later postnatal ages (P30), when astrocyte differentiation is morphologically complete (Bushong et al., 2004), and acutely purified mouse OL-lineage cells from stages ranging from OPCs to newly differentiated OLs to myelinating OLs. We extracted RNA from each of these highly purified, acutely isolated cell types and used GeneChip Arrays to determine the expression levels of over 20,000 genes and construct a comprehensive database of cell type specific gene expression in the mouse forebrain. Analysis of this database confirms cell type specific expression of many well characterized and functionally important genes. In addition, we have identified thousands of new cell type enriched genes, thereby providing important new information about astrocyte, OL, and neuron interactions, metabolism, development, and function. This database provides a comparison of the genome-wide transcriptional profiles of the main CNS cell types and is a resource to the neuroscience community for better understanding the development, physiology, and pathology of the CNS.
Keywords: Developmental CNS Cell type comparison
FACS purification of astrocytes: Dissociated forebrains from S100β-EGFP mice were resuspended in panning buffer (DPBS containing 0.02% BSA and 12.5 U/ml DNase) and sequentially incubated on the following panning plates: secondary antibody only plate to deplete microglia, O4 plate to deplete OLs, PDGFRα plate to deplete OPCs, and a second O4 plate to deplete any remaining OLs. This procedure was sufficient to deplete all OL-lineage cells from animals P8 and younger; however, in older animals that had begun to myelinate, additional depletion of OLs and myelin debris was accomplished as follows. The nonadherent cells from the last O4 dish were harvested by centrifugation, and the cells were resuspended in panning buffer containing GalC, MOG, and O1 supernatant and incubated for 15 minutes at room temperature.
The cell suspension was washed and then resuspended in panning buffer containing 20 μg donkey anti-mouse APC for 15 minutes. The cells were washed and resuspended in panning buffer containing propidium iodide (PI). EGFP+ astrocytes were then purified by fluorescence activated cell sorting (FACS). Dead cells were gated out using high PI staining and forward light scatter. Astrocytes were identified based on high EGFP fluorescence and negative APC fluorescence from indirect immunostaining for OL markers GalC, MOG, and O1. Cells were sorted twice and routinely yielded >99.5% purity based on reanalysis of double sorted cells.
FACS purification of neurons: EGFP- cells were the remaining forebrain cells after microglia, OLs, and astrocytes had been removed, and were primarily composed of neurons, and to a lesser extent, endothelial cells (we estimate < 4% endothelial cells at P7 and < 20% endothelial cells at P17). EGFP- cells from S100β-EGFP dissociated forebrain were FACS purified in parallel with astrocyte purification and were sorted based on their negative EGFP fluorescence. Cells were sorted twice and routinely yielded >99.9% purity. In independent preparations, the EGFP- cell population was additionally depleted of endothelial cells and pericytes by sequentially labeling with biotin-BSL1 lectin and streptavidin-APC while also labeling for OL markers as described above. Cells were sorted twice and routinely yielded >99.9% purity.
Panning purification of oligodendrocyte lineage cells: Dissociated mouse forebrains were resuspended in panning buffer. In order to deplete microglia, the single-cell suspension was sequentially panned on four BSL1 panning plates. The cell suspension was then sequentially incubated on two PDGFRα plates (to purify and deplete OPCs), one A2B5 plate (to deplete any remaining OPCs), two MOG plates (to purify and deplete myelinating OLs), and one GalC plate (to purify the remaining PDGFRα-, MOG-, OLs).
The adherent cells on the first PDGFRα, MOG, and GalC plates were washed to remove all antigen-negative nonadherent cells. The cells were then lysed while still attached to the panning plate with Qiagen RLT lysis buffer, and total RNA was purified. Purified OPCs were >95% NG2 positive and 0% MOG positive. Purified Myelin OLs were 100% MOG positive, >95% MBP positive, and 0% NG2 positive. Purified GalC OLs depleted of OPCs and Myelin OLs were <10% MOG positive and ~50% weakly NG2 positive, a reflection of their recent development as early OLs.
Data normalization and analysis: Raw image files were processed using Affymetrix GCOS and the MAS 5.0 algorithm. Intensity data was normalized per chip to a target intensity TGT value of 500, and expression data and absent/present calls for individual probe sets were determined. Gene expression values were normalized and modeled across arrays using the dChip software package with invariant-set normalization and a PM model (www.dchip.org; Li and Wong, 2001). The 29 samples were grouped into 9 sample types: Astros P7-P8, Astros P17, Astros P17-gray matter (P17g), Neurons P7, Neurons P17, Neurons-endothelial cell depleted (P7n, P17n), OPCs, GalC-OLs, and MOG-OLs. Gene filtering was performed to select probe sets that were consistently expressed in at least one cell type, where consistently expressed was defined as being called present and having a MAS 5.0 intensity level greater than 200 in at least two-thirds of the samples in the cell type. We identified 20,932 of the 45,037 probe sets that were consistently expressed in at least one of the nine cell types. The Significance Analysis of Microarrays (SAM) method (Tusher et al., 2001) was used to determine genes that were significantly differentially expressed between different cell types (see Supplemental Table S2 for SAM cell type groupings). Clustering was performed using the hclust method with complete linkage in R.
Expression values were transformed for clustering by computing a mean expression value for the gene using those samples in the corresponding SAM statistical analysis, and then subtracting the mean from expression intensities. In order to preserve the log2 scale of the data, unless otherwise indicated, no normalization by variance was performed. Plots were created using the gplots package in R. The Bioconductor software package (Gentleman et al., 2004) was used throughout the expression analyses. Functional analyses were performed through the use of Ingenuity Pathways Analysis (Ingenuity® Systems, www.ingenuity.com).
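The probe-set filtering rule and the mean-centering transform described above can be sketched as follows. This is an illustration only, not the authors' R/dChip workflow: the function names `consistently_expressed` and `mean_center`, the probes-by-samples array layout, and the `sample_mask` argument (standing in for "those samples in the corresponding SAM statistical analysis") are all assumptions.

```python
import numpy as np

def consistently_expressed(intensity, present, groups,
                           min_intensity=200.0, min_frac=2 / 3):
    """Flag probe sets that pass the filter in at least one cell type.

    intensity: probes x samples array of MAS 5.0 intensities
    present:   probes x samples boolean array of present calls
    groups:    cell-type label for each sample (length = number of samples)
    """
    intensity = np.asarray(intensity, dtype=float)
    present = np.asarray(present, dtype=bool)
    groups = np.asarray(groups)

    # A sample counts only if the probe is called present AND exceeds the cutoff.
    ok = present & (intensity > min_intensity)

    keep = np.zeros(intensity.shape[0], dtype=bool)
    for g in np.unique(groups):
        frac = ok[:, groups == g].mean(axis=1)
        keep |= frac >= min_frac - 1e-12  # tolerance for floating-point 2/3
    return keep

def mean_center(log2_expr, sample_mask=None):
    """Subtract each gene's mean (over the chosen samples) without variance scaling."""
    log2_expr = np.asarray(log2_expr, dtype=float)
    if sample_mask is None:
        sample_mask = np.ones(log2_expr.shape[1], dtype=bool)
    gene_mean = log2_expr[:, sample_mask].mean(axis=1, keepdims=True)
    return log2_expr - gene_mean
```

Because only the mean is subtracted, differences between samples stay in log2 units, which is the stated reason for skipping variance normalization.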
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Community package (https://github.com/SoloveyMaria/community) is an R package designed for the analysis of single-cell RNA sequencing data, specifically for inferring interactions between different cell types. The dataset provided here is compatible with the Community tool, allowing for direct utilization. The dataset associated with this research has undergone peer review and has been published in the journal Nature Cancer. The publication can be accessed via the following link: https://doi.org/10.1038/s43018-022-00480-0. For access to the raw data, please visit: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE185381. It's important to note that the data in this repository has undergone batch correction and normalization, and the corresponding metadata has been appropriately adjusted. This processed data serves as the input for the Community tool.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the R Markdown files with the analysis of CyTOF and scRNA-seq data corresponding to cohort 1 (Berlin) analysed in Georg et al. 2021 "Complement activation induces excessive T cell cytotoxicity in severe COVID-19". Additionally, here we include the necessary CyTOF data to reproduce this analysis.
CyTOF data:
The debarcoded fcs files (before batch-correction) can be found in https://flowrepository.org/id/FR-FCM-Z4P5.
Here you can find the necessary data to reproduce the analysis (cytof_analysis.Rmd, cytof_analysis.html):
data_norm_all.csv: single-cell protein expression data (after batch-normalization and in linear scale).
data_Tcells_annotated.csv: single-cell protein expression of gated T cells with cluster annotation.
phenograph_CD4_k30.csv, phenograph_CD8_k30.csv, phenograph_TCRgd_k30.csv: output from Louvain Clustering computed with PhenoGraph (https://github.com/jacoblevine/PhenoGraph) per T cell compartment.
clusterannotation.csv: annotation for each cluster and metacluster
scRNA-seq data:
The raw data can be found in https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE175450
Other files to reproduce the analysis (scRNAseq_analysis_1preprocessing.Rmd, scRNAseq_analysis_2clustering.Rmd, scRNAseq_analysis_3convalescent.Rmd):
scRNAseq_Sawitzki_RECAST_09_2021.xlsx: Metadata
scRNAseq_genelist_annotation.xlsx: Gene list for the annotation of T cells (Also in Mendeley, see Data and Code Availability).
scRNAseq_GO_RESPONSE_TO_TYPE_I_INTERFERON.txt, scRNAseq_GO_DEFENSE_RESPONSE_TO_VIRUS.txt, scRNAseq_GO_T_CELL_MEDIATED_CYTOTOXICITY.txt: Gene lists for the signatures “Response to Type I Interferon”, “Defense Response to virus” and “Cytotoxicity” used for GSEA. (Also in Table S2).
scRNAseq_traj18_trav10.txt, scRNAseq_trbv25.txt: sequences to determine the proportion of TRAV10-TRAJ18-TRBV25 pairing T cell clones across all T cell clusters.