Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Enriched GO terms for library size normalization. This file is in a tab-separated format and contains the top 200 GO terms that were enriched in the set of DE genes unique to library size normalization. The fields are the same as described for Additional file 2. (13 KB PDF)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Enriched GO terms for deconvolution. This file is in a tab-separated format and contains the top 200 GO terms that were enriched in the set of DE genes unique to deconvolution. The identifier and name of each term are shown along with the total number of genes associated with the term, the number of associated genes that are also DE, the expected number under the null hypothesis, and the Fisher p value. (13 KB PDF)
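The fields listed above (genes per term, DE genes per term, expected number under the null, Fisher p value) can be related to one another with a one-sided hypergeometric calculation. The following is a minimal sketch under that assumption; the function names and argument names are hypothetical, and the original analysis may have used a different tool or test variant.

```python
from math import comb


def expected_de(term_size, n_de, n_genes):
    """Expected number of DE genes in a term if DE status is independent of the term."""
    return term_size * n_de / n_genes


def fisher_enrichment_p(term_size, de_in_term, n_de, n_genes):
    """One-sided Fisher p value, i.e. the hypergeometric upper tail P(X >= de_in_term).

    term_size  : genes annotated to the GO term
    de_in_term : annotated genes that are also DE
    n_de       : DE genes in total
    n_genes    : genes in the tested universe
    """
    upper = min(term_size, n_de)
    total = comb(n_genes, n_de)
    tail = sum(
        comb(term_size, k) * comb(n_genes - term_size, n_de - k)
        for k in range(de_in_term, upper + 1)
    )
    return tail / total
```

For example, `fisher_enrichment_p(4, 3, 5, 20)` asks how surprising it is to see at least 3 DE genes in a 4-gene term when 5 of 20 genes are DE overall, while `expected_de(4, 5, 20)` gives the corresponding null expectation.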
Normalization of RNA-sequencing data is essential for accurate downstream inference, but the assumptions upon which most methods are based do not hold in the single-cell setting. Consequently, applying existing normalization methods to single-cell RNA-seq data introduces artifacts that bias downstream analyses. To address this, we introduce SCnorm for accurate and efficient normalization of scRNA-seq data. A total of 183 single cells (92 H1 cells, 91 H9 cells), each sequenced twice, were used to evaluate SCnorm in normalizing single-cell RNA-seq experiments. A total of 48 bulk H1 samples were used to compare bulk and single-cell properties. For single-cell RNA-seq, the identical single-cell indexed and fragmented cDNA were pooled at 96 cells per lane or at 24 cells per lane to test the effects of sequencing depth, resulting in approximately 1 million and 4 million mapped reads per cell in the two pooling groups, respectively.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data normalization is vital to single-cell sequencing, addressing limitations presented by low input material and various forms of bias or noise present in the sequencing process. Several such normalization methods exist, some of which rely on spike-in genes, molecules added in known quantities to serve as a basis for a normalization model. Depending on available information and the type of data, some methods may offer certain advantages over others. We compare the effectiveness of seven available normalization methods designed specifically for single-cell sequencing using two real data sets containing spike-in genes and one simulation study. Additionally, we test those methods not dependent on spike-in genes using a real data set with three distinct cell-cycle states and a real data set under the 10X Genomics GemCode platform with multiple cell types represented. We demonstrate the differences in effectiveness for the featured methods using visualization and classification assessment and conclude which methods are preferable for normalizing a certain type of data for further downstream analysis, such as classification or differential analysis. A comparison of computational time for all methods is also presented.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Single-cell RNA-sequencing dataset of peripheral blood mononuclear cells (PBMCs: T cells, B cells, NK cells and monocytes) extracted from two healthy donors.
Cells labeled as C26 come from a 30-year-old female and cells labeled as C27 come from a 53-year-old male. Cells were isolated from blood using Ficoll. Samples were sequenced using standard 3' v3 chemistry protocols by 10x Genomics. Cell Ranger v4.0.0 was used for the processing, and reads were aligned to the Ensembl GRCh38 human genome (GRCg38_r98-ensembl_Sept2019). QC metrics were calculated on the count matrix generated by Cell Ranger (filtered_feature_bc_matrix). Cells with fewer than 3 genes per cell, fewer than 500 reads per cell and more than 20% mitochondrial gene content were discarded.
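The QC thresholds stated above can be expressed as a simple per-cell predicate. This is a minimal sketch, not the depositors' code: the function name and parameter names are hypothetical, and it assumes a cell is discarded as soon as it fails any one of the three criteria.

```python
def passes_qc(n_genes, n_reads, pct_mito,
              min_genes=3, min_reads=500, max_pct_mito=20.0):
    """Return True if a cell would be kept under the stated thresholds.

    Assumption: failing ANY single criterion discards the cell.
    n_genes  : number of genes detected in the cell
    n_reads  : number of reads (counts) in the cell
    pct_mito : percentage of counts from mitochondrial genes (0-100)
    """
    return (
        n_genes >= min_genes
        and n_reads >= min_reads
        and pct_mito <= max_pct_mito
    )


def filter_cells(cells):
    """Keep only the (n_genes, n_reads, pct_mito) tuples that pass QC."""
    return [cell for cell in cells if passes_qc(*cell)]
```

Calling `filter_cells` on a list of `(n_genes, n_reads, pct_mito)` tuples returns the subset that would be retained; the thresholds are keyword arguments so they can be adjusted for other datasets.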
The processing steps were performed with the R package Seurat (https://satijalab.org/seurat/), including sample integration, data normalisation and scaling, dimensional reduction, and clustering. The SCTransform method was adopted for the normalisation and scaling steps. The clustered cells were manually annotated using known cell type markers.
File contents:
- raw_dataset.csv: raw gene counts
- normalized_dataset.csv: normalized gene counts (single cell matrix)
- cell_types.csv: cell types identified from annotated cell clusters
- cell_types_macro.csv: cell macro types
- UMAP_coordinates.csv: 2d cell coordinates computed with UMAP algorithm in Seurat
This dataset contains files reconstructing single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) in order to provide a reference annotation of porcine immune cell transcriptomics at enhanced, single-cell resolution. Analysis of single-cell data allowed identification of 36 cell clusters that were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. Files may be used to reconstruct the data as presented in the manuscript, allowing for individual query by other users. Scripts for original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464).
Resources in this dataset:
- Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format. File Name: PBMC7_AllCells.zip. Resource Description: Zipped folder containing PBMC counts matrix, gene names, and cell IDs. Files are as follows: matrix of gene counts* (matrix.mtx.gz), gene names (features.tsv.gz), cell IDs (barcodes.tsv.gz). *The 'raw' count matrix is actually gene counts obtained following ambient RNA removal. During ambient RNA removal, we specified to calculate non-integer count estimations, so most gene counts are actually non-integer values in this matrix but should still be treated as raw/unnormalized data that requires further normalization/transformation. Data can be read into R using the function Read10X().
- Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata. File Name: PBMC7_AllCells_meta.csv. Resource Description: .csv file containing metadata for cells included in the final dataset. Metadata columns include:
  - nCount_RNA = the number of transcripts detected in a cell
  - nFeature_RNA = the number of genes detected in a cell
  - Loupe = cell barcodes; correspond to the cell IDs found in the .h5Seurat and 10X formatted objects for all cells
  - prcntMito = percent mitochondrial reads in a cell
  - Scrublet = doublet probability score assigned to a cell
  - seurat_clusters = cluster ID assigned to a cell
  - PaperIDs = sample ID for a cell
  - celltypes = cell type ID assigned to a cell
- Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates. File Name: PBMC7_AllCells_PCAcoord.csv. Resource Description: .csv file containing first 100 PCA coordinates for cells.
- Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates. File Name: PBMC7_AllCells_tSNEcoord.csv. Resource Description: .csv file containing t-SNE coordinates for all cells.
- Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates. File Name: PBMC7_AllCells_UMAPcoord.csv. Resource Description: .csv file containing UMAP coordinates for all cells.
- Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates. File Name: PBMC7_CD4only_tSNEcoord.csv. Resource Description: .csv file containing t-SNE coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.
- Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells UMAP Coordinates. File Name: PBMC7_CD4only_UMAPcoord.csv. Resource Description: .csv file containing UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.
- Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells UMAP Coordinates. File Name: PBMC7_GDonly_UMAPcoord.csv. Resource Description: .csv file containing UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.
- Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates. File Name: PBMC7_GDonly_tSNEcoord.csv. Resource Description: .csv file containing t-SNE coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.
- Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information. File Name: UnfilteredGeneInfo.txt. Resource Description: .txt file containing gene nomenclature information used to assign gene names in the dataset. 'Name' column corresponds to the name assigned to a feature in the dataset.
- Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat. File Name: PBMC7.tar. Resource Description: .h5Seurat object of all cells in PBMC dataset. File needs to be untarred, then read into R using function LoadH5Seurat().
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SingleCellExperiment objects containing raw and normalized counts, as well as reduced dimension representations and cell type annotations for both the follicular lymphoma samples (sce_follicular_annotated_final.rds) and high grade serous ovarian cancer samples (sce_hgsc_annotated_final.rds) as detailed in the paper.
https://spdx.org/licenses/CC0-1.0.html
CD4 T lymphocytes belong to diverse cellular subsets whose sensitivity or resistance to HIV-associated killing remains to be defined. Working with lymphoid cells from human tonsils, we characterized the HIV-associated depletion of various CD4 T cell subsets using mass cytometry and single-cell RNA-seq. CD4 T cell subsets preferentially killed by HIV are phenotypically distinct from those resistant to HIV-associated cell death, in a manner not fully accounted for by their susceptibility to productive infection. Preferentially-killed subsets express CXCR5 and CXCR4 while preferentially-infected subsets exhibit an activated and exhausted effector memory cell phenotype. Single-cell RNA-seq analysis reveals that the subsets of preferentially-killed cells express genes favoring abortive infection and pyroptosis. These studies emphasize a complex interplay between HIV and distinct tissue-based CD4 T cell subsets, and the important contribution of abortive infection and inflammatory programmed cell death to the overall depletion of CD4 T cells that accompanies untreated HIV infection.
Methods: mass cytometry; single-cell RNA-seq. Mass cytometry data have been pre-gated on live singlets and normalized by CD8 cell number. Single-cell RNA-seq data are raw data.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes information relevant to the following manuscript from the labs of Prof. Carlos Caldas (University of Cambridge), and Dr. Long V. Nguyen (Princess Margaret Cancer Centre, University Health Network):
Nguyen LV et al. Dynamics and plasticity of human breast cancer single cell-derived clones. Under consideration for publication.
Bulk RNA sequencing raw count matrices are provided (RawCounts.csv) along with the normalized count matrices (LogCPMNormCounts.csv).
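To illustrate how a raw count matrix relates to a log-CPM matrix, the following is a minimal sketch of a log2 counts-per-million transform for one sample. It is a simplification: the function name and the `pseudocount` parameter are hypothetical, and the exact formula used to produce LogCPMNormCounts.csv (for example, any prior count or library-size scaling) is not stated here and may differ.

```python
from math import log2


def log_cpm(counts, pseudocount=1.0):
    """Return log2(CPM + pseudocount) for one sample's per-gene raw counts.

    CPM = count / library_size * 1e6, where library_size is the sample total.
    The pseudocount avoids log2(0) for genes with zero counts.
    """
    library_size = sum(counts)
    if library_size == 0:
        raise ValueError("sample has no counts")
    return [log2(c / library_size * 1e6 + pseudocount) for c in counts]
```

Applying `log_cpm` to each sample column of a raw count matrix yields values that are comparable across samples of different sequencing depth.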
Single cell RNA sequencing count matrix processed from R package metacell is provided (mat.pdx_LN_v2_filt.Rda), along with the mc and mc2d files with information on metacell partitions (mc.pdx_LN_v2_filt.Rda and mc2d.pdx_LN_v2_filt.Rda).
Code and information on data analysis are provided for reviewers in our unpublished manuscript and on GitHub (https://github.com/cclab-brca/clone-dynamics).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
bioRxiv preprint: https://www.biorxiv.org/content/10.1101/2020.04.02.022566v4
Contact: Grigorios Georgolopoulos (ggeorgol@altius.org); Jeff Vierstra (jvierstra@altius.org)
Lineage commitment and differentiation are driven by the concerted action of master transcriptional regulators at their target chromatin sites. Multiple efforts have characterized the key transcription factors (TFs) that determine the various hematopoietic lineages. However, the temporal interactions between individual TFs and their chromatin targets during differentiation, and how these interactions dictate lineage commitment, remain poorly understood. Here we delineate the temporal interplay between the cis- and the trans-regulatory landscape in establishing lineage commitment and differentiation in human hematopoiesis by performing a dense timecourse of chromatin accessibility (DNase I-seq) and gene expression (total and single-cell RNA-seq).
All data uploaded correspond to human genome build version GRCh38.
Contents
DNase I Hotspot (DHS) metadata: Supplementary_Data_1.txt
DNase I Hotspot quantile-normalized counts: A tab-separated matrix with quantile-normalized DNase I density counts from 79,085 FDR 5% hotspots, across 12 erythroid differentiation timepoints from 3 donors, present in at least n=2 samples. Rows correspond to DHS information in Supplementary_Data_1.txt (hotspots.fdr.0.05.qnorm.counts.tsv.gz)
Column information for DNase I Hotspot quantile-normalized counts: hotspots.fdr.0.05.qnorm.counts.info.tsv
Developmentally regulated gene metadata (erythroid): Supplementary_Data_2.csv
Gene matrix of quantile-normalized FPKM values (erythroid): A tab-separated matrix with the quantile-normalized FPKM values of all detected genes, across 13 erythroid differentiation timepoints from 3 donors. (fpkm_erythroid_qnorm.tsv.gz)
Column information for the quantile-normalized FPKM gene matrix (erythroid): A tab-separated table (fpkm_erythroid_qnorm.info.tsv)
CD34+ HSPC TADs at 10kb resolution: Supplementary_Data_3.bed
Day 11 ex vivo erythroid progenitor TADs at 10kb resolution: Supplementary_Data_4.bed
Transcription factor motif enrichment per DHS cluster: Supplementary_Data_5.csv
Correlation information (links) between developmentally regulated DHS and target genes: Supplementary_Data_6.csv
Chromatin anchor loops called from 10kb resolution Hi-C data: Supplementary_Data_7.bedgraph
Developmentally regulated gene metadata (megakaryocytic): Supplementary_Data_8.csv
Gene matrix of quantile-normalized FPKM values (megakaryocytic): A tab-separated matrix with the quantile-normalized FPKM values of all detected genes, across 13 megakaryocytic differentiation timepoints from 3 donors. (fpkm_megakaryocyte_qnorm.tsv.gz)
Column information for the quantile-normalized FPKM gene matrix (megakaryocytic): A tab-separated table (fpkm_megakaryocyte_qnorm.info.tsv)
Marker (differentially expressed) genes per single cell population: Supplementary_Data_9.csv
A SCANPY h5ad AnnData object: an annotated data frame (anndata) in h5ad format, including the gene-by-cell count matrix and a Velocyto splicing kinetics (RNA velocity) information layer, along with obs, obsm, var, varm, and uns layers. (SCANPY_anndata_object.h5ad)
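Several of the matrices listed above are described as quantile-normalized. As a reminder of what that transform does, here is a minimal sketch of textbook quantile normalization across samples. The function name is hypothetical, ties are broken by position rather than averaged, and the depositors' actual procedure may differ.

```python
def quantile_normalize(columns):
    """Quantile-normalize a matrix given as a list of sample columns.

    Each column holds one sample's values in the same row (feature) order.
    After normalization every sample has the same distribution: the value
    at rank r is replaced by the mean of the rank-r values across samples.
    """
    n_rows = len(columns[0])
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all columns must have the same length")

    # Mean of the r-th smallest value across all samples.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(sc[r] for sc in sorted_cols) / len(columns) for r in range(n_rows)
    ]

    normalized = []
    for col in columns:
        # Row indices ordered by the value they hold in this sample.
        order = sorted(range(n_rows), key=lambda i: col[i])
        out = [0.0] * n_rows
        for rank, row in enumerate(order):
            out[row] = rank_means[rank]
        normalized.append(out)
    return normalized
```

After this transform, sorting any sample column gives the same list of values, which is the defining property of quantile normalization.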
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study was designed to capture the changes in CD8 T cell phenotypes in a murine glioma (CT-2A) model after immunotherapy with AAV-LIGHT (TNFSF14). Tumour-infiltrating CD45 cells were isolated by flow sorting and subjected to targeted single-cell transcriptome sequencing and downstream analysis.
1. The 3 .bam files contain aligned sequence data for sc-RNAseq of all the CD8+ T cells in the 3 libraries that were analysed in the mentioned study.
2. An md5 checksum file to verify the integrity of the data.
3. Metadata, including cell IDs, to annotate the cells by treatment and outcome.
4. Normalized counts: normalized gene expression data of CD8 T cells.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data normalization is a crucial step in gene expression analysis, as it ensures the validity of downstream analyses. Although many metrics have been designed to evaluate the existing normalization methods, different metrics, or different datasets evaluated by the same metric, yield inconsistent results, particularly for single-cell RNA sequencing (scRNA-seq) data. In the worst cases, one method evaluated as the best by one metric is evaluated as the poorest by another metric, or one method evaluated as the best using one dataset is evaluated as the poorest using another dataset. This raises an open question: principles need to be established to guide the evaluation of normalization methods. In this study, we propose a principle that one normalization method evaluated as the best by one metric should also be evaluated as the best by another metric (the consistency of metrics) and one method evaluated as the best using scRNA-seq data should also be evaluated as the best using bulk RNA-seq data or microarray data (the consistency of datasets). Then, we designed a new metric named Area Under normalized CV threshold Curve (AUCVC) and applied it with another metric, mSCC, to evaluate 14 commonly used normalization methods using both scRNA-seq data and bulk RNA-seq data, satisfying the consistency of metrics and the consistency of datasets. Our findings pave the way to guide future studies in the normalization of gene expression data and its evaluation. The raw gene expression data, normalization methods, and evaluation metrics used in this study have been included in an R package named NormExpression. NormExpression provides a framework and a fast and simple way for researchers to select the best method for the normalization of their gene expression data based on the evaluation of different methods (particularly some data-driven methods or their own methods) under the principle of the consistency of metrics and the consistency of datasets.
https://www.scilifelab.se/data/restricted-access/
Background information
The dataset contains single-cell targeted RNA sequencing (RNAseq) and targeted antibody-oligonucleotide conjugate sequencing (AbSeq) data from chimeric antigen receptor (CAR)-engineered T cells used to treat individual cancer patients in a clinical study. The starting material was in all cases autologous T cells harvested from the peripheral blood of patients. The data were collected from 24 participants, of whom 23 were adult patients with relapsed or refractory B cell lymphoma and one was a pediatric patient with relapsed B cell acute lymphoblastic leukemia. The data were generated as part of a study by Sarén et al., Clinical Cancer Research (2023).
Targeted RNA and protein single-cell libraries were generated using the BD Rhapsody™ platform (BD Biosciences). Cells were labeled with sample tags from the BD Human Immune Single-Cell Multiplexing Kit and BD Ab-seq Ab-Oligos, and live cells were collected by flow cytometry. CAR-T cells were loaded on a BD Rhapsody cartridge, and mRNA was captured with cell capture beads and used as template for cDNA synthesis. Four separate targeted libraries were produced and pooled for paired-end sequencing on a NovaSeq 6000 S1 sequencer (Illumina) at the SNP&SEQ Technology Platform (Uppsala, Sweden).
Terms of access
Sequencing data generated during the current study are not publicly available due to the European General Data Protection Regulation (GDPR) to protect patients’ privacy but are available from the corresponding author on reasonable request (see contact info). The dataset is only to be used for research that is seeking to advance the understanding of CAR-T cell treatment of cancer.
Ancillary datasets and code
Processed RNAseq and AbSeq data, in the form of raw and normalized count matrices, are available on BioStudies (Accession: E-MTAB-12407).
R code used to process the data is available on the study GitHub repository: https://github.com/magnessa/EudraCT_2016-004043-36
Genomic profiling in post-mortem brain from autistic individuals has consistently revealed convergent molecular changes. What drives these changes and how they relate to genetic susceptibility in this complex condition is not understood. We performed deep single nuclear RNA sequencing (snRNAseq) to examine cell composition and transcriptomics, identifying dysregulation of cell type-specific gene regulatory networks (GRNs) in autism, which we corroborated using snATAC-seq and spatial transcriptomics. Transcriptomic changes were primarily cell type-specific, involving multiple cell types, most prominently interhemispheric and callosal-projecting neurons, interneurons within superficial laminae, and distinct glial reactive states involving oligodendrocytes, microglia, and astrocytes. Autism-associated GRN drivers and their targets were enriched in rare and common genetic risk variants, connecting autism genetic susceptibility and cellular and circuit alterations in the human brain. This da...
Please see the manuscript for detailed information. In brief: we performed a pseudobulk ASD vs. CTL expression analysis by cell type. We generated pseudobulk counts for each sample by adding counts from the same cell type. Pseudobulk counts were then normalized by the variance stabilizing transformation method. To identify genes differentially expressed in ASD compared to control in each cell type, we examined the correlation of covariates with the top 5 PCs of the normalized pseudobulk expression matrix. We identified the following covariates as consistently correlated with the top 5 PCs for each cell type: age, PMI, BrainRegion, SeqBatch, Mito_perc, and ngenes. We then randomly selected subjects 500 times and calculated the average beta to regress out the effects of these covariates. We then used limma-voom to identify differentially expressed genes for each cluster.
Raw Differential Gene Expression data, Data S1, from "Molecular cascades and cell type-specific signatures in ASD revealed by single cell genomics"
Excel document with raw differential gene expression results, ASD vs. CTL, per cell-type cluster. The first column is a drop-down selection for choosing which cell type's differential gene expression results to view. The second (ASDvCTL) and third (CTLvASD) columns are the logFC values for each gene for a given cell type, the fourth (ASDvCTL) and fifth (CTLvASD) columns are the p-values for each gene for a given cell type, the sixth (ASDvCTL) and seventh (CTLvASD) columns are the FDR values for each gene for a given cell type, and the last column is the gene name.
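The pseudobulk step described for this record (adding counts from cells of the same cell type within each sample) can be sketched as follows. This is an illustrative outline only: the function name and the input layout are hypothetical, and it does not cover the variance stabilizing transformation, covariate regression, or limma-voom steps.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum per-gene counts over cells sharing the same (sample, cell type).

    cells: iterable of (sample_id, cell_type, counts) where counts is a
           dict mapping gene name -> count for one cell.
    Returns a dict mapping (sample_id, cell_type) -> {gene: summed count}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for sample_id, cell_type, counts in cells:
        for gene, count in counts.items():
            totals[(sample_id, cell_type)][gene] += count
    return {key: dict(genes) for key, genes in totals.items()}
```

Each resulting `(sample_id, cell_type)` profile can then be treated like one bulk RNA-seq sample in a per-cell-type differential expression analysis.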
Links to other publicly accessible locations of t...
NGS-Based RNA-Seq Market Size 2024-2028
The NGS-based RNA-seq market size is forecast to increase by USD 6.66 billion, at a CAGR of 20.52% between 2023 and 2028.
The market is witnessing significant growth, driven by the increased adoption of next-generation sequencing (NGS) methods for RNA-Seq analysis. The advanced capabilities of NGS techniques, such as high-throughput, cost-effectiveness, and improved accuracy, have made them the preferred choice for researchers and clinicians in various fields, including genomics, transcriptomics, and personalized medicine. However, the market faces challenges, primarily from the lack of clinical validation on direct-to-consumer genetic tests. As the use of NGS technology in consumer applications expands, ensuring the accuracy and reliability of results becomes crucial.
The absence of standardized protocols and regulatory oversight in this area poses a significant challenge to market growth and trust. Companies seeking to capitalize on market opportunities must focus on addressing these challenges through collaborations, partnerships, and investments in research and development to ensure the clinical validity and reliability of their NGS-based RNA-Seq offerings.
What will be the Size of the NGS-based RNA-Seq market during the forecast period?
The market continues to evolve, driven by advancements in NGS technology and its applications across various sectors. Spatial transcriptomics, a novel approach to studying gene expression in its spatial context, is gaining traction in disease research and precision medicine. Splice junction detection, a critical component of RNA-seq data analysis, enhances the accuracy of gene expression profiling and differential gene expression studies. Cloud computing plays a pivotal role in handling the massive amounts of data generated by NGS platforms, enabling real-time data analysis and storage. Enrichment analysis, gene ontology, and pathway analysis facilitate the interpretation of RNA-seq data, while data normalization and quality control ensure the reliability of results.
Precision medicine and personalized therapy are key applications of RNA-seq, with single-cell RNA-seq offering unprecedented insights into the complexities of gene expression at the single-cell level. Read alignment and variant calling are essential steps in RNA-seq data analysis, while bioinformatics pipelines and RNA-seq software streamline the process. NGS technology is revolutionizing drug discovery by enabling the identification of biomarkers and gene fusion detection in various diseases, including cancer and neurological disorders. RNA-seq is also finding applications in infectious diseases, microbiome analysis, environmental monitoring, agricultural genomics, and forensic science. Sequencing costs are decreasing, making RNA-seq more accessible to researchers and clinicians.
The ongoing development of sequencing platforms, library preparation, and sample preparation kits continues to drive innovation in the field. The dynamic nature of the market ensures that it remains a vibrant and evolving field, with ongoing research and development in areas such as data visualization, clinical trials, and sequencing depth.
How is this NGS-based RNA-Seq industry segmented?
The NGS-based RNA-seq industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.
End-user
- Academic and research centers
- Clinical research
- Pharma companies
- Hospitals
Technology
- Sequencing by synthesis
- Ion semiconductor sequencing
- Single-molecule real-time sequencing
- Others
Geography
- North America
  - US
- Europe
  - Germany
  - UK
- APAC
  - China
  - Singapore
- Rest of World (ROW)
By End-user Insights
The academic and research centers segment is estimated to witness significant growth during the forecast period.
The global next-generation sequencing (NGS) market for RNA sequencing (RNA-Seq) is primarily driven by academic and research institutions, including those from universities, research institutes, government entities, biotechnology organizations, and pharmaceutical companies. These institutions utilize NGS technology for various research applications, such as whole-genome sequencing, epigenetics, and emerging fields like agrigenomics and animal research, to enhance crop yield and nutritional composition. NGS-based RNA-Seq plays a pivotal role in translational research, with significant investments from both private and public organizations fueling its growth. The technology is instrumental in disease research, enabling the identification
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Single-cell RNA-seq data, like data from other sequencing technologies, contain systematic technical noise. Such noise results from a combined effect of unequal efficiencies in the capturing and counting of mRNA molecules, such as extraction/amplification efficiency and sequencing depth. We show that such technical effects are not only cell-specific but also affect genes differently, so a simple cell-wise size factor adjustment may not be sufficient. We present a non-linear normalization approach that provides a cell- and gene-specific normalization factor for each gene in each cell. We show that the proposed normalization method (implemented in the "SC2P" package) reduces more technical variation than competing methods, without reducing biological variation. When technical effects such as sequencing depth are not balanced between cell populations, SC2P normalization also removes the bias due to uneven technical noise. This method is applicable to scRNA-seq experiments that do not use unique molecular identifiers (UMIs) and thus retain amplification biases.
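For context, the "simple cell-wise size factor adjustment" that the abstract above argues may be insufficient can be sketched as below. This is not the SC2P method; it is a minimal library-size scaling with hypothetical function names, shown only to make clear that every gene in a cell is divided by the same factor.

```python
def size_factors(cells):
    """Cell-wise size factors: each cell's total count over the mean total.

    cells: list of per-cell count lists (same gene order in every cell).
    """
    totals = [sum(cell) for cell in cells]
    if any(total == 0 for total in totals):
        raise ValueError("cells with zero total counts cannot be scaled")
    mean_total = sum(totals) / len(totals)
    return [total / mean_total for total in totals]


def normalize_by_size_factor(cells):
    """Divide every gene in a cell by that cell's single size factor."""
    factors = size_factors(cells)
    return [[count / f for count in cell] for cell, f in zip(cells, factors)]
```

Because one factor is shared by all genes in a cell, this adjustment cannot correct technical effects that differ between genes, which is the limitation a cell- and gene-specific factor is meant to address.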
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset supports the manuscript titled "Molecular and developmental deficits in Smith-Magenis syndrome patient hiPSC-derived cortical neural models". It includes processed single-cell RNA sequencing (scRNA-seq) and bulk RNA sequencing (bulk RNA-seq) data derived from human induced pluripotent stem cell (hiPSC)-derived cortical neural progenitor cells and neurons obtained from Smith-Magenis syndrome (SMS) patients and matched healthy controls.
The dataset comprises:
bulkRNA-seq.zip: Normalized gene expression count tables and differential gene expression results from bulk RNA-seq analysis, along with patient-level metadata including demographic information (e.g., age, sex, diagnosis group).
The data capture transcriptomic changes across developmental stages and enable the study of disease-associated molecular and cellular alterations in SMS. These files are intended for secondary analysis and reproducibility; raw FASTQ files are not included in this deposit. A companion GitHub repository with code for data preprocessing and analysis will be provided.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This record contains analysis products for the paper "Transcription factor stoichiometry, motif affinity and syntax regulate single cell chromatin dynamics during fibroblast reprogramming to pluripotency" by Nair, Ameen et al. Please refer to the READMEs in the directories, which are summarized below.
The record contains the following files:
`clusters.tsv`: contains the cluster id, name and colour of clusters in the paper
scATAC.zip
Analysis products for the single-cell ATAC-seq data. Contains:
- `cells.tsv`: list of barcodes that pass QC. Columns include:
  - `barcode`
  - `sample`: time point
  - `umap1`
  - `umap2`
  - `cluster`
  - `dpt_pseudotime_fibr_root`: pseudotime values treating a fibroblast cell as root
  - `dpt_pseudotime_xOSK_root`: pseudotime values treating an xOSK cell as root
- `peaks.bed`: list of peaks of 500bp across all cell states. 4th column contains the peak set label. Note that ~5000 peaks are not assigned to any peak set and are marked as NA.
- `features.tsv`: 50 dimensional representation of each cell
- `cell_x_peak.mtx.gz`: sparse matrix of fragment counts within peaks. Load using scipy.io.mmread in python or readMM in R. Columns correspond to cells from `cells.tsv` (combine sample + barcode). Rows correspond to peaks in `peaks.bed`
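As a minimal sketch of the loading step described above, the following Python function reads the matrix with `scipy.io.mmread` and labels its axes. Several details are assumptions rather than facts from the README: that `cells.tsv` has a header row with `barcode` and `sample` columns, that `peaks.bed` has no header, and that "sample + barcode" is joined with an underscore (the `sep` parameter is hypothetical). The function name `load_cell_x_peak` is also illustrative.

```python
import csv

import scipy.io
import scipy.sparse


def load_cell_x_peak(mtx_path, cells_path, peaks_path, sep="_"):
    """Load the peak-by-cell count matrix together with its row/column labels.

    Assumption: column keys are sample + sep + barcode. The separator is
    hypothetical; the README only says "combine sample + barcode".
    """
    # mmread accepts gzip-compressed Matrix Market files directly.
    counts = scipy.sparse.csc_matrix(scipy.io.mmread(mtx_path))

    # Assumption: cells.tsv is tab-separated with a header row.
    with open(cells_path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        cells = [row["sample"] + sep + row["barcode"] for row in reader]

    # Keep chrom, start, end from each BED line (assumed header-less).
    with open(peaks_path) as fh:
        peaks = [tuple(line.rstrip("\n").split("\t")[:3])
                 for line in fh if line.strip()]

    # Rows are peaks and columns are cells, as stated in the README.
    if counts.shape != (len(peaks), len(cells)):
        raise ValueError(
            f"matrix shape {counts.shape} does not match "
            f"{len(peaks)} peaks x {len(cells)} cells"
        )
    return counts, cells, peaks
```

Checking the shape against both label files up front catches a transposed matrix or a mismatched `cells.tsv` before any downstream analysis.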
scATAC_clusters.zip
Analysis products corresponding to cluster pseudo-bulks of the single-cell ATAC-seq data.
- `clusters.tsv`: contains the cluster id, name and colour used in the paper
- `peaks`: contains `overlap_reproducibilty/overlap.optimal_peak` peaks called using ENCODE bulk ATAC-seq pipeline in the narrowPeak format.
- `fragments`: contains per cluster fragment files
scATAC_scRNA_integration.zip
Analysis products from the integration of scATAC with scRNA. Contains:
- `peak_gene_links_fdr1e-4.tsv`: file with peak gene links passing FDR 1e-4. For analyses in the paper, we filter to peaks with absolute correlation >0.45.
- `harmony.cca.30.feat.tsv`: 30 dimensional co-embedding for scATAC and scRNA cells obtained by CCA followed by applying Harmony over assay type.
- `harmony.cca.metadata.tsv`: UMAP coordinates for scATAC and scRNA cells derived from the Harmony CCA embedding. First column contains barcode.
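The correlation filter mentioned for `peak_gene_links_fdr1e-4.tsv` could be reproduced roughly as follows. This is a sketch only: the column name `correlation` is an assumption (the README does not list the file's columns), as are the helper names `read_links` and `filter_links`.

```python
import csv


def read_links(path):
    """Read a tab-separated peak-gene link table into a list of dicts."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))


def filter_links(rows, corr_col="correlation", min_abs_corr=0.45):
    """Keep links whose absolute correlation exceeds the threshold.

    Assumption: `corr_col` names the correlation column; the real header
    may differ. The comparison is strict (>), matching ">0.45" in the README.
    """
    return [row for row in rows if abs(float(row[corr_col])) > min_abs_corr]
```

Because the file is already restricted to FDR 1e-4, only the correlation threshold needs to be applied here.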
scRNA.zip
Analysis products for the single-cell RNA-seq data. Contains:
- `seurat.rds`: Seurat object that contains expression data (raw counts, normalized, and scaled), reductions (umap, pca), kNN graphs, and all associated metadata. Note that the barcode suffix (1-9) corresponds to samples D0, D2, ..., D14, iPSC.
- `genes.txt`: list of all genes
- `cells.tsv`: list of barcodes that pass QC across samples. Contains:
  - `barcode_sample`: barcode with index of sample (1-9 corresponding to D0, D2, ..., D14, iPSC)
  - `sample`: sample name (D0, D2, ..., D14, iPSC)
  - `umap1`
  - `umap2`
  - `nCount_RNA`
  - `nFeature_RNA`
  - `cluster`
  - `percent.mt`: percent of mitochondrial transcripts in cell
  - `percent.oskm`: percent of OSKM transcripts in cell
- `gene_x_cell.mtx.gz`: sparse matrix of gene counts. Load using scipy.io.mmread in python or readMM in R. Columns correspond to cells from `cells.tsv` (barcode suffix contains sample information). Rows correspond to genes in `genes.txt`
- `pca.tsv`: first 50 PC of each cell
- `oskm_endo_sendai.tsv`: estimated raw counts (cts, may not be integers) and log(1+ tp10k) normalized expression (norm) for endogenous and exogenous (Sendai derived) counts of POU5F1 (OCT4), SOX2, KLF4 and MYC genes. Rows are consistent with `seurat.rds` and `cells.tsv`
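The log(1 + tp10k) values in `oskm_endo_sendai.tsv` can be illustrated with a short sketch. It assumes that tp10k means a count divided by the cell's total count and scaled to 10,000, as in the default Seurat log-normalization; the README does not spell this out, and the function name `log1p_tp10k` is hypothetical.

```python
import math


def log1p_tp10k(count, cell_total, scale=10_000):
    """Return log(1 + count / cell_total * scale) for one gene in one cell.

    Assumption: `cell_total` is the total count of the cell (for example
    `nCount_RNA`), not the sum over the OSKM genes alone. `count` may be
    non-integer, since the estimated raw counts are not always integers.
    """
    if cell_total <= 0:
        raise ValueError("cell_total must be positive")
    return math.log1p(count / cell_total * scale)
```

For example, a gene with 2.5 estimated counts in a cell with 5,000 total counts gives `log1p_tp10k(2.5, 5000)`, which is the natural log of 1 + 5.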
multiome.zip
multiome/snATAC:
These files are derived from the integration of nuclei from multiome (D1M and D2M), with cells from day 2 of scATAC-seq (labeled D2).
- `cells.tsv`: This is the list of nuclei barcodes that pass QC from multiome and also cell barcodes from D2 of scATAC-seq. Includes:
  - `barcode`
  - `umap1`: These are the coordinates used for the figures involving multiome in the paper.
  - `umap2`: See `umap1`.
  - `sample`: D1M and D2M correspond to multiome, D2 corresponds to day 2 of scATAC-seq
  - `cluster`: For multiome barcodes, these are labels transferred from scATAC-seq. For D2 scATAC-seq, these are the original cluster labels.
- `peaks.bed`: This is the same file as scATAC/peaks.bed. List of peaks of 500bp. 4th column contains the peak set label. Note that ~5000 peaks are not assigned to any peak set and are marked as NA.
- `cell_x_peak.mtx.gz`: sparse matrix of fragment counts within peaks. Load using scipy.io.mmread in python or readMM in R. Columns correspond to cells from `cells.tsv` (combine sample + barcode). Rows correspond to peaks in `peaks.bed`.
- `features.no.harmony.50d.tsv`: 50 dimensional representation of each cell prior to running Harmony (to correct for batch effect between D2 scATAC and D1M,D2M snMultiome). Rows correspond to cells from `cells.tsv`.
- `features.harmony.10d.tsv`: 10 dimensional representation of each cell after running Harmony. Rows correspond to cells from `cells.tsv`.
multiome/snRNA:
- `seurat.rds`: Seurat object that contains expression data (raw counts, normalized, and scaled), reductions (umap, pca), and associated metadata. Note that the barcode suffix (1, 2) corresponds to samples D1M and D2M. Please use the UMAP/features from snATAC/ for consistency.
- `genes.txt`: list of all genes (this is different from the list in scRNA analysis)
- `cells.tsv`: list of barcodes that pass QC across samples. Contains:
  - `barcode_sample`: barcode with index of sample (1,2 corresponding to D1M, D2M respectively)
  - `sample`: sample name (D1M, D2M)
  - `nCount_RNA`
  - `nFeature_RNA`
  - `percent.oskm`: percent of OSKM genes in cell
- `gene_x_cell.mtx.gz`: sparse matrix of gene counts. Load using scipy.io.mmread in python or readMM in R. Columns correspond to cells from `cells.tsv` (barcode suffix contains sample information). Rows correspond to genes in `genes.txt`
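Since the barcode suffix carries the sample information, it can be decoded with a small helper. The sketch below uses the multiome mapping stated above (1 to D1M, 2 to D2M). The hyphen separator (as in `ACGT-1`) is an assumption about the barcode format, and `sample_from_barcode` is a hypothetical name.

```python
# Suffix-to-sample mapping for the multiome snRNA data, as given in the README.
MULTIOME_SAMPLES = {"1": "D1M", "2": "D2M"}


def sample_from_barcode(barcode_sample, suffix_to_sample=MULTIOME_SAMPLES, sep="-"):
    """Split 'BARCODE<sep>INDEX' into (barcode, sample name).

    Assumption: the sample index is appended after a hyphen; adjust `sep`
    if the files use a different separator.
    """
    barcode, found, suffix = barcode_sample.rpartition(sep)
    if not found or suffix not in suffix_to_sample:
        raise ValueError(f"cannot decode sample from {barcode_sample!r}")
    return barcode, suffix_to_sample[suffix]
```

The same helper would work for the scRNA data by passing a mapping for its nine sample indices in place of `MULTIOME_SAMPLES`.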