100+ datasets found
  1. f

    A graph-based algorithm for RNA-seq data normalization

    • figshare.com
    pdf
    Updated Jun 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Diem-Trang Tran; Aditya Bhaskara; Balagurunathan Kuberan; Matthew Might (2023). A graph-based algorithm for RNA-seq data normalization [Dataset]. http://doi.org/10.1371/journal.pone.0227760
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Diem-Trang Tran; Aditya Bhaskara; Balagurunathan Kuberan; Matthew Might
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The use of RNA-sequencing has garnered much attention in recent years for characterizing and understanding various biological systems. However, it remains a major challenge to gain insights from a large number of RNA-seq experiments collectively, due to the normalization problem. Normalization has been challenging due to an inherent circularity, requiring that RNA-seq data be normalized before any pattern of differential (or non-differential) expression can be ascertained; meanwhile, the prior knowledge of non-differential transcripts is crucial to the normalization process. Some methods have successfully overcome this problem by the assumption that most transcripts are not differentially expressed. However, when RNA-seq profiles become more abundant and heterogeneous, this assumption fails to hold, leading to erroneous normalization. We present a normalization procedure that does not rely on this assumption, nor prior knowledge about the reference transcripts. This algorithm is based on a graph constructed from intrinsic correlations among RNA-seq transcripts and seeks to identify a set of densely connected vertices as references. Application of this algorithm on our synthesized validation data showed that it could recover the reference transcripts with high precision, thus resulting in high-quality normalization. On a realistic data set from the ENCODE project, this algorithm gave good results and could finish in a reasonable time. These preliminary results imply that we may be able to break the long persisting circularity problem in RNA-seq normalization.

  2. f

    Table_1_Comparison of Normalization Methods for Analysis of TempO-Seq...

    • datasetcatalog.nlm.nih.gov
    Updated Jun 23, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bushel, Pierre R.; Ramaiahgari, Sreenivasa C.; Auerbach, Scott S.; Paules, Richard S.; Ferguson, Stephen S. (2020). Table_1_Comparison of Normalization Methods for Analysis of TempO-Seq Targeted RNA Sequencing Data.XLSX [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000579045
    Explore at:
    Dataset updated
    Jun 23, 2020
    Authors
    Bushel, Pierre R.; Ramaiahgari, Sreenivasa C.; Auerbach, Scott S.; Paules, Richard S.; Ferguson, Stephen S.
    Description

    Analysis of bulk RNA sequencing (RNA-Seq) data is a valuable tool to understand transcription at the genome scale. Targeted sequencing of RNA has emerged as a practical means of assessing the majority of the transcriptomic space with less reliance on large resources for consumables and bioinformatics. TempO-Seq is a templated, multiplexed RNA-Seq platform that interrogates a panel of sentinel genes representative of genome-wide transcription. Nuances of the technology require proper preprocessing of the data. Various methods have been proposed and compared for normalizing bulk RNA-Seq data, but there has been little to no investigation of how the methods perform on TempO-Seq data. We simulated count data into two groups (treated vs. untreated) at seven-fold change (FC) levels (including no change) using control samples from human HepaRG cells run on TempO-Seq and normalized the data using seven normalization methods. Upper Quartile (UQ) performed the best with regard to maintaining FC levels as detected by a limma contrast between treated vs. untreated groups. For all FC levels, specificity of the UQ normalization was greater than 0.84 and sensitivity greater than 0.90 except for the no change and +1.5 levels. Furthermore, K-means clustering of the simulated genes normalized by UQ agreed the most with the FC assignments [adjusted Rand index (ARI) = 0.67]. Despite having an assumption of the majority of genes being unchanged, the DESeq2 scaling factors normalization method performed reasonably well as did simple normalization procedures counts per million (CPM) and total counts (TCs). These results suggest that for two class comparisons of TempO-Seq data, UQ, CPM, TC, or DESeq2 normalization should provide reasonably reliable results at absolute FC levels ≥2.0. These findings will help guide researchers to normalize TempO-Seq gene expression data for more reliable results.

  3. f

    Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data...

    • frontiersin.figshare.com
    zip
    Updated Jun 1, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao (2023). Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.zip [Dataset]. http://doi.org/10.3389/fgene.2019.00400.s002
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data normalization is a crucial step in the gene expression analysis as it ensures the validity of its downstream analyses. Although many metrics have been designed to evaluate the existing normalization methods, different metrics or different datasets by the same metric yield inconsistent results, particularly for the single-cell RNA sequencing (scRNA-seq) data. The worst situations could be that one method evaluated as the best by one metric is evaluated as the poorest by another metric, or one method evaluated as the best using one dataset is evaluated as the poorest using another dataset. Here raises an open question: principles need to be established to guide the evaluation of normalization methods. In this study, we propose a principle that one normalization method evaluated as the best by one metric should also be evaluated as the best by another metric (the consistency of metrics) and one method evaluated as the best using scRNA-seq data should also be evaluated as the best using bulk RNA-seq data or microarray data (the consistency of datasets). Then, we designed a new metric named Area Under normalized CV threshold Curve (AUCVC) and applied it with another metric mSCC to evaluate 14 commonly used normalization methods using both scRNA-seq data and bulk RNA-seq data, satisfying the consistency of metrics and the consistency of datasets. Our findings paved the way to guide future studies in the normalization of gene expression data with its evaluation. The raw gene expression data, normalization methods, and evaluation metrics used in this study have been included in an R package named NormExpression. NormExpression provides a framework and a fast and simple way for researchers to select the best method for the normalization of their gene expression data based on the evaluation of different methods (particularly some data-driven methods or their own methods) in the principle of the consistency of metrics and the consistency of datasets.

  4. f

    Data_Sheet_1_NormExpression: An R Package to Normalize Gene Expression Data...

    • frontiersin.figshare.com
    application/cdfv2
    Updated Jun 1, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao (2023). Data_Sheet_1_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.doc [Dataset]. http://doi.org/10.3389/fgene.2019.00400.s001
    Explore at:
    application/cdfv2Available download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data normalization is a crucial step in the gene expression analysis as it ensures the validity of its downstream analyses. Although many metrics have been designed to evaluate the existing normalization methods, different metrics or different datasets by the same metric yield inconsistent results, particularly for the single-cell RNA sequencing (scRNA-seq) data. The worst situations could be that one method evaluated as the best by one metric is evaluated as the poorest by another metric, or one method evaluated as the best using one dataset is evaluated as the poorest using another dataset. Here raises an open question: principles need to be established to guide the evaluation of normalization methods. In this study, we propose a principle that one normalization method evaluated as the best by one metric should also be evaluated as the best by another metric (the consistency of metrics) and one method evaluated as the best using scRNA-seq data should also be evaluated as the best using bulk RNA-seq data or microarray data (the consistency of datasets). Then, we designed a new metric named Area Under normalized CV threshold Curve (AUCVC) and applied it with another metric mSCC to evaluate 14 commonly used normalization methods using both scRNA-seq data and bulk RNA-seq data, satisfying the consistency of metrics and the consistency of datasets. Our findings paved the way to guide future studies in the normalization of gene expression data with its evaluation. The raw gene expression data, normalization methods, and evaluation metrics used in this study have been included in an R package named NormExpression. NormExpression provides a framework and a fast and simple way for researchers to select the best method for the normalization of their gene expression data based on the evaluation of different methods (particularly some data-driven methods or their own methods) in the principle of the consistency of metrics and the consistency of datasets.

  5. d

    Single cell RNA-seq data of human hESCs to evaluate SCnorm: robust...

    • datamed.org
    Updated Dec 27, 2016
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2016). Single cell RNA-seq data of human hESCs to evaluate SCnorm: robust normalization of single-cell rna-seq data [Dataset]. https://datamed.org/display-item.php?repository=0008&id=5914e6815152c67771b63e90&query=H1-4
    Explore at:
    Dataset updated
    Dec 27, 2016
    Description

    Normalization of RNA-sequencing data is essential for accurate downstream inference, but the assumptions upon which most methods are based do not hold in the single-cell setting. Consequently, applying existing normalization methods to single-cell RNA-seq data introduces artifacts that bias downstream analyses. To address this, we introduce SCnorm for accurate and efficient normalization of scRNA-seq data. Overall design: Total 183 single cells (92 H1 cells, 91 H9 cells), sequenced twice, were used to evaluate SCnorm in normalizing single cell RNA-seq experiments. Total 48 bulk H1 samples were used to compare bulk and single cell properties. For single-cell RNA-seq, the identical single-cell indexed and fragmented cDNA were pooled at 96 cells per lane or at 24 cells per lane to test the effects of sequencing depth, resulting in approximately 1 million and 4 million mapped reads per cell in the two pooling groups, respectively.

  6. f

    File S1 - Normalization of RNA-Sequencing Data from Samples with Varying...

    • datasetcatalog.nlm.nih.gov
    Updated Feb 25, 2014
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Collas, Philippe; Rognes, Torbjørn; Aanes, Håvard; Winata, Cecilia; Moen, Lars F.; Aleström, Peter; Østrup, Olga; Mathavan, Sinnakaruppan (2014). File S1 - Normalization of RNA-Sequencing Data from Samples with Varying mRNA Levels [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001266682
    Explore at:
    Dataset updated
    Feb 25, 2014
    Authors
    Collas, Philippe; Rognes, Torbjørn; Aanes, Håvard; Winata, Cecilia; Moen, Lars F.; Aleström, Peter; Østrup, Olga; Mathavan, Sinnakaruppan
    Description

    Table S1 and Figures S1–S6. Table S1. List of primers. Forward and reverse primers used for qPCR. Figure S1. Changes in total and polyA+ RNA during development. a) Amount of total RNA per embryo at different developmental stages. b) Amount of polyA+ RNA per 100 embryos at different developmental stages. Vertical bars represent standard errors. Figure S2. The TMM scaling factor. a) The TMM scaling factor estimated using dataset 1 and 2. We observe very similar values. b) The TMM scaling factor obtained using the replicates in dataset 2. The TMM values are very reproducible. c) The TMM scale factor when RNA-seq data based on total RNA was used. Figure S3. Comparison of scales. We either square-root transformed or used that scales directly and compared the normalized fold-changes to RT-qPCR results. a) Transcripts with dynamic change pre-ZGA. b) Transcripts with decreased abundance post-ZGA. c) Transcripts with increased expression post-ZGA. Vertical bars represent standard deviations. Figure S4. Comparison of RT-qPCR results depending on RNA template (total or poly+ RNA) and primers (random or oligo(dT) primers) for setd3 (a), gtf2e2 (b) and yy1a (c). The increase pre-ZGA is dependent on template (setd3 and gtf2e2) and not primer type. Figure S5. Efficiency calibrated fold-changes for a subset of transcripts. Vertical bars represent standard deviations. Figure S6. Comparison normalization methods using dataset 2 for transcripts with decreased expression post-ZGA (a) and increased expression post-ZGA (b). Vertical bars represent standard deviations. (PDF)

  7. f

    Table_4_Normalization Methods on Single-Cell RNA-seq Data: An Empirical...

    • frontiersin.figshare.com
    docx
    Updated Jun 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nicholas Lytal; Di Ran; Lingling An (2023). Table_4_Normalization Methods on Single-Cell RNA-seq Data: An Empirical Survey.docx [Dataset]. http://doi.org/10.3389/fgene.2020.00041.s005
    Explore at:
    docxAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Nicholas Lytal; Di Ran; Lingling An
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data normalization is vital to single-cell sequencing, addressing limitations presented by low input material and various forms of bias or noise present in the sequencing process. Several such normalization methods exist, some of which rely on spike-in genes, molecules added in known quantities to serve as a basis for a normalization model. Depending on available information and the type of data, some methods may express certain advantages over others. We compare the effectiveness of seven available normalization methods designed specifically for single-cell sequencing using two real data sets containing spike-in genes and one simulation study. Additionally, we test those methods not dependent on spike-in genes using a real data set with three distinct cell-cycle states and a real data set under the 10X Genomics GemCode platform with multiple cell types represented. We demonstrate the differences in effectiveness for the featured methods using visualization and classification assessment and conclude which methods are preferable for normalizing a certain type of data for further downstream analysis, such as classification or differential analysis. The comparison in computational time for all methods is addressed as well.

  8. f

    RNA-seq Titration Results supporting "Cross-platform normalization enables...

    • plus.figshare.com
    bin
    Updated May 30, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Steven Foltz; Jaclyn Taroni; Casey Greene (2023). RNA-seq Titration Results supporting "Cross-platform normalization enables machine learning model training on microarray and RNA-seq data simultaneously" [Dataset]. http://doi.org/10.25452/figshare.plus.19629864.v5
    Explore at:
    binAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare+
    Authors
    Steven Foltz; Jaclyn Taroni; Casey Greene
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This data accompanies the manuscript "Cross-platform normalization enables machine learning model training on microarray and RNA-seq data simultaneously" by Foltz, Taroni, and Greene https://doi.org/10.1038/s42003-023-04588-6 Please refer to our github page. The file contains all the raw input data, output files needed for plotting, and the intermediate files (including models and normalized data) from one repeat (seed 3274).

    Abstract: Large compendia of gene expression data have proven valuable for the discovery of novel biological relationships. Historically, the majority of available RNA assays were run on microarray, while RNA-seq is now the platform of choice for many new experiments. The data structure and distributions between the platforms differ, making it challenging to combine them directly. Here we perform supervised and unsupervised machine learning evaluations to assess which existing normalization methods are best suited for combining microarray and RNA-seq data. We find that quantile and Training Distribution Matching normalization allow for supervised and unsupervised model training on microarray and RNA-seq data simultaneously. Nonparanormal normalization and z-scores are also appropriate for some applications, including pathway analysis with Pathway-Level Information Extractor (PLIER). We demonstrate that it is possible to perform effective cross-platform normalization using existing methods to combine microarray and RNA-seq data for machine learning applications.

  9. RNA_Seq_Data_Preprocessing_DGE analysis

    • kaggle.com
    zip
    Updated Nov 29, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dr. Nagendra (2025). RNA_Seq_Data_Preprocessing_DGE analysis [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/rna-seq-data-preprocessing-dge-analysis
    Explore at:
    zip(75256 bytes)Available download formats
    Dataset updated
    Nov 29, 2025
    Authors
    Dr. Nagendra
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains RNA-Seq data preprocessing and differential gene expression (DGE) analysis.

    It is designed for researchers, bioinformaticians, and students interested in transcriptomics.

    The dataset includes raw count data and step-by-step preprocessing instructions.

    It demonstrates quality control, normalization, and filtering of RNA-Seq data.

    Differential expression analysis using popular tools and methods is included.

    Results include differentially expressed genes with statistical significance.

    It provides visualizations like PCA plots, heatmaps, and volcano plots.

    The dataset is suitable for learning and reproducing RNA-Seq workflows.

    Both human-readable explanations and code snippets are included for guidance.

    It can serve as a reference for new RNA-Seq projects and research pipelines.

  10. Global Ngs-Based Rna-Seq Growth Analysis - Size and Forecast 2024 - 2028 |...

    • technavio.com
    pdf
    Updated Aug 15, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Technavio (2024). Global Ngs-Based Rna-Seq Growth Analysis - Size and Forecast 2024 - 2028 | Technavio [Dataset]. https://www.technavio.com/report/ngs-based-rna-seq-market-analysis
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Aug 15, 2024
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-noticehttps://www.technavio.com/content/privacy-notice

    Time period covered
    2024 - 2028
    Description

    snapshot-tab-pane NGS-Based Rna-Seq Market Size 2024-2028The NGS-based RNA-seq market size is forecast to increase by USD 6.66 billion, at a CAGR of 20.52% between 2023 and 2028.The market is witnessing significant growth, driven by the increased adoption of next-generation sequencing (NGS) methods for RNA-Seq analysis. The advanced capabilities of NGS techniques, such as high-throughput, cost-effectiveness, and improved accuracy, have made them the preferred choice for researchers and clinicians in various fields, including genomics, transcriptomics, and personalized medicine. However, the market faces challenges, primarily from the lack of clinical validation on direct-to-consumer genetic tests. As the use of NGS technology in consumer applications expands, ensuring the accuracy and reliability of results becomes crucial.The absence of standardized protocols and regulatory oversight in this area poses a significant challenge to market growth and trust. Companies seeking to capitalize on market opportunities must focus on addressing these challenges through collaborations, partnerships, and investments in research and development to ensure the clinical validity and reliability of their NGS-based RNA-Seq offerings.What will be the Size of the NGS-based RNA-Seq market during the forecast period?Explore in-depth regional segment analysis with market size data - historical 2018-2022 and forecasts 2024-2028 - in the full report. Request Free SampleThe market continues to evolve, driven by advancements in NGS technology and its applications across various sectors. Spatial transcriptomics, a novel approach to studying gene expression in its spatial context, is gaining traction in disease research and precision medicine. Splice junction detection, a critical component of RNA-seq data analysis, enhances the accuracy of gene expression profiling and differential gene expression studies. Cloud computing plays a pivotal role in handling the massive amounts of data generated by NGS platforms, enabling real-time data analysis and storage. Enrichment analysis, gene ontology, and pathway analysis facilitate the interpretation of RNA-seq data, while data normalization and quality control ensure the reliability of results.Precision medicine and personalized therapy are key applications of RNA-seq, with single-cell RNA-seq offering unprecedented insights into the complexities of gene expression at the single-cell level. Read alignment and variant calling are essential steps in RNA-seq data analysis, while bioinformatics pipelines and RNA-seq software streamline the process. NGS technology is revolutionizing drug discovery by enabling the identification of biomarkers and gene fusion detection in various diseases, including cancer and neurological disorders. RNA-seq is also finding applications in infectious diseases, microbiome analysis, environmental monitoring, agricultural genomics, and forensic science. Sequencing costs are decreasing, making RNA-seq more accessible to researchers and clinicians.The ongoing development of sequencing platforms, library preparation, and sample preparation kits continues to drive innovation in the field. The dynamic nature of the market ensures that it remains a vibrant and evolving field, with ongoing research and development in areas such as data visualization, clinical trials, and sequencing depth.How is this NGS-based RNA-Seq industry segmented?The NGS-based RNA-seq industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in "USD million" for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.End-user Acamedic and research centersClinical researchPharma companiesHospitalsTechnology Sequencing by synthesisIon semiconductor sequencingSingle-molecule real-time sequencingOthersGeography North America USEurope GermanyUKAPAC ChinaSingaporeRest of World (ROW) . By End-user InsightsThe acamedic and research centers segment is estimated to witness significant growth during the forecast period.The global next-generation sequencing (NGS) market for RNA sequencing (RNA-Seq) is primarily driven by academic and research institutions, including those from universities, research institutes, government entities, biotechnology organizations, and pharmaceutical companies. These institutions utilize NGS technology for various research applications, such as whole-genome sequencing, epigenetics, and emerging fields like agrigenomics and animal research, to enhance crop yield and nutritional composition. NGS-based RNA-Seq plays a pivotal role in translational research, with significant investments from both private and public organizations fueling its growth. The technology is instrumental in disease research, enabling the identificati

  11. Z

    Dataset for: A graph-based algorithm for RNA-seq data normalization

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Diem-Trang Tran (2020). Dataset for: A graph-based algorithm for RNA-seq data normalization [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2667313
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    University of Utah
    Authors
    Diem-Trang Tran
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    mRNA-seq assays on mouse tissues were downloaded from the ENCODE project and consolidated into matrices of expression

  12. Summary of current normalization methods to correct the technical biases for...

    • figshare.com
    xls
    Updated Jun 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Farnoosh Abbas-Aghababazadeh; Qian Li; Brooke L. Fridley (2023). Summary of current normalization methods to correct the technical biases for RNA-Seq data. [Dataset]. http://doi.org/10.1371/journal.pone.0206312.t001
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Farnoosh Abbas-Aghababazadeh; Qian Li; Brooke L. Fridley
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary of current normalization methods to correct the technical biases for RNA-Seq data.

  13. Processed data - DegNorm: Normalization of generalized transcript...

    • zenodo.org
    zip
    Updated Jan 24, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bin Xiong; Yiben Yang; Frank R Fineis; Ji-Ping Wang; Bin Xiong; Yiben Yang; Frank R Fineis; Ji-Ping Wang (2020). Processed data - DegNorm: Normalization of generalized transcript degradation improves accuracy in RNA-seq analysis [Dataset]. http://doi.org/10.5281/zenodo.2595303
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Bin Xiong; Yiben Yang; Frank R Fineis; Ji-Ping Wang; Bin Xiong; Yiben Yang; Frank R Fineis; Ji-Ping Wang
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Processed data from DegNorm:

    • "_raw.txt": raw read counts matrix;
    • "_DI.txt": Degradation index score matrix;
    • "_DegNorm.txt": normalized read counts matrix from DegNorm output;
    • "_coverage.Rdata": list of coverage matrix for the sample;
    • "_countsTIN.txt": TIN normalized counts.

  14. Datasets, reproducible codes, and results for evaluating differential...

    • zenodo.org
    zip
    Updated May 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Boris P Hejblum; Boris P Hejblum; Kalidou Ba; Rodolphe Thiébaut; Denis Agniel; Kalidou Ba; Rodolphe Thiébaut; Denis Agniel (2023). Datasets, reproducible codes, and results for evaluating differential expression analysis methods on population-level RNA-seq data [Dataset]. http://doi.org/10.5281/zenodo.6554347
    Explore at:
    zipAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Boris P Hejblum; Boris P Hejblum; Kalidou Ba; Rodolphe Thiébaut; Denis Agniel; Kalidou Ba; Rodolphe Thiébaut; Denis Agniel
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This upload contains the necessary R codes and data to reproduce the FDR and Power results described in our correspondence "Neglecting normalization impact in semi-synthetic RNA-seq data simulation generates artificial false positives" to Li Y, Ge X, Peng F, Li W, Li JJ, Exaggerated false positives by popular differential expression methods when analyzing human population samples, Genome Biology 23, 79, 2022, DOI: 10.1186/s13059-022-02648-4.

  15. u

    Data from: Reference transcriptomics of porcine peripheral immune cells...

    • agdatacommons.nal.usda.gov
    • datasets.ai
    • +1more
    zip
    Updated Nov 21, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Juber Herrera-Uribe; Jayne Wiarda; Sathesh K. Sivasankaran; Lance Daharsh; Haibo Liu; Kristen A. Byrne; Timothy P. L. Smith; Joan K. Lunney; Crystal L. Loving; Christopher K. Tuggle (2025). Data from: Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing [Dataset]. http://doi.org/10.15482/USDA.ADC/1522411
    Explore at:
    zipAvailable download formats
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    Ag Data Commons
    Authors
    Juber Herrera-Uribe; Jayne Wiarda; Sathesh K. Sivasankaran; Lance Daharsh; Haibo Liu; Kristen A. Byrne; Timothy P. L. Smith; Joan K. Lunney; Crystal L. Loving; Christopher K. Tuggle
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains files reconstructing single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) in order to provide a reference annotation of porcine immune cell transcriptomics at enhanced, single-cell resolution. Analysis of single-cell data allowed identification of 36 cell clusters that were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. Files may be used to reconstruct the data as presented in the manuscript, allowing for individual query by other users. Scripts for original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464). Resources in this dataset:Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format. File Name: PBMC7_AllCells.zipResource Description: Zipped folder containing PBMC counts matrix, gene names, and cell IDs. Files are as follows:

    matrix of gene counts* (matrix.mtx.gx) gene names (features.tsv.gz) cell IDs (barcodes.tsv.gz)

    *The ‘raw’ count matrix is actually gene counts obtained following ambient RNA removal. During ambient RNA removal, we specified to calculate non-integer count estimations, so most gene counts are actually non-integer values in this matrix but should still be treated as raw/unnormalized data that requires further normalization/transformation. Data can be read into R using the function Read10X().Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata. File Name: PBMC7_AllCells_meta.csvResource Description: .csv file containing metadata for cells included in the final dataset. Metadata columns include:

    nCount_RNA = the number of transcripts detected in a cell nFeature_RNA = the number of genes detected in a cell Loupe = cell barcodes; correspond to the cell IDs found in the .h5Seurat and 10X formatted objects for all cells prcntMito = percent mitochondrial reads in a cell Scrublet = doublet probability score assigned to a cell seurat_clusters = cluster ID assigned to a cell PaperIDs = sample ID for a cell celltypes = cell type ID assigned to a cellResource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates. File Name: PBMC7_AllCells_PCAcoord.csvResource Description: .csv file containing first 100 PCA coordinates for cells. Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates. File Name: PBMC7_AllCells_tSNEcoord.csvResource Description: .csv file containing t-SNE coordinates for all cells.Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates. File Name: PBMC7_AllCells_UMAPcoord.csvResource Description: .csv file containing UMAP coordinates for all cells.Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates. File Name: PBMC7_CD4only_tSNEcoord.csvResource Description: .csv file containing t-SNE coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells UMAP Coordinates. File Name: PBMC7_CD4only_UMAPcoord.csvResource Description: .csv file containing UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells UMAP Coordinates. File Name: PBMC7_GDonly_UMAPcoord.csvResource Description: .csv file containing UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates. File Name: PBMC7_GDonly_tSNEcoord.csvResource Description: .csv file containing t-SNE coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information. File Name: UnfilteredGeneInfo.txtResource Description: .txt file containing gene nomenclature information used to assign gene names in the dataset. 'Name' column corresponds to the name assigned to a feature in the dataset.Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat. File Name: PBMC7.tarResource Description: .h5Seurat object of all cells in PBMC dataset. File needs to be untarred, then read into R using function LoadH5Seurat().

  16. d

    Extended data tables to Haering and Habermann, F1000Res, RNfuzzyApp: an R...

    • search.dataone.org
    • data.niaid.nih.gov
    • +1more
    Updated May 18, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bianca Habermann; Margaux Haering (2025). Extended data tables to Haering and Habermann, F1000Res, RNfuzzyApp: an R shiny RNA-seq data analysis app for visualisation, differential expression analysis, time-series clustering and enrichment analysis [Dataset]. http://doi.org/10.5061/dryad.8pk0p2nnd
    Explore at:
    Dataset updated
    May 18, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Bianca Habermann; Margaux Haering
    Time period covered
    Jul 8, 2021
    Description

    BackgroundÂ

    RNA-seq is a widely adopted affordable method for large scale gene expression profiling. However, user-friendly and versatile tools for wet-lab biologists to analyse RNA-seq data beyond standard analyses such as differential expression, are rare. Especially, the analysis of time-series data is difficult for wet-lab biologists lacking advanced computational training. Furthermore, most meta-analysis tools are tailored for model organisms and not easily adaptable to other species.

    Results

    With RNfuzzyApp, we provide a user-friendly, web-based R-shiny app for differential expression analysis, as well as time-series analysis of RNA-seq data. RNfuzzyApp offers several methods for normalization and differential expression analysis of RNA-seq data, providing easy-to-use toolboxes, interactive plots and downloadable results. For time-series analysis, RNfuzzyApp presents the first web-based, automated pipeline for soft clustering with the Mfuzz R package, including methods to...

  17. f

    Data_Sheet_1_SMIXnorm: Fast and Accurate RNA-Seq Data Normalization for...

    • frontiersin.figshare.com
    pdf
    Updated Jun 11, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shen Yin; Xiaowei Zhan; Bo Yao; Guanghua Xiao; Xinlei Wang; Yang Xie (2023). Data_Sheet_1_SMIXnorm: Fast and Accurate RNA-Seq Data Normalization for Formalin-Fixed Paraffin-Embedded Samples.PDF [Dataset]. http://doi.org/10.3389/fgene.2021.650795.s001
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Jun 11, 2023
    Dataset provided by
    Frontiers
    Authors
    Shen Yin; Xiaowei Zhan; Bo Yao; Guanghua Xiao; Xinlei Wang; Yang Xie
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    RNA-sequencing (RNA-seq) provides a comprehensive quantification of transcriptomic activities in biological samples. Formalin-Fixed Paraffin-Embedded (FFPE) samples are collected as part of routine clinical procedure, and are the most widely available biological sample format in medical research and patient care. Normalization is an essential step in RNA-seq data analysis. A number of normalization methods, though developed for RNA-seq data from fresh frozen (FF) samples, can be used with FFPE samples as well. The only extant normalization method specifically designed for FFPE RNA-seq data, MIXnorm, which has been shown to outperform the normalization methods, but at the cost of a complex mixture model and a high computational burden. It is therefore important to adapt MIXnorm for simplicity and computational efficiency while maintaining superior performance. Furthermore, it is critical to develop an integrated tool that performs commonly used normalization methods for both FF and FFPE RNA-seq data. We developed a new normalization method for FFPE RNA-seq data, named SMIXnorm, based on a simplified two-component mixture model compared to MIXnorm to facilitate computation. The expression levels of expressed genes are modeled by normal distributions without truncation, and those of non-expressed genes are modeled by zero-inflated Poisson distributions. The maximum likelihood estimates of the model parameters are obtained by a nested Expectation-Maximization algorithm with a less complicated latent variable structure, and closed-form updates are available within each iteration. Real data applications and simulation studies show that SMIXnorm greatly reduces computing time compared to MIXnorm, without sacrificing the performance. More importantly, we developed a web-based tool, RNA-seq Normalization (RSeqNorm), that offers a simple workflow to compute normalized RNA-seq data for both FFPE and FF samples. It includes SMIXnorm and MIXnorm for FFPE RNA-seq data, together with five commonly used normalization methods for FF RNA-seq data. Users can easily upload a raw RNA-seq count matrix and select one of the seven normalization methods to produce a downloadable normalized expression matrix for any downstream analysis. The R package is available at https://github.com/S-YIN/RSEQNORM. The web-based tool, RSeqNorm is available at http://lce.biohpc.swmed.edu/rseqnorm with no restriction to use or redistribute.

  18. scRNA-seq human embryonic stem H1, H9 cell lines

    • kaggle.com
    zip
    Updated Jul 28, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alexander Chervov (2021). scRNA-seq human embryonic stem H1, H9 cell lines [Dataset]. https://www.kaggle.com/alexandervc/scrnaseq-human-embryonic-stem-h1-h9-cell-lines
    Explore at:
    zip(65064956 bytes)Available download formats
    Dataset updated
    Jul 28, 2021
    Authors
    Alexander Chervov
    Description

    Remark: for cell cycle analysis - see paper https://arxiv.org/abs/2208.05229 "Computational challenges of cell cycle analysis using single cell transcriptomics" Alexander Chervov, Andrei Zinovyev

    Data and Context

    Data - results of single cell RNA sequencing, i.e. rows - correspond to cells, columns to genes (csv file is vice versa). value of the matrix shows how strong is "expression" of the corresponding gene in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics

    Particular data from: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE76381 There are original TXT files and reconversion to *.h5ad format which is more easy to work with. There are several subdatasets human/mouse/different cell types.

    Paper: SCnorm: robust normalization of single-cell RNA-seq data https://pubmed.ncbi.nlm.nih.gov/28418000/ Bacher R, Chu LF, Leng N, Gasch AP et al. SCnorm: robust normalization of single-cell RNA-seq data. Nat Methods 2017 Jun;14(6):584-586

    Abstract: The normalization of RNA-seq data is essential for accurate downstream inference, but the assumptions upon which most normalization methods are based are not applicable in the single-cell setting. Consequently, applying existing normalization methods to single-cell RNA-seq data introduces artifacts that bias downstream analyses. To address this, we introduce SCnorm for accurate and efficient normalization of single-cell RNA-seq data.

    Total 183 single cells (92 H1 cells, 91 H9 cells), sequenced twice, were used to evaluate SCnorm in normalizing single cell RNA-seq experiments. Total 48 bulk H1 samples were used to compare bulk and single cell properties. For single-cell RNA-seq, the identical single-cell indexed and fragmented cDNA were pooled at 96 cells per lane or at 24 cells per lane to test the effects of sequencing depth, resulting in approximately 1 million and 4 million mapped reads per cell in the two pooling groups, respectively.

    Inspiration

    Single cell RNA sequencing is important technology in modern biology, see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6

    Also see review : Nature. P. Kharchenko: "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x

  19. f

    Additional file 2: of Comparing the normalization methods for the...

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Dec 15, 2016
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ryu, Keun; Piao, Yongjun; Shon, Ho; Li, Peipei (2016). Additional file 2: of Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001915419
    Explore at:
    Dataset updated
    Dec 15, 2016
    Authors
    Ryu, Keun; Piao, Yongjun; Shon, Ho; Li, Peipei
    Description

    Detailed spearman correlation coefficient results for all normalization methods. (XLSX 17Â kb)

  20. f

    Gene counts normalization of the raw RNAseq data.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jan 6, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Raju, Asha; Abu-Shach, Ulrike Bening; Broday, Limor; Carvalho, Cátia A.; Vershinin, Zlata; Boxem, Mike; Levy, Dan (2025). Gene counts normalization of the raw RNAseq data. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001277044
    Explore at:
    Dataset updated
    Jan 6, 2025
    Authors
    Raju, Asha; Abu-Shach, Ulrike Bening; Broday, Limor; Carvalho, Cátia A.; Vershinin, Zlata; Boxem, Mike; Levy, Dan
    Description

    Normalization was done using “DESeq2” R package. The WT samples are LBN207, LBN208, LBN209. The ulp-2(tv380) samples are LBNX39910, LBNX39911, LBNX39912. S2C Fig. (CSV)

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Diem-Trang Tran; Aditya Bhaskara; Balagurunathan Kuberan; Matthew Might (2023). A graph-based algorithm for RNA-seq data normalization [Dataset]. http://doi.org/10.1371/journal.pone.0227760

A graph-based algorithm for RNA-seq data normalization

Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
pdfAvailable download formats
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Diem-Trang Tran; Aditya Bhaskara; Balagurunathan Kuberan; Matthew Might
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The use of RNA-sequencing has garnered much attention in recent years for characterizing and understanding various biological systems. However, it remains a major challenge to gain insights from a large number of RNA-seq experiments collectively, due to the normalization problem. Normalization has been challenging due to an inherent circularity, requiring that RNA-seq data be normalized before any pattern of differential (or non-differential) expression can be ascertained; meanwhile, the prior knowledge of non-differential transcripts is crucial to the normalization process. Some methods have successfully overcome this problem by the assumption that most transcripts are not differentially expressed. However, when RNA-seq profiles become more abundant and heterogeneous, this assumption fails to hold, leading to erroneous normalization. We present a normalization procedure that does not rely on this assumption, nor prior knowledge about the reference transcripts. This algorithm is based on a graph constructed from intrinsic correlations among RNA-seq transcripts and seeks to identify a set of densely connected vertices as references. Application of this algorithm on our synthesized validation data showed that it could recover the reference transcripts with high precision, thus resulting in high-quality normalization. On a realistic data set from the ENCODE project, this algorithm gave good results and could finish in a reasonable time. These preliminary results imply that we may be able to break the long persisting circularity problem in RNA-seq normalization.

Search
Clear search
Close search
Google apps
Main menu