100+ datasets found
  1. Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data...

    • frontiersin.figshare.com
    zip
    Updated Jun 1, 2023
    Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao (2023). Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.zip [Dataset]. http://doi.org/10.3389/fgene.2019.00400.s002
    Available download formats: zip
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data normalization is a crucial step in gene expression analysis because it underpins the validity of downstream analyses. Although many metrics have been designed to evaluate existing normalization methods, different metrics, or the same metric applied to different datasets, yield inconsistent results, particularly for single-cell RNA sequencing (scRNA-seq) data. In the worst case, a method evaluated as the best by one metric is evaluated as the poorest by another, or a method evaluated as the best on one dataset is evaluated as the poorest on another. This raises an open question: what principles should guide the evaluation of normalization methods? In this study, we propose that a normalization method evaluated as the best by one metric should also be evaluated as the best by another metric (the consistency of metrics), and that a method evaluated as the best using scRNA-seq data should also be evaluated as the best using bulk RNA-seq or microarray data (the consistency of datasets). We then designed a new metric, the Area Under normalized CV threshold Curve (AUCVC), and applied it together with another metric, mSCC, to evaluate 14 commonly used normalization methods on both scRNA-seq and bulk RNA-seq data, satisfying both the consistency of metrics and the consistency of datasets. These findings are intended to guide future studies on normalizing gene expression data and evaluating normalization methods. The raw gene expression data, normalization methods, and evaluation metrics used in this study are included in an R package named NormExpression, which gives researchers a fast and simple framework for selecting the best normalization method for their own gene expression data by evaluating candidate methods (particularly data-driven methods or their own methods) under the principles of consistency of metrics and consistency of datasets.
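    The consistency-of-metrics principle can be checked mechanically: rank the candidate methods under each metric and test whether every metric picks the same winner. The Python sketch below assumes each metric has already been reduced to one score per method and that higher is better (a lower-is-better metric would need its sign flipped first). The method names and scores are hypothetical placeholders, not results from the study, and the functions are not part of the NormExpression API. The same check covers the consistency of datasets if each dictionary holds one metric's scores on a different dataset.

```python
from typing import Dict


def best_method(scores: Dict[str, float]) -> str:
    """Return the method with the highest score (assumes higher is better)."""
    return max(scores, key=scores.get)


def metrics_consistent(*metric_scores: Dict[str, float]) -> bool:
    """True if every metric (or dataset) selects the same best method."""
    winners = {best_method(s) for s in metric_scores}
    return len(winners) == 1


# Hypothetical scores for three methods under two metrics
metric_a = {"UQ": 0.81, "TMM": 0.78, "CPM": 0.74}
metric_b = {"UQ": 0.92, "TMM": 0.95, "CPM": 0.70}

print(best_method(metric_a))                    # UQ
print(best_method(metric_b))                    # TMM
print(metrics_consistent(metric_a, metric_b))  # False
```

    Under the proposed principle, the hypothetical pair above would fail the check, since the two metrics disagree on the best method.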

  2. Table_1_Comparison of Normalization Methods for Analysis of TempO-Seq...

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    • +1more
    xlsx
    Updated Jun 3, 2023
    Pierre R. Bushel; Stephen S. Ferguson; Sreenivasa C. Ramaiahgari; Richard S. Paules; Scott S. Auerbach (2023). Table_1_Comparison of Normalization Methods for Analysis of TempO-Seq Targeted RNA Sequencing Data.XLSX [Dataset]. http://doi.org/10.3389/fgene.2020.00594.s001
    Available download formats: xlsx
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Pierre R. Bushel; Stephen S. Ferguson; Sreenivasa C. Ramaiahgari; Richard S. Paules; Scott S. Auerbach
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of bulk RNA sequencing (RNA-Seq) data is a valuable tool for understanding transcription at the genome scale. Targeted sequencing of RNA has emerged as a practical means of assessing the majority of the transcriptomic space with less reliance on large resources for consumables and bioinformatics. TempO-Seq is a templated, multiplexed RNA-Seq platform that interrogates a panel of sentinel genes representative of genome-wide transcription. Nuances of the technology require proper preprocessing of the data. Various methods have been proposed and compared for normalizing bulk RNA-Seq data, but there has been little to no investigation of how the methods perform on TempO-Seq data. We simulated count data for two groups (treated vs. untreated) at seven fold-change (FC) levels (including no change) using control samples from human HepaRG cells run on TempO-Seq, and normalized the data using seven normalization methods. Upper Quartile (UQ) performed the best with regard to maintaining FC levels, as detected by a limma contrast between the treated and untreated groups. For all FC levels, the specificity of UQ normalization was greater than 0.84, and the sensitivity was greater than 0.90 except at the no-change and +1.5 levels. Furthermore, K-means clustering of the simulated genes normalized by UQ agreed the most with the FC assignments [adjusted Rand index (ARI) = 0.67]. Despite assuming that the majority of genes are unchanged, the DESeq2 scaling-factor normalization method performed reasonably well, as did the simple normalization procedures counts per million (CPM) and total counts (TC). These results suggest that for two-class comparisons of TempO-Seq data, UQ, CPM, TC, or DESeq2 normalization should provide reasonably reliable results at absolute FC levels ≥2.0. These findings will help guide researchers in normalizing TempO-Seq gene expression data for more reliable results.
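    The simpler procedures compared above can each be written in a few lines. This is a minimal Python sketch, not the code used in the study: CPM scales each sample to one million reads, TC rescales every sample to the mean library size, and UQ divides by each sample's 75th percentile of non-zero counts before rescaling to the mean upper quartile. Implementations differ in how zeros are filtered and how factors are rescaled, so treat those choices here as assumptions.

```python
from statistics import mean, quantiles
from typing import List

Sample = List[float]  # read counts for one sample, genes in a fixed order


def cpm(sample: Sample) -> Sample:
    """Counts per million: divide by library size and multiply by 1e6."""
    lib = sum(sample)
    return [c / lib * 1e6 for c in sample]


def total_count(samples: List[Sample]) -> List[Sample]:
    """Total-count (TC) scaling: rescale each sample to the mean library size."""
    libs = [sum(s) for s in samples]
    target = mean(libs)
    return [[c * (target / lib) for c in s] for s, lib in zip(samples, libs)]


def upper_quartile_value(sample: Sample) -> float:
    """75th percentile of the non-zero counts (linear interpolation)."""
    nonzero = [c for c in sample if c > 0]
    return quantiles(nonzero, n=4, method="inclusive")[2]


def upper_quartile(samples: List[Sample]) -> List[Sample]:
    """UQ scaling: divide by each sample's upper quartile, rescale to their mean."""
    uqs = [upper_quartile_value(s) for s in samples]
    target = mean(uqs)
    return [[c * (target / uq) for c in s] for s, uq in zip(samples, uqs)]


counts = [[0, 10, 20, 30, 40], [0, 20, 40, 60, 80]]  # hypothetical counts
print(total_count(counts))
print(upper_quartile(counts))
```

    Because the second hypothetical sample is a uniformly doubled copy of the first, both TC and UQ map the two samples onto identical profiles; the methods only diverge when a few genes dominate one library.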

  3. File S1 - Normalization of RNA-Sequencing Data from Samples with Varying...

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    pdf
    Updated May 30, 2023
    Håvard Aanes; Cecilia Winata; Lars F. Moen; Olga Østrup; Sinnakaruppan Mathavan; Philippe Collas; Torbjørn Rognes; Peter Aleström (2023). File S1 - Normalization of RNA-Sequencing Data from Samples with Varying mRNA Levels [Dataset]. http://doi.org/10.1371/journal.pone.0089158.s001
    Available download formats: pdf
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Håvard Aanes; Cecilia Winata; Lars F. Moen; Olga Østrup; Sinnakaruppan Mathavan; Philippe Collas; Torbjørn Rognes; Peter Aleström
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Table S1 and Figures S1–S6. Table S1. List of primers. Forward and reverse primers used for qPCR. Figure S1. Changes in total and polyA+ RNA during development. a) Amount of total RNA per embryo at different developmental stages. b) Amount of polyA+ RNA per 100 embryos at different developmental stages. Vertical bars represent standard errors. Figure S2. The TMM scaling factor. a) The TMM scaling factor estimated using datasets 1 and 2. We observe very similar values. b) The TMM scaling factor obtained using the replicates in dataset 2. The TMM values are very reproducible. c) The TMM scaling factor when RNA-seq data based on total RNA was used. Figure S3. Comparison of scales. We either square-root transformed the scales or used them directly, and compared the normalized fold-changes to RT-qPCR results. a) Transcripts with dynamic change pre-ZGA. b) Transcripts with decreased abundance post-ZGA. c) Transcripts with increased expression post-ZGA. Vertical bars represent standard deviations. Figure S4. Comparison of RT-qPCR results depending on RNA template (total or polyA+ RNA) and primers (random or oligo(dT) primers) for setd3 (a), gtf2e2 (b) and yy1a (c). The increase pre-ZGA depends on the template (setd3 and gtf2e2) and not on the primer type. Figure S5. Efficiency-calibrated fold-changes for a subset of transcripts. Vertical bars represent standard deviations. Figure S6. Comparison of normalization methods using dataset 2 for transcripts with decreased expression post-ZGA (a) and increased expression post-ZGA (b). Vertical bars represent standard deviations. (PDF)
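    Figure S2 refers to the TMM (trimmed mean of M-values) scaling factor. As a rough illustration, the Python sketch below computes a TMM-style factor for one sample against a reference: per-gene log-ratios (M) and mean log-expression (A) are computed, the most extreme 30% of M-values and 5% of A-values are trimmed from each end, and the remaining M-values are averaged with inverse-variance weights. The trim fractions mirror the documented defaults of edgeR's `calcNormFactors`, but this is a simplified sketch, not edgeR's implementation: tie handling differs, and edgeR additionally rescales the factors across samples.

```python
import math
from typing import List


def tmm_factor(sample: List[float], ref: List[float],
               m_trim: float = 0.3, a_trim: float = 0.05) -> float:
    """Simplified TMM-style scaling factor of `sample` relative to `ref`."""
    n_k, n_r = sum(sample), sum(ref)
    stats = []  # (M, A, weight) for each gene observed in both libraries
    for y_k, y_r in zip(sample, ref):
        if y_k <= 0 or y_r <= 0:
            continue  # M and A are undefined for zero counts
        p_k, p_r = y_k / n_k, y_r / n_r
        m = math.log2(p_k / p_r)
        a = 0.5 * math.log2(p_k * p_r)
        # Inverse of the approximate (delta-method) variance of M
        w = 1.0 / ((n_k - y_k) / (n_k * y_k) + (n_r - y_r) / (n_r * y_r))
        stats.append((m, a, w))

    n = len(stats)

    def middle(col: int, frac: float) -> set:
        """Indices left after dropping int(n * frac) genes from each end."""
        order = sorted(range(n), key=lambda i: stats[i][col])
        cut = int(n * frac)
        return set(order[cut:n - cut])

    keep = middle(0, m_trim) & middle(1, a_trim)
    if not keep:
        raise ValueError("no genes left after trimming")
    log_ratio = (sum(stats[i][2] * stats[i][0] for i in keep)
                 / sum(stats[i][2] for i in keep))
    return 2 ** log_ratio


ref = [10] * 10
sample = [10] * 9 + [1000]  # hypothetical: one gene dominates the sample
print(round(tmm_factor(sample, ref), 3))
```

    In this hypothetical example nine of ten genes are unchanged and one is very highly expressed in the sample. The trimmed mean ignores that gene, so the factor (about 0.092) brings the sample's effective library size back to the reference's, which total-count scaling alone would not do.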

  4. Single cell RNA-seq data of human hESCs to evaluate SCnorm: robust...

    • data.niaid.nih.gov
    Updated May 15, 2019
    Bacher R; Chu L; Kendziorski C; Swanson S (2019). Single cell RNA-seq data of human hESCs to evaluate SCnorm: robust normalization of single-cell rna-seq data [Dataset]. https://data.niaid.nih.gov/resources?id=gse85917
    Dataset updated
    May 15, 2019
    Dataset provided by
    University of Florida
    Authors
    Bacher R; Chu L; Kendziorski C; Swanson S
    Description

    Normalization of RNA-sequencing data is essential for accurate downstream inference, but the assumptions upon which most methods are based do not hold in the single-cell setting. Consequently, applying existing normalization methods to single-cell RNA-seq data introduces artifacts that bias downstream analyses. To address this, we introduce SCnorm for accurate and efficient normalization of scRNA-seq data. A total of 183 single cells (92 H1 cells, 91 H9 cells), each sequenced twice, were used to evaluate SCnorm in normalizing single-cell RNA-seq experiments. A total of 48 bulk H1 samples were used to compare bulk and single-cell properties. For single-cell RNA-seq, the same indexed and fragmented single-cell cDNA was pooled at either 96 or 24 cells per lane to test the effects of sequencing depth, resulting in approximately 1 million and 4 million mapped reads per cell in the two pooling groups, respectively.

  5. Additional file 2: of Comparing the normalization methods for the...

    • datasetcatalog.nlm.nih.gov
    • springernature.figshare.com
    Updated Dec 15, 2016
    Ryu, Keun; Piao, Yongjun; Shon, Ho; Li, Peipei (2016). Additional file 2: of Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001915419
    Dataset updated
    Dec 15, 2016
    Authors
    Ryu, Keun; Piao, Yongjun; Shon, Ho; Li, Peipei
    Description

    Detailed Spearman correlation coefficient results for all normalization methods. (XLSX 17 kb)
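    For reference, Spearman's coefficient is the Pearson correlation of the two variables' ranks, with tied values sharing their average rank. A minimal, dependency-free Python sketch follows; which quantities the authors correlated is not stated in this description, so the inputs in the example are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation (raises ZeroDivisionError for constant input)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical normalized fold changes vs. reference measurements
print(spearman([1.2, 0.8, 2.5, 3.1], [1.0, 0.9, 2.0, 4.0]))
```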

  6. Dataset for: A graph-based algorithm for RNA-seq data normalization

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Diem-Trang Tran (2020). Dataset for: A graph-based algorithm for RNA-seq data normalization [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_2667313
    Dataset updated
    Jan 24, 2020
    Dataset authored and provided by
    Diem-Trang Tran
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    mRNA-seq assays on mouse tissues were downloaded from the ENCODE project and consolidated into expression matrices.

  7. Processed data - DegNorm: Normalization of generalized transcript...

    • zenodo.org
    zip
    Updated Jan 24, 2020
    Bin Xiong; Yiben Yang; Frank R Fineis; Ji-Ping Wang (2020). Processed data - DegNorm: Normalization of generalized transcript degradation improves accuracy in RNA-seq analysis [Dataset]. http://doi.org/10.5281/zenodo.2595303
    Available download formats: zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Bin Xiong; Yiben Yang; Frank R Fineis; Ji-Ping Wang
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Processed data from DegNorm:

    • "_raw.txt": raw read counts matrix;
    • "_DI.txt": Degradation index score matrix;
    • "_DegNorm.txt": normalized read counts matrix from DegNorm output;
    • "_coverage.Rdata": list of coverage matrix for the sample;
    • "_countsTIN.txt": TIN normalized counts.

  8. Summary of current normalization methods to correct the technical biases for...

    • figshare.com
    xls
    Updated Jun 1, 2023
    Farnoosh Abbas-Aghababazadeh; Qian Li; Brooke L. Fridley (2023). Summary of current normalization methods to correct the technical biases for RNA-Seq data. [Dataset]. http://doi.org/10.1371/journal.pone.0206312.t001
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Farnoosh Abbas-Aghababazadeh; Qian Li; Brooke L. Fridley
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary of current normalization methods to correct the technical biases for RNA-Seq data.

  9. Gene counts normalization of the raw RNAseq data.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jan 6, 2025
    Raju, Asha; Abu-Shach, Ulrike Bening; Broday, Limor; Carvalho, Cátia A.; Vershinin, Zlata; Boxem, Mike; Levy, Dan (2025). Gene counts normalization of the raw RNAseq data. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001277044
    Dataset updated
    Jan 6, 2025
    Authors
    Raju, Asha; Abu-Shach, Ulrike Bening; Broday, Limor; Carvalho, Cátia A.; Vershinin, Zlata; Boxem, Mike; Levy, Dan
    Description

    Normalization was done using “DESeq2” R package. The WT samples are LBN207, LBN208, LBN209. The ulp-2(tv380) samples are LBNX39910, LBNX39911, LBNX39912. S2C Fig. (CSV)
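    DESeq2's default size factors follow the median-of-ratios idea: each gene's counts are divided by that gene's geometric mean across samples, and a sample's size factor is the median of those ratios over genes with non-zero counts in every sample. The Python sketch below reimplements that idea for illustration only; in practice the R package's `estimateSizeFactors` and `counts(dds, normalized = TRUE)` would be used. The sample names reuse two IDs from this description, but the counts are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List

Counts = Dict[str, List[float]]  # sample name -> counts, genes in a fixed order


def size_factors(counts: Counts) -> Dict[str, float]:
    """Median-of-ratios size factors (genes with any zero count are skipped)."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    log_geo_means = []  # (gene index, log geometric mean across samples)
    for g in range(n_genes):
        vals = [counts[s][g] for s in samples]
        if all(v > 0 for v in vals):
            log_geo_means.append((g, sum(math.log(v) for v in vals) / len(vals)))
    if not log_geo_means:
        raise ValueError("every gene has a zero count in some sample")
    return {
        s: math.exp(median([math.log(counts[s][g]) - lg for g, lg in log_geo_means]))
        for s in samples
    }


def normalize(counts: Counts) -> Counts:
    """Divide each sample's counts by its size factor."""
    sf = size_factors(counts)
    return {s: [c / sf[s] for c in v] for s, v in counts.items()}


# Hypothetical counts for one WT and one ulp-2(tv380) sample
counts = {"LBN207": [10, 20, 30, 0], "LBNX39910": [20, 40, 60, 5]}
print(size_factors(counts))
print(normalize(counts))
```

    The fourth gene has a zero in one sample, so it is excluded from the size-factor estimate but still normalized afterwards.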

  10. Table_2_Normalization Methods on Single-Cell RNA-seq Data: An Empirical...

    • figshare.com
    docx
    Updated May 30, 2023
    Nicholas Lytal; Di Ran; Lingling An (2023). Table_2_Normalization Methods on Single-Cell RNA-seq Data: An Empirical Survey.docx [Dataset]. http://doi.org/10.3389/fgene.2020.00041.s003
    Available download formats: docx
    Dataset updated
    May 30, 2023
    Dataset provided by
    Frontiers
    Authors
    Nicholas Lytal; Di Ran; Lingling An
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data normalization is vital to single-cell sequencing, addressing limitations presented by low input material and various forms of bias or noise present in the sequencing process. Several such normalization methods exist, some of which rely on spike-in genes, molecules added in known quantities to serve as a basis for a normalization model. Depending on available information and the type of data, some methods may offer certain advantages over others. We compare the effectiveness of seven available normalization methods designed specifically for single-cell sequencing using two real data sets containing spike-in genes and one simulation study. Additionally, we test those methods not dependent on spike-in genes using a real data set with three distinct cell-cycle states and a real data set from the 10X Genomics GemCode platform with multiple cell types represented. We demonstrate the differences in effectiveness for the featured methods using visualization and classification assessment and conclude which methods are preferable for normalizing a certain type of data for further downstream analysis, such as classification or differential analysis. The comparison in computational time for all methods is addressed as well.
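    Spike-in-based methods rest on a simple premise: every cell received the same known quantity of spike-in molecules, so differences in spike-in read totals between cells reflect technical variation rather than biology. A minimal Python sketch of that idea, not any of the seven surveyed methods: each cell's size factor is its total spike-in count divided by the mean total across cells, and endogenous gene counts are divided by that factor. The spike-in IDs, gene names, and counts are hypothetical.

```python
from statistics import mean
from typing import Dict, Set

Cell = Dict[str, float]  # feature name -> read count for one cell


def spike_in_size_factors(cells: Dict[str, Cell], spikes: Set[str]) -> Dict[str, float]:
    """Size factor = total spike-in count / mean total spike-in count over cells."""
    totals = {name: sum(v for f, v in cell.items() if f in spikes)
              for name, cell in cells.items()}
    centre = mean(list(totals.values()))
    return {name: t / centre for name, t in totals.items()}


def normalize(cells: Dict[str, Cell], spikes: Set[str]) -> Dict[str, Cell]:
    """Divide endogenous-gene counts by each cell's spike-in size factor."""
    factors = spike_in_size_factors(cells, spikes)
    return {name: {f: v / factors[name] for f, v in cell.items() if f not in spikes}
            for name, cell in cells.items()}


spikes = {"ERCC-A", "ERCC-B"}  # hypothetical spike-in IDs
cells = {
    "cell1": {"ERCC-A": 60, "ERCC-B": 40, "GeneX": 10},
    "cell2": {"ERCC-A": 120, "ERCC-B": 80, "GeneX": 20},
}
print(normalize(cells, spikes))
```

    Here cell2 captured twice as many spike-in reads as cell1, so its GeneX count is scaled down accordingly and both cells end up with the same normalized GeneX value.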

  11. NGS-Based RNA-Seq Market Analysis North America, Europe, Asia, Rest of World...

    • technavio.com
    pdf
    Updated Aug 15, 2024
    Technavio (2024). NGS-Based RNA-Seq Market Analysis North America, Europe, Asia, Rest of World (ROW) - US, UK, Germany, Singapore, China - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/ngs-based-rna-seq-market-analysis
    Available download formats: pdf
    Dataset updated
    Aug 15, 2024
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2024 - 2028
    Area covered
    United Kingdom, United States
    Description


    NGS-Based RNA-Seq Market Size 2024-2028

    The NGS-based RNA-seq market size is forecast to increase by USD 6.66 billion, at a CAGR of 20.52% between 2023 and 2028.

    The market is witnessing significant growth, driven by the increased adoption of next-generation sequencing (NGS) methods for RNA-Seq analysis. The advanced capabilities of NGS techniques, such as high throughput, cost-effectiveness, and improved accuracy, have made them the preferred choice for researchers and clinicians in various fields, including genomics, transcriptomics, and personalized medicine. However, the market faces challenges, primarily from the lack of clinical validation for direct-to-consumer genetic tests. As the use of NGS technology in consumer applications expands, ensuring the accuracy and reliability of results becomes crucial.
    The absence of standardized protocols and regulatory oversight in this area poses a significant challenge to market growth and trust. Companies seeking to capitalize on market opportunities must focus on addressing these challenges through collaborations, partnerships, and investments in research and development to ensure the clinical validity and reliability of their NGS-based RNA-Seq offerings.
    

    What will be the Size of the NGS-based RNA-Seq market during the forecast period?


    The market continues to evolve, driven by advancements in NGS technology and its applications across various sectors. Spatial transcriptomics, a novel approach to studying gene expression in its spatial context, is gaining traction in disease research and precision medicine. Splice junction detection, a critical component of RNA-seq data analysis, enhances the accuracy of gene expression profiling and differential gene expression studies. Cloud computing plays a pivotal role in handling the massive amounts of data generated by NGS platforms, enabling real-time data analysis and storage. Enrichment analysis, gene ontology, and pathway analysis facilitate the interpretation of RNA-seq data, while data normalization and quality control ensure the reliability of results.

    Precision medicine and personalized therapy are key applications of RNA-seq, with single-cell RNA-seq offering unprecedented insights into the complexities of gene expression at the single-cell level. Read alignment and variant calling are essential steps in RNA-seq data analysis, while bioinformatics pipelines and RNA-seq software streamline the process. NGS technology is revolutionizing drug discovery by enabling the identification of biomarkers and gene fusion detection in various diseases, including cancer and neurological disorders. RNA-seq is also finding applications in infectious diseases, microbiome analysis, environmental monitoring, agricultural genomics, and forensic science. Sequencing costs are decreasing, making RNA-seq more accessible to researchers and clinicians.

    The ongoing development of sequencing platforms, library preparation, and sample preparation kits continues to drive innovation in the field. The dynamic nature of the market ensures that it remains a vibrant and evolving field, with ongoing research and development in areas such as data visualization, clinical trials, and sequencing depth.

    How is this NGS-based RNA-Seq industry segmented?

    The NGS-based RNA-seq industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

    End-user
      Academic and research centers
      Clinical research
      Pharma companies
      Hospitals

    Technology
      Sequencing by synthesis
      Ion semiconductor sequencing
      Single-molecule real-time sequencing
      Others

    Geography
      North America
        US
      Europe
        Germany
        UK
      APAC
        China
        Singapore
      Rest of World (ROW)

    By End-user Insights

    The academic and research centers segment is estimated to witness significant growth during the forecast period.

    The global next-generation sequencing (NGS) market for RNA sequencing (RNA-Seq) is primarily driven by academic and research institutions, including those from universities, research institutes, government entities, biotechnology organizations, and pharmaceutical companies. These institutions utilize NGS technology for various research applications, such as whole-genome sequencing, epigenetics, and emerging fields like agrigenomics and animal research, to enhance crop yield and nutritional composition. NGS-based RNA-Seq plays a pivotal role in translational research, with significant investments from both private and public organizations fueling its growth. The technology is instrumental in disease research, enabling the identification of nov

  12. Table_5_Normalization Methods on Single-Cell RNA-seq Data: An Empirical...

    • frontiersin.figshare.com
    docx
    Updated Jun 2, 2023
    Nicholas Lytal; Di Ran; Lingling An (2023). Table_5_Normalization Methods on Single-Cell RNA-seq Data: An Empirical Survey.docx [Dataset]. http://doi.org/10.3389/fgene.2020.00041.s006
    Available download formats: docx
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Nicholas Lytal; Di Ran; Lingling An
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data normalization is vital to single-cell sequencing, addressing limitations presented by low input material and various forms of bias or noise present in the sequencing process. Several such normalization methods exist, some of which rely on spike-in genes, molecules added in known quantities to serve as a basis for a normalization model. Depending on available information and the type of data, some methods may offer certain advantages over others. We compare the effectiveness of seven available normalization methods designed specifically for single-cell sequencing using two real data sets containing spike-in genes and one simulation study. Additionally, we test those methods not dependent on spike-in genes using a real data set with three distinct cell-cycle states and a real data set from the 10X Genomics GemCode platform with multiple cell types represented. We demonstrate the differences in effectiveness for the featured methods using visualization and classification assessment and conclude which methods are preferable for normalizing a certain type of data for further downstream analysis, such as classification or differential analysis. The comparison in computational time for all methods is addressed as well.

  13. In silico Gene Perturbation Results from GeneRAIN Models

    • zenodo.org
    application/gzip
    Updated May 20, 2025
    Zheng Su (2025). In silico Gene Perturbation Results from GeneRAIN Models [Dataset]. http://doi.org/10.5281/zenodo.15354184
    Available download formats: application/gzip
    Dataset updated
    May 20, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Zheng Su
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the processed results of in silico gene perturbation experiments conducted using various GeneRAIN models. GeneRAIN is a suite of Transformer-based models developed for learning gene expression relationships from large-scale bulk RNA-seq data. These experiments were designed to evaluate the ability of different GeneRAIN model architectures and normalization strategies to simulate transcriptomic responses to genetic perturbations.

    The dataset comprises six gzipped CSV files, each representing the results from a specific GeneRAIN model and normalization method combination:

    • GeneRAIN.GPT_Binning_by_genes.perturb_gene_level_details.csv.gz: Results from the GeneRAIN GPT model using the Binning-By-Gene normalization method.
    • GeneRAIN.GPT_Z-Score.perturb_gene_level_details.csv.gz: Results from the GeneRAIN GPT model using the Z-Score normalization method.
    • GeneRAIN.Pred_expr_Binning_by_genes.perturb_gene_level_details.csv.gz: Results from the GeneRAIN BERT-Pred-Expr model using the Binning-By-Gene normalization method.
    • GeneRAIN.Pred_expr_Z-Score.perturb_gene_level_details.csv.gz: Results from the GeneRAIN BERT-Pred-Expr model using the Z-Score normalization method.
    • GeneRAIN.Pred_genes_Binning_by_genes.perturb_gene_level_details.csv.gz: Results from the GeneRAIN BERT-Pred-Genes model using the Binning-By-Gene normalization method.
    • GeneRAIN.Pred_genes_Z-Score.perturb_gene_level_details.csv.gz: Results from the GeneRAIN BERT-Pred-Genes model using the Z-Score normalization method.
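    The file names distinguish two input normalizations, Z-Score and Binning-By-Gene, which this description does not define. One plausible reading, offered here as an assumption rather than GeneRAIN's documented preprocessing (the linked repository is the authority), is a per-gene z-score across samples and per-gene equal-frequency binning across samples, both sketched in Python below with hypothetical values.

```python
from statistics import mean, pstdev
from typing import List


def zscore_per_gene(values: List[float]) -> List[float]:
    """Z-score one gene's expression values across samples."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0] * len(values)  # constant gene: no spread to scale by
    return [(v - mu) / sd for v in values]


def bin_per_gene(values: List[float], n_bins: int) -> List[int]:
    """Equal-frequency bins 0..n_bins-1 for one gene across samples.

    Ties are broken by sample order, so equal values may land in different bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


gene_a = [5.0, 1.0, 3.0, 7.0]  # hypothetical expression of one gene in four samples
print(zscore_per_gene(gene_a))
print(bin_per_gene(gene_a, 2))
```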

    Each CSV file contains one row for each gene within each processed sample used in the in silico perturbation analysis. The columns provide detailed information about the sample, the gene, its expression state, the applied perturbation, and the resulting gene embeddings from the model:

    • Batch_Index: The index of the batch the sample belonged to.
    • Sample_Index_in_Batch: The index of the sample within its batch.
    • Dataset_Label: The label of the dataset partition (e.g., 'K562_essential').
    • Gene_Pos_In_Input: The position of the gene in the input sequence fed to the model (0-based index), typically based on expression ranking.
    • Gene_ID_Index: The numerical index representing the specific gene in the gene embedding space.
    • Gene_Symbol: The gene symbol corresponding to the Gene_ID_Index.
    • Input_Binned_Expr: The binned expression value of this gene in the baseline input fed to the model (relevant for binning-based models).
    • Output_Binned_Expr_True: The true binned expression value of this gene after perturbation, as provided by the input dataset (not predicted by the model).
    • Perturbed_Gene_ID: The Gene_ID_Index of the gene whose expression was artificially altered in the in silico perturbation for this specific sample. This value is the same for all rows corresponding to the same sample.
    • Is_Perturbed_Input_Gene: A boolean (True/False) indicating if this specific gene (Gene_ID_Index in this row) is the one that was perturbed in silico for this sample.
    • Gene_Emb_Baseline: A comma-separated string representing the embedding vector of this gene derived from the baseline (unperturbed) input.
    • Gene_Emb_Perturbed: A comma-separated string representing the embedding vector of this gene derived from the perturbed input.
    • Gene_ID_Perturbed_Input (Optional): If the model is 'GPT' or 'Bert_pred_tokens', this column shows the Gene_ID_Index present at this Gene_Pos_In_Input in the perturbed input sequence (which might differ from the baseline input sequence).

    These data can be used to analyze and compare the effects of in silico gene perturbations on gene representations across different GeneRAIN model configurations and normalization methods, supporting research into how these models capture and simulate biological responses.

    Notes on usage:

    • The embedding vectors (Gene_Emb_Baseline and Gene_Emb_Perturbed) are stored as comma-separated strings and need to be converted to numerical arrays for analysis.
    • Github repo: https://github.com/suzheng/GeneRAIN
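    Following the usage note above, the embedding strings must be parsed before analysis. The Python sketch below parses them and reports, per row, how similar a gene's perturbed embedding is to its baseline. It assumes each file has a header row with the column names listed above, that the embedding fields are CSV-quoted (they contain commas), and that `Is_Perturbed_Input_Gene` is written as the literal strings True/False; cosine similarity is an illustrative choice, not the authors' analysis.

```python
import csv
import gzip
import math
from typing import Iterator, List, Tuple


def parse_embedding(text: str) -> List[float]:
    """Convert a comma-separated embedding string into a list of floats."""
    # Strip optional surrounding brackets in case the strings were written that way
    return [float(x) for x in text.strip().strip("[]").split(",")]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embedding_shifts(path: str) -> Iterator[Tuple[str, bool, float]]:
    """Yield (Gene_Symbol, is perturbed gene, baseline-vs-perturbed cosine) per row."""
    with gzip.open(path, "rt", newline="") as fh:
        for row in csv.DictReader(fh):
            base = parse_embedding(row["Gene_Emb_Baseline"])
            pert = parse_embedding(row["Gene_Emb_Perturbed"])
            yield (row["Gene_Symbol"],
                   row["Is_Perturbed_Input_Gene"] == "True",
                   cosine_similarity(base, pert))


# Usage (file name from the list above):
# for symbol, is_target, sim in embedding_shifts(
#         "GeneRAIN.GPT_Z-Score.perturb_gene_level_details.csv.gz"):
#     ...
```

    The files can be large, so the generator streams rows rather than loading a whole table into memory.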

  14. Datasets, reproducible codes, and results for evaluating differential...

    • zenodo.org
    zip
    Updated May 30, 2023
    Boris P Hejblum; Kalidou Ba; Rodolphe Thiébaut; Denis Agniel (2023). Datasets, reproducible codes, and results for evaluating differential expression analysis methods on population-level RNA-seq data [Dataset]. http://doi.org/10.5281/zenodo.6554347
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Boris P Hejblum; Kalidou Ba; Rodolphe Thiébaut; Denis Agniel
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This upload contains the necessary R codes and data to reproduce the FDR and Power results described in our correspondence "Neglecting normalization impact in semi-synthetic RNA-seq data simulation generates artificial false positives" to Li Y, Ge X, Peng F, Li W, Li JJ, Exaggerated false positives by popular differential expression methods when analyzing human population samples, Genome Biology 23, 79, 2022, DOI: 10.1186/s13059-022-02648-4.
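    In semi-synthetic benchmarks of this kind, the set of truly differentially expressed genes is known by construction, so FDR and power can be estimated directly: in each simulated dataset the false discovery proportion is the share of called genes that are not truly DE, power is the share of truly DE genes that were called, and averaging over replicates estimates the FDR and power. A minimal Python sketch with hypothetical gene sets, not the authors' R code:

```python
from typing import Iterable, List, Set, Tuple


def fdp_and_power(called: Set[str], truly_de: Set[str]) -> Tuple[float, float]:
    """False discovery proportion and power for a single simulated dataset."""
    tp = len(called & truly_de)
    fp = len(called - truly_de)
    fdp = fp / len(called) if called else 0.0  # convention: 0 when nothing is called
    power = tp / len(truly_de) if truly_de else float("nan")
    return fdp, power


def empirical_fdr_and_power(
        runs: Iterable[Tuple[Set[str], Set[str]]]) -> Tuple[float, float]:
    """Average FDP (an estimate of the FDR) and power over simulation replicates."""
    results: List[Tuple[float, float]] = [fdp_and_power(c, t) for c, t in runs]
    n = len(results)
    return (sum(r[0] for r in results) / n, sum(r[1] for r in results) / n)


called = {"g1", "g2", "g3", "g4"}  # hypothetical genes declared DE
truth = {"g1", "g2", "g5"}         # hypothetical genes simulated as DE
print(fdp_and_power(called, truth))
```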

  15. Extended data tables to Haering and Habermann, F1000Res, RNfuzzyApp: an R...

    • search.dataone.org
    • datadryad.org
    Updated May 18, 2025
    Bianca Habermann; Margaux Haering (2025). Extended data tables to Haering and Habermann, F1000Res, RNfuzzyApp: an R shiny RNA-seq data analysis app for visualisation, differential expression analysis, time-series clustering and enrichment analysis [Dataset]. http://doi.org/10.5061/dryad.8pk0p2nnd
    Dataset updated
    May 18, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Bianca Habermann; Margaux Haering
    Time period covered
    Jul 8, 2021
    Description

    Background

    RNA-seq is a widely adopted, affordable method for large-scale gene expression profiling. However, user-friendly and versatile tools that let wet-lab biologists analyse RNA-seq data beyond standard analyses, such as differential expression, are rare. In particular, the analysis of time-series data is difficult for wet-lab biologists lacking advanced computational training. Furthermore, most meta-analysis tools are tailored for model organisms and are not easily adaptable to other species.

    Results

    With RNfuzzyApp, we provide a user-friendly, web-based R-shiny app for differential expression analysis, as well as time-series analysis of RNA-seq data. RNfuzzyApp offers several methods for normalization and differential expression analysis of RNA-seq data, providing easy-to-use toolboxes, interactive plots and downloadable results. For time-series analysis, RNfuzzyApp presents the first web-based, automated pipeline for soft clustering with the Mfuzz R package, including methods to...
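    Soft clustering gives each gene a degree of membership in every cluster instead of a single hard label; Mfuzz does this with the fuzzy c-means algorithm. The sketch below is a generic fuzzy c-means in plain Python, not the Mfuzz or RNfuzzyApp code. It assumes each gene's time profile is z-scored first (a common preprocessing step), and the fuzzifier m = 2, the iteration count, and the toy profiles are illustrative choices only.

```python
import math
import random


def standardize(profile):
    """Z-score one gene's time profile (mean 0, sd 1); assumes it is not constant."""
    n = len(profile)
    mean = sum(profile) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in profile) / (n - 1))
    return [(x - mean) / sd for x in profile]


def fuzzy_cmeans(data, c, m=2.0, n_iter=100, seed=0):
    """Fuzzy c-means clustering; returns (centroids, memberships).

    memberships[i][k] is the degree to which gene i belongs to cluster k,
    and each row of memberships sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial memberships, normalized so each row sums to 1
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    for _ in range(n_iter):
        # Centroid update: mean of all genes weighted by membership ** m
        centroids = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            wsum = sum(w)
            centroids.append([sum(w[i] * data[i][d] for i in range(n)) / wsum
                              for d in range(dim)])
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1))
        for i in range(n):
            dists = [max(math.dist(data[i], centroids[k]), 1e-12) for k in range(c)]
            for k in range(c):
                u[i][k] = 1.0 / sum((dists[k] / dists[j]) ** (2.0 / (m - 1.0))
                                    for j in range(c))
    return centroids, u


# Toy time courses: two rising and two falling genes
profiles = {"up_1": [1, 2, 3, 4], "up_2": [2, 3, 5, 6],
            "down_1": [4, 3, 2, 1], "down_2": [9, 7, 4, 2]}
data = [standardize(p) for p in profiles.values()]
centroids, u = fuzzy_cmeans(data, c=2)
for name, row in zip(profiles, u):
    print(name, [round(v, 2) for v in row])
```

    Larger values of m make memberships more uniform across clusters; choosing m for a given dataset is a separate question not addressed here.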

  16. Data from: Filtration and Normalization of Sequencing Read Data in...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Oct 20, 2016
    Losada, Patricia Moran; Chouvarine, Philippe; DeLuca, David S.; Wiehlmann, Lutz; Tümmler, Burkhard (2016). Filtration and Normalization of Sequencing Read Data in Whole-Metagenome Shotgun Samples [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001539099
    Dataset updated
    Oct 20, 2016
    Authors
    Losada, Patricia Moran; Chouvarine, Philippe; DeLuca, David S.; Wiehlmann, Lutz; Tümmler, Burkhard
    Description

    Ever-increasing affordability of next-generation sequencing makes whole-metagenome sequencing an attractive alternative to traditional 16S rDNA, RFLP, or culturing approaches for the analysis of microbiome samples. The advantage of whole-metagenome sequencing is that it allows direct inference of the metabolic capacity and physiological features of the studied metagenome without relying on knowledge of the genotypes and phenotypes of the members of the bacterial community. It also overcomes problems of 16S rDNA sequencing, such as the unknown copy number of the 16S gene and the insufficient sequence similarity of the “universal” 16S primers to some target 16S genes. On the other hand, next-generation sequencing suffers from biases that result in non-uniform coverage of the sequenced genomes. To overcome this difficulty, we present a model of GC bias in sequencing metagenomic samples, together with the filtration and normalization techniques necessary for accurate quantification of microbial organisms. While there has been substantial research on the normalization and filtration of read-count data in techniques such as RNA-seq and ChIP-seq, to our knowledge this has not been the case for whole-metagenome shotgun sequencing. The presented methods assume that complete genome references are available for most microorganisms of interest in the metagenomic samples, which is often a valid assumption in fields such as medical diagnostics of patient microbiota. Testing the model on two validation datasets showed a four-fold reduction in root-mean-square error compared with non-normalized data in both cases. The presented methods can be applied to any whole-metagenome sequencing analysis pipeline that relies on complete microbial genome references. We demonstrate that such pre-processing reduces the number of false-positive hits and increases the accuracy of abundance estimates.
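    The description does not spell out the authors' GC-bias model. Purely as a generic illustration, one common style of correction bins genomic windows by GC content, estimates the mean coverage in each bin, and rescales every window by the ratio of the overall mean to its bin's mean; root-mean-square error (RMSE) can then compare estimated with known abundances. The function names, the bin count, and the toy counts below are assumptions, not the published method.

```python
import math
from collections import defaultdict


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_correct(counts, gc, n_bins=10):
    """Rescale per-window read counts so every GC bin has the same mean coverage.

    counts -- read count for each genomic window
    gc     -- GC fraction (0..1) of each window
    n_bins -- number of equal-width GC bins (an arbitrary choice here)
    """
    bins = [min(int(g * n_bins), n_bins - 1) for g in gc]
    bin_sum, bin_size = defaultdict(float), defaultdict(int)
    for b, c in zip(bins, counts):
        bin_sum[b] += c
        bin_size[b] += 1
    overall_mean = sum(counts) / len(counts)
    corrected = []
    for b, c in zip(bins, counts):
        bin_mean = bin_sum[b] / bin_size[b]
        # Under-covered GC bins are scaled up, over-covered bins scaled down
        corrected.append(c * overall_mean / bin_mean if bin_mean > 0 else float(c))
    return corrected


def rmse(estimated, truth):
    """Root-mean-square error between estimated and true values."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimated, truth)) / len(truth))


# Toy windows: GC-rich windows received roughly twice the reads of AT-rich ones
counts = [50, 54, 100, 104]
gc = [0.31, 0.33, 0.61, 0.63]
print(gc_correct(counts, gc))
```

    After correction both GC bins share the overall mean coverage (77 reads per window in this toy input), so coverage differences that track GC content alone are removed.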

  17. Processed datasets and codes for differential expression analysis on...

    • zenodo.org
    txt, zip
    Updated Dec 24, 2024
    Yumei Li; Xinzhou Ge (2024). Processed datasets and codes for differential expression analysis on polulation-level RNA-seq data [Dataset]. http://doi.org/10.5281/zenodo.14550340
    Available download formats: txt, zip
    Dataset updated
    Dec 24, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yumei Li; Xinzhou Ge
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 24, 2024
    Description

    This version includes the code and data necessary to reproduce all results in our response to the two correspondences, "Response to 'Neglecting normalization impact in semi-synthetic RNA-seq data simulation generates artificial false positives' and 'Winsorization greatly reduces false positives by popular differential expression methods when analyzing human population samples'" (https://doi.org/10.1186/s13059-024-03232-8).

    It also includes a README file that guides the reproduction of the results in our original publication, "Exaggerated False Positives by Popular Differential Expression Methods When Analyzing Human Population Samples" (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-022-02648-4), together with resources for the goodness-of-fit test in that publication.
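    One of the correspondences named above concerns winsorization, which caps extreme values at a chosen quantile so that a few outlier samples cannot dominate a gene's test statistic. A minimal upper-tail sketch for one gene's counts across samples follows; the nearest-rank quantile rule and the cut-off q are assumptions for illustration, not the procedure used in the cited work.

```python
import math


def winsorize_upper(values, q=0.95):
    """Cap values above the q-th quantile (nearest-rank method) at that quantile."""
    ordered = sorted(values)
    # Nearest-rank quantile: the ceil(q * n)-th smallest value, 1-based
    rank = max(1, math.ceil(q * len(ordered)))
    cap = ordered[rank - 1]
    return [min(v, cap) for v in values]


counts = [12, 15, 11, 14, 13, 400]  # one sample with an extreme count
print(winsorize_upper(counts, q=0.8))  # [12, 15, 11, 14, 13, 15]
```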

  18. Fraction of shared cis-eQTLs at 10% FDR between pairs of various versions of...

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Jul 18, 2013
    Battle, Alexis; Koller, Daphne; Levinson, Douglas; Urban, Alexander E.; Mostafavi, Sara; Montgomery, Stephen B.; Zhu, Xiaowei (2013). Fraction of shared cis-eQTLs at 10% FDR between pairs of various versions of normalized Pickrell RNA-seq data. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001714965
    Dataset updated
    Jul 18, 2013
    Authors
    Battle, Alexis; Koller, Daphne; Levinson, Douglas; Urban, Alexander E.; Mostafavi, Sara; Montgomery, Stephen B.; Zhu, Xiaowei
    Description

    Each row gives, for one normalization method, the fraction of its cis-eQTLs that are also discovered in each other normalized version. For example, the (i, j) element, at row i and column j, is the proportion of cis-eQTLs discovered using normalization method i that are also discovered using normalization method j.
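    Because every entry is divided by the number of cis-eQTLs found with the row's method, the matrix is generally not symmetric. A minimal sketch of the computation, assuming each method's discoveries are available as a set of (SNP, gene) pairs; the method names and pairs are hypothetical:

```python
def shared_fraction_matrix(discoveries):
    """Entry [i][j]: fraction of method i's cis-eQTLs also found by method j.

    discoveries -- dict mapping method name to a set of (snp, gene) pairs
    """
    methods = list(discoveries)
    matrix = []
    for mi in methods:
        row_set = discoveries[mi]
        row = []
        for mj in methods:
            shared = len(row_set & discoveries[mj])
            # Normalize by the row method's discoveries; undefined if it found none
            row.append(shared / len(row_set) if row_set else float("nan"))
        matrix.append(row)
    return methods, matrix


discoveries = {
    "method_A": {("rs1", "GENE1"), ("rs2", "GENE2"), ("rs3", "GENE3"), ("rs4", "GENE4")},
    "method_B": {("rs1", "GENE1"), ("rs2", "GENE2")},
}
methods, m = shared_fraction_matrix(discoveries)
print(m)  # [[1.0, 0.5], [1.0, 1.0]]
```

    Here method_B's two eQTLs are both found by method_A (row B, column A is 1.0), while only half of method_A's four are found by method_B (row A, column B is 0.5).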

  19. Table_4_Normalization Methods on Single-Cell RNA-seq Data: An Empirical...

    • frontiersin.figshare.com
    docx
    Updated Jun 1, 2023
    Nicholas Lytal; Di Ran; Lingling An (2023). Table_4_Normalization Methods on Single-Cell RNA-seq Data: An Empirical Survey.docx [Dataset]. http://doi.org/10.3389/fgene.2020.00041.s005
    Available download formats: docx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Nicholas Lytal; Di Ran; Lingling An
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data normalization is vital to single-cell sequencing, addressing limitations presented by low input material and various forms of bias or noise in the sequencing process. Several normalization methods exist, some of which rely on spike-in genes: molecules added in known quantities to serve as a basis for a normalization model. Depending on the available information and the type of data, some methods may have advantages over others. We compare the effectiveness of seven normalization methods designed specifically for single-cell sequencing, using two real data sets containing spike-in genes and one simulation study. Additionally, we test the methods that do not depend on spike-in genes using a real data set with three distinct cell-cycle states and a real data set from the 10X Genomics GemCode platform with multiple cell types represented. We demonstrate the differences in effectiveness of the featured methods using visualization and classification assessment, and conclude which methods are preferable for normalizing a certain type of data for downstream analyses such as classification or differential analysis. The computational time of all methods is also compared.
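    As a generic illustration of spike-in-based scaling (not any specific method from the survey), a per-cell size factor can be taken proportional to the cell's total spike-in count, centred so the factors average 1, and each cell's endogenous counts divided by it. This assumes every cell received the same amount of spike-in, so differences in spike-in totals reflect only technical effects such as capture efficiency and sequencing depth. The counts below are made up.

```python
def spike_in_size_factors(spike_counts):
    """Size factor per cell from total spike-in counts, centred to mean 1.

    spike_counts -- list over cells, each a list of spike-in counts
    """
    totals = [sum(cell) for cell in spike_counts]
    mean_total = sum(totals) / len(totals)
    return [t / mean_total for t in totals]


def normalize(gene_counts, size_factors):
    """Divide each cell's endogenous gene counts by that cell's size factor."""
    return [[c / sf for c in cell] for cell, sf in zip(gene_counts, size_factors)]


# Two cells; the second yielded twice as many reads from the same spike-in amount
spikes = [[10, 20, 30], [20, 40, 60]]
genes = [[5, 0, 8], [10, 0, 16]]
sf = spike_in_size_factors(spikes)
print(sf)  # [0.6666666666666666, 1.3333333333333333]
print(normalize(genes, sf))
```

    After scaling, the two toy cells have the same normalized expression profile, as expected when their difference is purely technical.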

  20. Data from: Dataset from transcriptome profiling of Musa resistant and...

    • data.mendeley.com
    Updated Nov 5, 2023
    + more versions
    ANURADHA CHELLIAH (2023). Dataset from transcriptome profiling of Musa resistant and susceptible cultivars in response to Fusarium oxysporum f.sp. cubense race1 and TR4 challenges using Illumina NovaSeq [Dataset]. http://doi.org/10.17632/g3nwfxy7tx.1
    Dataset updated
    Nov 5, 2023
    Authors
    ANURADHA CHELLIAH
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supplementary data 4a: RNA-seq raw read counts per gene in the Foc race1-challenged corm sample (before normalization).
    Supplementary data 4b: RNA-seq raw read counts per gene in the Foc race1-challenged root sample (before normalization).
    Supplementary data 4c: RNA-seq raw read counts per gene in the Foc TR4-challenged corm sample (before normalization).
    Supplementary data 4d: RNA-seq raw read counts per gene in the Foc TR4-challenged root sample (before normalization).
