37 datasets found
  1. Data_Sheet_1_NormExpression: An R Package to Normalize Gene Expression Data...

    • frontiersin.figshare.com
    application/cdfv2
    Updated Jun 1, 2023
    + more versions
    Cite
    Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao (2023). Data_Sheet_1_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.doc [Dataset]. http://doi.org/10.3389/fgene.2019.00400.s001
    Available download formats: application/cdfv2
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data normalization is a crucial step in gene expression analysis because it ensures the validity of downstream analyses. Although many metrics have been designed to evaluate existing normalization methods, different metrics, or different datasets evaluated with the same metric, yield inconsistent results, particularly for single-cell RNA sequencing (scRNA-seq) data. In the worst cases, a method evaluated as the best by one metric is evaluated as the poorest by another, or a method evaluated as the best using one dataset is evaluated as the poorest using another. This raises an open problem: principles need to be established to guide the evaluation of normalization methods. In this study, we propose the principle that a normalization method evaluated as the best by one metric should also be evaluated as the best by another metric (the consistency of metrics), and that a method evaluated as the best using scRNA-seq data should also be evaluated as the best using bulk RNA-seq data or microarray data (the consistency of datasets). We then designed a new metric named Area Under normalized CV threshold Curve (AUCVC) and applied it, together with another metric, mSCC, to evaluate 14 commonly used normalization methods using both scRNA-seq data and bulk RNA-seq data, satisfying the consistency of metrics and the consistency of datasets. Our findings pave the way for future studies on the normalization of gene expression data and its evaluation. The raw gene expression data, normalization methods, and evaluation metrics used in this study are included in an R package named NormExpression. NormExpression provides a framework and a fast, simple way for researchers to select the best method for normalizing their gene expression data, based on the evaluation of different methods (particularly some data-driven methods or their own methods) under the principles of the consistency of metrics and the consistency of datasets.

  2. Methods for normalizing microbiome data: an ecological perspective

    • datadryad.org
    • data.niaid.nih.gov
    zip
    Updated Oct 30, 2018
    Cite
    Donald T. McKnight; Roger Huerlimann; Deborah S. Bower; Lin Schwarzkopf; Ross A. Alford; Kyall R. Zenger (2018). Methods for normalizing microbiome data: an ecological perspective [Dataset]. http://doi.org/10.5061/dryad.tn8qs35
    Available download formats: zip
    Dataset updated
    Oct 30, 2018
    Dataset provided by
    Dryad
    Authors
    Donald T. McKnight; Roger Huerlimann; Deborah S. Bower; Lin Schwarzkopf; Ross A. Alford; Kyall R. Zenger
    Time period covered
    2018
    Description

    Simulation script 1: This R script will simulate two populations of microbiome samples and compare normalization methods.
    Simulation script 2: This R script will simulate two populations of microbiome samples and compare normalization methods via PCoAs.
    Sample.OTU.distribution: OTU distribution used in the paper: Methods for normalizing microbiome data: an ecological perspective

  3. Data from: A systematic evaluation of normalization methods and probe...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated May 30, 2023
    + more versions
    Cite
    H. Welsh; C. M. P. F. Batalha; W. Li; K. L. Mpye; N. C. Souza-Pinto; M. S. Naslavsky; E. J. Parra (2023). A systematic evaluation of normalization methods and probe replicability using infinium EPIC methylation data [Dataset]. http://doi.org/10.5061/dryad.cnp5hqc7v
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Universidade de São Paulo
    University of Toronto
    Hospital for Sick Children
    Authors
    H. Welsh; C. M. P. F. Batalha; W. Li; K. L. Mpye; N. C. Souza-Pinto; M. S. Naslavsky; E. J. Parra
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Background The Infinium EPIC array measures the methylation status of > 850,000 CpG sites. The EPIC BeadChip uses a two-array design: Infinium Type I and Type II probes. These probe types exhibit different technical characteristics which may confound analyses. Numerous normalization and pre-processing methods have been developed to reduce probe type bias as well as other issues such as background and dye bias.
    Methods This study evaluates the performance of various normalization methods using 16 replicated samples and three metrics: absolute beta-value difference, overlap of non-replicated CpGs between replicate pairs, and effect on beta-value distributions. Additionally, we carried out Pearson’s correlation and intraclass correlation coefficient (ICC) analyses using both raw and SeSAMe 2 normalized data.
    Results The method we define as SeSAMe 2, which consists of the application of the regular SeSAMe pipeline with an additional round of QC, pOOBAH masking, was found to be the best-performing normalization method, while quantile-based methods were found to be the worst performing methods. Whole-array Pearson’s correlations were found to be high. However, in agreement with previous studies, a substantial proportion of the probes on the EPIC array showed poor reproducibility (ICC < 0.50). The majority of poor-performing probes have beta values close to either 0 or 1, and relatively low standard deviations. These results suggest that probe reliability is largely the result of limited biological variation rather than technical measurement variation. Importantly, normalizing the data with SeSAMe 2 dramatically improved ICC estimates, with the proportion of probes with ICC values > 0.50 increasing from 45.18% (raw data) to 61.35% (SeSAMe 2).

    Methods

    Study Participants and Samples

    The whole blood samples were obtained from the Health, Well-being and Aging (Saúde, Bem-Estar e Envelhecimento, SABE) study cohort. SABE is a cohort of census-withdrawn elderly from the city of São Paulo, Brazil, followed up every five years since the year 2000, with DNA first collected in 2010. Samples from 24 elderly adults were collected at two time points for a total of 48 samples. The first time point is the 2010 collection wave, performed from 2010 to 2012, and the second time point was set in 2020 in a COVID-19 monitoring project (9±0.71 years apart). The 24 individuals, 13 men and 11 women, were 67.41±5.52 years of age (mean ± standard deviation) at time point one and 76.41±6.17 at time point two.

    All individuals enrolled in the SABE cohort provided written consent, and the ethics protocols were approved by local and national institutional review boards: COEP/FSP/USP OF.COEP/23/10, CONEP 2044/2014, CEP HIAE 1263-10, University of Toronto RIS 39685.

    Blood Collection and Processing

    Genomic DNA was extracted from whole peripheral blood samples collected in EDTA tubes. DNA extraction and purification followed the manufacturer’s recommended protocols, using the Qiagen AutoPure LS kit with Gentra automated extraction (first time point) or manual extraction (second time point, because the equipment had been discontinued), with the same commercial reagents in both cases. DNA was quantified using a NanoDrop spectrophotometer and diluted to 50 ng/µL. To assess the reproducibility of the EPIC array, we also obtained technical replicates for 16 out of the 48 samples, for a total of 64 samples submitted for further analyses. Whole Genome Sequencing data is also available for the samples described above.

    Characterization of DNA Methylation using the EPIC array

    Approximately 1,000ng of human genomic DNA was used for bisulphite conversion. Methylation status was evaluated using the MethylationEPIC array at The Centre for Applied Genomics (TCAG, Hospital for Sick Children, Toronto, Ontario, Canada), following protocols recommended by Illumina (San Diego, California, USA).

    Processing and Analysis of DNA Methylation Data

    The R/Bioconductor packages Meffil (version 1.1.0), RnBeads (version 2.6.0), minfi (version 1.34.0) and wateRmelon (version 1.32.0) were used to import, process and perform quality control (QC) analyses on the methylation data. Starting with the 64 samples, we first used Meffil to infer the sex of each sample and compared the inferred sex to the reported sex. Utilizing the 59 SNP probes that are available as part of the EPIC array, we calculated concordance between the methylation intensities of the samples and the corresponding genotype calls extracted from their WGS data. We then performed comprehensive sample-level and probe-level QC using the RnBeads QC pipeline. Specifically, we (1) removed probes if their target sequences overlapped with a SNP at any base, (2) removed known cross-reactive probes, (3) used the iterative Greedycut algorithm to filter out samples and probes, using a detection p-value threshold of 0.01, and (4) removed probes if more than 5% of the samples had a missing value. Since RnBeads does not have a function to perform probe filtering based on bead number, we used the wateRmelon package to extract bead numbers from the IDAT files and calculated the proportion of samples with bead number < 3. Probes with more than 5% of samples having low bead number (< 3) were removed. For the comparison of normalization methods, we also computed detection p-values using the empirical distribution of out-of-band probes with the pOOBAH() function in the SeSAMe (version 1.14.2) R package, with a p-value threshold of 0.05 and the combine.neg parameter set to TRUE. In the scenario where pOOBAH filtering was carried out, it was done in parallel with the previously mentioned QC steps, and the probes flagged in the two analyses were combined and removed from the data.
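
    The bead-number rule described above (drop a probe when more than 5% of samples have fewer than 3 beads) can be sketched in plain Python. This is only an illustration of the thresholding logic, not the authors' wateRmelon-based R code; the function name, the probe IDs, and the dictionary layout are hypothetical.

```python
def low_bead_probes(bead_counts, min_beads=3, max_fraction=0.05):
    """Return IDs of probes where more than max_fraction of samples
    have fewer than min_beads beads."""
    flagged = []
    for probe_id, counts in bead_counts.items():
        low = sum(1 for c in counts if c < min_beads)
        if low / len(counts) > max_fraction:
            flagged.append(probe_id)
    return flagged


# Hypothetical bead counts for three probes across 20 samples.
bead_counts = {
    "cg_a": [12, 9, 15, 11] * 5,   # no low-bead samples: kept
    "cg_b": [2] + [10] * 19,       # 1/20 = 5%, not more than 5%: kept
    "cg_c": [2, 1] + [10] * 18,    # 2/20 = 10%: flagged for removal
}
flagged = low_bead_probes(bead_counts)
print(flagged)
```

    The comparison is strict (more than 5%), matching the wording of the description; a probe sitting exactly at the threshold is retained.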

    Normalization Methods Evaluated

    The normalization methods compared in this study were implemented using different R/Bioconductor packages and are summarized in Figure 1. All data was read into R workspace as RG Channel Sets using minfi’s read.metharray.exp() function. One sample that was flagged during QC was removed, and further normalization steps were carried out in the remaining set of 63 samples. Prior to all normalizations with minfi, probes that did not pass QC were removed. Noob, SWAN, Quantile, Funnorm and Illumina normalizations were implemented using minfi. BMIQ normalization was implemented with ChAMP (version 2.26.0), using as input Raw data produced by minfi’s preprocessRaw() function. In the combination of Noob with BMIQ (Noob+BMIQ), BMIQ normalization was carried out using as input minfi’s Noob normalized data. Noob normalization was also implemented with SeSAMe, using a nonlinear dye bias correction. For SeSAMe normalization, two scenarios were tested. For both, the inputs were unmasked SigDF Sets converted from minfi’s RG Channel Sets. In the first, which we call “SeSAMe 1”, SeSAMe’s pOOBAH masking was not executed, and the only probes filtered out of the dataset prior to normalization were the ones that did not pass QC in the previous analyses. In the second scenario, which we call “SeSAMe 2”, pOOBAH masking was carried out in the unfiltered dataset, and masked probes were removed. This removal was followed by further removal of probes that did not pass previous QC, and that had not been removed by pOOBAH. Therefore, SeSAMe 2 has two rounds of probe removal. Noob normalization with nonlinear dye bias correction was then carried out in the filtered dataset. Methods were then compared by subsetting the 16 replicated samples and evaluating the effects that the different normalization methods had in the absolute difference of beta values (|β|) between replicated samples.
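
    The replicate comparison in the last sentence can be illustrated with a small sketch: for one pair of technical replicates, take the mean absolute difference of beta values over shared probes, and compare that figure across normalization methods. The helper name and the beta values below are made up for illustration and do not come from the dataset.

```python
from statistics import mean


def mean_abs_beta_diff(sample, replicate):
    """Mean absolute beta-value difference over probes present in both samples."""
    shared = sample.keys() & replicate.keys()
    return mean(abs(sample[p] - replicate[p]) for p in shared)


# Hypothetical beta values for one replicate pair under two processing methods.
raw_pair = (
    {"cg1": 0.10, "cg2": 0.80, "cg3": 0.50},
    {"cg1": 0.16, "cg2": 0.74, "cg3": 0.56},
)
normalized_pair = (
    {"cg1": 0.11, "cg2": 0.79, "cg3": 0.51},
    {"cg1": 0.13, "cg2": 0.77, "cg3": 0.53},
)

raw_diff = mean_abs_beta_diff(*raw_pair)
norm_diff = mean_abs_beta_diff(*normalized_pair)
print(raw_diff, norm_diff)
```

    Under this metric, a method that brings replicates closer together yields a smaller value, which is the sense in which methods are ranked in the description.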

  4. Additional file 3: of DBNorm: normalizing high-density oligonucleotide...

    • figshare.com
    • springernature.figshare.com
    txt
    Updated May 31, 2023
    Cite
    Qinxue Meng; Daniel Catchpoole; David Skillicorn; Paul Kennedy (2023). Additional file 3: of DBNorm: normalizing high-density oligonucleotide microarray data based on distributions [Dataset]. http://doi.org/10.6084/m9.figshare.5648932.v1
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Qinxue Meng; Daniel Catchpoole; David Skillicorn; Paul Kennedy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    DBNorm test script. Code showing how we test the DBNorm package. (TXT 2 kb)

  5. GC/MS Simulated Data Sets normalized using quantile normalization

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    + more versions
    Cite
    Scholtens, Denise (2023). GC/MS Simulated Data Sets normalized using quantile normalization [Dataset]. https://search.dataone.org/view/sha256%3Ac3b94a68005c6bac4212457d403eedc6d12c76d960c0b0d171bd8ec5386d9cd5
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Scholtens, Denise
    Description

    1000 simulated data sets stored in a list of R dataframes used in support of Reisetter et al. (submitted) 'Mixture model normalization for non-targeted gas chromatography / mass spectrometry metabolomics data'. These are results after normalization using quantile normalization (Bolstad et al. 2003).
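
    For readers unfamiliar with quantile normalization, a minimal pure-Python sketch of the general idea follows: each sample's values are replaced by the mean of the values holding the same rank across all samples. It assumes equal-length samples with no missing values and breaks ties by position, which is a simplification; it is not the code used to produce this dataset.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length samples (one list of values per sample)."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    result = []
    for col in columns:
        # Indices of this sample's values, from smallest to largest.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        result.append(out)
    return result


samples = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(samples)
print(normalized)
```

    After normalization every sample contains the same set of values, only arranged according to that sample's original ranking.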

  6. GC/MS Simulated Data Sets normalized using mean centering

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    + more versions
    Cite
    Scholtens, Denise (2023). GC/MS Simulated Data Sets normalized using mean centering [Dataset]. https://search.dataone.org/view/sha256%3Ac59278f5e4c45949d1f9abc0429d4909b19ec1fb8d8bbb4d5244bd9163701dfa
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Scholtens, Denise
    Description

    1000 simulated data sets stored in a list of R dataframes used in support of Reisetter et al. (submitted) 'Mixture model normalization for non-targeted gas chromatography / mass spectrometry metabolomics data'. These are results after normalization using mean centering as described in Reisetter et al.
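
    In its plainest form, mean centering subtracts a sample's own mean from each of its values. The sketch below shows only that generic operation; whether Reisetter et al. center per sample, per metabolite, or on a transformed scale is not stated here, so treat the details as an assumption.

```python
def mean_center(values):
    """Subtract the mean of the values from each value."""
    m = sum(values) / len(values)
    return [v - m for v in values]


sample = [10.0, 12.0, 14.0]   # hypothetical intensities for one sample
centered = mean_center(sample)
print(centered)
```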

  7. Data from: Normalized Difference Vegetation Index for Fanno Creek, Oregon

    • s.cnmilf.com
    • search.dataone.org
    • +1more
    Updated Nov 1, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Normalized Difference Vegetation Index for Fanno Creek, Oregon [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/normalized-difference-vegetation-index-for-fanno-creek-oregon
    Dataset updated
    Nov 1, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Oregon, Fanno Creek
    Description

    Fanno Creek is a tributary to the Tualatin River and flows through parts of the southwest Portland metropolitan area. The stream is heavily influenced by urban runoff and shows characteristic flashy streamflow and poor water quality commonly associated with urban streams. This data set represents the Normalized Difference Vegetation Index (NDVI), or "greenness", of the Fanno Creek floodplain study area. Aerial photography was used to isolate areas of vegetation based on comparing different bandwidths within the imagery. In this case, the NDVI is calculated as the difference between the near-infrared band and the red band, divided by their sum: NDVI = (NIR - R)/(NIR + R).
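
    The NDVI formula above translates directly into code. The sketch below works on a single pixel and returns None when both bands are zero, which is an assumed convention for the undefined case rather than anything stated by the USGS; the pixel values are invented.

```python
def ndvi(nir, red):
    """NDVI = (NIR - R) / (NIR + R); None when both bands are zero."""
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


# Hypothetical digital numbers for a vegetated pixel and a bare-soil pixel.
vegetated = ndvi(200, 50)
bare = ndvi(90, 110)
print(vegetated, bare)
```

    For non-negative band values the result falls between -1 and 1, with dense vegetation toward the upper end.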

  8. Data from: Workflow for Evaluating Normalization Tools for Omics Data Using...

    • acs.figshare.com
    txt
    Updated Oct 28, 2023
    Cite
    Aleesa E. Chua; Leah D. Pfeifer; Emily R. Sekera; Amanda B. Hummon; Heather Desaire (2023). Workflow for Evaluating Normalization Tools for Omics Data Using Supervised and Unsupervised Machine Learning [Dataset]. http://doi.org/10.1021/jasms.3c00295.s001
    Available download formats: txt
    Dataset updated
    Oct 28, 2023
    Dataset provided by
    ACS Publications
    Authors
    Aleesa E. Chua; Leah D. Pfeifer; Emily R. Sekera; Amanda B. Hummon; Heather Desaire
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    To achieve high quality omics results, systematic variability in mass spectrometry (MS) data must be adequately addressed. Effective data normalization is essential for minimizing this variability. The abundance of approaches and the data-dependent nature of normalization have led some researchers to develop open-source academic software for choosing the best approach. While these tools are certainly beneficial to the community, none of them meet all of the needs of all users, particularly users who want to test new strategies that are not available in these products. Herein, we present a simple and straightforward workflow that facilitates the identification of optimal normalization strategies using straightforward evaluation metrics, employing both supervised and unsupervised machine learning. The workflow offers a “DIY” aspect, where the performance of any normalization strategy can be evaluated for any type of MS data. As a demonstration of its utility, we apply this workflow on two distinct datasets, an ESI-MS dataset of extracted lipids from latent fingerprints and a cancer spheroid dataset of metabolites ionized by MALDI-MSI, for which we identified the best-performing normalization strategies.

  9. Mean and standard deviation (SD) of NEO-PI-R normalized scores in the...

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Tetyana Zayats; Bao-Zhu Yang; Pingxing Xie; James Poling; Lindsay A. Farrer; Joel Gelernter (2023). Mean and standard deviation (SD) of NEO-PI-R normalized scores in the participants of the study. [Dataset]. http://doi.org/10.1371/journal.pone.0049368.t002
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Tetyana Zayats; Bao-Zhu Yang; Pingxing Xie; James Poling; Lindsay A. Farrer; Joel Gelernter
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CD = cocaine dependence, ND = nicotine dependence, CIP = cocaine induced paranoia.

  10. dagw-word-frequencies-normalized-by-domain

    • huggingface.co
    + more versions
    Cite
    dagw-word-frequencies-normalized-by-domain [Dataset]. https://huggingface.co/datasets/chcaa/dagw-word-frequencies-normalized-by-domain
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset authored and provided by
    Center for Humanities Computing Aarhus
    Description

    Dataset Card for DAGW Word Frequencies (normalized)

    Paper: Derczynski, L., Ciosici, M. R., Baglini, R., Christiansen, M. H., Dalsgaard, J. A., Fusaroli, R., ... & Varab, D. (2021). The Danish Gigaword Corpus. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa) (pp. 413-421).
    Point of Contact: Kenneth Enevoldsen (Kennethcenevoldsen (at) gmail (dot) com)

    This is a list of word frequencies derived from the Danish Gigaword (collected before… See the full description on the dataset page: https://huggingface.co/datasets/chcaa/dagw-word-frequencies-normalized-by-domain.

  11. Supplementary file including normalized data sets to reproduce the analyses...

    • data.europa.eu
    • gimi9.com
    • +1more
    zip
    Updated Jul 12, 2024
    Cite
    Universitätsbibliothek der Ludwig-Maximilians-Universität München (2024). Supplementary file including normalized data sets to reproduce the analyses presented in the paper "Use of pre-transformation to cope with extreme values in important candidate features" by Boulesteix, Guillemot & Sauerbrei (Biometrical Journal, 2011) [Dataset]. https://data.europa.eu/data/datasets/https-open-bydata-de-api-hub-repo-datasets-https-data-ub-uni-muenchen-de-39-dataset?locale=nl
    Available download formats: zip
    Dataset updated
    Jul 12, 2024
    Dataset authored and provided by
    Universitätsbibliothek der Ludwig-Maximilians-Universität München
    License

    http://dcat-ap.de/def/licenses/odcpddl

    Description

    The zip file contains supplementary files (normalized data sets and R code) to reproduce the analyses presented in the paper "Use of pre-transformation to cope with extreme values in important candidate features" by Boulesteix, Guillemot & Sauerbrei (Biometrical Journal, 2011). The raw data (CEL files) are publicly available and described in the following papers:
    - Ancona et al, 2006. On the statistical assessment of classifiers using DNA microarray data. BMC Bioinformatics 7, 387.
    - Miller et al, 2005. An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival. Proceedings of the National Academy of Sciences 102, 13550–13555.
    - Minn et al, 2005. Genes that mediate breast cancer metastasis to lung. Nature 436, 518–524.
    - Pawitan et al, 2005. Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts. Breast Cancer Research 7, R953–964.
    - Scherzer et al, 2007. Molecular markers of early Parkinson's disease based on gene expression in blood. Proceedings of the National Academy of Sciences 104, 955–960.
    - Singh et al, 2002. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1, 203–209.
    - Sotiriou et al, 2006. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. Journal of the National Cancer Institute 98, 262–272.
    - Tang et al, 2009. Gene-expression profiling of peripheral blood mononuclear cells in sepsis. Critical Care Medicine 37, 882–888.
    - Wang et al, 2005. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 365, 671–679.
    - Irizarry, 2003. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res 31 (4), e15.
    - Irizarry et al, 2006. Comparison of Affymetrix GeneChip expression measures. Bioinformatics 22 (7), 789–794.

  12. H.sapien Genelab OSD Normalized RNA Seq Matrix

    • data.niaid.nih.gov
    • zenodo.org
    Updated Dec 16, 2022
    Cite
    Somsanith, June (2022). H.sapien Genelab OSD Normalized RNA Seq Matrix [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7443811
    Dataset updated
    Dec 16, 2022
    Dataset provided by
    Somsanith, June
    Barker, Richard
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    H. sapiens normalized-count RNA-seq data matrix from NASA GeneLab's Open Science Data Repository. Created using R.

  13. A cross species and multi-omics (including metabolomics) analysis in...

    • ebi.ac.uk
    Updated Oct 29, 2015
    + more versions
    Cite
    Anguraj Sadanandam; A Sadanandam; C Grotzinger; B Wiedenmann; D Hanahan (2015). A cross species and multi-omics (including metabolomics) analysis in pancreatic neuroendocrine tumours (miRNA) [Dataset]. https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-73367
    Dataset updated
    Oct 29, 2015
    Authors
    Anguraj Sadanandam; A Sadanandam; C Grotzinger; B Wiedenmann; D Hanahan
    Description

    Pancreatic neuroendocrine tumor (PanNET) is relatively infrequent but is nevertheless metastatic. Seeking to extend a new paradigm of personalized medicine, we performed an integrative analysis of transcriptomic (mRNA and microRNA) and mutational profiles and defined three clinically relevant human PanNET subtypes. Importantly, cross-species analysis revealed two of these three subtypes in a well-characterized, genetically engineered mouse model (RIP1-Tag2) of PanNET and its cell lines. Each subtype shares similarities with distinct cell types in pancreatic neuroendocrine development, features that are reflected in their metabolic profiles. Subtype-specific molecular signatures and metabolites are proposed to identify these subtypes. Total RNA was extracted from fresh frozen archival patient PanNET samples and hybridized on Agilent microRNA arrays. All normalization methods were performed on the Total Gene Signal from Agilent "GeneView" data files in R, an open source statistical scripting language (http://www.r-project.org). Except for VSN, data were log2 transformed after adding a small constant such that the smallest value of the data set was 1 before taking the log. Scaling normalization was performed by dividing each array by its mean signal intensity and then by rescaling to the global mean intensity of all arrays. Quantile normalization was performed using the "normalize.quantiles" function from R package "affy" from the Bioconductor project (http://www.bioconductor.org).
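
    Two of the steps described above, the offset log2 transform and scaling normalization, can be sketched as follows. This is an illustrative Python reading of the prose, not the authors' R code: the function names are hypothetical, the global mean is taken over all values of all arrays, and the order in which the two steps were combined is not specified in the description, so they are kept separate.

```python
import math


def log2_with_offset(values):
    """Add a constant so the smallest value becomes 1, then take log2."""
    offset = 1 - min(values)
    return [math.log2(v + offset) for v in values]


def scale_arrays(arrays):
    """Divide each array by its own mean, then rescale to the global mean."""
    array_means = [sum(a) / len(a) for a in arrays]
    global_mean = sum(sum(a) for a in arrays) / sum(len(a) for a in arrays)
    return [[v / m * global_mean for v in a] for a, m in zip(arrays, array_means)]


logged = log2_with_offset([0, 1, 3])                 # offset is 1 here
scaled = scale_arrays([[1.0, 3.0], [2.0, 6.0]])      # two hypothetical arrays
print(logged, scaled)
```

    After scaling, every array has the same mean signal, so the second array in the example, which was uniformly twice as bright, becomes identical to the first.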

  14. Table_2_Comparison of Normalization Methods for Analysis of TempO-Seq...

    • frontiersin.figshare.com
    • figshare.com
    xlsx
    Updated Jun 2, 2023
    + more versions
    Cite
    Pierre R. Bushel; Stephen S. Ferguson; Sreenivasa C. Ramaiahgari; Richard S. Paules; Scott S. Auerbach (2023). Table_2_Comparison of Normalization Methods for Analysis of TempO-Seq Targeted RNA Sequencing Data.xlsx [Dataset]. http://doi.org/10.3389/fgene.2020.00594.s002
    Available download formats: xlsx
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Pierre R. Bushel; Stephen S. Ferguson; Sreenivasa C. Ramaiahgari; Richard S. Paules; Scott S. Auerbach
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of bulk RNA sequencing (RNA-Seq) data is a valuable tool to understand transcription at the genome scale. Targeted sequencing of RNA has emerged as a practical means of assessing the majority of the transcriptomic space with less reliance on large resources for consumables and bioinformatics. TempO-Seq is a templated, multiplexed RNA-Seq platform that interrogates a panel of sentinel genes representative of genome-wide transcription. Nuances of the technology require proper preprocessing of the data. Various methods have been proposed and compared for normalizing bulk RNA-Seq data, but there has been little to no investigation of how the methods perform on TempO-Seq data. We simulated count data into two groups (treated vs. untreated) at seven fold-change (FC) levels (including no change) using control samples from human HepaRG cells run on TempO-Seq and normalized the data using seven normalization methods. Upper Quartile (UQ) performed the best with regard to maintaining FC levels as detected by a limma contrast between treated vs. untreated groups. For all FC levels, specificity of the UQ normalization was greater than 0.84 and sensitivity greater than 0.90 except for the no change and +1.5 levels. Furthermore, K-means clustering of the simulated genes normalized by UQ agreed the most with the FC assignments [adjusted Rand index (ARI) = 0.67]. Despite having an assumption of the majority of genes being unchanged, the DESeq2 scaling factors normalization method performed reasonably well, as did the simple normalization procedures counts per million (CPM) and total counts (TCs). These results suggest that for two-class comparisons of TempO-Seq data, UQ, CPM, TC, or DESeq2 normalization should provide reasonably reliable results at absolute FC levels ≥2.0. These findings will help guide researchers to normalize TempO-Seq gene expression data for more reliable results.
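
    Of the methods named above, counts per million (CPM) is the simplest to show: each count is divided by the sample's total count and multiplied by one million. The sketch below covers CPM only, on invented counts; upper-quartile and DESeq2 scaling involve additional choices that are not described here and are not attempted.

```python
def counts_per_million(counts):
    """Scale raw counts for one sample so that they sum to one million."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c * 1_000_000 / total for c in counts]


sample = [10, 30, 60]   # hypothetical raw counts for three probes
cpm = counts_per_million(sample)
print(cpm)
```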

  15. D.melanogaster Genelab OSD Normalized RNA Seq Matrix

    • data.niaid.nih.gov
    • zenodo.org
    Updated Dec 16, 2022
    Cite
    Barker, Richard (2022). D.melanogaster Genelab OSD Normalized RNA Seq Matrix [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7443911
    Dataset updated
    Dec 16, 2022
    Dataset provided by
    Somsanith, June
    Barker, Richard
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    D. melanogaster normalized-count RNA-seq data matrix developed from NASA GeneLab's Open Science Data Repository. Created using R.

  16. Reptiles normalized cum currmap 20240416

    • hub.arcgis.com
    • ecouplift-aecom.hub.arcgis.com
    Updated Apr 22, 2024
    + more versions
    Cite
    Peter.Hurley@aecom.com_aecom (2024). Reptiles normalized cum currmap 20240416 [Dataset]. https://hub.arcgis.com/documents/ed0d1cbb7fa14a5889cfd1a54d894308
    Dataset updated
    Apr 22, 2024
    Dataset authored and provided by
    Peter.Hurley@aecom.com_aecom
    Description

    The Wildlife Connectivity Map has been created using Omniscape.jl:
    Landau, V.A., V.B. Shah, R. Anantharaman, and K.R. Hall. 2021. Omniscape.jl: Software to compute omnidirectional landscape connectivity. Journal of Open Source Software, 6(57), 2829.
    McRae, B. H., K. Popper, A. Jones, M. Schindel, S. Buttrick, K. R. Hall, R. S. Unnasch, and J. Platt. 2016. Conserving Nature’s Stage: Mapping Omnidirectional Connectivity for Resilient Terrestrial Landscapes in the Pacific Northwest. The Nature Conservancy, Portland, Oregon.
    Input data into the model include:
    Habitat Land Cover Map of Scotland 2022: Available under the Open Government Licence v3.0 https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/ Maps and data created by Space Intelligence with input and support from NatureScot, © SNH
    OS OpenRoads: Available under the Open Government Licence v3.0 https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    OS Open Rivers: Available under the Open Government Licence v3.0 https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    OS Terrain 50: Available under the Open Government Licence v3.0 https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    Resistance and habitat values have been chosen by expert ecologists within AECOM.

  17. Data for "On the influence of non-individual binaural cues and the impact of...

    • zenodo.org
    bin
    Updated Feb 19, 2021
    Cite
    Johannes M. Arend; Heinrich R. Liesefeld; Christoph Pörschmann (2021). Data for "On the influence of non-individual binaural cues and the impact of level normalization on auditory distance estimation of nearby sound sources" [Dataset]. http://doi.org/10.5281/zenodo.4445283
    Explore at:
    Available download formats: bin
    Dataset updated
    Feb 19, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Johannes M. Arend; Heinrich R. Liesefeld; Christoph Pörschmann
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data for:
    J. M. Arend, H. R. Liesefeld, and C. Pörschmann, “On the influence of non-individual binaural cues and the impact of level normalization on auditory distance estimation of nearby sound sources,” Acta Acustica, vol. 5, no. 10, pp. 1–21, 2021.

    The dataset contains the results of the three experiments presented in the article. A codebook for each experiment explains the respective variables and denotations.

  18. Large Terrestrial Mammals normalized cum currmap 20240416

    • hub.arcgis.com
    • ecouplift-aecom.hub.arcgis.com
    Updated Apr 22, 2024
    Cite
    Peter.Hurley@aecom.com_aecom (2024). Large Terrestrial Mammals normalized cum currmap 20240416 [Dataset]. https://hub.arcgis.com/documents/f69ab565babf49a1aa949a58e0e58a0f
    Explore at:
    Dataset updated
    Apr 22, 2024
    Dataset authored and provided by
    Peter.Hurley@aecom.com_aecom
    Description

    The Wildlife Connectivity Map has been created using Omniscape.jl:

    Landau, V.A., V.B. Shah, R. Anantharaman, and K.R. Hall. 2021. Omniscape.jl: Software to compute omnidirectional landscape connectivity. Journal of Open Source Software, 6(57), 2829.

    McRae, B. H., K. Popper, A. Jones, M. Schindel, S. Buttrick, K. R. Hall, R. S. Unnasch, and J. Platt. 2016. Conserving Nature’s Stage: Mapping Omnidirectional Connectivity for Resilient Terrestrial Landscapes in the Pacific Northwest. The Nature Conservancy, Portland, Oregon.

    Input data for the model include:

    Habitat Land Cover Map of Scotland 2022: Available under the Open Government Licence v3.0 https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/ Maps and data created by Space Intelligence with input and support from NatureScot, © SNH

    OS OpenRoads: Available under the Open Government Licence v3.0 https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/

    OS Open Rivers: Available under the Open Government Licence v3.0 https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/

    OS Terrain 50: Available under the Open Government Licence v3.0 https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/

    Resistance and habitat values were chosen by expert ecologists within AECOM.

  19. Quantifying inter-individual anatomical variability in the subcortex using...

    • neurovault.org
    nifti
    Updated Jun 30, 2018
    Cite
    (2018). Quantifying inter-individual anatomical variability in the subcortex using 7T structural MRI: Nonlinear FLASH RN interrater prop R normalized [Dataset]. http://identifiers.org/neurovault.image:8960
    Explore at:
    Available download formats: nifti
    Dataset updated
    Jun 30, 2018
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    FSL5.0

    glassbrain

    Collection description

    Subject species: homo sapiens

    Map type: Other

  20. Common Birds normalized cum currmap 20240416

    • ecouplift-aecom.hub.arcgis.com
    Updated Apr 22, 2024
    Cite
    Peter.Hurley@aecom.com_aecom (2024). Common Birds normalized cum currmap 20240416 [Dataset]. https://ecouplift-aecom.hub.arcgis.com/documents/7f2f40931f2f4de382722ff2d2bc6e13
    Explore at:
    Dataset updated
    Apr 22, 2024
    Dataset provided by
    AECOM (https://aecom.com/)
    Authors
    Peter.Hurley@aecom.com_aecom
    Description

    The Wildlife Connectivity Map has been created using Omniscape.jl:

    Landau, V.A., V.B. Shah, R. Anantharaman, and K.R. Hall. 2021. Omniscape.jl: Software to compute omnidirectional landscape connectivity. Journal of Open Source Software, 6(57), 2829.

    McRae, B. H., K. Popper, A. Jones, M. Schindel, S. Buttrick, K. R. Hall, R. S. Unnasch, and J. Platt. 2016. Conserving Nature’s Stage: Mapping Omnidirectional Connectivity for Resilient Terrestrial Landscapes in the Pacific Northwest. The Nature Conservancy, Portland, Oregon.

    Input data for the model include:

    Habitat Land Cover Map of Scotland 2022: Available under the Open Government Licence v3.0 https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/ Maps and data created by Space Intelligence with input and support from NatureScot, © SNH

    OS OpenRoads: Available under the Open Government Licence v3.0 https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/

    OS Open Rivers: Available under the Open Government Licence v3.0 https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/

    OS Terrain 50: Available under the Open Government Licence v3.0 https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/

    Resistance and habitat values were chosen by expert ecologists within AECOM.

Cite
Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao (2023). Data_Sheet_1_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.doc [Dataset]. http://doi.org/10.3389/fgene.2019.00400.s001

Data_Sheet_1_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.doc

Related Article
Explore at:
Available download formats: application/cdfv2
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers
Authors
Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao
License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Data normalization is a crucial step in the gene expression analysis as it ensures the validity of its downstream analyses. Although many metrics have been designed to evaluate the existing normalization methods, different metrics or different datasets by the same metric yield inconsistent results, particularly for the single-cell RNA sequencing (scRNA-seq) data. The worst situations could be that one method evaluated as the best by one metric is evaluated as the poorest by another metric, or one method evaluated as the best using one dataset is evaluated as the poorest using another dataset. Here raises an open question: principles need to be established to guide the evaluation of normalization methods. In this study, we propose a principle that one normalization method evaluated as the best by one metric should also be evaluated as the best by another metric (the consistency of metrics) and one method evaluated as the best using scRNA-seq data should also be evaluated as the best using bulk RNA-seq data or microarray data (the consistency of datasets). Then, we designed a new metric named Area Under normalized CV threshold Curve (AUCVC) and applied it with another metric mSCC to evaluate 14 commonly used normalization methods using both scRNA-seq data and bulk RNA-seq data, satisfying the consistency of metrics and the consistency of datasets. Our findings paved the way to guide future studies in the normalization of gene expression data with its evaluation. The raw gene expression data, normalization methods, and evaluation metrics used in this study have been included in an R package named NormExpression. NormExpression provides a framework and a fast and simple way for researchers to select the best method for the normalization of their gene expression data based on the evaluation of different methods (particularly some data-driven methods or their own methods) in the principle of the consistency of metrics and the consistency of datasets.
