Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Data normalization is a crucial step in gene expression analysis because it ensures the validity of downstream analyses. Although many metrics have been designed to evaluate existing normalization methods, different metrics, or the same metric applied to different datasets, yield inconsistent results, particularly for single-cell RNA sequencing (scRNA-seq) data. In the worst cases, a method ranked best by one metric is ranked poorest by another, or a method ranked best on one dataset is ranked poorest on another. This raises an open question: principles need to be established to guide the evaluation of normalization methods. In this study, we propose the principle that a normalization method ranked best by one metric should also be ranked best by another metric (the consistency of metrics), and that a method ranked best on scRNA-seq data should also be ranked best on bulk RNA-seq or microarray data (the consistency of datasets). We then designed a new metric, the Area Under the normalized CV threshold Curve (AUCVC), and applied it together with another metric, mSCC, to evaluate 14 commonly used normalization methods on both scRNA-seq and bulk RNA-seq data, satisfying both the consistency of metrics and the consistency of datasets. Our findings pave the way for future studies on the normalization of gene expression data and its evaluation. The raw gene expression data, normalization methods, and evaluation metrics used in this study are included in an R package named NormExpression. NormExpression provides a framework and a fast, simple way for researchers to select the best normalization method for their gene expression data by evaluating different methods (particularly data-driven methods or their own methods) under the principles of the consistency of metrics and the consistency of datasets.
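To make the idea behind AUCVC concrete, the sketch below computes a normalized-CV threshold curve and its area for an already-normalized count matrix. The function name, the cumulative-fraction construction, and the 0-to-1 cutoff grid are assumptions made for illustration; they are not necessarily the exact definition implemented in NormExpression.

```r
# Illustrative sketch of an AUCVC-style metric (assumed definition: for each CV
# cutoff, take the fraction of genes whose coefficient of variation across
# samples falls below the cutoff, then integrate that curve over the cutoffs).
# `norm_counts` is a genes x samples matrix of normalized expression values.
aucvc_sketch <- function(norm_counts, cutoffs = seq(0, 1, by = 0.01)) {
  cv <- apply(norm_counts, 1, function(x) sd(x) / mean(x))      # per-gene CV
  cv <- cv[is.finite(cv)]
  frac_below <- sapply(cutoffs, function(ct) mean(cv <= ct))    # curve values
  # area under the curve via the trapezoidal rule
  sum(diff(cutoffs) * (head(frac_below, -1) + tail(frac_below, -1)) / 2)
}
```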
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
User guide

To generate the reports:
1) Prerequisite: Java 8 runtime environment.
2) Download the metadata-qa-marc project as described at https://github.com/pkiraly/metadata-qa-marc (e.g. into the ~/git/metadata-qa-marc directory).
3) Download the .sh and .R files from this project into a subdirectory (e.g. 'scripts').
4) Adjust the DIR variable in the [library-name].sh files according to your directory structure.
5) run-all.sh creates the -details.csv and -summary.csv files in the $DIR/_reports directory.

If you do not want to generate the reports but would like to use the data files provided, download the *.csv.gz files into a '_reports' directory.

To generate Tables 2 and 3 of the paper:
1) Prerequisite: R.
2) Move normalize-summary.sh, distill-ids.sh, and normalize-ids.sh into the $DIR/_reports directory.
3) cd $DIR/_reports
4) ./normalize-summary.sh
5) ./distill-ids.sh
6) ./normalize-ids.sh
7) Rscript evaluate-details.R
8) Rscript evaluate-summary.R
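If you only want to inspect the provided data files in R rather than regenerate them, something like the following works. This snippet is not part of the project's own scripts, and the file-name pattern is an assumption based on the -summary.csv naming described above.

```r
# Minimal sketch for loading the provided report files in R.
# Assumes the *.csv.gz files were downloaded into a local '_reports' directory.
reports_dir   <- "_reports"
summary_files <- list.files(reports_dir, pattern = "-summary\\.csv(\\.gz)?$",
                            full.names = TRUE)
summaries <- lapply(summary_files, read.csv)   # read.csv decompresses .gz transparently
names(summaries) <- basename(summary_files)
```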
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Analysis of bulk RNA sequencing (RNA-Seq) data is a valuable tool to understand transcription at the genome scale. Targeted sequencing of RNA has emerged as a practical means of assessing the majority of the transcriptomic space with less reliance on large resources for consumables and bioinformatics. TempO-Seq is a templated, multiplexed RNA-Seq platform that interrogates a panel of sentinel genes representative of genome-wide transcription. Nuances of the technology require proper preprocessing of the data. Various methods have been proposed and compared for normalizing bulk RNA-Seq data, but there has been little to no investigation of how these methods perform on TempO-Seq data. We simulated count data into two groups (treated vs. untreated) at seven fold-change (FC) levels (including no change) using control samples from human HepaRG cells run on TempO-Seq, and normalized the data using seven normalization methods. Upper Quartile (UQ) performed the best with regard to maintaining FC levels as detected by a limma contrast between the treated and untreated groups. For all FC levels, the specificity of UQ normalization was greater than 0.84, and sensitivity was greater than 0.90 except for the no-change and +1.5 levels. Furthermore, K-means clustering of the simulated genes normalized by UQ agreed the most with the FC assignments [adjusted Rand index (ARI) = 0.67]. Despite assuming that the majority of genes are unchanged, the DESeq2 scaling-factor normalization method performed reasonably well, as did the simple normalization procedures counts per million (CPM) and total counts (TC). These results suggest that for two-class comparisons of TempO-Seq data, UQ, CPM, TC, or DESeq2 normalization should provide reasonably reliable results at absolute FC levels ≥2.0. These findings will help guide researchers in normalizing TempO-Seq gene expression data for more reliable results.
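For readers unfamiliar with upper-quartile normalization, the sketch below shows the general idea on a count matrix. It is a generic illustration, not the code used in the study; the helper name and the geometric-mean rescaling of the quartiles are one common convention among several.

```r
# Generic sketch of upper-quartile (UQ) normalization for a genes x samples
# count matrix: each sample is scaled by the 75th percentile of its non-zero
# counts, with the scale factors rescaled to have geometric mean 1.
uq_normalize <- function(counts, p = 0.75) {
  uq <- apply(counts, 2, function(x) quantile(x[x > 0], probs = p))
  size_factors <- uq / exp(mean(log(uq)))
  sweep(counts, 2, size_factors, "/")
}

# Example on simulated counts (1000 genes, 6 samples)
set.seed(1)
counts <- matrix(rnbinom(6000, mu = 50, size = 2), nrow = 1000)
normalized <- uq_normalize(counts)
```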
Output files from the 8. Metadata Analysis Workflow page of the SWELTR high-temp study. In this workflow, we compared environmental metadata with microbial communities. The workflow is split into two parts.
metadata_ssu18_wf.rdata: Part 1 contains all variables and objects for the 16S rRNA analysis. To see the objects, run in R: _load("metadata_ssu18_wf.rdata", verbose=TRUE)_
metadata_its18_wf.rdata: Part 2 contains all variables and objects for the ITS analysis. To see the objects, run in R: _load("metadata_its18_wf.rdata", verbose=TRUE)_
Additional files:
In both workflows, we run the following steps (an illustrative R sketch of these steps appears below the list):
1) Metadata Normality Tests: Shapiro-Wilk Normality Test to test whether
each metadata parameter is normally distributed.
2) Normalize Parameters: R package bestNormalize to find and execute the
best normalizing transformation.
3) Split Metadata parameters into groups: a) Environmental and edaphic
properties, b) Microbial functional responses, and c) Temperature adaptation
properties.
4) Autocorrelation Tests: Test all possible pair-wise comparisons, on both
normalized and non-normalized data sets, for each group.
5) Remove autocorrelated parameters from each group.
6) Dissimilarity Correlation Tests: Use Mantel Tests to see if any of the
metadata groups are significantly correlated with the community data.
7) Best Subset of Variables: Determine which of the metadata parameters
from each group are the most strongly correlated with the community data. For
this we use the bioenv function from the vegan package.
8) Distance-based Redundancy Analysis: Ordination analysis of samples and
metadata vector overlays using capscale, also from the vegan package.
Source code for the workflow can be found here:
https://github.com/sweltr/high-temp/blob/master/metadata.Rmd
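For orientation, here is a minimal R sketch of steps 1-8. It is illustrative only: it assumes a data frame `metadata` of numeric parameters and a sample-by-taxon abundance table `community`, uses placeholder parameter names and a placeholder correlation threshold, and shows only one parameter group. The actual analysis is in the metadata.Rmd linked above.

```r
# Illustrative R sketch of workflow steps 1-8 (assumed objects: `metadata`, a
# data frame of numeric metadata parameters; `community`, a sample x taxon
# abundance table). Parameter names and thresholds are placeholders.
library(vegan)
library(bestNormalize)

# 1) Normality tests: Shapiro-Wilk p-value for each metadata parameter
normality_p <- sapply(metadata, function(x) shapiro.test(x)$p.value)

# 2) Normalize parameters: let bestNormalize choose a transformation
normalized <- as.data.frame(lapply(metadata, function(x) bestNormalize(x)$x.t))

# 3) Split parameters into groups (placeholder names; the study uses three groups)
env_group <- normalized[, c("param_a", "param_b", "param_c")]

# 4-5) Autocorrelation: pairwise correlations within the group, then drop one
#      parameter from each highly correlated pair (placeholder threshold)
cors <- cor(env_group, method = "spearman")
drop <- apply(abs(cors) > 0.7 & upper.tri(cors), 2, any)
env_group <- env_group[, !drop, drop = FALSE]

# 6) Dissimilarity correlation test: Mantel test between community and metadata distances
comm_dist <- vegdist(community, method = "bray")
env_dist  <- vegdist(scale(env_group), method = "euclidean")
mantel(comm_dist, env_dist)

# 7) Best subset of variables most strongly correlated with the community data
bioenv(community, env_group)

# 8) Distance-based redundancy analysis with metadata vector overlays
dbrda <- capscale(community ~ ., data = env_group, distance = "bray")
```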
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The reported parameters were used to normalize the raw scores according to Eq. (8).
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
The goal of metabolomics is to measure the entire range of small organic molecules in biological samples. In liquid chromatography–mass spectrometry-based metabolomics, formidable analytical challenges remain in removing the nonbiological factors that affect chromatographic peak areas. These factors include sample matrix-induced ion suppression, chromatographic quality, and analytical drift. The combination of these factors is referred to as obscuring variation. Some metabolomics samples can exhibit intense obscuring variation due to matrix-induced ion suppression, rendering large amounts of data unreliable and difficult to interpret. Existing normalization techniques have limited applicability to these sample types. Here we present a data normalization method to minimize the effects of obscuring variation. We normalize peak areas using a batch-specific normalization process, which matches measured metabolites with isotope-labeled internal standards that behave similarly during the analysis. This method, called best-matched internal standard (B-MIS) normalization, can be applied to targeted or untargeted metabolomics data sets and yields relative concentrations. We evaluate and demonstrate the utility of B-MIS normalization using marine environmental samples and laboratory grown cultures of phytoplankton. In untargeted analyses, B-MIS normalization allowed for inclusion of mass features in downstream analyses that would have been considered unreliable without normalization due to obscuring variation. B-MIS normalization for targeted or untargeted metabolomics is freely available at https://github.com/IngallsLabUW/B-MIS-normalization.
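The sketch below illustrates the general B-MIS idea in R. The helper name is ours, the inputs are assumed matrices, and the matching rule used here (pick, for each metabolite, the internal standard that minimizes the relative standard deviation of the normalized peak areas; in practice this is typically assessed in pooled or replicate injections) is an assumption based on the description above, not necessarily the exact criterion in the B-MIS code.

```r
# Illustrative sketch of a best-matched internal standard (B-MIS) style
# normalization. Assumed inputs: `peaks`, a metabolite x sample matrix of peak
# areas, and `is_peaks`, an internal-standard x sample matrix of peak areas
# from the same injections. The minimum-RSD matching rule is an assumption.
rsd <- function(x) sd(x) / mean(x)

bmis_normalize <- function(peaks, is_peaks) {
  normalized <- matrix(NA_real_, nrow(peaks), ncol(peaks),
                       dimnames = dimnames(peaks))
  best_is <- integer(nrow(peaks))
  for (i in seq_len(nrow(peaks))) {
    # peak areas of metabolite i divided by each candidate internal standard
    ratios <- sweep(1 / is_peaks, 2, peaks[i, ], "*")
    best_is[i] <- which.min(apply(ratios, 1, rsd))
    normalized[i, ] <- ratios[best_is[i], ]   # relative concentrations
  }
  list(normalized = normalized, best_is = best_is)
}
```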
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
A common problem in confocal microscopy is the decrease in intensity of excitation light and emission signal from fluorophores as they travel through 3D specimens, resulting in decreased signal detected as a function of depth. Here, we report a visualization program compatible with widely used fluorophores in cell biology to facilitate image interpretation of differential protein disposition in 3D specimens. Glioblastoma cell clusters were fluorescently labeled for mitochondrial complex I (COXI), P2X7 receptor (P2X7R), β-Actin, Ki-67, and DAPI. Each cell cluster was imaged using a laser scanning confocal microscope. We observed up to ∼70% loss in fluorescence signal across the depth in Z-stacks. This progressive underrepresentation of fluorescence intensity as the focal plane deepens hinders an accurate representation of signal location within a 3D structure. To address these challenges, we developed ProDiVis: a program that adjusts apparent fluorescent signals by normalizing one fluorescent signal to a reference signal at each focal plane. ProDiVis serves as a free and accessible, unbiased visualization tool to use in conjunction with fluorescence microscopy images and imaging software.
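As a rough illustration of this kind of depth normalization (a generic sketch of the idea described above, not the ProDiVis implementation), one can divide the signal of interest in each focal plane by the mean reference signal in that plane. The function name and array inputs below are assumptions for illustration.

```r
# Per-focal-plane normalization of one channel against a reference channel.
# Assumed inputs: `target` and `reference` are 3D numeric arrays (x, y, z)
# holding two fluorescence channels of the same Z-stack.
normalize_by_reference <- function(target, reference) {
  stopifnot(all(dim(target) == dim(reference)))
  out <- target
  for (z in seq_len(dim(target)[3])) {
    ref_level <- mean(reference[, , z])        # average reference signal in this focal plane
    out[, , z] <- target[, , z] / ref_level    # adjust the target channel slice by slice
  }
  out
}
```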