100+ datasets found

Data_Sheet_1_NormExpression: An R Package to Normalize Gene Expression Data...
frontiersin.figshare.com
application/cdfv2
Updated Jun 1, 2023
Cite
Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao (2023). Data_Sheet_1_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.doc [Dataset]. http://doi.org/10.3389/fgene.2019.00400.s001
Explore at:
Available download formats
application/cdfv2
Unique identifier
https://doi.org/10.3389/fgene.2019.00400.s001
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers
Authors
Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data normalization is a crucial step in gene expression analysis, as it ensures the validity of downstream analyses. Although many metrics have been designed to evaluate the existing normalization methods, different metrics or different datasets by the same metric yield inconsistent results, particularly for single-cell RNA sequencing (scRNA-seq) data. In the worst case, one method evaluated as the best by one metric is evaluated as the poorest by another metric, or one method evaluated as the best using one dataset is evaluated as the poorest using another dataset. This raises an open question: principles need to be established to guide the evaluation of normalization methods. In this study, we propose a principle that one normalization method evaluated as the best by one metric should also be evaluated as the best by another metric (the consistency of metrics) and one method evaluated as the best using scRNA-seq data should also be evaluated as the best using bulk RNA-seq data or microarray data (the consistency of datasets). Then, we designed a new metric named Area Under normalized CV threshold Curve (AUCVC) and applied it with another metric, mSCC, to evaluate 14 commonly used normalization methods using both scRNA-seq data and bulk RNA-seq data, satisfying the consistency of metrics and the consistency of datasets. Our findings paved the way to guide future studies in the normalization of gene expression data with its evaluation. The raw gene expression data, normalization methods, and evaluation metrics used in this study have been included in an R package named NormExpression. NormExpression provides a framework and a fast and simple way for researchers to select the best method for the normalization of their gene expression data based on the evaluation of different methods (particularly some data-driven methods or their own methods) under the principle of the consistency of metrics and the consistency of datasets.
Normalizing Service Report
datainsightsmarket.com
doc, pdf, ppt
Updated Jan 12, 2025
Cite
Data Insights Market (2025). Normalizing Service Report [Dataset]. https://www.datainsightsmarket.com/reports/normalizing-service-531581
Explore at:
Available download formats
doc, ppt, pdf
Dataset updated
Jan 12, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
Market Analysis for Normalizing Service

The global normalizing service market is anticipated to reach a value of xx million USD by 2033, exhibiting a CAGR of xx% during the forecast period. The market growth is attributed to the rising demand for efficient data management solutions, increased adoption of cloud-based applications, and growing awareness of data normalization techniques. The market size was valued at xx million USD in 2025. North America dominates the market, followed by Europe and Asia Pacific. The market is segmented based on application into banking and financial services, healthcare, retail, manufacturing, and other industries. The banking and financial services segment is expected to hold the largest market share due to the need for data accuracy and compliance with regulatory requirements. In terms of types, the market is divided into data integration and reconciliation, data standardization, and data profiling. Data integration and reconciliation is expected to dominate the market as it helps eliminate inconsistencies and redundancy in data sets. Major players in the market include Infosys, Capgemini, IBM, Accenture, and Wipro. The Normalizing Service Market reached a value of USD 1.16 Billion in 2023 and is poised to grow at a rate of 11.7% during the forecast period, reaching a value of USD 2.23 Billion by 2032.
Data from: A systematic evaluation of normalization methods and probe...
zenodo.org
data.niaid.nih.gov
bin
Updated May 31, 2023
Cite
H. Welsh; C. M. P. F. Batalha; W. Li; K. L. Mpye; N. C. Souza-Pinto; M. S. Naslavsky; E. J. Parra (2023). A systematic evaluation of normalization methods and probe replicability using infinium EPIC methylation data [Dataset]. http://doi.org/10.5061/dryad.cnp5hqc7v
Explore at:
Available download formats
bin
Unique identifier
https://doi.org/10.5061/dryad.cnp5hqc7v
Dataset updated
May 31, 2023
Dataset provided by
Zenodo http://zenodo.org/
Authors
H. Welsh; C. M. P. F. Batalha; W. Li; K. L. Mpye; N. C. Souza-Pinto; M. S. Naslavsky; E. J. Parra
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Background

The Infinium EPIC array measures the methylation status of > 850,000 CpG sites. The EPIC BeadChip uses a two-array design: Infinium Type I and Type II probes. These probe types exhibit different technical characteristics which may confound analyses. Numerous normalization and pre-processing methods have been developed to reduce probe type bias as well as other issues such as background and dye bias.

Methods

This study evaluates the performance of various normalization methods using 16 replicated samples and three metrics: absolute beta-value difference, overlap of non-replicated CpGs between replicate pairs, and effect on beta-value distributions. Additionally, we carried out Pearson's correlation and intraclass correlation coefficient (ICC) analyses using both raw and SeSAMe 2 normalized data.

Results

The method we define as SeSAMe 2, which consists of the application of the regular SeSAMe pipeline with an additional round of QC, pOOBAH masking, was found to be the best-performing normalization method, while quantile-based methods were found to be the worst performing methods. Whole-array Pearson's correlations were found to be high. However, in agreement with previous studies, a substantial proportion of the probes on the EPIC array showed poor reproducibility (ICC < 0.50). The majority of poor-performing probes have beta values close to either 0 or 1, and relatively low standard deviations. These results suggest that probe reliability is largely the result of limited biological variation rather than technical measurement variation. Importantly, normalizing the data with SeSAMe 2 dramatically improved ICC estimates, with the proportion of probes with ICC values > 0.50 increasing from 45.18% (raw data) to 61.35% (SeSAMe 2).
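The first of the three metrics named above, the absolute beta-value difference between replicates, can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name and the five beta values are hypothetical, and the study's exact aggregation across its 16 replicated samples is not specified here.

```python
from statistics import mean


def mean_abs_beta_difference(rep_a, rep_b):
    """Mean absolute difference between the beta values of two replicates.

    rep_a and rep_b hold beta values (0..1) for the same probes in the
    same order. Smaller results mean the replicates agree more closely.
    """
    if len(rep_a) != len(rep_b):
        raise ValueError("replicates must cover the same probes")
    return mean(abs(a - b) for a, b in zip(rep_a, rep_b))


# Hypothetical beta values for five CpG probes measured twice.
replicate_a = [0.10, 0.52, 0.91, 0.33, 0.75]
replicate_b = [0.14, 0.47, 0.95, 0.30, 0.70]

difference = mean_abs_beta_difference(replicate_a, replicate_b)
print(round(difference, 3))
```

Computing this value before and after a normalization method is applied gives one way to compare methods: the method that lowers it most brings replicates closest together.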
Normalization techniques for PARAFAC modeling of urine metabolomics data
data.niaid.nih.gov
xml
Updated May 11, 2017
Cite
Radana Karlikova (2017). Normalization techniques for PARAFAC modeling of urine metabolomics data [Dataset]. https://data.niaid.nih.gov/resources?id=mtbls290
Explore at:
Available download formats
xml
Dataset updated
May 11, 2017
Dataset provided by
IMTM, Faculty of Medicine and Dentistry, Palacky University Olomouc, Hnevotinska 5, 775 15 Olomouc, Czech Republic
Authors
Radana Karlikova
Variables measured
Sample type, Metabolomics, Sample collection time
Description
One of the body fluids often used in metabolomics studies is urine. The peak intensities of metabolites in urine are affected by the urine history of an individual, resulting in dilution differences. This therefore requires normalization of the data to correct for such differences. Two normalization techniques are commonly applied to urine samples prior to their further statistical analysis. First, AUC normalization aims to normalize a group of signals with peaks by standardizing the area under the curve (AUC) within a sample to the median, mean or any other proper representation of the amount of dilution. The second approach uses specific end-product metabolites such as creatinine, and all intensities within a sample are expressed relative to the creatinine intensity. Another way of looking at urine metabolomics data is by realizing that the ratios between peak intensities are the information-carrying features. This opens up possibilities to use another class of data analysis techniques designed to deal with such ratios: compositional data analysis. In this approach special transformations are defined to deal with the ratio problem. In essence, it comes down to using a distance measure other than the Euclidean distance that is used in the conventional analysis of metabolomics data. We will illustrate using this type of approach in combination with three-way methods (i.e. PARAFAC) to be used in cases where samples of some biological material are measured at multiple time points. The aim of the paper is to develop PARAFAC modeling of three-way metabolomics data in the context of compositional data and compare this with standard normalization techniques for the specific case of urine metabolomics data.
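The three ideas described above (AUC normalization, creatinine normalization, and a ratio-based compositional transform) can be sketched on a toy example. This is an illustration under stated assumptions: the summed peak intensity stands in for the area under the curve, the centred log-ratio (CLR) is used as one common compositional transform (the description does not say which transform the paper uses), and all names and intensities are hypothetical.

```python
import math
from statistics import median


def total_signal_normalize(samples):
    """Scale each sample so its summed intensity equals the median sum.

    Assumption: the sum of peak intensities approximates the AUC.
    """
    totals = [sum(sample) for sample in samples]
    target = median(totals)
    return [[x * target / total for x in sample]
            for sample, total in zip(samples, totals)]


def creatinine_normalize(sample, creatinine_index):
    """Express every intensity relative to the creatinine intensity."""
    reference = sample[creatinine_index]
    return [x / reference for x in sample]


def clr(sample):
    """Centred log-ratio: log intensity minus the mean log intensity."""
    logs = [math.log(x) for x in sample]
    centre = sum(logs) / len(logs)
    return [value - centre for value in logs]


# Hypothetical peak intensities; index 2 plays the role of creatinine.
concentrated = [200.0, 50.0, 1000.0, 80.0]
diluted = [x / 2 for x in concentrated]  # same urine, diluted two-fold

auc_rows = total_signal_normalize([concentrated, diluted])
crea_rows = [creatinine_normalize(s, 2) for s in (concentrated, diluted)]
clr_rows = [clr(s) for s in (concentrated, diluted)]
```

All three approaches map the concentrated and the diluted sample to the same values, because a pure dilution changes every intensity by the same factor and leaves the ratios between peaks untouched.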
Methods for normalizing microbiome data: an ecological perspective
data.niaid.nih.gov
search.dataone.org
zip
Updated Oct 30, 2018
Cite
Donald T. McKnight; Roger Huerlimann; Deborah S. Bower; Lin Schwarzkopf; Ross A. Alford; Kyall R. Zenger (2018). Methods for normalizing microbiome data: an ecological perspective [Dataset]. http://doi.org/10.5061/dryad.tn8qs35
Explore at:
Available download formats
zip
Unique identifier
https://doi.org/10.5061/dryad.tn8qs35
Dataset updated
Oct 30, 2018
Dataset provided by
University of New England
James Cook University
Authors
Donald T. McKnight; Roger Huerlimann; Deborah S. Bower; Lin Schwarzkopf; Ross A. Alford; Kyall R. Zenger
License
https://spdx.org/licenses/CC0-1.0.html
Description
1. Microbiome sequencing data often need to be normalized due to differences in read depths, and recommendations for microbiome analyses generally warn against using proportions or rarefying to normalize data and instead advocate alternatives, such as upper quartile, CSS, edgeR-TMM, or DESeq-VS. Those recommendations are, however, based on studies that focused on differential abundance testing and variance standardization, rather than community-level comparisons (i.e., beta diversity). Also, standardizing the within-sample variance across samples may suppress differences in species evenness, potentially distorting community-level patterns. Furthermore, the recommended methods use log transformations, which we expect to exaggerate the importance of differences among rare OTUs, while suppressing the importance of differences among common OTUs. 2. We tested these theoretical predictions via simulations and a real-world data set. 3. Proportions and rarefying produced more accurate comparisons among communities and were the only methods that fully normalized read depths across samples. Additionally, upper quartile, CSS, edgeR-TMM, and DESeq-VS often masked differences among communities when common OTUs differed, and they produced false positives when rare OTUs differed. 4. Based on our simulations, normalizing via proportions may be superior to other commonly used methods for comparing ecological communities.
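The two methods the abstract favours, proportions and rarefying, are simple enough to sketch directly. The code below is a minimal illustration, not the authors' simulation code: the function names, the seed parameter, and the OTU counts are hypothetical.

```python
import random


def to_proportions(counts):
    """Divide each OTU count by the sample's total read count."""
    total = sum(counts)
    return [count / total for count in counts]


def rarefy(counts, depth, seed=0):
    """Randomly subsample reads without replacement to a fixed depth."""
    if depth > sum(counts):
        raise ValueError("depth exceeds the sample's read count")
    rng = random.Random(seed)
    # One entry per read, labelled with the index of its OTU.
    reads = [otu for otu, count in enumerate(counts) for _ in range(count)]
    rarefied = [0] * len(counts)
    for otu in rng.sample(reads, depth):
        rarefied[otu] += 1
    return rarefied


# Hypothetical OTU tables: the same community sequenced at two depths.
shallow = [50, 30, 20]
deep = [500, 300, 200]

shallow_props = to_proportions(shallow)
deep_props = to_proportions(deep)
deep_rarefied = rarefy(deep, depth=sum(shallow))
```

After either step both samples have the same effective read depth: proportions sum to 1 in every sample, and the rarefied sample has exactly as many reads as the shallow one.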
Data from: Best-Matched Internal Standard Normalization in Liquid...
acs.figshare.com
xlsx
Updated Jun 3, 2023
Cite
Angela K. Boysen; Katherine R. Heal; Laura T. Carlson; Anitra E. Ingalls (2023). Best-Matched Internal Standard Normalization in Liquid Chromatography–Mass Spectrometry Metabolomics Applied to Environmental Samples [Dataset]. http://doi.org/10.1021/acs.analchem.7b04400.s002
Explore at:
Available download formats
xlsx
Unique identifier
https://doi.org/10.1021/acs.analchem.7b04400.s002
Dataset updated
Jun 3, 2023
Dataset provided by
ACS Publications
Authors
Angela K. Boysen; Katherine R. Heal; Laura T. Carlson; Anitra E. Ingalls
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
The goal of metabolomics is to measure the entire range of small organic molecules in biological samples. In liquid chromatography–mass spectrometry-based metabolomics, formidable analytical challenges remain in removing the nonbiological factors that affect chromatographic peak areas. These factors include sample matrix-induced ion suppression, chromatographic quality, and analytical drift. The combination of these factors is referred to as obscuring variation. Some metabolomics samples can exhibit intense obscuring variation due to matrix-induced ion suppression, rendering large amounts of data unreliable and difficult to interpret. Existing normalization techniques have limited applicability to these sample types. Here we present a data normalization method to minimize the effects of obscuring variation. We normalize peak areas using a batch-specific normalization process, which matches measured metabolites with isotope-labeled internal standards that behave similarly during the analysis. This method, called best-matched internal standard (B-MIS) normalization, can be applied to targeted or untargeted metabolomics data sets and yields relative concentrations. We evaluate and demonstrate the utility of B-MIS normalization using marine environmental samples and laboratory grown cultures of phytoplankton. In untargeted analyses, B-MIS normalization allowed for inclusion of mass features in downstream analyses that would have been considered unreliable without normalization due to obscuring variation. B-MIS normalization for targeted or untargeted metabolomics is freely available at https://github.com/IngallsLabUW/B-MIS-normalization.
Data from: A generic normalization method for proper quantification in...
ebi.ac.uk
Cite
Sandra Anjo, A generic normalization method for proper quantification in untargeted proteomics screening [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD009068
Explore at:
Authors
Sandra Anjo
Variables measured
Proteomics
Description
The label-free quantitative mass spectrometry methods, in particular the SWATH-MS approach, have gained popularity and become a powerful technique for comparison of large datasets. The present work introduces the use of recombinant proteins as internal standards for untargeted label-free methods. The proposed internal standard strategy reveals a similar intragroup normalization capacity when compared with the most common normalization methods, with the additional advantage of maintaining the overall proteome changes between groups (which are lost using the methods referred to above). It thus maintains a good performance even when large qualitative and quantitative differences in sample composition are observed, such as the ones induced by biological regulation (as observed in secretome and other biofluids’ analyses) or by enrichment approaches (such as immunopurifications). Moreover, it is a cost-effective alternative, easier to implement than the current stable-isotope labeling internal standards, and therefore an appealing strategy for large quantitative screening, such as clinical cohorts for biomarker discovery.
Data Normalization Method for Geo-Spatial Analysis on Ports
data.mendeley.com
Updated Jun 6, 2020
Cite
Nazmus Sakib (2020). Data Normalization Method for Geo-Spatial Analysis on Ports [Dataset]. http://doi.org/10.17632/skn24jntn3.1
Explore at:
Unique identifier
https://doi.org/10.17632/skn24jntn3.1
Dataset updated
Jun 6, 2020
Authors
Nazmus Sakib
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Based on open access data, 79 Mediterranean passenger ports are analyzed to compare their infrastructure, hinterland accessibility and offered multi-modality. Comparative geo-spatial analysis is also carried out by using the data normalization method in order to visualize the ports' performance on maps. These data-driven comprehensive analytical results can bring added value to sustainable development policy and planning initiatives in the Mediterranean Region. The analyzed elements can also contribute to the development of passenger port performance indicators. The empirical research methods used for the Mediterranean passenger ports can be replicated for transport nodes of any region around the world to determine their relative performance on selected criteria for improvement and planning.

The Mediterranean passenger ports were initially categorized into cruise and ferry ports. The cruise ports were identified from the member list of the Association for the Mediterranean Cruise Ports (MedCruise), representing more than 80% of the cruise tourism activities per country. The identified cruise ports were mapped by selecting the corresponding geo-referenced ports from the map layer developed by the European Marine Observation and Data Network (EMODnet). The United Nations (UN) Code for Trade and Transport Locations (LOCODE) was identified for each of the cruise ports as the common criterion to carry out the selection. The identified cruise ports not listed by the EMODnet were added to the geo-database using, under license, the editing function of the ArcMap (version 10.1) geographic information system software. The ferry ports were identified from the open access industry initiative data provided by the Ferrylines, and were mapped in a similar way as the cruise ports (Figure 1).

Based on the available data from the identified cruise ports, a database (see Table A1–A3) was created for a Mediterranean scale analysis. The ferry ports were excluded due to the unavailability of relevant information on selected criteria (Table 2). However, the cruise ports serving as ferry passenger ports were identified in order to maximize the scope of the analysis. Port infrastructure and hinterland accessibility data were collected from the recent statistical reports published by the MedCruise, which are a compilation of data provided by its individual member port authorities and the cruise terminal operators. Other supplementary sources were the European Sea Ports Organization (ESPO) and the Global Ports Holding, a cruise terminal operator with an established presence in the Mediterranean. Additionally, open access data sources (e.g. the Google Maps and Trip Advisor) were consulted in order to identify the multi-modal transports and bridge the data gaps on hinterland accessibility by measuring the approximate distances.
Normalization of High Dimensional Genomics Data Where the Distribution of...
plos.figshare.com
tiff
Updated Jun 1, 2023
Cite
Mattias Landfors; Philge Philip; Patrik Rydén; Per Stenberg (2023). Normalization of High Dimensional Genomics Data Where the Distribution of the Altered Variables Is Skewed [Dataset]. http://doi.org/10.1371/journal.pone.0027942
Explore at:
Available download formats
tiff
Unique identifier
https://doi.org/10.1371/journal.pone.0027942
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Mattias Landfors; Philge Philip; Patrik Rydén; Per Stenberg
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Genome-wide analysis of gene expression or protein binding patterns using different array or sequencing based technologies is now routinely performed to compare different populations, such as treatment and reference groups. It is often necessary to normalize the data obtained to remove technical variation introduced in the course of conducting experimental work, but standard normalization techniques are not capable of eliminating technical bias in cases where the distribution of the truly altered variables is skewed, i.e. when a large fraction of the variables are either positively or negatively affected by the treatment. However, several experiments are likely to generate such skewed distributions, including ChIP-chip experiments for the study of chromatin, gene expression experiments for the study of apoptosis, and SNP-studies of copy number variation in normal and tumour tissues. A preliminary study using spike-in array data established that the capacity of an experiment to identify altered variables and generate unbiased estimates of the fold change decreases as the fraction of altered variables and the skewness increases. We propose the following work-flow for analyzing high-dimensional experiments with regions of altered variables: (1) Pre-process raw data using one of the standard normalization techniques. (2) Investigate if the distribution of the altered variables is skewed. (3) If the distribution is not believed to be skewed, no additional normalization is needed. Otherwise, re-normalize the data using a novel HMM-assisted normalization procedure. (4) Perform downstream analysis. Here, ChIP-chip data and simulated data were used to evaluate the performance of the work-flow. It was found that skewed distributions can be detected by using the novel DSE-test (Detection of Skewed Experiments). 
Furthermore, applying the HMM-assisted normalization to experiments where the distribution of the truly altered variables is skewed results in considerably higher sensitivity and lower bias than can be attained using standard and invariant normalization methods.
Data from: Online spatial normalization for real-time fMRI
search.dataone.org
explore.openaire.eu
Updated Apr 7, 2025
Cite
Xiaofei Li; Li Yao; Qing Ye; Xiaojie Zhao (2025). Online spatial normalization for real-time fMRI [Dataset]. http://doi.org/10.5061/dryad.1642b
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.1642b
Dataset updated
Apr 7, 2025
Dataset provided by
Dryad Digital Repository
Authors
Xiaofei Li; Li Yao; Qing Ye; Xiaojie Zhao
Time period covered
Jan 1, 2015
Description
Real-time functional magnetic resonance imaging (rtfMRI) is a recently emerged technique that demands fast data processing within a single repetition time (TR), such as a TR of 2 seconds. Data preprocessing in rtfMRI has rarely involved spatial normalization, which cannot be accomplished in a short time period. However, spatial normalization may be critical for accurate functional localization in a stereotactic space and is an essential procedure for some emerging applications of rtfMRI. In this study, we introduced an online spatial normalization method that adopts a novel affine registration (AFR) procedure based on principal axes registration (PA) and Gauss-Newton optimization (GN) using the self-adaptive β parameter, termed PA-GN(β) AFR, and nonlinear registration (NLR) based on discrete cosine transform (DCT). In AFR, PA provides an appropriate initial estimate of GN to induce the rapid convergence of GN. In addition, the β parameter, which relies on the change rate of cost functio...
A spatio-temporal normalization method for geophysical data
phys-techsciences.datastations.nl
application/gzip +7
Updated Mar 1, 2016
Cite
E. Pavlidou (2016). A spatio-temporal normalization method for geophysical data [Dataset]. http://doi.org/10.17026/dans-x7r-kgnr
Explore at:
Available download formats
bin(385755360), txt(236616), txt(61409), bin(126133), txt(61663), application/x-rlang-transport(3032434), txt(61466), pdf(137846), application/x-rlang-transport(2530641), txt(61440), txt(102823), txt(219096), txt(227856), txt(61462), xml(1330621), txt(17952), application/gzip(1651172), type/x-r-syntax(1300), txt(61404), type/x-r-syntax(165), type/x-r-syntax(213), bin(2477), txt(6235), txt(61630), txt(61420), bin(126121), txt(61401), txt(61693), type/x-r-syntax(603), bin(2375), txt(61374), type/x-r-syntax(215), txt(61390), bin(10184), txt(79577), type/x-r-syntax(387), txt(61446), txt(94843), bin(125724), txt(6333), txt(149883), type/x-r-syntax(384), txt(61435), application/x-rlang-transport(11333253), txt(61410), txt(61439), bin(2360), bin(382566720), txt(56882), type/x-r-syntax(433), txt(61431), txt(17928), txt(102793), type/x-r-syntax(3646), txt(70702), bin(6615), txt(61454), type/x-r-syntax(792), zip(61616), bin(3001), type/x-r-syntax(261), type/x-r-syntax(401)
Unique identifier
https://doi.org/10.17026/dans-x7r-kgnr
Dataset updated
Mar 1, 2016
Dataset provided by
DANS Data Station Phys-Tech Sciences
Authors
E. Pavlidou
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The objective of the study was to introduce a normalization algorithm which highlights short-term, localized, non-periodic fluctuations in hyper-temporal satellite data by dividing each pixel by the mean value of its spatial neighbourhood set. The algorithm was designed to suppress signal patterns that are common in the central and surrounding pixels, utilizing spatial and temporal information at different scales. Two folders ('Normalized_different_framesizes' and 'Retrieval_different_anomalies') are too large to upload and will be sent later via SURF Filesender.
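The core operation described above, dividing each pixel by the mean of its spatial neighbourhood, can be sketched for a single image as follows. This is a simplified illustration and not the dataset's R code: it assumes a square frame that excludes the central pixel, handles one time step only, and assumes the neighbourhood mean is never zero. The function name, the radius parameter, and the pixel values are hypothetical.

```python
def neighbourhood_normalize(image, radius=1):
    """Divide each pixel by the mean of the pixels in the frame around it.

    image is a list of equal-length rows of numbers. The frame is the
    square of half-width `radius` centred on the pixel, clipped at the
    image border, with the central pixel itself left out (an assumption).
    """
    rows, cols = len(image), len(image[0])
    normalized = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                image[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
                if (i, j) != (r, c)
            ]
            normalized[r][c] = image[r][c] / (sum(neighbours) / len(neighbours))
    return normalized


# Hypothetical 3x3 scene: a uniform background with one localized anomaly.
scene = [
    [10.0, 10.0, 10.0],
    [10.0, 20.0, 10.0],
    [10.0, 10.0, 10.0],
]
result = neighbourhood_normalize(scene)
```

A signal shared by a pixel and its surroundings cancels in the ratio, so a uniform scene maps to 1 everywhere, while the anomalous central pixel here stands out with a value of 2.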
Cell type labels for all clustering and normalization combinations compared...
data.niaid.nih.gov
search.dataone.org
zip
Updated Nov 17, 2022
Cite
John Hickey (2022). Cell type labels for all clustering and normalization combinations compared for CODEX multiplexed imaging [Dataset]. http://doi.org/10.5061/dryad.dfn2z352c
Explore at:
Available download formats
zip
Unique identifier
https://doi.org/10.5061/dryad.dfn2z352c
Dataset updated
Nov 17, 2022
Dataset provided by
Stanford University
Authors
John Hickey
License
https://spdx.org/licenses/CC0-1.0.html
Description
We performed CODEX (co-detection by indexing) multiplexed imaging on four sections of the human colon (ascending, transverse, descending, and sigmoid) using a panel of 47 oligonucleotide-barcoded antibodies. Subsequently, images underwent standard CODEX image processing (tile stitching, drift compensation, cycle concatenation, background subtraction, deconvolution, and determination of best focal plane), and single cell segmentation. The output of this process was a dataframe of nearly 130,000 cells with fluorescence values quantified from each marker. We used this dataframe as input to the 5 normalization conditions compared: z, double-log(z), min/max, and arcsinh normalizations, plus the original unmodified dataset. We used these normalized dataframes as inputs for 4 unsupervised clustering algorithms: k-means, Leiden, X-shift Euclidean, and X-shift angular.
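Three of the normalizations named above have standard textbook forms, sketched below for a single marker column. This is an illustration only: the double-log(z) variant is omitted because its definition is not given here, the arcsinh cofactor is a hypothetical parameter, the population standard deviation is an assumed choice, and the fluorescence values are made up.

```python
import math
from statistics import mean, pstdev


def z_normalize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values):
    """Rescale linearly so the smallest value is 0 and the largest is 1."""
    lowest, highest = min(values), max(values)
    return [(v - lowest) / (highest - lowest) for v in values]


def arcsinh_normalize(values, cofactor=1.0):
    """Apply the inverse hyperbolic sine after dividing by a cofactor."""
    return [math.asinh(v / cofactor) for v in values]


# Hypothetical fluorescence values for one marker across five cells.
marker = [0.0, 10.0, 20.0, 30.0, 40.0]

z_scores = z_normalize(marker)
scaled = min_max_normalize(marker)
compressed = arcsinh_normalize(marker)
```

In a dataframe of cells by markers, each function would be applied per marker column; whether the original analysis normalized per marker, per cell, or per region is not stated in this description.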

From the clustering outputs, we then labeled the clusters that resulted for cells observed in the data, producing 20 unique cell type labels. We also labeled cell types by hierarchical hand-gating of data within CellEngine (cellengine.com). We also created another gold standard for comparison by overclustering unnormalized data with X-shift angular clustering. Finally, we created one last label as the major cell type call for each cell from all 21 cell type labels in the dataset.

Consequently the dataset has individual cells segmented out in each row. Then there are columns for the X, Y position in pixels in the overall montage image of the dataset. There are also columns to indicate which region the data came from (4 total). The rest are labels generated by all the clustering and normalization techniques used in the manuscript and what were compared to each other. These also were the data that were used for neighborhood analysis for the last figure of the manuscript. These are provided at all four levels of cell type level granularity (from 7 cell types to 35 cell types).
Data for bulk ATAC-seq normalization paper
data.niaid.nih.gov
zenodo.org
Updated Jun 15, 2022
Cite
Van den Berge, Koen (2022). Data for bulk ATAC-seq normalization paper [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4441901
Explore at:
Dataset updated
Jun 15, 2022
Dataset authored and provided by
Van den Berge, Koen
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This zipped file contains all public datasets used in our benchmark of bulk ATAC-seq normalization methods.
Normalizing Service Report
marketreportanalytics.com
doc, pdf, ppt
Updated Apr 2, 2025
Cite
Market Report Analytics (2025). Normalizing Service Report [Dataset]. https://www.marketreportanalytics.com/reports/normalizing-service-53360
Explore at:
Available download formats
ppt, doc, pdf
Dataset updated
Apr 2, 2025
Dataset authored and provided by
Market Report Analytics
License
https://www.marketreportanalytics.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Normalizing Service market is experiencing robust growth, driven by increasing demand for [insert specific drivers, e.g., data quality improvement, enhanced analytics capabilities, and regulatory compliance]. The market, estimated at $5 billion in 2025, is projected to witness a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching an estimated value of $15 billion by 2033. This expansion is fueled by several key trends, including the proliferation of big data, the rise of cloud computing, and the growing adoption of advanced analytical techniques across various industries. The increasing complexity of data sources and the need for consistent and reliable information are primary catalysts for market growth. While the market faces certain restraints, such as high implementation costs and a shortage of skilled professionals, these challenges are being mitigated through the emergence of cost-effective solutions and specialized training programs. Segmentation analysis reveals significant opportunities within different application areas. For instance, the [insert specific application, e.g., financial services] sector currently holds a substantial market share due to the stringent regulatory requirements and the importance of data accuracy. Similarly, the [insert specific type, e.g., cloud-based] normalization services segment is experiencing rapid growth due to its scalability and ease of deployment. Geographically, North America and Europe currently dominate the market, but Asia-Pacific is anticipated to emerge as a significant growth region over the forecast period, driven by increasing digitalization and economic expansion. Key players in the market are leveraging technological advancements, strategic partnerships, and mergers and acquisitions to expand their market share and offer comprehensive solutions to meet the evolving needs of their customers.
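The growth figures quoted above can be checked with the standard compound annual growth rate formula, value_end = value_start × (1 + rate)^years. The sketch below only reproduces that arithmetic from the numbers in the description; the function names are hypothetical.

```python
def project_value(start_value, annual_rate, years):
    """Compound a starting value at a fixed annual growth rate."""
    return start_value * (1 + annual_rate) ** years


def implied_cagr(start_value, end_value, years):
    """Annual growth rate that takes start_value to end_value in `years`."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the description: $5 billion in 2025, 15% CAGR, to 2033.
years = 2033 - 2025
projected_2033 = project_value(5.0, 0.15, years)  # in billions of USD
rate_for_15bn = implied_cagr(5.0, 15.0, years)

print(round(projected_2033, 1), round(rate_for_15bn * 100, 1))
```

Compounding $5 billion at 15% over eight years gives roughly $15.3 billion, in line with the rounded $15 billion stated; read the other way, growing from exactly $5 billion to $15 billion implies a rate just under 15%.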
K–Fold method for optimizing the data sets.
figshare.com
xls
Updated May 13, 2024
Cite
P. Jagadesh; Afzal Hussain Khan; B. Shanmuga Priya; A. Asheeka; Zineb Zoubir; Hassan M. Magbool; Shamshad Alam; Omer Y. Bakather (2024). K–Fold method for optimizing the data sets. [Dataset]. http://doi.org/10.1371/journal.pone.0303101.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0303101.t001
Dataset updated
May 13, 2024
Dataset provided by
PLOS ONE
Authors
P. Jagadesh; Afzal Hussain Khan; B. Shanmuga Priya; A. Asheeka; Zineb Zoubir; Hassan M. Magbool; Shamshad Alam; Omer Y. Bakather
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This research study aims to understand the application of Artificial Neural Networks (ANNs) to forecast the compressive strength of Self-Compacting Recycled Coarse Aggregate Concrete (SCRCAC). A total of 602 data sets from SCRCAC mix designs were collected from the literature, and the data were rearranged, reconstructed, trained and tested for ANN model development. The models were established using seven input variables: the mass of cementitious content, water, natural coarse aggregate content, natural fine aggregate content, recycled coarse aggregate content, chemical admixture and mineral admixture used in the SCRCAC mix designs. Two normalization techniques are used for data normalization to visualize the data distribution. For each normalization technique, three transfer functions are used for modelling. In total, six different types of models were run in MATLAB and used to estimate the 28th day SCRCAC compressive strength. Normalization technique 2 performs better than technique 1, and TANSING is the best transfer function. The best k-fold cross-validation fold is k = 7. The coefficient of determination for predicted and actual compressive strength is 0.78 for training and 0.86 for testing. The impact of the number of neurons and layers on the model was also examined. Inputs from standards are used to forecast the 28th day compressive strength. Apart from ANN, Machine Learning (ML) techniques like random forest, extra trees, extreme boosting and light gradient boosting are adopted to predict the 28th day compressive strength of SCRCAC. Compared to ML, ANN prediction shows better results in terms of sensitivity analysis. The study was also extended to determine the 28th day compressive strength from experimental work and compare it with the 28th day compressive strength from the best ANN model. Standard and ANN mix designs have similar fresh and hardened properties. The average compressive strengths from the ANN model and the experimental results are 39.067 and 38.36 MPa, respectively, with a correlation coefficient of 1. It appears that ANN can validly predict the compressive strength of concrete.
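To illustrate the k-fold cross-validation mentioned in this description, the following is a minimal Python sketch of how 602 samples could be split into k = 7 folds. It is not the authors' MATLAB code; the helper `k_fold_indices` and the random seed are assumptions made for illustration only.

```python
import numpy as np


def k_fold_indices(n_samples: int, k: int, seed: int = 0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)   # shuffle once so folds are random
    folds = np.array_split(shuffled, k)     # k nearly equal, disjoint folds
    for i in range(k):
        test_idx = folds[i]                 # fold i is held out for testing
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx


# 602 data sets and k = 7 are the values quoted in the description.
splits = list(k_fold_indices(602, 7))
fold_sizes = [len(test_idx) for _, test_idx in splits]
print(fold_sizes)  # prints [86, 86, 86, 86, 86, 86, 86]
```

Because 602 is divisible by 7, every held-out fold contains 86 samples and each model would be trained on the remaining 516.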
Data from: Normalization Method Utilizing Endogenous Proteins for...
acs.figshare.com
xls
Updated Jun 7, 2023
Cite
Kai Yan; Yueying Yang; Yunpeng Zhang; Wanbing Zhao; Lujian Liao (2023). Normalization Method Utilizing Endogenous Proteins for Quantitative Proteomics [Dataset]. http://doi.org/10.1021/jasms.0c00012.s002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1021/jasms.0c00012.s002
Dataset updated
Jun 7, 2023
Dataset provided by
ACS Publications
Authors
Kai Yan; Yueying Yang; Yunpeng Zhang; Wanbing Zhao; Lujian Liao
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
We developed a normalization method utilizing the expression levels of a panel of endogenous proteins as normalization standards (EPNS herein). We tested the validity of the method using two sets of tandem mass tag (TMT)-labeled data and found that this normalization method effectively reduced global intensity bias at the protein level. The coefficient of variation (CV) of the overall median was reduced by 55% and 82% on average, compared to the reduction by 72% and 86% after normalization using the upper quartile. Furthermore, we used differential protein expression analysis and statistical learning to identify biomarkers for colorectal cancer from a CPTAC data set. The expression changes of a panel of proteins, including NUP205, GTPBP4, CNN2, GNL3, and S100A11, all highly correlate with colorectal cancer. Applying these five proteins as model features, random forest modeling obtained prediction results with the maximum AUC of 0.9998 using EPNS-normalized data, comparing favorably to the AUC of 0.9739 using the raw data. Thus, the normalization method based on EPNS reduced the global intensity bias and is applicable for quantitative proteomic analysis.
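The CV comparison in this description can be illustrated with the upper-quartile baseline it mentions. The sketch below is not the EPNS method and does not use the study's data: the intensity matrix is synthetic, and the helper names `cv` and `upper_quartile_normalize` are hypothetical. It only shows how the CV of per-sample medians can be measured before and after a scaling normalization.

```python
import numpy as np


def cv(values) -> float:
    """Coefficient of variation: sample standard deviation divided by the mean."""
    values = np.asarray(values, dtype=float)
    return float(values.std(ddof=1) / values.mean())


def upper_quartile_normalize(intensities: np.ndarray) -> np.ndarray:
    """Scale each sample (column) so that all 75th percentiles become equal."""
    uq = np.percentile(intensities, 75, axis=0)  # one upper quartile per sample
    return intensities / uq * uq.mean()


# Synthetic data (assumption): 500 proteins x 6 samples with a per-sample bias.
rng = np.random.default_rng(0)
base = rng.lognormal(mean=10.0, sigma=1.0, size=500)   # protein abundances
bias = np.array([0.6, 0.8, 1.0, 1.3, 1.7, 2.2])        # global intensity bias
noise = rng.lognormal(mean=0.0, sigma=0.05, size=(500, 6))
raw = base[:, None] * bias[None, :] * noise

cv_before = cv(np.median(raw, axis=0))
cv_after = cv(np.median(upper_quartile_normalize(raw), axis=0))
print(f"CV of sample medians before: {cv_before:.3f}, after: {cv_after:.3f}")
```

A lower CV of the sample medians after normalization indicates that the global intensity bias between samples has been reduced, which is the quantity the description reports.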
Data from: Label-free quantitative phosphoproteomics with novel pairwise...
ebi.ac.uk
Updated Aug 24, 2015
Cite
Susumu Imanishi (2015). Label-free quantitative phosphoproteomics with novel pairwise abundance normalization reveals synergistic RAS and CIP2A signaling [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD001374
Explore at:
Dataset updated
Aug 24, 2015
Authors
Susumu Imanishi
Variables measured
Proteomics
Description
Label-free quantification is a powerful method for studying cellular protein phosphorylation dynamics. However, whether current data normalization methods achieve sufficient accuracy has not been examined systematically. Here, we demonstrate that a large uni-directional shift in the phosphopeptide abundance distribution is problematic for global median centering and quantile-based normalizations and may mislead the biological conclusion from unlabeled phosphoproteome data. Instead, we present a novel normalization strategy, named pairwise normalization, which is based on adjusting phosphopeptide abundances measured before and after the enrichment. The superior performance of pairwise normalization was validated by statistical methods, western blotting analysis, and by bioinformatics analysis. In addition, we demonstrate that the choice of normalization method influences the downstream analyses of the data and perceived pathway activities. Furthermore, we demonstrate that the developed normalization method, combined with pathway analysis algorithms, revealed a novel biological synergism between Ras signalling and PP2A inhibition by CIP2A.
A radiometric normalization dataset of Shandong Province based on Gaofen-1...
scidb.cn
Updated Feb 20, 2020
Cite
黄莉婷; 焦伟利; 龙腾飞 (2020). A radiometric normalization dataset of Shandong Province based on Gaofen-1 WFV image (2018) [Dataset]. http://doi.org/10.11922/sciencedb.947
Explore at:
Unique identifier
https://doi.org/10.11922/sciencedb.947
Dataset updated
Feb 20, 2020
Dataset provided by
Science Data Bank
Authors
黄莉婷; 焦伟利; 龙腾飞
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Shandong
Description
Surface reflectance is a critical physical variable that affects the energy budget in land-atmosphere interactions, feature recognition and classification, and climate change research. This dataset uses the relative radiometric normalization method, and takes the Landsat-8 Operational Land Imager (OLI) surface reflectance products as the reference image to normalize the GF-1 satellite WFV sensor cloud-free images of Shandong Province in 2018. Relative radiometric normalization processing mainly includes atmospheric correction, image resampling, image registration, masking, extraction of the no-change pixels, and calculation of the normalization coefficients. After relative radiometric normalization, for the no-change pixels of each GF-1 WFV image and its reference image, R2 is above 0.7295 and RMSE is below 0.0172. The surface reflectance accuracy of the GF-1 WFV image is improved, so it can be used in cooperation with Landsat data to provide data support for remote sensing quantitative inversion. This dataset is in GeoTIFF format, and the spatial resolution of the image is 16 m.
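The last two steps of the processing chain described here, calculating normalization coefficients from the no-change pixels and checking R2 and RMSE, can be sketched as a linear fit. This is a minimal illustration on synthetic arrays, assuming a per-band linear model (gain and offset) and an already known no-change mask; the function `fit_normalization` and all numbers in it are hypothetical and are not taken from the dataset.

```python
import numpy as np


def fit_normalization(target, reference, no_change):
    """Fit reference ~ gain * target + offset over the no-change pixels only."""
    x, y = target[no_change], reference[no_change]
    gain, offset = np.polyfit(x, y, 1)           # least-squares line
    pred = gain * x + offset
    rmse = float(np.sqrt(np.mean((y - pred) ** 2)))
    r2 = float(1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2))
    return float(gain), float(offset), r2, rmse


# Synthetic single-band images (assumption): the target is a linearly distorted,
# slightly noisy copy of the reference, plus one block of real surface change.
rng = np.random.default_rng(0)
reference = rng.uniform(0.02, 0.40, size=(50, 50))
target = (reference - 0.03) / 1.2 + rng.normal(0.0, 0.002, size=(50, 50))
changed = np.zeros((50, 50), dtype=bool)
changed[:10, :10] = True
target[changed] += 0.2                            # simulated land-cover change

gain, offset, r2, rmse = fit_normalization(target, reference, ~changed)
normalized = gain * target + offset               # apply coefficients to the whole image
print(f"gain={gain:.3f} offset={offset:.3f} R2={r2:.4f} RMSE={rmse:.4f}")
```

Excluding the changed pixels from the fit keeps real surface change from distorting the coefficients, while the fitted gain and offset are still applied to every pixel of the target image.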
Comparison of normalization approaches for gene expression studies completed...
plos.figshare.com
tiff
Updated Jun 1, 2023
Cite
Farnoosh Abbas-Aghababazadeh; Qian Li; Brooke L. Fridley (2023). Comparison of normalization approaches for gene expression studies completed with high-throughput sequencing [Dataset]. http://doi.org/10.1371/journal.pone.0206312
Explore at:
Available download formats: tiff
Unique identifier
https://doi.org/10.1371/journal.pone.0206312
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Farnoosh Abbas-Aghababazadeh; Qian Li; Brooke L. Fridley
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Normalization of RNA-Seq data has proven essential to ensure accurate inferences and replication of findings. Hence, various normalization methods have been proposed for various technical artifacts that can be present in high-throughput sequencing transcriptomic studies. In this study, we set out to compare the widely used library size normalization methods (UQ, TMM, and RLE) and across-sample normalization methods (SVA, RUV, and PCA) for RNA-Seq data using publicly available data from The Cancer Genome Atlas (TCGA) cervical cancer study. Additionally, an extensive simulation study was completed to compare the performance of the across-sample normalization methods in estimating technical artifacts. Lastly, we investigated the effect of reduction in degrees of freedom in the normalized data and their impact on downstream differential expression analysis results. Based on this study, the TMM and RLE library size normalization methods give similar results for the CESC dataset. In addition, the results for the simulated datasets show that the SVA (“BE”) method outperforms the other methods (SVA “Leek”, PCA) by correctly estimating the number of latent artifacts. Moreover, ignoring the loss of degrees of freedom due to normalization results in inflated type I error rates. We recommend adjusting not only for library size differences but also assessing known and unknown technical artifacts in the data, and if needed, completing across-sample normalization. In addition, we suggest that one includes the known and estimated latent artifacts in the design matrix to correctly account for the loss in degrees of freedom, as opposed to completing the analysis on the post-processed normalized data.
Single cell RNA-seq data of human hESCs to evaluate SCnorm: robust...
data.niaid.nih.gov
Updated May 15, 2019
Cite
Bacher R; Chu L; Kendziorski C; Swanson S (2019). Single cell RNA-seq data of human hESCs to evaluate SCnorm: robust normalization of single-cell rna-seq data [Dataset]. https://data.niaid.nih.gov/resources?id=gse85917
Explore at:
Dataset updated
May 15, 2019
Dataset provided by
University of Florida
Authors
Bacher R; Chu L; Kendziorski C; Swanson S
Description
Normalization of RNA-sequencing data is essential for accurate downstream inference, but the assumptions upon which most methods are based do not hold in the single-cell setting. Consequently, applying existing normalization methods to single-cell RNA-seq data introduces artifacts that bias downstream analyses. To address this, we introduce SCnorm for accurate and efficient normalization of scRNA-seq data. A total of 183 single cells (92 H1 cells, 91 H9 cells), sequenced twice, were used to evaluate SCnorm in normalizing single-cell RNA-seq experiments. A total of 48 bulk H1 samples were used to compare bulk and single-cell properties. For single-cell RNA-seq, the identical single-cell indexed and fragmented cDNA were pooled at 96 cells per lane or at 24 cells per lane to test the effects of sequencing depth, resulting in approximately 1 million and 4 million mapped reads per cell in the two pooling groups, respectively.
