100+ datasets found

c
Data from: LVMED: Dataset of Latvian text normalisation samples for the...
repository.clarin.lv
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Viesturs Jūlijs Lasmanis; Normunds Grūzītis (2023). LVMED: Dataset of Latvian text normalisation samples for the medical domain [Dataset]. https://repository.clarin.lv/repository/xmlui/handle/20.500.12574/85
Explore at:
Dataset updated
May 30, 2023
Authors
Viesturs Jūlijs Lasmanis; Normunds Grūzītis
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The CSV dataset contains sentence pairs for a text-to-text transformation task: given a sentence that contains 0..n abbreviations, rewrite (normalize) the sentence in full words (word forms).

Training dataset: 64,665 sentence pairs Validation dataset: 7,185 sentence pairs. Testing dataset: 7,984 sentence pairs.

All sentences are extracted from a public web corpus (https://korpuss.lv/id/Tīmeklis2020) and contain at least one medical term.
Data from: Normalized data
figshare.com
txt
Updated Jun 15, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yalbi Balderas (2022). Normalized data [Dataset]. http://doi.org/10.6084/m9.figshare.20076047.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.20076047.v1
Dataset updated
Jun 15, 2022
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Yalbi Balderas
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Normalize data
f
Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data...
frontiersin.figshare.com
zip
Updated Jun 1, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao (2023). Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.zip [Dataset]. http://doi.org/10.3389/fgene.2019.00400.s002
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.3389/fgene.2019.00400.s002
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers
Authors
Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data normalization is a crucial step in the gene expression analysis as it ensures the validity of its downstream analyses. Although many metrics have been designed to evaluate the existing normalization methods, different metrics or different datasets by the same metric yield inconsistent results, particularly for the single-cell RNA sequencing (scRNA-seq) data. The worst situations could be that one method evaluated as the best by one metric is evaluated as the poorest by another metric, or one method evaluated as the best using one dataset is evaluated as the poorest using another dataset. Here raises an open question: principles need to be established to guide the evaluation of normalization methods. In this study, we propose a principle that one normalization method evaluated as the best by one metric should also be evaluated as the best by another metric (the consistency of metrics) and one method evaluated as the best using scRNA-seq data should also be evaluated as the best using bulk RNA-seq data or microarray data (the consistency of datasets). Then, we designed a new metric named Area Under normalized CV threshold Curve (AUCVC) and applied it with another metric mSCC to evaluate 14 commonly used normalization methods using both scRNA-seq data and bulk RNA-seq data, satisfying the consistency of metrics and the consistency of datasets. Our findings paved the way to guide future studies in the normalization of gene expression data with its evaluation. The raw gene expression data, normalization methods, and evaluation metrics used in this study have been included in an R package named NormExpression. NormExpression provides a framework and a fast and simple way for researchers to select the best method for the normalization of their gene expression data based on the evaluation of different methods (particularly some data-driven methods or their own methods) in the principle of the consistency of metrics and the consistency of datasets.
N
Normalizing Service Report
marketreportanalytics.com
doc, pdf, ppt
Updated Apr 2, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Market Report Analytics (2025). Normalizing Service Report [Dataset]. https://www.marketreportanalytics.com/reports/normalizing-service-53022
Explore at:
pdf, ppt, docAvailable download formats
Dataset updated
Apr 2, 2025
Dataset authored and provided by
Market Report Analytics
License
https://www.marketreportanalytics.com/privacy-policyhttps://www.marketreportanalytics.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global Normalizing Service market is experiencing robust growth, driven by increasing demand for [Insert specific drivers based on your knowledge of the Normalizing Service market, e.g., improved data quality, enhanced data analysis capabilities, rising adoption of cloud-based solutions, stringent data governance regulations]. The market is segmented by application [Insert specific applications, e.g., healthcare, finance, manufacturing] and type [Insert specific types of Normalizing Services, e.g., data cleansing, data transformation, data integration]. While precise market sizing data is unavailable, based on industry trends and comparable markets with similar growth trajectories, a reasonable estimate for the 2025 market size could be placed in the range of $500-750 million USD, with a Compound Annual Growth Rate (CAGR) of approximately 15-20% projected from 2025 to 2033. This growth is expected to be fueled by the continued expansion of big data analytics and the rising need for data standardization across diverse industries. However, challenges such as data security concerns, integration complexities, and high initial investment costs can act as potential restraints on market expansion. Regional analysis suggests a strong presence across North America and Europe, driven by early adoption and robust technological infrastructure. Asia-Pacific is poised for significant growth in the coming years due to increasing digitalization and expanding data centers. The market is highly competitive, with a mix of established players and emerging technology companies vying for market share. Successful players will need to differentiate their offerings through specialized solutions, strategic partnerships, and a focus on addressing specific industry needs. Future growth will depend on advancements in AI and machine learning technologies, further integration with cloud platforms, and the development of user-friendly, scalable solutions.
N
Normalizing Service Report
marketreportanalytics.com
doc, pdf, ppt
Updated Apr 2, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Market Report Analytics (2025). Normalizing Service Report [Dataset]. https://www.marketreportanalytics.com/reports/normalizing-service-53595
Explore at:
pdf, doc, pptAvailable download formats
Dataset updated
Apr 2, 2025
Dataset authored and provided by
Market Report Analytics
License
https://www.marketreportanalytics.com/privacy-policyhttps://www.marketreportanalytics.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global Normalizing Service market is experiencing robust growth, driven by increasing demand for [insert specific drivers, e.g., improved data quality, enhanced data security, rising adoption of cloud-based solutions]. The market size in 2025 is estimated at $5 billion, projecting a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033. This expansion is fueled by several key trends, including the growing adoption of [insert specific trends, e.g., big data analytics, AI-powered normalization tools, increasing regulatory compliance requirements]. While challenges remain, such as [insert specific restraints, e.g., high implementation costs, data integration complexities, lack of skilled professionals], the market's positive trajectory is expected to continue. Segmentation reveals that the [insert dominant application segment, e.g., financial services] application segment holds the largest market share, with [insert dominant type segment, e.g., cloud-based] solutions demonstrating significant growth. Regional analysis shows a strong presence across North America and Europe, particularly in the United States, United Kingdom, and Germany, driven by early adoption of advanced technologies and robust digital infrastructure. However, emerging markets in Asia-Pacific, particularly in China and India, are exhibiting significant growth potential due to expanding digitalization and increasing data volumes. The competitive landscape is characterized by a mix of established players and emerging companies, leading to innovation and market consolidation. The forecast period (2025-2033) promises continued market expansion, underpinned by technological advancements, increased regulatory pressures, and evolving business needs across diverse industries. The long-term outlook is optimistic, indicating a substantial market opportunity for companies offering innovative and cost-effective Normalizing Services.
f
Identification of Novel Reference Genes Suitable for qRT-PCR Normalization...
plos.figshare.com
tiff
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yu Hu; Shuying Xie; Jihua Yao (2023). Identification of Novel Reference Genes Suitable for qRT-PCR Normalization with Respect to the Zebrafish Developmental Stage [Dataset]. http://doi.org/10.1371/journal.pone.0149277
Explore at:
tiffAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0149277
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Yu Hu; Shuying Xie; Jihua Yao
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Reference genes used in normalizing qRT-PCR data are critical for the accuracy of gene expression analysis. However, many traditional reference genes used in zebrafish early development are not appropriate because of their variable expression levels during embryogenesis. In the present study, we used our previous RNA-Seq dataset to identify novel reference genes suitable for gene expression analysis during zebrafish early developmental stages. We first selected 197 most stably expressed genes from an RNA-Seq dataset (29,291 genes in total), according to the ratio of their maximum to minimum RPKM values. Among the 197 genes, 4 genes with moderate expression levels and the least variation throughout 9 developmental stages were identified as candidate reference genes. Using four independent statistical algorithms (delta-CT, geNorm, BestKeeper and NormFinder), the stability of qRT-PCR expression of these candidates was then evaluated and compared to that of actb1 and actb2, two commonly used zebrafish reference genes. Stability rankings showed that two genes, namely mobk13 (mob4) and lsm12b, were more stable than actb1 and actb2 in most cases. To further test the suitability of mobk13 and lsm12b as novel reference genes, they were used to normalize three well-studied target genes. The results showed that mobk13 and lsm12b were more suitable than actb1 and actb2 with respect to zebrafish early development. We recommend mobk13 and lsm12b as new optimal reference genes for zebrafish qRT-PCR analysis during embryogenesis and early larval stages.
Z
Data from: Adapting Phrase-based Machine Translation to Normalise Medical...
data.niaid.nih.gov
zenodo.org
Updated Jan 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Collier, Nigel (2020). Adapting Phrase-based Machine Translation to Normalise Medical Terms in Social Media Messages [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_27354
Explore at:
Dataset updated
Jan 24, 2020
Dataset provided by
Collier, Nigel
Limsopatham, Nut
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Data and supplementary information for the paper entitled "Adapting Phrase-based Machine Translation to Normalise Medical Terms in Social Media Messages" to be published at EMNLP 2015: Conference on Empirical Methods in Natural Language Processing — September 17–21, 2015 — Lisboa, Portugal.

ABSTRACT: Previous studies have shown that health reports in social media, such as DailyStrength and Twitter, have potential for monitoring health conditions (e.g. adverse drug reactions, infectious diseases) in particular communities. However, in order for a machine to understand and make inferences on these health conditions, the ability to recognise when laymen's terms refer to a particular medical concept (i.e. text normalisation) is required. To achieve this, we propose to adapt an existing phrase-based machine translation (MT) technique and a vector representation of words to map between a social media phrase and a medical concept. We evaluate our proposed approach using a collection of phrases from tweets related to adverse drug reactions. Our experimental results show that the combination of a phrase-based MT technique and the similarity between word vector representations outperforms the baselines that apply only either of them by up to 55%.
N
Normalizing Service Report
datainsightsmarket.com
doc, pdf, ppt
Updated Jan 12, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Data Insights Market (2025). Normalizing Service Report [Dataset]. https://www.datainsightsmarket.com/reports/normalizing-service-531581
Explore at:
doc, ppt, pdfAvailable download formats
Dataset updated
Jan 12, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policyhttps://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
Market Analysis for Normalizing Service The global normalizing service market is anticipated to reach a value of xx million USD by 2033, exhibiting a CAGR of xx% during the forecast period. The market growth is attributed to the rising demand for efficient data management solutions, increased adoption of cloud-based applications, and growing awareness of data normalization techniques. The market size was valued at xx million USD in 2025. North America dominates the market, followed by Europe and Asia Pacific. The market is segmented based on application into banking and financial services, healthcare, retail, manufacturing, and other industries. The banking and financial services segment is expected to hold the largest market share due to the need for data accuracy and compliance with regulatory requirements. In terms of types, the market is divided into data integration and reconciliation, data standardization, and data profiling. Data integration and reconciliation is expected to dominate the market as it helps eliminate inconsistencies and redundancy in data sets. Major players in the market include Infosys, Capgemini, IBM, Accenture, and Wipro. The Normalizing Service Market reached a value of USD 1.16 Billion in 2023 and is poised to grow at a rate of 11.7% during the forecast period, reaching a value of USD 2.23 Billion by 2032.
d
Data from: A systematic evaluation of normalization methods and probe...
search.dataone.org
data.niaid.nih.gov
+2more
Updated Nov 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
H. Welsh; C. M. P. F. Batalha; W. Li; K. L. Mpye; N. C. Souza-Pinto; M. S. Naslavsky; E. J. Parra (2023). A systematic evaluation of normalization methods and probe replicability using infinium EPIC methylation data [Dataset]. http://doi.org/10.5061/dryad.cnp5hqc7v
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.cnp5hqc7v
Dataset updated
Nov 30, 2023
Dataset provided by
Dryad Digital Repository
Authors
H. Welsh; C. M. P. F. Batalha; W. Li; K. L. Mpye; N. C. Souza-Pinto; M. S. Naslavsky; E. J. Parra
Time period covered
Jan 1, 2022
Description
Background The Infinium EPIC array measures the methylation status ofâ€‰>â€‰850,000 CpG sites. The EPIC BeadChip uses a two-array design: Infinium Type I and Type II probes. These probe types exhibit different technical characteristics which may confound analyses. Numerous normalization and pre-processing methods have been developed to reduce probe type bias as well as other issues such as background and dye bias.
Methods This study evaluates the performance of various normalization methods using 16 replicated samples and three metrics: absolute beta-value difference, overlap of non-replicated CpGs between replicate pairs, and effect on beta-value distributions. Additionally, we carried out Pearsonâ€™s correlation and intraclass correlation coefficient (ICC) analyses using both raw and SeSAMe 2 normalized data.Â
Results The method we define as SeSAMe 2, which consists of the application of the regular SeSAMe pipeline with an additional round of QC, pOOBAH masking, was found to be the b...,

Study Participants and SamplesÂ

The whole blood samples were obtained from the Health, Well-being and Aging (SaÃºde, Ben-estar e Envelhecimento, SABE) study cohort. SABE is a cohort of census-withdrawn elderly from the city of SÃ£o Paulo, Brazil, followed up every five years since the year 2000, with DNA first collected in 2010. Samples from 24 elderly adults were collected at two time points for a total of 48 samples. The first time point is the 2010 collection wave, performed from 2010 to 2012, and the second time point was set in 2020 in a COVID-19 monitoring project (9Â±0.71 years apart). The 24 individuals were 67.41Â±5.52 years of age (mean Â± standard deviation) at time point one; and 76.41Â±6.17 at time point two and comprised 13 men and 11 women.

All individuals enrolled in the SABE cohort provided written consent, and the ethic protocols were approved by local and national institutional review boards COEP/FSP/USP OF.COEP/23/10, CONEP 2044/2014, CEP HIAE 1263-10, University o..., We provide data on an Excel file, with absolute differences in beta values between replicate samples for each probe provided in different tabs for raw data and different normalization methods.
f
Data from: proteiNorm – A User-Friendly Tool for Normalization and Analysis...
acs.figshare.com
zip
Updated Jun 4, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Stefan Graw; Jillian Tang; Maroof K Zafar; Alicia K Byrd; Chris Bolden; Eric C. Peterson; Stephanie D Byrum (2023). proteiNorm – A User-Friendly Tool for Normalization and Analysis of TMT and Label-Free Protein Quantification [Dataset]. http://doi.org/10.1021/acsomega.0c02564.s002
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.1021/acsomega.0c02564.s002
Dataset updated
Jun 4, 2023
Dataset provided by
ACS Publications
Authors
Stefan Graw; Jillian Tang; Maroof K Zafar; Alicia K Byrd; Chris Bolden; Eric C. Peterson; Stephanie D Byrum
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
The technological advances in mass spectrometry allow us to collect more comprehensive data with higher quality and increasing speed. With the rapidly increasing amount of data generated, the need for streamlining analyses becomes more apparent. Proteomics data is known to be often affected by systemic bias from unknown sources, and failing to adequately normalize the data can lead to erroneous conclusions. To allow researchers to easily evaluate and compare different normalization methods via a user-friendly interface, we have developed “proteiNorm”. The current implementation of proteiNorm accommodates preliminary filters on peptide and sample levels followed by an evaluation of several popular normalization methods and visualization of the missing value. The user then selects an adequate normalization method and one of the several imputation methods used for the subsequent comparison of different differential expression methods and estimation of statistical power. The application of proteiNorm and interpretation of its results are demonstrated on two tandem mass tag multiplex (TMT6plex and TMT10plex) and one label-free spike-in mass spectrometry example data set. The three data sets reveal how the normalization methods perform differently on different experimental designs and the need for evaluation of normalization methods for each mass spectrometry experiment. With proteiNorm, we provide a user-friendly tool to identify an adequate normalization method and to select an appropriate method for differential expression analysis.
dataset_4
figshare.com
xlsx
Updated Dec 18, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Olga Ovtsarenko (2024). dataset_4 [Dataset]. http://doi.org/10.6084/m9.figshare.28049816.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.28049816.v1
Dataset updated
Dec 18, 2024
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Olga Ovtsarenko
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Weighted attributes normalization, scaling
n
Methods for normalizing microbiome data: an ecological perspective
data.niaid.nih.gov
datadryad.org
zip
Updated Oct 30, 2018
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Donald T. McKnight; Roger Huerlimann; Deborah S. Bower; Lin Schwarzkopf; Ross A. Alford; Kyall R. Zenger (2018). Methods for normalizing microbiome data: an ecological perspective [Dataset]. http://doi.org/10.5061/dryad.tn8qs35
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.tn8qs35
Dataset updated
Oct 30, 2018
Dataset provided by
University of New England
James Cook University
Authors
Donald T. McKnight; Roger Huerlimann; Deborah S. Bower; Lin Schwarzkopf; Ross A. Alford; Kyall R. Zenger
License
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Description
Microbiome sequencing data often need to be normalized due to differences in read depths, and recommendations for microbiome analyses generally warn against using proportions or rarefying to normalize data and instead advocate alternatives, such as upper quartile, CSS, edgeR-TMM, or DESeq-VS. Those recommendations are, however, based on studies that focused on differential abundance testing and variance standardization, rather than community-level comparisons (i.e., beta diversity), Also, standardizing the within-sample variance across samples may suppress differences in species evenness, potentially distorting community-level patterns. Furthermore, the recommended methods use log transformations, which we expect to exaggerate the importance of differences among rare OTUs, while suppressing the importance of differences among common OTUs. 2. We tested these theoretical predictions via simulations and a real-world data set. 3. Proportions and rarefying produced more accurate comparisons among communities and were the only methods that fully normalized read depths across samples. Additionally, upper quartile, CSS, edgeR-TMM, and DESeq-VS often masked differences among communities when common OTUs differed, and they produced false positives when rare OTUs differed. 4. Based on our simulations, normalizing via proportions may be superior to other commonly used methods for comparing ecological communities.
MIMIC-IV Lab Events Subset - Preprocessed for Data Normalization...
zenodo.org
bin, text/x-python +1
Updated Jan 13, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ali Azadi; ali Azadi (2025). MIMIC-IV Lab Events Subset - Preprocessed for Data Normalization Analysis.xlsx [Dataset]. http://doi.org/10.5281/zenodo.14641824
Explore at:
txt, bin, text/x-pythonAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.14641824
Dataset updated
Jan 13, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
ali Azadi; ali Azadi
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This file contains a preprocessed subset of the MIMIC-IV dataset (Medical Information Mart for Intensive Care, Version IV), specifically focusing on laboratory event data related to glucose levels. It has been curated and processed for research on data normalization and integration within Clinical Decision Support Systems (CDSS) to improve Human-Computer Interaction (HCI) elements.

The dataset includes the following key features:

Raw Lab Data: Original values of glucose levels as recorded in the clinical setting.

Normalized Data: Glucose levels transformed into a standardized range for comparison and analysis.

Demographic Information: Includes patient age and gender to support subgroup analyses.

This data has been used to analyze the impact of normalization and integration techniques on improving data accuracy and usability in CDSS environments. The file is provided as part of ongoing research on enhancing clinical decision-making and user interaction in healthcare systems.

Key Applications:

Research on the effects of data normalization on clinical outcomes.

Study of demographic variations in laboratory values to support personalized healthcare.

Exploration of data integration and its role in reducing cognitive load in CDSS.

Data Source:

The data originates from the publicly available MIMIC-IV database, developed and maintained by the Massachusetts Institute of Technology (MIT). Proper ethical guidelines for accessing and preprocessing the dataset have been followed.

File Content:

Filename: MIMIC-IV_LabEvents_Subset_Normalization.xlsx

File Format: Microsoft Excel

Number of Rows: 100 samples for demonstration purposes.

Fields Included: Patient ID, Age, Gender, Raw Glucose Value, Normalized Glucose Value, and additional derived statistics.
A
Data from: The Bronson Files, Dataset 5, Field 105, 2014
data.amerigeoss.org
csv, jpeg, pdf, qt +2
Updated Aug 24, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
United States (2022). The Bronson Files, Dataset 5, Field 105, 2014 [Dataset]. https://data.amerigeoss.org/dataset/the-bronson-files-dataset-5-field-105-2014-14f0b
Explore at:
csv, zip, pdf, xls, qt, jpegAvailable download formats
Dataset updated
Aug 24, 2022
Dataset provided by
United States
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Dr. Kevin Bronson provides a second year of nitrogen and water management in wheat agricultural research dataset for compute. Ten irrigation treatments from a linear sprinkler were combined with nitrogen treatments. This dataset includes notation of field events and operations, an intermediate analysis mega-table of correlated and calculated parameters, including laboratory analysis results generated during the experimentation, plus high resolution plot level intermediate data tables of SAS process output, as well as the complete raw data sensor records and logger outputs.

This proximal terrestrial high-throughput plant phenotyping data examples our early tri-metric field method, where a geo-referenced 5Hz crop canopy height, temperature and spectral signature are recorded coincident to indicate a plant health status. In this development period, our Proximal Sensing Cart Mark1 (PSCM1) platform suspends a single cluster of sensors on a dual sliding vertical placement armature.

Experimental design and operational details of research conducted are contained in related published articles, however further description of the measured data signals as well as germane commentary is herein offered.

The primary component of this dataset is the Holland Scientific (HS) CropCircle ACS-470 reflectance numbers. Which as derived here, consist of raw active optical band-pass values, digitized onboard the sensor product. Data is delivered as sequential serialized text output including the associated GPS information. Typically this is a production agriculture support technology, enabling an efficient precision application of nitrogen fertilizer. We used this optical reflectance sensor technology to investigate plant agronomic biology, as the ACS-470 is a unique performance product being not only rugged and reliable but illumination active and filter customizable.

Individualized ACS-470 sensor detector behavior and subsequent index calculation influence can be understood through analysis of white-panel and other known target measurements. When a sensor is held 120cm from a titanium dioxide white painted panel, a normalized unity value of 1.0 is set for each detector. To generate this dataset we used a Holland Scientific SC-1 device and set the 1.0 unity value (field normalize) on each sensor individually, before each data collection, and without using any channel gain boost. The SC-1 field normalization device allows a communications connection to a Windows machine, where company provided sensor control software enables the necessary sensor normalization routine, and a real-time view of streaming sensor data.

This type of active proximal multi-spectral reflectance data may be perceived as inherently “noisy”; however basic analytical description consistently resolves a biological patterning, and more advanced statistical analysis is suggested to achieve discovery. Sources of polychromatic reflectance are inherent in the environment; and can be influenced by surface features like wax or water, or presence of crystal mineralization; varying bi-directional reflectance in the proximal space is a model reality, and directed energy emission reflection sampling is expected to support physical understanding of the underling passive environmental system.

Soil in view of the sensor does decrease the raw detection amplitude of the target color returned and can add a soil reflection signal component. Yet that return accurately represents a largely two-dimensional cover and intensity signal of the target material present within each view. It does however, not represent a reflection of the plant material solely because it can contain additional features in view. Expect NDVI values greater than 0.1 when sensing plants and saturating more around 0.8, rather than the typical 0.9 of passive NDVI.

The active signal does not transmit energy to penetrate, perhaps past LAI 2.1 or less, compared to what a solar induced passive reflectance sensor would encounter. However the focus of our active sensor scan is on the uppermost expanded canopy leaves, and they are positioned to intercept the major solar energy. Active energy sensors are more easy to direct, and in our capture method we target a consistent sensor height that is 1m above the average canopy height, and maintaining a rig travel speed target around 1.5 mph, with sensors parallel to earth ground in a nadir view.

We consider these CropCircle raw detector returns to be more “instant” in generation, and “less-filtered” electronically, while onboard the “black-box” device, than are other reflectance products which produce vegetation indices as averages of multiple detector samples in time.

It is known through internal sensor performance tracking across our entire location inventory, that sensor body temperature change affects sensor raw detector returns in minor and undescribed yet apparently consistent ways.

Holland Scientific 5Hz CropCircle active optical reflectance ACS-470 sensors, that were measured on the GeoScout digital propriety serial data logger, have a stable output format as defined by firmware version. Fifteen collection events are presented.

Different numbers of csv data files were generated based on field operations, and there were a few short duration instances where GPS signal was lost. Multiple raw data files when present, including white panel measurements before or after field collections, were combined into one file, with the inclusion of the null value placeholder -9999. Two CropCircle sensors, numbered 2 and 3, were used, supplying data in a lined format, where variables are repeated for each sensor. This created a discrete data row for each individual sensor measurement instance.

We offer six high-throughput single pixel spectral colors, recorded at 530, 590, 670, 730, 780, and 800nm. The filtered band-pass was 10nm, except for the NIR, which was set to 20 and supplied an increased signal (including an increased noise).

Dual, or tandem approach, CropCircle paired sensor usage empowers additional vegetation index calculations, such as:
DATT = (r800-r730)/(r800-r670)
DATTA = (r800-r730)/(r800-r590)
MTCI = (r800-r730)/(r730-r670)
CIRE = (r800/r730)-1
CI = (r800/r590)-1
CCCI = NDRE/NDVIR800
PRI = (r590-r530)/(r590+r530)
CI800 = ((r800/r590)-1)
CI780 = ((r780/r590)-1)

The Campbell Scientific (CS) environmental data recording of small range (0 to 5 v) voltage sensor signals are accurate and largely shielded from electronic thermal induced influence, or other such factors by design. They were used as was descriptively recommended by the company. A high precision clock timing, and a recorded confluence of custom metrics, allow the Campbell Scientific raw data signal acquisitions a high research value generally, and have delivered baseline metrics in our plant phenotyping program. Raw electrical sensor signal captures were recorded at the maximum digital resolution, and could be re-processed in whole, while the subsequent onboard calculated metrics were often data typed at a lower memory precision and served our research analysis.

Improved Campbell Scientific data at 5Hz is presented for nine collection events, where thermal, ultrasonic displacement, and additional GPS metrics were recorded. Ultrasonic height metrics generated by the Honeywell sensor and present in this dataset, represent successful phenotypic recordings. The Honeywell ultrasonic displacement sensor has worked well in this application because of its 180Khz signal frequency that ranges 2m space. Air temperature is still a developing metric, a thermocouple wire junction (TC) placed in free air with a solar shade produced a low-confidence passive ambient air temperature.

Campbell Scientific logger derived data output is structured in a column format, with multiple sensor data values present in each data row. One data row represents one program output cycle recording across the sensing array, as there was no onboard logger data averaging or down sampling. Campbell Scientific data is first recorded in binary format onboard the data logger, and then upon data retrieval, converted to ASCII text via the PC based LoggerNet CardConvert application. Here, our full CS raw data output, that includes a four-line header structure, was truncated to a typical single row header of variable names. The -9999 placeholder value was inserted for null instances.

There is canopy thermal data from three view vantages. A nadir sensor view, and looking forward and backward down the plant row at a 30 degree angle off nadir. The high confidence Apogee Instruments SI-111 type infrared radiometer, non-contact thermometer, serial number 1022 was in a front position looking forward away from the platform, number 1023 with a nadir view was in middle position, and sensor number 1052 was in a rear position and looking back toward the platform frame. We have a long and successful history testing and benchmarking performance, and deploying Apogee Instruments infrared radiometers in field experimentation. They are biologically spectral window relevant sensors and return a fast update 0.2C accurate average surface temperature, derived from what is (geometrically weighted) in their field of view.

Data gaps do exist beyond null value -9999 designations, there are some instances when GPS signal was lost, or rarely on HS GeoScout logger error. GPS information may be missing at the start of data recording. However once the receiver supplies a signal the values will populate. Likewise there may be missing information at the end of a data collection, where the GPS signal was lost but sensors continue to record along with the data logger timestamping.

In the raw CS data, collections 1 through 7 are represented by only one table file, where the UTC from the GPS
d
Working With Messy Data in OpenRefine Workshop
search.dataone.org
Updated Dec 28, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kelly Schultz (2023). Working With Messy Data in OpenRefine Workshop [Dataset]. http://doi.org/10.5683/SP3/YSM3JM
Explore at:
Unique identifier
https://doi.org/10.5683/SP3/YSM3JM
Dataset updated
Dec 28, 2023
Dataset provided by
Borealis
Authors
Kelly Schultz
Description
This workshop will introduce OpenRefine, a powerful open source tool for exploring, cleaning and manipulating "messy" data. Through hands-on activities, using a variety of datasets, participants will learn how to: Explore and identify patterns in data; Normalize data using facets and clusters; Manipulate and generate new textual and numeric data; Transform and reshape datasets; Use the General Regular Expression Language (GREL) to undertake manipulations, such as concatenating strings.
f
Data from: Normalization Method Utilizing Endogenous Proteins for...
acs.figshare.com
xls
Updated Jun 7, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kai Yan; Yueying Yang; Yunpeng Zhang; Wanbing Zhao; Lujian Liao (2023). Normalization Method Utilizing Endogenous Proteins for Quantitative Proteomics [Dataset]. http://doi.org/10.1021/jasms.0c00012.s002
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1021/jasms.0c00012.s002
Dataset updated
Jun 7, 2023
Dataset provided by
ACS Publications
Authors
Kai Yan; Yueying Yang; Yunpeng Zhang; Wanbing Zhao; Lujian Liao
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
We developed a normalization method utilizing the expression levels of a panel of endogenous proteins as normalization standards (EPNS herein). We tested the validity of the method using two sets of tandem mass tag (TMT)-labeled data and found that this normalization method effectively reduced global intensity bias at the protein level. The coefficient of variation (CV) of the overall median was reduced by 55% and 82% on average, compared to the reduction by 72% and 86% after normalization using the upper quartile. Furthermore, we used differential protein expression analysis and statistical learning to identify biomarkers for colorectal cancer from a CPTAC data set. The expression changes of a panel of proteins, including NUP205, GTPBP4, CNN2, GNL3, and S100A11, all of which highly correlate with colorectal cancer. Applying these five proteins as model features, random forest modeling obtained prediction results with the maximum AUC of 0.9998 using EPNS-normalized data, comparing favorably to the AUC of 0.9739 using the raw data. Thus, the normalization method based on EPNS reduced the global intensity bias and is applicable for quantitative proteomic analysis.
d
WLCI - Important Agricultural Lands Assessment (Input Raster: Normalized...
catalog.data.gov
data.usgs.gov
+4more
Updated Jul 6, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2024). WLCI - Important Agricultural Lands Assessment (Input Raster: Normalized Antelope Damage Claims) [Dataset]. https://catalog.data.gov/dataset/wlci-important-agricultural-lands-assessment-input-raster-normalized-antelope-damage-claim
Explore at:
Dataset updated
Jul 6, 2024
Dataset provided by
U.S. Geological Survey
Description
The values in this raster are unit-less scores ranging from 0 to 1 that represent normalized dollars per acre damage claims from antelope on Wyoming lands. This raster is one of 9 inputs used to calculate the "Normalized Importance Index."
4
GTEx (Genotype-Tissue Expression) data normalized
data.4tu.nl
figshare.com
zip
Updated Oct 26, 2015
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Erdogan Taskesen (2015). GTEx (Genotype-Tissue Expression) data normalized [Dataset]. http://doi.org/10.4121/uuid:ec5bfa66-5531-482a-904f-b693aa999e8b
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.4121/uuid:ec5bfa66-5531-482a-904f-b693aa999e8b
Dataset updated
Oct 26, 2015
Dataset provided by
TU Delft
Authors
Erdogan Taskesen
License
https://doi.org/10.4121/resource:terms_of_usehttps://doi.org/10.4121/resource:terms_of_use
Description
This is a normalized dataset from the original RNAseq dataset downloaded from Genotype-Tissue Expression (GTEx) project: www.gtexportal.org: RNA-SeQCv1.1.8 gene rpkm Pilot V3 patch1. The data was used to analyze how tissue samples are related to each other in terms of gene expression data The data can be used to get insights in how gene expression levels behave in in the different human tissues.
Normalization techniques for PARAFAC modeling of urine metabolomics data
data.niaid.nih.gov
xml
Updated May 11, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Radana Karlikova (2017). Normalization techniques for PARAFAC modeling of urine metabolomics data [Dataset]. https://data.niaid.nih.gov/resources?id=mtbls290
Explore at:
xmlAvailable download formats
Dataset updated
May 11, 2017
Dataset provided by
IMTM, Faculty of Medicine and Dentistry, Palacky University Olomouc, Hnevotinska 5, 775 15 Olomouc, Czech Republic
Authors
Radana Karlikova
Variables measured
Sample type, Metabolomics, Sample collection time
Description
One of the body fluids often used in metabolomics studies is urine. The peak intensities of metabolites in urine are affected by the urine history of an individual resulting in dilution differences. This requires therefore normalization of the data to correct for such differences. Two normalization techniques are commonly applied to urine samples prior to their further statistical analysis. First, AUC normalization aims to normalize a group of signals with peaks by standardizing the area under the curve (AUC) within a sample to the median, mean or any other proper representation of the amount of dilution. The second approach uses specific end-product metabolites such as creatinine and all intensities within a sample are expressed relative to the creatinine intensity. Another way of looking at urine metabolomics data is by realizing that the ratios between peak intensities are the information-carrying features. This opens up possibilities to use another class of data analysis techniques designed to deal with such ratios: compositional data analysis. In this approach special transformations are defined to deal with the ratio problem. In essence, it comes down to using another distance measure than the Euclidian Distance that is used in the conventional analysis of metabolomics data. We will illustrate using this type of approach in combination with three-way methods (i.e. PARAFAC) to be used in cases where samples of some biological material are measured at multiple time points. Aim of the paper is to develop PARAFAC modeling of three-way metabolomics data in the context of compositional data and compare this with standard normalization techniques for the specific case of urine metabolomics data.
JNCC Sentinel-2 indices Analysis Ready Data (ARD) Normalised Burn Ratio...
catalogue.ceda.ac.uk
data-search.nerc.ac.uk
Updated Mar 23, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Joint Nature Conservation Committee (JNCC) (2024). JNCC Sentinel-2 indices Analysis Ready Data (ARD) Normalised Burn Ratio (NBR) v1 [Dataset]. https://catalogue.ceda.ac.uk/uuid/6df6b803c2784b8ab9e03834bf9a4337
Explore at:
Dataset updated
Mar 23, 2024
Dataset provided by
Centre for Environmental Data Analysishttp://www.ceda.ac.uk/
Authors
Joint Nature Conservation Committee (JNCC)
License
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Area covered

Description
Sentinel Hub NBR description: To detect burned areas, the NBR-RAW index is the most appropriate choice. Using bands 8 and 12 it highlights burnt areas in large fire zones greater than 500 acres. To observe burn severity, you may subtract the post-fire NBR image from the pre-fire NBR image. Darker pixels indicate burned areas.

NBR = (NIR – SWIR) / (NIR + SWIR)

Sentinel-2 NBR = (B08 - B12) / (B08 + B12)

These data have been created by the Joint Nature Conservation Committee (JNCC) as part of a Defra Natural Capital & Ecosystem Assessment (NCEA) project to produce a regional, and ultimately national, system for detecting a change in habitat condition at a land parcel level. The first stage of the project is focused on Yorkshire, UK, and therefore the dataset includes granules and scenes covering Yorkshire and surrounding areas only. The dataset contains the following indices derived from Defra and JNCC Sentinel-2 Analysis Ready Data.

NDVI, NDMI, NDWI, NBR, and EVI files are generated for the following Sentinel-2 granules: • T30UWE • T30UXF • T30UWF • T30UXE • T31UCV • T30UYE • T31UCA

As the project continues, JNCC will expand the geographical coverage of this dataset and will provide continuous updates as ARD becomes available.

Facebook

Twitter

Click to copy link

Link copied

Cite

Viesturs Jūlijs Lasmanis; Normunds Grūzītis (2023). LVMED: Dataset of Latvian text normalisation samples for the medical domain [Dataset]. https://repository.clarin.lv/repository/xmlui/handle/20.500.12574/85

Data from: LVMED: Dataset of Latvian text normalisation samples for the medical domain

Explore at:

Dataset updated

May 30, 2023

Authors

Viesturs Jūlijs Lasmanis; Normunds Grūzītis

License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically

Description

The CSV dataset contains sentence pairs for a text-to-text transformation task: given a sentence that contains 0..n abbreviations, rewrite (normalize) the sentence in full words (word forms).

Training dataset: 64,665 sentence pairs Validation dataset: 7,185 sentence pairs. Testing dataset: 7,984 sentence pairs.

All sentences are extracted from a public web corpus (https://korpuss.lv/id/Tīmeklis2020) and contain at least one medical term.

Clear search

Close search

Google apps

Main menu

Data from: LVMED: Dataset of Latvian text normalisation samples for the...

Data from: Normalized data

Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data...

Normalizing Service Report

Normalizing Service Report

Identification of Novel Reference Genes Suitable for qRT-PCR Normalization...

Data from: Adapting Phrase-based Machine Translation to Normalise Medical...

Normalizing Service Report

Data from: A systematic evaluation of normalization methods and probe...

Data from: proteiNorm – A User-Friendly Tool for Normalization and Analysis...

dataset_4

Methods for normalizing microbiome data: an ecological perspective

MIMIC-IV Lab Events Subset - Preprocessed for Data Normalization...

Key Applications:

Data Source:

File Content:

Data from: The Bronson Files, Dataset 5, Field 105, 2014

Working With Messy Data in OpenRefine Workshop

Data from: Normalization Method Utilizing Endogenous Proteins for...

WLCI - Important Agricultural Lands Assessment (Input Raster: Normalized...

GTEx (Genotype-Tissue Expression) data normalized

Normalization techniques for PARAFAC modeling of urine metabolomics data

JNCC Sentinel-2 indices Analysis Ready Data (ARD) Normalised Burn Ratio...

Data from: LVMED: Dataset of Latvian text normalisation samples for the medical domain