Metagenomic time-course studies provide valuable insights into the dynamics of microbial systems and have become increasingly popular alongside the reduction in costs of next-generation sequencing technologies. Normalization is a common but critical preprocessing step before proceeding with downstream analysis. To the best of our knowledge, there is currently no reported method to appropriately normalize microbial time-series data. We propose TimeNorm, a novel normalization method that considers the compositional property and time dependency of time-course microbiome data. It is the first method designed to normalize time-series data within the same time point (intra-time normalization) and across time points (bridge normalization) separately. Intra-time normalization normalizes microbial samples under the same condition based on common dominant features. Bridge normalization detects and utilizes a group of the most stable features across two adjacent time points for normalization. Through comprehensive simulation studies and application to a real study, we demonstrate that TimeNorm outperforms existing normalization methods and boosts the power of downstream differential abundance analysis.
This dataset provides processed and normalized/standardized indices for the management tool group focused on 'Mission and Vision Statements', including related concepts like Purpose Statements. Derived from five distinct raw data sources, these indices are specifically designed for comparative longitudinal analysis, enabling the examination of trends and relationships across different empirical domains (web search, literature, academic publishing, and executive adoption). The data presented here represent transformed versions of the original source data, aimed at achieving metric comparability. Users requiring the unprocessed source data should consult the corresponding Mission/Vision dataset in the Management Tool Source Data (Raw Extracts) Dataverse.

Data Files and Processing Methodologies:

Google Trends File (Prefix: GT_): Normalized Relative Search Interest (RSI). Input Data: Native monthly RSI values from Google Trends (Jan 2004 - Jan 2025) for the query "mission statement" + "vision statement" + "mission and vision corporate". Processing: None; utilizes the original base-100 normalized Google Trends index. Output Metric: Monthly Normalized RSI (Base 100). Frequency: Monthly.

Google Books Ngram Viewer File (Prefix: GB_): Normalized Relative Frequency. Input Data: Annual relative frequency values from Google Books Ngram Viewer (1950-2022, English corpus, no smoothing) for the query Mission Statements + Vision Statements + Purpose Statements + Mission and Vision. Processing: Annual relative frequency series normalized (peak year = 100). Output Metric: Annual Normalized Relative Frequency Index (Base 100). Frequency: Annual.

Crossref.org File (Prefix: CR_): Normalized Relative Publication Share Index. Input Data: Absolute monthly publication counts matching Mission/Vision-related keywords [("mission statement" OR ...) AND (...) - see raw data for full query] in titles/abstracts (1950-2025), alongside total monthly Crossref publications; deduplicated via DOIs. Processing: Monthly relative share calculated (Mission/Vision Count / Total Count), then the monthly share series normalized (peak month's share = 100). Output Metric: Monthly Normalized Relative Publication Share Index (Base 100). Frequency: Monthly.

Bain & Co. Survey - Usability File (Prefix: BU_): Normalized Usability Index. Input Data: Original usability percentages (%) from Bain surveys for specific years: Mission/Vision (1993); Mission Statements (1996); Mission and Vision Statements (1999-2017); Purpose, Mission, and Vision Statements (2022). Processing: Semantic grouping: data points across the different naming conventions were treated as a single conceptual series. Normalization: combined series normalized relative to its historical peak (Max % = 100). Output Metric: Biennial Estimated Normalized Usability Index (Base 100 relative to historical peak). Frequency: Biennial (approx.).

Bain & Co. Survey - Satisfaction File (Prefix: BS_): Standardized Satisfaction Index. Input Data: Original average satisfaction scores (1-5 scale) from Bain surveys for specific years (same names/years as Usability). Processing: Semantic grouping: data points treated as a single conceptual series. Standardization (Z-scores): Z = (X - 3.0) / 0.891609. Index scale transformation: Index = 50 + (Z * 22). Output Metric: Biennial Standardized Satisfaction Index (Center = 50, Range ≈ [1, 100]). Frequency: Biennial (approx.).
File Naming Convention: Files generally follow the pattern: PREFIX_Tool_Processed.csv or similar, where the PREFIX indicates the data source (GT_, GB_, CR_, BU_, BS_). Consult the parent Dataverse description (Management Tool Comparative Indices) for general context and the methodological disclaimer. For original extraction details (specific keywords, URLs, etc.), refer to the corresponding Mission/Vision dataset in the Raw Extracts Dataverse. Comprehensive project documentation provides full details on all processing steps.
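As an illustration of the two Bain survey transforms described above (peak-based normalization of usability percentages and Z-score standardization of satisfaction scores rescaled to an index), here is a minimal Python sketch. The survey values and column names below are placeholders, not data taken from the files.

```python
# Illustrative sketch (not the project's original code) of the two Bain survey
# transforms described above. Column names and values are hypothetical.
import pandas as pd

bain = pd.DataFrame({
    "year": [1993, 1996, 1999, 2022],
    "usability_pct": [88.0, 87.0, 80.0, 76.0],      # placeholder values
    "satisfaction_1to5": [3.86, 3.81, 3.93, 4.01],  # placeholder values
})

# Usability: normalize the combined series to its historical peak (max % = 100).
bain["usability_index"] = 100 * bain["usability_pct"] / bain["usability_pct"].max()

# Satisfaction: Z = (X - 3.0) / 0.891609, then Index = 50 + 22 * Z.
z = (bain["satisfaction_1to5"] - 3.0) / 0.891609
bain["satisfaction_index"] = 50 + 22 * z

print(bain)
```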
CC0 1.0: https://spdx.org/licenses/CC0-1.0.html
Background
The Infinium EPIC array measures the methylation status of more than 850,000 CpG sites. The EPIC BeadChip uses two probe designs, Infinium Type I and Type II, which exhibit different technical characteristics that may confound analyses. Numerous normalization and pre-processing methods have been developed to reduce probe-type bias as well as other issues such as background and dye bias.
Methods
This study evaluates the performance of various normalization methods using 16 replicated samples and three metrics: absolute beta-value difference, overlap of non-replicated CpGs between replicate pairs, and effect on beta-value distributions. Additionally, we carried out Pearson’s correlation and intraclass correlation coefficient (ICC) analyses using both raw and SeSAMe 2 normalized data.
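To make the reproducibility metric concrete, the following minimal Python sketch computes a per-probe intraclass correlation across replicate pairs. It is not the code used in the study; the one-way random-effects ICC variant and the simulated beta values are assumptions for illustration.

```python
# Illustrative sketch (assumed setup, not the study's pipeline): per-probe
# one-way random-effects ICC, ICC(1) = (MSB - MSW) / (MSB + (k - 1) * MSW),
# from a beta-value array of shape probes x subjects x replicates.
import numpy as np

rng = np.random.default_rng(0)
n_probes, n_subjects, k = 1000, 16, 2            # 16 replicate pairs, 2 replicates each

betas = rng.beta(2, 2, size=(n_probes, n_subjects, k))   # placeholder beta values

subject_means = betas.mean(axis=2)               # probes x subjects
grand_means = subject_means.mean(axis=1, keepdims=True)

msb = k * ((subject_means - grand_means) ** 2).sum(axis=1) / (n_subjects - 1)
msw = ((betas - subject_means[..., None]) ** 2).sum(axis=(1, 2)) / (n_subjects * (k - 1))

icc = (msb - msw) / (msb + (k - 1) * msw)
print("proportion of probes with ICC > 0.50:", float((icc > 0.50).mean()))
```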
Results
The method we define as SeSAMe 2, which consists of the standard SeSAMe pipeline with an additional round of QC based on pOOBAH masking, was the best-performing normalization method, while quantile-based methods were the worst performing. Whole-array Pearson's correlations were high. However, in agreement with previous studies, a substantial proportion of the probes on the EPIC array showed poor reproducibility (ICC < 0.50). The majority of poorly performing probes have beta values close to either 0 or 1 and relatively low standard deviations. These results suggest that poor probe reliability largely reflects limited biological variation rather than technical measurement variation. Importantly, normalizing the data with SeSAMe 2 dramatically improved ICC estimates, with the proportion of probes with ICC values > 0.50 increasing from 45.18% (raw data) to 61.35% (SeSAMe 2).
Methods
Study Participants and Samples
Whole blood samples were obtained from the Health, Well-being and Aging (Saúde, Bem-estar e Envelhecimento, SABE) study cohort. SABE is a census-based cohort of elderly residents of the city of São Paulo, Brazil, followed up every five years since 2000, with DNA first collected in 2010. Samples from 24 elderly adults were collected at two time points, for a total of 48 samples. The first time point corresponds to the 2010 collection wave, performed from 2010 to 2012, and the second time point was set in 2020 as part of a COVID-19 monitoring project (9 ± 0.71 years apart). The 24 individuals (13 men and 11 women) were 67.41 ± 5.52 years of age (mean ± standard deviation) at the first time point and 76.41 ± 6.17 at the second.
All individuals enrolled in the SABE cohort provided written consent, and the ethics protocols were approved by local and national institutional review boards (COEP/FSP/USP OF.COEP/23/10, CONEP 2044/2014, CEP HIAE 1263-10, University of Toronto RIS 39685).
Blood Collection and Processing
Genomic DNA was extracted from whole peripheral blood samples collected in EDTA tubes. DNA extraction and purification followed the manufacturer's recommended protocols, using the Qiagen AutoPure LS kit with Gentra automated extraction (first time point) or manual extraction (second time point) after the equipment was discontinued, with the same commercial reagents. DNA was quantified using a NanoDrop spectrophotometer and diluted to 50 ng/µL. To assess the reproducibility of the EPIC array, we also obtained technical replicates for 16 of the 48 samples, for a total of 64 samples submitted for further analyses. Whole-genome sequencing data are also available for the samples described above.
Characterization of DNA Methylation using the EPIC array
Approximately 1,000ng of human genomic DNA was used for bisulphite conversion. Methylation status was evaluated using the MethylationEPIC array at The Centre for Applied Genomics (TCAG, Hospital for Sick Children, Toronto, Ontario, Canada), following protocols recommended by Illumina (San Diego, California, USA).
Processing and Analysis of DNA Methylation Data
The R/Bioconductor packages Meffil (version 1.1.0), RnBeads (version 2.6.0), minfi (version 1.34.0) and wateRmelon (version 1.32.0) were used to import, process and perform quality control (QC) analyses on the methylation data. Starting with the 64 samples, we first used Meffil to infer the sex of each sample and compared the inferred sex to the reported sex. Utilizing the 59 SNP probes included on the EPIC array, we calculated concordance between the methylation intensities of the samples and the corresponding genotype calls extracted from their WGS data. We then performed comprehensive sample-level and probe-level QC using the RnBeads QC pipeline. Specifically, we (1) removed probes whose target sequences overlap with a SNP at any base, (2) removed known cross-reactive probes, (3) used the iterative Greedycut algorithm to filter out samples and probes, using a detection p-value threshold of 0.01, and (4) removed probes for which more than 5% of the samples had a missing value. Since RnBeads does not provide a function to filter probes based on bead number, we used the wateRmelon package to extract bead numbers from the IDAT files and calculated, for each probe, the proportion of samples with bead number < 3. Probes with more than 5% of samples having a low bead number (< 3) were removed. For the comparison of normalization methods, we also computed detection p-values from the empirical distribution of out-of-band probes with the pOOBAH() function in the SeSAMe (version 1.14.2) R package, with a p-value threshold of 0.05 and the combine.neg parameter set to TRUE. When pOOBAH filtering was carried out, it was done in parallel with the QC steps described above, and the probes flagged in the two analyses were combined and removed from the data.
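The bead-number filter described above is a simple threshold rule. The sketch below illustrates the same logic on a generic probes-by-samples matrix of bead counts in Python; it is an assumed illustration with simulated placeholder counts, not the wateRmelon/RnBeads code used in the study.

```python
# Illustrative sketch of the bead-number filter (assumed data layout, not the
# study's R code): drop probes for which more than 5% of samples have a bead
# count below 3.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Placeholder bead-count matrix: rows are probes, columns are samples.
beadcounts = pd.DataFrame(
    rng.poisson(lam=12, size=(1000, 64)),
    index=[f"cg{i:08d}" for i in range(1000)],
    columns=[f"sample_{j}" for j in range(64)],
)

low_bead_fraction = (beadcounts < 3).mean(axis=1)   # per-probe fraction of samples with < 3 beads
keep = low_bead_fraction <= 0.05
filtered = beadcounts.loc[keep]
print(f"removed {int((~keep).sum())} of {len(beadcounts)} probes")
```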
Normalization Methods Evaluated
The normalization methods compared in this study were implemented using different R/Bioconductor packages and are summarized in Figure 1. All data were read into the R workspace as RGChannelSet objects using minfi's read.metharray.exp() function. One sample that was flagged during QC was removed, and further normalization steps were carried out on the remaining set of 63 samples. Prior to all normalizations with minfi, probes that did not pass QC were removed. Noob, SWAN, Quantile, Funnorm and Illumina normalizations were implemented using minfi. BMIQ normalization was implemented with ChAMP (version 2.26.0), using as input the raw data produced by minfi's preprocessRaw() function. For the combination of Noob with BMIQ (Noob+BMIQ), BMIQ normalization was carried out using minfi's Noob-normalized data as input. Noob normalization was also implemented with SeSAMe, using a nonlinear dye bias correction. For SeSAMe normalization, two scenarios were tested; in both, the inputs were unmasked SigDF sets converted from minfi's RGChannelSet objects. In the first, which we call "SeSAMe 1", SeSAMe's pOOBAH masking was not executed, and the only probes filtered out of the dataset prior to normalization were those that did not pass QC in the previous analyses. In the second scenario, which we call "SeSAMe 2", pOOBAH masking was carried out on the unfiltered dataset and masked probes were removed; this was followed by removal of any probes that did not pass the previous QC and had not already been removed by pOOBAH. SeSAMe 2 therefore involves two rounds of probe removal. Noob normalization with nonlinear dye bias correction was then carried out on the filtered dataset. Methods were then compared by subsetting the 16 replicated samples and evaluating the effect of the different normalization methods on the absolute difference in beta values (|Δβ|) between replicated samples.
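As a concrete illustration of the replicate-based comparison, the sketch below computes per-probe absolute beta-value differences between technical replicate pairs and summarizes them per method. It is a minimal Python analogue under assumed data structures (simulated beta matrices with paired replicate columns), not the R code used in the study.

```python
# Minimal sketch (assumed inputs, not the study's R code): compare normalization
# methods by the absolute beta-value difference between technical replicates.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n_probes, n_pairs = 1000, 16

# Placeholder beta matrices per method: probes x (2 * n_pairs) columns,
# where columns 2i and 2i+1 form a replicate pair.
methods = {
    "raw": rng.beta(2, 2, size=(n_probes, 2 * n_pairs)),
    "normalized": rng.beta(2, 2, size=(n_probes, 2 * n_pairs)),
}

summary = {}
for name, betas in methods.items():
    rep_a, rep_b = betas[:, 0::2], betas[:, 1::2]   # split into the two replicates
    abs_diff = np.abs(rep_a - rep_b)                # per-probe, per-pair |delta beta|
    summary[name] = {"mean_abs_diff": abs_diff.mean(),
                     "median_abs_diff": np.median(abs_diff)}

print(pd.DataFrame(summary).T)
```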
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this article, we propose a class of test statistics for a change point in the mean of high-dimensional independent data. Our test integrates the U-statistic-based approach of recent work by Wang et al. and the Lq-norm-based high-dimensional test of recent work by He et al., and inherits several appealing features, such as being tuning-parameter free and the asymptotic independence of test statistics corresponding to even q's. A simple combination of test statistics corresponding to several different q's leads to a test with an adaptive power property; that is, it can be powerful against both sparse and dense alternatives. On the estimation front, we obtain the convergence rate of the maximizer of our test statistic standardized by sample size when there is one change point in the mean and q = 2, and we propose to combine our tests with a wild binary segmentation algorithm to estimate the number and locations of change points when there are multiple change points. Numerical comparisons using both simulated and real data demonstrate the advantage of our adaptive test and its corresponding estimation method.
This dataset provides processed and normalized/standardized indices for the management tool group focused on 'Growth Strategies'. Derived from five distinct raw data sources, these indices are specifically designed for comparative longitudinal analysis, enabling the examination of trends and relationships across different empirical domains (web search, literature, academic publishing, and executive adoption). The data presented here represent transformed versions of the original source data, aimed at achieving metric comparability. Users requiring the unprocessed source data should consult the corresponding Growth Strategies dataset in the Management Tool Source Data (Raw Extracts) Dataverse.

Data Files and Processing Methodologies:

Google Trends File (Prefix: GT_): Normalized Relative Search Interest (RSI). Input Data: Native monthly RSI values from Google Trends (Jan 2004 - Jan 2025) for the query "growth strategies" + "growth strategy" + "growth strategies business". Processing: None; utilizes the original base-100 normalized Google Trends index. Output Metric: Monthly Normalized RSI (Base 100). Frequency: Monthly.

Google Books Ngram Viewer File (Prefix: GB_): Normalized Relative Frequency. Input Data: Annual relative frequency values from Google Books Ngram Viewer (1950-2022, English corpus, no smoothing) for the query Growth Strategies + Growth Strategy. Processing: Annual relative frequency series normalized (peak year = 100). Output Metric: Annual Normalized Relative Frequency Index (Base 100). Frequency: Annual.

Crossref.org File (Prefix: CR_): Normalized Relative Publication Share Index. Input Data: Absolute monthly publication counts matching Growth Strategies-related keywords [("growth strategies" OR ...) AND (...) - see raw data for full query] in titles/abstracts (1950-2025), alongside total monthly Crossref publications; deduplicated via DOIs. Processing: Monthly relative share calculated (Growth Strat. Count / Total Count), then the monthly share series normalized (peak month's share = 100). Output Metric: Monthly Normalized Relative Publication Share Index (Base 100). Frequency: Monthly.

Bain & Co. Survey - Usability File (Prefix: BU_): Normalized Usability Index. Input Data: Original usability percentages (%) from Bain surveys for specific years: Growth Strategies (1996, 1999, 2000, 2002, 2004); Growth Strategy Tools (2006, 2008). Note: not reported after 2008. Processing: Semantic grouping: data points for "Growth Strategies" and "Growth Strategy Tools" were treated as a single conceptual series. Normalization: combined series normalized relative to its historical peak (Max % = 100). Output Metric: Biennial Estimated Normalized Usability Index (Base 100 relative to historical peak). Frequency: Biennial (approx.).

Bain & Co. Survey - Satisfaction File (Prefix: BS_): Standardized Satisfaction Index. Input Data: Original average satisfaction scores (1-5 scale) from Bain surveys for specific years: Growth Strategies (1996-2004); Growth Strategy Tools (2006, 2008). Note: not reported after 2008. Processing: Semantic grouping: data points treated as a single conceptual series. Standardization (Z-scores): Z = (X - 3.0) / 0.891609. Index scale transformation: Index = 50 + (Z * 22). Output Metric: Biennial Standardized Satisfaction Index (Center = 50, Range ≈ [1, 100]). Frequency: Biennial (approx.).

File Naming Convention: Files generally follow the pattern PREFIX_Tool_Processed.csv or similar, where the PREFIX indicates the data source (GT_, GB_, CR_, BU_, BS_).
Consult the parent Dataverse description (Management Tool Comparative Indices) for general context and the methodological disclaimer. For original extraction details (specific keywords, URLs, etc.), refer to the corresponding Growth Strategies dataset in the Raw Extracts Dataverse. Comprehensive project documentation provides full details on all processing steps.
https://dataintelo.com/privacy-and-policy
According to our latest research, the global EV Charging Data Normalization Middleware market size reached USD 412.6 million in 2024, driven by the rapid expansion of electric vehicle infrastructure and the increasing need for seamless data integration across diverse charging networks. The market is experiencing a robust compound annual growth rate (CAGR) of 22.4% and is forecasted to achieve a value of USD 2,977.9 million by 2033. This impressive growth can be attributed to the surging adoption of electric vehicles worldwide, the proliferation of charging points, and the urgent requirement for standardized data management solutions that can ensure interoperability, optimize operations, and enhance user experience across fragmented EV charging ecosystems.
A primary growth factor fueling the EV Charging Data Normalization Middleware market is the exponential increase in electric vehicle adoption globally. As governments and private entities invest heavily in expanding EV charging infrastructure, the diversity of hardware, software, and communication protocols has created significant interoperability challenges. Data normalization middleware acts as a critical intermediary, harmonizing disparate data streams from various charging stations, networks, and management platforms. This enables seamless communication, real-time analytics, and efficient billing, which are essential for scaling up EV infrastructure. The demand for robust middleware solutions is further amplified by regulatory mandates for open data exchange, transparent reporting, and the integration of renewable energy sources, all of which necessitate advanced data normalization capabilities.
Another significant driver is the evolution of smart grid technologies and the integration of EV charging with energy management systems. Utilities and energy providers are increasingly leveraging data normalization middleware to aggregate and analyze charging data, optimize load balancing, and support demand response initiatives. These solutions facilitate the synchronization of EV charging activities with grid requirements, reducing peak loads and enabling dynamic pricing models. The middleware's ability to provide real-time, standardized data empowers stakeholders to make informed decisions, enhance operational efficiency, and improve customer satisfaction. Additionally, advancements in artificial intelligence and machine learning are enabling middleware platforms to deliver predictive analytics, automate fault detection, and support proactive maintenance, further enhancing their value proposition in the market.
The proliferation of public and private charging networks, coupled with the emergence of new business models such as fleet electrification and vehicle-to-grid (V2G) services, is also catalyzing market growth. Fleet operators and charging network providers rely on data normalization middleware to consolidate information from heterogeneous sources, streamline operations, and deliver unified services to end-users. As the competitive landscape intensifies, stakeholders are prioritizing middleware solutions that offer scalability, security, and compliance with industry standards. The increasing collaboration between automotive OEMs, technology vendors, and energy companies is fostering innovation and accelerating the deployment of advanced middleware platforms tailored to diverse application scenarios.
From a regional perspective, Europe is currently leading the EV Charging Data Normalization Middleware market, accounting for a significant share of global revenues due to its mature EV ecosystem, stringent regulatory frameworks, and aggressive decarbonization targets. North America is witnessing rapid growth, driven by substantial investments in EV infrastructure and supportive government policies. The Asia Pacific region is emerging as a lucrative market, propelled by the rapid adoption of electric vehicles in China, India, and Southeast Asia, coupled with large-scale infrastructure development initiatives. These regional dynamics are shaping the competitive landscape and influencing the strategic priorities of market participants worldwide.
The Component segment of the EV Charging Data Normalization Middleware market is bifurcated into Software and Services, each playing a pivotal role in the e
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Dataset Title: Data and Code for: "Universal Adaptive Normalization Scale (AMIS): Integration of Heterogeneous Metrics into a Unified System"

Description: This dataset contains source data and processing results for validating the Adaptive Multi-Interval Scale (AMIS) normalization method. Includes educational performance data (student grades), economic statistics (World Bank GDP), and a Python implementation of the AMIS algorithm with a graphical interface.

Contents:
- Source data: educational grades and GDP statistics
- AMIS normalization results (3, 5, 9, 17-point models)
- Comparative analysis with linear normalization
- Ready-to-use Python code for data processing

Applications:
- Educational data normalization and analysis
- Economic indicators comparison
- Development of unified metric systems
- Methodology research in data scaling

Technical info: Python code with pandas, numpy, scipy, matplotlib dependencies. Data in Excel format.
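The comparative analysis in this dataset contrasts AMIS with linear normalization. Since the AMIS algorithm itself is available only in the bundled Python code, the sketch below shows a hypothetical linear (min-max) baseline mapped onto an N-point scale, the kind of reference transformation such a comparison implies; it is not the AMIS implementation, and the sample values are placeholders.

```python
# Hypothetical sketch of a linear (min-max) baseline: rescale raw scores to
# [0, 1] and map them onto an N-point scale (e.g., 3, 5, 9 or 17 points).
# This is NOT the AMIS algorithm; it only illustrates the linear reference
# transformation that the AMIS results are compared against.
import numpy as np

def linear_to_n_point(values, n_points):
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    unit = (values - lo) / (hi - lo)             # min-max normalization to [0, 1]
    # Map [0, 1] onto the discrete scale 1..n_points.
    return np.rint(1 + unit * (n_points - 1)).astype(int)

grades = [52, 61, 74, 88, 95, 67, 79]            # placeholder raw scores
for n in (3, 5, 9, 17):
    print(n, linear_to_n_point(grades, n))
```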
This dataset provides processed and normalized/standardized indices for the management tool group 'Scenario Planning', including related concepts like Scenario Analysis and Contingency Planning. Derived from five distinct raw data sources, these indices are specifically designed for comparative longitudinal analysis, enabling the examination of trends and relationships across different empirical domains (web search, literature, academic publishing, and executive adoption). The data presented here represent transformed versions of the original source data, aimed at achieving metric comparability. Users requiring the unprocessed source data should consult the corresponding Scenario Planning dataset in the Management Tool Source Data (Raw Extracts) Dataverse.

Data Files and Processing Methodologies:

Google Trends File (Prefix: GT_): Normalized Relative Search Interest (RSI). Input Data: Native monthly RSI values from Google Trends (Jan 2004 - Jan 2025) for the query "scenario planning" + "scenario analysis" + "contingency planning" + "scenario planning business". Processing: None; utilizes the original base-100 normalized Google Trends index. Output Metric: Monthly Normalized RSI (Base 100). Frequency: Monthly.

Google Books Ngram Viewer File (Prefix: GB_): Normalized Relative Frequency. Input Data: Annual relative frequency values from Google Books Ngram Viewer (1950-2022, English corpus, no smoothing) for the query Scenario Planning + Scenario Analysis + Contingency Planning + Scenario and Contingency Planning. Processing: Annual relative frequency series normalized (peak year = 100). Output Metric: Annual Normalized Relative Frequency Index (Base 100). Frequency: Annual.

Crossref.org File (Prefix: CR_): Normalized Relative Publication Share Index. Input Data: Absolute monthly publication counts matching Scenario Planning-related keywords [("scenario planning" OR ...) AND ("management" OR ...) - see raw data for full query] in titles/abstracts (1950-2025), alongside total monthly Crossref publications; deduplicated via DOIs. Processing: Monthly relative share calculated (Scenario Planning Count / Total Count), then the monthly share series normalized (peak month's share = 100). Output Metric: Monthly Normalized Relative Publication Share Index (Base 100). Frequency: Monthly.

Bain & Co. Survey - Usability File (Prefix: BU_): Normalized Usability Index. Input Data: Original usability percentages (%) from Bain surveys for specific years: Scenario Planning (1993, 1999, 2000); Scenario and Contingency Planning (2004, 2006, 2008, 2010, 2012, 2014, 2017); Scenario Analysis and Contingency Planning (2022). Processing: Semantic grouping: data points across the different naming conventions were treated as a single conceptual series. Normalization: combined series normalized relative to its historical peak (Max % = 100). Output Metric: Biennial Estimated Normalized Usability Index (Base 100 relative to historical peak). Frequency: Biennial (approx.).

Bain & Co. Survey - Satisfaction File (Prefix: BS_): Standardized Satisfaction Index. Input Data: Original average satisfaction scores (1-5 scale) from Bain surveys for specific years: Scenario Planning (1993, 1999, 2000); Scenario and Contingency Planning (2004, 2006, 2008, 2010, 2012, 2014, 2017); Scenario Analysis and Contingency Planning (2022). Processing: Semantic grouping: data points treated as a single conceptual series. Standardization (Z-scores): Z = (X - 3.0) / 0.891609. Index scale transformation: Index = 50 + (Z * 22). Output Metric: Biennial Standardized Satisfaction Index (Center = 50, Range ≈ [1, 100]). Frequency: Biennial (approx.).

File Naming Convention: Files generally follow the pattern PREFIX_Tool_Processed.csv or similar, where the PREFIX indicates the data source (GT_, GB_, CR_, BU_, BS_). Consult the parent Dataverse description (Management Tool Comparative Indices) for general context and the methodological disclaimer. For original extraction details (specific keywords, URLs, etc.), refer to the corresponding Scenario Planning dataset in the Raw Extracts Dataverse. Comprehensive project documentation provides full details on all processing steps.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset comprises open loop simulations of the ISBA land surface model and ASCAT normalized backscatter and slope data at grid points (GPIs) containing International Soil Moisture Network stations. The dataset covers the period from 2007 to 2019 and includes a total of 31 GPIs.
The ISBA land surface model runs in the offline version of the SURFEX software (https://www.umr-cnrm.fr/surfex/) using ERA5 forcing data. ASCAT normalized backscatter ("sig") data are provided at 10am and 10pm every day. If normalized backscatter is not acquired at a particular time, a fill value of 999 is provided.
ASCAT slope (“slop”) is calculated daily following Hahn et al. (2017). These daily values are provided at 10am in this dataset. The slope value at 10pm is set to 999.
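When working with these files, the 999 entries should be treated as missing rather than as physical values. A short, assumed Python sketch of that step (the column names and sample values are hypothetical, not taken from the files):

```python
# Assumed sketch: replace the 999 fill value with NaN before analysis so that
# missing 10am/10pm acquisitions (and the 10pm slope placeholder) are not
# mistaken for real backscatter or slope values.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "time": pd.to_datetime(["2010-06-01 10:00", "2010-06-01 22:00"]),
    "sig": [-8.73, 999.0],      # normalized backscatter, 999 = not acquired
    "slop": [-0.012, 999.0],    # daily slope, set to 999 at 10pm
})

FILL_VALUE = 999.0
df[["sig", "slop"]] = df[["sig", "slop"]].replace(FILL_VALUE, np.nan)
print(df)
```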
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides processed and normalized/standardized indices for the management tool group focused on 'Talent & Employee Engagement', including concepts like Employee Engagement Surveys/Systems and Corporate Codes of Ethics. Derived from five distinct raw data sources, these indices are specifically designed for comparative longitudinal analysis, enabling the examination of trends and relationships across different empirical domains (web search, literature, academic publishing, and executive adoption). The data presented here represent transformed versions of the original source data, aimed at achieving metric comparability. Users requiring the unprocessed source data should consult the corresponding Talent/Engagement dataset in the Management Tool Source Data (Raw Extracts) Dataverse.

Data Files and Processing Methodologies:

Google Trends File (Prefix: GT_): Normalized Relative Search Interest (RSI). Input Data: Native monthly RSI values from Google Trends (Jan 2004 - Jan 2025) for the query "corporate code of ethics" + "employee engagement" + "employee engagement management". Processing: None; utilizes the original base-100 normalized Google Trends index. Output Metric: Monthly Normalized RSI (Base 100). Frequency: Monthly.

Google Books Ngram Viewer File (Prefix: GB_): Normalized Relative Frequency. Input Data: Annual relative frequency values from Google Books Ngram Viewer (1950-2022, English corpus, no smoothing) for the query Corporate Code of Ethics + Employee Engagement Programs + Employee Engagement Surveys + Employee Engagement. Processing: Annual relative frequency series normalized (peak year = 100). Output Metric: Annual Normalized Relative Frequency Index (Base 100). Frequency: Annual.

Crossref.org File (Prefix: CR_): Normalized Relative Publication Share Index. Input Data: Absolute monthly publication counts matching Engagement/Ethics-related keywords [("corporate code of ethics" OR ...) AND (...) - see raw data for full query] in titles/abstracts (1950-2025), alongside total monthly Crossref publications; deduplicated via DOIs. Processing: Monthly relative share calculated (Engage/Ethics Count / Total Count), then the monthly share series normalized (peak month's share = 100). Output Metric: Monthly Normalized Relative Publication Share Index (Base 100). Frequency: Monthly.

Bain & Co. Survey - Usability File (Prefix: BU_): Normalized Usability Index. Input Data: Original usability percentages (%) from Bain surveys for specific years: Corporate Code of Ethics (2002); Employee Engagement Surveys (2012, 2014); Employee Engagement Systems (2017, 2022). Processing: Semantic grouping: data points across related names treated as a single conceptual series representing the Talent/Engagement focus. Normalization: combined series normalized relative to its historical peak (Max % = 100). Output Metric: Biennial Estimated Normalized Usability Index (Base 100 relative to historical peak). Frequency: Biennial (approx.).

Bain & Co. Survey - Satisfaction File (Prefix: BS_): Standardized Satisfaction Index. Input Data: Original average satisfaction scores (1-5 scale) from Bain surveys for specific years (same names/years as Usability). Processing: Semantic grouping: data points treated as a single conceptual series. Standardization (Z-scores): Z = (X - 3.0) / 0.891609. Index scale transformation: Index = 50 + (Z * 22). Output Metric: Biennial Standardized Satisfaction Index (Center = 50, Range ≈ [1, 100]). Frequency: Biennial (approx.).
File Naming Convention: Files generally follow the pattern: PREFIX_Tool_Processed.csv or similar, where the PREFIX indicates the data source (GT_, GB_, CR_, BU_, BS_). Consult the parent Dataverse description (Management Tool Comparative Indices) for general context and the methodological disclaimer. For original extraction details (specific keywords, URLs, etc.), refer to the corresponding Talent/Engagement dataset in the Raw Extracts Dataverse. Comprehensive project documentation provides full details on all processing steps.
Normalized Digital Surface Model - 1m resolution. The dataset contains the 1m Normalized Digital Surface Model for the District of Columbia. Some areas have limited data. The lidar dataset redaction was conducted under the guidance of the United States Secret Service. Except for classified ground points and classified water points, all lidar data returns and collected data were removed from the dataset within the United States Secret Service 1m redaction boundary generated for the 2017 orthophoto flight.
Normalized Digital Surface Model - 1m resolution. The dataset contains the 1m Normalized Digital Surface Model for the District of Columbia. These lidar data are processed classified LAS 1.4 files at USGS QL1 covering the District of Columbia. Some areas have limited data. The lidar dataset redaction was conducted under the guidance of the United States Secret Service. All data returns were removed from the dataset within the United States Secret Service redaction boundary except for classified ground points and classified water points.
UniCourt provides easy access to normalized legal analytics data via our Attorney Analytics API, Law Firm Analytics API, Judge Analytics API, Party Analytics API, and Court Analytics API, giving you the flexibility you need to intuitively move between interconnected data points. This structure can be used for AI & ML training data.
Build the Best Legal Analytics Possible
• UniCourt collects court data from hundreds of state and federal trial court databases, as well as attorney bar data, judicial records data, and Secretary of State data. • We then combine all of those data sets together through our entity normalization process to identify who’s who in litigation, so you can download structured data via our APIs and build the best legal analytics possible.
Flexible Analytics APIs for Meaningful Integrations
• UniCourt’s Legal Analytics APIs put billions of data points at your fingertips and give you the flexibility you need to integrate analytics into your matter management systems, BI dashboards, data lakes, CRMs, and other data management tools. • Create on-demand, self-service reporting options within your internal applications and set up automated data feeds to keep your mission critical analytics reports regularly refreshed with updated data.
What Legal Analytics APIs Are Available?
UniCourt offers a wide range of Legal Analytics APIs and various end-points to provide the data you need. Here are the core analytics APIs we provide:
• Attorney Analytics API • Law Firm Analytics API • Judge Analytics API • Party Analytics API • Case Analytics API
Normalized Digital Surface Model - 1m resolution. The dataset contains the 1m Normalized Digital Surface Model for the District of Columbia. Some areas have limited data. The lidar dataset redaction was conducted under the guidance of the United States Secret Service. Except for classified ground points and classified water points, all lidar data returns and collected data were removed from the dataset within the United States Secret Service 1m redaction boundary generated for the 2017 orthophoto flight. This dataset is provided as an ArcGIS Image service. Please note, the download feature for this image service in Open Data DC provides a compressed PNG, JPEG or TIFF. The compressed GeoTIFF mosaic raster dataset is available under additional options when viewing downloads. Requests for the individual GeoTIFF set of images should be sent to open.data@dc.gov.
Normalized Digital Surface Model - 1m resolution. The dataset contains the 1m Normalized Digital Surface Model for the District of Columbia. Some areas have limited data. The lidar dataset redaction was conducted under the guidance of the United States Secret Service. All data returns were removed from the dataset within the United States Secret Service redaction boundary except for classified ground points and classified water points.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
A ZIP file of computer code and output used in the numerical calculations for "On The Finite-Size Lyapunov Exponent For The Schrödinger Operator With Skew-Shift Potential" by Paul Michael Kielstra and Marius Lemm. The ZIP decompresses to about 26 GB, containing multiple files:
201x201 bad set grid.txt: A list of 201x201=40401 evenly spaced points on [0, 1]x[0, 1], each written in the form (x, y) and followed by 30000 values of E which are probably bad for that point. This gives a total of 40401x30001=1212070401 lines.
Upper bounds.txt: individual upper bounds for equation (9) calculated at various points. The bound in this equation in the published paper is the worst of these.
E=0/N/2001x2001 grid.tsv: A tab-separated values file of 2001x2001=4004001 evenly spaced points on [0, 1]x[0, 1], with headers:
X: The x-coordinate of the point represented by the line in question.
Y: The y-coordinate.
Exact_x, Exact_y: The x- and y-coordinates to the maximum precision the computer used, included in case, for instance, the x-coordinate is defined to be 0.5 but is actually 0.5000000000000001 in memory.
Matrix: The matrix generated at this point, modulo a certain normalization (see below).
Result: The log of the norm of the matrix. This has been corrected for the normalization -- it is calculated as if the matrix had never been normalized.
Normalizationcount: The actual matrix generated at each point is too large to store in memory, so the Matrix column stores the actual matrix multiplied by Normalizer^Normalizationcount; we used a normalizer of 0.01. (A short sketch of this rescaling bookkeeping follows the file listing below.)
This file was calculated with the values E=0, N=30000, lambda=1/2. The header line means that this file contains 4004001+1=4004002 lines in total.
E=0/N/2001x2001 random grid.tsv: As with the 2001x2001 grid.tsv file, but missing the exact_x and exact_y coordinates. Instead, the x and y values are both exact and randomly chosen. The lines in the file are in no particular order. This file contains the data for the Monte Carlo approximation used in the paper.
E=0/2N/2001x2001 grid.tsv: As with its counterpart in the folder labeled N, but calculated with N=60000 instead.
E=-2.495: As with its counterpart E=0, but everything is calculated with E=-2.495123260049612 (which we round to -2.49512326 in the paper). This folder also contains no random or Monte Carlo calculations.
Code/Multiplier.m: MATLAB code to generate the skew matrix at a given point.
Code/Iterator.m: MATLAB code to iterate over a series of points and call Multiplier at each.
Code/Striper.m: MATLAB code to split up the input space into a series of stripes and call Iterator on exactly one of them. We performed our calculations in parallel, each job consisting of calling Striper on a different stripe number.
Code/Badfinder.m: MATLAB code to take a point and output a series of E-values for which that point is in the bad set.
Code/BadSetIterator.m: As with Iterator.m, but calls Badfinder.
Code/BadSetStriper.m: As with Striper.m, but calls BadSetIterator. (The function in this file is also called Striper.)
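To make the normalization bookkeeping behind the Matrix, Result and Normalizationcount columns concrete, here is a small, illustrative Python sketch (not the MATLAB code in the Code/ folder): the running matrix product is multiplied by the 0.01 normalizer whenever it grows too large, the multiplications are counted, and the log-norm is corrected at the end as if no rescaling had happened. The rescaling threshold and the random factor matrices are assumptions for illustration only.

```python
# Illustrative sketch (not the original MATLAB code): keep a long matrix product
# representable by rescaling with normalizer = 0.01 whenever the norm gets large,
# count the rescalings, and recover the un-normalized log-norm at the end:
#   Result = log(norm(stored)) - Normalizationcount * log(normalizer)
import numpy as np

rng = np.random.default_rng(3)
NORMALIZER = 0.01
THRESHOLD = 1e8          # assumed trigger for rescaling, for illustration only

stored = np.eye(2)
count = 0
for _ in range(500):
    step = np.eye(2) + 0.5 * rng.standard_normal((2, 2))   # placeholder factors
    stored = step @ stored
    if np.linalg.norm(stored) > THRESHOLD:
        stored *= NORMALIZER
        count += 1

result = np.log(np.linalg.norm(stored)) - count * np.log(NORMALIZER)
print(f"Normalizationcount = {count}, corrected log-norm (Result) = {result:.3f}")
```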
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset provides pre-extracted, normalized 3D facial landmark coordinates from a large corpus of video data, optimized for tasks such as frame-level emotion recognition, facial expression analysis, and head pose estimation.
Each row represents a single video frame and contains the corresponding emotion label and the x, y, z coordinates for 478 distinct facial landmarks, as generated by the MediaPipe Face Landmarker model.
The data presented here is a pre-processed feature set derived from the Video Emotion dataset.
Key characteristics of the original video data include:
The features were extracted using the MediaPipe Face Landmarker model.
The dataset is provided in a single Parquet file, typically named emotion_landmark_dataset.parquet. It contains 3 + (478 x 3) = 1437 columns.
| Column Name | Data Type | Description |
| :--- | :--- | :--- |
| video_filename | String | The identifier of the original video file from which the frame was extracted. |
| frame_num | Integer | The sequential frame index within the original video file. |
| emotion | String/Categorical | The ground truth emotion label for this clip. Classes include: Angry, Disgust, Fear, Happy, Neutral, Sad. |
| x_0 to x_477 | Float | The normalized X coordinate (horizontal position) for each of the 478 landmarks (0.0 to 1.0). |
| y_0 to y_477 | Float | The normalized Y coordinate (vertical position) for each of the 478 landmarks (0.0 to 1.0). |
| z_0 to z_477 | Float | The normalized Z coordinate (depth, relative to the face center) for each of the 478 landmarks. |
Note on Coordinates: Since the coordinates are normalized, they must be multiplied by the respective pixel width and height of the frame to visualize them on the video.
To confirm that the coordinates have been extracted correctly and to demonstrate the data visually, please refer to the Optimized 3D Facial Landmark Dataset Usage notebook provided in the Code tab of this dataset.
The notebook includes a Python script that loads random frames from the original video files, correctly denormalizes the coordinates using the frame's dimensions, and plots the 478 landmarks on the face.
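For a quick start outside that notebook, here is a minimal Python sketch of the loading and denormalization steps. Only the file and column names mentioned in this description are taken from the dataset; the frame dimensions used below are assumptions and should be read from the original video in practice.

```python
# Minimal sketch: load the Parquet file and convert one frame's normalized
# landmark coordinates to pixel positions. Frame width/height are assumed here.
import pandas as pd

df = pd.read_parquet("emotion_landmark_dataset.parquet")
row = df.iloc[0]

FRAME_WIDTH, FRAME_HEIGHT = 1280, 720   # assumed; use the actual frame dimensions

# x_0..x_477 and y_0..y_477 are normalized to [0, 1]; scale them to pixels.
xs_px = [row[f"x_{i}"] * FRAME_WIDTH for i in range(478)]
ys_px = [row[f"y_{i}"] * FRAME_HEIGHT for i in range(478)]

print(row["video_filename"], row["frame_num"], row["emotion"])
print(list(zip(xs_px[:3], ys_px[:3])))  # first few landmarks in pixel coordinates
```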
If you use this dataset in your research or project, please cite the original video dataset (if applicable) and acknowledge the use of the MediaPipe framework for feature extraction.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Because of inevitable and complicated signal variations in LC-MSn-based nontargeted metabolomics, normalization of metabolite data is a highly recommended procedure to help improve accuracy in metabolic profiling and the discovery of potential biomarkers. Although various normalization methods have been developed and applied to such data sets, it is still difficult to assess their performance, and the methods can be difficult to choose for users, especially those without bioinformatics training. In this study, we present a powerful and user-friendly web platform, named MetaboGroupS, for comparing and evaluating seven popular normalization methods and automatically recommending an optimal one to end users based on the group entropies of every sample data point. To examine and apply this tool, we analyzed a complex clinical human data set from maintenance hemodialysis patients with erythropoietin resistance. Metabolite peaks (11,027) were extracted from the experimental data and then imported into the platform; the entire analysis process was completed sequentially within 5 min. To further test the performance and universality of MetaboGroupS, we analyzed two more published data sets, including a nuclear magnetic resonance (NMR) data set, on this platform. The results indicated that a method with lower intragroup entropy and higher intergroup entropy is preferable. In addition, MetaboGroupS can be operated quite conveniently and does not require any profound computational expertise or background for scientists in many fields. MetaboGroupS is freely available at https://omicstools.shinyapps.io/MetaboGroupSapp/.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview of the eight quality measures used to define the quality of genome-wide gene co-expression networks.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GraphPad Prism files with raw and normalized data points supporting the study "DExCon, DExogron, LUXon: on-demand expression control of endogenous genes reveals dynamics of Rab11 family members".