Metagenomic time-course studies provide valuable insights into the dynamics of microbial systems and have become increasingly popular alongside the reduction in costs of next-generation sequencing technologies. Normalization is a common but critical preprocessing step before proceeding with downstream analysis. To the best of our knowledge, there is currently no reported method to appropriately normalize microbial time-series data. We propose TimeNorm, a novel normalization method that considers the compositional property and time dependency of time-course microbiome data. It is the first method designed to normalize time-series data within the same time point (intra-time normalization) and across time points (bridge normalization) separately. Intra-time normalization normalizes microbial samples under the same condition based on common dominant features. Bridge normalization detects and utilizes a group of the most stable features across two adjacent time points for normalization. Through comprehensive simulation studies and application to a real study, we demonstrate that TimeNorm outperforms existing normalization methods and boosts the power of downstream differential abundance analysis.
This dataset provides processed and normalized/standardized indices for the management tool group focused on 'Mission and Vision Statements', including related concepts like Purpose Statements. Derived from five distinct raw data sources, these indices are specifically designed for comparative longitudinal analysis, enabling the examination of trends and relationships across different empirical domains (web search, literature, academic publishing, and executive adoption). The data presented here represent transformed versions of the original source data, aimed at achieving metric comparability. Users requiring the unprocessed source data should consult the corresponding Mission/Vision dataset in the Management Tool Source Data (Raw Extracts) Dataverse.

Data Files and Processing Methodologies:

Google Trends File (Prefix: GT_): Normalized Relative Search Interest (RSI). Input Data: Native monthly RSI values from Google Trends (Jan 2004 - Jan 2025) for the query "mission statement" + "vision statement" + "mission and vision corporate". Processing: None; utilizes the original base-100 normalized Google Trends index. Output Metric: Monthly Normalized RSI (Base 100). Frequency: Monthly.

Google Books Ngram Viewer File (Prefix: GB_): Normalized Relative Frequency. Input Data: Annual relative frequency values from Google Books Ngram Viewer (1950-2022, English corpus, no smoothing) for the query Mission Statements + Vision Statements + Purpose Statements + Mission and Vision. Processing: Annual relative frequency series normalized (peak year = 100). Output Metric: Annual Normalized Relative Frequency Index (Base 100). Frequency: Annual.

Crossref.org File (Prefix: CR_): Normalized Relative Publication Share Index. Input Data: Absolute monthly publication counts matching Mission/Vision-related keywords [("mission statement" OR ...) AND (...) - see raw data for full query] in titles/abstracts (1950-2025), alongside total monthly Crossref publications; deduplicated via DOIs. Processing: Monthly relative share calculated (Mission/Vision Count / Total Count), then the monthly share series normalized (peak month's share = 100). Output Metric: Monthly Normalized Relative Publication Share Index (Base 100). Frequency: Monthly.

Bain & Co. Survey - Usability File (Prefix: BU_): Normalized Usability Index. Input Data: Original usability percentages (%) from Bain surveys for specific years: Mission/Vision (1993); Mission Statements (1996); Mission and Vision Statements (1999-2017); Purpose, Mission, and Vision Statements (2022). Processing: Semantic grouping: data points across the different naming conventions were treated as a single conceptual series. Normalization: combined series normalized relative to its historical peak (Max % = 100). Output Metric: Biennial Estimated Normalized Usability Index (Base 100 relative to historical peak). Frequency: Biennial (approx.).

Bain & Co. Survey - Satisfaction File (Prefix: BS_): Standardized Satisfaction Index. Input Data: Original average satisfaction scores (1-5 scale) from Bain surveys for specific years (same names/years as Usability). Processing: Semantic grouping: data points treated as a single conceptual series. Standardization (Z-scores): Z = (X - 3.0) / 0.891609. Index scale transformation: Index = 50 + (Z * 22). Output Metric: Biennial Standardized Satisfaction Index (Center = 50, Range ≈ [1, 100]). Frequency: Biennial (approx.).
File Naming Convention: Files generally follow the pattern: PREFIX_Tool_Processed.csv or similar, where the PREFIX indicates the data source (GT_, GB_, CR_, BU_, BS_). Consult the parent Dataverse description (Management Tool Comparative Indices) for general context and the methodological disclaimer. For original extraction details (specific keywords, URLs, etc.), refer to the corresponding Mission/Vision dataset in the Raw Extracts Dataverse. Comprehensive project documentation provides full details on all processing steps.
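As an illustration of the two Bain survey transforms described above (peak-based normalization of usability percentages and Z-score standardization of satisfaction scores rescaled to an index), here is a minimal Python sketch. The survey values and column names below are placeholders, not data taken from the files.

```python
# Illustrative sketch (not the project's original code) of the two Bain survey
# transforms described above. Column names and values are hypothetical.
import pandas as pd

bain = pd.DataFrame({
    "year": [1993, 1996, 1999, 2022],
    "usability_pct": [88.0, 87.0, 80.0, 76.0],      # placeholder values
    "satisfaction_1to5": [3.86, 3.81, 3.93, 4.01],  # placeholder values
})

# Usability: normalize the combined series to its historical peak (max % = 100).
bain["usability_index"] = 100 * bain["usability_pct"] / bain["usability_pct"].max()

# Satisfaction: Z = (X - 3.0) / 0.891609, then Index = 50 + 22 * Z.
z = (bain["satisfaction_1to5"] - 3.0) / 0.891609
bain["satisfaction_index"] = 50 + 22 * z

print(bain)
```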
CC0 1.0: https://spdx.org/licenses/CC0-1.0.html
Background
The Infinium EPIC array measures the methylation status of more than 850,000 CpG sites. The EPIC BeadChip uses two probe designs, Infinium Type I and Type II, which exhibit different technical characteristics that may confound analyses. Numerous normalization and pre-processing methods have been developed to reduce probe-type bias as well as other issues such as background and dye bias.
Methods
This study evaluates the performance of various normalization methods using 16 replicated samples and three metrics: absolute beta-value difference, overlap of non-replicated CpGs between replicate pairs, and effect on beta-value distributions. Additionally, we carried out Pearson’s correlation and intraclass correlation coefficient (ICC) analyses using both raw and SeSAMe 2 normalized data.
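To make the reproducibility metric concrete, the following minimal Python sketch computes a per-probe intraclass correlation across replicate pairs. It is not the code used in the study; the one-way random-effects ICC variant and the simulated beta values are assumptions for illustration.

```python
# Illustrative sketch (assumed setup, not the study's pipeline): per-probe
# one-way random-effects ICC, ICC(1) = (MSB - MSW) / (MSB + (k - 1) * MSW),
# from a beta-value array of shape probes x subjects x replicates.
import numpy as np

rng = np.random.default_rng(0)
n_probes, n_subjects, k = 1000, 16, 2            # 16 replicate pairs, 2 replicates each

betas = rng.beta(2, 2, size=(n_probes, n_subjects, k))   # placeholder beta values

subject_means = betas.mean(axis=2)               # probes x subjects
grand_means = subject_means.mean(axis=1, keepdims=True)

msb = k * ((subject_means - grand_means) ** 2).sum(axis=1) / (n_subjects - 1)
msw = ((betas - subject_means[..., None]) ** 2).sum(axis=(1, 2)) / (n_subjects * (k - 1))

icc = (msb - msw) / (msb + (k - 1) * msw)
print("proportion of probes with ICC > 0.50:", float((icc > 0.50).mean()))
```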
Results
The method we define as SeSAMe 2, which consists of the standard SeSAMe pipeline with an additional round of QC based on pOOBAH masking, was the best-performing normalization method, while quantile-based methods were the worst performing. Whole-array Pearson's correlations were high. However, in agreement with previous studies, a substantial proportion of the probes on the EPIC array showed poor reproducibility (ICC < 0.50). The majority of poorly performing probes have beta values close to either 0 or 1 and relatively low standard deviations. These results suggest that poor probe reliability largely reflects limited biological variation rather than technical measurement variation. Importantly, normalizing the data with SeSAMe 2 dramatically improved ICC estimates, with the proportion of probes with ICC values > 0.50 increasing from 45.18% (raw data) to 61.35% (SeSAMe 2).
Methods
Study Participants and Samples
Whole blood samples were obtained from the Health, Well-being and Aging (Saúde, Bem-estar e Envelhecimento, SABE) study cohort. SABE is a census-based cohort of elderly residents of the city of São Paulo, Brazil, followed up every five years since 2000, with DNA first collected in 2010. Samples from 24 elderly adults were collected at two time points, for a total of 48 samples. The first time point corresponds to the 2010 collection wave, performed from 2010 to 2012, and the second time point was set in 2020 as part of a COVID-19 monitoring project (9 ± 0.71 years apart). The 24 individuals (13 men and 11 women) were 67.41 ± 5.52 years of age (mean ± standard deviation) at the first time point and 76.41 ± 6.17 at the second.
All individuals enrolled in the SABE cohort provided written consent, and the ethics protocols were approved by local and national institutional review boards (COEP/FSP/USP OF.COEP/23/10, CONEP 2044/2014, CEP HIAE 1263-10, University of Toronto RIS 39685).
Blood Collection and Processing
Genomic DNA was extracted from whole peripheral blood samples collected in EDTA tubes. DNA extraction and purification followed the manufacturer's recommended protocols, using the Qiagen AutoPure LS kit with Gentra automated extraction (first time point) or manual extraction (second time point) after the equipment was discontinued, with the same commercial reagents. DNA was quantified using a NanoDrop spectrophotometer and diluted to 50 ng/µL. To assess the reproducibility of the EPIC array, we also obtained technical replicates for 16 of the 48 samples, for a total of 64 samples submitted for further analyses. Whole-genome sequencing data are also available for the samples described above.
Characterization of DNA Methylation using the EPIC array
Approximately 1,000ng of human genomic DNA was used for bisulphite conversion. Methylation status was evaluated using the MethylationEPIC array at The Centre for Applied Genomics (TCAG, Hospital for Sick Children, Toronto, Ontario, Canada), following protocols recommended by Illumina (San Diego, California, USA).
Processing and Analysis of DNA Methylation Data
The R/Bioconductor packages Meffil (version 1.1.0), RnBeads (version 2.6.0), minfi (version 1.34.0) and wateRmelon (version 1.32.0) were used to import, process and perform quality control (QC) analyses on the methylation data. Starting with the 64 samples, we first used Meffil to infer the sex of each sample and compared the inferred sex to the reported sex. Utilizing the 59 SNP probes included on the EPIC array, we calculated concordance between the methylation intensities of the samples and the corresponding genotype calls extracted from their WGS data. We then performed comprehensive sample-level and probe-level QC using the RnBeads QC pipeline. Specifically, we (1) removed probes whose target sequences overlap with a SNP at any base, (2) removed known cross-reactive probes, (3) used the iterative Greedycut algorithm to filter out samples and probes, using a detection p-value threshold of 0.01, and (4) removed probes for which more than 5% of the samples had a missing value. Since RnBeads does not provide a function to filter probes based on bead number, we used the wateRmelon package to extract bead numbers from the IDAT files and calculated, for each probe, the proportion of samples with bead number < 3. Probes with more than 5% of samples having a low bead number (< 3) were removed. For the comparison of normalization methods, we also computed detection p-values from the empirical distribution of out-of-band probes with the pOOBAH() function in the SeSAMe (version 1.14.2) R package, with a p-value threshold of 0.05 and the combine.neg parameter set to TRUE. When pOOBAH filtering was carried out, it was done in parallel with the QC steps described above, and the probes flagged in the two analyses were combined and removed from the data.
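The bead-number filter described above is a simple threshold rule. The sketch below illustrates the same logic on a generic probes-by-samples matrix of bead counts in Python; it is an assumed illustration with simulated placeholder counts, not the wateRmelon/RnBeads code used in the study.

```python
# Illustrative sketch of the bead-number filter (assumed data layout, not the
# study's R code): drop probes for which more than 5% of samples have a bead
# count below 3.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Placeholder bead-count matrix: rows are probes, columns are samples.
beadcounts = pd.DataFrame(
    rng.poisson(lam=12, size=(1000, 64)),
    index=[f"cg{i:08d}" for i in range(1000)],
    columns=[f"sample_{j}" for j in range(64)],
)

low_bead_fraction = (beadcounts < 3).mean(axis=1)   # per-probe fraction of samples with < 3 beads
keep = low_bead_fraction <= 0.05
filtered = beadcounts.loc[keep]
print(f"removed {int((~keep).sum())} of {len(beadcounts)} probes")
```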
Normalization Methods Evaluated
The normalization methods compared in this study were implemented using different R/Bioconductor packages and are summarized in Figure 1. All data were read into the R workspace as RGChannelSet objects using minfi's read.metharray.exp() function. One sample that was flagged during QC was removed, and further normalization steps were carried out on the remaining set of 63 samples. Prior to all normalizations with minfi, probes that did not pass QC were removed. Noob, SWAN, Quantile, Funnorm and Illumina normalizations were implemented using minfi. BMIQ normalization was implemented with ChAMP (version 2.26.0), using as input the raw data produced by minfi's preprocessRaw() function. For the combination of Noob with BMIQ (Noob+BMIQ), BMIQ normalization was carried out using minfi's Noob-normalized data as input. Noob normalization was also implemented with SeSAMe, using a nonlinear dye bias correction. For SeSAMe normalization, two scenarios were tested; in both, the inputs were unmasked SigDF sets converted from minfi's RGChannelSet objects. In the first, which we call "SeSAMe 1", SeSAMe's pOOBAH masking was not executed, and the only probes filtered out of the dataset prior to normalization were those that did not pass QC in the previous analyses. In the second scenario, which we call "SeSAMe 2", pOOBAH masking was carried out on the unfiltered dataset and masked probes were removed; this was followed by removal of any probes that did not pass the previous QC and had not already been removed by pOOBAH. SeSAMe 2 therefore involves two rounds of probe removal. Noob normalization with nonlinear dye bias correction was then carried out on the filtered dataset. Methods were then compared by subsetting the 16 replicated samples and evaluating the effect of the different normalization methods on the absolute difference in beta values (|Δβ|) between replicated samples.
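As a concrete illustration of the replicate-based comparison, the sketch below computes per-probe absolute beta-value differences between technical replicate pairs and summarizes them per method. It is a minimal Python analogue under assumed data structures (simulated beta matrices with paired replicate columns), not the R code used in the study.

```python
# Minimal sketch (assumed inputs, not the study's R code): compare normalization
# methods by the absolute beta-value difference between technical replicates.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n_probes, n_pairs = 1000, 16

# Placeholder beta matrices per method: probes x (2 * n_pairs) columns,
# where columns 2i and 2i+1 form a replicate pair.
methods = {
    "raw": rng.beta(2, 2, size=(n_probes, 2 * n_pairs)),
    "normalized": rng.beta(2, 2, size=(n_probes, 2 * n_pairs)),
}

summary = {}
for name, betas in methods.items():
    rep_a, rep_b = betas[:, 0::2], betas[:, 1::2]   # split into the two replicates
    abs_diff = np.abs(rep_a - rep_b)                # per-probe, per-pair |delta beta|
    summary[name] = {"mean_abs_diff": abs_diff.mean(),
                     "median_abs_diff": np.median(abs_diff)}

print(pd.DataFrame(summary).T)
```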
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this article, we propose a class of test statistics for a change point in the mean of high-dimensional independent data. Our test integrates the U-statistic-based approach of recent work by Wang et al. and the Lq-norm-based high-dimensional test of recent work by He et al., and inherits several appealing features, such as being tuning-parameter free and the asymptotic independence of test statistics corresponding to even q's. A simple combination of test statistics corresponding to several different q's leads to a test with an adaptive power property; that is, it can be powerful against both sparse and dense alternatives. On the estimation front, we obtain the convergence rate of the maximizer of our test statistic standardized by sample size when there is one change point in the mean and q = 2, and we propose to combine our tests with a wild binary segmentation algorithm to estimate the number and locations of change points when there are multiple change points. Numerical comparisons using both simulated and real data demonstrate the advantage of our adaptive test and its corresponding estimation method.
This dataset provides processed and normalized/standardized indices for the management tool group focused on 'Growth Strategies'. Derived from five distinct raw data sources, these indices are specifically designed for comparative longitudinal analysis, enabling the examination of trends and relationships across different empirical domains (web search, literature, academic publishing, and executive adoption). The data presented here represent transformed versions of the original source data, aimed at achieving metric comparability. Users requiring the unprocessed source data should consult the corresponding Growth Strategies dataset in the Management Tool Source Data (Raw Extracts) Dataverse.

Data Files and Processing Methodologies:

Google Trends File (Prefix: GT_): Normalized Relative Search Interest (RSI). Input Data: Native monthly RSI values from Google Trends (Jan 2004 - Jan 2025) for the query "growth strategies" + "growth strategy" + "growth strategies business". Processing: None; utilizes the original base-100 normalized Google Trends index. Output Metric: Monthly Normalized RSI (Base 100). Frequency: Monthly.

Google Books Ngram Viewer File (Prefix: GB_): Normalized Relative Frequency. Input Data: Annual relative frequency values from Google Books Ngram Viewer (1950-2022, English corpus, no smoothing) for the query Growth Strategies + Growth Strategy. Processing: Annual relative frequency series normalized (peak year = 100). Output Metric: Annual Normalized Relative Frequency Index (Base 100). Frequency: Annual.

Crossref.org File (Prefix: CR_): Normalized Relative Publication Share Index. Input Data: Absolute monthly publication counts matching Growth Strategies-related keywords [("growth strategies" OR ...) AND (...) - see raw data for full query] in titles/abstracts (1950-2025), alongside total monthly Crossref publications; deduplicated via DOIs. Processing: Monthly relative share calculated (Growth Strat. Count / Total Count), then the monthly share series normalized (peak month's share = 100). Output Metric: Monthly Normalized Relative Publication Share Index (Base 100). Frequency: Monthly.

Bain & Co. Survey - Usability File (Prefix: BU_): Normalized Usability Index. Input Data: Original usability percentages (%) from Bain surveys for specific years: Growth Strategies (1996, 1999, 2000, 2002, 2004); Growth Strategy Tools (2006, 2008). Note: not reported after 2008. Processing: Semantic grouping: data points for "Growth Strategies" and "Growth Strategy Tools" were treated as a single conceptual series. Normalization: combined series normalized relative to its historical peak (Max % = 100). Output Metric: Biennial Estimated Normalized Usability Index (Base 100 relative to historical peak). Frequency: Biennial (approx.).

Bain & Co. Survey - Satisfaction File (Prefix: BS_): Standardized Satisfaction Index. Input Data: Original average satisfaction scores (1-5 scale) from Bain surveys for specific years: Growth Strategies (1996-2004); Growth Strategy Tools (2006, 2008). Note: not reported after 2008. Processing: Semantic grouping: data points treated as a single conceptual series. Standardization (Z-scores): Z = (X - 3.0) / 0.891609. Index scale transformation: Index = 50 + (Z * 22). Output Metric: Biennial Standardized Satisfaction Index (Center = 50, Range ≈ [1, 100]). Frequency: Biennial (approx.).

File Naming Convention: Files generally follow the pattern PREFIX_Tool_Processed.csv or similar, where the PREFIX indicates the data source (GT_, GB_, CR_, BU_, BS_).
Consult the parent Dataverse description (Management Tool Comparative Indices) for general context and the methodological disclaimer. For original extraction details (specific keywords, URLs, etc.), refer to the corresponding Growth Strategies dataset in the Raw Extracts Dataverse. Comprehensive project documentation provides full details on all processing steps.
https://dataintelo.com/privacy-and-policy
According to our latest research, the global EV Charging Data Normalization Middleware market size reached USD 412.6 million in 2024, driven by the rapid expansion of electric vehicle infrastructure and the increasing need for seamless data integration across diverse charging networks. The market is experiencing a robust compound annual growth rate (CAGR) of 22.4% and is forecasted to achieve a value of USD 2,977.9 million by 2033. This impressive growth can be attributed to the surging adoption of electric vehicles worldwide, the proliferation of charging points, and the urgent requirement for standardized data management solutions that can ensure interoperability, optimize operations, and enhance user experience across fragmented EV charging ecosystems.
A primary growth factor fueling the EV Charging Data Normalization Middleware market is the exponential increase in electric vehicle adoption globally. As governments and private entities invest heavily in expanding EV charging infrastructure, the diversity of hardware, software, and communication protocols has created significant interoperability challenges. Data normalization middleware acts as a critical intermediary, harmonizing disparate data streams from various charging stations, networks, and management platforms. This enables seamless communication, real-time analytics, and efficient billing, which are essential for scaling up EV infrastructure. The demand for robust middleware solutions is further amplified by regulatory mandates for open data exchange, transparent reporting, and the integration of renewable energy sources, all of which necessitate advanced data normalization capabilities.
Another significant driver is the evolution of smart grid technologies and the integration of EV charging with energy management systems. Utilities and energy providers are increasingly leveraging data normalization middleware to aggregate and analyze charging data, optimize load balancing, and support demand response initiatives. These solutions facilitate the synchronization of EV charging activities with grid requirements, reducing peak loads and enabling dynamic pricing models. The middleware's ability to provide real-time, standardized data empowers stakeholders to make informed decisions, enhance operational efficiency, and improve customer satisfaction. Additionally, advancements in artificial intelligence and machine learning are enabling middleware platforms to deliver predictive analytics, automate fault detection, and support proactive maintenance, further enhancing their value proposition in the market.
The proliferation of public and private charging networks, coupled with the emergence of new business models such as fleet electrification and vehicle-to-grid (V2G) services, is also catalyzing market growth. Fleet operators and charging network providers rely on data normalization middleware to consolidate information from heterogeneous sources, streamline operations, and deliver unified services to end-users. As the competitive landscape intensifies, stakeholders are prioritizing middleware solutions that offer scalability, security, and compliance with industry standards. The increasing collaboration between automotive OEMs, technology vendors, and energy companies is fostering innovation and accelerating the deployment of advanced middleware platforms tailored to diverse application scenarios.
From a regional perspective, Europe is currently leading the EV Charging Data Normalization Middleware market, accounting for a significant share of global revenues due to its mature EV ecosystem, stringent regulatory frameworks, and aggressive decarbonization targets. North America is witnessing rapid growth, driven by substantial investments in EV infrastructure and supportive government policies. The Asia Pacific region is emerging as a lucrative market, propelled by the rapid adoption of electric vehicles in China, India, and Southeast Asia, coupled with large-scale infrastructure development initiatives. These regional dynamics are shaping the competitive landscape and influencing the strategic priorities of market participants worldwide.
The Component segment of the EV Charging Data Normalization Middleware market is bifurcated into Software and Services, each playing a pivotal role in the e
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Dataset Title: Data and Code for: "Universal Adaptive Normalization Scale (AMIS): Integration of Heterogeneous Metrics into a Unified System"

Description: This dataset contains source data and processing results for validating the Adaptive Multi-Interval Scale (AMIS) normalization method. Includes educational performance data (student grades), economic statistics (World Bank GDP), and a Python implementation of the AMIS algorithm with a graphical interface.

Contents:
- Source data: educational grades and GDP statistics
- AMIS normalization results (3, 5, 9, 17-point models)
- Comparative analysis with linear normalization
- Ready-to-use Python code for data processing

Applications:
- Educational data normalization and analysis
- Economic indicators comparison
- Development of unified metric systems
- Methodology research in data scaling

Technical info: Python code with pandas, numpy, scipy, matplotlib dependencies. Data in Excel format.
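The comparative analysis in this dataset contrasts AMIS with linear normalization. Since the AMIS algorithm itself is available only in the bundled Python code, the sketch below shows a hypothetical linear (min-max) baseline mapped onto an N-point scale, the kind of reference transformation such a comparison implies; it is not the AMIS implementation, and the sample values are placeholders.

```python
# Hypothetical sketch of a linear (min-max) baseline: rescale raw scores to
# [0, 1] and map them onto an N-point scale (e.g., 3, 5, 9 or 17 points).
# This is NOT the AMIS algorithm; it only illustrates the linear reference
# transformation that the AMIS results are compared against.
import numpy as np

def linear_to_n_point(values, n_points):
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    unit = (values - lo) / (hi - lo)             # min-max normalization to [0, 1]
    # Map [0, 1] onto the discrete scale 1..n_points.
    return np.rint(1 + unit * (n_points - 1)).astype(int)

grades = [52, 61, 74, 88, 95, 67, 79]            # placeholder raw scores
for n in (3, 5, 9, 17):
    print(n, linear_to_n_point(grades, n))
```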
This dataset provides processed and normalized/standardized indices for the management tool group 'Scenario Planning', including related concepts like Scenario Analysis and Contingency Planning. Derived from five distinct raw data sources, these indices are specifically designed for comparative longitudinal analysis, enabling the examination of trends and relationships across different empirical domains (web search, literature, academic publishing, and executive adoption). The data presented here represent transformed versions of the original source data, aimed at achieving metric comparability. Users requiring the unprocessed source data should consult the corresponding Scenario Planning dataset in the Management Tool Source Data (Raw Extracts) Dataverse.

Data Files and Processing Methodologies:

Google Trends File (Prefix: GT_): Normalized Relative Search Interest (RSI). Input Data: Native monthly RSI values from Google Trends (Jan 2004 - Jan 2025) for the query "scenario planning" + "scenario analysis" + "contingency planning" + "scenario planning business". Processing: None; utilizes the original base-100 normalized Google Trends index. Output Metric: Monthly Normalized RSI (Base 100). Frequency: Monthly.

Google Books Ngram Viewer File (Prefix: GB_): Normalized Relative Frequency. Input Data: Annual relative frequency values from Google Books Ngram Viewer (1950-2022, English corpus, no smoothing) for the query Scenario Planning + Scenario Analysis + Contingency Planning + Scenario and Contingency Planning. Processing: Annual relative frequency series normalized (peak year = 100). Output Metric: Annual Normalized Relative Frequency Index (Base 100). Frequency: Annual.

Crossref.org File (Prefix: CR_): Normalized Relative Publication Share Index. Input Data: Absolute monthly publication counts matching Scenario Planning-related keywords [("scenario planning" OR ...) AND ("management" OR ...) - see raw data for full query] in titles/abstracts (1950-2025), alongside total monthly Crossref publications; deduplicated via DOIs. Processing: Monthly relative share calculated (Scenario Planning Count / Total Count), then the monthly share series normalized (peak month's share = 100). Output Metric: Monthly Normalized Relative Publication Share Index (Base 100). Frequency: Monthly.

Bain & Co. Survey - Usability File (Prefix: BU_): Normalized Usability Index. Input Data: Original usability percentages (%) from Bain surveys for specific years: Scenario Planning (1993, 1999, 2000); Scenario and Contingency Planning (2004, 2006, 2008, 2010, 2012, 2014, 2017); Scenario Analysis and Contingency Planning (2022). Processing: Semantic grouping: data points across the different naming conventions were treated as a single conceptual series. Normalization: combined series normalized relative to its historical peak (Max % = 100). Output Metric: Biennial Estimated Normalized Usability Index (Base 100 relative to historical peak). Frequency: Biennial (approx.).

Bain & Co. Survey - Satisfaction File (Prefix: BS_): Standardized Satisfaction Index. Input Data: Original average satisfaction scores (1-5 scale) from Bain surveys for specific years: Scenario Planning (1993, 1999, 2000); Scenario and Contingency Planning (2004, 2006, 2008, 2010, 2012, 2014, 2017); Scenario Analysis and Contingency Planning (2022). Processing: Semantic grouping: data points treated as a single conceptual series. Standardization (Z-scores): Z = (X - 3.0) / 0.891609. Index scale transformation: Index = 50 + (Z * 22). Output Metric: Biennial Standardized Satisfaction Index (Center = 50, Range ≈ [1, 100]). Frequency: Biennial (approx.).

File Naming Convention: Files generally follow the pattern PREFIX_Tool_Processed.csv or similar, where the PREFIX indicates the data source (GT_, GB_, CR_, BU_, BS_). Consult the parent Dataverse description (Management Tool Comparative Indices) for general context and the methodological disclaimer. For original extraction details (specific keywords, URLs, etc.), refer to the corresponding Scenario Planning dataset in the Raw Extracts Dataverse. Comprehensive project documentation provides full details on all processing steps.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset comprises open loop simulations of the ISBA land surface model and ASCAT normalized backscatter and slope data at grid points (GPIs) containing International Soil Moisture Network stations. The dataset covers the period from 2007 to 2019 and includes a total of 31 GPIs.
The ISBA land surface model runs in the offline version of the SURFEX software (https://www.umr-cnrm.fr/surfex/) using ERA5 forcing data. ASCAT normalized backscatter ("sig") data are provided at 10am and 10pm every day. If normalized backscatter is not acquired at a particular time, a fill value of 999 is provided.
ASCAT slope (“slop”) is calculated daily following Hahn et al. (2017). These daily values are provided at 10am in this dataset. The slope value at 10pm is set to 999.
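When working with these files, the 999 entries should be treated as missing rather than as physical values. A short, assumed Python sketch of that step (the column names and sample values are hypothetical, not taken from the files):

```python
# Assumed sketch: replace the 999 fill value with NaN before analysis so that
# missing 10am/10pm acquisitions (and the 10pm slope placeholder) are not
# mistaken for real backscatter or slope values.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "time": pd.to_datetime(["2010-06-01 10:00", "2010-06-01 22:00"]),
    "sig": [-8.73, 999.0],      # normalized backscatter, 999 = not acquired
    "slop": [-0.012, 999.0],    # daily slope, set to 999 at 10pm
})

FILL_VALUE = 999.0
df[["sig", "slop"]] = df[["sig", "slop"]].replace(FILL_VALUE, np.nan)
print(df)
```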
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides processed and normalized/standardized indices for the management tool group focused on 'Talent & Employee Engagement', including concepts like Employee Engagement Surveys/Systems and Corporate Codes of Ethics. Derived from five distinct raw data sources, these indices are specifically designed for comparative longitudinal analysis, enabling the examination of trends and relationships across different empirical domains (web search, literature, academic publishing, and executive adoption). The data presented here represent transformed versions of the original source data, aimed at achieving metric comparability. Users requiring the unprocessed source data should consult the corresponding Talent/Engagement dataset in the Management Tool Source Data (Raw Extracts) Dataverse.

Data Files and Processing Methodologies:

Google Trends File (Prefix: GT_): Normalized Relative Search Interest (RSI). Input Data: Native monthly RSI values from Google Trends (Jan 2004 - Jan 2025) for the query "corporate code of ethics" + "employee engagement" + "employee engagement management". Processing: None; utilizes the original base-100 normalized Google Trends index. Output Metric: Monthly Normalized RSI (Base 100). Frequency: Monthly.

Google Books Ngram Viewer File (Prefix: GB_): Normalized Relative Frequency. Input Data: Annual relative frequency values from Google Books Ngram Viewer (1950-2022, English corpus, no smoothing) for the query Corporate Code of Ethics + Employee Engagement Programs + Employee Engagement Surveys + Employee Engagement. Processing: Annual relative frequency series normalized (peak year = 100). Output Metric: Annual Normalized Relative Frequency Index (Base 100). Frequency: Annual.

Crossref.org File (Prefix: CR_): Normalized Relative Publication Share Index. Input Data: Absolute monthly publication counts matching Engagement/Ethics-related keywords [("corporate code of ethics" OR ...) AND (...) - see raw data for full query] in titles/abstracts (1950-2025), alongside total monthly Crossref publications; deduplicated via DOIs. Processing: Monthly relative share calculated (Engage/Ethics Count / Total Count), then the monthly share series normalized (peak month's share = 100). Output Metric: Monthly Normalized Relative Publication Share Index (Base 100). Frequency: Monthly.

Bain & Co. Survey - Usability File (Prefix: BU_): Normalized Usability Index. Input Data: Original usability percentages (%) from Bain surveys for specific years: Corporate Code of Ethics (2002); Employee Engagement Surveys (2012, 2014); Employee Engagement Systems (2017, 2022). Processing: Semantic grouping: data points across related names treated as a single conceptual series representing the Talent/Engagement focus. Normalization: combined series normalized relative to its historical peak (Max % = 100). Output Metric: Biennial Estimated Normalized Usability Index (Base 100 relative to historical peak). Frequency: Biennial (approx.).

Bain & Co. Survey - Satisfaction File (Prefix: BS_): Standardized Satisfaction Index. Input Data: Original average satisfaction scores (1-5 scale) from Bain surveys for specific years (same names/years as Usability). Processing: Semantic grouping: data points treated as a single conceptual series. Standardization (Z-scores): Z = (X - 3.0) / 0.891609. Index scale transformation: Index = 50 + (Z * 22). Output Metric: Biennial Standardized Satisfaction Index (Center = 50, Range ≈ [1, 100]). Frequency: Biennial (approx.).
File Naming Convention: Files generally follow the pattern: PREFIX_Tool_Processed.csv or similar, where the PREFIX indicates the data source (GT_, GB_, CR_, BU_, BS_). Consult the parent Dataverse description (Management Tool Comparative Indices) for general context and the methodological disclaimer. For original extraction details (specific keywords, URLs, etc.), refer to the corresponding Talent/Engagement dataset in the Raw Extracts Dataverse. Comprehensive project documentation provides full details on all processing steps.
Normalized Digital Surface Model - 1m resolution. The dataset contains the 1m Normalized Digital Surface Model for the District of Columbia. Some areas have limited data. The lidar dataset redaction was conducted under the guidance of the United States Secret Service. Except for classified ground points and classified water points, all lidar data returns and collected data were removed from the dataset within the United States Secret Service 1m redaction boundary generated for the 2017 orthophoto flight.
Normalized Digital Surface Model - 1m resolution. The dataset contains the 1m Normalized Digital Surface Model for the District of Columbia. These lidar data are processed classified LAS 1.4 files at USGS QL1 covering the District of Columbia. Some areas have limited data. The lidar dataset redaction was conducted under the guidance of the United States Secret Service. All data returns were removed from the dataset within the United States Secret Service redaction boundary except for classified ground points and classified water points.
UniCourt provides easy access to normalized legal analytics data via our Attorney Analytics API, Law Firm Analytics API, Judge Analytics API, Party Analytics API, and Court Analytics API, giving you the flexibility you need to intuitively move between interconnected data points. This structure can be used for AI & ML training data.
Build the Best Legal Analytics Possible
• UniCourt collects court data from hundreds of state and federal trial court databases, as well as attorney bar data, judicial records data, and Secretary of State data. • We then combine all of those data sets together through our entity normalization process to identify who’s who in litigation, so you can download structured data via our APIs and build the best legal analytics possible.
Flexible Analytics APIs for Meaningful Integrations
• UniCourt’s Legal Analytics APIs put billions of data points at your fingertips and give you the flexibility you need to integrate analytics into your matter management systems, BI dashboards, data lakes, CRMs, and other data management tools. • Create on-demand, self-service reporting options within your internal applications and set up automated data feeds to keep your mission critical analytics reports regularly refreshed with updated data.
What Legal Analytics APIs Are Available?
UniCourt offers a wide range of Legal Analytics APIs and various end-points to provide the data you need. Here are the core analytics APIs we provide:
• Attorney Analytics API • Law Firm Analytics API • Judge Analytics API • Party Analytics API • Case Analytics API
Normalized Digital Surface Model - 1m resolution. The dataset contains the 1m Normalized Digital Surface Model for the District of Columbia. Some areas have limited data. The lidar dataset redaction was conducted under the guidance of the United States Secret Service. Except for classified ground points and classified water points, all lidar data returns and collected data were removed from the dataset within the United States Secret Service 1m redaction boundary generated for the 2017 orthophoto flight. This dataset is provided as an ArcGIS Image service. Please note, the download feature for this image service in Open Data DC provides a compressed PNG, JPEG or TIFF. The compressed GeoTIFF mosaic raster dataset is available under additional options when viewing downloads. Requests for the individual GeoTIFF set of images should be sent to open.data@dc.gov.
Normalized Digital Surface Model - 1m resolution. The dataset contains the 1m Normalized Digital Surface Model for the District of Columbia. Some areas have limited data. The lidar dataset redaction was conducted under the guidance of the United States Secret Service. All data returns were removed from the dataset within the United States Secret Service redaction boundary except for classified ground points and classified water points.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
A ZIP file of computer code and output used in the numerical calculations for "On The Finite-Size Lyapunov Exponent For The Schrödinger Operator With Skew-Shift Potential" by Paul Michael Kielstra and Marius Lemm. The ZIP decompresses to about 26 GB, containing multiple files:
201x201 bad set grid.txt: A list of 201x201=40401 evenly spaced points on [0, 1]x[0, 1], each written in the form (x, y) and followed by 30000 values of E which are probably bad for that point. This gives a total of 40401x30001=1212070401 lines.
Upper bounds.txt: individual upper bounds for equation (9) calculated at various points. The bound in this equation in the published paper is the worst of these.
E=0/N/2001x2001 grid.tsv: A tab-separated values file of 2001x2001=4004001 evenly spaced points on [0, 1]x[0, 1], with headers:
X: The x-coordinate of the point represented by the line in question.
Y: The y-coordinate.
Exact_x, Exact_y: The x- and y-coordinates to the maximum precision the computer used, included in case, for instance, the x-coordinate is defined to be 0.5 but is actually 0.5000000000000001 in memory.
Matrix: The matrix generated at this point, modulo a certain normalization (see below).
Result: The log of the norm of the matrix. This has been corrected for the normalization -- it is calculated as if the matrix had never been normalized.
Normalizationcount: The actual matrix generated at each point is too large to store in memory, so the Matrix column stores the actual matrix multiplied by Normalizer^Normalizationcount; we used a normalizer of 0.01. (A short sketch of this rescaling bookkeeping follows the file listing below.)
This file was calculated with the values E=0, N=30000, lambda=1/2. The header line means that this file contains 4004001+1=4004002 lines in total.
E=0/N/2001x2001 random grid.tsv: As with the 2001x2001 grid.tsv file, but missing the exact_x and exact_y coordinates. Instead, the x and y values are both exact and randomly chosen. The lines in the file are in no particular order. This file contains the data for the Monte Carlo approximation used in the paper.
E=0/2N/2001x2001 grid.tsv: As with its counterpart in the folder labeled N, but calculated with N=60000 instead.
E=-2.495: As with its counterpart E=0, but everything is calculated with E=-2.495123260049612 (which we round to -2.49512326 in the paper). This folder also contains no random or Monte Carlo calculations.
Code/Multiplier.m: MATLAB code to generate the skew matrix at a given point.
Code/Iterator.m: MATLAB code to iterate over a series of points and call Multiplier at each.
Code/Striper.m: MATLAB code to split up the input space into a series of stripes and call Iterator on exactly one of them. We performed our calculations in parallel, each job consisting of calling Striper on a different stripe number.
Code/Badfinder.m: MATLAB code to take a point and output a series of E-values for which that point is in the bad set.
Code/BadSetIterator.m: As with Iterator.m, but calls Badfinder.
Code/BadSetStriper.m: As with Striper.m, but calls BadSetIterator. (The function in this file is also called Striper.)
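To make the normalization bookkeeping behind the Matrix, Result and Normalizationcount columns concrete, here is a small, illustrative Python sketch (not the MATLAB code in the Code/ folder): the running matrix product is multiplied by the 0.01 normalizer whenever it grows too large, the multiplications are counted, and the log-norm is corrected at the end as if no rescaling had happened. The rescaling threshold and the random factor matrices are assumptions for illustration only.

```python
# Illustrative sketch (not the original MATLAB code): keep a long matrix product
# representable by rescaling with normalizer = 0.01 whenever the norm gets large,
# count the rescalings, and recover the un-normalized log-norm at the end:
#   Result = log(norm(stored)) - Normalizationcount * log(normalizer)
import numpy as np

rng = np.random.default_rng(3)
NORMALIZER = 0.01
THRESHOLD = 1e8          # assumed trigger for rescaling, for illustration only

stored = np.eye(2)
count = 0
for _ in range(500):
    step = np.eye(2) + 0.5 * rng.standard_normal((2, 2))   # placeholder factors
    stored = step @ stored
    if np.linalg.norm(stored) > THRESHOLD:
        stored *= NORMALIZER
        count += 1

result = np.log(np.linalg.norm(stored)) - count * np.log(NORMALIZER)
print(f"Normalizationcount = {count}, corrected log-norm (Result) = {result:.3f}")
```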
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset provides pre-extracted, normalized 3D facial landmark coordinates from a large corpus of video data, optimized for tasks such as frame-level emotion recognition, facial expression analysis, and head pose estimation.
Each row represents a single video frame and contains the corresponding emotion label and the x, y, z coordinates for 478 distinct facial landmarks, as generated by the MediaPipe Face Landmarker model.
The data presented here is a pre-processed feature set derived from the Video Emotion dataset.
Key characteristics of the original video data include:
The features were extracted using the MediaPipe Face Landmarker model.
The dataset is provided in a single Parquet file, typically named emotion_landmark_dataset.parquet. It contains 3 + (478 x 3) = 1437 columns.
| Column Name | Data Type | Description |
| :--- | :--- | :--- |
| video_filename | String | The identifier of the original video file from which the frame was extracted. |
| frame_num | Integer | The sequential frame index within the original video file. |
| emotion | String/Categorical | The ground truth emotion label for this clip. Classes include: Angry, Disgust, Fear, Happy, Neutral, Sad. |
| x_0 to x_477 | Float | The normalized X coordinate (horizontal position) for each of the 478 landmarks (0.0 to 1.0). |
| y_0 to y_477 | Float | The normalized Y coordinate (vertical position) for each of the 478 landmarks (0.0 to 1.0). |
| z_0 to z_477 | Float | The normalized Z coordinate (depth, relative to the face center) for each of the 478 landmarks. |
Note on Coordinates: Since the coordinates are normalized, they must be multiplied by the respective pixel width and height of the frame to visualize them on the video.
To confirm that the coordinates have been extracted correctly and to demonstrate the data visually, please refer to the Optimized 3D Facial Landmark Dataset Usage notebook provided in the Code tab of this dataset.
The notebook includes a Python script that loads random frames from the original video files, correctly denormalizes the coordinates using the frame's dimensions, and plots the 478 landmarks on the face.
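For a quick start outside that notebook, here is a minimal Python sketch of the loading and denormalization steps. Only the file and column names mentioned in this description are taken from the dataset; the frame dimensions used below are assumptions and should be read from the original video in practice.

```python
# Minimal sketch: load the Parquet file and convert one frame's normalized
# landmark coordinates to pixel positions. Frame width/height are assumed here.
import pandas as pd

df = pd.read_parquet("emotion_landmark_dataset.parquet")
row = df.iloc[0]

FRAME_WIDTH, FRAME_HEIGHT = 1280, 720   # assumed; use the actual frame dimensions

# x_0..x_477 and y_0..y_477 are normalized to [0, 1]; scale them to pixels.
xs_px = [row[f"x_{i}"] * FRAME_WIDTH for i in range(478)]
ys_px = [row[f"y_{i}"] * FRAME_HEIGHT for i in range(478)]

print(row["video_filename"], row["frame_num"], row["emotion"])
print(list(zip(xs_px[:3], ys_px[:3])))  # first few landmarks in pixel coordinates
```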
If you use this dataset in your research or project, please cite the original video dataset (if applicable) and acknowledge the use of the MediaPipe framework for feature extraction.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Because of inevitable and complicated signal variations in LC-MSn-based nontargeted metabolomics, normalization of metabolite data is a highly recommended procedure to help improve accuracy in metabolic profiling and the discovery of potential biomarkers. Although various normalization methods have been developed and applied to such data sets, it is still difficult to assess their performance, and the methods can be difficult to choose for users, especially those without bioinformatics training. In this study, we present a powerful and user-friendly web platform, named MetaboGroupS, for comparing and evaluating seven popular normalization methods and automatically recommending an optimal one to end users based on the group entropies of every sample data point. To examine and apply this tool, we analyzed a complex clinical human data set from maintenance hemodialysis patients with erythropoietin resistance. Metabolite peaks (11,027) were extracted from the experimental data and then imported into the platform; the entire analysis process was completed sequentially within 5 min. To further test the performance and universality of MetaboGroupS, we analyzed two more published data sets, including a nuclear magnetic resonance (NMR) data set, on this platform. The results indicated that a method with lower intragroup entropy and higher intergroup entropy is preferable. In addition, MetaboGroupS can be operated quite conveniently and does not require any profound computational expertise or background for scientists in many fields. MetaboGroupS is freely available at https://omicstools.shinyapps.io/MetaboGroupSapp/.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview of the eight quality measures used to define the quality of genome-wide gene co-expression networks.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GraphPad Prism files with raw and normalized data points supporting the study "DExCon, DExogron, LUXon: on-demand expression control of endogenous genes reveals dynamics of Rab11 family members".