66 datasets found
  1. Data from: LVMED: Dataset of Latvian text normalisation samples for the...

    • repository.clarin.lv
    Updated May 30, 2023
    Cite
    Viesturs Jūlijs Lasmanis; Normunds Grūzītis (2023). LVMED: Dataset of Latvian text normalisation samples for the medical domain [Dataset]. https://repository.clarin.lv/repository/xmlui/handle/20.500.12574/85
    Dataset updated
    May 30, 2023
    Authors
    Viesturs Jūlijs Lasmanis; Normunds Grūzītis
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The CSV dataset contains sentence pairs for a text-to-text transformation task: given a sentence that contains 0..n abbreviations, rewrite (normalize) the sentence in full words (word forms).

    Training dataset: 64,665 sentence pairs. Validation dataset: 7,185 sentence pairs. Testing dataset: 7,984 sentence pairs.

    All sentences are extracted from a public web corpus (https://korpuss.lv/id/Tīmeklis2020) and contain at least one medical term.
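
    For quick inspection, the training split can be loaded with pandas; a minimal sketch follows (the file name and the source/target column names are assumptions, so check the actual CSV header after downloading from the CLARIN repository):

        import pandas as pd

        train = pd.read_csv("lvmed_train.csv")      # hypothetical file name
        for _, row in train.head(3).iterrows():
            print("abbreviated:", row["source"])    # assumed column name
            print("normalized: ", row["target"])    # assumed column name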

  2. Robust RT-qPCR Data Normalization: Validation and Selection of Internal...

    • plos.figshare.com
    tiff
    Updated May 31, 2023
    Cite
    Daijun Ling; Paul M. Salvaterra (2023). Robust RT-qPCR Data Normalization: Validation and Selection of Internal Reference Genes during Post-Experimental Data Analysis [Dataset]. http://doi.org/10.1371/journal.pone.0017762
    Available download formats: tiff
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Daijun Ling; Paul M. Salvaterra
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Reverse transcription and real-time PCR (RT-qPCR) has been widely used for rapid quantification of relative gene expression. To offset technical confounding variation, stably expressed internal reference genes are measured simultaneously along with target genes for data normalization. Statistical methods have been developed for reference validation; however, normalization of RT-qPCR data still remains arbitrary because particular reference genes are chosen before the experiment. To establish a method for determining the most stable normalizing factor (NF) across samples for robust data normalization, we measured the expression of 20 candidate reference genes and 7 target genes in 15 Drosophila head cDNA samples using RT-qPCR. The 20 reference genes exhibit sample-specific variation in their expression stability. Unexpectedly, the NF variation across samples does not decrease continuously as more reference genes are included pairwise, suggesting that either too few or too many reference genes may undermine the robustness of data normalization. The optimal number of reference genes, predicted by the minimal and most stable NF variation, varies greatly, from 1 to more than 10, depending on the particular sample set. We also found that GstD1, InR and Hsp70 expression exhibits an age-dependent increase in fly heads; however, their relative expression levels are significantly affected by NFs computed from different numbers of reference genes. Because the outcome is highly dependent on the actual data, RT-qPCR reference genes therefore have to be validated and selected at the post-experimental data analysis stage rather than determined before the experiment.
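
    As an illustrative sketch (not the authors' exact procedure), a normalization factor is often computed per sample as the geometric mean of the selected reference genes' relative quantities, and its between-sample variation can be tracked as more genes are included:

        import numpy as np

        def normalization_factor(ref_expr):
            """ref_expr: (n_samples, n_ref_genes) array of relative quantities."""
            return np.exp(np.log(ref_expr).mean(axis=1))  # per-sample geometric mean

        # toy stand-in for 15 samples x 20 candidate reference genes
        rng = np.random.default_rng(0)
        expr = rng.lognormal(sigma=0.3, size=(15, 20))
        for k in (2, 5, 10, 20):
            nf = normalization_factor(expr[:, :k])
            print(k, "genes -> NF coefficient of variation:", round(np.std(nf) / np.mean(nf), 3))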

  3. Data from: A systematic evaluation of normalization methods and probe...

    • data.niaid.nih.gov
    • dataone.org
    • +2more
    zip
    Updated May 30, 2023
    Cite
    H. Welsh; C. M. P. F. Batalha; W. Li; K. L. Mpye; N. C. Souza-Pinto; M. S. Naslavsky; E. J. Parra (2023). A systematic evaluation of normalization methods and probe replicability using infinium EPIC methylation data [Dataset]. http://doi.org/10.5061/dryad.cnp5hqc7v
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Universidade de São Paulo
    Hospital for Sick Children
    University of Toronto
    Authors
    H. Welsh; C. M. P. F. Batalha; W. Li; K. L. Mpye; N. C. Souza-Pinto; M. S. Naslavsky; E. J. Parra
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    Background The Infinium EPIC array measures the methylation status of > 850,000 CpG sites. The EPIC BeadChip uses a two-array design: Infinium Type I and Type II probes. These probe types exhibit different technical characteristics which may confound analyses. Numerous normalization and pre-processing methods have been developed to reduce probe type bias as well as other issues such as background and dye bias.
    Methods This study evaluates the performance of various normalization methods using 16 replicated samples and three metrics: absolute beta-value difference, overlap of non-replicated CpGs between replicate pairs, and effect on beta-value distributions. Additionally, we carried out Pearson’s correlation and intraclass correlation coefficient (ICC) analyses using both raw and SeSAMe 2 normalized data.
    Results The method we define as SeSAMe 2, which consists of applying the regular SeSAMe pipeline with an additional round of QC (pOOBAH masking), was found to be the best-performing normalization method, while quantile-based methods performed worst. Whole-array Pearson’s correlations were high. However, in agreement with previous studies, a substantial proportion of the probes on the EPIC array showed poor reproducibility (ICC < 0.50). The majority of poor-performing probes have beta values close to either 0 or 1 and relatively low standard deviations. These results suggest that poor probe reliability is largely the result of limited biological variation rather than technical measurement variation. Importantly, normalizing the data with SeSAMe 2 dramatically improved ICC estimates, with the proportion of probes with ICC values > 0.50 increasing from 45.18% (raw data) to 61.35% (SeSAMe 2).

    Methods

    Study Participants and Samples

    The whole blood samples were obtained from the Health, Well-being and Aging (Saúde, Ben-estar e Envelhecimento, SABE) study cohort. SABE is a cohort of census-withdrawn elderly from the city of São Paulo, Brazil, followed up every five years since the year 2000, with DNA first collected in 2010. Samples from 24 elderly adults were collected at two time points, for a total of 48 samples. The first time point is the 2010 collection wave, performed from 2010 to 2012, and the second time point was set in 2020 in a COVID-19 monitoring project (9±0.71 years apart). The 24 individuals were 67.41±5.52 years of age (mean ± standard deviation) at time point one and 76.41±6.17 at time point two, and comprised 13 men and 11 women.

    All individuals enrolled in the SABE cohort provided written consent, and the ethics protocols were approved by local and national institutional review boards (COEP/FSP/USP OF.COEP/23/10, CONEP 2044/2014, CEP HIAE 1263-10, University of Toronto RIS 39685).

    Blood Collection and Processing

    Genomic DNA was extracted from whole peripheral blood samples collected in EDTA tubes. DNA extraction and purification followed the manufacturer’s recommended protocols, using the Qiagen AutoPure LS kit with Gentra automated extraction (first time point) or manual extraction (second time point) due to discontinuation of the equipment, but using the same commercial reagents. DNA was quantified using a NanoDrop spectrophotometer and diluted to 50 ng/µL. To assess the reproducibility of the EPIC array, we also obtained technical replicates for 16 out of the 48 samples, for a total of 64 samples submitted for further analyses. Whole Genome Sequencing data are also available for the samples described above.

    Characterization of DNA Methylation using the EPIC array

    Approximately 1,000ng of human genomic DNA was used for bisulphite conversion. Methylation status was evaluated using the MethylationEPIC array at The Centre for Applied Genomics (TCAG, Hospital for Sick Children, Toronto, Ontario, Canada), following protocols recommended by Illumina (San Diego, California, USA).

    Processing and Analysis of DNA Methylation Data

    The R/Bioconductor packages Meffil (version 1.1.0), RnBeads (version 2.6.0), minfi (version 1.34.0) and wateRmelon (version 1.32.0) were used to import, process and perform quality control (QC) analyses on the methylation data. Starting with the 64 samples, we first used Meffil to infer the sex of the 64 samples and compared the inferred sex to the reported sex. Utilizing the 59 SNP probes that are available as part of the EPIC array, we calculated concordance between the methylation intensities of the samples and the corresponding genotype calls extracted from their WGS data. We then performed comprehensive sample-level and probe-level QC using the RnBeads QC pipeline. Specifically, we (1) removed probes if their target sequences overlap with a SNP at any base, (2) removed known cross-reactive probes, (3) used the iterative Greedycut algorithm to filter out samples and probes, using a detection p-value threshold of 0.01, and (4) removed probes if more than 5% of the samples had a missing value. Since RnBeads does not have a function to perform probe filtering based on bead number, we used the wateRmelon package to extract bead numbers from the IDAT files and calculated the proportion of samples with bead number < 3. Probes with more than 5% of samples having a low bead number (< 3) were removed. For the comparison of normalization methods, we also computed detection p-values using the out-of-band probes' empirical distribution with the pOOBAH() function in the SeSAMe (version 1.14.2) R package, with a p-value threshold of 0.05 and the combine.neg parameter set to TRUE. In the scenario where pOOBAH filtering was carried out, it was done in parallel with the previously mentioned QC steps, and the probes flagged in both analyses were combined and removed from the data.

    Normalization Methods Evaluated

    The normalization methods compared in this study were implemented using different R/Bioconductor packages and are summarized in Figure 1. All data were read into the R workspace as RG Channel Sets using minfi’s read.metharray.exp() function. One sample that was flagged during QC was removed, and further normalization steps were carried out on the remaining set of 63 samples. Prior to all normalizations with minfi, probes that did not pass QC were removed. Noob, SWAN, Quantile, Funnorm and Illumina normalizations were implemented using minfi. BMIQ normalization was implemented with ChAMP (version 2.26.0), using as input the raw data produced by minfi’s preprocessRaw() function. In the combination of Noob with BMIQ (Noob+BMIQ), BMIQ normalization was carried out using minfi’s Noob-normalized data as input. Noob normalization was also implemented with SeSAMe, using a nonlinear dye bias correction. For SeSAMe normalization, two scenarios were tested. For both, the inputs were unmasked SigDF Sets converted from minfi’s RG Channel Sets. In the first, which we call “SeSAMe 1”, SeSAMe’s pOOBAH masking was not executed, and the only probes filtered out of the dataset prior to normalization were the ones that did not pass QC in the previous analyses. In the second scenario, which we call “SeSAMe 2”, pOOBAH masking was carried out on the unfiltered dataset, and masked probes were removed. This removal was followed by further removal of probes that did not pass previous QC and had not been removed by pOOBAH; SeSAMe 2 therefore has two rounds of probe removal. Noob normalization with nonlinear dye bias correction was then carried out on the filtered dataset. Methods were then compared by subsetting the 16 replicated samples and evaluating the effect that the different normalization methods had on the absolute difference of beta values (|β|) between replicated samples.
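
    A minimal sketch of the replicate-based comparison metric mentioned above (mean absolute beta-value difference for a replicate pair), assuming the beta values are available as NumPy arrays; the toy data below only stand in for real array output:

        import numpy as np

        def mean_abs_beta_diff(rep_a, rep_b):
            """Mean absolute beta-value difference for one replicate pair."""
            return np.nanmean(np.abs(rep_a - rep_b))

        rng = np.random.default_rng(1)
        rep1 = rng.beta(0.5, 0.5, size=850_000)                         # toy "replicate 1"
        rep2 = np.clip(rep1 + rng.normal(0.0, 0.02, rep1.size), 0, 1)   # toy "replicate 2"
        print(mean_abs_beta_diff(rep1, rep2))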

  4. Example of normalizing the word ‘foooooooooood’ and ‘welllllllllllll’ using...

    • plos.figshare.com
    xls
    Updated Mar 21, 2024
    Cite
    Zainab Mansur; Nazlia Omar; Sabrina Tiun; Eissa M. Alshari (2024). Example of normalizing the word ‘foooooooooood’ and ‘welllllllllllll’ using the proposed method and four other normalization methods. [Dataset]. http://doi.org/10.1371/journal.pone.0299652.t003
    Available download formats: xls
    Dataset updated
    Mar 21, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Zainab Mansur; Nazlia Omar; Sabrina Tiun; Eissa M. Alshari
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example of normalizing the word ‘foooooooooood’ and ‘welllllllllllll’ using the proposed method and four other normalization methods.
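
    For context, a common rule-based baseline for this task (not the proposed method evaluated in the table) collapses runs of three or more repeated characters and keeps the candidate found in a lexicon:

        import re

        def collapse_elongation(word, lexicon):
            two = re.sub(r"(.)\1{2,}", r"\1\1", word)  # "foooooooooood" -> "food"
            one = re.sub(r"(.)\1{2,}", r"\1", word)    # "foooooooooood" -> "fod"
            for candidate in (two, one):
                if candidate in lexicon:
                    return candidate
            return one

        lexicon = {"food", "well"}
        print(collapse_elongation("foooooooooood", lexicon))    # food
        print(collapse_elongation("welllllllllllll", lexicon))  # well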

  5. Methods for normalizing microbiome data: an ecological perspective

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Oct 30, 2018
    Cite
    Donald T. McKnight; Roger Huerlimann; Deborah S. Bower; Lin Schwarzkopf; Ross A. Alford; Kyall R. Zenger (2018). Methods for normalizing microbiome data: an ecological perspective [Dataset]. http://doi.org/10.5061/dryad.tn8qs35
    Available download formats: zip
    Dataset updated
    Oct 30, 2018
    Dataset provided by
    University of New England
    James Cook University
    Authors
    Donald T. McKnight; Roger Huerlimann; Deborah S. Bower; Lin Schwarzkopf; Ross A. Alford; Kyall R. Zenger
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    1. Microbiome sequencing data often need to be normalized due to differences in read depths, and recommendations for microbiome analyses generally warn against using proportions or rarefying to normalize data, instead advocating alternatives such as upper quartile, CSS, edgeR-TMM, or DESeq-VS. Those recommendations are, however, based on studies that focused on differential abundance testing and variance standardization rather than community-level comparisons (i.e., beta diversity). Also, standardizing the within-sample variance across samples may suppress differences in species evenness, potentially distorting community-level patterns. Furthermore, the recommended methods use log transformations, which we expect to exaggerate the importance of differences among rare OTUs while suppressing the importance of differences among common OTUs.
    2. We tested these theoretical predictions via simulations and a real-world data set.
    3. Proportions and rarefying produced more accurate comparisons among communities and were the only methods that fully normalized read depths across samples. Additionally, upper quartile, CSS, edgeR-TMM, and DESeq-VS often masked differences among communities when common OTUs differed, and they produced false positives when rare OTUs differed.
    4. Based on our simulations, normalizing via proportions may be superior to other commonly used methods for comparing ecological communities.
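
    A minimal sketch of the two normalizations favoured above for community-level comparisons, applied to a toy OTU count table (converting counts to proportions, and rarefying by subsampling without replacement to a common read depth):

        import numpy as np

        def to_proportions(counts):
            """counts: (n_samples, n_otus) integer array of reads."""
            return counts / counts.sum(axis=1, keepdims=True)

        def rarefy(counts, depth, seed=0):
            rng = np.random.default_rng(seed)
            out = np.zeros_like(counts)
            for i, row in enumerate(counts):
                reads = np.repeat(np.arange(row.size), row)            # one entry per read
                picked = rng.choice(reads, size=depth, replace=False)
                out[i] = np.bincount(picked, minlength=row.size)
            return out

        counts = np.array([[500, 300, 200], [50, 30, 20]])
        print(to_proportions(counts))
        print(rarefy(counts, depth=90))
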
  6. DataSheet1_TimeNorm: a novel normalization method for time course microbiome...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Sep 24, 2024
    Cite
    An, Lingling; Lu, Meng; Butt, Hamza; Luo, Qianwen; Du, Ruofei; Lytal, Nicholas; Jiang, Hongmei (2024). DataSheet1_TimeNorm: a novel normalization method for time course microbiome data.pdf [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001407445
    Dataset updated
    Sep 24, 2024
    Authors
    An, Lingling; Lu, Meng; Butt, Hamza; Luo, Qianwen; Du, Ruofei; Lytal, Nicholas; Jiang, Hongmei
    Description

    Metagenomic time-course studies provide valuable insights into the dynamics of microbial systems and have become increasingly popular alongside the reduction in costs of next-generation sequencing technologies. Normalization is a common but critical preprocessing step before proceeding with downstream analysis. To the best of our knowledge, currently there is no reported method to appropriately normalize microbial time-series data. We propose TimeNorm, a novel normalization method that considers the compositional property and time dependency in time-course microbiome data. It is the first method designed for normalizing time-series data within the same time point (intra-time normalization) and across time points (bridge normalization), separately. Intra-time normalization normalizes microbial samples under the same condition based on common dominant features. Bridge normalization detects and utilizes a group of most stable features across two adjacent time points for normalization. Through comprehensive simulation studies and application to a real study, we demonstrate that TimeNorm outperforms existing normalization methods and boosts the power of downstream differential abundance analysis.

  7. Example of normalizing the word ‘aaaaaaannnnnndddd’ using the proposed...

    • plos.figshare.com
    xls
    Updated Mar 21, 2024
    Cite
    Zainab Mansur; Nazlia Omar; Sabrina Tiun; Eissa M. Alshari (2024). Example of normalizing the word ‘aaaaaaannnnnndddd’ using the proposed method and four other normalization methods. [Dataset]. http://doi.org/10.1371/journal.pone.0299652.t004
    Available download formats: xls
    Dataset updated
    Mar 21, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Zainab Mansur; Nazlia Omar; Sabrina Tiun; Eissa M. Alshari
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example of normalizing the word ‘aaaaaaannnnnndddd’ using the proposed method and four other normalization methods.

  8. Security Data Normalization Platform Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 22, 2025
    Cite
    Growth Market Reports (2025). Security Data Normalization Platform Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/security-data-normalization-platform-market
    Available download formats: pdf, pptx, csv
    Dataset updated
    Aug 22, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Security Data Normalization Platform Market Outlook



    According to our latest research, the global Security Data Normalization Platform market size reached USD 1.87 billion in 2024, driven by the rapid escalation of cyber threats and the growing complexity of enterprise security infrastructures. The market is expected to grow at a robust CAGR of 12.5% during the forecast period, reaching an estimated USD 5.42 billion by 2033. Growth is primarily fueled by the increasing adoption of advanced threat intelligence solutions, regulatory compliance demands, and the proliferation of connected devices across various industries.
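
    A quick arithmetic check of the headline figures (a 2024 base of USD 1.87 billion compounded at 12.5% per year for the nine years to 2033):

        base_usd_bn, cagr, years = 1.87, 0.125, 9
        print(round(base_usd_bn * (1 + cagr) ** years, 2))  # ~5.40, in line with the stated USD 5.42 billion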




    The primary growth factor for the Security Data Normalization Platform market is the exponential rise in cyberattacks and security breaches across all sectors. Organizations are increasingly realizing the importance of normalizing diverse security data sources to enable efficient threat detection, incident response, and compliance management. As security environments become more complex with the integration of cloud, IoT, and hybrid infrastructures, the need for platforms that can aggregate, standardize, and correlate data from disparate sources has become paramount. This trend is particularly pronounced in sectors such as BFSI, healthcare, and government, where data sensitivity and regulatory requirements are highest. The growing sophistication of cyber threats has compelled organizations to invest in robust security data normalization platforms to ensure comprehensive visibility and proactive risk mitigation.




    Another significant driver is the evolving regulatory landscape, which mandates stringent data protection and reporting standards. Regulations such as the General Data Protection Regulation (GDPR), Health Insurance Portability and Accountability Act (HIPAA), and various national cybersecurity frameworks have compelled organizations to enhance their security postures. Security data normalization platforms play a crucial role in facilitating compliance by providing unified and actionable insights from heterogeneous data sources. These platforms enable organizations to automate compliance reporting, streamline audit processes, and reduce the risk of penalties associated with non-compliance. The increasing focus on regulatory alignment is pushing both large enterprises and SMEs to adopt advanced normalization solutions as part of their broader security strategies.




    The proliferation of digital transformation initiatives and the accelerated adoption of cloud-based solutions are further propelling market growth. As organizations migrate critical workloads to the cloud and embrace remote work models, the volume and variety of security data have surged dramatically. This shift has created new challenges in terms of data integration, normalization, and real-time analysis. Security data normalization platforms equipped with advanced analytics and machine learning capabilities are becoming indispensable for managing the scale and complexity of modern security environments. Vendors are responding to this demand by offering scalable, cloud-native solutions that can seamlessly integrate with existing security information and event management (SIEM) systems, threat intelligence platforms, and incident response tools.




    From a regional perspective, North America continues to dominate the Security Data Normalization Platform market, accounting for the largest revenue share in 2024. The region’s leadership is attributed to the high concentration of technology-driven enterprises, robust cybersecurity regulations, and significant investments in advanced security infrastructure. Europe and Asia Pacific are also witnessing strong growth, driven by increasing digitalization, rising threat landscapes, and the adoption of stringent data protection laws. Emerging markets in Latin America and the Middle East & Africa are gradually catching up, supported by growing awareness of cybersecurity challenges and the need for standardized security data management solutions.





    Component Analysis



  9. Data from: Evaluation of normalization procedures for oligonucleotide array...

    • catalog.data.gov
    • odgavaprod.ogopendata.com
    • +1more
    Updated Sep 6, 2025
    Cite
    National Institutes of Health (2025). Evaluation of normalization procedures for oligonucleotide array data based on spiked cRNA controls [Dataset]. https://catalog.data.gov/dataset/evaluation-of-normalization-procedures-for-oligonucleotide-array-data-based-on-spiked-crna
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    National Institutes of Health
    Description

    Background Affymetrix oligonucleotide arrays simultaneously measure the abundances of thousands of mRNAs in biological samples. Comparability of array results is necessary for the creation of large-scale gene expression databases. The standard strategy for normalizing oligonucleotide array readouts has practical drawbacks. We describe alternative normalization procedures for oligonucleotide arrays based on a common pool of known biotin-labeled cRNAs spiked into each hybridization.
    Results We first explore the conditions for validity of the 'constant mean assumption', the key assumption underlying current normalization methods. We introduce 'frequency normalization', a 'spike-in'-based normalization method which estimates array sensitivity, reduces background noise and allows comparison between array designs. This approach does not rely on the constant mean assumption and so can be effective in conditions where standard procedures fail. We also define 'scaled frequency', a hybrid normalization method relying on both spiked transcripts and the constant mean assumption while maintaining all other advantages of frequency normalization. We compare these two procedures to a standard global normalization method using experimental data. We also use simulated data to estimate accuracy and investigate the effects of noise. We find that scaled frequency is as reproducible and accurate as global normalization while offering several practical advantages.
    Conclusions Scaled frequency quantitation is a convenient, reproducible technique that performs as well as global normalization on serial experiments with the same array design, while offering several additional features. Specifically, the scaled-frequency method enables the comparison of expression measurements across different array designs, yields estimates of absolute message abundance in cRNA and determines the sensitivity of individual arrays.
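
    An illustrative sketch in the spirit of spike-in based scaling (not the paper's exact 'frequency normalization' algorithm): estimate the array's sensitivity from the known spiked cRNA abundances by a least-squares fit through the origin, then convert raw intensities into frequency estimates:

        import numpy as np

        def spike_in_scale(raw_intensities, spike_intensities, spike_known_abundance):
            # sensitivity = intensity units per unit of transcript abundance
            sensitivity = np.sum(spike_intensities * spike_known_abundance) / np.sum(spike_known_abundance ** 2)
            return raw_intensities / sensitivity

        spikes_measured = np.array([120.0, 590.0, 1150.0])   # toy spike-in intensities
        spikes_known = np.array([1.0, 5.0, 10.0])            # known relative abundances
        genes = np.array([80.0, 300.0, 2400.0])              # toy target-gene intensities
        print(spike_in_scale(genes, spikes_measured, spikes_known))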

  10. Data from: Bayesian Inference in the Presence of Intractable Normalizing...

    • datasetcatalog.nlm.nih.gov
    • tandf.figshare.com
    Updated Mar 14, 2018
    Cite
    Park, Jaewoo; Haran, Murali (2018). Bayesian Inference in the Presence of Intractable Normalizing Functions [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000677988
    Dataset updated
    Mar 14, 2018
    Authors
    Park, Jaewoo; Haran, Murali
    Description

    Models with intractable normalizing functions arise frequently in statistics. Common examples of such models include exponential random graph models for social networks and Markov point processes for ecology and disease modeling. Inference for these models is complicated because the normalizing functions of their probability distributions include the parameters of interest. In Bayesian analysis, they result in so-called doubly intractable posterior distributions which pose significant computational challenges. Several Monte Carlo methods have emerged in recent years to address Bayesian inference for such models. We provide a framework for understanding the algorithms, and elucidate connections among them. Through multiple simulated and real data examples, we compare and contrast the computational and statistical efficiency of these algorithms and discuss their theoretical bases. Our study provides practical recommendations for practitioners along with directions for future research for Markov chain Monte Carlo (MCMC) methodologists. Supplementary materials for this article are available online.
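
    For reference, the standard form of such a "doubly intractable" posterior (written in the inline math notation already used elsewhere on this page): with prior $$\pi(\theta)$$ and unnormalized likelihood $$q(x \mid \theta)$$,

    $$p(\theta \mid x) = \frac{\pi(\theta)\, q(x \mid \theta)/Z(\theta)}{\int \pi(\theta')\, q(x \mid \theta')/Z(\theta')\, d\theta'}, \qquad Z(\theta) = \int q(y \mid \theta)\, dy.$$

    Both the evidence integral in the denominator and the normalizing function $$Z(\theta)$$ are intractable, which is why standard Metropolis–Hastings acceptance ratios cannot be evaluated directly.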

  11. Data from: Electromagnetic Calorimeter Shower Images of CaloFlow

    • data.niaid.nih.gov
    • producciocientifica.uv.es
    Updated Jul 22, 2023
    Cite
    Krause, Claudius; Shih, David (2023). Electromagnetic Calorimeter Shower Images of CaloFlow [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5904187
    Dataset updated
    Jul 22, 2023
    Dataset provided by
    Rutgers University
    Authors
    Krause, Claudius; Shih, David
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These are the calorimeter showers that were used to train and evaluate the normalizing flows of "CaloFlow: Fast and Accurate Generation of Calorimeter Showers with Normalizing Flows" and "CaloFlow II: Even Faster and Still Accurate Generation of Calorimeter Showers with Normalizing Flows". The training and evaluation scripts can be found in this git repository.

    The samples were created with the same GEANT4 configuration file as the original CaloGAN samples. Said configuration can be found at the CaloGAN repository; the original CaloGAN samples are available at this DOI.

    Samples for each particle (eplus, gamma, piplus) are stored in a separate .tar.gz file. Each tarball contains the following files:

    train_particle.hdf5: 70,000 events used to train CaloFlow I and II.

    test_particle.hdf5: 30,000 events used for model selection of CaloFlow I and II.

    train_cls_particle.hdf5: 60,000 events used to train the evaluation classifier.

    val_cls_particle.hdf5: 20,000 events used for model selection and calibration of the evaluation classifier.

    test_cls_particle.hdf5: 20,000 events used for the evaluation run of the evaluation classifier.

    Each .hdf5 file has the same structure as the original CaloGAN data.
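
    A minimal sketch for inspecting one of the tarballs after download (the local file and folder names are assumptions, and the dataset keys inside the HDF5 files are not documented here, so the sketch only lists them):

        import tarfile
        import h5py

        with tarfile.open("eplus.tar.gz") as tar:   # hypothetical local file name
            tar.extractall("caloflow_eplus")

        with h5py.File("caloflow_eplus/train_eplus.hdf5", "r") as f:   # assumed extracted path
            f.visit(print)   # print every group/dataset name in the file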

  12. A comparison of per sample global scaling and per gene normalization methods...

    • plos.figshare.com
    pdf
    Updated Jun 5, 2023
    Cite
    Xiaohong Li; Guy N. Brock; Eric C. Rouchka; Nigel G. F. Cooper; Dongfeng Wu; Timothy E. O’Toole; Ryan S. Gill; Abdallah M. Eteleeb; Liz O’Brien; Shesh N. Rai (2023). A comparison of per sample global scaling and per gene normalization methods for differential expression analysis of RNA-seq data [Dataset]. http://doi.org/10.1371/journal.pone.0176185
    Available download formats: pdf
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Xiaohong Li; Guy N. Brock; Eric C. Rouchka; Nigel G. F. Cooper; Dongfeng Wu; Timothy E. O’Toole; Ryan S. Gill; Abdallah M. Eteleeb; Liz O’Brien; Shesh N. Rai
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Normalization is an essential step with considerable impact on high-throughput RNA sequencing (RNA-seq) data analysis. Although there are numerous methods for read count normalization, it remains a challenge to choose an optimal method due to multiple factors contributing to read count variability that affects the overall sensitivity and specificity. In order to properly determine the most appropriate normalization methods, it is critical to compare the performance and shortcomings of a representative set of normalization routines based on different dataset characteristics. Therefore, we set out to evaluate the performance of the commonly used methods (DESeq, TMM-edgeR, FPKM-CuffDiff, TC, Med, UQ and FQ) and two new methods we propose: Med-pgQ2 and UQ-pgQ2 (per-gene normalization after per-sample median or upper-quartile global scaling). Our per-gene normalization approach allows for comparisons between conditions based on similar count levels. Using the benchmark Microarray Quality Control Project (MAQC) and simulated datasets, we performed differential gene expression analysis to evaluate these methods. When evaluating MAQC2 with two replicates, we observed that Med-pgQ2 and UQ-pgQ2 achieved a slightly higher area under the Receiver Operating Characteristic curve (AUC), a specificity rate > 85%, a detection power > 92% and an actual false discovery rate (FDR) under 0.06 given the nominal FDR (≤0.05). Although the top commonly used methods (DESeq and TMM-edgeR) yield a higher power (>93%) for MAQC2 data, they trade off with a reduced specificity (

  13. EV Charging Data Normalization Middleware Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Oct 7, 2025
    Cite
    Growth Market Reports (2025). EV Charging Data Normalization Middleware Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/ev-charging-data-normalization-middleware-market
    Available download formats: pptx, csv, pdf
    Dataset updated
    Oct 7, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    EV Charging Data Normalization Middleware Market Outlook



    According to our latest research, the global EV Charging Data Normalization Middleware market size reached USD 1.12 billion in 2024, reflecting a strong surge in adoption across the electric vehicle ecosystem. The market is projected to expand at a robust CAGR of 18.7% from 2025 to 2033, reaching a forecasted size of USD 5.88 billion by 2033. This remarkable growth is primarily driven by the exponential increase in electric vehicle (EV) adoption, the proliferation of charging infrastructure, and the need for seamless interoperability and data integration across disparate charging networks and platforms.




    One of the primary growth factors fueling the EV Charging Data Normalization Middleware market is the rapid expansion of EV charging networks, both public and private, on a global scale. As governments and private entities accelerate investments in EV infrastructure to meet ambitious decarbonization and electrification goals, the resulting diversity of hardware, software, and communication protocols creates a fragmented ecosystem. Middleware solutions play a crucial role in standardizing and normalizing data from these heterogeneous sources, enabling unified management, real-time analytics, and efficient billing processes. The demand for robust data normalization is further amplified by the increasing complexity of charging scenarios, such as dynamic pricing, vehicle-to-grid (V2G) integration, and multi-operator roaming, all of which require seamless data interoperability.




    Another significant driver is the rising emphasis on data-driven decision-making and predictive analytics within the EV charging sector. Stakeholders, including automotive OEMs, charging network operators, and energy providers, are leveraging normalized data to optimize charging station utilization, forecast energy demand, and enhance customer experiences. With the proliferation of IoT-enabled charging stations and smart grid initiatives, the volume and variety of data generated have grown exponentially. Middleware platforms equipped with advanced data normalization capabilities are essential for aggregating, cleansing, and harmonizing this data, thereby unlocking actionable insights and supporting the development of innovative value-added services. This trend is expected to further intensify as the industry moves towards integrated energy management and smart city initiatives.




    The regulatory landscape is also playing a pivotal role in shaping the EV Charging Data Normalization Middleware market. Governments across regions are introducing mandates for open data standards, interoperability, and secure data exchange to foster competition, enhance consumer choice, and ensure grid stability. These regulatory requirements are compelling market participants to adopt middleware solutions that facilitate compliance and enable seamless integration with national and regional charging infrastructure registries. Furthermore, the emergence of industry consortia and standardization bodies is accelerating the development and adoption of common data models and APIs, further boosting the demand for middleware platforms that can adapt to evolving standards and regulatory frameworks.




    Regionally, Europe and North America are at the forefront of market adoption, driven by mature EV markets, supportive policy frameworks, and advanced digital infrastructure. However, Asia Pacific is emerging as the fastest-growing region, propelled by aggressive electrification targets, large-scale urbanization, and significant investments in smart mobility solutions. Latin America and the Middle East & Africa, while currently at a nascent stage, are expected to witness accelerated growth as governments and private players ramp up efforts to expand EV charging networks and embrace digital transformation. The interplay of these regional dynamics is shaping a highly competitive and innovation-driven global market landscape.





    Component Analysis



    The Component segment of the EV C

  14. AdventureWorks 2022 Denormalized

    • kaggle.com
    Updated Nov 25, 2024
    Cite
    Bhavesh J (2024). AdventureWorks 2022 Denormalized [Dataset]. https://www.kaggle.com/datasets/bjaising/adventureworks-2022-denormalized
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Nov 25, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Bhavesh J
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Adventure Works 2022 Denormalized dataset

    How was this dataset created?

    The CSV data was sourced from the existing Kaggle dataset titled "Adventure Works 2022" by Algorismus. This data was normalized and consisted of seven individual CSV files. The Sales table served as a fact table that connected to other dimensions. To consolidate all the data into a single table, it was loaded into a SQLite database and transformed accordingly. The final denormalized table was then exported as a single CSV file (delimited by | ), and the column names were updated to follow snake_case style.
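
    A minimal sketch for loading the exported file (the file name is an assumption; the '|' delimiter and snake_case column names are as described above):

        import pandas as pd

        df = pd.read_csv("adventureworks_2022_denormalized.csv", sep="|")  # hypothetical file name
        print(df[["sales_order_number", "product_name", "total_sales"]].head())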

    DOI

    doi.org/10.6084/m9.figshare.27899706

    Data Dictionary

    Column Name: Description
    sales_order_number: Unique identifier for each sales order.
    sales_order_date: The date and time when the sales order was placed (e.g., Friday, August 25, 2017).
    sales_order_date_day_of_week: The day of the week when the sales order was placed (e.g., Monday, Tuesday).
    sales_order_date_month: The month when the sales order was placed (e.g., January, February).
    sales_order_date_day: The day of the month when the sales order was placed (1-31).
    sales_order_date_year: The year when the sales order was placed (e.g., 2022).
    quantity: The number of units sold in the sales order.
    unit_price: The price per unit of the product sold.
    total_sales: The total sales amount for the sales order (quantity * unit price).
    cost: The total cost associated with the products sold in the sales order.
    product_key: Unique identifier for the product sold.
    product_name: The name of the product sold.
    reseller_key: Unique identifier for the reseller.
    reseller_name: The name of the reseller.
    reseller_business_type: The type of business of the reseller (e.g., Warehouse, Value Reseller, Specialty Bike Shop).
    reseller_city: The city where the reseller is located.
    reseller_state: The state where the reseller is located.
    reseller_country: The country where the reseller is located.
    employee_key: Unique identifier for the employee associated with the sales order.
    employee_id: The ID of the employee who processed the sales order.
    salesperson_fullname: The full name of the salesperson associated with the sales order.
    salesperson_title: The title of the salesperson (e.g., North American Sales Manager, Sales Representative).
    email_address: The email address of the salesperson.
    sales_territory_key: Unique identifier for the sales territory of the actual sale (e.g., 3).
    assigned_sales_territory: Comma-separated list of sales_territory_key values assigned to the salesperson (e.g., 3,4).
    sales_territory_region: The region of the sales territory. US territory is broken down into regions; international regions are listed by country name (e.g., Northeast, France).
    sales_territory_country: The country associated with the sales territory.
    sales_territory_group: The group classification of the sales territory (e.g., Europe, North America, Pacific).
    target: The ...
  15. Data from: Dataset of normalised Slovene text KonvNormSl 1.0

    • live.european-language-grid.eu
    binary format
    Updated Sep 18, 2016
    Cite
    (2016). Dataset of normalised Slovene text KonvNormSl 1.0 [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/8217
    Available download formats: binary format
    Dataset updated
    Sep 18, 2016
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Data used in the experiments described in:

    Nikola Ljubešić, Katja Zupan, Darja Fišer and Tomaž Erjavec: Normalising Slovene data: historical texts vs. user-generated content. Proceedings of KONVENS 2016, September 19–21, 2016, Bochum, Germany.

    https://www.linguistics.rub.de/konvens16/pub/19_konvensproc.pdf

    (https://www.linguistics.rub.de/konvens16/)

    Data are split into the "token" folder (experiment on normalising individual tokens) and "segment" folder (experiment on normalising whole segments of text, i.e. sentences or tweets). Each experiment folder contains the "train", "dev" and "test" subfolders. Each subfolder contains two files for each sample, the original data (.orig.txt) and the data with hand-normalised words (.norm.txt). The files are aligned by lines.

    There are four datasets:

    - goo300k-bohoric: historical Slovene, hard case (<1850)

    - goo300k-gaj: historical Slovene, easy case (1850 - 1900)

    - tweet-L3: Slovene tweets, hard case (non-standard language)

    - tweet-L1: Slovene tweets, easy case (mostly standard language)

    The goo300k data come from http://hdl.handle.net/11356/1025, while the tweet data originate from the JANES project (http://nl.ijs.si/janes/english/).

    The text in the files has been split by inserting spaces between characters, with underscore (_) substituting the space character. Tokens not relevant for normalisation (e.g. URLs, hashtags) have been substituted by the inverted question mark '¿' character.
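
    A minimal sketch for reading one aligned pair of files and undoing the character-level splitting described above (the specific sample file names here are assumptions; only the folder layout is documented):

        def detokenise(line):
            # remove the spaces inserted between characters, then restore real spaces
            return line.strip().replace(" ", "").replace("_", " ")

        def read_aligned(orig_path, norm_path):
            with open(orig_path, encoding="utf-8") as f_orig, open(norm_path, encoding="utf-8") as f_norm:
                for orig_line, norm_line in zip(f_orig, f_norm):
                    yield detokenise(orig_line), detokenise(norm_line)

        for orig, norm in read_aligned("segment/train/tweet-L3.orig.txt",
                                       "segment/train/tweet-L3.norm.txt"):
            print(orig, "->", norm)
            break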

  16. (Supplementary Table S4) Bacteriohopanepolyol data (normalised to dry...

    • doi.pangaea.de
    • search.dataone.org
    html, tsv
    Updated Jun 28, 2016
    Cite
    Enno Schefuß; Timothy Ian Eglinton; Ricardo De Pol-Holz; Helen M Talbot; Pieter Meiert Grootes; Ralph R Schneider; Charlotte L Spencer-Jones; Jürgen Rullkötter (2016). (Supplementary Table S4) Bacteriohopanepolyol data (normalised to dry sediment) and TOC of additional samples [Dataset]. http://doi.org/10.1594/PANGAEA.862044
    Available download formats: tsv, html
    Dataset updated
    Jun 28, 2016
    Dataset provided by
    PANGAEA
    Authors
    Enno Schefuß; Timothy Ian Eglinton; Ricardo De Pol-Holz; Helen M Talbot; Pieter Meiert Grootes; Ralph R Schneider; Charlotte L Spencer-Jones; Jürgen Rullkötter
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Time period covered
    Jun 20, 2000
    Variables measured
    Depth, top/min, Depth, bottom/max, DEPTH, sediment/rock, Carbon, organic, total, Aminotriol, per unit mass total organic carbon, Aminopentol, per unit mass total organic carbon, Aminotetrol, per unit mass total organic carbon
    Description

    This dataset is about: (Supplementary Table S4) Bacteriohopanepolyol data (normalised to dry sediment) and TOC of additional samples. Please consult parent dataset @ https://doi.org/10.1594/PANGAEA.862021 for more information.

  17. Normalization for Relative Quantification of mRNA and microRNA in Soybean...

    • plos.figshare.com
    tiff
    Updated May 30, 2023
    Cite
    Weican Liu; Yu Deng; Yonggang Zhou; Huan Chen; Yuanyuan Dong; Nan Wang; Xiaowei Li; Aysha Jameel; He Yang; Min Zhang; Kai Chen; Fawei Wang; Haiyan Li (2023). Normalization for Relative Quantification of mRNA and microRNA in Soybean Exposed to Various Abiotic Stresses [Dataset]. http://doi.org/10.1371/journal.pone.0155606
    Available download formats: tiff
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Weican Liu; Yu Deng; Yonggang Zhou; Huan Chen; Yuanyuan Dong; Nan Wang; Xiaowei Li; Aysha Jameel; He Yang; Min Zhang; Kai Chen; Fawei Wang; Haiyan Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Plant microRNAs are small non-coding, endogenous RNA molecules (20–24 nucleotides) produced from miRNA precursors (pri-miRNA and pre-miRNA). Evidence suggests that up- and down-regulation of miRNAs targets the mRNA genes involved in resistance against biotic and abiotic stresses. Reverse transcription quantitative real-time polymerase chain reaction (RT-qPCR) is a powerful technique to analyze variations in mRNA levels. Normalizing the data using reference genes is essential for reliable RT-qPCR analysis. In this study, two groups of candidate reference mRNAs and miRNAs in soybean leaves and roots treated with various abiotic stresses (PEG-simulated drought, salinity, alkalinity, salinity+alkalinity, and abscisic acid) were analyzed by RT-qPCR. We analyzed the most appropriate reference mRNA/miRNAs using the geNorm, NormFinder, and BestKeeper algorithms. According to the results, Act and EF1b were the most suitable reference mRNAs in leaf and root samples for mRNA and miRNA precursor data normalization. The most suitable reference miRNAs found in leaf and root samples were 166a and 167a for mature miRNA data normalization. Hence, the best combinations of reference mRNAs for mRNA and miRNA precursor data normalization were EF1a + Act or EF1b + Act in leaf samples, and EF1a + EF1b or 60s + EF1b in root samples. For mature miRNA data normalization, the most suitable combinations of reference miRNAs were 166a + 167d in leaf samples, and 171a + 156a or 167a + 171a in root samples. We identified potential reference mRNA/miRNAs for accurate RT-qPCR data normalization of mature miRNAs, miRNA precursors, and their targeted mRNAs. Our results promote miRNA-based studies on soybean plants exposed to abiotic stress conditions.

  18. DEMANDE Dataset

    • zenodo.org
    • researchdiscovery.drexel.edu
    zip
    Updated Apr 13, 2023
    Cite
    Joseph A. Gallego-Mejia; Joseph A. Gallego-Mejia; Fabio A Gonzalez; Fabio A Gonzalez (2023). DEMANDE Dataset [Dataset]. http://doi.org/10.5281/zenodo.7822851
    Available download formats: zip
    Dataset updated
    Apr 13, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Joseph A. Gallego-Mejia; Joseph A. Gallego-Mejia; Fabio A Gonzalez; Fabio A Gonzalez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the features and probabilities of ten different functions. Each dataset is saved using numpy arrays.

    - The data set "Arc" corresponds to a two-dimensional random sample drawn from a random vector $$X=(X_1,X_2)$$ with probability density function $$f(x_1,x_2)=\mathcal{N}(x_2|0,4)\,\mathcal{N}(x_1|0.25x_2^2,1)$$, where $$\mathcal{N}(u|\mu,\sigma^2)$$ denotes the density function of a normal distribution with mean $$\mu$$ and variance $$\sigma^2$$. Papamakarios (2017) used this data set to evaluate his neural density estimation methods.

    - The data set "Potential 1" corresponds to a two-dimensional random sample drawn from a random vector $$X=(X_1,X_2)$$ with probability density function $$f(x_1,x_2)=\frac{1}{2}\left(\frac{\lVert x\rVert-2}{0.4}\right)^2-\ln\left(\exp\left\{-\frac{1}{2}\left[\frac{x_1-2}{0.6}\right]^2\right\}+\exp\left\{-\frac{1}{2}\left[\frac{x_1+2}{0.6}\right]^2\right\}\right)$$, with a normalizing constant of approximately 6.52 calculated by Monte Carlo integration.

    - The data set "Potential 2" corresponds to a two-dimensional random sample drawn from a random vector $$X=(X_1,X_2)$$ with probability density function $$f(x_1,x_2)=\frac{1}{2}\left[\frac{x_2-w_1(x)}{0.4}\right]^2$$, where $$w_1(x)=\sin(2\pi x_1/4)$$, with a normalizing constant of approximately 8 calculated by Monte Carlo integration.

    - The data set "Potential 3" corresponds to a two-dimensional random sample drawn from a random vector $$X=(X_1,X_2)$$ with probability density function $$f(x_1,x_2)=-\ln\left(\exp\left\{-\frac{1}{2}\left[\frac{x_2-w_1(x)}{0.35}\right]^2\right\}+\exp\left\{-\frac{1}{2}\left[\frac{x_2-w_1(x)+w_2(x)}{0.35}\right]^2\right\}\right)$$, where $$w_1(x)=\sin(2\pi x_1/4)$$ and $$w_2(x)=3\exp\left\{-\frac{1}{2}\left[\frac{x_1-1}{0.6}\right]^2\right\}$$, with a normalizing constant of approximately 13.9 calculated by Monte Carlo integration.

    - The data set "Potential 4" corresponds to a two-dimensional random sample drawn from a random vector $$X=(X_1,X_2)$$ with probability density function $$f(x_1,x_2)=-\ln\left(\exp\left\{-\frac{1}{2}\left[\frac{x_2-w_1(x)}{0.4}\right]^2\right\}+\exp\left\{-\frac{1}{2}\left[\frac{x_2-w_1(x)+w_3(x)}{0.35}\right]^2\right\}\right)$$, where $$w_1(x)=\sin(2\pi x_1/4)$$, $$w_3(x)=3\,\sigma\!\left(\left[\frac{x_1-1}{0.3}\right]^2\right)$$ and $$\sigma(x)=\frac{1}{1+\exp(x)}$$, with a normalizing constant of approximately 13.9 calculated by Monte Carlo integration.

    - The data set "2D mixture" corresponds to a two-dimensional random sample drawn from the random vector $$X=(X_1,X_2)$$ with probability density function $$f(x)=\frac{1}{2}\mathcal{N}(x|\mu_1,\Sigma_1)+\frac{1}{2}\mathcal{N}(x|\mu_2,\Sigma_2)$$, with means $$\mu_1=[1,-1]^T$$, $$\mu_2=[-2,2]^T$$ and covariance matrices $$\Sigma_1=\begin{bmatrix}1 & 0\\ 0 & 2\end{bmatrix}$$ and $$\Sigma_2=\begin{bmatrix}2 & 0\\ 0 & 1\end{bmatrix}$$.

    - The data set "10D mixture" corresponds to a 10-dimensional random sample drawn from the random vector $$X=(X_1,\dots,X_{10})$$ with a mixture of four diagonal normal probability density functions $$\mathcal{N}(X_i|\mu_i,\sigma_i)$$, where each $$\mu_i$$ is drawn uniformly in the interval $$[-0.5,0.5]$$ and each $$\sigma_i$$ is drawn uniformly in the interval $$[-0.01,0.5]$$. Each diagonal normal density has the same probability, $$1/4$$, of being drawn.
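
    As a usage illustration, samples from the "2D mixture" density described above can be drawn directly from its definition (a minimal sketch; the stored numpy arrays themselves may be organised differently):

        import numpy as np

        rng = np.random.default_rng(0)
        mu = np.array([[1.0, -1.0], [-2.0, 2.0]])
        cov = np.array([[[1.0, 0.0], [0.0, 2.0]],
                        [[2.0, 0.0], [0.0, 1.0]]])

        def sample_2d_mixture(n):
            comp = rng.integers(0, 2, size=n)   # each component has probability 1/2
            return np.array([rng.multivariate_normal(mu[c], cov[c]) for c in comp])

        print(sample_2d_mixture(5))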

  19. Building Telemetry Normalization Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Oct 4, 2025
    Cite
    Growth Market Reports (2025). Building Telemetry Normalization Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/building-telemetry-normalization-market
    Available download formats: csv, pdf, pptx
    Dataset updated
    Oct 4, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Building Telemetry Normalization Market Outlook



    According to our latest research, the global Building Telemetry Normalization market size reached USD 2.59 billion in 2024, reflecting the growing adoption of intelligent building management solutions worldwide. The market is experiencing robust expansion with a recorded CAGR of 13.2% from 2025 through 2033, and is forecasted to reach an impressive USD 7.93 billion by 2033. This strong growth trajectory is driven by increasing demand for energy-efficient infrastructure, the proliferation of smart city initiatives, and the need for seamless integration of building systems to enhance operational efficiency and sustainability.



    One of the primary growth factors for the Building Telemetry Normalization market is the accelerating shift towards smart building ecosystems. As commercial, industrial, and residential structures become more interconnected, the volume and diversity of telemetry data generated by various building systems—such as HVAC, lighting, security, and energy management—have surged. Organizations are recognizing the value of normalizing this data to enable unified analytics, real-time monitoring, and automated decision-making. The need for interoperability among heterogeneous devices and platforms is compelling property owners and facility managers to invest in advanced telemetry normalization solutions, which streamline data collection, enhance system compatibility, and support predictive maintenance strategies.



    Another significant driver is the increasing emphasis on sustainability and regulatory compliance. Governments and industry bodies worldwide are introducing stringent mandates for energy efficiency, carbon emission reduction, and occupant safety in built environments. Building telemetry normalization plays a crucial role in helping stakeholders aggregate, standardize, and analyze data from disparate sources, thereby enabling them to monitor compliance, optimize resource consumption, and generate actionable insights for green building certifications. The trend towards net-zero energy buildings and the integration of renewable energy sources is further propelling the adoption of telemetry normalization platforms, as they facilitate seamless data exchange and holistic performance benchmarking.



    The rapid advancement of digital technologies, including IoT, edge computing, and artificial intelligence, is also transforming the landscape of the Building Telemetry Normalization market. Modern buildings are increasingly equipped with a multitude of connected sensors, controllers, and actuators, generating vast amounts of telemetry data. The normalization of this data is essential for unlocking its full potential, enabling advanced analytics, anomaly detection, and automated system optimization. The proliferation of cloud-based solutions and scalable architectures is making telemetry normalization more accessible and cost-effective, even for small and medium-sized enterprises. As a result, the market is witnessing heightened competition and innovation, with vendors focusing on user-friendly interfaces, robust security features, and seamless integration capabilities.



    From a regional perspective, North America currently leads the Building Telemetry Normalization market, driven by widespread adoption of smart building technologies, substantial investments in infrastructure modernization, and a strong focus on sustainability. Europe follows closely, benefiting from progressive energy efficiency regulations and a mature building automation ecosystem. The Asia Pacific region is emerging as the fastest-growing market, fueled by rapid urbanization, government-led smart city projects, and increasing awareness of the benefits of intelligent building management. Latin America and the Middle East & Africa are also witnessing steady growth, supported by ongoing infrastructure development and rising demand for efficient facility operations.





    Component Analysis



    The Component segment of the Building Telemetry Normalization market is categorized into software, hard

  20. Part 2 of real-time testing data for: "Identifying data sources and physical...

    • zenodo.org
    application/gzip
    Updated Aug 8, 2024
    Zenodo (2024). Part 2 of real-time testing data for: "Identifying data sources and physical strategies used by neural networks to predict TC rapid intensification" [Dataset]. http://doi.org/10.5281/zenodo.13272877
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Aug 8, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Each file in the dataset contains machine-learning-ready data for one unique tropical cyclone (TC) from the real-time testing dataset. "Machine-learning-ready" means that all data-processing methods described in the journal paper have already been applied. This includes cropping satellite images to make them TC-centered; rotating satellite images to align them with TC motion (TC motion is always towards the +x-direction, or in the direction of increasing column number); flipping satellite images in the southern hemisphere upside-down; and normalizing data via the two-step procedure.

    The file name gives you the unique identifier of the TC -- e.g., "learning_examples_2010AL01.nc.gz" contains data for storm 2010AL01, or the first North Atlantic storm of the 2010 season. Each file can be read with the method `example_io.read_file` in the ml4tc Python library (https://zenodo.org/doi/10.5281/zenodo.10268620). However, since `example_io.read_file` is a lightweight wrapper for `xarray.open_dataset`, you can equivalently just use `xarray.open_dataset`. Variables in the table are listed below (the same printout produced by `print(xarray_table)`):

    Dimensions: (
    satellite_valid_time_unix_sec: 289,
    satellite_grid_row: 380,
    satellite_grid_column: 540,
    satellite_predictor_name_gridded: 1,
    satellite_predictor_name_ungridded: 16,
    ships_valid_time_unix_sec: 19,
    ships_storm_object_index: 19,
    ships_forecast_hour: 23,
    ships_intensity_threshold_m_s01: 21,
    ships_lag_time_hours: 5,
    ships_predictor_name_lagged: 17,
    ships_predictor_name_forecast: 129)
    Coordinates:
    * satellite_grid_row (satellite_grid_row) int32 2kB ...
    * satellite_grid_column (satellite_grid_column) int32 2kB ...
    * satellite_valid_time_unix_sec (satellite_valid_time_unix_sec) int32 1kB ...
    * ships_lag_time_hours (ships_lag_time_hours) float64 40B ...
    * ships_intensity_threshold_m_s01 (ships_intensity_threshold_m_s01) float64 168B ...
    * ships_forecast_hour (ships_forecast_hour) int32 92B ...
    * satellite_predictor_name_gridded (satellite_predictor_name_gridded) object 8B ...
    * satellite_predictor_name_ungridded (satellite_predictor_name_ungridded) object 128B ...
    * ships_valid_time_unix_sec (ships_valid_time_unix_sec) int32 76B ...
    * ships_predictor_name_lagged (ships_predictor_name_lagged) object 136B ...
    * ships_predictor_name_forecast (ships_predictor_name_forecast) object 1kB ...
    Dimensions without coordinates: ships_storm_object_index
    Data variables:
    satellite_number (satellite_valid_time_unix_sec) int32 1kB ...
    satellite_band_number (satellite_valid_time_unix_sec) int32 1kB ...
    satellite_band_wavelength_micrometres (satellite_valid_time_unix_sec) float64 2kB ...
    satellite_longitude_deg_e (satellite_valid_time_unix_sec) float64 2kB ...
    satellite_cyclone_id_string (satellite_valid_time_unix_sec) |S8 2kB ...
    satellite_storm_type_string (satellite_valid_time_unix_sec) |S2 578B ...
    satellite_storm_name (satellite_valid_time_unix_sec) |S10 3kB ...
    satellite_storm_latitude_deg_n (satellite_valid_time_unix_sec) float64 2kB ...
    satellite_storm_longitude_deg_e (satellite_valid_time_unix_sec) float64 2kB ...
    satellite_storm_intensity_number (satellite_valid_time_unix_sec) float64 2kB ...
    satellite_storm_u_motion_m_s01 (satellite_valid_time_unix_sec) float64 2kB ...
    satellite_storm_v_motion_m_s01 (satellite_valid_time_unix_sec) float64 2kB ...
    satellite_predictors_gridded (satellite_valid_time_unix_sec, satellite_grid_row, satellite_grid_column, satellite_predictor_name_gridded) float64 474MB ...
    satellite_grid_latitude_deg_n (satellite_valid_time_unix_sec, satellite_grid_row, satellite_grid_column) float64 474MB ...
    satellite_grid_longitude_deg_e (satellite_valid_time_unix_sec, satellite_grid_row, satellite_grid_column) float64 474MB ...
    satellite_predictors_ungridded (satellite_valid_time_unix_sec, satellite_predictor_name_ungridded) float64 37kB ...
    ships_storm_intensity_m_s01 (ships_valid_time_unix_sec) float64 152B ...
    ships_storm_type_enum (ships_storm_object_index, ships_forecast_hour) int32 2kB ...
    ships_forecast_latitude_deg_n (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_forecast_longitude_deg_e (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_v_wind_200mb_0to500km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_vorticity_850mb_0to1000km_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_vortex_latitude_deg_n (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_vortex_longitude_deg_e (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_mean_tangential_wind_850mb_0to600km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_max_tangential_wind_850mb_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_mean_tangential_wind_1000mb_at500km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_mean_tangential_wind_850mb_at500km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_mean_tangential_wind_500mb_at500km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_mean_tangential_wind_300mb_at500km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_srh_1000to700mb_200to800km_j_kg01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_srh_1000to500mb_200to800km_j_kg01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
    ships_threshold_exceedance_num_6hour_periods (ships_storm_object_index, ships_intensity_threshold_m_s01) int32 2kB ...
    ships_v_motion_observed_m_s01 (ships_storm_object_index) float64 152B ...
    ships_v_motion_1000to100mb_flow_m_s01 (ships_storm_object_index) float64 152B ...
    ships_v_motion_optimal_flow_m_s01 (ships_storm_object_index) float64 152B ...
    ships_cyclone_id_string (ships_storm_object_index) object 152B ...
    ships_storm_latitude_deg_n (ships_storm_object_index) float64 152B ...
    ships_storm_longitude_deg_e (ships_storm_object_index) float64 152B ...
    ships_predictors_lagged (ships_valid_time_unix_sec, ships_lag_time_hours, ships_predictor_name_lagged) float64 13kB ...
    ships_predictors_forecast (ships_valid_time_unix_sec, ships_forecast_hour, ships_predictor_name_forecast) float64 451kB ...

    Variable names are meant to be as self-explanatory as possible. Potentially confusing ones are listed below.

    • The dimension ships_storm_object_index is redundant with the dimension ships_valid_time_unix_sec and can be ignored.
    • ships_forecast_hour ranges up to values that we do not actually use in the paper. Keep in mind that our max forecast hour used in machine learning is 24.
    • The dimension ships_intensity_threshold_m_s01 (and any variable including this dimension) can be ignored.
    • ships_lag_time_hours corresponds to lag times for the SHIPS satellite-based predictors. The only lag time we use in machine learning is "NaN", which is a stand-in for the best available of all lag times. See the discussion of the "priority list" in the paper for more details.
    • Most of the data variables can be ignored, unless you're doing a deep dive into storm properties. The important variables are satellite_predictors_gridded (full satellite images), ships_predictors_lagged (satellite-based SHIPS predictors), and ships_predictors_forecast (environmental and storm-history-based SHIPS predictors). These variables are all discussed in the paper.
    • Every variable name (including elements of the coordinate lists ships_predictor_name_lagged and ships_predictor_name_forecast) includes units at the end. For example, "m_s01" = metres per second; "deg_n" = degrees north; "deg_e" = degrees east; "j_kg01" = Joules per kilogram; ...; etc.
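    For readers who want to inspect one of these files without installing ml4tc, the minimal Python sketch below opens one file with plain xarray. The decompress-then-open step is an assumption on our part (gzipped NetCDF4 files generally cannot be opened by xarray directly); everything else follows the description above.

```python
import gzip
import shutil
import tempfile

import xarray as xr

# Example file name from the description above; any file in the dataset works the same way.
gz_file_name = "learning_examples_2010AL01.nc.gz"

# Decompress to a temporary .nc file first (assumption: xarray cannot open
# gzipped NetCDF4 files directly).
with gzip.open(gz_file_name, "rb") as gz_handle, \
        tempfile.NamedTemporaryFile(suffix=".nc", delete=False) as nc_handle:
    shutil.copyfileobj(gz_handle, nc_handle)
    nc_file_name = nc_handle.name

# example_io.read_file is described as a lightweight wrapper around
# xarray.open_dataset, so plain xarray yields the same table.
xarray_table = xr.open_dataset(nc_file_name)

# The three variables highlighted above as the important predictors.
gridded_satellite_predictors = xarray_table["satellite_predictors_gridded"]
lagged_ships_predictors = xarray_table["ships_predictors_lagged"]
forecast_ships_predictors = xarray_table["ships_predictors_forecast"]

print(gridded_satellite_predictors.shape)  # (289, 380, 540, 1) for this storm
print(lagged_ships_predictors.shape)       # (19, 5, 17)
print(forecast_ships_predictors.shape)     # (19, 23, 129)
```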