100+ datasets found
  1. Data from: LVMED: Dataset of Latvian text normalisation samples for the...

    • repository.clarin.lv
    Updated May 30, 2023
    Cite
    Viesturs Jūlijs Lasmanis; Normunds Grūzītis (2023). LVMED: Dataset of Latvian text normalisation samples for the medical domain [Dataset]. https://repository.clarin.lv/repository/xmlui/handle/20.500.12574/85
    Explore at:
    Dataset updated
    May 30, 2023
    Authors
    Viesturs Jūlijs Lasmanis; Normunds Grūzītis
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The CSV dataset contains sentence pairs for a text-to-text transformation task: given a sentence that contains 0..n abbreviations, rewrite (normalize) the sentence using full word forms.

    Training dataset: 64,665 sentence pairs. Validation dataset: 7,185 sentence pairs. Testing dataset: 7,984 sentence pairs.

    All sentences are extracted from a public web corpus (https://korpuss.lv/id/Tīmeklis2020) and contain at least one medical term.
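    The task can be illustrated with a toy sketch (the abbreviation dictionary and sentences below are hypothetical English stand-ins; the real dataset is Latvian and also requires inflecting the expanded word forms):

```python
# Minimal sketch of the sentence-pair normalisation task the dataset targets:
# each source sentence may contain 0..n abbreviations, and the target spells
# them out. The abbreviation dictionary here is hypothetical.
ABBREVIATIONS = {
    "dr.": "doctor",
    "approx.": "approximately",
}

def normalize(sentence: str) -> str:
    """Expand known abbreviations; a real system must also inflect word forms."""
    tokens = sentence.split()
    return " ".join(ABBREVIATIONS.get(t.lower(), t) for t in tokens)

print(normalize("The dr. prescribed approx. 5 ml"))
```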

  2. Normalizing Service Report

    • marketreportanalytics.com
    doc, pdf, ppt
    Updated Apr 2, 2025
    Cite
    Market Report Analytics (2025). Normalizing Service Report [Dataset]. https://www.marketreportanalytics.com/reports/normalizing-service-53022
    Explore at:
    Available download formats: pdf, ppt, doc
    Dataset updated
    Apr 2, 2025
    Dataset authored and provided by
    Market Report Analytics
    License

    https://www.marketreportanalytics.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global Normalizing Service market is experiencing robust growth, driven by increasing demand for [Insert specific drivers based on your knowledge of the Normalizing Service market, e.g., improved data quality, enhanced data analysis capabilities, rising adoption of cloud-based solutions, stringent data governance regulations]. The market is segmented by application [Insert specific applications, e.g., healthcare, finance, manufacturing] and type [Insert specific types of Normalizing Services, e.g., data cleansing, data transformation, data integration]. While precise market sizing data is unavailable, based on industry trends and comparable markets with similar growth trajectories, a reasonable estimate for the 2025 market size could be placed in the range of $500-750 million USD, with a Compound Annual Growth Rate (CAGR) of approximately 15-20% projected from 2025 to 2033. This growth is expected to be fueled by the continued expansion of big data analytics and the rising need for data standardization across diverse industries. However, challenges such as data security concerns, integration complexities, and high initial investment costs can act as potential restraints on market expansion.

    Regional analysis suggests a strong presence across North America and Europe, driven by early adoption and robust technological infrastructure. Asia-Pacific is poised for significant growth in the coming years due to increasing digitalization and expanding data centers. The market is highly competitive, with a mix of established players and emerging technology companies vying for market share. Successful players will need to differentiate their offerings through specialized solutions, strategic partnerships, and a focus on addressing specific industry needs. Future growth will depend on advancements in AI and machine learning technologies, further integration with cloud platforms, and the development of user-friendly, scalable solutions.
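    The range estimate can be sanity-checked with the compound-growth arithmetic it implies; the base sizes and rates below are the report's own rough figures, not measured data:

```python
def project(base: float, cagr: float, years: int) -> float:
    """Compound a base market size forward at a constant annual growth rate."""
    return base * (1.0 + cagr) ** years

# A 2025 base of $500M-$750M compounded at 15%-20% over 8 years (2025 -> 2033).
low = project(500e6, 0.15, 8)
high = project(750e6, 0.20, 8)
print(f"${low / 1e9:.2f}B - ${high / 1e9:.2f}B")
```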

  3. Normalizing Service Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jan 12, 2025
    Cite
    Data Insights Market (2025). Normalizing Service Report [Dataset]. https://www.datainsightsmarket.com/reports/normalizing-service-531581
    Explore at:
    Available download formats: doc, ppt, pdf
    Dataset updated
    Jan 12, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Market Analysis for Normalizing Service

    The global normalizing service market is anticipated to reach a value of xx million USD by 2033, exhibiting a CAGR of xx% during the forecast period. The market growth is attributed to the rising demand for efficient data management solutions, increased adoption of cloud-based applications, and growing awareness of data normalization techniques. The market size was valued at xx million USD in 2025. North America dominates the market, followed by Europe and Asia Pacific. The market is segmented based on application into banking and financial services, healthcare, retail, manufacturing, and other industries. The banking and financial services segment is expected to hold the largest market share due to the need for data accuracy and compliance with regulatory requirements.

    In terms of types, the market is divided into data integration and reconciliation, data standardization, and data profiling. Data integration and reconciliation is expected to dominate the market as it helps eliminate inconsistencies and redundancy in data sets. Major players in the market include Infosys, Capgemini, IBM, Accenture, and Wipro.

    The Normalizing Service Market reached a value of USD 1.16 Billion in 2023 and is poised to grow at a rate of 11.7% during the forecast period, reaching a value of USD 2.23 Billion by 2032.

  4. Normalizing Service Report

    • marketreportanalytics.com
    doc, pdf, ppt
    Updated Apr 2, 2025
    + more versions
    Cite
    Market Report Analytics (2025). Normalizing Service Report [Dataset]. https://www.marketreportanalytics.com/reports/normalizing-service-53595
    Explore at:
    Available download formats: pdf, doc, ppt
    Dataset updated
    Apr 2, 2025
    Dataset authored and provided by
    Market Report Analytics
    License

    https://www.marketreportanalytics.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global Normalizing Service market is experiencing robust growth, driven by increasing demand for [insert specific drivers, e.g., improved data quality, enhanced data security, rising adoption of cloud-based solutions]. The market size in 2025 is estimated at $5 billion, projecting a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033. This expansion is fueled by several key trends, including the growing adoption of [insert specific trends, e.g., big data analytics, AI-powered normalization tools, increasing regulatory compliance requirements]. While challenges remain, such as [insert specific restraints, e.g., high implementation costs, data integration complexities, lack of skilled professionals], the market's positive trajectory is expected to continue. Segmentation reveals that the [insert dominant application segment, e.g., financial services] application segment holds the largest market share, with [insert dominant type segment, e.g., cloud-based] solutions demonstrating significant growth.

    Regional analysis shows a strong presence across North America and Europe, particularly in the United States, United Kingdom, and Germany, driven by early adoption of advanced technologies and robust digital infrastructure. However, emerging markets in Asia-Pacific, particularly in China and India, are exhibiting significant growth potential due to expanding digitalization and increasing data volumes. The competitive landscape is characterized by a mix of established players and emerging companies, leading to innovation and market consolidation. The forecast period (2025-2033) promises continued market expansion, underpinned by technological advancements, increased regulatory pressures, and evolving business needs across diverse industries. The long-term outlook is optimistic, indicating a substantial market opportunity for companies offering innovative and cost-effective Normalizing Services.

  5. Data from: A systematic evaluation of normalization methods and probe...

    • data.niaid.nih.gov
    • dataone.org
    • +2more
    zip
    Updated May 30, 2023
    Cite
    H. Welsh; C. M. P. F. Batalha; W. Li; K. L. Mpye; N. C. Souza-Pinto; M. S. Naslavsky; E. J. Parra (2023). A systematic evaluation of normalization methods and probe replicability using infinium EPIC methylation data [Dataset]. http://doi.org/10.5061/dryad.cnp5hqc7v
    Explore at:
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Universidade de São Paulo
    University of Toronto
    Hospital for Sick Children
    Authors
    H. Welsh; C. M. P. F. Batalha; W. Li; K. L. Mpye; N. C. Souza-Pinto; M. S. Naslavsky; E. J. Parra
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Background: The Infinium EPIC array measures the methylation status of > 850,000 CpG sites. The EPIC BeadChip uses a two-array design: Infinium Type I and Type II probes. These probe types exhibit different technical characteristics which may confound analyses. Numerous normalization and pre-processing methods have been developed to reduce probe type bias as well as other issues such as background and dye bias.
    Methods: This study evaluates the performance of various normalization methods using 16 replicated samples and three metrics: absolute beta-value difference, overlap of non-replicated CpGs between replicate pairs, and effect on beta-value distributions. Additionally, we carried out Pearson’s correlation and intraclass correlation coefficient (ICC) analyses using both raw and SeSAMe 2 normalized data.
    Results: The method we define as SeSAMe 2, which consists of the regular SeSAMe pipeline with an additional round of QC (pOOBAH masking), was found to be the best-performing normalization method, while quantile-based methods were the worst performing. Whole-array Pearson’s correlations were found to be high. However, in agreement with previous studies, a substantial proportion of the probes on the EPIC array showed poor reproducibility (ICC < 0.50). The majority of poor-performing probes have beta values close to either 0 or 1, and relatively low standard deviations. These results suggest that probe reliability is largely the result of limited biological variation rather than technical measurement variation. Importantly, normalizing the data with SeSAMe 2 dramatically improved ICC estimates, with the proportion of probes with ICC values > 0.50 increasing from 45.18% (raw data) to 61.35% (SeSAMe 2).

    Methods

    Study Participants and Samples

    The whole blood samples were obtained from the Health, Well-being and Aging (Saúde, Ben-estar e Envelhecimento, SABE) study cohort. SABE is a cohort of census-withdrawn elderly from the city of São Paulo, Brazil, followed up every five years since the year 2000, with DNA first collected in 2010. Samples from 24 elderly adults were collected at two time points for a total of 48 samples. The first time point is the 2010 collection wave, performed from 2010 to 2012, and the second time point was set in 2020 in a COVID-19 monitoring project (9±0.71 years apart). The 24 individuals were 67.41±5.52 years of age (mean ± standard deviation) at time point one and 76.41±6.17 at time point two; they comprised 13 men and 11 women.

    All individuals enrolled in the SABE cohort provided written consent, and the ethics protocols were approved by local and national institutional review boards (COEP/FSP/USP OF.COEP/23/10, CONEP 2044/2014, CEP HIAE 1263-10, University of Toronto RIS 39685).

    Blood Collection and Processing

    Genomic DNA was extracted from whole peripheral blood samples collected in EDTA tubes. DNA extraction and purification followed the manufacturer’s recommended protocols, using the Qiagen AutoPure LS kit with Gentra automated extraction (first time point) or manual extraction (second time point) due to discontinuation of the equipment, but using the same commercial reagents. DNA was quantified using a NanoDrop spectrophotometer and diluted to 50 ng/µL. To assess the reproducibility of the EPIC array, we also obtained technical replicates for 16 out of the 48 samples, for a total of 64 samples submitted for further analyses. Whole Genome Sequencing data is also available for the samples described above.

    Characterization of DNA Methylation using the EPIC array

    Approximately 1,000 ng of human genomic DNA was used for bisulphite conversion. Methylation status was evaluated using the MethylationEPIC array at The Centre for Applied Genomics (TCAG, Hospital for Sick Children, Toronto, Ontario, Canada), following protocols recommended by Illumina (San Diego, California, USA).

    Processing and Analysis of DNA Methylation Data

    The R/Bioconductor packages Meffil (version 1.1.0), RnBeads (version 2.6.0), minfi (version 1.34.0) and wateRmelon (version 1.32.0) were used to import, process and perform quality control (QC) analyses on the methylation data. Starting with the 64 samples, we first used Meffil to infer the sex of the 64 samples and compared the inferred sex to the reported sex. Utilizing the 59 SNP probes that are available as part of the EPIC array, we calculated concordance between the methylation intensities of the samples and the corresponding genotype calls extracted from their WGS data. We then performed comprehensive sample-level and probe-level QC using the RnBeads QC pipeline. Specifically, we (1) removed probes if their target sequences overlap with a SNP at any base, (2) removed known cross-reactive probes, (3) used the iterative Greedycut algorithm to filter out samples and probes, using a detection p-value threshold of 0.01, and (4) removed probes if more than 5% of the samples had a missing value. Since RnBeads does not have a function to perform probe filtering based on bead number, we used the wateRmelon package to extract bead numbers from the IDAT files and calculated the proportion of samples with bead number < 3. Probes with more than 5% of samples having low bead number (< 3) were removed. For the comparison of normalization methods, we also computed detection p-values using the empirical distribution of out-of-band probes with the pOOBAH() function in the SeSAMe (version 1.14.2) R package, with a p-value threshold of 0.05 and the combine.neg parameter set to TRUE. In the scenario where pOOBAH filtering was carried out, it was done in parallel with the previously mentioned QC steps, and the resulting probes flagged in both analyses were combined and removed from the data.

    Normalization Methods Evaluated

    The normalization methods compared in this study were implemented using different R/Bioconductor packages and are summarized in Figure 1. All data was read into the R workspace as RG Channel Sets using minfi’s read.metharray.exp() function. One sample that was flagged during QC was removed, and further normalization steps were carried out on the remaining set of 63 samples. Prior to all normalizations with minfi, probes that did not pass QC were removed. Noob, SWAN, Quantile, Funnorm and Illumina normalizations were implemented using minfi. BMIQ normalization was implemented with ChAMP (version 2.26.0), using as input the raw data produced by minfi’s preprocessRaw() function. In the combination of Noob with BMIQ (Noob+BMIQ), BMIQ normalization was carried out using as input minfi’s Noob normalized data. Noob normalization was also implemented with SeSAMe, using a nonlinear dye bias correction. For SeSAMe normalization, two scenarios were tested. For both, the inputs were unmasked SigDF Sets converted from minfi’s RG Channel Sets. In the first, which we call “SeSAMe 1”, SeSAMe’s pOOBAH masking was not executed, and the only probes filtered out of the dataset prior to normalization were the ones that did not pass QC in the previous analyses. In the second scenario, which we call “SeSAMe 2”, pOOBAH masking was carried out in the unfiltered dataset, and masked probes were removed. This removal was followed by further removal of probes that did not pass previous QC and had not been removed by pOOBAH. Therefore, SeSAMe 2 has two rounds of probe removal. Noob normalization with nonlinear dye bias correction was then carried out on the filtered dataset. Methods were then compared by subsetting the 16 replicated samples and evaluating the effects that the different normalization methods had on the absolute difference of beta values (|β|) between replicated samples.
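    The replicate-agreement metric (absolute beta-value difference between replicate pairs) can be sketched as below; the synthetic beta values and noise level are invented for illustration, not taken from the dataset:

```python
import numpy as np

def mean_abs_beta_diff(beta_a: np.ndarray, beta_b: np.ndarray) -> float:
    """Mean |beta| difference across probes for one replicate pair:
    lower values indicate better technical agreement after normalization."""
    return float(np.mean(np.abs(beta_a - beta_b)))

rng = np.random.default_rng(0)
probes = 1000
truth = rng.uniform(0, 1, probes)                      # underlying methylation levels
rep1 = np.clip(truth + rng.normal(0, 0.02, probes), 0, 1)  # technical replicate 1
rep2 = np.clip(truth + rng.normal(0, 0.02, probes), 0, 1)  # technical replicate 2
print(round(mean_abs_beta_diff(rep1, rep2), 3))
```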

  6. Methods for normalizing microbiome data: an ecological perspective

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Oct 30, 2018
    + more versions
    Cite
    Donald T. McKnight; Roger Huerlimann; Deborah S. Bower; Lin Schwarzkopf; Ross A. Alford; Kyall R. Zenger (2018). Methods for normalizing microbiome data: an ecological perspective [Dataset]. http://doi.org/10.5061/dryad.tn8qs35
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 30, 2018
    Dataset provided by
    James Cook University
    University of New England
    Authors
    Donald T. McKnight; Roger Huerlimann; Deborah S. Bower; Lin Schwarzkopf; Ross A. Alford; Kyall R. Zenger
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description
    1. Microbiome sequencing data often need to be normalized due to differences in read depths, and recommendations for microbiome analyses generally warn against using proportions or rarefying to normalize data, instead advocating alternatives such as upper quartile, CSS, edgeR-TMM, or DESeq-VS. Those recommendations are, however, based on studies that focused on differential abundance testing and variance standardization, rather than community-level comparisons (i.e., beta diversity). Also, standardizing the within-sample variance across samples may suppress differences in species evenness, potentially distorting community-level patterns. Furthermore, the recommended methods use log transformations, which we expect to exaggerate the importance of differences among rare OTUs, while suppressing the importance of differences among common OTUs.
    2. We tested these theoretical predictions via simulations and a real-world data set.
    3. Proportions and rarefying produced more accurate comparisons among communities and were the only methods that fully normalized read depths across samples. Additionally, upper quartile, CSS, edgeR-TMM, and DESeq-VS often masked differences among communities when common OTUs differed, and they produced false positives when rare OTUs differed.
    4. Based on our simulations, normalizing via proportions may be superior to other commonly used methods for comparing ecological communities.
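    The two methods the authors favour can be sketched directly; this is a minimal illustration on a made-up OTU count table, and the naive rarefying loop here is not the optimized routine found in ecology packages:

```python
import numpy as np

def to_proportions(counts: np.ndarray) -> np.ndarray:
    """Normalize each sample (row) to relative abundances summing to 1."""
    return counts / counts.sum(axis=1, keepdims=True)

def rarefy(counts: np.ndarray, depth: int, seed: int = 0) -> np.ndarray:
    """Subsample each sample without replacement down to a common read depth."""
    rng = np.random.default_rng(seed)
    out = np.zeros_like(counts)
    for i, row in enumerate(counts):
        reads = np.repeat(np.arange(row.size), row)   # one entry per read
        keep = rng.choice(reads, size=depth, replace=False)
        out[i] = np.bincount(keep, minlength=row.size)
    return out

# rows = samples, columns = OTUs (hypothetical counts, unequal read depths)
counts = np.array([[90, 10, 0], [500, 300, 200]])
props = to_proportions(counts)
rare = rarefy(counts, depth=100)
```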
  7. Identification of Novel Reference Genes Suitable for qRT-PCR Normalization...

    • plos.figshare.com
    tiff
    Updated May 31, 2023
    Cite
    Yu Hu; Shuying Xie; Jihua Yao (2023). Identification of Novel Reference Genes Suitable for qRT-PCR Normalization with Respect to the Zebrafish Developmental Stage [Dataset]. http://doi.org/10.1371/journal.pone.0149277
    Explore at:
    Available download formats: tiff
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Yu Hu; Shuying Xie; Jihua Yao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Reference genes used in normalizing qRT-PCR data are critical for the accuracy of gene expression analysis. However, many traditional reference genes used in zebrafish early development are not appropriate because of their variable expression levels during embryogenesis. In the present study, we used our previous RNA-Seq dataset to identify novel reference genes suitable for gene expression analysis during zebrafish early developmental stages. We first selected 197 most stably expressed genes from an RNA-Seq dataset (29,291 genes in total), according to the ratio of their maximum to minimum RPKM values. Among the 197 genes, 4 genes with moderate expression levels and the least variation throughout 9 developmental stages were identified as candidate reference genes. Using four independent statistical algorithms (delta-CT, geNorm, BestKeeper and NormFinder), the stability of qRT-PCR expression of these candidates was then evaluated and compared to that of actb1 and actb2, two commonly used zebrafish reference genes. Stability rankings showed that two genes, namely mobk13 (mob4) and lsm12b, were more stable than actb1 and actb2 in most cases. To further test the suitability of mobk13 and lsm12b as novel reference genes, they were used to normalize three well-studied target genes. The results showed that mobk13 and lsm12b were more suitable than actb1 and actb2 with respect to zebrafish early development. We recommend mobk13 and lsm12b as new optimal reference genes for zebrafish qRT-PCR analysis during embryogenesis and early larval stages.
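    The screening criterion, the ratio of maximum to minimum expression across stages, is simple to sketch; the RPKM values and gene names below are invented for illustration:

```python
# Hypothetical RPKM table; the paper screened 29,291 genes for those with the
# smallest max/min expression ratio across developmental stages.
rpkm = {
    "geneA": [10.0, 11.0, 10.5, 10.2],
    "geneB": [5.0, 50.0, 8.0, 12.0],
    "geneC": [100.0, 98.0, 101.0, 99.0],
}

def stability_ratio(values):
    """Max/min ratio across stages; closer to 1 means more stable expression."""
    return max(values) / min(values)

# Rank candidates from most to least stable.
ranked = sorted(rpkm, key=lambda g: stability_ratio(rpkm[g]))
print(ranked[0])  # most stably expressed candidate
```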

  8. Post-processed and normalized data sets for the data processing, analysis,...

    • b2find.eudat.eu
    Updated Jan 25, 2025
    + more versions
    Cite
    (2025). Post-processed and normalized data sets for the data processing, analysis, and evaluation methods for co-design of coreless filament-wound structures - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/ab86210e-83e9-548a-b4e4-f5a9f72d1593
    Explore at:
    Dataset updated
    Jan 25, 2025
    Description

    Post-processed and normalized data sets for specimens S2-0, S2-1, S2-2, S2-4, S2-8 and S2-9, used in Figure 14 of the publication "Data processing, analysis, and evaluation methods for co-design of coreless filament-wound building systems" in the Journal of Computational Design and Engineering. The data allows the comparison of different geometrical, fabrication and structural parameters per segment of each specimen. The raw data was obtained during the robotic fabrication and mechanical testing of specimens S1, S2 and S3 for the publication "Computational co-design framework for coreless wound fibre-polymer composite structures. Journal of Computational Design and Engineering 9(2), 310-32", and the complete raw data is published in the data set "Object model data sets of the case study specimens for the computational co-design framework for coreless wound fibre-polymer composite structures (V1)".

    To extend the research, 6 specimens of the series S2 were chosen for further post-processing, and a representative number per segment was calculated for each data set. The fabrication data, which is originally produced per layer wound, was either accumulated or averaged over the layers in one segment, while for the geometrical and structural data, the average or maximum over all bar elements in one segment was chosen. These decisions were taken to find representative values based on the experience of the researchers, and they are described in the data set. Finally, by normalizing all data with respect to the 6 specimens and all segments, the data can be analyzed in the same format, making the comparison of geometrical, structural and fabrication data compatible and helping to find interrelations and possible reasons for the failure of the specimens during the mechanical test.
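    One common way to bring parameters with different units onto a shared scale is min-max normalization over all specimens and segments. The sketch below is a generic illustration under that assumption; the parameter name and values are hypothetical, and the dataset documents its own exact scheme:

```python
def min_max_normalize(values):
    """Scale a parameter to [0, 1] across all pooled specimen segments,
    making parameters with different units directly comparable."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical per-segment fibre-angle values pooled over specimens.
angles = [12.0, 15.0, 9.0, 21.0]
print(min_max_normalize(angles))
```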

  9. Table_3_Normalization Methods on Single-Cell RNA-seq Data: An Empirical...

    • frontiersin.figshare.com
    docx
    Updated Jun 3, 2023
    Cite
    Nicholas Lytal; Di Ran; Lingling An (2023). Table_3_Normalization Methods on Single-Cell RNA-seq Data: An Empirical Survey.docx [Dataset]. http://doi.org/10.3389/fgene.2020.00041.s004
    Explore at:
    Available download formats: docx
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Nicholas Lytal; Di Ran; Lingling An
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data normalization is vital to single-cell sequencing, addressing limitations presented by low input material and various forms of bias or noise present in the sequencing process. Several such normalization methods exist, some of which rely on spike-in genes, molecules added in known quantities to serve as a basis for a normalization model. Depending on available information and the type of data, some methods may express certain advantages over others. We compare the effectiveness of seven available normalization methods designed specifically for single-cell sequencing using two real data sets containing spike-in genes and one simulation study. Additionally, we test those methods not dependent on spike-in genes using a real data set with three distinct cell-cycle states and a real data set under the 10X Genomics GemCode platform with multiple cell types represented. We demonstrate the differences in effectiveness for the featured methods using visualization and classification assessment and conclude which methods are preferable for normalizing a certain type of data for further downstream analysis, such as classification or differential analysis. The comparison in computational time for all methods is addressed as well.
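    The spike-in idea mentioned above can be sketched as follows: because every cell receives the same spike-in quantity, per-cell spike-in totals estimate technical scaling factors. The counts below are invented, and this is only the basic principle, not any specific method from the survey:

```python
import numpy as np

def spike_in_size_factors(spike_counts: np.ndarray) -> np.ndarray:
    """Per-cell scaling factors from spike-in totals: since every cell receives
    the same spike-in quantity, deviations reflect technical throughput."""
    totals = spike_counts.sum(axis=1).astype(float)
    return totals / totals.mean()

# rows = cells, columns = spike-in genes (hypothetical counts)
spikes = np.array([[100, 50], [200, 100], [60, 40]])
factors = spike_in_size_factors(spikes)
# Dividing each cell's counts by its factor equalizes technical depth.
normalized_depth = spikes.sum(axis=1) / factors
```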

  10. Dataset supporting: Normalizing and denoising protein expression data from...

    • nih.figshare.com
    • figshare.com
    zip
    Updated May 30, 2023
    Cite
    Matthew P. Mulé; Andrew J. Martins; John Tsang (2023). Dataset supporting: Normalizing and denoising protein expression data from droplet-based single cell profiling [Dataset]. http://doi.org/10.35092/yhjc.13370915.v2
    Explore at:
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Matthew P. Mulé; Andrew J. Martins; John Tsang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data for reproducing the analysis in the manuscript "Normalizing and denoising protein expression data from droplet-based single cell profiling". Link to manuscript: https://www.biorxiv.org/content/10.1101/2020.02.24.963603v1

    Data deposited here are for the purposes of reproducing the analysis results and figures reported in the manuscript above. These data are all publicly available and were downloaded and converted to R datasets prior to Dec 4, 2020. For a full description of all the data included in this repository and instructions for reproducing all analysis results and figures, please see the repository: https://github.com/niaid/dsb_manuscript.

    For usage of the dsb R package for normalizing CITE-seq data please see the repository: https://github.com/niaid/dsb
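    The core idea, standardizing each protein's log counts against the ambient signal measured in empty droplets, can be sketched roughly as below. This is a simplification for illustration only: dsb additionally models per-cell technical covariates, and its exact transform is defined in the package, not here.

```python
import numpy as np

def background_correct(cells: np.ndarray, empty: np.ndarray) -> np.ndarray:
    """Log-transform protein counts (rows = droplets, cols = proteins), then
    z-score each protein against the ambient distribution estimated from
    empty droplets. A rough sketch of background correction, not dsb itself."""
    log_cells = np.log1p(cells)
    log_empty = np.log1p(empty)
    mu = log_empty.mean(axis=0)   # per-protein ambient mean
    sd = log_empty.std(axis=0)    # per-protein ambient spread
    return (log_cells - mu) / sd
```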

    If you use the dsb R package in your work, please cite: Mulè MP, Martins AJ, Tsang JS. Normalizing and denoising protein expression data from droplet-based single cell profiling. bioRxiv. 2020;2020.02.24.963603.

    General contact: John Tsang (john.tsang AT nih.gov)

    Questions about software/code: Matt Mulè (mulemp AT nih.gov)

  11. Table_1_Normalization Methods on Single-Cell RNA-seq Data: An Empirical...

    • frontiersin.figshare.com
    docx
    Updated Jun 2, 2023
    + more versions
    Cite
    Nicholas Lytal; Di Ran; Lingling An (2023). Table_1_Normalization Methods on Single-Cell RNA-seq Data: An Empirical Survey.docx [Dataset]. http://doi.org/10.3389/fgene.2020.00041.s002
    Explore at:
    Available download formats: docx
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Nicholas Lytal; Di Ran; Lingling An
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data normalization is vital to single-cell sequencing, addressing limitations presented by low input material and various forms of bias or noise present in the sequencing process. Several such normalization methods exist, some of which rely on spike-in genes, molecules added in known quantities to serve as a basis for a normalization model. Depending on available information and the type of data, some methods may express certain advantages over others. We compare the effectiveness of seven available normalization methods designed specifically for single-cell sequencing using two real data sets containing spike-in genes and one simulation study. Additionally, we test those methods not dependent on spike-in genes using a real data set with three distinct cell-cycle states and a real data set under the 10X Genomics GemCode platform with multiple cell types represented. We demonstrate the differences in effectiveness for the featured methods using visualization and classification assessment and conclude which methods are preferable for normalizing a certain type of data for further downstream analysis, such as classification or differential analysis. The comparison in computational time for all methods is addressed as well.

  12. REMNet Tutorial, R Part 5: Normalizing Microbiome Data in R 5.2.19

    • qubeshub.org
    Updated Aug 28, 2019
    Cite
    Jessica Joyner (2019). REMNet Tutorial, R Part 5: Normalizing Microbiome Data in R 5.2.19 [Dataset]. http://doi.org/10.25334/M13H-XT81
    Explore at:
    Dataset updated
    Aug 28, 2019
    Dataset provided by
    QUBES
    Authors
    Jessica Joyner
    Description

    Video on normalizing microbiome data from the Research Experiences in Microbiomes Network

  13. Normalizing Flow Model - Dataset - LDM

    • service.tib.eu
    Updated Dec 3, 2024
    Cite
    (2024). Normalizing Flow Model - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/normalizing-flow-model
    Explore at:
    Dataset updated
    Dec 3, 2024
    Description

    The dataset accompanies a normalizing flow model, a type of generative model trained to transform a simple base distribution into a given data distribution. The dataset is used to evaluate the performance of the model.
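    A normalizing flow evaluates exact likelihoods through the change-of-variables formula; a one-step affine flow, shown below as a deliberately minimal example (not the model in the paper), makes the mechanics concrete:

```python
import numpy as np

def affine_flow_logpdf(x: np.ndarray, scale: float, shift: float) -> np.ndarray:
    """Log-density under a one-step affine flow pushing a standard normal
    through z -> scale*z + shift, via the change-of-variables formula:
    log p(x) = log N(f^{-1}(x); 0, 1) - log|scale|."""
    z = (x - shift) / scale                       # inverse transform
    log_base = -0.5 * (z**2 + np.log(2 * np.pi))  # standard normal log-pdf
    return log_base - np.log(abs(scale))          # log-det Jacobian of inverse
```

For this affine case the result coincides with the log-pdf of a normal with mean `shift` and standard deviation `|scale|`; deeper flows stack many such invertible steps.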

  14. Data from: Adapting Phrase-based Machine Translation to Normalise Medical...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Collier, Nigel (2020). Adapting Phrase-based Machine Translation to Normalise Medical Terms in Social Media Messages [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_27354
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Collier, Nigel
    Limsopatham, Nut
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Data and supplementary information for the paper entitled "Adapting Phrase-based Machine Translation to Normalise Medical Terms in Social Media Messages" to be published at EMNLP 2015: Conference on Empirical Methods in Natural Language Processing — September 17–21, 2015 — Lisboa, Portugal.

    ABSTRACT: Previous studies have shown that health reports in social media, such as DailyStrength and Twitter, have potential for monitoring health conditions (e.g. adverse drug reactions, infectious diseases) in particular communities. However, in order for a machine to understand and make inferences on these health conditions, the ability to recognise when laymen's terms refer to a particular medical concept (i.e. text normalisation) is required. To achieve this, we propose to adapt an existing phrase-based machine translation (MT) technique and a vector representation of words to map between a social media phrase and a medical concept. We evaluate our proposed approach using a collection of phrases from tweets related to adverse drug reactions. Our experimental results show that the combination of a phrase-based MT technique and the similarity between word vector representations outperforms the baselines that apply only either of them by up to 55%.
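The word-vector half of the approach sketched in the abstract — ranking candidate medical concepts by similarity to a social-media phrase — can be illustrated with cosine similarity over toy vectors (these are NOT the paper's trained embeddings):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy embeddings for illustration only
vectors = {
    "cant sleep":   np.array([0.9, 0.2, 0.0]),
    "insomnia":     np.array([0.8, 0.3, 0.1]),
    "hypertension": np.array([0.1, 0.0, 0.9]),
}

def normalise_phrase(phrase, concepts):
    """Map a lay phrase to the closest medical concept by cosine similarity."""
    return max(concepts, key=lambda c: cosine(vectors[phrase], vectors[c]))
```

In the paper this similarity score is combined with phrase-based MT translation probabilities; here only the similarity ranking is shown.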

  15. Datasets and Trained Models of the paper: The NFLikelihood: an unsupervised...

    • explore.openaire.eu
    • data.niaid.nih.gov
    • +1more
    Updated Sep 15, 2023
    Cite
    Humberto Reyes-Gonzalez; Riccardo Torre (2023). Datasets and Trained Models of the paper: The NFLikelihood: an unsupervised DNNLikelihood from Normalizing Flows [Dataset]. http://doi.org/10.5281/zenodo.8349144
    Explore at:
    Dataset updated
    Sep 15, 2023
    Authors
    Humberto Reyes-Gonzalez; Riccardo Torre
    Description

    Training data and trained models corresponding to the publication 'The NFLikelihood: an unsupervised DNNLikelihood from Normalizing Flows' (arXiv:2309.09743). The files contain the relevant resources for three trained likelihood functions: the Toy-Likelihood, the EW-Likelihood and the Flavor-Likelihood. In each corresponding directory, the training data is found in \data, and the trained model and generated samples are found in \NFmodel. To reproduce the published results, git clone https://github.com/NF4HEP/NFLikelihoods and place the provided resources in the corresponding directories of the code.

  16. Municipal Building Energy Usage

    • data.wprdc.org
    • datadiscoverystudio.org
    • +3more
    csv, xlsx
    Updated Jun 28, 2024
    Cite
    City of Pittsburgh (2024). Municipal Building Energy Usage [Dataset]. https://data.wprdc.org/dataset/municipal-building-energy-usage
    Explore at:
    Available download formats: xlsx, csv
    Dataset updated
    Jun 28, 2024
    Dataset provided by
    City of Pittsburgh
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This data set contains energy use data from 2009-2014 for 139 municipally operated buildings. Metrics include: Site & Source EUI, annual electricity, natural gas and district steam consumption, greenhouse gas emissions and energy cost. Weather-normalized data enable building performance comparisons over time, despite unusual weather events.
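The dataset does not document its exact weather-normalization procedure; one common degree-day approach splits use into a weather-independent base load plus a heating component rescaled by heating degree days (HDD). A minimal sketch with hypothetical figures:

```python
def weather_normalized_use(actual_use, base_load, actual_hdd, normal_hdd):
    """Rescale the heating portion of energy use by the ratio of
    normal-year HDD to observed HDD, leaving base load untouched."""
    heating = actual_use - base_load
    return base_load + heating * (normal_hdd / actual_hdd)

# Hypothetical building: 1200 units used, 400 of which are base load,
# in a year with 800 HDD against a 1000-HDD climate normal
normalized = weather_normalized_use(1200.0, 400.0, actual_hdd=800.0, normal_hdd=1000.0)
```

A mild year (fewer HDD than normal) is scaled up, so buildings can be compared across years despite unusual weather.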

  17. 2018 LiDAR - Normalized Digital Surface Model - Tiles

    • catalog.data.gov
    • opendata.dc.gov
    • +2more
    Updated Feb 4, 2025
    Cite
    D.C. Office of the Chief Technology Officer (2025). 2018 LiDAR - Normalized Digital Surface Model - Tiles [Dataset]. https://catalog.data.gov/dataset/2018-lidar-normalized-digital-surface-model-tiles
    Explore at:
    Dataset updated
    Feb 4, 2025
    Dataset provided by
    D.C. Office of the Chief Technology Officer
    Description

    Normalised Digital Surface Model - 1m resolution. The dataset contains the Normalised Digital Surface Model for the Washington Area. Voids exist in the data due to data redaction conducted under the guidance of the United States Secret Service. All lidar data returns and collected data were removed from the dataset based on the redaction footprint shapefile generated in 2017.
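A normalized DSM is conventionally the surface model minus the bare-earth terrain model, floored at zero, giving heights above ground. A minimal sketch with hypothetical elevation grids:

```python
import numpy as np

def ndsm(dsm, dtm):
    """Normalized DSM: surface elevation minus ground elevation, clipped at 0."""
    return np.clip(dsm - dtm, 0.0, None)

dsm = np.array([[105.0, 110.0],
                [100.5, 120.0]])   # first-return surface elevations (m)
dtm = np.array([[100.0, 100.0],
                [101.0, 100.0]])   # bare-earth elevations (m)
heights = ndsm(dsm, dtm)           # heights above ground (m)
```

Clipping handles cells where noise or interpolation leaves the surface slightly below the modeled ground.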

  18. Additional file 4: of DBNorm: normalizing high-density oligonucleotide...

    • springernature.figshare.com
    txt
    Updated May 30, 2023
    Cite
    Qinxue Meng; Daniel Catchpoole; David Skillicorn; Paul Kennedy (2023). Additional file 4: of DBNorm: normalizing high-density oligonucleotide microarray data based on distributions [Dataset]. http://doi.org/10.6084/m9.figshare.5648956.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Qinxue Meng; Daniel Catchpoole; David Skillicorn; Paul Kennedy
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    DBNorm installation. Describes how to install DBNorm via devtools in R. (TXT 4 kb)

  19. United States MCT Inflation: Normalized

    • ceicdata.com
    Updated Mar 15, 2025
    Cite
    CEICdata.com (2025). United States MCT Inflation: Normalized [Dataset]. https://www.ceicdata.com/en/united-states/multivariate-core-trend-inflation/mct-inflation-normalized
    Explore at:
    Dataset updated
    Mar 15, 2025
    Dataset provided by
    CEIC Data
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Feb 1, 2024 - Jan 1, 2025
    Area covered
    United States
    Description

    United States MCT Inflation: Normalized data was reported at 1.190 % in Mar 2025. This records an increase from the previous number of 1.080 % for Feb 2025. United States MCT Inflation: Normalized data is updated monthly, averaging 0.600 % from Jan 1960 (Median) to Mar 2025, with 783 observations. The data reached an all-time high of 9.310 % in Jul 1974 and a record low of -1.050 % in Aug 1962. United States MCT Inflation: Normalized data remains in active status in CEIC and is reported by the Federal Reserve Bank of New York. The data is categorized under Global Database’s United States – Table US.I027: Multivariate Core Trend Inflation.
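CEIC does not specify the normalization applied to this series; as a generic illustration of normalizing a time series, a population z-score rescales values to zero mean and unit variance:

```python
def zscore(series):
    """Normalize a series to zero mean and unit (population) variance."""
    n = len(series)
    mean = sum(series) / n
    sd = (sum((x - mean) ** 2 for x in series) / n) ** 0.5
    return [(x - mean) / sd for x in series]

z = zscore([1.0, 2.0, 3.0])
```

After this transformation the series is expressed in standard deviations from its own mean, which makes episodes such as the 1974 peak directly comparable across indicators.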

  20. f

    Comparisons of estimates of normalizing parameter.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Bin Wang (2023). Comparisons of estimates of normalizing parameter. [Dataset]. http://doi.org/10.1371/journal.pone.0230594.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Bin Wang
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparisons of estimates of normalizing parameter.
