8 datasets found
  1. Data_Sheet_1_The impact of transitive annotation on the training of...

    • figshare.com
    pdf
    Updated Jan 3, 2024
    Cite
    Harihara Subrahmaniam Muralidharan; Noam Y. Fox; Mihai Pop (2024). Data_Sheet_1_The impact of transitive annotation on the training of taxonomic classifiers.PDF [Dataset]. http://doi.org/10.3389/fmicb.2023.1240957.s001
    Available download formats
    pdf
    Dataset updated
    Jan 3, 2024
    Dataset provided by
    Frontiers
    Authors
    Harihara Subrahmaniam Muralidharan; Noam Y. Fox; Mihai Pop
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: A common task in the analysis of microbial communities involves assigning taxonomic labels to the sequences derived from organisms found in the communities. Frequently, such labels are assigned using machine learning algorithms that are trained to recognize individual taxonomic groups based on training data sets that comprise sequences with known taxonomic labels. Ideally, the training data should rely on labels that are experimentally verified, since formal taxonomic labels require knowledge of physical and biochemical properties of organisms that cannot be directly inferred from sequence alone. However, the labels associated with sequences in biological databases are most commonly computational predictions which themselves may rely on computationally-generated data, a process commonly referred to as “transitive annotation.”
    Methods: In this manuscript we explore the implications of training a machine learning classifier (the Ribosomal Database Project’s Bayesian classifier in our case) on data that itself has been computationally generated. We generate new training examples based on 16S rRNA data from a metagenomic experiment, and evaluate the extent to which the taxonomic labels predicted by the classifier change after re-training.
    Results: We demonstrate that even a few computationally-generated training data points can significantly skew the output of the classifier to the point where entire regions of the taxonomic space can be disturbed.
    Discussion and conclusions: We conclude with a discussion of key factors that affect the resilience of classifiers to transitively-annotated training data, and propose best practices to avoid the artifacts described in our paper.

  2. Fight Lead Poisoning with a Healthy Diet

    • data.virginia.gov
    pdf
    Updated Sep 25, 2024
    Cite
    U.S. Environmental Protection Agency (2024). Fight Lead Poisoning with a Healthy Diet [Dataset]. https://data.virginia.gov/dataset/fight-lead-poisoning-with-a-healthy-diet
    Available download formats
    pdf(314493), pdf(165456)
    Dataset updated
    Sep 25, 2024
    Dataset provided by
    United States Environmental Protection Agency http://www.epa.gov/
    Authors
    U.S. Environmental Protection Agency
    Description

    Lead is a poisonous metal that our bodies cannot use. Lead poisoning can cause learning, hearing, and behavioral problems, and can harm your child’s brain, kidneys, and other organs. Lead in the body stops good minerals such as iron and calcium from working right. Some of these effects may be permanent.

  3. Data from: S1 Dataset -

    • plos.figshare.com
    xlsx
    Updated May 30, 2023
    Cite
    Lufeng Hu; Huaizhong Li; Zhennao Cai; Feiyan Lin; Guangliang Hong; Huiling Chen; Zhongqiu Lu (2023). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0186427.s001
    Available download formats
    xlsx
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Lufeng Hu; Huaizhong Li; Zhennao Cai; Feiyan Lin; Guangliang Hong; Huiling Chen; Zhongqiu Lu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The file contains the first-time test results for coagulation, liver, and kidney indices for the deceased group (1) and the survival group (2). (XLSX)

  4. Data from: Puerarin protects against damage to spatial learning and memory...

    • scielo.figshare.com
    jpeg
    Updated Jun 4, 2023
    Cite
    S.Q. Cui; Q. Wang; Y. Zheng; B. Xiao; H.W. Sun; X.L. Gu; Y.C. Zhang; C.H. Fu; P.X. Dong; X.M. Wang (2023). Puerarin protects against damage to spatial learning and memory ability in mice with chronic alcohol poisoning [Dataset]. http://doi.org/10.6084/m9.figshare.7898927.v1
    Available download formats
    jpeg
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    SciELO http://www.scielo.org/
    Authors
    S.Q. Cui; Q. Wang; Y. Zheng; B. Xiao; H.W. Sun; X.L. Gu; Y.C. Zhang; C.H. Fu; P.X. Dong; X.M. Wang
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We evaluated the effect of puerarin on spatial learning and memory ability of mice with chronic alcohol poisoning. A total of 30 male C57BL/6 mice were randomly divided into model, puerarin, and control groups (n=10 each). The model group received 60% (v/v) ethanol by intragastric administration followed by intraperitoneal injection of normal saline 30 min later. The puerarin group received intragastric 60% ethanol followed by intraperitoneal puerarin 30 min later, and the control group received intragastric saline followed by intraperitoneal saline. Six weeks after treatment, the Morris water maze and Tru Scan behavioral tests and immunofluorescence staining of cerebral cortex and hippocampal neurons (by Neu-N) and microglia (by Ib1) were conducted. Glutamic acid (Glu) and gamma amino butyric acid (GABA) in the cortex and hippocampus were assayed by high-performance liquid chromatography (HPLC), and tumor necrosis factor (TNF)-α and interleukin (IL)-1β were determined by ELISA. Compared with mice in the control group, escape latency and distance were prolonged, and spontaneous movement distance was shortened (P

  5. Year, State-wise number of unnatural elephant deaths by cause

    • dataful.in
    Updated Mar 18, 2025
    Cite
    Dataful (Factly) (2025). Year, State-wise number of unnatural elephant deaths by cause [Dataset]. https://dataful.in/datasets/20250
    Available download formats
    application/x-parquet, xlsx, csv
    Dataset updated
    Mar 18, 2025
    Dataset authored and provided by
    Dataful (Factly)
    License

    https://dataful.in/terms-and-conditions

    Area covered
    States of India
    Variables measured
    Number of deaths
    Description

    The dataset contains the annual number of unnatural elephant deaths across Indian states, broken down by cause, such as poaching, electrocution, train accidents, and poisoning.

  6. Statistics concerning different logistic regression models’ abilities to...

    • plos.figshare.com
    xls
    Updated Jul 10, 2023
    Cite
    Mohammad Howard-Azzeh; David L. Pearl; Terri L. O’Sullivan; Olaf Berke (2023). Statistics concerning different logistic regression models’ abilities to predict opioid poisoning calls to the APCCa in US dogs (2005–2014). [Dataset]. http://doi.org/10.1371/journal.pone.0288339.t006
    Available download formats
    xls
    Dataset updated
    Jul 10, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Mohammad Howard-Azzeh; David L. Pearl; Terri L. O’Sullivan; Olaf Berke
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Statistics concerning different logistic regression models’ abilities to predict opioid poisoning calls to the APCC in US dogs (2005–2014).

  7. Number of coefficients in models fitted using various logistic regression...

    • plos.figshare.com
    xls
    Updated Jul 10, 2023
    Cite
    Number of coefficients in models fitted using various logistic regression models examining the associations between dog-level variables and a poisoning call to the APCCa being related to cannabinoids or opioids (2005–2014). [Dataset]. https://plos.figshare.com/articles/dataset/Number_of_coefficients_in_models_fitted_using_various_logistic_regression_models_examining_the_associations_between_dog-level_variables_and_a_poisoning_call_to_the_APCC_sup_a_sup_being_related_to_cannabinoids_or_opioids_2005_2014_/23655732
    Available download formats
    xls
    Dataset updated
    Jul 10, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Mohammad Howard-Azzeh; David L. Pearl; Terri L. O’Sullivan; Olaf Berke
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Number of coefficients in models fitted using various logistic regression models examining the associations between dog-level variables and a poisoning call to the APCC being related to cannabinoids or opioids (2005–2014).

  8. The impact of label noise on the performance of ranking models.

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    Cite
    Shahabeddin Sotudian; Ruidi Chen; Ioannis Ch. Paschalidis (2023). The impact of label noise on the performance of ranking models. [Dataset]. http://doi.org/10.1371/journal.pone.0283574.t003
    Available download formats
    xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Shahabeddin Sotudian; Ruidi Chen; Ioannis Ch. Paschalidis
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The impact of label noise on the performance of ranking models.

