9 datasets found
  1. Table_1_Structured data vs. unstructured data in machine learning prediction...

    • frontiersin.figshare.com
    • figshare.com
    xlsx
    Updated Jun 1, 2023
    Cite
    Danielle Hopkins; Debra J. Rickwood; David J. Hallford; Clare Watsford (2023). Table_1_Structured data vs. unstructured data in machine learning prediction models for suicidal behaviors: A systematic review and meta-analysis.xlsx [Dataset]. http://doi.org/10.3389/fdgth.2022.945006.s001
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Danielle Hopkins; Debra J. Rickwood; David J. Hallford; Clare Watsford
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Suicide remains a leading cause of preventable death worldwide, despite advances in research and decreases in mental health stigma through government health campaigns. Machine learning (ML), a type of artificial intelligence (AI), is the use of algorithms to simulate and imitate human cognition. Given the lack of improvement in clinician-based suicide prediction over time, advancements in technology have allowed for novel approaches to predicting suicide risk. This systematic review and meta-analysis aimed to synthesize current research regarding data sources in ML prediction of suicide risk, incorporating and comparing outcomes between structured data (human interpretable such as psychometric instruments) and unstructured data (only machine interpretable such as electronic health records). Online databases and gray literature were searched for studies relating to ML and suicide risk prediction. There were 31 eligible studies. The outcome for all studies combined was AUC = 0.860, structured data showed AUC = 0.873, and unstructured data was calculated at AUC = 0.866. There was substantial heterogeneity between the studies, the sources of which were unable to be defined. The studies showed good accuracy levels in the prediction of suicide risk behavior overall. Structured data and unstructured data also showed similar outcome accuracy according to meta-analysis, despite different volumes and types of input data.

  2. Amount of data created, consumed, and stored 2010-2023, with forecasts to...

    • statista.com
    Updated Nov 21, 2024
    Cite
    Statista (2024). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
    Dataset updated
    Nov 21, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 2024
    Area covered
    Worldwide
    Description

    The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching 149 zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than 394 zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

    Storage capacity also growing

    Only a small percentage of this newly created data is kept, though, as just two percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of 19.2 percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached 6.7 zettabytes.

  3. Strategies for Controlling Non-Transmissible Infection Outbreaks Using a...

    • plos.figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Penelope A. Hancock; Yasmin Rehman; Ian M. Hall; Obaghe Edeghere; Leon Danon; Thomas A. House; Matthew J. Keeling (2023). Strategies for Controlling Non-Transmissible Infection Outbreaks Using a Large Human Movement Data Set [Dataset]. http://doi.org/10.1371/journal.pcbi.1003809
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Penelope A. Hancock; Yasmin Rehman; Ian M. Hall; Obaghe Edeghere; Leon Danon; Thomas A. House; Matthew J. Keeling
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Prediction and control of the spread of infectious disease in human populations benefits greatly from our growing capacity to quantify human movement behavior. Here we develop a mathematical model for non-transmissible infections contracted from a localized environmental source, informed by a detailed description of movement patterns of the population of Great Britain. The model is applied to outbreaks of Legionnaires' disease, a potentially life-threatening form of pneumonia caused by the bacterium Legionella pneumophila. We use case-report data from three recent outbreaks that have occurred in Great Britain where the source has already been identified by public health agencies. We first demonstrate that the amount of individual-level heterogeneity incorporated in the movement data greatly influences our ability to predict the source location. The most accurate predictions were obtained using reported travel histories to describe movements of infected individuals, but using detailed simulation models to estimate movement patterns offers an effective fast alternative. Secondly, once the source is identified, we show that our model can be used to accurately determine the population likely to have been exposed to the pathogen, and hence predict the residential locations of infected individuals. The results give rise to an effective control strategy that can be implemented rapidly in response to an outbreak.

  4. Results of Gene Families (gf) and Biological Processes (GO-BP) enrichment.

    • figshare.com
    • plos.figshare.com
    xlsx
    Updated Oct 9, 2024
    Cite
    Ilaria Granata; Lucia Maddalena; Mario Manzo; Mario Rosario Guarracino; Maurizio Giordano (2024). Results of Gene Families (gf) and Biological Processes (GO-BP) enrichment. [Dataset]. http://doi.org/10.1371/journal.pcbi.1012076.s004
    Available download formats: xlsx
    Dataset updated
    Oct 9, 2024
    Dataset provided by
    PLOS Computational Biology
    Authors
    Ilaria Granata; Lucia Maddalena; Mario Manzo; Mario Rosario Guarracino; Maurizio Giordano
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The first was obtained by downloading gene families annotation from https://www.genenames.org/download/statistics-and-files/, applying a hypergeometric test (R version 4.1.2) using all the genes in the DepMap matrix as background. The GO-BP enrichment, instead, was performed by using DAVID Bioinformatics tool (https://david.ncifcrf.gov/tools.jsp). Each sheet is named according to the content “tissue_enrichment_class”. The columns’ content is detailed in each sheet. (XLSX)

  5. Description of all features used in this work.

    • plos.figshare.com
    xls
    Updated Jun 16, 2023
    Cite
    Damian Dailisan; Marissa Liponhay; Christian Alis; Christopher Monterola (2023). Description of all features used in this work. [Dataset]. http://doi.org/10.1371/journal.pone.0265771.t001
    Available download formats: xls
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Damian Dailisan; Marissa Liponhay; Christian Alis; Christopher Monterola
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description of all features used in this work.

  6. Data and code from: Learning a deep language model for microbiomes: The...

    • zenodo.org
    • search.dataone.org
    • +2 more
    bin, zip
    Updated Jun 10, 2024
    Cite
    Quintin Pope; Rohan Varma; Christine Tataru; Maude David; Xiaoli Fern (2024). Data and code from: Learning a deep language model for microbiomes: The power of large scale unlabeled microbiome data [Dataset]. http://doi.org/10.5061/dryad.tb2rbp08p
    Available download formats: zip, bin
    Dataset updated
    Jun 10, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Quintin Pope; Rohan Varma; Christine Tataru; Maude David; Xiaoli Fern
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Measurement technique
    No additional raw data was collected for this project. All inputs are available publicly. American Gut Project, Halfvarson, and Schirmer raw data are available from the NCBI database (accession numbers PRJEB11419, PRJEB18471, and PRJNA398089, respectively). We used the curated data produced by Tataru and David, 2020 (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007859).
    Description

    We use open source human gut microbiome data to learn a microbial "language" model by adapting techniques from Natural Language Processing (NLP). Our microbial "language" model is trained in a self-supervised fashion (i.e., without additional external labels) to capture the interactions among different microbial species and the common compositional patterns in microbial communities. The learned model produces contextualized taxa representations that allow a single bacterial species to be represented differently according to the specific microbial environment it appears in. The model further provides a sample representation by collectively interpreting different bacterial species in the sample and their interactions as a whole. We show that, compared to baseline representations, our sample representation consistently leads to improved performance for multiple prediction tasks including predicting Inflammatory Bowel Disease (IBD) and diet patterns. Coupled with a simple ensemble strategy, it produces a highly robust IBD prediction model that generalizes well to microbiome data independently collected from different populations with substantial distribution shift.

    We visualize the contextualized taxa representations and find that they exhibit meaningful phylum-level structure, despite never exposing the model to such a signal. Finally, we apply an interpretation method to highlight bacterial species that are particularly influential in driving our model's predictions for IBD.

  7. Data_Sheet_2_Applications of Machine Learning in Human Microbiome Studies: A...

    • figshare.com
    • frontiersin.figshare.com
    docx
    Updated May 31, 2023
    Cite
    Laura Judith Marcos-Zambrano; Kanita Karaduzovic-Hadziabdic; Tatjana Loncar Turukalo; Piotr Przymus; Vladimir Trajkovik; Oliver Aasmets; Magali Berland; Aleksandra Gruca; Jasminka Hasic; Karel Hron; Thomas Klammsteiner; Mikhail Kolev; Leo Lahti; Marta B. Lopes; Victor Moreno; Irina Naskinova; Elin Org; Inês Paciência; Georgios Papoutsoglou; Rajesh Shigdel; Blaz Stres; Baiba Vilne; Malik Yousef; Eftim Zdravevski; Ioannis Tsamardinos; Enrique Carrillo de Santa Pau; Marcus J. Claesson; Isabel Moreno-Indias; Jaak Truu (2023). Data_Sheet_2_Applications of Machine Learning in Human Microbiome Studies: A Review on Feature Selection, Biomarker Identification, Disease Prediction and Treatment.docx [Dataset]. http://doi.org/10.3389/fmicb.2021.634511.s002
    Available download formats: docx
    Dataset updated
    May 31, 2023
    Dataset provided by
    Frontiers
    Authors
    Laura Judith Marcos-Zambrano; Kanita Karaduzovic-Hadziabdic; Tatjana Loncar Turukalo; Piotr Przymus; Vladimir Trajkovik; Oliver Aasmets; Magali Berland; Aleksandra Gruca; Jasminka Hasic; Karel Hron; Thomas Klammsteiner; Mikhail Kolev; Leo Lahti; Marta B. Lopes; Victor Moreno; Irina Naskinova; Elin Org; Inês Paciência; Georgios Papoutsoglou; Rajesh Shigdel; Blaz Stres; Baiba Vilne; Malik Yousef; Eftim Zdravevski; Ioannis Tsamardinos; Enrique Carrillo de Santa Pau; Marcus J. Claesson; Isabel Moreno-Indias; Jaak Truu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The number of microbiome-related studies has notably increased the availability of data on human microbiome composition and function. These studies provide the essential material to deeply explore host-microbiome associations and their relation to the development and progression of various complex diseases. Improved data-analytical tools are needed to exploit all information from these biological datasets, taking into account the peculiarities of microbiome data, i.e., compositional, heterogeneous and sparse nature of these datasets. The possibility of predicting host-phenotypes based on taxonomy-informed feature selection to establish an association between microbiome and predict disease states is beneficial for personalized medicine. In this regard, machine learning (ML) provides new insights into the development of models that can be used to predict outputs, such as classification and prediction in microbiology, infer host phenotypes to predict diseases and use microbial communities to stratify patients by their characterization of state-specific microbial signatures. Here we review the state-of-the-art ML methods and respective software applied in human microbiome studies, performed as part of the COST Action ML4Microbiome activities. This scoping review focuses on the application of ML in microbiome studies related to association and clinical use for diagnostics, prognostics, and therapeutics. Although the data presented here is more related to the bacterial community, many algorithms could be applied in general, regardless of the feature type. This literature and software review covering this broad topic is aligned with the scoping review methodology. 
    The manual identification of data sources has been complemented with: (1) automated publication search through digital libraries of the three major publishers using natural language processing (NLP) Toolkit, and (2) an automated identification of relevant software repositories on GitHub and ranking of the related research papers relying on learning to rank approach.

  8. HLA-class I alleles and supertypes.

    • plos.figshare.com
    xlsx
    Updated Jun 16, 2023
    Cite
    Joana Pissarra; Franck Dorkeld; Etienne Loire; Vincent Bonhomme; Denis Sereno; Jean-Loup Lemesre; Philippe Holzmuller (2023). HLA-class I alleles and supertypes. [Dataset]. http://doi.org/10.1371/journal.pone.0273494.s006
    Available download formats: xlsx
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Joana Pissarra; Franck Dorkeld; Etienne Loire; Vincent Bonhomme; Denis Sereno; Jean-Loup Lemesre; Philippe Holzmuller
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supertypes according to Sidney J. et al. 2008, BMC Immunology 9:1. (XLSX)

  9. Example proteins and validated epitopes present in the IEDB 3.0 database.

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 13, 2023
    Cite
    Joana Pissarra; Franck Dorkeld; Etienne Loire; Vincent Bonhomme; Denis Sereno; Jean-Loup Lemesre; Philippe Holzmuller (2023). Example proteins and validated epitopes present in the IEDB 3.0 database. [Dataset]. http://doi.org/10.1371/journal.pone.0273494.t001
    Available download formats: xls
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Joana Pissarra; Franck Dorkeld; Etienne Loire; Vincent Bonhomme; Denis Sereno; Jean-Loup Lemesre; Philippe Holzmuller
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example proteins and validated epitopes present in the IEDB 3.0 database.
