9 datasets found

Table_1_Structured data vs. unstructured data in machine learning prediction...
frontiersin.figshare.com
figshare.com
xlsx
Updated Jun 1, 2023
Danielle Hopkins; Debra J. Rickwood; David J. Hallford; Clare Watsford (2023). Table_1_Structured data vs. unstructured data in machine learning prediction models for suicidal behaviors: A systematic review and meta-analysis.xlsx [Dataset]. http://doi.org/10.3389/fdgth.2022.945006.s001
Available download formats
xlsx
Unique identifier
https://doi.org/10.3389/fdgth.2022.945006.s001
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers
Authors
Danielle Hopkins; Debra J. Rickwood; David J. Hallford; Clare Watsford
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Suicide remains a leading cause of preventable death worldwide, despite advances in research and reductions in mental health stigma through government health campaigns. Machine learning (ML), a type of artificial intelligence (AI), uses algorithms to simulate and imitate human cognition. Given the lack of improvement in clinician-based suicide prediction over time, advances in technology have enabled novel approaches to predicting suicide risk. This systematic review and meta-analysis aimed to synthesize current research on data sources in ML prediction of suicide risk, comparing outcomes between structured data (human interpretable, such as psychometric instruments) and unstructured data (only machine interpretable, such as electronic health records). Online databases and gray literature were searched for studies relating to ML and suicide risk prediction, and 31 studies were eligible. The pooled outcome for all studies combined was AUC = 0.860; structured data showed AUC = 0.873 and unstructured data AUC = 0.866. There was substantial heterogeneity between the studies, the sources of which could not be identified. Overall, the studies showed good accuracy in predicting suicide risk behavior. According to the meta-analysis, structured and unstructured data also showed similar accuracy, despite differences in the volume and type of input data.
Amount of data created, consumed, and stored 2010-2023, with forecasts to...
statista.com
Updated Nov 21, 2024
Statista (2024). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
Dataset updated
Nov 21, 2024
Dataset authored and provided by
Statista, http://statista.com/
Time period covered
May 2024
Area covered
Worldwide
Description
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching 149 zettabytes in 2024. By 2028, global data creation is projected to exceed 394 zettabytes. In 2020, the amount of data created and replicated reached a new high. Growth was higher than previously expected, driven by increased demand during the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

Storage capacity also growing
Only a small percentage of this newly created data is retained, however: just two percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth in data volume, the installed base of storage capacity is forecast to increase at a compound annual growth rate of 19.2 percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached 6.7 zettabytes.
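The description gives growth rates and base-year values but no end-of-period storage figure. As a rough illustration only (this is simple arithmetic on the figures above, not a Statista forecast), assume the 19.2 percent rate compounds annually from the 2020 installed base of 6.7 zettabytes, and treat 2024 to 2028 as four annual compounding steps for data creation:

```latex
\begin{aligned}
C_{2025} &= C_{2020}\,(1 + r)^{5}
          = 6.7\,\text{ZB} \times 1.192^{5}
          \approx 6.7\,\text{ZB} \times 2.406
          \approx 16.1\,\text{ZB},\\[4pt]
r_{\text{creation}} &= \left(\frac{394}{149}\right)^{1/4} - 1
          \approx 1.275 - 1
          \approx 27.5\%\ \text{per year (2024 to 2028; a lower bound, since the forecast is ``more than'' 394 ZB)}.
\end{aligned}
```

Read together, and under these assumptions, the two rates suggest that data creation would be growing faster than storage capacity. That would be consistent with the small share of data that is retained.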
Strategies for Controlling Non-Transmissible Infection Outbreaks Using a...
plos.figshare.com
pdf
Updated Jun 1, 2023
Penelope A. Hancock; Yasmin Rehman; Ian M. Hall; Obaghe Edeghere; Leon Danon; Thomas A. House; Matthew J. Keeling (2023). Strategies for Controlling Non-Transmissible Infection Outbreaks Using a Large Human Movement Data Set [Dataset]. http://doi.org/10.1371/journal.pcbi.1003809
Available download formats
pdf
Unique identifier
https://doi.org/10.1371/journal.pcbi.1003809
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS, http://plos.org/
Authors
Penelope A. Hancock; Yasmin Rehman; Ian M. Hall; Obaghe Edeghere; Leon Danon; Thomas A. House; Matthew J. Keeling
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Prediction and control of the spread of infectious disease in human populations benefit greatly from our growing capacity to quantify human movement behavior. Here we develop a mathematical model for non-transmissible infections contracted from a localized environmental source, informed by a detailed description of the movement patterns of the population of Great Britain. The model is applied to outbreaks of Legionnaires' disease, a potentially life-threatening form of pneumonia caused by the bacterium Legionella pneumophila. We use case-report data from three recent outbreaks in Great Britain for which public health agencies had already identified the source. We first demonstrate that the amount of individual-level heterogeneity incorporated in the movement data greatly influences our ability to predict the source location. The most accurate predictions were obtained using reported travel histories to describe the movements of infected individuals, but using detailed simulation models to estimate movement patterns offers an effective, fast alternative. Second, once the source is identified, we show that our model can accurately determine the population likely to have been exposed to the pathogen, and hence predict the residential locations of infected individuals. The results give rise to an effective control strategy that can be implemented rapidly in response to an outbreak.
Results of Gene Families (gf) and Biological Processes (GO-BP) enrichment.
figshare.com
plos.figshare.com
xlsx
Updated Oct 9, 2024
Ilaria Granata; Lucia Maddalena; Mario Manzo; Mario Rosario Guarracino; Maurizio Giordano (2024). Results of Gene Families (gf) and Biological Processes (GO-BP) enrichment. [Dataset]. http://doi.org/10.1371/journal.pcbi.1012076.s004
Available download formats
xlsx
Unique identifier
https://doi.org/10.1371/journal.pcbi.1012076.s004
Dataset updated
Oct 9, 2024
Dataset provided by
PLOS Computational Biology
Authors
Ilaria Granata; Lucia Maddalena; Mario Manzo; Mario Rosario Guarracino; Maurizio Giordano
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The gene families (gf) enrichment was obtained by downloading gene family annotations from https://www.genenames.org/download/statistics-and-files/ and applying a hypergeometric test (R version 4.1.2), using all the genes in the DepMap matrix as the background. The GO-BP enrichment was instead performed using the DAVID Bioinformatics tool (https://david.ncifcrf.gov/tools.jsp). Each sheet is named after its content, following the pattern “tissue_enrichment_class”. The content of the columns is detailed in each sheet. (XLSX)
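The description names a hypergeometric test against the DepMap background but does not give the formula. Below is a minimal sketch of a one-sided over-representation test. It is not the authors' R code, and the function name `hypergeom_enrichment_p` and the toy counts are hypothetical. It computes the probability of observing at least k family members in a query set of n genes. The query set is treated as a random draw from a background of N genes, K of which belong to the family.

```python
from math import comb


def hypergeom_enrichment_p(k: int, N: int, K: int, n: int) -> float:
    """One-sided (over-representation) hypergeometric p-value, P(X >= k).

    Hypothetical helper for illustration only.
    N: background size (e.g. all genes in the DepMap matrix)
    K: background genes annotated to one gene family
    n: size of the query gene set being tested
    k: observed overlap, i.e. query genes that belong to the family
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(K, n)):
        raise ValueError("counts are inconsistent with a hypergeometric draw")
    # Exact integer arithmetic: count the ways to draw n genes containing i family
    # members, summed over every overlap i at least as large as the observed k.
    # math.comb returns 0 when the lower index exceeds the upper one, so impossible
    # splits (n - i > N - K) contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Toy counts (not from the dataset): 20 background genes, 5 in the family,
    # a query set of 4 genes, 2 of which are family members.
    print(hypergeom_enrichment_p(2, N=20, K=5, n=4))  # 1205/4845, about 0.2487
```

In R, the same upper tail is commonly computed with `phyper(k - 1, K, N - K, n, lower.tail = FALSE)`. The description does not state which call, tail, or multiple-testing correction the authors used.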
Description of all features used in this work.
plos.figshare.com
xls
Updated Jun 16, 2023
Damian Dailisan; Marissa Liponhay; Christian Alis; Christopher Monterola (2023). Description of all features used in this work. [Dataset]. http://doi.org/10.1371/journal.pone.0265771.t001
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0265771.t001
Dataset updated
Jun 16, 2023
Dataset provided by
PLOS ONE
Authors
Damian Dailisan; Marissa Liponhay; Christian Alis; Christopher Monterola
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Description of all features used in this work.
Data and code from: Learning a deep language model for microbiomes: The...
zenodo.org
search.dataone.org
+2 more
bin, zip
Updated Jun 10, 2024
Quintin Pope; Rohan Varma; Christine Tataru; Maude David; Xiaoli Fern (2024). Data and code from: Learning a deep language model for microbiomes: The power of large scale unlabeled microbiome data [Dataset]. http://doi.org/10.5061/dryad.tb2rbp08p
Available download formats
zip, bin
Unique identifier
https://doi.org/10.5061/dryad.tb2rbp08p
Dataset updated
Jun 10, 2024
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Quintin Pope; Rohan Varma; Christine Tataru; Maude David; Xiaoli Fern
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Measurement technique
No additional raw data was collected for this project. All inputs are available publicly. American Gut Project, Halfvarson, and Schirmer raw data are available from the NCBI database (accession numbers PRJEB11419, PRJEB18471, and PRJNA398089, respectively). We used the curated data produced by Tataru and David, 2020 (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007859).
Description
We use open source human gut microbiome data to learn a microbial "language" model by adapting techniques from natural language processing (NLP). Our microbial "language" model is trained in a self-supervised fashion (i.e., without additional external labels) to capture the interactions among different microbial species and the common compositional patterns in microbial communities. The learned model produces contextualized taxa representations that allow a single bacterial species to be represented differently according to the specific microbial environment in which it appears. The model further provides a sample representation by collectively interpreting the different bacterial species in the sample and their interactions as a whole. We show that, compared to baseline representations, our sample representation consistently leads to improved performance on multiple prediction tasks, including predicting inflammatory bowel disease (IBD) and diet patterns. Coupled with a simple ensemble strategy, it produces a highly robust IBD prediction model that generalizes well to microbiome data independently collected from different populations with substantial distribution shift.

We visualize the contextualized taxa representations and find that they exhibit meaningful phylum-level structure, despite never exposing the model to such a signal. Finally, we apply an interpretation method to highlight bacterial species that are particularly influential in driving our model's predictions for IBD.
Data_Sheet_2_Applications of Machine Learning in Human Microbiome Studies: A...
figshare.com
frontiersin.figshare.com
docx
Updated May 31, 2023
Laura Judith Marcos-Zambrano; Kanita Karaduzovic-Hadziabdic; Tatjana Loncar Turukalo; Piotr Przymus; Vladimir Trajkovik; Oliver Aasmets; Magali Berland; Aleksandra Gruca; Jasminka Hasic; Karel Hron; Thomas Klammsteiner; Mikhail Kolev; Leo Lahti; Marta B. Lopes; Victor Moreno; Irina Naskinova; Elin Org; Inês Paciência; Georgios Papoutsoglou; Rajesh Shigdel; Blaz Stres; Baiba Vilne; Malik Yousef; Eftim Zdravevski; Ioannis Tsamardinos; Enrique Carrillo de Santa Pau; Marcus J. Claesson; Isabel Moreno-Indias; Jaak Truu (2023). Data_Sheet_2_Applications of Machine Learning in Human Microbiome Studies: A Review on Feature Selection, Biomarker Identification, Disease Prediction and Treatment.docx [Dataset]. http://doi.org/10.3389/fmicb.2021.634511.s002
Available download formats
docx
Unique identifier
https://doi.org/10.3389/fmicb.2021.634511.s002
Dataset updated
May 31, 2023
Dataset provided by
Frontiers
Authors
Laura Judith Marcos-Zambrano; Kanita Karaduzovic-Hadziabdic; Tatjana Loncar Turukalo; Piotr Przymus; Vladimir Trajkovik; Oliver Aasmets; Magali Berland; Aleksandra Gruca; Jasminka Hasic; Karel Hron; Thomas Klammsteiner; Mikhail Kolev; Leo Lahti; Marta B. Lopes; Victor Moreno; Irina Naskinova; Elin Org; Inês Paciência; Georgios Papoutsoglou; Rajesh Shigdel; Blaz Stres; Baiba Vilne; Malik Yousef; Eftim Zdravevski; Ioannis Tsamardinos; Enrique Carrillo de Santa Pau; Marcus J. Claesson; Isabel Moreno-Indias; Jaak Truu
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The growing number of microbiome-related studies has notably increased the availability of data on human microbiome composition and function. These studies provide the essential material to explore host-microbiome associations in depth, together with their relation to the development and progression of various complex diseases. Improved data-analytical tools are needed to exploit all the information in these biological datasets, taking into account the peculiarities of microbiome data, i.e., their compositional, heterogeneous, and sparse nature. Predicting host phenotypes from taxonomy-informed feature selection, both to establish associations with the microbiome and to predict disease states, is beneficial for personalized medicine. In this regard, machine learning (ML) provides new insights into the development of models that can be used for classification and prediction in microbiology, to infer host phenotypes and predict diseases, and to use microbial communities to stratify patients by their state-specific microbial signatures. Here we review the state-of-the-art ML methods and respective software applied in human microbiome studies, performed as part of the COST Action ML4Microbiome activities. This scoping review focuses on the application of ML in microbiome studies related to association and clinical use for diagnostics, prognostics, and therapeutics. Although the data presented here relate mainly to bacterial communities, many of the algorithms could be applied in general, regardless of feature type. This literature and software review of a broad topic follows the scoping review methodology.
The manual identification of data sources has been complemented with: (1) an automated publication search through the digital libraries of the three major publishers using a natural language processing (NLP) toolkit, and (2) an automated identification of relevant software repositories on GitHub, with the related research papers ranked using a learning-to-rank approach.
HLA-class I alleles and supertypes.
plos.figshare.com
xlsx
Updated Jun 16, 2023
Joana Pissarra; Franck Dorkeld; Etienne Loire; Vincent Bonhomme; Denis Sereno; Jean-Loup Lemesre; Philippe Holzmuller (2023). HLA-class I alleles and supertypes. [Dataset]. http://doi.org/10.1371/journal.pone.0273494.s006
Available download formats
xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0273494.s006
Dataset updated
Jun 16, 2023
Dataset provided by
PLOS, http://plos.org/
Authors
Joana Pissarra; Franck Dorkeld; Etienne Loire; Vincent Bonhomme; Denis Sereno; Jean-Loup Lemesre; Philippe Holzmuller
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Supertypes according to Sidney J. et al., 2008, BMC Immunology 9:1. (XLSX)
Example proteins and validated epitopes present in the IEDB 3.0 database.
plos.figshare.com
figshare.com
xls
Updated Jun 13, 2023
Joana Pissarra; Franck Dorkeld; Etienne Loire; Vincent Bonhomme; Denis Sereno; Jean-Loup Lemesre; Philippe Holzmuller (2023). Example proteins and validated epitopes present in the IEDB 3.0 database. [Dataset]. http://doi.org/10.1371/journal.pone.0273494.t001
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0273494.t001
Dataset updated
Jun 13, 2023
Dataset provided by
PLOS, http://plos.org/
Authors
Joana Pissarra; Franck Dorkeld; Etienne Loire; Vincent Bonhomme; Denis Sereno; Jean-Loup Lemesre; Philippe Holzmuller
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Example proteins and validated epitopes present in the IEDB 3.0 database.