9 datasets found
  1. Gaussian Finder's cavity dataset in XML

    • figshare.com
    zip
    Updated Sep 28, 2019
    Cite
    Abel Gomes (2019). Gaussian Finder's cavity dataset in XML [Dataset]. http://doi.org/10.6084/m9.figshare.9916733.v1
    Available download formats
    zip
    Dataset updated
    Sep 28, 2019
    Dataset provided by
    figshare
    Authors
    Abel Gomes
    License

    https://www.gnu.org/licenses/gpl-3.0.html

    Description

    Gaussian Finder's cavity dataset in XML. This dataset describes the protein cavities output by a protein cavity detection method called Gaussian Finder. This method is described in the article available at: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1913-4

  2. Data from: TBGA: A Large-Scale Gene-Disease Association Dataset for...

    • data.europa.eu
    • data-staging.niaid.nih.gov
    unknown
    Updated Jan 26, 2022
    Cite
    Zenodo (2022). TBGA: A Large-Scale Gene-Disease Association Dataset for Biomedical Relation Extraction [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-5911097?locale=pl
    Available download formats
    unknown (24338684)
    Dataset updated
    Jan 26, 2022
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the TBGA dataset. TBGA is a large-scale, semi-automatically annotated dataset for Gene-Disease Association (GDA) extraction. The dataset consists of three text files, corresponding to train, validation, and test sets, plus an additional JSON file containing the mapping between relation names and IDs. Each record in train, validation, or test files corresponds to a single GDA extracted from a sentence. Records are represented as JSON objects with the following structure:

    • text: sentence from which the GDA was extracted.
    • relation: relation name associated with the given GDA.
    • h: JSON object representing the gene entity, composed of:
      ◦ id: NCBI Entrez ID associated with the gene entity.
      ◦ name: NCBI official gene symbol associated with the gene entity.
      ◦ pos: list consisting of starting position and length of the gene mention within text.
    • t: JSON object representing the disease entity, composed of:
      ◦ id: UMLS CUI associated with the disease entity.
      ◦ name: UMLS preferred term associated with the disease entity.
      ◦ pos: list consisting of starting position and length of the disease mention within text.

    TBGA contains over 200,000 instances and 100,000 bags. The zip file consists of one folder, named TBGA, containing the files corresponding to the dataset.

    If you use or extend our work, please cite the following: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-04646-6#citeas

    TBGA paper can be found at: https://rdcu.be/cKkY2

    TBGA code is available at: https://github.com/GDAMining/gda-extraction
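    The record structure described for TBGA can be illustrated with a minimal Python sketch. It assumes one JSON object per line and that pos holds a character offset followed by a length; neither detail is confirmed by the listing. The sample record, its identifiers, and the helper names mention and parse_record are made up for illustration and are not taken from the dataset.

```python
import json


def mention(text, pos):
    """Slice a mention out of text, reading pos as [start, length]."""
    start, length = pos
    return text[start:start + length]


def parse_record(line):
    """Parse one TBGA-style JSON record and recover both entity mentions."""
    record = json.loads(line)
    return {
        "relation": record["relation"],
        "gene_id": record["h"]["id"],
        "gene_mention": mention(record["text"], record["h"]["pos"]),
        "disease_id": record["t"]["id"],
        "disease_mention": mention(record["text"], record["t"]["pos"]),
    }


# Illustrative record only: the sentence, relation name, and IDs are
# assumptions made for this sketch, not values copied from TBGA.
SAMPLE_LINE = json.dumps({
    "text": "Mutations in BRCA1 are associated with breast cancer.",
    "relation": "example_relation",
    "h": {"id": "672", "name": "BRCA1", "pos": [13, 5]},
    "t": {"id": "C0006142", "name": "Malignant neoplasm of breast", "pos": [39, 13]},
})

parsed = parse_record(SAMPLE_LINE)
```

    If the real files store offsets differently (for example as token indices), only the mention helper would need to change.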

  3. PATH_SURVEYOR_ExampleUseCases

    • zenodo.org
    zip
    Updated Apr 7, 2024
    Cite
    Timothy Shaw (2024). PATH_SURVEYOR_ExampleUseCases [Dataset]. http://doi.org/10.5281/zenodo.10937799
    Available download formats
    zip
    Dataset updated
    Apr 7, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Timothy Shaw
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PATH SURVEYOR pipeline examples that were originally hosted at http://shawlab.science/shiny/PATH_SURVEYOR_ExampleUseCases/

    PATH SURVEYOR was originally presented in our publication (PMID: 37380943), available at https://pubmed.ncbi.nlm.nih.gov/37380943/ and https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-023-05393-y

    Please contact timothy.shaw@moffitt.org for any additional questions.

  4. Repositiry for the article: "Gene regulatory network inference using...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated May 25, 2023
    Cite
    Alain Mbebi; Zoran Nikoloski (2023). Repositiry for the article: "Gene regulatory network inference using mixed-norms regularized multivariate model with covariance selection" by Alain Mbebi & Zoran Nikoloski [Dataset]. http://doi.org/10.5281/zenodo.7965949
    Available download formats
    zip
    Dataset updated
    May 25, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alain Mbebi; Zoran Nikoloski
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the repository for the manuscript "Gene regulatory network inference using mixed-norms regularized multivariate model with covariance selection" by Alain J. Mbebi & Zoran Nikoloski.

    Organisation

    1. The folder Codes contains the following R scripts with the K-folds cross-validation option to learn the hyperparameters:
    • Mixed_L1L21_GRN.R which computes L1L21-solution
    • Mixed_L1L21G_GRN.R which computes L1L21G-solution
    • Mixed_L2L21_GRN.R which computes L2L21-solution
    • Mixed_L2L21G_GRN.R which computes L2L21G-solution
    • L1L21_Dream5_Scerevisiae_example_run.R, which is an example run using the L1L21-solution with S. cerevisiae data (Network 4 in the DREAM5 challenge)

    All files needed to successfully run "L1L21_Dream5_Scerevisiae_example_run" are located in the folder Codes.

    2. The folder Figures contains all figures in the manuscript.

    3. The folder Inferred-networks contains all network objects for each dataset and each inference method in the comparative analysis.

    Dependencies and required packages

    The following packages are required for the contending approaches in the comparative analysis: "devtools", "foreach", "plyr", "glmnet" and "randomForest".

    GENIE3

    The GENIE3 package can be installed from: http://bioconductor.org/packages/release/bioc/html/GENIE3.html

    TIGRESS

    The TIGRESS repository can be obtained from: https://github.com/jpvert/tigress

    ENNET

    The ENNET repository can be obtained from: https://github.com/slawekj/ennet

    PLSNET

    The Matlab source code of PLSNET can be obtained from: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1398-6#Sec17

    PORTIA

    The PORTIA repository can be obtained from: https://github.com/AntoinePassemiers/PORTIA

    D3GRN

    The Matlab source code of D3GRN can be obtained from: https://github.com/chenxofhit/D3GRN

    Fused-LASSO

    The fused-LASSO repository can be obtained from: https://github.com/omranian/inference-of-GRN-using-Fused-LASSO

    ANOVerence

    Because of some technical issues (e.g., the code's accessibility: http://www2.bio.ifi.lmu.de/~kueffner/anova.tar.gz), we were not able to reproduce the ANOVerence results and used the inferred network from the DREAM5 challenge instead.

    4. Although the code here was tested on Fedora 29 (Workstation Edition) using R (version 4.2.2), it can run under any Linux distribution or Windows version, as long as all the required packages are compatible with the desired R version.
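    The K-fold cross-validation option mentioned for the R scripts above can be illustrated generically. The sketch below is in Python rather than R and does not reproduce the authors' mixed-norm models: it swaps in a one-coefficient ridge regression as a stand-in, and every function name and the penalty grid are assumptions made for illustration. It only shows the general pattern of choosing a regularization hyperparameter by averaging held-out error over K folds.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge fit of y ~ w * x (no intercept), penalty lam * w**2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_error(xs, ys, lam, k=5, seed=0):
    """Mean squared held-out error of the ridge fit, averaged over k folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        mse = sum((ys[i] - w * xs[i]) ** 2 for i in held_out) / len(held_out)
        fold_errors.append(mse)
    return sum(fold_errors) / k


def select_lambda(xs, ys, grid, k=5, seed=0):
    """Return the penalty in grid with the lowest cross-validated error."""
    return min(grid, key=lambda lam: cv_error(xs, ys, lam, k, seed))
```

    The same folds are reused for every candidate penalty (fixed seed), so the candidates are compared on identical splits.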

  5. GENIA Bio-medical event dataset

    • kaggle.com
    zip
    Updated Dec 5, 2020
    Cite
    Nishanth (2020). GENIA Bio-medical event dataset [Dataset]. https://www.kaggle.com/nishanthsalian/genia-biomedical-event-dataset
    Available download formats
    zip (813625 bytes)
    Dataset updated
    Dec 5, 2020
    Authors
    Nishanth
    License

    http://www.gnu.org/licenses/agpl-3.0.html

    Description

    Context

    Bio-medical texts contain a great deal of information that can support developments in the medical field. Traditionally, domain experts extracted such information manually. Automating this information extraction task can help speed up progress in the field. To name a few use cases, bio-medical events can show the effects of drugs on a person and can help identify certain medical conditions. Hence, automating the extraction of events from bio-medical texts is very beneficial.

    Content

    The dataset is a simplified version of the event-annotated GENIA dataset, derived from the version available in TEES.

    It consists of the original bio-medical text, labelled trigger words, the location of each trigger word in the text, and the event type associated with the trigger word. There are 3 sets of data: train (8k+ sentences), devel (about 3k sentences) and test (about 3k sentences). Each set has 4 columns, namely "Sentence", "TriggerWord", "TriggerWordLoc" and "EventType", capturing the original bio-medical text, the trigger words in the sentence, the location of the trigger words in the sentence, and the event type associated with the trigger words, respectively.
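    The four-column layout described above could be read along these lines. This is a minimal sketch using only the Python standard library; the sample rows, the event type names, and the way TriggerWordLoc is encoded (a plain value here) are made-up assumptions, since the listing does not show actual rows or file names.

```python
import csv
import io
from collections import Counter

# Column names as given in the dataset description.
EXPECTED_COLUMNS = ["Sentence", "TriggerWord", "TriggerWordLoc", "EventType"]


def read_events(handle):
    """Read all rows from an open CSV handle after checking the header."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def event_type_counts(rows):
    """Count how often each event type occurs."""
    return Counter(row["EventType"] for row in rows)


# Made-up sample content standing in for one of the train/devel/test files.
SAMPLE_CSV = (
    "Sentence,TriggerWord,TriggerWordLoc,EventType\n"
    '"Example sentence one, with a comma.",induced,3,example_type_a\n'
    "Example sentence two.,binds,2,example_type_b\n"
)

rows = read_events(io.StringIO(SAMPLE_CSV))
counts = event_type_counts(rows)
```

    For a real file, the io.StringIO object would be replaced by a handle from open(path, newline="", encoding="utf-8").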

    Acknowledgements

    The dataset is a simplified version of the event-annotated GENIA dataset, derived from the version available in TEES. The original source dataset is from the BioNLP Shared Task 2011. A complete unprocessed version also seems to be present in the genia-event-2011 dataset.

    For TEES licensing information, please refer to this link. For GENIA dataset licensing information, please refer to the file "GE11-LICENSE" located beside the data files (.csv) in this Kaggle dataset.


  6. Data set 1 - Proteome set description

    • figshare.com
    txt
    Updated Mar 20, 2021
    Cite
    Mick Van Vlierberghe; Denis BAURAIN (2021). Data set 1 - Proteome set description [Dataset]. http://doi.org/10.6084/m9.figshare.13113893.v1
    Available download formats
    txt
    Dataset updated
    Mar 20, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Mick Van Vlierberghe; Denis BAURAIN
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
  7. TBGA: Gene Disease Association Data

    • kaggle.com
    zip
    Updated May 13, 2024
    Cite
    Aravind_S (2024). TBGA: Gene Disease Association Data [Dataset]. https://www.kaggle.com/datasets/aravind012/tbga-gene-disease-association-data
    Available download formats
    zip (24889642 bytes)
    Dataset updated
    May 13, 2024
    Authors
    Aravind_S
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the TBGA dataset. TBGA is a large-scale, semi-automatically annotated dataset for Gene-Disease Association (GDA) extraction. The dataset consists of three text files, corresponding to train, validation, and test sets, plus an additional JSON file containing the mapping between relation names and IDs. Each record in train, validation, or test files corresponds to a single GDA extracted from a sentence. Records are represented as JSON objects with the following structure:

    • text: sentence from which the GDA was extracted.
    • relation: relation name associated with the given GDA.
    • h: JSON object representing the gene entity, composed of:
      ◦ id: NCBI Entrez ID associated with the gene entity.
      ◦ name: NCBI official gene symbol associated with the gene entity.
      ◦ pos: list consisting of starting position and length of the gene mention within text.
    • t: JSON object representing the disease entity, composed of:
      ◦ id: UMLS CUI associated with the disease entity.
      ◦ name: UMLS preferred term associated with the disease entity.
      ◦ pos: list consisting of starting position and length of the disease mention within text.

    TBGA contains over 200,000 instances and 100,000 bags. The zip file consists of one folder, named TBGA, containing the files corresponding to the dataset.
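    The additional JSON file that maps relation names to IDs suggests a simple label-encoding step. The sketch below assumes the mapping is a flat JSON object from relation name to integer ID, which the listing does not confirm; the relation names, the records, and the function names are placeholders invented for illustration.

```python
import json
from collections import Counter


def load_relation_ids(mapping_json):
    """Parse a JSON object mapping relation names to IDs and check uniqueness."""
    mapping = json.loads(mapping_json)
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("relation IDs are not unique")
    return mapping


def encode_relations(records, relation_to_id):
    """Replace each record's relation name with its ID."""
    return [relation_to_id[record["relation"]] for record in records]


def relation_counts(records):
    """Count how many records carry each relation name."""
    return Counter(record["relation"] for record in records)


# Placeholder mapping and records, not taken from the TBGA files.
EXAMPLE_MAPPING = '{"relation_a": 0, "relation_b": 1}'
EXAMPLE_RECORDS = [
    {"relation": "relation_a"},
    {"relation": "relation_b"},
    {"relation": "relation_a"},
]

relation_to_id = load_relation_ids(EXAMPLE_MAPPING)
encoded = encode_relations(EXAMPLE_RECORDS, relation_to_id)
```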

    If you use or extend our work, please cite the following: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-04646-6#citeas

    TBGA paper can be found at: https://rdcu.be/cKkY2

    TBGA code is available at: https://github.com/GDAMining/gda-extraction

    Keeping Citation here because I don't know where else to keep it.

    Cite all versions: you can cite all versions by using the DOI 10.5281/zenodo.5911096. This DOI represents all versions and will always resolve to the latest one.

    Data set is taken from https://zenodo.org/records/5911097

  8. KVFinder's cavity dataset in CSV

    • figshare.com
    zip
    Updated Sep 28, 2019
    Cite
    Abel Gomes (2019). KVFinder's cavity dataset in CSV [Dataset]. http://doi.org/10.6084/m9.figshare.9917012.v1
    Available download formats
    zip
    Dataset updated
    Sep 28, 2019
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Abel Gomes
    License

    https://www.gnu.org/licenses/gpl-3.0.html

    Description

    KVFinder's cavity dataset in CSV. This dataset describes the protein cavities output by a protein cavity detection method called KVFinder. This method is described in the article available at: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-197

  9. CT-EBM-SP vs 2 - Corpus of Clinical Trials for Evidence-Based-Medicine in...

    • data-staging.niaid.nih.gov
    Updated Oct 2, 2024
    Cite
    Campillos-Llanos, Leonardo; Valverde-Mateos, Ana; Capllónch-Carrión, Adrián (2024). CT-EBM-SP vs 2 - Corpus of Clinical Trials for Evidence-Based-Medicine in Spanish (version 2) [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_13880598
    Dataset updated
    Oct 2, 2024
    Dataset provided by
    Consejo Superior de Investigaciones Científicas
    Centro de Salud Mental Retiro
    Medical Terminology Unit, Spanish Royal Academy of Medicine
    Authors
    Campillos-Llanos, Leonardo; Valverde-Mateos, Ana; Capllónch-Carrión, Adrián
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    A collection of 1200 texts (292173 tokens) about clinical trials studies and clinical trials announcements in Spanish:

    • 500 abstracts from journals published under a Creative Commons license, e.g. available in PubMed or the Scientific Electronic Library Online (SciELO).
    • 700 clinical trials announcements published in the European Clinical Trials Register and Repositorio Español de Estudios Clínicos.

    Texts were annotated with the following entities types:

    • Semantic groups from the Unified Medical Language System:
      ◦ ANAT: anatomy
      ◦ CHEM: pharmacological and chemical substances
      ◦ DEVI: medical devices
      ◦ DISO: pathologic conditions
      ◦ LIVB: living beings, including human beings
      ◦ PHYS: physiological processes
      ◦ PROC: lab tests, diagnostic or therapeutic procedures
    • Medical drug information:
      ◦ Contraindicated: a contraindicated drug or treatment
      ◦ Dose: dose or strength
      ◦ Form: dosage form
      ◦ Route: administration route or mode
    • Temporal expressions:
      ◦ Age
      ◦ Date
      ◦ Duration
      ◦ Frequency
      ◦ Time
    • Miscellaneous medical entities:
      ◦ Concept: abstract concepts, statistical tests or measurement scales
      ◦ Food: foods or drinks
      ◦ Observation: medical observations or clinical findings
      ◦ Quantifier_or_Qualifier: quantifier or qualifier adjective
      ◦ Result_or_Value: result or value of a measurement, laboratory analysis or procedure
    • Negation/Speculation:
      ◦ Neg_cue: negation cue
      ◦ Negated: negated event
      ◦ Spec_cue: speculation cue
      ◦ Speculated: speculated or uncertain event
    • Attributes:
      ◦ Temporality:
        - History_of: past event
        - Future: future event
      ◦ Experiencer:
        - Patient: patient or participant in a clinical trial
        - Family_member
        - Other: a person other than the patient or a family member

    86 389 entities and 16 590 attributes were annotated. 10% of the corpus was doubly annotated, and high inter-annotator agreement (IAA) values were achieved: F1-score = 0.84 for entities and F1-score = 0.88 for attributes (both in strict match).
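    A strict-match F1-score between two annotators is commonly computed by counting only annotations whose span boundaries and label are identical in both sets. The sketch below shows that general calculation under this assumption; it is not the authors' evaluation script, and the (start, end, label) tuples and the function name are illustrative.

```python
def strict_f1(annotations_a, annotations_b):
    """F1 between two annotation sets, matching only identical (start, end, label)."""
    set_a, set_b = set(annotations_a), set(annotations_b)
    if not set_a and not set_b:
        return 1.0  # two empty annotation sets agree trivially
    matched = len(set_a & set_b)
    if matched == 0:
        return 0.0
    precision = matched / len(set_b)  # share of B's annotations also found in A
    recall = matched / len(set_a)     # share of A's annotations also found in B
    return 2 * precision * recall / (precision + recall)


# Made-up annotations: the third span differs by one character, so it does
# not count as a strict match, and annotator B has one extra entity.
annotator_a = [(0, 5, "DISO"), (10, 14, "CHEM"), (20, 25, "PROC")]
annotator_b = [(0, 5, "DISO"), (10, 14, "CHEM"), (20, 26, "PROC"), (30, 33, "ANAT")]

agreement = strict_f1(annotator_a, annotator_b)
```

    Because F1 is symmetric in precision and recall, swapping the two annotators gives the same value.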

    The dataset includes the texts and annotations used for the human evaluation of the medical named entity tool:

    • 100 clinical trial announcements from EudraCT not used for system development: we provide files of the version revised by medical professionals (Reference folder).
    • 100 clinical cases with Creative Commons license: we provide the files revised by medical professionals (Reference folder). These data come from:
      ◦ Urgencias Bidasoa (https://urgenciasbidasoa.wordpress.com/casos-clinicos-3/)
      ◦ Hipocampo.org (https://www.hipocampo.org/)
      ◦ Cases published by Sociedad Andaluza de Medicina Familiar y Comunitaria (SAMFyC): we are very grateful for the permission to use these cases, and we acknowledge that the copyright of the contents belongs to their authors. Clinical cases were extracted from books published from 2016 to 2022 (https://www.samfyc.es/tipos-publicacion/publicaciones/). If you use these data, please acknowledge the authors' copyright and intellectual property rights over the contents.

    The dataset is freely distributed for research and educational purposes under a Creative Commons Non-Commercial Attribution (CC-BY-NC-A) License.

    If you use the CT-EBM-SP vs. 2 dataset, please cite as follows:

    Campillos-Llanos, L., A. Valverde-Mateos & A. Capllonch-Carrion (2024) Hybrid natural language processing tool for semantic annotation of medical texts in Spanish. BMC Bioinformatics. BioMed Central.
