10 datasets found
  1. Medical Q&A Dataset

    • kaggle.com
    • huggingface.co
    zip
    Updated Dec 1, 2023
    Cite
    gvaldenebro (2023). Medical Q&A Dataset [Dataset]. https://www.kaggle.com/gvaldenebro/cancer-q-and-a-dataset
    Explore at:
    zip (10303223 bytes)
    Dataset updated
    Dec 1, 2023
    Authors
    gvaldenebro
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset was scraped from the MedQuAD repository and then converted to a CSV file. It contains only the questions and answers related to cancer.

    MedQuAD: Medical Question Answering Dataset

    MedQuAD includes 47,457 medical question-answer pairs created from 12 NIH websites (e.g. cancer.gov, niddk.nih.gov, GARD, MedlinePlus Health Topics). The collection covers 37 question types (e.g. Treatment, Diagnosis, Side Effects) associated with diseases, drugs and other medical entities such as tests.

    We included additional annotations in the XML files that could be used for diverse IR and NLP tasks, such as the question type, the question focus, its synonyms, its UMLS Concept Unique Identifier (CUI) and Semantic Type. We added the category of the question focus (Disease, Drug or Other) in the 4 MedlinePlus collections. All other collections are about diseases.

    The paper cited below describes the collection, the construction method as well as its use and evaluation within a medical question answering system.

    N.B. We removed the answers from 3 subsets to respect the MedlinePlus copyright (https://medlineplus.gov/copyright.html):
    (1) A.D.A.M. Medical Encyclopedia, (2) MedlinePlus Drug information, and (3) MedlinePlus Herbal medicine and supplement information. -- We kept all the other information including the URLs in case you want to crawl the answers. Please contact me if you have any questions.

    QA Test Collection

    We used the test questions of the TREC-2017 LiveQA medical task: https://github.com/abachaa/LiveQA_MedicalTask_TREC2017/tree/master/TestDataset.

    As described in our BMC paper, we have manually judged the answers retrieved by the IR and QA systems from the MedQuAD collection. We used the same judgment scores as the LiveQA Track: 1-Incorrect, 2-Related, 3-Incomplete, and 4-Excellent. -- Format of the qrels file: Question_ID judgment Answer_ID

    The QA test collection contains 2,479 judged answers that can be used to evaluate the performance of IR & QA systems on the LiveQA-Med test questions: https://github.com/abachaa/MedQuAD/blob/master/QA-TestSet-LiveQA-Med-Qrels-2479-Answers.zip
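    The qrels format described above (Question_ID judgment Answer_ID) can be read with a few lines of Python. This is a minimal sketch, assuming whitespace-separated fields and a judgment written either as a bare score ("4") or as a score with its label ("4-Excellent"); the sample IDs are hypothetical and are not taken from the real qrels file.

    ```python
    from collections import defaultdict

    # Judgment scale quoted in the description (same as the LiveQA Track).
    LABELS = {1: "Incorrect", 2: "Related", 3: "Incomplete", 4: "Excellent"}


    def parse_qrels(lines):
        """Group (answer_id, score) pairs by question ID.

        Assumes whitespace-separated fields in the order
        Question_ID judgment Answer_ID (the exact delimiter is an assumption).
        """
        by_question = defaultdict(list)
        for line in lines:
            fields = line.split()
            if len(fields) != 3:
                continue  # skip blank or malformed lines
            question_id, judgment, answer_id = fields
            # Accept "4" as well as "4-Excellent".
            score = int(judgment.split("-")[0])
            by_question[question_id].append((answer_id, score))
        return dict(by_question)


    # Hypothetical sample lines for illustration only.
    sample = ["Q1 4 A10", "Q1 2-Related A11", "Q2 1 A20"]
    qrels = parse_qrels(sample)
    ```

    Grouping by question ID keeps all judged answers for one test question together, which is the shape most IR evaluation code expects.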

    Reference

    If you use the MedQuAD dataset and/or the collection of 2,479 judged answers, please cite the following paper: "A Question-Entailment Approach to Question Answering". Asma Ben Abacha and Dina Demner-Fushman. BMC Bioinformatics, 2019.

    @ARTICLE{BenAbacha-BMC-2019,
      author  = {Asma {Ben Abacha} and Dina Demner{-}Fushman},
      title   = {A Question-Entailment Approach to Question Answering},
      journal = {{BMC} Bioinform.},
      volume  = {20},
      number  = {1},
      pages   = {511:1--511:23},
      year    = {2019},
      url     = {https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3119-4}
    }
    

    License - The MedQuAD dataset is published under a Creative Commons Attribution 4.0 International Licence (CC BY). https://creativecommons.org/licenses/by/4.0/

  2. Cell Segmentation Data

    • kaggle.com
    zip
    Updated Sep 15, 2022
    Cite
    Ameer Hamza (2022). Cell Segmentation Data [Dataset]. https://www.kaggle.com/datasets/ameerhamza0311/cell-segmentation-data
    Explore at:
    zip (943037 bytes)
    Dataset updated
    Sep 15, 2022
    Authors
    Ameer Hamza
    Description

    Feature data for 2019 training and test cells. The structure of the data table is as follows. The first column contains unique cell identifiers. The first row contains headers, including:

    "Case": indicates whether the cell was randomly assigned to the training or test set.

    "Class": the true class of the cell by human review, either poorly segmented (PS) or well-segmented (WS).

    The remaining 116 columns contain the feature data for each cell. A total of ~190 features were generated by the Cellomics Morphology Explorer software, but only the subset of 116 features with complete data for every cell was included in our analysis. (CSV 2 MB)

    SOURCE: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-340
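    As a rough illustration of the table layout described above, the sketch below tallies cells by their "Case" and "Class" columns using only the Python standard library. The "Cell" header, the feature column names, and the "Train"/"Test" values of Case are assumptions made for the example; only the "Case" and "Class" headers and the PS/WS labels come from the description.

    ```python
    import csv
    import io
    from collections import Counter

    # Hypothetical three-row excerpt standing in for the real CSV file.
    sample = io.StringIO(
        "Cell,Case,Class,Feature1,Feature2\n"
        "1,Train,PS,0.5,1.2\n"
        "2,Test,WS,0.7,0.9\n"
        "3,Train,WS,0.1,2.0\n"
    )

    # Count how many cells fall into each (Case, Class) combination.
    counts = Counter()
    for row in csv.DictReader(sample):
        counts[(row["Case"], row["Class"])] += 1
    ```

    To run this against the downloaded file, replace the in-memory sample with an opened file handle and check the actual header names first.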

  3. Data from: TBGA: A Large-Scale Gene-Disease Association Dataset for...

    • data-staging.niaid.nih.gov
    • nde-dev.biothings.io
    Updated Apr 1, 2022
    + more versions
    Cite
    Stefano Marchesin; Gianmaria Silvello (2022). TBGA: A Large-Scale Gene-Disease Association Dataset for Biomedical Relation Extraction [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_5911096
    Explore at:
    Dataset updated
    Apr 1, 2022
    Dataset provided by
    University of Padua
    Authors
    Stefano Marchesin; Gianmaria Silvello
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the TBGA dataset. TBGA is a large-scale, semi-automatically annotated dataset for Gene-Disease Association (GDA) extraction. The dataset consists of three text files, corresponding to train, validation, and test sets, plus an additional JSON file containing the mapping between relation names and IDs. Each record in train, validation, or test files corresponds to a single GDA extracted from a sentence. Records are represented as JSON objects with the following structure:

    text: sentence from which the GDA was extracted.

    relation: relation name associated with the given GDA.

    h: JSON object representing the gene entity, composed of:

    id: NCBI Entrez ID associated with the gene entity.

    name: NCBI official gene symbol associated with the gene entity.

    pos: list consisting of starting position and length of the gene mention within text.

    t: JSON object representing the disease entity, composed of:

    id: UMLS CUI associated with the disease entity.

    name: UMLS preferred term associated with the disease entity.

    pos: list consisting of starting position and length of the disease mention within text.

    TBGA contains over 200,000 instances and 100,000 bags. The zip file consists of one folder, named TBGA, containing the files corresponding to the dataset.
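    The record structure above can be illustrated with a short Python sketch that recovers the gene and disease mentions from a record's pos fields. It assumes each record is one JSON object per line and that pos holds a zero-based character offset followed by a length; the sentence, IDs, and relation name below are hypothetical and are not taken from TBGA.

    ```python
    import json


    def mention(text, pos):
        """Return the substring described by pos = [start, length]."""
        start, length = pos
        return text[start:start + length]


    # Hypothetical record following the documented field layout.
    line = json.dumps({
        "text": "BRCA1 mutations are linked to breast cancer.",
        "relation": "example_relation",
        "h": {"id": "0000", "name": "BRCA1", "pos": [0, 5]},
        "t": {"id": "C0000000", "name": "breast cancer", "pos": [30, 13]},
    })

    record = json.loads(line)
    gene = mention(record["text"], record["h"]["pos"])
    disease = mention(record["text"], record["t"]["pos"])
    ```

    Checking that the extracted spans match the expected mentions is a quick way to confirm the offset convention before training on the data.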

    If you use or extend our work, please cite the following: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-04646-6#citeas TBGA paper can be found at: https://rdcu.be/cKkY2 TBGA code is available at: https://github.com/GDAMining/gda-extraction

  4. KVFinder's cavity dataset in CSV

    • figshare.com
    zip
    Updated Sep 28, 2019
    Cite
    Abel Gomes (2019). KVFinder's cavity dataset in CSV [Dataset]. http://doi.org/10.6084/m9.figshare.9917012.v1
    Explore at:
    zip
    Dataset updated
    Sep 28, 2019
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Abel Gomes
    License

    https://www.gnu.org/licenses/gpl-3.0.html

    Description

    KVFinder's cavity dataset in CSV. This dataset describes the protein cavities output by KVFinder, a protein cavity detection method. The method is described in the article available at: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-197

  5. Repositiry for the article: "Gene regulatory network inference using...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1 more
    Updated May 25, 2023
    Cite
    Alain Mbebi; Zoran Nikoloski (2023). Repositiry for the article: "Gene regulatory network inference using mixed-norms regularized multivariate model with covariance selection" by Alain Mbebi & Zoran Nikoloski [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7965948
    Explore at:
    Dataset updated
    May 25, 2023
    Dataset provided by
    Bioinformatics Department, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam-Golm, Germany // Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
    Authors
    Alain Mbebi; Zoran Nikoloski
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the repository for the manuscript "Gene regulatory network inference using mixed-norms regularized multivariate model with covariance selection" by Alain J. Mbebi & Zoran Nikoloski.

    Organisation

    The folder Codes contains the following R scripts with the K-folds cross-validation option to learn the hyperparameters:

    Mixed_L1L21_GRN.R which computes L1L21-solution

    Mixed_L1L21G_GRN.R which computes L1L21G-solution

    Mixed_L2L21_GRN.R which computes L2L21-solution

    Mixed_L2L21G_GRN.R which computes L2L21G-solution

    L1L21_Dream5_Scerevisiae_example_run.R is an example run using the L1L21-solution with S. cerevisiae data (Network 4 in the DREAM5 challenge). All files needed to successfully run "L1L21_Dream5_Scerevisiae_example_run" are located in the folder Codes.

    1. The folder Figures contains all figures in the manuscript.

    2. The folder Inferred-networks contains all network objects for each dataset and each inference methods in the comparative analysis.

    Dependencies and required packages

    The following packages are required for the contending approaches in the comparative analysis: "devtools", "foreach", "plyr", "glmnet" and "randomForest".

    GENIE3

    The GENIE3 package can be installed from: http://bioconductor.org/packages/release/bioc/html/GENIE3.html

    TIGRESS

    The TIGRESS repository can be obtained from: https://github.com/jpvert/tigress

    ENNET

    The ENNET repository can be obtained from: https://github.com/slawekj/ennet

    PLSNET

    The Matlab source code of PLSNET can be obtained from: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1398-6#Sec17

    PORTIA

    The PORTIA repository can be obtained from: https://github.com/AntoinePassemiers/PORTIA

    D3GRN

    The Matlab source code of D3GRN can be obtained from: https://github.com/chenxofhit/D3GRN

    Fused-LASSO

    The fused-LASSO repository can be obtained from: https://github.com/omranian/inference-of-GRN-using-Fused-LASSO

    ANOVerence

    Because of some technical issues (e.g., the accessibility of the code at http://www2.bio.ifi.lmu.de/~kueffner/anova.tar.gz), we were not able to reproduce the ANOVerence results and used the inferred network from the DREAM5 challenge instead.

    1. Although the codes here were tested on Fedora 29 (Workstation Edition) using R (version 4.2.2), they can run under any Linux distribution or Windows version, as long as all the required packages are compatible with the desired R version.

  6. Data from: Phylogenomic branch length estimation using quartets

    • search.dataone.org
    • datadryad.org
    Updated Dec 4, 2025
    Cite
    Yasamin Tabatabaee; Chao Zhang; Tandy Warnow; Siavash Mirarab (2025). Phylogenomic branch length estimation using quartets [Dataset]. http://doi.org/10.5061/dryad.pg4f4qs3q
    Explore at:
    Dataset updated
    Dec 4, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Yasamin Tabatabaee; Chao Zhang; Tandy Warnow; Siavash Mirarab
    Description

    Branch lengths and topology of a species tree are essential in most downstream analyses, including estimation of diversification dates, characterization of selection, understanding adaptation, and comparative genomics. Modern phylogenomic analyses often use methods that account for the heterogeneity of evolutionary histories across the genome due to processes such as incomplete lineage sorting. However, these methods typically do not generate branch lengths in units that are usable by downstream applications, forcing phylogenomic analyses to resort to alternative shortcuts such as estimating branch lengths by concatenating gene alignments into a supermatrix. Yet, concatenation and other available approaches for estimating branch lengths fail to address heterogeneity across the genome. In this article, we derive expected values of gene tree branch lengths in substitution units under an extension of the multispecies coalescent (MSC) model that allows substitutions with varying rates acros...

    Data from: Phylogenomic branch length estimation using quartets

    This repository contains the datasets used in the following paper:

    Y. Tabatabaee, C. Zhang, T. Warnow, S. Mirarab, Phylogenomic branch length estimation using quartets, Bioinformatics, Volume 39, Issue Supplement_1, June 2023, Pages i185–i193, https://doi.org/10.1093/bioinformatics/btad221

    For experiments in this study, we studied a collection of simulated and biological datasets with incomplete lineage sorting (ILS). We generated a new quartet dataset and regenerated species trees with substitution-unit branch lengths for previously published datasets from Zhang et al. (2018) and Mai et al. (2017). We also analyzed the mammalian biological dataset from [Song et al. (2012)](https://www.pnas.org/doi/full/10.1073...,

  7. Data set 1 - Proteome set description

    • figshare.com
    txt
    Updated Mar 20, 2021
    Cite
    Mick Van Vlierberghe; Denis BAURAIN (2021). Data set 1 - Proteome set description [Dataset]. http://doi.org/10.6084/m9.figshare.13113893.v1
    Explore at:
    txt
    Dataset updated
    Mar 20, 2021
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Mick Van Vlierberghe; Denis BAURAIN
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
  8. GENIA Bio-medical event dataset

    • kaggle.com
    zip
    Updated Dec 5, 2020
    Cite
    Nishanth (2020). GENIA Bio-medical event dataset [Dataset]. https://www.kaggle.com/nishanthsalian/genia-biomedical-event-dataset
    Explore at:
    zip (813625 bytes)
    Dataset updated
    Dec 5, 2020
    Authors
    Nishanth
    License

    http://www.gnu.org/licenses/agpl-3.0.html

    Description

    Context

    Bio-medical texts contain a lot of information that can be used for developments in the medical field. Traditionally, domain experts extracted such information manually. Automating this information extraction task can help speed up progress in the field. To name a few use cases, bio-medical events show the effects of drugs on a person and can also be used to identify certain medical conditions in a person. Hence, automating the extraction of events from bio-medical texts is very beneficial.

    Content

    The dataset is a simplified version of the event-annotated GENIA dataset, derived from the version available in TEES.

    It consists of the original bio-medical text, the labelled trigger words, the location of each trigger word in the text, and the event type associated with the trigger word. There are 3 sets of data: train (8k+ sentences), devel (about 3k sentences) and test (about 3k sentences). Each set has 4 columns, namely "Sentence", "TriggerWord", "TriggerWordLoc" and "EventType", capturing the original bio-medical text, the trigger words in the sentence, the location of the trigger words in the sentence, and the event type associated with the trigger words, respectively.

    Acknowledgements

    The dataset is a simplified version of the event-annotated GENIA dataset, derived from the version available in TEES. The original source dataset is from the BioNLP Shared Task 2011. A complete unprocessed version seems to be present in the genia-event-2011 dataset too.

    For TEES licensing information, please refer to this link. For GENIA dataset licensing information, please refer to the file "GE11-LICENSE" located beside the data files (.csv) in this Kaggle dataset.

    Photo Credits: Louis Reed on Unsplash

  9. TBGA: Gene Disease Association Data

    • kaggle.com
    zip
    Updated May 13, 2024
    Cite
    Aravind_S (2024). TBGA: Gene Disease Association Data [Dataset]. https://www.kaggle.com/datasets/aravind012/tbga-gene-disease-association-data
    Explore at:
    zip (24889642 bytes)
    Dataset updated
    May 13, 2024
    Authors
    Aravind_S
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the TBGA dataset. TBGA is a large-scale, semi-automatically annotated dataset for Gene-Disease Association (GDA) extraction. The dataset consists of three text files, corresponding to train, validation, and test sets, plus an additional JSON file containing the mapping between relation names and IDs. Each record in train, validation, or test files corresponds to a single GDA extracted from a sentence. Records are represented as JSON objects with the following structure:

    text: sentence from which the GDA was extracted.

    relation: relation name associated with the given GDA.

    h: JSON object representing the gene entity, composed of:

    id: NCBI Entrez ID associated with the gene entity.

    name: NCBI official gene symbol associated with the gene entity.

    pos: list consisting of starting position and length of the gene mention within text.

    t: JSON object representing the disease entity, composed of:

    id: UMLS CUI associated with the disease entity.

    name: UMLS preferred term associated with the disease entity.

    pos: list consisting of starting position and length of the disease mention within text.

    TBGA contains over 200,000 instances and 100,000 bags. The zip file consists of one folder, named TBGA, containing the files corresponding to the dataset.

    If you use or extend our work, please cite the following: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-04646-6#citeas TBGA paper can be found at: https://rdcu.be/cKkY2 TBGA code is available at: https://github.com/GDAMining/gda-extraction

    Keeping Citation here because I don't know where else to keep it.

    """Cite all versions? You can cite all versions by using the DOI 10.5281/zenodo.5911096. This DOI represents all versions, and will always resolve to the latest one. Read more."""

    Data set is taken from https://zenodo.org/records/5911097

  10. Data from: CT-EBM-SP - Corpus of Clinical Trials for Evidence-Based-Medicine...

    • resodate.org
    Updated Jul 13, 2025
    + more versions
    Cite
    Leonardo Campillos-Llanos; Ana Valverde-Mateos; Adrián Capllónch-Carrión (2025). CT-EBM-SP - Corpus of Clinical Trials for Evidence-Based-Medicine in Spanish (version 2) [Dataset]. https://resodate.org/resources/aHR0cHM6Ly96ZW5vZG8ub3JnL3JlY29yZHMvMTM4ODA1OTk=
    Explore at:
    Dataset updated
    Jul 13, 2025
    Dataset provided by
    Zenodo
    Authors
    Leonardo Campillos-Llanos; Ana Valverde-Mateos; Adrián Capllónch-Carrión
    Description

    A collection of 1200 texts (292173 tokens) about clinical trials studies and clinical trials announcements in Spanish:

    - 500 abstracts from journals published under a Creative Commons license, e.g. available in PubMed or the Scientific Electronic Library Online (SciELO).
    - 700 clinical trials announcements published in the European Clinical Trials Register and Repositorio Español de Estudios Clínicos.

    Texts were annotated with the following entity types:

    - Semantic groups from the Unified Medical Language System:
        ANAT: anatomy
        CHEM: pharmacological and chemical substances
        DEVI: medical devices
        DISO: pathologic conditions
        LIVB: living beings, included the human being
        PHYS: physiological processes
        PROC: lab tests, diagnostic or therapeutic procedures
    - Medical drug information:
        Contraindicated: a contraindicated drug or treatment
        Dose: dose or strength
        Form: dosage form
        Route: administration route or mode
    - Temporal expressions:
        Age
        Date
        Duration
        Frequency
        Time
    - Miscellaneous medical entities:
        Concept: abstract concepts, statistical tests or measurement scales
        Food: foods or drinks
        Observation: medical observations or clinical findings
        Quantifier_or_Qualifier: quantifier or qualifier adjective
        Result_or_Value: result or value of a measurement, laboratory analysis or procedure
    - Negation/Speculation:
        Neg_cue: negation cue
        Negated: negated event
        Spec_cue: speculation cue
        Speculated: speculated or uncertain event
    - Attributes:
        Temporality:
          ◦ History_of: past event
          ◦ Future: future event
        Experiencer:
          ◦ Patient: patient or participant on a clinical trial
          ◦ Family_member
          ◦ Other: other person different from the patient or the family member

    86 389 entities and 16 590 attributes were annotated. 10% of the corpus was doubly annotated, and high inter-annotator agreement (IAA) values were achieved: F1-score = 0.84% for entities; and F1-score = 0.88% for attributes (both in strict match).

    The dataset includes the texts and annotations used for the human evaluation of the medical named entity tool:

    - 100 clinical trial announcements from EudraCT not used for system development: we provide files of the version revised by medical professionals (Reference folder).
    - 100 clinical cases with Creative Commons license: we provide the files revised by medical professionals (Reference folder).

    These data come from:

    - Urgencias Bidasoa (https://urgenciasbidasoa.wordpress.com/casos-clinicos-3/)
    - Hipocampo.org (https://www.hipocampo.org/)
    - Cases published by Sociedad Andaluza de Medicina Familiar y Comunitaria (SAMFyC): we are greatly thankful for giving us permission to use these cases and we acknowledge that the copyright belongs to the authors' contents. Clinical cases were extracted from books published from 2016 to 2022 (https://www.samfyc.es/tipos-publicacion/publicaciones/).

    If you use these data, please acknowledge the copyright and intellectual property rights to the authors' contents. The dataset is freely distributed for research and educational purposes under a Creative Commons Non-Commercial Attribution (CC-BY-NC-A) License. If you use the CT-EBM-SP vs. 2 dataset, please cite as follows: Campillos-Llanos, L., A. Valverde-Mateos A. Capllonch-Carrion (2024) Hybrid natural language processing tool for semantic annotation of medical texts in Spanish. BMC Bioinformatics. BioMed Central.

