26 datasets found
  1. MedQuAD Dataset

    • paperswithcode.com
    Updated Feb 16, 2024
    + more versions
    Cite
    Asma Ben Abacha; Dina Demner-Fushman (2024). MedQuAD Dataset [Dataset]. https://paperswithcode.com/dataset/medquad
    Dataset updated
    Feb 16, 2024
    Authors
    Asma Ben Abacha; Dina Demner-Fushman
    Description

    MedQuAD includes 47,457 medical question-answer pairs created from 12 NIH websites (e.g. cancer.gov, niddk.nih.gov, GARD, MedlinePlus Health Topics). The collection covers 37 question types (e.g. Treatment, Diagnosis, Side Effects) associated with diseases, drugs and other medical entities such as tests.

  2. MedQuAD

    • huggingface.co
    Updated Oct 20, 2023
    Cite
    Lavita AI (2023). MedQuAD [Dataset]. https://huggingface.co/datasets/lavita/MedQuAD
    Dataset updated
    Oct 20, 2023
    Dataset authored and provided by
    Lavita AI
    Description

    Dataset Card for "MedQuAD"

    This dataset is the converted version of MedQuAD. Some notes about the data:

    Multiple values in the umls_cui, umls_semantic_types, and synonyms columns are separated by the | character.
    Answers for the [GARD, MPlusHerbsSupplements, ADAM, MPlusDrugs] sources (31,034 records) are removed from the original dataset to respect the MedlinePlus copyright.
    UMLS (umls): Unified Medical Language System
    CUI (cui): Concept Unique Identifier
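
The pipe-separated columns mentioned in these notes can be unpacked with ordinary string handling. The following is a minimal sketch, not code from the dataset card: the helper name split_multi and the sample row values are hypothetical, and only the three column names come from the description above.

```python
def split_multi(value):
    """Split a pipe-separated cell into a list of trimmed, non-empty values."""
    if not value:
        return []
    return [part.strip() for part in value.split("|") if part.strip()]


# Hypothetical row: the column names come from the dataset card,
# but the values below are made up for illustration.
row = {
    "umls_cui": "C0000001|C0000002",
    "umls_semantic_types": "T047|T191",
    "synonyms": "example term|another example term",
}

MULTI_VALUE_COLUMNS = ("umls_cui", "umls_semantic_types", "synonyms")
parsed = {name: split_multi(row[name]) for name in MULTI_VALUE_COLUMNS}

for name, values in parsed.items():
    print(name, values)
```

Empty or missing cells come back as an empty list, so downstream code can iterate over every column without checking for None first.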

      Question type discrepancies… See the full description on the dataset page: https://huggingface.co/datasets/lavita/MedQuAD.
    
  3. MedQnA_version3

    • huggingface.co
    Updated Feb 14, 2024
    + more versions
    Cite
    Jose Gomez (2024). MedQnA_version3 [Dataset]. https://huggingface.co/datasets/joseagmz/MedQnA_version3
    Dataset updated
    Feb 14, 2024
    Authors
    Jose Gomez
    Description

    Reference:

    "A Question-Entailment Approach to Question Answering". Asma Ben Abacha and Dina Demner-Fushman. BMC Bioinformatics, 2019.

  4. MedQuad

    • huggingface.co
    Updated Aug 7, 2024
    + more versions
    Cite
    Hasan Mehdi (2024). MedQuad [Dataset]. https://huggingface.co/datasets/Hmehdi515/MedQuad
    Dataset updated
    Aug 7, 2024
    Authors
    Hasan Mehdi
    Description

    Hmehdi515/MedQuad dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. medquad

    • kaggle.com
    Updated Mar 12, 2024
    Cite
    Rudrik Patel (2024). medquad [Dataset]. https://www.kaggle.com/datasets/rudrik01/medquad/code
    Dataset updated
    Mar 12, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rudrik Patel
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Rudrik Patel

    Released under MIT

    Contents

  6. MedQuad-MedicalQnADataset_test

    • huggingface.co
    Updated Nov 14, 2024
    + more versions
    Cite
    jd (2024). MedQuad-MedicalQnADataset_test [Dataset]. https://huggingface.co/datasets/missjd123/MedQuad-MedicalQnADataset_test
    Dataset updated
    Nov 14, 2024
    Authors
    jd
    Description

    missjd123/MedQuad-MedicalQnADataset_test dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. medquad

    • huggingface.co
    Updated Feb 19, 2024
    + more versions
    Cite
    Panashe Manjengwa (2024). medquad [Dataset]. https://huggingface.co/datasets/whoispanashe/medquad
    Dataset updated
    Feb 19, 2024
    Authors
    Panashe Manjengwa
    Description

    whoispanashe/medquad dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. medquad

    • huggingface.co
    Updated Feb 12, 2025
    + more versions
    Cite
    Bhagyajit Pingua (2025). medquad [Dataset]. https://huggingface.co/datasets/bpingua/medquad
    Dataset updated
    Feb 12, 2025
    Authors
    Bhagyajit Pingua
    Description

    bpingua/medquad dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. medquad

    • huggingface.co
    Updated Jun 14, 2024
    + more versions
    Cite
    Aisar Nasrun Bin Ramjee (2024). medquad [Dataset]. https://huggingface.co/datasets/sarnsrun/medquad
    Dataset updated
    Jun 14, 2024
    Authors
    Aisar Nasrun Bin Ramjee
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0) (https://creativecommons.org/licenses/by-sa/3.0/)
    License information was derived automatically

    Description

    sarnsrun/medquad dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. MedQuad-phi2-1k

    • huggingface.co
    Updated Sep 16, 2024
    Cite
    Prasad Mahamulkar (2024). MedQuad-phi2-1k [Dataset]. https://huggingface.co/datasets/prsdm/MedQuad-phi2-1k
    Dataset updated
    Sep 16, 2024
    Authors
    Prasad Mahamulkar
    Description

    prsdm/MedQuad-phi2-1k dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. clustered_MEDQUAD_dataset_with_groups

    • huggingface.co
    Cite
    mukul bedwa, clustered_MEDQUAD_dataset_with_groups [Dataset]. https://huggingface.co/datasets/mukulb/clustered_MEDQUAD_dataset_with_groups
    Authors
    mukul bedwa
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    MedQuAD - Medical Question Answering Dataset

      Dataset Overview
    

    MedQuAD (Medical Question Answering Dataset) is a collection of 16,407 medical question-answer pairs derived from 9 NIH websites. It covers 37 question types related to diseases, drugs, medical tests, and treatments. This dataset is useful for building medical question-answering models, retrieval-augmented generation (RAG) systems, and other NLP applications in the healthcare domain.

      Dataset… See the full description on the dataset page: https://huggingface.co/datasets/mukulb/clustered_MEDQUAD_dataset_with_groups.
    
  12. medquad-alpaca

    • huggingface.co
    Updated Apr 26, 2025
    Cite
    Zhihao REN (2025). medquad-alpaca [Dataset]. https://huggingface.co/datasets/Nin8520/medquad-alpaca
    Dataset updated
    Apr 26, 2025
    Authors
    Zhihao REN
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Reference

    If you use MedQuAD, please cite the original paper: @ARTICLE{BenAbacha-BMC-2019, author = {Asma {Ben Abacha} and Dina Demner{-}Fushman}, title = {A Question-Entailment Approach to Question Answering}, journal = {{BMC} Bioinform.}, volume = {20}, number = {1}, pages = {511:1--511:23}, year = {2019}, url =… See the full description on the dataset page: https://huggingface.co/datasets/Nin8520/medquad-alpaca.

  13. medquad

    • huggingface.co
    Updated Jun 6, 2025
    + more versions
    Cite
    Mariane (2025). medquad [Dataset]. https://huggingface.co/datasets/marianeft/medquad
    Dataset updated
    Jun 6, 2025
    Authors
    Mariane
    Description

    marianeft/medquad dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. MedQuad-MedicalQnADataset-1024-synth-aug-1024-synth-aug-1024-synth-aug

    • huggingface.co
    Updated Apr 29, 2025
    + more versions
    Cite
    Yiqiao Yin (2025). MedQuad-MedicalQnADataset-1024-synth-aug-1024-synth-aug-1024-synth-aug [Dataset]. https://huggingface.co/datasets/eagle0504/MedQuad-MedicalQnADataset-1024-synth-aug-1024-synth-aug-1024-synth-aug
    Dataset updated
    Apr 29, 2025
    Authors
    Yiqiao Yin
    Description

    eagle0504/MedQuad-MedicalQnADataset-1024-synth-aug-1024-synth-aug-1024-synth-aug dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. PHI2-MEDQUAD-16407

    • huggingface.co
    Updated Aug 31, 2024
    + more versions
    Cite
    Nawaf22 (2024). PHI2-MEDQUAD-16407 [Dataset]. https://huggingface.co/datasets/DataScientist1122/PHI2-MEDQUAD-16407
    Dataset updated
    Aug 31, 2024
    Authors
    Nawaf22
    Description

    DataScientist1122/PHI2-MEDQUAD-16407 dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. MedQuad-QA16407-CL-PIPE_electra-base-squad2

    • huggingface.co
    Updated Jul 28, 2024
    Cite
    Nawaf22 (2024). MedQuad-QA16407-CL-PIPE_electra-base-squad2 [Dataset]. https://huggingface.co/datasets/DataScientist1122/MedQuad-QA16407-CL-PIPE_electra-base-squad2
    Dataset updated
    Jul 28, 2024
    Authors
    Nawaf22
    Description

    DataScientist1122/MedQuad-QA16407-CL-PIPE_electra-base-squad2 dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. QALM

    • huggingface.co
    Updated Sep 27, 2023
    Cite
    ASUS Intelligent Cloud Services (2023). QALM [Dataset]. https://huggingface.co/datasets/asus-aics/QALM
    Dataset updated
    Sep 27, 2023
    Dataset provided by
    ASUS (http://asus.com/)
    Authors
    ASUS Intelligent Cloud Services
    Description

    The QALM Benchmark utilizes the following datasets:

    MEDQA (USMLE dataset) [1]
    MEDMCQA [2]
    BioASQ (2022) [3] [4]
    HEADQA [5]
    ProcessBank [6]
    PubmedQA [7]
    MMLU (subset of datasets focussing on clinical and medical knowledge) [8]
    BioMRC (Tiny A and B) [9]
    Fellowship of the Royal College of Ophthalmologists (FRCOphth) Exams [10]
    QA4MRE (Alzheimer's Questions) [11]
    MedicationInfo [12]
    MedQuad [13]
    LiveQA dataset (Ranked version of answers used to evaluate MedQuad) [13] [14]
    MashQA [15]
    MEDIQA-ANS… See the full description on the dataset page: https://huggingface.co/datasets/asus-aics/QALM.

  18. medical_qa

    • huggingface.co
    Updated Apr 7, 2024
    Cite
    Thakur (2024). medical_qa [Dataset]. https://huggingface.co/datasets/Sakshamrzt/medical_qa
    Dataset updated
    Apr 7, 2024
    Authors
    Thakur
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    Dataset Card for Dataset Name

      Dataset Details
    

    The MedQuad dataset, normalised for use with mteb. The dataset contains questions and answers related to medical conditions, treatments, and protocols.

      Dataset Sources [optional]
    

    Repository: [More Information Needed]
    Paper [optional]: [More Information Needed]
    Demo [optional]: [More Information Needed]

      Uses

      Direct Use

    [More Information Needed]

      Out-of-Scope Use
    

    [More… See the full description on the dataset page: https://huggingface.co/datasets/Sakshamrzt/medical_qa.

  19. DoctorGPT

    • huggingface.co
    Updated Dec 14, 2023
    Cite
    Robert Czikkel (2023). DoctorGPT [Dataset]. https://huggingface.co/datasets/RobCzikkel/DoctorGPT
    Dataset updated
    Dec 14, 2023
    Authors
    Robert Czikkel
    Description

    Dataset

    This is an edited and tokenized version of the MedQuad-MedicalQnADataset dataset by keivalya. The original dataset contains 16K+ questions and answers between patient and doctor, which have been converted into a full prompt to train BioGPT by Microsoft.

      Tokenizer used
    

    microsoft/BioGPT-Large (BPE tokenizer)

      Full prompt
    

    prompt = f"""You are a helpful AI Doctor who answers medical questions. Below is a question from a patient. Your task is to answer the… See the full description on the dataset page: https://huggingface.co/datasets/RobCzikkel/DoctorGPT.

  20. clean-medqa

    • huggingface.co
    Updated Apr 13, 2025
    Cite
    Chakradhar (2025). clean-medqa [Dataset]. https://huggingface.co/datasets/ChakradharS/clean-medqa
    Dataset updated
    Apr 13, 2025
    Authors
    Chakradhar
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    🩺 Clean MedQA Dataset

    Improving healthcare through language-based AI.

      📝 Dataset Summary
    

    The Clean MedQA dataset is a refined version of data originally sourced from the MedQuAD (Medical Question Answering Dataset) — a well-known resource for building question-answering systems in the healthcare domain. This cleaned version is optimized for Natural Language Processing (NLP) tasks, particularly for training and evaluating models that need to understand or generate… See the full description on the dataset page: https://huggingface.co/datasets/ChakradharS/clean-medqa.

MedQuAD Dataset

Medical Question Answering Dataset

196 scholarly articles cite this dataset (View in Google Scholar)