MedQuAD includes 47,457 medical question-answer pairs created from 12 NIH websites (e.g. cancer.gov, niddk.nih.gov, GARD, MedlinePlus Health Topics). The collection covers 37 question types (e.g. Treatment, Diagnosis, Side Effects) associated with diseases, drugs and other medical entities such as tests.
Dataset Card for "MedQuAD"
This dataset is the converted version of MedQuAD. Some notes about the data:
Multiple values in the umls_cui, umls_semantic_types, and synonyms columns are separated by the | character.
Answers for the [GARD, MPlusHerbsSupplements, ADAM, MPlusDrugs] sources (31,034 records) are removed from the original dataset to respect the MedlinePlus copyright.
UMLS (umls): Unified Medical Language System
CUI (cui): Concept Unique Identifier
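The notes above suggest two preprocessing steps: splitting the |-separated columns into lists, and skipping records whose answers were removed. The following is a minimal sketch of both in plain Python. The three multi-value column names come from the notes. The `answer` key, the assumption that a removed answer shows up as an empty or missing value, and the sample rows are all hypothetical and should be checked against the actual dataset.

```python
from typing import Optional

# Column names taken from the dataset notes; everything else is illustrative.
MULTI_VALUE_COLUMNS = ("umls_cui", "umls_semantic_types", "synonyms")


def split_multi(value: Optional[str]) -> list[str]:
    """Split a '|'-separated cell into a list, dropping blank entries."""
    if not value:
        return []
    return [part.strip() for part in value.split("|") if part.strip()]


def normalize_record(record: dict) -> dict:
    """Return a copy of the record with the multi-value columns as lists."""
    cleaned = dict(record)
    for column in MULTI_VALUE_COLUMNS:
        cleaned[column] = split_multi(record.get(column))
    return cleaned


def has_answer(record: dict) -> bool:
    """True when the record carries a non-empty answer.

    Assumption: removed answers appear as None or an empty string
    under a key named 'answer'.
    """
    answer = record.get("answer")
    return isinstance(answer, str) and bool(answer.strip())


# Hypothetical rows for illustration only, not real MedQuAD data.
rows = [
    {
        "question": "What is X?",
        "answer": "X is ...",
        "umls_cui": "C0000001|C0000002",
        "umls_semantic_types": "T047",
        "synonyms": "X disease | X syndrome",
    },
    {
        "question": "What is Y?",
        "answer": None,
        "umls_cui": "",
        "umls_semantic_types": None,
        "synonyms": None,
    },
]

usable = [normalize_record(row) for row in rows if has_answer(row)]
```

Filtering before normalizing avoids doing work on records that will be discarded anyway; if the unanswered questions are still useful (for example, as retrieval queries), the filter can simply be dropped.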
Question type discrepancies… See the full description on the dataset page: https://huggingface.co/datasets/lavita/MedQuAD.
Reference:
"A Question-Entailment Approach to Question Answering". Asma Ben Abacha and Dina Demner-Fushman. BMC Bioinformatics, 2019.
Hmehdi515/MedQuad dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Rudrik Patel
Released under MIT
missjd123/MedQuad-MedicalQnADataset_test dataset hosted on Hugging Face and contributed by the HF Datasets community
whoispanashe/medquad dataset hosted on Hugging Face and contributed by the HF Datasets community
bpingua/medquad dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
sarnsrun/medquad dataset hosted on Hugging Face and contributed by the HF Datasets community
prsdm/MedQuad-phi2-1k dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MedQuAD - Medical Question Answering Dataset
Dataset Overview
MedQuAD (Medical Question Answering Dataset) is a collection of 16,407 medical question-answer pairs derived from 9 NIH websites. It covers 37 question types related to diseases, drugs, medical tests, and treatments. This dataset is useful for building medical question-answering models, retrieval-augmented generation (RAG) systems, and other NLP applications in the healthcare domain.
Dataset… See the full description on the dataset page: https://huggingface.co/datasets/mukulb/clustered_MEDQUAD_dataset_with_groups.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Reference
If you use MedQuAD, please cite the original paper: @ARTICLE{BenAbacha-BMC-2019, author = {Asma {Ben Abacha} and Dina Demner{-}Fushman}, title = {A Question-Entailment Approach to Question Answering}, journal = {{BMC} Bioinform.}, volume = {20}, number = {1}, pages = {511:1--511:23}, year = {2019}, url =… See the full description on the dataset page: https://huggingface.co/datasets/Nin8520/medquad-alpaca.
marianeft/medquad dataset hosted on Hugging Face and contributed by the HF Datasets community
eagle0504/MedQuad-MedicalQnADataset-1024-synth-aug-1024-synth-aug-1024-synth-aug dataset hosted on Hugging Face and contributed by the HF Datasets community
DataScientist1122/PHI2-MEDQUAD-16407 dataset hosted on Hugging Face and contributed by the HF Datasets community
DataScientist1122/MedQuad-QA16407-CL-PIPE_electra-base-squad2 dataset hosted on Hugging Face and contributed by the HF Datasets community
The QALM Benchmark utilizes the following datasets:
MEDQA (USMLE dataset) [1]
MEDMCQA [2]
BioASQ (2022) [3] [4]
HEADQA [5]
ProcessBank [6]
PubmedQA [7]
MMLU (subset of datasets focussing on clinical and medical knowledge) [8]
BioMRC (Tiny A and B) [9]
Fellowship of the Royal College of Ophthalmologists (FRCOphth) Exams [10]
QA4MRE (Alzheimer's Questions) [11]
MedicationInfo [12]
MedQuad [13]
LiveQA dataset (Ranked version of answers used to evaluate MedQuad) [13] [14]
MashQA [15]
MEDIQA-ANS…
See the full description on the dataset page: https://huggingface.co/datasets/asus-aics/QALM.
https://choosealicense.com/licenses/cc0-1.0/
Dataset Card for Dataset Name
Dataset Details
This is the MedQuad dataset, normalised for use with mteb. It contains questions and answers related to medical conditions, treatments, and protocols.
See the full description on the dataset page: https://huggingface.co/datasets/Sakshamrzt/medical_qa.
Dataset
This is an edited and tokenized version of the MedQuad-MedicalQnADataset dataset by keivalya. The original dataset contains 16K+ questions and answers between patients and doctors, each of which has been converted into a full prompt for training Microsoft's BioGPT.
Tokenizer used
microsoft/BioGPT-Large (BPE tokenizer)
Full prompt
prompt = f"""You are a helpful AI Doctor who answers medical questions. Below is a question from a patient. Your task is to answer the… See the full description on the dataset page: https://huggingface.co/datasets/RobCzikkel/DoctorGPT.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
🩺 Clean MedQA Dataset
Improving healthcare through language-based AI.
📝 Dataset Summary
The Clean MedQA dataset is a refined version of data originally sourced from the MedQuAD (Medical Question Answering Dataset) — a well-known resource for building question-answering systems in the healthcare domain. This cleaned version is optimized for Natural Language Processing (NLP) tasks, particularly for training and evaluating models that need to understand or generate… See the full description on the dataset page: https://huggingface.co/datasets/ChakradharS/clean-medqa.