100+ datasets found
  1. medical-question-answering-datasets

    • huggingface.co
    + more versions
    Cite
    Malikeh Ehghaghi, medical-question-answering-datasets [Dataset]. https://huggingface.co/datasets/Malikeh1375/medical-question-answering-datasets
    Authors
    Malikeh Ehghaghi
    Description

    Malikeh1375/medical-question-answering-datasets dataset hosted on Hugging Face and contributed by the HF Datasets community

  2. subjqa

    • huggingface.co
    Updated May 24, 2024
    + more versions
    Cite
    Megagon Labs (2024). subjqa [Dataset]. https://huggingface.co/datasets/megagonlabs/subjqa
    Dataset updated
    May 24, 2024
    Dataset authored and provided by
    Megagon Labs
    License

    https://choosealicense.com/licenses/unknown/

    Description

    SubjQA is a question answering dataset that focuses on subjective questions and answers. The dataset consists of roughly 10,000 questions over reviews from 6 different domains: books, movies, grocery, electronics, TripAdvisor (i.e. hotels), and restaurants.

  3. psychology-question-answer

    • huggingface.co
    Updated Sep 10, 2025
    Cite
    Andrew Toomey (2025). psychology-question-answer [Dataset]. https://huggingface.co/datasets/BoltMonkey/psychology-question-answer
    Dataset updated
    Sep 10, 2025
    Authors
    Andrew Toomey
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    A JSON formatted dataset comprising 197,180 question and answer pairs covering a wide range of topics encountered in a Bachelor level psychology course. I have included a broad range of question types, topics, and answer styles. The dataset was created using personal notes and several LLMs (such as GPT4) and manually assessed for veracity and completeness of response. Despite this, the size of the dataset prohibits me from ensuring every single answer is 100% accurate and up-to-date. As such… See the full description on the dataset page: https://huggingface.co/datasets/BoltMonkey/psychology-question-answer.

  4. question-answering-paul-graham

    • huggingface.co
    Updated May 15, 2023
    Cite
    LangChainDatasets (2023). question-answering-paul-graham [Dataset]. https://huggingface.co/datasets/LangChainDatasets/question-answering-paul-graham
    Dataset updated
    May 15, 2023
    Dataset authored and provided by
    LangChainDatasets
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    LangChainDatasets/question-answering-paul-graham dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. squad

    • huggingface.co
    • tensorflow.org
    • +1more
    Updated Mar 5, 2024
    + more versions
    Cite
    Pranav R (2024). squad [Dataset]. https://huggingface.co/datasets/rajpurkar/squad
    Dataset updated
    Mar 5, 2024
    Authors
    Pranav R
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for SQuAD

      Dataset Summary
    

    Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable. SQuAD 1.1 contains 100,000+ question-answer pairs on 500+ articles.

      Supported Tasks and Leaderboards
    

    Question Answering.… See the full description on the dataset page: https://huggingface.co/datasets/rajpurkar/squad.
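The span-based answer format described in the SQuAD summary can be illustrated with a minimal sketch. The record below is hypothetical, and its field names (`context`, `question`, `answers`, `text`, `answer_start`) are assumptions about a SQuAD-style layout rather than a statement of the dataset's exact schema.

```python
# Hypothetical SQuAD-style record; field names are assumptions for illustration.
CONTEXT = "The Eiffel Tower is located in Paris, France."
ANSWER = "Paris, France"

record = {
    "context": CONTEXT,
    "question": "Where is the Eiffel Tower located?",
    # The answer is stored as text plus a character offset into the context.
    "answers": {"text": [ANSWER], "answer_start": [CONTEXT.index(ANSWER)]},
}


def extract_span(context: str, start: int, text: str) -> str:
    """Slice the answer out of the context and check it matches the stored text."""
    span = context[start:start + len(text)]
    if span != text:
        raise ValueError(f"offset {start} does not point at {text!r}")
    return span


def is_unanswerable(rec: dict) -> bool:
    """Treat a record with no answer texts as unanswerable."""
    return len(rec["answers"]["text"]) == 0


span = extract_span(
    record["context"],
    record["answers"]["answer_start"][0],
    record["answers"]["text"][0],
)
```

Checking the offset against the stored text is a cheap way to catch records whose context was altered after annotation.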

  6. mlqa

    • huggingface.co
    • opendatalab.com
    • +1more
    Updated May 29, 2024
    + more versions
    Cite
    AI at Meta (2024). mlqa [Dataset]. https://huggingface.co/datasets/facebook/mlqa
    Dataset updated
    May 29, 2024
    Dataset authored and provided by
    AI at Meta
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    MLQA (MultiLingual Question Answering) is a benchmark dataset for evaluating cross-lingual question answering performance. MLQA consists of over 5K extractive QA instances (12K in English) in SQuAD format in seven languages - English, Arabic, German, Spanish, Hindi, Vietnamese and Simplified Chinese. MLQA is highly parallel, with QA instances parallel between 4 different languages on average.

  7. medical-question-answering-all

    • huggingface.co
    Updated Apr 23, 2025
    + more versions
    Cite
    Petko Petkov (2025). medical-question-answering-all [Dataset]. https://huggingface.co/datasets/petkopetkov/medical-question-answering-all
    Dataset updated
    Apr 23, 2025
    Authors
    Petko Petkov
    Description

    petkopetkov/medical-question-answering-all dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. question-answering-state-of-the-union

    • huggingface.co
    Updated Apr 4, 2023
    Cite
    LangChainDatasets (2023). question-answering-state-of-the-union [Dataset]. https://huggingface.co/datasets/LangChainDatasets/question-answering-state-of-the-union
    Dataset updated
    Apr 4, 2023
    Dataset authored and provided by
    LangChainDatasets
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    LangChainDatasets/question-answering-state-of-the-union dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. wiki_qa

    • huggingface.co
    • opendatalab.com
    Updated Jun 3, 2024
    Cite
    Microsoft (2024). wiki_qa [Dataset]. https://huggingface.co/datasets/microsoft/wiki_qa
    Dataset updated
    Jun 3, 2024
    Dataset authored and provided by
    Microsoft (http://microsoft.com/)
    License

    https://choosealicense.com/licenses/other/

    Description

    Dataset Card for "wiki_qa"

      Dataset Summary
    

    Wiki Question Answering corpus from Microsoft. The WikiQA corpus is a publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering.

    Size of downloaded dataset files: 7.10 MB Size… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/wiki_qa.

  10. fquad

    • huggingface.co
    Updated May 24, 2024
    Cite
    Illuin Technology (2024). fquad [Dataset]. https://huggingface.co/datasets/illuin/fquad
    Dataset updated
    May 24, 2024
    Dataset authored and provided by
    Illuin Technology
    License

    Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
    License information was derived automatically

    Description

    FQuAD: French Question Answering Dataset. We introduce FQuAD, a native French Question Answering Dataset. FQuAD contains 25,000+ question and answer pairs. Fine-tuning CamemBERT on FQuAD yields an F1 score of 88% and an exact match of 77.9%.
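For readers unfamiliar with the two metrics quoted above, here is a simplified sketch of exact match and token-level F1 for extractive QA. It assumes whitespace tokenization and lowercasing only; official evaluation scripts typically apply further normalization (punctuation, articles), so scores from this sketch may differ from reported figures. The function names are hypothetical.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately simplified normalization)."""
    return text.lower().split()


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized prediction equals the normalized gold answer, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall between prediction and gold."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# A prediction with one extra token: precision 2/3, recall 1.
f1_example = token_f1("la tour Eiffel", "tour Eiffel")
em_example = exact_match("la tour Eiffel", "tour Eiffel")
```

F1 gives partial credit for overlapping spans, which is why it is usually higher than exact match on the same predictions.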

  11. video-game-question-answering

    • huggingface.co
    Updated Dec 7, 2023
    Cite
    taesiri (2023). video-game-question-answering [Dataset]. https://huggingface.co/datasets/taesiri/video-game-question-answering
    Dataset updated
    Dec 7, 2023
    Authors
    taesiri
    Description

    taesiri/video-game-question-answering dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. stackexchange-question-answering

    • huggingface.co
    Updated Oct 7, 2025
    Cite
    Prime Intellect (2025). stackexchange-question-answering [Dataset]. https://huggingface.co/datasets/PrimeIntellect/stackexchange-question-answering
    Dataset updated
    Oct 7, 2025
    Dataset provided by
    Prime Intellect, Inc.
    Authors
    Prime Intellect
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    SYNTHETIC-1

    This is a subset of the task data used to construct SYNTHETIC-1. You can find the full collection here

  13. document-question-answering-checkpoint-downloads

    • huggingface.co
    + more versions
    Cite
    Hugging Face OSS Metrics, document-question-answering-checkpoint-downloads [Dataset]. https://huggingface.co/datasets/open-source-metrics/document-question-answering-checkpoint-downloads
    Dataset provided by
    Hugging Face (https://huggingface.co/)
    Authors
    Hugging Face OSS Metrics
    Description

    open-source-metrics/document-question-answering-checkpoint-downloads dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. qasc

    • huggingface.co
    • opendatalab.com
    • +1more
    Updated Apr 30, 2023
    Cite
    Ai2 (2023). qasc [Dataset]. https://huggingface.co/datasets/allenai/qasc
    Dataset updated
    Apr 30, 2023
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    Ai2
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for "qasc"

      Dataset Summary
    

    QASC is a question-answering dataset with a focus on sentence composition. It consists of 9,980 8-way multiple-choice questions about grade school science (8,134 train, 926 dev, 920 test), and comes with a corpus of 17M sentences.

    Size of… See the full description on the dataset page: https://huggingface.co/datasets/allenai/qasc.
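As a quick consistency check on the QASC figures quoted in the summary, the sketch below sums the stated split sizes and derives the random-guess baseline for 8-way multiple choice. The variable names are hypothetical; the numbers come from the description above.

```python
# Split sizes as stated in the QASC summary.
splits = {"train": 8134, "dev": 926, "test": 920}

# The three splits should add up to the stated 9,980 questions.
total = sum(splits.values())

# Fraction of the dataset in each split.
shares = {name: count / total for name, count in splits.items()}

# With 8 answer choices, uniform random guessing is right 1 time in 8.
num_choices = 8
chance_accuracy = 1 / num_choices

print(f"total questions: {total}")
print(f"train share: {shares['train']:.1%}")
print(f"chance accuracy: {chance_accuracy:.1%}")
```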

  15. coqa

    • huggingface.co
    • tensorflow.org
    Updated Jan 24, 2024
    Cite
    Stanford NLP (2024). coqa [Dataset]. https://huggingface.co/datasets/stanfordnlp/coqa
    Dataset updated
    Jan 24, 2024
    Dataset authored and provided by
    Stanford NLP
    License

    https://choosealicense.com/licenses/other/

    Description

    Dataset Card for "coqa"

      Dataset Summary
    

    CoQA is a large-scale dataset for building Conversational Question Answering systems. Our dataset contains 127k questions with answers, obtained from 8k conversations about text passages from seven diverse domains. The questions are conversational, and the answers are free-form text with their corresponding evidence highlighted in the passage.

    … See the full description on the dataset page: https://huggingface.co/datasets/stanfordnlp/coqa.
    
  16. openbookqa

    • huggingface.co
    • opendatalab.com
    • +1more
    Updated Mar 1, 2024
    Cite
    Ai2 (2024). openbookqa [Dataset]. https://huggingface.co/datasets/allenai/openbookqa
    Dataset updated
    Mar 1, 2024
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    Ai2
    License

    https://choosealicense.com/licenses/unknown/

    Description

    Dataset Card for OpenBookQA

      Dataset Summary
    

    OpenBookQA aims to promote research in advanced question-answering, probing a deeper understanding of both the topic (with salient facts summarized as an open book, also provided with the dataset) and the language it is expressed in. In particular, it contains questions that require multi-step reasoning, use of additional common and commonsense knowledge, and rich text comprehension. OpenBookQA is a new kind of… See the full description on the dataset page: https://huggingface.co/datasets/allenai/openbookqa.

  17. NaturalQuestionsV2

    • huggingface.co
    Updated Sep 21, 2022
    Cite
    Rong Zhang (2022). NaturalQuestionsV2 [Dataset]. https://huggingface.co/datasets/rongzhangibm/NaturalQuestionsV2
    Dataset updated
    Sep 21, 2022
    Authors
    Rong Zhang
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Dataset Card for Natural Questions

      Dataset Summary
    

    The NQ corpus contains questions from real users, and it requires QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question. The inclusion of real user questions, and the requirement that solutions should read an entire page to find the answer, cause NQ to be a more realistic and challenging task than prior QA datasets.

      Supported Tasks and Leaderboards… See the full description on the dataset page: https://huggingface.co/datasets/rongzhangibm/NaturalQuestionsV2.
    
  18. Data from: quac

    • huggingface.co
    • tensorflow.org
    • +1more
    Updated Dec 12, 2020
    + more versions
    Cite
    Ai2 (2020). quac [Dataset]. https://huggingface.co/datasets/allenai/quac
    Dataset updated
    Dec 12, 2020
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    Ai2
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Question Answering in Context is a dataset for modeling, understanding, and participating in information seeking dialog. Data instances consist of an interactive dialog between two crowd workers: (1) a student who poses a sequence of freeform questions to learn as much as possible about a hidden Wikipedia text, and (2) a teacher who answers the questions by providing short excerpts (spans) from the text. QuAC introduces challenges not found in existing machine comprehension datasets: its questions are often more open-ended, unanswerable, or only meaningful within the dialog context.

  19. squad_v2

    • huggingface.co
    • kaggle.com
    Updated Jun 15, 2005
    Cite
    Pranav R (2005). squad_v2 [Dataset]. https://huggingface.co/datasets/rajpurkar/squad_v2
    Dataset updated
    Jun 15, 2005
    Authors
    Pranav R
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for SQuAD 2.0

      Dataset Summary
    

    Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable. SQuAD 2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers… See the full description on the dataset page: https://huggingface.co/datasets/rajpurkar/squad_v2.

  20. TQA

    • huggingface.co
    Updated Sep 26, 2025
    Cite
    Yifan Hou (2025). TQA [Dataset]. https://huggingface.co/datasets/yyyyifan/TQA
    Dataset updated
    Sep 26, 2025
    Authors
    Yifan Hou
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    This dataset:

    This is a visual question-answering dataset cleaned from "Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension". I only keep the questions that require the diagrams. For the whole dataset including more annotations such as captions, non-diagram questions, please check their webpage: https://prior.allenai.org/projects/tqa

      Citation
    

    Please cite the paper if you use this dataset.… See the full description on the dataset page: https://huggingface.co/datasets/yyyyifan/TQA.

medical-question-answering-datasets

Malikeh1375/medical-question-answering-datasets

187 scholarly articles cite this dataset (View in Google Scholar)