100+ datasets found

medical-question-answering-datasets
huggingface.co
Cite
Malikeh Ehghaghi, medical-question-answering-datasets [Dataset]. https://huggingface.co/datasets/Malikeh1375/medical-question-answering-datasets
Authors
Malikeh Ehghaghi
Description
Malikeh1375/medical-question-answering-datasets dataset hosted on Hugging Face and contributed by the HF Datasets community
subjqa
huggingface.co
Updated May 24, 2024
Cite
Megagon Labs (2024). subjqa [Dataset]. https://huggingface.co/datasets/megagonlabs/subjqa
Dataset updated
May 24, 2024
Dataset authored and provided by
Megagon Labs
License
https://choosealicense.com/licenses/unknown/
Description
SubjQA is a question answering dataset that focuses on subjective questions and answers. The dataset consists of roughly 10,000 questions over reviews from 6 different domains: books, movies, grocery, electronics, TripAdvisor (i.e. hotels), and restaurants.
psychology-question-answer
huggingface.co
Updated Sep 10, 2025
Cite
Andrew Toomey (2025). psychology-question-answer [Dataset]. https://huggingface.co/datasets/BoltMonkey/psychology-question-answer
Dataset updated
Sep 10, 2025
Authors
Andrew Toomey
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
A JSON formatted dataset comprising 197,180 question and answer pairs covering a wide range of topics encountered in a Bachelor level psychology course. I have included a broad range of question types, topics, and answer styles. The dataset was created using personal notes and several LLMs (such as GPT4) and manually assessed for veracity and completeness of response. Despite this, the size of the dataset prohibits me from ensuring every single answer is 100% accurate and up-to-date. As such… See the full description on the dataset page: https://huggingface.co/datasets/BoltMonkey/psychology-question-answer.
question-answering-paul-graham
huggingface.co
Updated May 15, 2023
Cite
LangChainDatasets (2023). question-answering-paul-graham [Dataset]. https://huggingface.co/datasets/LangChainDatasets/question-answering-paul-graham
Dataset updated
May 15, 2023
Dataset authored and provided by
LangChainDatasets
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
LangChainDatasets/question-answering-paul-graham dataset hosted on Hugging Face and contributed by the HF Datasets community
squad
huggingface.co
tensorflow.org
Updated Mar 5, 2024
Cite
Pranav R (2024). squad [Dataset]. https://huggingface.co/datasets/rajpurkar/squad
Dataset updated
Mar 5, 2024
Authors
Pranav R
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset Card for SQuAD

Dataset Summary

Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable. SQuAD 1.1 contains 100,000+ question-answer pairs on 500+ articles.

Supported Tasks and Leaderboards

Question Answering.… See the full description on the dataset page: https://huggingface.co/datasets/rajpurkar/squad.
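As a rough illustration of the span format described in the summary above, the sketch below recovers an answer from its passage by character offset. The record is invented for illustration, and the field names (`context`, `question`, `answers` with parallel `text` and `answer_start` lists) are an assumption about the layout documented on the dataset page.

```python
# Hypothetical record in a SQuAD-style layout; the passage and question are
# invented, not taken from the dataset.
context = "The Eiffel Tower was completed in 1889 in Paris."
answer_text = "1889"

record = {
    "context": context,
    "question": "When was the Eiffel Tower completed?",
    # Parallel lists: each answer string with the character offset where it starts.
    "answers": {
        "text": [answer_text],
        "answer_start": [context.index(answer_text)],
    },
}


def extract_span(context: str, start: int, text: str) -> str:
    """Slice the answer span out of the passage using its character offset."""
    return context[start:start + len(text)]


def spans_are_consistent(rec: dict) -> bool:
    """Check that every stored answer matches the slice its offset points at."""
    answers = rec["answers"]
    return all(
        extract_span(rec["context"], start, text) == text
        for text, start in zip(answers["text"], answers["answer_start"])
    )


print(spans_are_consistent(record))
```

A check like `spans_are_consistent` can be useful after any preprocessing step that edits the passage, since changing the text shifts the offsets.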
mlqa
huggingface.co
opendatalab.com
Updated May 29, 2024
Cite
AI at Meta (2024). mlqa [Dataset]. https://huggingface.co/datasets/facebook/mlqa
Dataset updated
May 29, 2024
Dataset authored and provided by
AI at Meta
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
MLQA (MultiLingual Question Answering) is a benchmark dataset for evaluating cross-lingual question answering performance. MLQA consists of over 5K extractive QA instances (12K in English) in SQuAD format in seven languages - English, Arabic, German, Spanish, Hindi, Vietnamese and Simplified Chinese. MLQA is highly parallel, with QA instances parallel between 4 different languages on average.
medical-question-answering-all
huggingface.co
Updated Apr 23, 2025
Cite
Petko Petkov (2025). medical-question-answering-all [Dataset]. https://huggingface.co/datasets/petkopetkov/medical-question-answering-all
Dataset updated
Apr 23, 2025
Authors
Petko Petkov
Description
petkopetkov/medical-question-answering-all dataset hosted on Hugging Face and contributed by the HF Datasets community
question-answering-state-of-the-union
huggingface.co
Updated Apr 4, 2023
Cite
LangChainDatasets (2023). question-answering-state-of-the-union [Dataset]. https://huggingface.co/datasets/LangChainDatasets/question-answering-state-of-the-union
Dataset updated
Apr 4, 2023
Dataset authored and provided by
LangChainDatasets
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
LangChainDatasets/question-answering-state-of-the-union dataset hosted on Hugging Face and contributed by the HF Datasets community
wiki_qa
huggingface.co
opendatalab.com
Updated Jun 3, 2024
Cite
Microsoft (2024). wiki_qa [Dataset]. https://huggingface.co/datasets/microsoft/wiki_qa
Dataset updated
Jun 3, 2024
Dataset authored and provided by
Microsoft (http://microsoft.com/)
License
https://choosealicense.com/licenses/other/
Description
Dataset Card for "wiki_qa"

Dataset Summary

Wiki Question Answering corpus from Microsoft. The WikiQA corpus is a publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering.

Supported Tasks and Leaderboards

More Information Needed

Languages

More Information Needed

Dataset Structure

Data Instances

default

Size of downloaded dataset files: 7.10 MB Size… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/wiki_qa.
fquad
huggingface.co
Updated May 24, 2024
Cite
Illuin Technology (2024). fquad [Dataset]. https://huggingface.co/datasets/illuin/fquad
Dataset updated
May 24, 2024
Dataset authored and provided by
Illuin Technology
License
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Description
FQuAD: French Question Answering Dataset. We introduce FQuAD, a native French Question Answering Dataset. FQuAD contains 25,000+ question and answer pairs. Fine-tuning CamemBERT on FQuAD yields an F1 score of 88% and an exact match of 77.9%.
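For readers unfamiliar with the two scores quoted above, here is a minimal sketch of how exact match and token-level F1 are commonly computed for extractive QA. It is a simplification under stated assumptions: it only lowercases and splits on whitespace, whereas official evaluation scripts typically apply further normalization (punctuation, articles), so it should not be expected to reproduce the reported figures. The function names and example strings are hypothetical.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the two strings are identical after trimming and lowercasing."""
    return float(prediction.strip().lower() == reference.strip().lower())


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall (bag-of-words overlap)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most as often as it
    # appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# A prediction with one extra token: not an exact match, but partial F1 credit.
print(exact_match("la tour Eiffel", "tour Eiffel"))
print(token_f1("la tour Eiffel", "tour Eiffel"))
```

Corpus-level scores are usually the average of these per-question values, taking the best score over the available reference answers for each question.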
video-game-question-answering
huggingface.co
Updated Dec 7, 2023
Cite
taesiri (2023). video-game-question-answering [Dataset]. https://huggingface.co/datasets/taesiri/video-game-question-answering
Dataset updated
Dec 7, 2023
Authors
taesiri
Description
taesiri/video-game-question-answering dataset hosted on Hugging Face and contributed by the HF Datasets community
stackexchange-question-answering
huggingface.co
Updated Oct 7, 2025
Cite
Prime Intellect (2025). stackexchange-question-answering [Dataset]. https://huggingface.co/datasets/PrimeIntellect/stackexchange-question-answering
Dataset updated
Oct 7, 2025
Dataset provided by
Prime Intellect, Inc.
Authors
Prime Intellect
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
SYNTHETIC-1

This is a subset of the task data used to construct SYNTHETIC-1. You can find the full collection here
document-question-answering-checkpoint-downloads
huggingface.co
Cite
Hugging Face OSS Metrics, document-question-answering-checkpoint-downloads [Dataset]. https://huggingface.co/datasets/open-source-metrics/document-question-answering-checkpoint-downloads
Dataset provided by
Hugging Face (https://huggingface.co/)
Authors
Hugging Face OSS Metrics
Description
open-source-metrics/document-question-answering-checkpoint-downloads dataset hosted on Hugging Face and contributed by the HF Datasets community
qasc
huggingface.co
opendatalab.com
Updated Apr 30, 2023
Cite
Ai2 (2023). qasc [Dataset]. https://huggingface.co/datasets/allenai/qasc
Dataset updated
Apr 30, 2023
Dataset provided by
Allen Institute for AI (http://allenai.org/)
Authors
Ai2
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for "qasc"

Dataset Summary

QASC is a question-answering dataset with a focus on sentence composition. It consists of 9,980 8-way multiple-choice questions about grade school science (8,134 train, 926 dev, 920 test), and comes with a corpus of 17M sentences.
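The split sizes quoted above can be checked against the stated total with a few lines of arithmetic. This is only a sanity-check sketch using the numbers from the summary; the variable names are hypothetical and nothing is loaded from the dataset itself.

```python
# Split sizes as quoted in the dataset summary.
qasc_splits = {"train": 8134, "dev": 926, "test": 920}
stated_total = 9980

# The three splits should add up to the stated number of questions.
computed_total = sum(qasc_splits.values())
print(computed_total == stated_total)

# Share of each split, rounded to one decimal place for readability.
shares = {name: round(100 * n / computed_total, 1) for name, n in qasc_splits.items()}
print(shares)
```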

Supported Tasks and Leaderboards

More Information Needed

Languages

More Information Needed

Dataset Structure

Data Instances

default

Size of… See the full description on the dataset page: https://huggingface.co/datasets/allenai/qasc.
coqa
huggingface.co
tensorflow.org
Updated Jan 24, 2024
Cite
Stanford NLP (2024). coqa [Dataset]. https://huggingface.co/datasets/stanfordnlp/coqa
Dataset updated
Jan 24, 2024
Dataset authored and provided by
Stanford NLP
License
https://choosealicense.com/licenses/other/
Description
Dataset Card for "coqa"

Dataset Summary

CoQA is a large-scale dataset for building Conversational Question Answering systems. Our dataset contains 127k questions with answers, obtained from 8k conversations about text passages from seven diverse domains. The questions are conversational, and the answers are free-form text with their corresponding evidence highlighted in the passage.

Supported Tasks and Leaderboards

More Information Needed

Languages… See the full description on the dataset page: https://huggingface.co/datasets/stanfordnlp/coqa.
openbookqa
huggingface.co
opendatalab.com
Updated Mar 1, 2024
Cite
Ai2 (2024). openbookqa [Dataset]. https://huggingface.co/datasets/allenai/openbookqa
Dataset updated
Mar 1, 2024
Dataset provided by
Allen Institute for AI (http://allenai.org/)
Authors
Ai2
License
https://choosealicense.com/licenses/unknown/
Description
Dataset Card for OpenBookQA

Dataset Summary

OpenBookQA aims to promote research in advanced question-answering, probing a deeper understanding of both the topic (with salient facts summarized as an open book, also provided with the dataset) and the language it is expressed in. In particular, it contains questions that require multi-step reasoning, use of additional common and commonsense knowledge, and rich text comprehension. OpenBookQA is a new kind of… See the full description on the dataset page: https://huggingface.co/datasets/allenai/openbookqa.
NaturalQuestionsV2
huggingface.co
Updated Sep 21, 2022
Cite
Rong Zhang (2022). NaturalQuestionsV2 [Dataset]. https://huggingface.co/datasets/rongzhangibm/NaturalQuestionsV2
Dataset updated
Sep 21, 2022
Authors
Rong Zhang
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
Dataset Card for Natural Questions

Dataset Summary

The NQ corpus contains questions from real users, and it requires QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question. The inclusion of real user questions, and the requirement that solutions should read an entire page to find the answer, cause NQ to be a more realistic and challenging task than prior QA datasets.

Supported Tasks and Leaderboards… See the full description on the dataset page: https://huggingface.co/datasets/rongzhangibm/NaturalQuestionsV2.
Data from: quac
huggingface.co
tensorflow.org
Updated Dec 12, 2020
Cite
Ai2 (2020). quac [Dataset]. https://huggingface.co/datasets/allenai/quac
Dataset updated
Dec 12, 2020
Dataset provided by
Allen Institute for AI (http://allenai.org/)
Authors
Ai2
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Question Answering in Context is a dataset for modeling, understanding, and participating in information seeking dialog. Data instances consist of an interactive dialog between two crowd workers: (1) a student who poses a sequence of freeform questions to learn as much as possible about a hidden Wikipedia text, and (2) a teacher who answers the questions by providing short excerpts (spans) from the text. QuAC introduces challenges not found in existing machine comprehension datasets: its questions are often more open-ended, unanswerable, or only meaningful within the dialog context.
squad_v2
huggingface.co
kaggle.com
Updated Jun 15, 2005
Cite
Pranav R (2005). squad_v2 [Dataset]. https://huggingface.co/datasets/rajpurkar/squad_v2
Dataset updated
Jun 15, 2005
Authors
Pranav R
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset Card for SQuAD 2.0

Dataset Summary

Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable. SQuAD 2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers… See the full description on the dataset page: https://huggingface.co/datasets/rajpurkar/squad_v2.
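To make the answerable/unanswerable distinction concrete, the sketch below separates the two kinds of question. It assumes that unanswerable questions are stored with empty `text` and `answer_start` lists; that convention, the field names, and the two invented records are assumptions for illustration rather than content taken from the dataset page.

```python
# Two invented records in an assumed SQuAD 2.0-style layout.
examples = [
    {
        "question": "Where is the Eiffel Tower?",
        "context": "The Eiffel Tower is in Paris.",
        "answers": {"text": ["Paris"], "answer_start": [23]},
    },
    {
        "question": "Who painted the Eiffel Tower in 1850?",
        "context": "The Eiffel Tower is in Paris.",
        # Assumed convention: no answer span means the question is unanswerable.
        "answers": {"text": [], "answer_start": []},
    },
]


def is_answerable(example: dict) -> bool:
    """Treat a question as answerable when at least one answer span is stored."""
    return len(example["answers"]["text"]) > 0


answerable = [ex for ex in examples if is_answerable(ex)]
unanswerable = [ex for ex in examples if not is_answerable(ex)]
print(len(answerable), len(unanswerable))
```

A model evaluated on such data needs a way to abstain, for example by predicting an empty string when no span is supported by the passage.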
TQA
huggingface.co
Updated Sep 26, 2025
Cite
Yifan Hou (2025). TQA [Dataset]. https://huggingface.co/datasets/yyyyifan/TQA
Dataset updated
Sep 26, 2025
Authors
Yifan Hou
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description
This dataset:

This is a visual question-answering dataset cleaned from "Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension". I only keep the questions that require the diagrams. For the whole dataset including more annotations such as captions, non-diagram questions, please check their webpage: https://prior.allenai.org/projects/tqa

Citation

Please cite the paper if you use this dataset.… See the full description on the dataset page: https://huggingface.co/datasets/yyyyifan/TQA.