100+ datasets found

Dermatology-Question-Answer-Dataset-For-Fine-Tuning
huggingface.co
Updated Nov 17, 2023
Cite
Muhammad Areeb Khan (2023). Dermatology-Question-Answer-Dataset-For-Fine-Tuning [Dataset]. https://huggingface.co/datasets/Mreeb/Dermatology-Question-Answer-Dataset-For-Fine-Tuning
Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Nov 17, 2023
Authors
Muhammad Areeb Khan
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset Details

The dataset has about 1 million tokens for training and about 1,500 question-answer pairs.

Dataset Description

This dataset is a comprehensive compilation of questions related to dermatology, spanning inquiries about various skin diseases, their symptoms, recommended medications, and available treatment modalities. Each question is paired with a concise and informative response, making it an ideal resource for training and fine-tuning language models in the… See the full description on the dataset page: https://huggingface.co/datasets/Mreeb/Dermatology-Question-Answer-Dataset-For-Fine-Tuning.
medical-question-answering-datasets
huggingface.co
+ more versions
Cite
Malikeh Ehghaghi, medical-question-answering-datasets [Dataset]. https://huggingface.co/datasets/Malikeh1375/medical-question-answering-datasets
Explore at: Croissant
Authors
Malikeh Ehghaghi
Description
Malikeh1375/medical-question-answering-datasets dataset hosted on Hugging Face and contributed by the HF Datasets community
subjqa
huggingface.co
Updated Apr 16, 2024
+ more versions
Cite
subjqa [Dataset]. https://huggingface.co/datasets/megagonlabs/subjqa
Dataset updated
Apr 16, 2024
Dataset authored and provided by
Megagon Labs
License
https://choosealicense.com/licenses/unknown/
Description
SubjQA is a question answering dataset that focuses on subjective questions and answers. The dataset consists of roughly 10,000 questions over reviews from 6 different domains: books, movies, grocery, electronics, TripAdvisor (i.e. hotels), and restaurants.
squad
huggingface.co
tensorflow.org
+1 more
Updated Jun 12, 2020
+ more versions
Cite
Pranav R (2020). squad [Dataset]. https://huggingface.co/datasets/rajpurkar/squad
Dataset updated
Jun 12, 2020
Authors
Pranav R
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
Dataset Card for SQuAD

Dataset Summary

Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable. SQuAD 1.1 contains 100,000+ question-answer pairs on 500+ articles.

Supported Tasks and Leaderboards

Question Answering.… See the full description on the dataset page: https://huggingface.co/datasets/rajpurkar/squad.
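
The span property described in the summary can be checked mechanically. The sketch below is illustrative only: it assumes a record laid out with a "context" string and an "answers" field holding parallel "text" and "answer_start" lists, and the sample passage and question are invented for the example rather than taken from the dataset.

```python
def span_matches(context: str, answer_text: str, answer_start: int) -> bool:
    """Return True when answer_text is exactly the slice of context at answer_start."""
    end = answer_start + len(answer_text)
    return context[answer_start:end] == answer_text


def all_spans_valid(record: dict) -> bool:
    """Check every (text, answer_start) pair of a record against its context."""
    answers = record["answers"]
    return all(
        span_matches(record["context"], text, start)
        for text, start in zip(answers["text"], answers["answer_start"])
    )


# Invented sample record (not real dataset content).
context = "Crowdworkers posed questions on a set of Wikipedia articles."
answer_text = "Wikipedia articles"
record = {
    "context": context,
    "question": "What did crowdworkers pose questions on?",
    "answers": {
        "text": [answer_text],
        # Character offset of the answer inside the passage.
        "answer_start": [context.index(answer_text)],
    },
}

print(all_spans_valid(record))
```

A check like this is a cheap sanity test after any preprocessing that rewrites the passage text, since such changes shift character offsets.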
Data from: quora-question-answer-dataset
huggingface.co
Updated Sep 2, 2023
+ more versions
Cite
Gregory Bizup (2023). quora-question-answer-dataset [Dataset]. https://huggingface.co/datasets/toughdata/quora-question-answer-dataset
Explore at: Croissant
Dataset updated
Sep 2, 2023
Authors
Gregory Bizup
License
https://choosealicense.com/licenses/gpl-3.0/
Description
Quora Question Answer Dataset (Quora-QuAD) contains 56,402 question-answer pairs scraped from Quora.

Usage:

For instructions on fine-tuning a model (Flan-T5) with this dataset, please check out the article: https://www.toughdata.net/blog/post/finetune-flan-t5-question-answer-quora-dataset
wiki_qa
huggingface.co
opendatalab.com
Cite
Microsoft, wiki_qa [Dataset]. https://huggingface.co/datasets/microsoft/wiki_qa
Explore at: Croissant
Dataset authored and provided by
Microsoft (http://microsoft.com/)
License
https://choosealicense.com/licenses/other/
Description
Dataset Card for "wiki_qa"

Dataset Summary

Wiki Question Answering corpus from Microsoft. The WikiQA corpus is a publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering.

Supported Tasks and Leaderboards

More Information Needed

Languages

More Information Needed

Dataset Structure Data Instances default

Size of downloaded dataset files: 7.10 MB Size… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/wiki_qa.
gooaq
huggingface.co
paperswithcode.com
+1 more
Updated May 23, 2024
Cite
Ai2 (2023). gooaq [Dataset]. https://huggingface.co/datasets/allenai/gooaq
Dataset updated
May 23, 2024
Dataset provided by
Allen Institute for AI (http://allenai.org/)
Authors
Ai2
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
GooAQ is a large-scale dataset with a variety of answer types. This dataset contains over 5 million questions and 3 million answers collected from Google. GooAQ questions are collected semi-automatically from the Google search engine using its autocomplete feature. This results in naturalistic questions of practical interest that are nonetheless short and expressed using simple language. GooAQ answers are mined from Google's responses to our collected questions, specifically from the answer boxes in the search results. This yields a rich space of answer types, containing both textual answers (short and long) as well as more structured ones such as collections.
fquad
huggingface.co
Updated May 24, 2024
Cite
Illuin Technology (2024). fquad [Dataset]. https://huggingface.co/datasets/illuin/fquad
Dataset updated
May 24, 2024
Dataset authored and provided by
Illuin Technology
License
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) (https://creativecommons.org/licenses/by-nc-sa/3.0/)
License information was derived automatically
Description
FQuAD: French Question Answering Dataset. We introduce FQuAD, a native French Question Answering Dataset. FQuAD contains 25,000+ question and answer pairs. Fine-tuning CamemBERT on FQuAD yields an F1 score of 88% and an exact match of 77.9%.
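
For readers unfamiliar with the two metrics quoted above, here is a minimal sketch of exact match and token-level F1 as commonly computed for extractive QA. It is not the official FQuAD evaluation script: the normalization (lowercasing and stripping ASCII punctuation) is a simplifying assumption, and the sample strings are invented.

```python
import string
from collections import Counter


def normalize(text: str) -> list:
    """Lowercase, drop ASCII punctuation, and split on whitespace (simplified)."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 when the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between prediction and reference."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Invented example: the prediction carries one extra token.
print(exact_match("la tour Eiffel", "tour Eiffel"))
print(round(f1_score("la tour Eiffel", "tour Eiffel"), 3))
```

The example shows why F1 is usually higher than exact match: a prediction that overlaps the reference but adds or drops a token still earns partial credit under F1 while scoring zero on exact match.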
Data from: quac
huggingface.co
tensorflow.org
+1 more
Updated Dec 12, 2020
+ more versions
Cite
Ai2 (2020). quac [Dataset]. https://huggingface.co/datasets/allenai/quac
Dataset updated
Dec 12, 2020
Dataset provided by
Allen Institute for AI (http://allenai.org/)
Authors
Ai2
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Question Answering in Context is a dataset for modeling, understanding, and participating in information seeking dialog. Data instances consist of an interactive dialog between two crowd workers: (1) a student who poses a sequence of freeform questions to learn as much as possible about a hidden Wikipedia text, and (2) a teacher who answers the questions by providing short excerpts (spans) from the text. QuAC introduces challenges not found in existing machine comprehension datasets: its questions are often more open-ended, unanswerable, or only meaningful within the dialog context.
arxiv_qa
huggingface.co
Updated Sep 30, 2023
Cite
taesiri (2023). arxiv_qa [Dataset]. https://huggingface.co/datasets/taesiri/arxiv_qa
Explore at: Croissant
Dataset updated
Sep 30, 2023
Authors
taesiri
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
ArXiv QA

(TBD) Automated ArXiv question answering via large language models Github | Homepage | Simple QA - Hugging Face Space

Automated Question Answering with ArXiv Papers Latest 25 Papers

LIME: Localized Image Editing via Attention Regularization in Diffusion Models - [Arxiv] [QA]

Revisiting Depth Completion from a Stereo Matching Perspective for Cross-domain Generalization - [Arxiv] [QA]

VL-GPT: A Generative Pre-trained Transformer for Vision and… See the full description on the dataset page: https://huggingface.co/datasets/taesiri/arxiv_qa.
tweet_qa
huggingface.co
opendatalab.com
Updated Jun 30, 2021
Cite
UC Santa Barbara NLP Group (2021). tweet_qa [Dataset]. https://huggingface.co/datasets/ucsbnlp/tweet_qa
Explore at: Croissant
Dataset updated
Jun 30, 2021
Dataset authored and provided by
UC Santa Barbara NLP Group
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
Dataset Card for TweetQA

Dataset Summary

As social media grows more popular and more news and real-time events are reported on it, developing automated question answering systems is critical to the effectiveness of many applications that rely on real-time knowledge. While previous question answering (QA) datasets have concentrated on formal text like news and Wikipedia, this work presents the first large-scale dataset for QA over social media data. To make sure… See the full description on the dataset page: https://huggingface.co/datasets/ucsbnlp/tweet_qa.
cncf-question-and-answer-dataset-for-llm-training
huggingface.co
Updated Nov 29, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kubermatic (2020). cncf-question-and-answer-dataset-for-llm-training [Dataset]. https://huggingface.co/datasets/Kubermatic/cncf-question-and-answer-dataset-for-llm-training
Explore at: Croissant
Dataset updated
Nov 29, 2020
Dataset authored and provided by
Kubermatic
Description
CNCF QA Dataset for LLM Tuning

Description

This dataset, named cncf-qa-dataset-for-llm-tuning, is designed for fine-tuning large language models (LLMs) and is formatted in a question-answer (QA) style. The data is sourced from PDF and markdown (MD) files extracted from various project repositories within the CNCF (Cloud Native Computing Foundation) landscape. These files were processed and converted into a QA format to be fed into the LLM model. The dataset includes the… See the full description on the dataset page: https://huggingface.co/datasets/Kubermatic/cncf-question-and-answer-dataset-for-llm-training.
natural-questions
huggingface.co
Updated Jan 30, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sentence Transformers (2018). natural-questions [Dataset]. https://huggingface.co/datasets/sentence-transformers/natural-questions
Explore at: Croissant
Dataset updated
Jan 30, 2018
Dataset authored and provided by
Sentence Transformers
Description
Dataset Card for Natural Questions

This dataset is a collection of question-answer pairs from the Natural Questions dataset. See Natural Questions for additional information. This dataset can be used directly with Sentence Transformers to train embedding models.

Dataset Subsets pair subset

Columns: "question", "answer" Column types: str, str Examples:{ 'query': 'the si unit of the electric field is', 'answer': 'Electric field An electric field is a field… See the full description on the dataset page: https://huggingface.co/datasets/sentence-transformers/natural-questions.
medmcqa
huggingface.co
Updated May 22, 2022
+ more versions
Cite
Open Life Science AI (2022). medmcqa [Dataset]. https://huggingface.co/datasets/openlifescienceai/medmcqa
Explore at: Croissant
Dataset updated
May 22, 2022
Dataset authored and provided by
Open Life Science AI
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset Card for MedMCQA

Dataset Summary

MedMCQA is a large-scale, Multiple-Choice Question Answering (MCQA) dataset designed to address real-world medical entrance exam questions. It contains more than 194k high-quality AIIMS & NEET PG entrance exam MCQs covering 2.4k healthcare topics and 21 medical subjects, with an average token length of 12.77 and high topical diversity. Each sample contains a question, correct answer(s), and other options which require… See the full description on the dataset page: https://huggingface.co/datasets/openlifescienceai/medmcqa.
NewQA
huggingface.co
Updated Jun 24, 2023
+ more versions
Cite
brenda Adokorach (2023). NewQA [Dataset]. https://huggingface.co/datasets/badokorach/NewQA
Explore at: Croissant
Dataset updated
Jun 24, 2023
Authors
brenda Adokorach
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Dataset Card for "squad"

Dataset Summary

Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.

Supported Tasks and Leaderboards

More Information Needed

Languages

More Information Needed

Dataset… See the full description on the dataset page: https://huggingface.co/datasets/badokorach/NewQA.
mlqa
huggingface.co
paperswithcode.com
+2 more
Updated May 29, 2024
+ more versions
Cite
AI at Meta (2024). mlqa [Dataset]. https://huggingface.co/datasets/facebook/mlqa
Dataset updated
May 29, 2024
Dataset authored and provided by
AI at Meta
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0) (https://creativecommons.org/licenses/by-sa/3.0/)
License information was derived automatically
Description
MLQA (MultiLingual Question Answering) is a benchmark dataset for evaluating cross-lingual question answering performance. MLQA consists of over 5K extractive QA instances (12K in English) in SQuAD format in seven languages - English, Arabic, German, Spanish, Hindi, Vietnamese and Simplified Chinese. MLQA is highly parallel, with QA instances parallel between 4 different languages on average.
qa-expert-multi-hop-qa-V1.0
huggingface.co
Updated Oct 13, 2023
Cite
qa-expert-multi-hop-qa-V1.0 [Dataset]. https://huggingface.co/datasets/khaimaitien/qa-expert-multi-hop-qa-V1.0
Explore at: Croissant
Dataset updated
Oct 13, 2023
Authors
Khai Mai
Description
Dataset Card for QA-Expert-multi-hop-qa-V1.0

This dataset aims to provide multi-domain training data for question answering, with a focus on multi-hop question answering. In total, this dataset contains 25.5k examples for training and 3.19k for evaluation. You can take a look at the model we trained on this data: https://huggingface.co/khaimaitien/qa-expert-7B-V1.0 The dataset is mostly generated using the OpenAI model (gpt-3.5-turbo-instruct). Please read more information about… See the full description on the dataset page: https://huggingface.co/datasets/khaimaitien/qa-expert-multi-hop-qa-V1.0.
covid_qa_castorini
huggingface.co
Updated May 29, 2024
Cite
Castorini (2024). covid_qa_castorini [Dataset]. https://huggingface.co/datasets/castorini/covid_qa_castorini
Dataset updated
May 29, 2024
Dataset authored and provided by
Castorini
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
CovidQA is the beginnings of a question answering dataset specifically designed for COVID-19, built by hand from knowledge gathered from Kaggle's COVID-19 Open Research Dataset Challenge.
squad_v2
huggingface.co
Updated Jun 15, 2005
Cite
Pranav R (2005). squad_v2 [Dataset]. https://huggingface.co/datasets/rajpurkar/squad_v2
Dataset updated
Jun 15, 2005
Authors
Pranav R
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
Dataset Card for SQuAD 2.0

Dataset Summary

Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable. SQuAD 2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers… See the full description on the dataset page: https://huggingface.co/datasets/rajpurkar/squad_v2.
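
Because SQuAD 2.0 mixes answerable and unanswerable questions, a consumer usually has to tell them apart. The sketch below assumes the convention that an unanswerable question carries empty "text" and "answer_start" lists in its "answers" field; the two sample records are invented for illustration.

```python
def is_answerable(record: dict) -> bool:
    """A question is treated as answerable when at least one answer text is present."""
    return len(record["answers"]["text"]) > 0


# Invented sample records (not real dataset content).
records = [
    {
        "question": "What did crowdworkers pose questions on?",
        "answers": {"text": ["Wikipedia articles"], "answer_start": [41]},
    },
    {
        "question": "Who funded the crowdworkers?",
        # Unanswerable from the passage: both lists are empty.
        "answers": {"text": [], "answer_start": []},
    },
]

answerable = [r for r in records if is_answerable(r)]
unanswerable = [r for r in records if not is_answerable(r)]
print(len(answerable), len(unanswerable))
```

A model evaluated on this kind of data needs a way to abstain, so splitting the records this way is a common first step when measuring how often a system answers questions it should decline.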
Data from: yahoo-answers
huggingface.co
Updated Apr 7, 2025
Cite
yahoo-answers [Dataset]. https://huggingface.co/datasets/sentence-transformers/yahoo-answers
Explore at: Croissant
Dataset updated
Apr 7, 2025
Dataset authored and provided by
Sentence Transformers
Description
Dataset Card for Yahoo Answers

This dataset is a collection of pairs containing titles, questions, and answers collected from Yahoo Answers. See the Yahoo Answers dataset for additional information. This dataset can be used directly with Sentence Transformers to train embedding models.

Dataset Subsets title-question-answer-pair subset

Columns: "question", "answer" Column types: str, str Examples:{ 'question': "why doesn't an optical mouse work on a glass… See the full description on the dataset page: https://huggingface.co/datasets/sentence-transformers/yahoo-answers.

