Malikeh1375/medical-question-answering-datasets dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Details
The dataset has about 1 million tokens for training and about 1,500 question-answer pairs.
Dataset Description
This dataset is a comprehensive compilation of questions related to dermatology, spanning inquiries about various skin diseases, their symptoms, recommended medications, and available treatment modalities. Each question is paired with a concise and informative response, making it an ideal resource for training and fine-tuning language models in the… See the full description on the dataset page: https://huggingface.co/datasets/Mreeb/Dermatology-Question-Answer-Dataset-For-Fine-Tuning.
https://choosealicense.com/licenses/unknown/
SubjQA is a question answering dataset that focuses on subjective questions and answers. The dataset consists of roughly 10,000 questions over reviews from 6 different domains: books, movies, grocery, electronics, TripAdvisor (i.e. hotels), and restaurants.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset Card for SQuAD
Dataset Summary
Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable. SQuAD 1.1 contains 100,000+ question-answer pairs on 500+ articles.
Supported Tasks and Leaderboards
Question Answering.… See the full description on the dataset page: https://huggingface.co/datasets/rajpurkar/squad.
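The span-based answer format described above can be illustrated with a minimal sketch. The record below is made up for illustration, and the field names (`context`, `answers`, `text`, `answer_start`) are assumptions about a SQuAD-style layout rather than a statement of the dataset's exact schema; the dataset page is the authority on that.

```python
# Hypothetical SQuAD-style record; the passage and question are invented.
CONTEXT = "The Eiffel Tower is located in Paris. It was completed in 1889."

record = {
    "context": CONTEXT,
    "question": "Where is the Eiffel Tower located?",
    # An answer is a span: its text plus the character offset where it starts.
    "answers": {"text": ["Paris"], "answer_start": [CONTEXT.index("Paris")]},
}


def extract_span(context, answer_start, answer_text):
    """Slice the answer out of the passage using its character offset."""
    return context[answer_start:answer_start + len(answer_text)]


def spans_are_consistent(rec):
    """Check that every stored offset really points at the stored answer text."""
    answers = rec["answers"]
    return all(
        extract_span(rec["context"], start, text) == text
        for text, start in zip(answers["text"], answers["answer_start"])
    )


print(spans_are_consistent(record))
```

A check like `spans_are_consistent` is a cheap sanity test to run after any preprocessing that rewrites the passage text, since whitespace or encoding changes can shift character offsets.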
Dataset Card for QA-Expert-multi-hop-qa-V1.0
This dataset aims to provide multi-domain training data for the task: Question Answering, with a focus on Multi-hop Question Answering. In total, this dataset contains 25.5k examples for training and 3.19k for evaluation. You can take a look at the model we trained on this data: https://huggingface.co/khaimaitien/qa-expert-7B-V1.0. The dataset is mostly generated using the OpenAI model (gpt-3.5-turbo-instruct). Please read more information about… See the full description on the dataset page: https://huggingface.co/datasets/khaimaitien/qa-expert-multi-hop-qa-V1.0.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset Card for TweetQA
Dataset Summary
As social media becomes an increasingly popular venue for reporting news and real-time events, developing automated question answering systems is critical to the effectiveness of many applications that rely on real-time knowledge. While previous question answering (QA) datasets have concentrated on formal text like news and Wikipedia, the first large-scale dataset for QA over social media data is presented. To make sure… See the full description on the dataset page: https://huggingface.co/datasets/ucsbnlp/tweet_qa.
petkopetkov/medical-question-answering-synthetic dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
FQuAD: French Question Answering Dataset. We introduce FQuAD, a native French Question Answering Dataset. FQuAD contains 25,000+ question and answer pairs. Fine-tuning CamemBERT on FQuAD yields an F1 score of 88% and an exact match of 77.9%.
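For readers unfamiliar with the two metrics quoted above, here is a simplified sketch of exact match and token-level F1 as they are commonly computed for extractive QA. It is an assumption that this mirrors the FQuAD evaluation: the normalization here (lowercasing and whitespace splitting only) is deliberately minimal, and official evaluation scripts typically also strip punctuation and articles.

```python
from collections import Counter


def normalize(text):
    """Minimal normalization: lowercase and split on whitespace."""
    return text.lower().split()


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall between two answers."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("la tour Eiffel", "tour Eiffel"))
print(token_f1("la tour Eiffel", "tour Eiffel"))
```

The example pair shows why F1 is usually higher than exact match: a prediction with one extra token fails exact match outright but still earns partial F1 credit. Dataset-level scores are the averages of these per-question values.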
https://choosealicense.com/licenses/other/
Dataset Card for "wiki_qa"
Dataset Summary
Wiki Question Answering corpus from Microsoft. The WikiQA corpus is a publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering.
Supported Tasks and Leaderboards
More Information Needed
Languages
More Information Needed
Dataset Structure
Data Instances
default
Size of downloaded dataset files: 7.10 MB Size… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/wiki_qa.
https://choosealicense.com/licenses/ms-pl/
Recent work in semantic parsing for question answering has focused on long and complicated questions, many of which would seem unnatural if asked in a normal conversation between two humans. In an effort to explore a conversational QA setting, we present a more realistic task: answering sequences of simple but inter-related questions. We created SQA by asking crowdsourced workers to decompose 2,022 questions from WikiTableQuestions (WTQ), which contains highly-compositional questions about tables from Wikipedia. We had three workers decompose each WTQ question, resulting in a dataset of 6,066 sequences that contain 17,553 questions in total. Each question is also associated with answers in the form of cell locations in the tables.
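The structure described above, a sequence of inter-related questions whose answers are cell locations in a table, can be sketched as follows. The table, the questions, and the field names (`header`, `rows`, `answer_cells`) are hypothetical and chosen for illustration; they are not the dataset's actual schema.

```python
# Hypothetical table; rows are lists of cell strings.
table = {
    "header": ["Country", "Gold", "Silver"],
    "rows": [
        ["Norway", "3", "1"],
        ["Canada", "2", "4"],
        ["Japan", "2", "0"],
    ],
}

# One decomposed sequence: each question builds on the previous answer,
# and each answer is a list of (row, column) cell coordinates.
sequence = [
    {
        "question": "Which countries won two gold medals?",
        "answer_cells": [(1, 0), (2, 0)],
    },
    {
        "question": "Of those, which won the most silver medals?",
        "answer_cells": [(1, 0)],
    },
]


def resolve_cells(tbl, cells):
    """Turn (row, column) coordinates into the cell values they point at."""
    return [tbl["rows"][row][col] for row, col in cells]


for turn in sequence:
    print(turn["question"], "->", resolve_cells(table, turn["answer_cells"]))

# The counts in the description are consistent with each other:
# 2,022 questions, each decomposed by three workers, give 6,066 sequences.
assert 2022 * 3 == 6066
```

Storing coordinates rather than answer strings keeps the answer tied to the table, so a later question in the sequence can refer back to the exact rows selected by an earlier one.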
open-source-metrics/table-question-answering-checkpoint-downloads dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for "qasc"
Dataset Summary
QASC is a question-answering dataset with a focus on sentence composition. It consists of 9,980 8-way multiple-choice questions about grade school science (8,134 train, 926 dev, 920 test), and comes with a corpus of 17M sentences.
Supported Tasks and Leaderboards
More Information Needed
Languages
More Information Needed
Dataset Structure
Data Instances
default
Size of… See the full description on the dataset page: https://huggingface.co/datasets/allenai/qasc.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for "squad"
Dataset Summary
Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.
Supported Tasks and Leaderboards
More Information Needed
Languages
More Information Needed
Dataset… See the full description on the dataset page: https://huggingface.co/datasets/badokorach/NewQA.
https://choosealicense.com/licenses/gpl-3.0/
Quora Question Answer Dataset (Quora-QuAD) contains 56,402 question-answer pairs scraped from Quora.
Usage:
For instructions on fine-tuning a model (Flan-T5) with this dataset, please check out the article: https://www.toughdata.net/blog/post/finetune-flan-t5-question-answer-quora-dataset
Attribution-ShareAlike 3.0 (CC BY-SA 3.0) https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
MLQA (MultiLingual Question Answering) is a benchmark dataset for evaluating cross-lingual question answering performance. MLQA consists of over 5K extractive QA instances (12K in English) in SQuAD format in seven languages: English, Arabic, German, Spanish, Hindi, Vietnamese and Simplified Chinese. MLQA is highly parallel, with QA instances parallel between 4 different languages on average.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ArXiv QA
(TBD) Automated ArXiv question answering via large language models
Automated Question Answering with ArXiv Papers
Latest 25 Papers
LIME: Localized Image Editing via Attention Regularization in Diffusion Models - [Arxiv] [QA]
Revisiting Depth Completion from a Stereo Matching Perspective for Cross-domain Generalization - [Arxiv] [QA]
VL-GPT: A Generative Pre-trained Transformer for Vision and… See the full description on the dataset page: https://huggingface.co/datasets/taesiri/arxiv_qa.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
GooAQ is a large-scale dataset with a variety of answer types. This dataset contains over 5 million questions and 3 million answers collected from Google. GooAQ questions are collected semi-automatically from the Google search engine using its autocomplete feature. This results in naturalistic questions of practical interest that are nonetheless short and expressed using simple language. GooAQ answers are mined from Google's responses to our collected questions, specifically from the answer boxes in the search results. This yields a rich space of answer types, containing both textual answers (short and long) as well as more structured ones such as collections.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset Card for SQuAD 2.0
Dataset Summary
Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable. SQuAD 2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers… See the full description on the dataset page: https://huggingface.co/datasets/rajpurkar/squad_v2.
https://choosealicense.com/licenses/unknown/
Dataset Card for OpenBookQA
Dataset Summary
OpenBookQA aims to promote research in advanced question-answering, probing a deeper understanding of both the topic (with salient facts summarized as an open book, also provided with the dataset) and the language it is expressed in. In particular, it contains questions that require multi-step reasoning, use of additional common and commonsense knowledge, and rich text comprehension. OpenBookQA is a new kind of… See the full description on the dataset page: https://huggingface.co/datasets/allenai/openbookqa.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
We present CLAP NQ, a benchmark Long-form Question Answering dataset for the full RAG pipeline. CLAP NQ includes long answers with grounded gold passages from Natural Questions (NQ) and a corpus to perform either retrieval, generation, or the full RAG pipeline. The CLAP NQ answers are concise, 3x smaller than the full passage, and cohesive, with multiple pieces of the passage that are not contiguous. This is the annotated data for the generation portion of the RAG pipeline. For more… See the full description on the dataset page: https://huggingface.co/datasets/PrimeQA/clapnq.