100+ datasets found
  1. medical-question-answering-datasets

    • huggingface.co
    Cite
    Malikeh Ehghaghi, medical-question-answering-datasets [Dataset]. https://huggingface.co/datasets/Malikeh1375/medical-question-answering-datasets
    Authors
    Malikeh Ehghaghi
    Description

    Malikeh1375/medical-question-answering-datasets dataset hosted on Hugging Face and contributed by the HF Datasets community

  2. Dermatology-Question-Answer-Dataset-For-Fine-Tuning

    • huggingface.co
    Updated Nov 17, 2023
    Cite
    Muhammad Areeb Khan (2023). Dermatology-Question-Answer-Dataset-For-Fine-Tuning [Dataset]. https://huggingface.co/datasets/Mreeb/Dermatology-Question-Answer-Dataset-For-Fine-Tuning
    Authors
    Muhammad Areeb Khan
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Details

    The dataset has about 1 million tokens for training and about 1,500 question-answer pairs.

      Dataset Description
    

    This dataset is a comprehensive compilation of questions related to dermatology, spanning inquiries about various skin diseases, their symptoms, recommended medications, and available treatment modalities. Each question is paired with a concise and informative response, making it an ideal resource for training and fine-tuning language models in the… See the full description on the dataset page: https://huggingface.co/datasets/Mreeb/Dermatology-Question-Answer-Dataset-For-Fine-Tuning.

  3. subjqa

    • huggingface.co
    Updated Apr 16, 2024
    Cite
    Megagon Labs (2024). subjqa [Dataset]. https://huggingface.co/datasets/megagonlabs/subjqa
    Dataset authored and provided by
    Megagon Labs
    License

    https://choosealicense.com/licenses/unknown/

    Description

    SubjQA is a question answering dataset that focuses on subjective questions and answers. The dataset consists of roughly 10,000 questions over reviews from 6 different domains: books, movies, grocery, electronics, TripAdvisor (i.e. hotels), and restaurants.

  4. squad

    • huggingface.co
    • tensorflow.org
    • +1 more
    Updated Jun 12, 2020
    Cite
    Pranav R (2020). squad [Dataset]. https://huggingface.co/datasets/rajpurkar/squad
    Authors
    Pranav R
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for SQuAD

      Dataset Summary
    

    Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable. SQuAD 1.1 contains 100,000+ question-answer pairs on 500+ articles.

      Supported Tasks and Leaderboards
    

    Question Answering.… See the full description on the dataset page: https://huggingface.co/datasets/rajpurkar/squad.
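
The span-based answer format described in the summary above can be illustrated with a minimal sketch. It assumes the record layout commonly used for SQuAD-style data (a `context` string plus an `answers` mapping with parallel `text` and `answer_start` lists); the sample records and the helper name `extract_answer_span` are hypothetical and are not taken from the dataset itself.

```python
from typing import Optional

# Hypothetical records in an assumed SQuAD-style layout (not real dataset rows).
answerable = {
    "context": "The Eiffel Tower is located in Paris, France.",
    "question": "Where is the Eiffel Tower located?",
    "answers": {"text": ["Paris, France"], "answer_start": [31]},
}

# An unanswerable question is modelled here with empty answer lists (an assumption).
unanswerable = {
    "context": "The Eiffel Tower is located in Paris, France.",
    "question": "Who painted the Eiffel Tower in 1999?",
    "answers": {"text": [], "answer_start": []},
}


def extract_answer_span(record: dict, index: int = 0) -> Optional[str]:
    """Return the answer as a slice of the context, or None if there is no answer."""
    texts = record["answers"]["text"]
    starts = record["answers"]["answer_start"]
    if not texts:
        return None  # nothing to extract for an unanswerable question
    start = starts[index]
    end = start + len(texts[index])
    span = record["context"][start:end]
    # The answer text should be exactly the characters at the given offset.
    if span != texts[index]:
        raise ValueError("answer_start does not point at the answer text")
    return span


print(extract_answer_span(answerable))
print(extract_answer_span(unanswerable))
```

Checking that the slice matches the stored answer text is a cheap sanity test to run before training an extractive model on data in this shape.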

  5. qa-expert-multi-hop-qa-V1.0

    • huggingface.co
    Updated Oct 13, 2023
    Cite
    Khai Mai (2023). qa-expert-multi-hop-qa-V1.0 [Dataset]. https://huggingface.co/datasets/khaimaitien/qa-expert-multi-hop-qa-V1.0
    Authors
    Khai Mai
    Description

    Dataset Card for QA-Expert-multi-hop-qa-V1.0

    This dataset aims to provide multi-domain training data for question answering, with a focus on multi-hop question answering. In total, it contains 25.5k examples for training and 3.19k for evaluation. A model trained on this data is available (https://huggingface.co/khaimaitien/qa-expert-7B-V1.0). The dataset is mostly generated using the OpenAI model (gpt-3.5-turbo-instruct). Please read more information about… See the full description on the dataset page: https://huggingface.co/datasets/khaimaitien/qa-expert-multi-hop-qa-V1.0.

  6. tweet_qa

    • huggingface.co
    • opendatalab.com
    Updated Jun 30, 2021
    Cite
    UC Santa Barbara NLP Group (2021). tweet_qa [Dataset]. https://huggingface.co/datasets/ucsbnlp/tweet_qa
    Dataset authored and provided by
    UC Santa Barbara NLP Group
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for TweetQA

      Dataset Summary
    

    As social media becomes an increasingly popular place to report news and real-time events, developing automated question answering systems is critical to the effectiveness of many applications that rely on real-time knowledge. While previous question answering (QA) datasets have concentrated on formal text like news and Wikipedia, TweetQA is presented as the first large-scale dataset for QA over social media data. To make sure… See the full description on the dataset page: https://huggingface.co/datasets/ucsbnlp/tweet_qa.

  7. medical-question-answering-synthetic

    • huggingface.co
    Updated May 1, 2025
    Cite
    Petko Petkov (2025). medical-question-answering-synthetic [Dataset]. https://huggingface.co/datasets/petkopetkov/medical-question-answering-synthetic
    Authors
    Petko Petkov
    Description

    petkopetkov/medical-question-answering-synthetic dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. fquad

    • huggingface.co
    Updated May 24, 2024
    Cite
    Illuin Technology (2024). fquad [Dataset]. https://huggingface.co/datasets/illuin/fquad
    Dataset authored and provided by
    Illuin Technology
    License

    Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
    License information was derived automatically

    Description

    FQuAD: French Question Answering Dataset. We introduce FQuAD, a native French Question Answering Dataset. FQuAD contains 25,000+ question and answer pairs. Fine-tuning CamemBERT on FQuAD yields an F1 score of 88% and an exact match of 77.9%.

  9. wiki_qa

    • huggingface.co
    • opendatalab.com
    Cite
    Microsoft, wiki_qa [Dataset]. https://huggingface.co/datasets/microsoft/wiki_qa
    Dataset authored and provided by
    Microsoft (http://microsoft.com/)
    License

    https://choosealicense.com/licenses/other/

    Description

    Dataset Card for "wiki_qa"

      Dataset Summary
    

    Wiki Question Answering corpus from Microsoft. The WikiQA corpus is a publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering.

      Supported Tasks and Leaderboards
    

    More Information Needed

      Languages
    

    More Information Needed

      Dataset Structure

      Data Instances

      default

    Size of downloaded dataset files: 7.10 MB Size… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/wiki_qa.

  10. msr_sqa

    • huggingface.co
    Updated Aug 29, 2023
    Cite
    Microsoft (2023). msr_sqa [Dataset]. https://huggingface.co/datasets/microsoft/msr_sqa
    Dataset authored and provided by
    Microsoft (http://microsoft.com/)
    License

    https://choosealicense.com/licenses/ms-pl/

    Description

    Recent work in semantic parsing for question answering has focused on long and complicated questions, many of which would seem unnatural if asked in a normal conversation between two humans. In an effort to explore a conversational QA setting, we present a more realistic task: answering sequences of simple but inter-related questions. We created SQA by asking crowdsourced workers to decompose 2,022 questions from WikiTableQuestions (WTQ), which contains highly-compositional questions about tables from Wikipedia. We had three workers decompose each WTQ question, resulting in a dataset of 6,066 sequences that contain 17,553 questions in total. Each question is also associated with answers in the form of cell locations in the tables.

  11. table-question-answering-checkpoint-downloads

    • huggingface.co
    Cite
    Hugging Face OSS Metrics, table-question-answering-checkpoint-downloads [Dataset]. https://huggingface.co/datasets/open-source-metrics/table-question-answering-checkpoint-downloads
    Dataset provided by
    Hugging Face (https://huggingface.co/)
    Authors
    Hugging Face OSS Metrics
    Description

    open-source-metrics/table-question-answering-checkpoint-downloads dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. qasc

    • huggingface.co
    • opendatalab.com
    • +2 more
    Updated Apr 30, 2023
    Cite
    Ai2 (2023). qasc [Dataset]. https://huggingface.co/datasets/allenai/qasc
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    Ai2
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for "qasc"

      Dataset Summary
    

    QASC is a question-answering dataset with a focus on sentence composition. It consists of 9,980 8-way multiple-choice questions about grade school science (8,134 train, 926 dev, 920 test), and comes with a corpus of 17M sentences.

      Supported Tasks and Leaderboards
    

    More Information Needed

      Languages
    

    More Information Needed

      Dataset Structure

      Data Instances

      default

    Size of… See the full description on the dataset page: https://huggingface.co/datasets/allenai/qasc.

  13. NewQA

    • huggingface.co
    Updated Jun 24, 2023
    Cite
    brenda Adokorach (2023). NewQA [Dataset]. https://huggingface.co/datasets/badokorach/NewQA
    Authors
    brenda Adokorach
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for "squad"

      Dataset Summary
    

    Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.

      Supported Tasks and Leaderboards
    

    More Information Needed

      Languages
    

    More Information Needed

      Dataset… See the full description on the dataset page: https://huggingface.co/datasets/badokorach/NewQA.
    
  14. Data from: quora-question-answer-dataset

    • huggingface.co
    Updated Sep 2, 2023
    Cite
    Gregory Bizup (2023). quora-question-answer-dataset [Dataset]. https://huggingface.co/datasets/toughdata/quora-question-answer-dataset
    Authors
    Gregory Bizup
    License

    https://choosealicense.com/licenses/gpl-3.0/

    Description

    Quora Question Answer Dataset (Quora-QuAD) contains 56,402 question-answer pairs scraped from Quora.

      Usage:
    

    For instructions on fine-tuning a model (Flan-T5) with this dataset, please check out the article: https://www.toughdata.net/blog/post/finetune-flan-t5-question-answer-quora-dataset

  15. mlqa

    • huggingface.co
    • paperswithcode.com
    • +2 more
    Updated Sep 1, 2023
    Cite
    AI at Meta (2023). mlqa [Dataset]. https://huggingface.co/datasets/facebook/mlqa
    Dataset authored and provided by
    AI at Meta
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    MLQA (MultiLingual Question Answering) is a benchmark dataset for evaluating cross-lingual question answering performance. MLQA consists of over 5K extractive QA instances (12K in English) in SQuAD format in seven languages - English, Arabic, German, Spanish, Hindi, Vietnamese and Simplified Chinese. MLQA is highly parallel, with QA instances parallel between 4 different languages on average.

  16. arxiv_qa

    • huggingface.co
    Updated Sep 30, 2023
    Cite
    taesiri (2023). arxiv_qa [Dataset]. https://huggingface.co/datasets/taesiri/arxiv_qa
    Authors
    taesiri
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ArXiv QA

    (TBD) Automated ArXiv question answering via large language models Github | Homepage | Simple QA - Hugging Face Space

      Automated Question Answering with ArXiv Papers

      Latest 25 Papers

    LIME: Localized Image Editing via Attention Regularization in Diffusion Models - [Arxiv] [QA]

    Revisiting Depth Completion from a Stereo Matching Perspective for Cross-domain Generalization - [Arxiv] [QA]

    VL-GPT: A Generative Pre-trained Transformer for Vision and… See the full description on the dataset page: https://huggingface.co/datasets/taesiri/arxiv_qa.

  17. gooaq

    • huggingface.co
    • paperswithcode.com
    • +1 more
    Updated Aug 29, 2023
    Cite
    Ai2 (2023). gooaq [Dataset]. https://huggingface.co/datasets/allenai/gooaq
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    Ai2
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    GooAQ is a large-scale dataset with a variety of answer types. This dataset contains over 5 million questions and 3 million answers collected from Google. GooAQ questions are collected semi-automatically from the Google search engine using its autocomplete feature. This results in naturalistic questions of practical interest that are nonetheless short and expressed using simple language. GooAQ answers are mined from Google's responses to our collected questions, specifically from the answer boxes in the search results. This yields a rich space of answer types, containing both textual answers (short and long) as well as more structured ones such as collections.

  18. squad_v2

    • huggingface.co
    Updated Jun 15, 2005
    Cite
    Pranav R (2005). squad_v2 [Dataset]. https://huggingface.co/datasets/rajpurkar/squad_v2
    Authors
    Pranav R
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for SQuAD 2.0

      Dataset Summary
    

    Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable. SQuAD 2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers… See the full description on the dataset page: https://huggingface.co/datasets/rajpurkar/squad_v2.

  19. openbookqa

    • huggingface.co
    • paperswithcode.com
    • +1 more
    Updated Mar 1, 2024
    Cite
    Ai2 (2024). openbookqa [Dataset]. https://huggingface.co/datasets/allenai/openbookqa
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    Ai2
    License

    https://choosealicense.com/licenses/unknown/

    Description

    Dataset Card for OpenBookQA

      Dataset Summary
    

    OpenBookQA aims to promote research in advanced question-answering, probing a deeper understanding of both the topic (with salient facts summarized as an open book, also provided with the dataset) and the language it is expressed in. In particular, it contains questions that require multi-step reasoning, use of additional common and commonsense knowledge, and rich text comprehension. OpenBookQA is a new kind of… See the full description on the dataset page: https://huggingface.co/datasets/allenai/openbookqa.

  20. clapnq

    • huggingface.co
    Updated Apr 8, 2024
    Cite
    PrimeQA (2024). clapnq [Dataset]. https://huggingface.co/datasets/PrimeQA/clapnq
    Dataset authored and provided by
    PrimeQA
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    We present CLAP NQ, a benchmark Long-form Question Answering dataset for the full RAG pipeline. CLAP NQ includes long answers with grounded gold passages from Natural Questions (NQ) and a corpus to perform either retrieval, generation, or the full RAG pipeline. The CLAP NQ answers are concise, 3x smaller than the full passage, and cohesive, with multiple pieces of the passage that are not contiguous. This is the annotated data for the generation portion of the RAG pipeline. For more… See the full description on the dataset page: https://huggingface.co/datasets/PrimeQA/clapnq.

medical-question-answering-datasets (Malikeh1375/medical-question-answering-datasets)

140 scholarly articles cite this dataset.