100+ datasets found
  1. Dermatology-Question-Answer-Dataset-For-Fine-Tuning

    • huggingface.co
    Updated Nov 17, 2023
    Cite
    Muhammad Areeb Khan (2023). Dermatology-Question-Answer-Dataset-For-Fine-Tuning [Dataset]. https://huggingface.co/datasets/Mreeb/Dermatology-Question-Answer-Dataset-For-Fine-Tuning
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 17, 2023
    Authors
    Muhammad Areeb Khan
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Details

    The dataset has about 1 million tokens for training and about 1,500 question-answer pairs.

      Dataset Description
    

    This dataset is a comprehensive compilation of questions related to dermatology, spanning inquiries about various skin diseases, their symptoms, recommended medications, and available treatment modalities. Each question is paired with a concise and informative response, making it an ideal resource for training and fine-tuning language models in the… See the full description on the dataset page: https://huggingface.co/datasets/Mreeb/Dermatology-Question-Answer-Dataset-For-Fine-Tuning.

  2. medical-question-answering-datasets

    • huggingface.co
    + more versions
    Cite
    Malikeh Ehghaghi, medical-question-answering-datasets [Dataset]. https://huggingface.co/datasets/Malikeh1375/medical-question-answering-datasets
    Authors
    Malikeh Ehghaghi
    Description

    Malikeh1375/medical-question-answering-datasets dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. subjqa

    • huggingface.co
    Updated Apr 16, 2024
    + more versions
    Cite
    subjqa [Dataset]. https://huggingface.co/datasets/megagonlabs/subjqa
    Dataset updated
    Apr 16, 2024
    Dataset authored and provided by
    Megagon Labs
    License

    https://choosealicense.com/licenses/unknown/

    Description

    SubjQA is a question answering dataset that focuses on subjective questions and answers. The dataset consists of roughly 10,000 questions over reviews from 6 different domains: books, movies, grocery, electronics, TripAdvisor (i.e. hotels), and restaurants.

  4. squad

    • huggingface.co
    • tensorflow.org
    • +1 more
    Updated Jun 12, 2020
    + more versions
    Cite
    Pranav R (2020). squad [Dataset]. https://huggingface.co/datasets/rajpurkar/squad
    Dataset updated
    Jun 12, 2020
    Authors
    Pranav R
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    Dataset Card for SQuAD

      Dataset Summary
    

    Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable. SQuAD 1.1 contains 100,000+ question-answer pairs on 500+ articles.

      Supported Tasks and Leaderboards
    

    Question Answering.… See the full description on the dataset page: https://huggingface.co/datasets/rajpurkar/squad.
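
The span-based answer format described in the summary above can be illustrated with a minimal Python sketch. The record is invented for illustration, and the field names (`context`, `question`, and `answers` with parallel `text` and `answer_start` lists) are an assumption about a SQuAD-style layout rather than something stated in this listing, so check the dataset page before relying on them.

```python
# Invented example record; the field names assume a SQuAD-style layout.
context = "The Eiffel Tower is located in Paris, France."
answer_text = "Paris, France"

record = {
    "context": context,
    "question": "Where is the Eiffel Tower located?",
    "answers": {
        "text": [answer_text],
        # Character offset of the answer inside the context.
        "answer_start": [context.index(answer_text)],
    },
}

# An unanswerable question is represented here by empty answer lists.
unanswerable = {
    "context": context,
    "question": "Who designed the Louvre pyramid?",
    "answers": {"text": [], "answer_start": []},
}


def is_answerable(rec):
    """Return True when the record carries at least one answer span."""
    return len(rec["answers"]["text"]) > 0


def extract_spans(rec):
    """Slice each answer span out of the context using its start offset."""
    spans = []
    for text, start in zip(rec["answers"]["text"], rec["answers"]["answer_start"]):
        spans.append(rec["context"][start:start + len(text)])
    return spans


if __name__ == "__main__":
    print(extract_spans(record))
    print(is_answerable(unanswerable))
```

Storing the character offset alongside the answer text lets a consumer verify that every answer really is a substring of its passage, which is the property the summary describes.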

  5. Data from: quora-question-answer-dataset

    • huggingface.co
    Updated Sep 2, 2023
    + more versions
    Cite
    Gregory Bizup (2023). quora-question-answer-dataset [Dataset]. https://huggingface.co/datasets/toughdata/quora-question-answer-dataset
    Dataset updated
    Sep 2, 2023
    Authors
    Gregory Bizup
    License

    https://choosealicense.com/licenses/gpl-3.0/

    Description

    Quora Question Answer Dataset (Quora-QuAD) contains 56,402 question-answer pairs scraped from Quora.

      Usage:
    

    For instructions on fine-tuning a model (Flan-T5) with this dataset, please check out the article: https://www.toughdata.net/blog/post/finetune-flan-t5-question-answer-quora-dataset

  6. wiki_qa

    • huggingface.co
    • opendatalab.com
    Cite
    Microsoft, wiki_qa [Dataset]. https://huggingface.co/datasets/microsoft/wiki_qa
    Dataset authored and provided by
    Microsoft (http://microsoft.com/)
    License

    https://choosealicense.com/licenses/other/

    Description

    Dataset Card for "wiki_qa"

      Dataset Summary
    

    Wiki Question Answering corpus from Microsoft. The WikiQA corpus is a publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering.

      Supported Tasks and Leaderboards
    

    More Information Needed

      Languages
    

    More Information Needed

      Dataset Structure

      Data Instances

      default

    Size of downloaded dataset files: 7.10 MB Size… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/wiki_qa.

  7. gooaq

    • huggingface.co
    • paperswithcode.com
    • +1 more
    Updated May 23, 2024
    Cite
    Ai2 (2023). gooaq [Dataset]. https://huggingface.co/datasets/allenai/gooaq
    Dataset updated
    May 23, 2024
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    Ai2
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    GooAQ is a large-scale dataset with a variety of answer types. This dataset contains over 5 million questions and 3 million answers collected from Google. GooAQ questions are collected semi-automatically from the Google search engine using its autocomplete feature. This results in naturalistic questions of practical interest that are nonetheless short and expressed using simple language. GooAQ answers are mined from Google's responses to our collected questions, specifically from the answer boxes in the search results. This yields a rich space of answer types, containing both textual answers (short and long) as well as more structured ones such as collections.

  8. fquad

    • huggingface.co
    Updated May 24, 2024
    Cite
    Illuin Technology (2024). fquad [Dataset]. https://huggingface.co/datasets/illuin/fquad
    Dataset updated
    May 24, 2024
    Dataset authored and provided by
    Illuin Technology
    License

    Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) (https://creativecommons.org/licenses/by-nc-sa/3.0/)
    License information was derived automatically

    Description

    FQuAD: French Question Answering Dataset. We introduce FQuAD, a native French question answering dataset. FQuAD contains 25,000+ question-answer pairs. Fine-tuning CamemBERT on FQuAD yields an F1 score of 88% and an exact match of 77.9%.
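
The two scores quoted above, exact match and F1, are commonly computed per question over answer tokens and then averaged. The Python sketch below shows one simplified way to do that. It is not the evaluation script used for FQuAD: the function names are hypothetical, and the normalization here only lowercases and splits on whitespace, whereas published SQuAD-style scripts typically also strip punctuation and articles.

```python
from collections import Counter


def normalize(text):
    """Simplified normalization: lowercase and split on whitespace."""
    return text.lower().split()


def exact_match(prediction, reference):
    """1.0 when the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("la tour Eiffel", "tour Eiffel"))
    print(token_f1("la tour Eiffel", "tour Eiffel"))
```

F1 gives partial credit when a predicted span overlaps the reference without matching it exactly, which is why it is usually higher than exact match on the same predictions.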

  9. Data from: quac

    • huggingface.co
    • tensorflow.org
    • +1 more
    Updated Dec 12, 2020
    + more versions
    Cite
    Ai2 (2020). quac [Dataset]. https://huggingface.co/datasets/allenai/quac
    Dataset updated
    Dec 12, 2020
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    Ai2
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Question Answering in Context is a dataset for modeling, understanding, and participating in information seeking dialog. Data instances consist of an interactive dialog between two crowd workers: (1) a student who poses a sequence of freeform questions to learn as much as possible about a hidden Wikipedia text, and (2) a teacher who answers the questions by providing short excerpts (spans) from the text. QuAC introduces challenges not found in existing machine comprehension datasets: its questions are often more open-ended, unanswerable, or only meaningful within the dialog context.

  10. arxiv_qa

    • huggingface.co
    Updated Sep 30, 2023
    Cite
    taesiri (2023). arxiv_qa [Dataset]. https://huggingface.co/datasets/taesiri/arxiv_qa
    Dataset updated
    Sep 30, 2023
    Authors
    taesiri
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    ArXiv QA

    (TBD) Automated ArXiv question answering via large language models Github | Homepage | Simple QA - Hugging Face Space

      Automated Question Answering with ArXiv Papers

      Latest 25 Papers

    LIME: Localized Image Editing via Attention Regularization in Diffusion Models - [Arxiv] [QA]

    Revisiting Depth Completion from a Stereo Matching Perspective for Cross-domain Generalization - [Arxiv] [QA]

    VL-GPT: A Generative Pre-trained Transformer for Vision and… See the full description on the dataset page: https://huggingface.co/datasets/taesiri/arxiv_qa.

  11. tweet_qa

    • huggingface.co
    • opendatalab.com
    Updated Jun 30, 2021
    Cite
    UC Santa Barbara NLP Group (2021). tweet_qa [Dataset]. https://huggingface.co/datasets/ucsbnlp/tweet_qa
    Dataset updated
    Jun 30, 2021
    Dataset authored and provided by
    UC Santa Barbara NLP Group
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    Dataset Card for TweetQA

      Dataset Summary
    

    With social media becoming increasingly popular as a place where news and real-time events are reported, developing automated question answering systems is critical to the effectiveness of many applications that rely on real-time knowledge. While previous question answering (QA) datasets have concentrated on formal text like news and Wikipedia, the first large-scale dataset for QA over social media data is presented. To make sure… See the full description on the dataset page: https://huggingface.co/datasets/ucsbnlp/tweet_qa.

  12. cncf-question-and-answer-dataset-for-llm-training

    • huggingface.co
    Updated Nov 29, 2020
    Cite
    Kubermatic (2020). cncf-question-and-answer-dataset-for-llm-training [Dataset]. https://huggingface.co/datasets/Kubermatic/cncf-question-and-answer-dataset-for-llm-training
    Dataset updated
    Nov 29, 2020
    Dataset authored and provided by
    Kubermatic
    Description

    CNCF QA Dataset for LLM Tuning

      Description
    

    This dataset, named cncf-qa-dataset-for-llm-tuning, is designed for fine-tuning large language models (LLMs) and is formatted in a question-answer (QA) style. The data is sourced from PDF and markdown (MD) files extracted from various project repositories within the CNCF (Cloud Native Computing Foundation) landscape. These files were processed and converted into a QA format to be fed into the LLM model. The dataset includes the… See the full description on the dataset page: https://huggingface.co/datasets/Kubermatic/cncf-question-and-answer-dataset-for-llm-training.

  13. natural-questions

    • huggingface.co
    Updated Jan 30, 2018
    Cite
    Sentence Transformers (2018). natural-questions [Dataset]. https://huggingface.co/datasets/sentence-transformers/natural-questions
    Dataset updated
    Jan 30, 2018
    Dataset authored and provided by
    Sentence Transformers
    Description

    Dataset Card for Natural Questions

    This dataset is a collection of question-answer pairs from the Natural Questions dataset. See Natural Questions for additional information. This dataset can be used directly with Sentence Transformers to train embedding models.

      Dataset Subsets

      pair subset

    Columns: "question", "answer" Column types: str, str Examples:{ 'query': 'the si unit of the electric field is', 'answer': 'Electric field An electric field is a field… See the full description on the dataset page: https://huggingface.co/datasets/sentence-transformers/natural-questions.

  14. medmcqa

    • huggingface.co
    Updated May 22, 2022
    + more versions
    Cite
    Open Life Science AI (2022). medmcqa [Dataset]. https://huggingface.co/datasets/openlifescienceai/medmcqa
    Dataset updated
    May 22, 2022
    Dataset authored and provided by
    Open Life Science AI
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Card for MedMCQA

      Dataset Summary
    

    MedMCQA is a large-scale, Multiple-Choice Question Answering (MCQA) dataset designed to address real-world medical entrance exam questions. It collects more than 194k high-quality AIIMS & NEET PG entrance exam MCQs covering 2.4k healthcare topics and 21 medical subjects, with an average token length of 12.77 and high topical diversity. Each sample contains a question, correct answer(s), and other options which require… See the full description on the dataset page: https://huggingface.co/datasets/openlifescienceai/medmcqa.

  15. NewQA

    • huggingface.co
    Updated Jun 24, 2023
    + more versions
    Cite
    brenda Adokorach (2023). NewQA [Dataset]. https://huggingface.co/datasets/badokorach/NewQA
    Dataset updated
    Jun 24, 2023
    Authors
    brenda Adokorach
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Dataset Card for "squad"

      Dataset Summary
    

    Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.

      Supported Tasks and Leaderboards
    

    More Information Needed

      Languages
    

    More Information Needed

      Dataset… See the full description on the dataset page: https://huggingface.co/datasets/badokorach/NewQA.
    
  16. mlqa

    • huggingface.co
    • paperswithcode.com
    • +2 more
    Updated May 29, 2024
    + more versions
    Cite
    AI at Meta (2024). mlqa [Dataset]. https://huggingface.co/datasets/facebook/mlqa
    Dataset updated
    May 29, 2024
    Dataset authored and provided by
    AI at Meta
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0) (https://creativecommons.org/licenses/by-sa/3.0/)
    License information was derived automatically

    Description

    MLQA (MultiLingual Question Answering) is a benchmark dataset for evaluating cross-lingual question answering performance. MLQA consists of over 5K extractive QA instances (12K in English) in SQuAD format in seven languages - English, Arabic, German, Spanish, Hindi, Vietnamese and Simplified Chinese. MLQA is highly parallel, with QA instances parallel between 4 different languages on average.

  17. qa-expert-multi-hop-qa-V1.0

    • huggingface.co
    Updated Oct 13, 2023
    Cite
    qa-expert-multi-hop-qa-V1.0 [Dataset]. https://huggingface.co/datasets/khaimaitien/qa-expert-multi-hop-qa-V1.0
    Dataset updated
    Oct 13, 2023
    Authors
    Khai Mai
    Description

    Dataset Card for QA-Expert-multi-hop-qa-V1.0

    This dataset aims to provide multi-domain training data for question answering, with a focus on multi-hop question answering. In total, it contains 25.5k examples for training and 3.19k for evaluation. You can take a look at the model we trained on this data: https://huggingface.co/khaimaitien/qa-expert-7B-V1.0 The dataset is mostly generated using the OpenAI model (gpt-3.5-turbo-instruct). Please read more information about… See the full description on the dataset page: https://huggingface.co/datasets/khaimaitien/qa-expert-multi-hop-qa-V1.0.

  18. covid_qa_castorini

    • huggingface.co
    Updated May 29, 2024
    Cite
    Castorini (2024). covid_qa_castorini [Dataset]. https://huggingface.co/datasets/castorini/covid_qa_castorini
    Dataset updated
    May 29, 2024
    Dataset authored and provided by
    Castorini
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    CovidQA is the beginnings of a question answering dataset specifically designed for COVID-19, built by hand from knowledge gathered from Kaggle's COVID-19 Open Research Dataset Challenge.

  19. squad_v2

    • huggingface.co
    Updated Jun 15, 2005
    Cite
    Pranav R (2005). squad_v2 [Dataset]. https://huggingface.co/datasets/rajpurkar/squad_v2
    Dataset updated
    Jun 15, 2005
    Authors
    Pranav R
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    Dataset Card for SQuAD 2.0

      Dataset Summary
    

    Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable. SQuAD 2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers… See the full description on the dataset page: https://huggingface.co/datasets/rajpurkar/squad_v2.

  20. Data from: yahoo-answers

    • huggingface.co
    Updated Apr 7, 2025
    Cite
    yahoo-answers [Dataset]. https://huggingface.co/datasets/sentence-transformers/yahoo-answers
    Dataset updated
    Apr 7, 2025
    Dataset authored and provided by
    Sentence Transformers
    Description

    Dataset Card for Yahoo Answers

    This dataset is a collection of pairs containing titles, questions, and answers collected from Yahoo Answers. See the Yahoo Answers dataset for additional information. This dataset can be used directly with Sentence Transformers to train embedding models.

      Dataset Subsets

      title-question-answer-pair subset

    Columns: "question", "answer" Column types: str, str Examples:{ 'question': "why doesn't an optical mouse work on a glass… See the full description on the dataset page: https://huggingface.co/datasets/sentence-transformers/yahoo-answers.
