100+ datasets found
  1. ComplexWebQuestions

    • opendatalab.com
    • huggingface.co
    zip
    Updated Mar 17, 2023
    Cite
    Allen Institute for Artificial Intelligence (2023). ComplexWebQuestions [Dataset]. https://opendatalab.com/OpenDataLab/ComplexWebQuestions
    Available download formats: zip
    Dataset updated
    Mar 17, 2023
    Dataset provided by
    Allen Institute for Artificial Intelligence (http://allenai.org/)
    Description

    ComplexWebQuestions is a dataset for answering complex questions that require reasoning over multiple web snippets. It contains a large set of complex questions in natural language, and can be used in multiple ways: 1) By interacting with a search engine, which is the focus of our paper (Talmor and Berant, 2018); 2) As a reading comprehension task: we release 12,725,989 web snippets that are relevant for the questions, and were collected during the development of our model; 3) As a semantic parsing task: each question is paired with a SPARQL query that can be executed against Freebase to retrieve the answer.

  2. web_questions

    • tensorflow.org
    • opendatalab.com
    • +1 more
    Updated Dec 6, 2022
    Cite
    (2022). web_questions [Dataset]. https://www.tensorflow.org/datasets/catalog/web_questions
    Dataset updated
    Dec 6, 2022
    Description

    This dataset consists of 6,642 question/answer pairs. The questions are supposed to be answerable by Freebase, a large knowledge graph. The questions are mostly centered around a single named entity. The questions are popular ones asked on the web (at least in 2013).

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split and print the first four examples.
    ds = tfds.load('web_questions', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

  3. WebQuestions Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Mar 30, 2023
    Cite
    Jonathan Berant; Andrew Chou; Roy Frostig; Percy Liang (2023). WebQuestions Dataset [Dataset]. https://paperswithcode.com/dataset/webquestions
    Dataset updated
    Mar 30, 2023
    Authors
    Jonathan Berant; Andrew Chou; Roy Frostig; Percy Liang
    Description

    The WebQuestions dataset is a question answering dataset using Freebase as the knowledge base and contains 6,642 question-answer pairs. It was created by crawling questions through the Google Suggest API, and then obtaining answers using Amazon Mechanical Turk. The original split uses 3,778 examples for training and 2,032 for testing. All answers are defined as Freebase entities.

    Example questions (answers) in the dataset include “Where did Edgar Allan Poe died?” (baltimore) or “What degrees did Barack Obama get?” (bachelor_of_arts, juris_doctor).
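
    Because a question can have several answers (as in the Barack Obama example above), comparing a system's output to the gold answers is naturally a set comparison. The sketch below scores predictions with a set-based F1. The choice of metric, the `normalize` and `answer_f1` helpers, and the `predictions` values are assumptions made for illustration; none of them come from the dataset description.

```python
from typing import Iterable


def normalize(answer: str) -> str:
    """Lowercase an answer and treat underscores as spaces."""
    return answer.strip().lower().replace("_", " ")


def answer_f1(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """Set-based F1 between predicted and gold answers."""
    pred = {normalize(a) for a in predicted}
    ref = {normalize(a) for a in gold}
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# The two example questions quoted above, with their gold answers.
examples = {
    "Where did Edgar Allan Poe died?": ["baltimore"],
    "What degrees did Barack Obama get?": ["bachelor_of_arts", "juris_doctor"],
}

# Hypothetical system output, for illustration only.
predictions = {
    "Where did Edgar Allan Poe died?": ["Baltimore"],
    "What degrees did Barack Obama get?": ["juris_doctor"],
}

scores = [answer_f1(predictions[q], gold) for q, gold in examples.items()]
mean_f1 = sum(scores) / len(scores)
print(f"mean F1 over {len(scores)} questions: {mean_f1:.3f}")
```

    Normalizing before comparison keeps a casing difference such as "Baltimore" versus "baltimore" from counting as an error, while a partially correct answer set still earns partial credit.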

  4. spoken-web-questions

    • huggingface.co
    Updated Sep 20, 2024
    Cite
    Ultravox.ai (2024). spoken-web-questions [Dataset]. https://huggingface.co/datasets/fixie-ai/spoken-web-questions
    Available format: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Sep 20, 2024
    Dataset provided by
    Ultravox.ai
    Description

    fixie-ai/spoken-web-questions dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. ComplexWebQuestions Dataset

    • paperswithcode.com
    Updated Oct 12, 2023
    Cite
    Alon Talmor; Jonathan Berant (2023). ComplexWebQuestions Dataset [Dataset]. https://paperswithcode.com/dataset/complexwebquestions
    Dataset updated
    Oct 12, 2023
    Authors
    Alon Talmor; Jonathan Berant
    Description

    ComplexWebQuestions is a dataset for answering complex questions that require reasoning over multiple web snippets. It contains a large set of complex questions in natural language, and can be used in multiple ways:

    • By interacting with a search engine.
    • As a reading comprehension task: the authors release 12,725,989 web snippets that are relevant to the questions and were collected during the development of their model.
    • As a semantic parsing task: each question is paired with a SPARQL query that can be executed against Freebase to retrieve the answer.

  6. speech-web-questions

    • huggingface.co
    Updated Jan 14, 2025
    + more versions
    Cite
    Shi Qundong (2025). speech-web-questions [Dataset]. https://huggingface.co/datasets/TwinkStart/speech-web-questions
    Available format: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Jan 14, 2025
    Authors
    Shi Qundong
    Description

    This dataset contains only test data and is integrated into the UltraEval-Audio framework (https://github.com/OpenBMB/UltraEval-Audio).

    python audio_evals/main.py --dataset speech-web-questions --model gpt4o_speech

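      🚀 An exceptional experience, all in UltraEval-Audio 🚀

    UltraEval-Audio is the world's first open-source framework that supports evaluating both speech understanding and speech generation. Built specifically for evaluating large speech models, it brings together 34 authoritative benchmarks across four domains (speech, sound, medicine, and music), supports ten languages, and covers twelve task types. With UltraEval-Audio, you get unprecedented convenience and efficiency:

    One-click benchmark management 📥: no more tedious manual downloads and data processing; UltraEval-Audio automates all of it so you can easily obtain the benchmark data you need. Built-in evaluation tools… See the full description on the dataset page: https://huggingface.co/datasets/TwinkStart/speech-web-questions.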

  7. Question Answering Data

    • figshare.com
    txt
    Updated Oct 23, 2017
    Cite
    gaurav maheshwari (2017). Question Answering Data [Dataset]. http://doi.org/10.6084/m9.figshare.5006084.v9
    Available download formats: txt
    Dataset updated
    Oct 23, 2017
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    gaurav maheshwari
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A large-scale dataset for complex question answering.

  8. questions Dataset

    • paperswithcode.com
    Cite
    Oleg Platonov; Denis Kuznedelev; Michael Diskin; Artem Babenko; Liudmila Prokhorenkova, questions Dataset [Dataset]. https://paperswithcode.com/dataset/questions
    Authors
    Oleg Platonov; Denis Kuznedelev; Michael Diskin; Artem Babenko; Liudmila Prokhorenkova
    Description

    Questions is an interaction graph of users of a question-answering website based on data provided by Yandex Q.

  9. Multilingual Question Answering over Linked Data: QALD-5 Dataset

    • explore.openaire.eu
    Updated Jan 1, 2015
    Cite
    Christina Unger (2015). Multilingual Question Answering over Linked Data: QALD-5 Dataset [Dataset]. http://doi.org/10.4119/unibi/2900686
    Dataset updated
    Jan 1, 2015
    Authors
    Christina Unger
    Description

    This dataset comprises all questions used as the benchmark in the 5th Open Challenge on Question Answering over Linked Data (QALD-5). Questions 1-340 and 391-410 are the training questions for multilingual question answering over DBpedia and for hybrid question answering, respectively; questions 341-390 and 411-420 are the corresponding test questions. Documentation: https://github.com/ag-sc/QALD/blob/master/5/documents/qald-5.pdf
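
    The ID ranges in this description can be turned into a small lookup, sketched below. The function name `qald5_split` and the task labels "multilingual" and "hybrid" are illustrative choices, not identifiers from the QALD-5 files; only the numeric ranges are taken from the description.

```python
from collections import Counter
from typing import Tuple


def qald5_split(question_id: int) -> Tuple[str, str]:
    """Map a QALD-5 question ID to (task, split) using the stated ID ranges."""
    if 1 <= question_id <= 340:
        return ("multilingual", "train")
    if 341 <= question_id <= 390:
        return ("multilingual", "test")
    if 391 <= question_id <= 410:
        return ("hybrid", "train")
    if 411 <= question_id <= 420:
        return ("hybrid", "test")
    raise ValueError(f"question ID {question_id} is outside 1-420")


# Count how many question IDs fall into each (task, split) bucket.
sizes = Counter(qald5_split(qid) for qid in range(1, 421))
for (task, split), count in sorted(sizes.items()):
    print(f"{task:12s} {split:5s} {count}")
```

    Counting the buckets this way is a quick sanity check that the four ranges cover every ID from 1 to 420 exactly once.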

  10. English Closed Ended Question Answer Text Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). English Closed Ended Question Answer Text Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/english-closed-ended-question-answer-text-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    The English Closed-Ended Question Answering Dataset is a meticulously curated collection of 5000 comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and question-answering models in the English language, advancing the field of artificial intelligence.

    Dataset Content:

    This closed-ended QA dataset comprises a diverse set of context paragraphs and questions paired with corresponding answers in English. There is a context paragraph given for each question to get the answer from. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.

    Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native English speakers, and references were drawn from diverse sources such as books, news articles, websites, web forums, and other reliable sources.

    This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.

    Question Diversity:

    To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. The QA dataset also contains questions with constraints, which makes it even more useful for LLM training.

    Answer Formats:

    To accommodate varied learning experiences, the dataset incorporates different answer formats: single words, short phrases, single sentences, and paragraphs. The answers also include text strings, numerical values, and date and time formats. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.

    Data Format and Annotation Details:

    This fully labeled English Closed-Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as a unique id, context paragraph, context reference link, question, question type, question complexity, question category, domain, prompt type, answer, answer type, and rich text presence.
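
    To make the annotation fields above concrete, here is a minimal sketch of one record and a JSON-to-CSV round trip. The snake_case key names, the `missing_fields` and `to_csv` helpers, and every value in the sample record are hypothetical; the description only lists the fields in prose, so the real files may use different keys.

```python
import csv
import io
import json

# Field names derived from the prose list above (the exact keys are an assumption).
FIELDS = [
    "id", "context_paragraph", "context_reference_link", "question",
    "question_type", "question_complexity", "question_category", "domain",
    "prompt_type", "answer", "answer_type", "rich_text_presence",
]

# A made-up record for illustration only.
records_json = """
[
  {
    "id": "qa-0001",
    "context_paragraph": "The Nile flows north through northeastern Africa.",
    "context_reference_link": "https://example.com/nile",
    "question": "In which direction does the Nile flow?",
    "question_type": "direct",
    "question_complexity": "easy",
    "question_category": "fact-based",
    "domain": "geography",
    "prompt_type": "instruction",
    "answer": "North",
    "answer_type": "single-word",
    "rich_text_presence": false
  }
]
"""
records = json.loads(records_json)


def missing_fields(record: dict) -> list:
    """Return the expected fields that a record does not contain."""
    return [name for name in FIELDS if name not in record]


def to_csv(rows: list) -> str:
    """Serialize records to CSV text with one column per annotation field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


easy_ids = [r["id"] for r in records if r["question_complexity"] == "easy"]
print(easy_ids)
print(to_csv(records))
```

    Checking each record against a fixed field list before conversion is one way to catch schema drift between the JSON and CSV deliveries.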

    Quality and Accuracy:

    The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.

    The English version is grammatically accurate and free of spelling errors. No toxic or harmful content was used in building this dataset.

    Continuous Updates and Customization:

    The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.

    License:

    The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy English Closed-Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.

  11. Data from: Semantic Parameter Matching in Web APIs with Transformer-based...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jun 12, 2023
    Cite
    Sebastian Kotstein; Christian Decker (2023). Semantic Parameter Matching in Web APIs with Transformer-based Question Answering [Dataset]. http://doi.org/10.5281/zenodo.8019625
    Available download formats: zip
    Dataset updated
    Jun 12, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sebastian Kotstein; Christian Decker
    Description

    This repository contains the evaluation results of our study, as well as datasets and model checkpoints.
    For a detailed overview regarding the provided materials, please refer to README.md.

  12. French Closed Ended Question Answer Text Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). French Closed Ended Question Answer Text Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/french-closed-ended-question-answer-text-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    French
    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    The French Closed-Ended Question Answering Dataset is a meticulously curated collection of 5000 comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and question-answering models in the French language, advancing the field of artificial intelligence.

    Dataset Content:

    This closed-ended QA dataset comprises a diverse set of context paragraphs and questions paired with corresponding answers in French. There is a context paragraph given for each question to get the answer from. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.

    Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native French speakers, and references were drawn from diverse sources such as books, news articles, websites, web forums, and other reliable sources.

    This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.

    Question Diversity:

    To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. The QA dataset also contains questions with constraints, which makes it even more useful for LLM training.

    Answer Formats:

    To accommodate varied learning experiences, the dataset incorporates different answer formats: single words, short phrases, single sentences, and paragraphs. The answers also include text strings, numerical values, and date and time formats. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.

    Data Format and Annotation Details:

    This fully labeled French Closed-Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as a unique id, context paragraph, context reference link, question, question type, question complexity, question category, domain, prompt type, answer, answer type, and rich text presence.

    Quality and Accuracy:

    The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.

    The French version is grammatically accurate and free of spelling errors. No toxic or harmful content was used in building this dataset.

    Continuous Updates and Customization:

    The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.

    License:

    The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy French Closed-Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.

  13. text_no-replay-14_spoken-web-questions

    • huggingface.co
    Updated Feb 20, 2025
    + more versions
    Cite
    hsiao chiyuan (2025). text_no-replay-14_spoken-web-questions [Dataset]. https://huggingface.co/datasets/chiyuanhsiao/text_no-replay-14_spoken-web-questions
    Available format: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Feb 20, 2025
    Authors
    hsiao chiyuan
    Description

    chiyuanhsiao/text_no-replay-14_spoken-web-questions dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. Research data supporting "Question Answering System for Chemistry -- a...

    • repository.cam.ac.uk
    zip
    Updated Jun 7, 2022
    Cite
    Zhou, Xiaochi; Nurkowski, Daniel; Menon, Angiras; Akroyd, Jethro; Mosbach, Sebastian; Kraft, Markus (2022). Research data supporting "Question Answering System for Chemistry -- a semantic agent extension" [Dataset]. http://doi.org/10.17863/CAM.78870
    Available download formats: zip (18491 bytes)
    Dataset updated
    Jun 7, 2022
    Dataset provided by
    Apollo
    University of Cambridge
    Authors
    Zhou, Xiaochi; Nurkowski, Daniel; Menon, Angiras; Akroyd, Jethro; Mosbach, Sebastian; Kraft, Markus
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the evaluation set of the questions and the detailed responses to those evaluation questions.

  15. LC-QuAD 1.0 German Version

    • figshare.com
    txt
    Updated Jun 26, 2020
    Cite
    Mohnish Dubey (2020). LC-QuAD 1.0 German Version [Dataset]. http://doi.org/10.6084/m9.figshare.12570983.v1
    Available download formats: txt
    Dataset updated
    Jun 26, 2020
    Dataset provided by
    figshare
    Authors
    Mohnish Dubey
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These are 5000 questions from the LC-QuAD 1.0 dataset, translated into German. Each question is paired with its corresponding SPARQL query for DBpedia 2016-04.

  16. Data from: RESTBERTa: A Transformer-based Question Answering Approach for...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 18, 2024
    Cite
    Decker, Christian (2024). RESTBERTa: A Transformer-based Question Answering Approach for Semantic Search in Web API Documentation [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8349083
    Dataset updated
    Jan 18, 2024
    Dataset provided by
    Kotstein, Sebastian
    Decker, Christian
    Description

    This repository contains the datasets and evaluation results of our study. For a detailed overview regarding the provided materials, please refer to README.md.

  17. Full Annotated LC QuAD dataset

    • figshare.com
    txt
    Updated May 31, 2018
    Cite
    Mohnish Dubey (2018). Full Annotated LC QuAD dataset [Dataset]. http://doi.org/10.6084/m9.figshare.5782197.v2
    Available download formats: txt
    Dataset updated
    May 31, 2018
    Dataset provided by
    figshare
    Authors
    Mohnish Dubey
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A manually and fully annotated version of the LC-QuAD dataset, created as a gold-label data set for entity and relation linking over DBpedia. For each question, the keywords are classified as entity or predicate. Each keyword is also mapped to the knowledge graph (DBpedia) URI that appears in the corresponding SPARQL query.
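
    One way to picture such an annotation is a list of keyword links per question, as in the sketch below. The `KeywordLink` class, its field names, the `uris_by_kind` helper, and the sample question are all hypothetical and are not taken from the dataset files; only the entity/predicate distinction and the keyword-to-URI mapping come from the description.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class KeywordLink:
    """One annotated keyword: its surface text, its class, and its DBpedia URI."""
    keyword: str
    kind: str  # either "entity" or "predicate"
    uri: str


def uris_by_kind(links: List[KeywordLink]) -> Dict[str, List[str]]:
    """Group the linked URIs into entity URIs and predicate URIs."""
    grouped: Dict[str, List[str]] = {"entity": [], "predicate": []}
    for link in links:
        if link.kind not in grouped:
            raise ValueError(f"unknown keyword class: {link.kind!r}")
        grouped[link.kind].append(link.uri)
    return grouped


# Hypothetical annotation for the made-up question "Where did Barack Obama study?"
links = [
    KeywordLink("Barack Obama", "entity", "http://dbpedia.org/resource/Barack_Obama"),
    KeywordLink("study", "predicate", "http://dbpedia.org/ontology/almaMater"),
]

grouped = uris_by_kind(links)
print(grouped["entity"])
print(grouped["predicate"])
```

    Keeping entities and predicates in separate lists mirrors the two linking tasks the gold labels are meant to evaluate.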

  18. Italian Closed Ended Question Answer Text Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Italian Closed Ended Question Answer Text Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/italian-closed-ended-question-answer-text-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    The Italian Closed-Ended Question Answering Dataset is a meticulously curated collection of 5000 comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and question-answering models in the Italian language, advancing the field of artificial intelligence.

    Dataset Content:

    This closed-ended QA dataset comprises a diverse set of context paragraphs and questions paired with corresponding answers in Italian. There is a context paragraph given for each question to get the answer from. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.

    Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native Italian speakers, and references were drawn from diverse sources such as books, news articles, websites, web forums, and other reliable sources.

    This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.

    Question Diversity:

    To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. The QA dataset also contains questions with constraints, which makes it even more useful for LLM training.

    Answer Formats:

    To accommodate varied learning experiences, the dataset incorporates different answer formats: single words, short phrases, single sentences, and paragraphs. The answers also include text strings, numerical values, and date and time formats. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.

    Data Format and Annotation Details:

    This fully labeled Italian Closed-Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as a unique id, context paragraph, context reference link, question, question type, question complexity, question category, domain, prompt type, answer, answer type, and rich text presence.

    Quality and Accuracy:

    The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.

    The Italian version is grammatically accurate and free of spelling errors. No toxic or harmful content was used in building this dataset.

    Continuous Updates and Customization:

    The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.

    License:

    The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy Italian Closed-Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.

  19. ELI5 Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Sep 23, 2024
    Cite
    Angela Fan; Yacine Jernite; Ethan Perez; David Grangier; Jason Weston; Michael Auli (2024). ELI5 Dataset [Dataset]. https://paperswithcode.com/dataset/eli5
    Dataset updated
    Sep 23, 2024
    Authors
    Angela Fan; Yacine Jernite; Ethan Perez; David Grangier; Jason Weston; Michael Auli
    Description

    ELI5 is a dataset for long-form question answering. It contains 270K complex, diverse questions that require explanatory multi-sentence answers. Web search results are used as evidence documents to answer each question.

    ELI5 is also a task in Dodecadialogue.

  20. text_llama-origin_spoken-web-questions

    • huggingface.co
    Updated Feb 12, 2025
    + more versions
    Cite
    hsiao chiyuan (2025). text_llama-origin_spoken-web-questions [Dataset]. https://huggingface.co/datasets/chiyuanhsiao/text_llama-origin_spoken-web-questions
    Available format: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Feb 12, 2025
    Authors
    hsiao chiyuan
    Description

    chiyuanhsiao/text_llama-origin_spoken-web-questions dataset hosted on Hugging Face and contributed by the HF Datasets community


ComplexWebQuestions

OpenDataLab/ComplexWebQuestions

290 scholarly articles cite this dataset.