100+ datasets found

ComplexWebQuestions
opendatalab.com
huggingface.co
zip
Updated Mar 17, 2023
Cite
Allen Institute for Artificial Intelligence (2023). ComplexWebQuestions [Dataset]. https://opendatalab.com/OpenDataLab/ComplexWebQuestions
Explore at:
Available download formats: zip
Dataset updated
Mar 17, 2023
Dataset provided by
Allen Institute for Artificial Intelligence (http://allenai.org/)
Description
ComplexWebQuestions is a dataset for answering complex questions that require reasoning over multiple web snippets. It contains a large set of complex questions in natural language, and can be used in multiple ways: 1) By interacting with a search engine, which is the focus of our paper (Talmor and Berant, 2018); 2) As a reading comprehension task: we release 12,725,989 web snippets that are relevant for the questions, and were collected during the development of our model; 3) As a semantic parsing task: each question is paired with a SPARQL query that can be executed against Freebase to retrieve the answer.
web_questions
tensorflow.org
opendatalab.com
+1 more
Updated Dec 6, 2022
Cite
(2022). web_questions [Dataset]. https://www.tensorflow.org/datasets/catalog/web_questions
Explore at:
Dataset updated
Dec 6, 2022
Description
This dataset consists of 6,642 question/answer pairs. The questions are supposed to be answerable by Freebase, a large knowledge graph. The questions are mostly centered around a single named entity. The questions are popular ones asked on the web (at least in 2013).

To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split and print the first four examples.
    ds = tfds.load('web_questions', split='train')
    for ex in ds.take(4):
        print(ex)

See the guide for more information on tensorflow_datasets.
WebQuestions Dataset
paperswithcode.com
opendatalab.com
Updated Mar 30, 2023
Cite
Jonathan Berant; Andrew Chou; Roy Frostig; Percy Liang (2023). WebQuestions Dataset [Dataset]. https://paperswithcode.com/dataset/webquestions
Explore at:
Dataset updated
Mar 30, 2023
Authors
Jonathan Berant; Andrew Chou; Roy Frostig; Percy Liang
Description
The WebQuestions dataset is a question answering dataset using Freebase as the knowledge base and contains 6,642 question-answer pairs. It was created by crawling questions through the Google Suggest API, and then obtaining answers using Amazon Mechanical Turk. The original split uses 3,778 examples for training and 2,032 for testing. All answers are defined as Freebase entities.

Example questions (answers) in the dataset include “Where did Edgar Allan Poe died?” (baltimore) or “What degrees did Barack Obama get?” (bachelor_of_arts, juris_doctor).
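
As a quick sanity check, the split sizes quoted in the description can be summed and compared with the stated total. This is only a sketch: the variable names are illustrative, and the three numbers are the only values taken from the description, which does not say how the difference arises.

```python
# Figures quoted in the WebQuestions description above.
TOTAL_PAIRS = 6642
TRAIN_EXAMPLES = 3778
TEST_EXAMPLES = 2032

# Size of the original train/test split and what it leaves uncovered.
split_total = TRAIN_EXAMPLES + TEST_EXAMPLES
unaccounted = TOTAL_PAIRS - split_total
train_share = TRAIN_EXAMPLES / split_total

print(f"train + test = {split_total}")
print(f"pairs not covered by the original split = {unaccounted}")
print(f"train share of the split = {train_share:.1%}")
```

The original split therefore covers fewer examples than the stated total, so it is worth checking the actual number of rows after download.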
spoken-web-questions
huggingface.co
Updated Sep 20, 2024
Cite
Ultravox.ai (2024). spoken-web-questions [Dataset]. https://huggingface.co/datasets/fixie-ai/spoken-web-questions
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Sep 20, 2024
Dataset provided by
Ultravox.ai
Description
fixie-ai/spoken-web-questions dataset hosted on Hugging Face and contributed by the HF Datasets community
ComplexWebQuestions Dataset
paperswithcode.com
Updated Oct 12, 2023
Cite
Alon Talmor; Jonathan Berant (2023). ComplexWebQuestions Dataset [Dataset]. https://paperswithcode.com/dataset/complexwebquestions
Explore at:
Dataset updated
Oct 12, 2023
Authors
Alon Talmor; Jonathan Berant
Description
ComplexWebQuestions is a dataset for answering complex questions that require reasoning over multiple web snippets. It contains a large set of complex questions in natural language, and can be used in multiple ways:

1) By interacting with a search engine;
2) As a reading comprehension task: the authors release 12,725,989 web snippets that are relevant for the questions, and were collected during the development of their model;
3) As a semantic parsing task: each question is paired with a SPARQL query that can be executed against Freebase to retrieve the answer.
speech-web-questions
huggingface.co
Updated Jan 14, 2025
+ more versions
Cite
Shi Qundong (2025). speech-web-questions [Dataset]. https://huggingface.co/datasets/TwinkStart/speech-web-questions
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jan 14, 2025
Authors
Shi Qundong
Description
This dataset contains only test data and is integrated into the UltraEval-Audio framework (https://github.com/OpenBMB/UltraEval-Audio).

python audio_evals/main.py --dataset speech-web-questions --model gpt4o_speech

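🚀 An exceptional experience, only with UltraEval-Audio 🚀

UltraEval-Audio is the world's first open-source framework to support evaluation of both speech understanding and speech generation. Built specifically for evaluating large speech models, it brings together 34 authoritative benchmarks covering four domains (speech, sound, medicine, and music), supports ten languages, and spans twelve task categories. With UltraEval-Audio you get unprecedented convenience and efficiency:

One-click benchmark management 📥: no more tedious manual downloads and data processing; UltraEval-Audio automates all of it, so you can easily obtain the benchmark data you need. Built-in evaluation tools… See the full description on the dataset page: https://huggingface.co/datasets/TwinkStart/speech-web-questions.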
Question Answering Data
figshare.com
txt
Updated Oct 23, 2017
Cite
gaurav maheshwari (2017). Question Answering Data [Dataset]. http://doi.org/10.6084/m9.figshare.5006084.v9
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.5006084.v9
Dataset updated
Oct 23, 2017
Dataset provided by
Figshare (http://figshare.com/)
Authors
gaurav maheshwari
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A large-scale dataset for complex question answering.
questions Dataset
paperswithcode.com
Cite
Oleg Platonov; Denis Kuznedelev; Michael Diskin; Artem Babenko; Liudmila Prokhorenkova, questions Dataset [Dataset]. https://paperswithcode.com/dataset/questions
Explore at:
Authors
Oleg Platonov; Denis Kuznedelev; Michael Diskin; Artem Babenko; Liudmila Prokhorenkova
Description
Questions is an interaction graph of users of a question-answering website based on data provided by Yandex Q.
Multilingual Question Answering over Linked Data: QALD-5 Dataset
explore.openaire.eu
Updated Jan 1, 2015
Cite
Christina Unger (2015). Multilingual Question Answering over Linked Data: QALD-5 Dataset [Dataset]. http://doi.org/10.4119/unibi/2900686
Explore at:
Unique identifier
https://doi.org/10.4119/unibi/2900686
Dataset updated
Jan 1, 2015
Authors
Christina Unger
Description
This dataset comprises all questions used as benchmarks in the 5th Open Challenge on Question Answering over Linked Data (QALD-5). Questions 1-340 and 391-410 are the training questions for multilingual question answering over DBpedia and hybrid question answering, respectively, and questions 341-390 and 411-420 are the corresponding test questions. Documentation: https://github.com/ag-sc/QALD/blob/master/5/documents/qald-5.pdf
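
The ID ranges in the QALD-5 description can be written down as a small lookup. The sketch below assumes the ranges are inclusive and that question IDs are plain integers; the function name `qald5_split` and the task labels are illustrative, not part of the dataset.

```python
from typing import Tuple

# Inclusive question-ID ranges as stated in the QALD-5 description.
QALD5_RANGES = [
    (1, 340, "multilingual", "train"),
    (341, 390, "multilingual", "test"),
    (391, 410, "hybrid", "train"),
    (411, 420, "hybrid", "test"),
]


def qald5_split(question_id: int) -> Tuple[str, str]:
    """Return (task, split) for a QALD-5 question ID."""
    for low, high, task, split in QALD5_RANGES:
        if low <= question_id <= high:
            return task, split
    raise ValueError(f"question ID {question_id} is outside 1-420")


print(qald5_split(1))
print(qald5_split(400))
```

Keeping the ranges in one table makes it easy to verify that they cover IDs 1 to 420 without gaps or overlaps.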
English Closed Ended Question Answer Text Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Cite
FutureBee AI (2022). English Closed Ended Question Answer Text Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/english-closed-ended-question-answer-text-dataset
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
What’s Included
The English Closed-Ended Question Answering Dataset is a meticulously curated collection of 5000 comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and question-answering models in the English language, advancing the field of artificial intelligence.
Dataset Content:
This closed-ended QA dataset comprises a diverse set of context paragraphs and questions paired with corresponding answers in English. There is a context paragraph given for each question to get the answer from. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.
Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native English speakers, and references were taken from diverse sources such as books, news articles, websites, web forums, and other reliable references.
This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.
Question Diversity:
To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. The QA dataset also contains questions with constraints, which makes it even more useful for LLM training.
Answer Formats:
To accommodate varied learning experiences, the dataset incorporates different answer formats: single words, short phrases, single sentences, and paragraphs. The answers also contain text strings, numerical values, and date and time formats. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.
Data Format and Annotation Details:
This fully labeled English Closed-Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as a unique id, context paragraph, context reference link, question, question type, question complexity, question category, domain, prompt type, answer, answer type, and rich text presence.
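
To make the annotation list concrete, here is a minimal sketch of how one JSON record could be modelled and validated. The snake_case field names, the types, and the sample record are all assumptions made for illustration; the description lists the annotation details but not the actual keys or value formats used in the files.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class ClosedEndedQARecord:
    # Hypothetical snake_case renderings of the annotation details listed
    # above; the real files may use different keys and value types.
    id: str
    context_paragraph: str
    context_reference_link: str
    question: str
    question_type: str
    question_complexity: str
    question_category: str
    domain: str
    prompt_type: str
    answer: str
    answer_type: str
    rich_text_presence: bool


def parse_record(raw: str) -> ClosedEndedQARecord:
    """Parse one JSON object, failing loudly if a listed field is missing."""
    data = json.loads(raw)
    expected = [f.name for f in fields(ClosedEndedQARecord)]
    missing = [name for name in expected if name not in data]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    return ClosedEndedQARecord(**{name: data[name] for name in expected})


# Made-up record for illustration only; it is not taken from the dataset.
sample = json.dumps({
    "id": "example-0001",
    "context_paragraph": "Water boils at 100 degrees Celsius at sea level.",
    "context_reference_link": "https://example.com/boiling-point",
    "question": "At what temperature does water boil at sea level?",
    "question_type": "direct",
    "question_complexity": "easy",
    "question_category": "fact-based",
    "domain": "science",
    "prompt_type": "instruction",
    "answer": "100 degrees Celsius",
    "answer_type": "short phrase",
    "rich_text_presence": False,
})

record = parse_record(sample)
print(record.question, "->", record.answer)
```

Checking for missing keys up front gives a clearer error than a failed constructor call if the real files turn out to use a different schema.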
Quality and Accuracy:
The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.
The English version is grammatically accurate, with no spelling or grammatical errors. No toxic or harmful content is used while building this dataset.
Continuous Updates and Customization:
The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.
License:
The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy English Closed-Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.
Data from: Semantic Parameter Matching in Web APIs with Transformer-based...
zenodo.org
data.niaid.nih.gov
zip
Updated Jun 12, 2023
Cite
Sebastian Kotstein; Christian Decker (2023). Semantic Parameter Matching in Web APIs with Transformer-based Question Answering [Dataset]. http://doi.org/10.5281/zenodo.8019625
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.8019625
Dataset updated
Jun 12, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Sebastian Kotstein; Christian Decker
Description
This repository contains the evaluation results of our study, as well as datasets and model checkpoints.
For a detailed overview regarding the provided materials, please refer to README.md.
French Closed Ended Question Answer Text Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Cite
FutureBee AI (2022). French Closed Ended Question Answer Text Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/french-closed-ended-question-answer-text-dataset
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
French
Dataset funded by
FutureBeeAI
Description
What’s Included
The French Closed-Ended Question Answering Dataset is a meticulously curated collection of 5000 comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and question-answering models in the French language, advancing the field of artificial intelligence.
Dataset Content:
This closed-ended QA dataset comprises a diverse set of context paragraphs and questions paired with corresponding answers in French. There is a context paragraph given for each question to get the answer from. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.
Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native French speakers, and references were taken from diverse sources such as books, news articles, websites, web forums, and other reliable references.
This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.
Question Diversity:
To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. The QA dataset also contains questions with constraints, which makes it even more useful for LLM training.
Answer Formats:
To accommodate varied learning experiences, the dataset incorporates different answer formats: single words, short phrases, single sentences, and paragraphs. The answers also contain text strings, numerical values, and date and time formats. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.
Data Format and Annotation Details:
This fully labeled French Closed-Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as a unique id, context paragraph, context reference link, question, question type, question complexity, question category, domain, prompt type, answer, answer type, and rich text presence.
Quality and Accuracy:
The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.
The French version is grammatically accurate, with no spelling or grammatical errors. No toxic or harmful content is used while building this dataset.
Continuous Updates and Customization:
The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.
License:
The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy French Closed-Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.
text_no-replay-14_spoken-web-questions
huggingface.co
Updated Feb 20, 2025
+ more versions
Cite
hsiao chiyuan (2025). text_no-replay-14_spoken-web-questions [Dataset]. https://huggingface.co/datasets/chiyuanhsiao/text_no-replay-14_spoken-web-questions
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Feb 20, 2025
Authors
hsiao chiyuan
Description
chiyuanhsiao/text_no-replay-14_spoken-web-questions dataset hosted on Hugging Face and contributed by the HF Datasets community
Research data supporting "Question Answering System for Chemistry -- a...
repository.cam.ac.uk
zip
Updated Jun 7, 2022
Cite
Zhou, Xiaochi; Nurkowski, Daniel; Menon, Angiras; Akroyd, Jethro; Mosbach, Sebastian; Kraft, Markus (2022). Research data supporting "Question Answering System for Chemistry -- a semantic agent extension" [Dataset]. http://doi.org/10.17863/CAM.78870
Explore at:
Available download formats: zip (18491 bytes)
Unique identifier
https://doi.org/10.17863/CAM.78870
Dataset updated
Jun 7, 2022
Dataset provided by
Apollo
University of Cambridge
Authors
Zhou, Xiaochi; Nurkowski, Daniel; Menon, Angiras; Akroyd, Jethro; Mosbach, Sebastian; Kraft, Markus
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains the evaluation set of the questions and the detailed responses to those evaluation questions.
LC-QuAD 1.0 German Version
figshare.com
txt
Updated Jun 26, 2020
Cite
Mohnish Dubey (2020). LC-QuAD 1.0 German Version [Dataset]. http://doi.org/10.6084/m9.figshare.12570983.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.12570983.v1
Dataset updated
Jun 26, 2020
Dataset provided by
figshare
Authors
Mohnish Dubey
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These are 5000 questions from the LC-QuAD 1.0 dataset, translated into German. Each question is paired with its corresponding SPARQL query for DBpedia 2016-04.
Data from: RESTBERTa: A Transformer-based Question Answering Approach for...
data.niaid.nih.gov
zenodo.org
Updated Jan 18, 2024
Cite
Decker, Christian (2024). RESTBERTa: A Transformer-based Question Answering Approach for Semantic Search in Web API Documentation [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8349083
Explore at:
Dataset updated
Jan 18, 2024
Dataset provided by
Kotstein, Sebastian
Decker, Christian
Description
This repository contains the datasets and evaluation results of our study. For a detailed overview regarding the provided materials, please refer to README.md.
Full Annotated LC QuAD dataset
figshare.com
txt
Updated May 31, 2018
Cite
Mohnish Dubey (2018). Full Annotated LC QuAD dataset [Dataset]. http://doi.org/10.6084/m9.figshare.5782197.v2
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.5782197.v2
Dataset updated
May 31, 2018
Dataset provided by
figshare
Authors
Mohnish Dubey
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A manually and fully annotated version of the LC-QuAD dataset, intended as a gold-label data set for entity and relation linking over DBpedia. For each question, the keywords are classified as entity or predicate. These keywords are also mapped to the knowledge graph (DBpedia) URIs that appear in the corresponding SPARQL query.
Italian Closed Ended Question Answer Text Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Cite
FutureBee AI (2022). Italian Closed Ended Question Answer Text Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/italian-closed-ended-question-answer-text-dataset
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
What’s Included
The Italian Closed-Ended Question Answering Dataset is a meticulously curated collection of 5000 comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and question-answering models in the Italian language, advancing the field of artificial intelligence.
Dataset Content:
This closed-ended QA dataset comprises a diverse set of context paragraphs and questions paired with corresponding answers in Italian. There is a context paragraph given for each question to get the answer from. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.
Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native Italian speakers, and references were taken from diverse sources such as books, news articles, websites, web forums, and other reliable references.
This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.
Question Diversity:
To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. The QA dataset also contains questions with constraints, which makes it even more useful for LLM training.
Answer Formats:
To accommodate varied learning experiences, the dataset incorporates different answer formats: single words, short phrases, single sentences, and paragraphs. The answers also contain text strings, numerical values, and date and time formats. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.
Data Format and Annotation Details:
This fully labeled Italian Closed-Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as a unique id, context paragraph, context reference link, question, question type, question complexity, question category, domain, prompt type, answer, answer type, and rich text presence.
Quality and Accuracy:
The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.
The Italian version is grammatically accurate, with no spelling or grammatical errors. No toxic or harmful content is used while building this dataset.
Continuous Updates and Customization:
The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.
License:
The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy Italian Closed-Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.
ELI5 Dataset
paperswithcode.com
opendatalab.com
Updated Sep 23, 2024
Cite
Angela Fan; Yacine Jernite; Ethan Perez; David Grangier; Jason Weston; Michael Auli (2024). ELI5 Dataset [Dataset]. https://paperswithcode.com/dataset/eli5
Explore at:
Dataset updated
Sep 23, 2024
Authors
Angela Fan; Yacine Jernite; Ethan Perez; David Grangier; Jason Weston; Michael Auli
Description
ELI5 is a dataset for long-form question answering. It contains 270K complex, diverse questions that require explanatory multi-sentence answers. Web search results are used as evidence documents to answer each question.

ELI5 is also a task in Dodecadialogue.
text_llama-origin_spoken-web-questions
huggingface.co
Updated Feb 12, 2025
+ more versions
Cite
hsiao chiyuan (2025). text_llama-origin_spoken-web-questions [Dataset]. https://huggingface.co/datasets/chiyuanhsiao/text_llama-origin_spoken-web-questions
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Feb 12, 2025
Authors
hsiao chiyuan
Description
chiyuanhsiao/text_llama-origin_spoken-web-questions dataset hosted on Hugging Face and contributed by the HF Datasets community

ComplexWebQuestions (OpenDataLab/ComplexWebQuestions): 290 scholarly articles cite this dataset.