100+ datasets found

web_questions
huggingface.co
opendatalab.com
+2 more
Updated Jun 3, 2024
Cite
Stanford NLP (2024). web_questions [Dataset]. https://huggingface.co/datasets/stanfordnlp/web_questions
Dataset updated
Jun 3, 2024
Dataset authored and provided by
Stanford NLP
License
https://choosealicense.com/licenses/unknown/
Description
Dataset Card for "web_questions"

Dataset Summary

This dataset consists of 6,642 question/answer pairs. The questions are supposed to be answerable by Freebase, a large knowledge graph. The questions are mostly centered around a single named entity. The questions are popular ones asked on the web (at least in 2013).

Supported Tasks and Leaderboards

More Information Needed

Languages

More Information Needed

Dataset Structure Data… See the full description on the dataset page: https://huggingface.co/datasets/stanfordnlp/web_questions.
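For readers evaluating a system on question/answer pairs like these, a minimal scoring sketch follows. It assumes each record carries a question string and a list of acceptable answer strings; the field names `question` and `answers`, the normalization rules, and the sample record are illustrative assumptions, not taken from the dataset card.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def is_correct(prediction: str, gold_answers: list) -> bool:
    """Return True if the prediction equals any gold answer after normalization."""
    pred = normalize(prediction)
    return any(pred == normalize(gold) for gold in gold_answers)


# Hypothetical record: the field names are assumptions for illustration only.
record = {"question": "what is the capital of france?", "answers": ["Paris"]}
print(is_correct(" paris. ", record["answers"]))
```

Exact match after normalization is only one possible scoring rule; it is shown here because it needs nothing beyond the standard library.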
ComplexWebQuestions
opendatalab.com
huggingface.co
zip
Updated Mar 17, 2023
Cite
Allen Institute for Artificial Intelligence (2023). ComplexWebQuestions [Dataset]. https://opendatalab.com/OpenDataLab/ComplexWebQuestions
Available download formats: zip
Dataset updated
Mar 17, 2023
Dataset provided by
Allen Institute for Artificial Intelligence (http://allenai.org/)
Description
ComplexWebQuestions is a dataset for answering complex questions that require reasoning over multiple web snippets. It contains a large set of complex questions in natural language, and can be used in multiple ways: 1) By interacting with a search engine, which is the focus of our paper (Talmor and Berant, 2018); 2) As a reading comprehension task: we release 12,725,989 web snippets that are relevant for the questions, and were collected during the development of our model; 3) As a semantic parsing task: each question is paired with a SPARQL query that can be executed against Freebase to retrieve the answer.
WebQuestions - Dataset - LDM
service.tib.eu
Updated Dec 16, 2024
Cite
(2024). WebQuestions - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/webquestions
Dataset updated
Dec 16, 2024
Description
The task of Question Answering over Linked Data (QALD) has received increased attention in recent years (see the surveys [14] and [36]). The task consists of mapping natural language questions into an executable form, such as a SPARQL query, that retrieves the answers to the question from a given knowledge base.
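To make the mapping from a question to an executable query concrete, here is a deliberately simple template-based sketch. It is not the method of any system listed here: the question patterns, the `ex:` prefix, and the predicates `ex:author` and `ex:birthPlace` are hypothetical placeholders chosen for illustration.

```python
import re
from typing import Optional

# Prefix declarations; the ex: namespace is a placeholder, not a real vocabulary.
PREFIXES = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX ex: <http://example.org/>\n"
)

# Hypothetical (question pattern, query template) pairs.
TEMPLATES = [
    (
        re.compile(r"^who wrote (?P<label>.+?)\??$", re.IGNORECASE),
        'SELECT ?answer WHERE {{ ?work rdfs:label "{label}"@en ; ex:author ?answer . }}',
    ),
    (
        re.compile(r"^where was (?P<label>.+?) born\??$", re.IGNORECASE),
        'SELECT ?answer WHERE {{ ?person rdfs:label "{label}"@en ; ex:birthPlace ?answer . }}',
    ),
]


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes so the label is a valid string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_sparql(question: str) -> Optional[str]:
    """Return a SPARQL query for the first matching template, or None."""
    for pattern, template in TEMPLATES:
        match = pattern.match(question.strip())
        if match:
            label = escape_literal(match.group("label"))
            return PREFIXES + template.format(label=label)
    return None


print(to_sparql("Who wrote Hamlet?"))
```

Real systems typically add entity linking and relation detection before building the query; the fixed templates here only illustrate the input and output of the task.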
spoken-web-questions
huggingface.co
Updated Sep 20, 2024
Cite
Ultravox.ai (2024). spoken-web-questions [Dataset]. https://huggingface.co/datasets/fixie-ai/spoken-web-questions
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 20, 2024
Dataset provided by
Ultravox.ai
Description
fixie-ai/spoken-web-questions dataset hosted on Hugging Face and contributed by the HF Datasets community
speech-web-questions
huggingface.co
Updated Jan 14, 2025
Cite
Shi Qundong (2025). speech-web-questions [Dataset]. https://huggingface.co/datasets/TwinkStart/speech-web-questions
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 14, 2025
Authors
Shi Qundong
Description
This dataset contains only test data, which is integrated into the UltraEval-Audio (https://github.com/OpenBMB/UltraEval-Audio) framework.

python audio_evals/main.py --dataset speech-web-questions --model gpt4o_speech

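🚀 An exceptional experience, all in UltraEval-Audio 🚀

UltraEval-Audio is the world's first open-source framework that supports evaluating both speech understanding and speech generation. Built specifically for evaluating large speech models, it brings together 34 authoritative benchmarks covering four domains (speech, sound, medicine, and music), supports ten languages, and spans twelve task categories. With UltraEval-Audio, you get unprecedented convenience and efficiency:

One-click benchmark management 📥: say goodbye to tedious manual downloads and data processing. UltraEval-Audio automates all of it so you can easily obtain the benchmark data you need. Built-in evaluation tools… See the full description on the dataset page: https://huggingface.co/datasets/TwinkStart/speech-web-questions.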
Yih, W.-t., Richardson, M., Meek, C., Chang, M.-W., Suh, J. (2025). Dataset:...
service.tib.eu
resodate.org
Updated Jan 2, 2025
Cite
(2025). Yih, W.-t., Richardson, M., Meek, C., Chang, M.-W., Suh, J. (2025). Dataset: WebQuestions dataset for Google Suggest. https://doi.org/10.57702/7u5sfzs6 [Dataset]. https://service.tib.eu/ldmservice/dataset/webquestions-dataset-for-google-suggest
Dataset updated
Jan 2, 2025
Description
The WebQuestions dataset contains questions answerable using Google Suggest as the knowledge graph.
inference_slm_spoken-web-questions
huggingface.co
Updated Jul 1, 2025
Cite
hsiao chiyuan (2025). inference_slm_spoken-web-questions [Dataset]. https://huggingface.co/datasets/chiyuanhsiao/inference_slm_spoken-web-questions
Dataset updated
Jul 1, 2025
Authors
hsiao chiyuan
Description
chiyuanhsiao/inference_slm_spoken-web-questions dataset hosted on Hugging Face and contributed by the HF Datasets community
Natural Questions Dataset
kaggle.com
zip
Updated Mar 15, 2024
Cite
fujoos (2024). Natural Questions Dataset [Dataset]. https://www.kaggle.com/datasets/frankossai/natural-questions-dataset
Available download formats: zip (116502047 bytes)
Dataset updated
Mar 15, 2024
Authors
fujoos
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Context

The Natural Questions (NQ) dataset is a comprehensive collection of real user queries submitted to Google Search, with answers sourced from Wikipedia by expert annotators. Created by Google AI Research, this dataset aims to support the development and evaluation of advanced automated question-answering systems. The version provided here includes 89,312 meticulously annotated entries, tailored for ease of access and utility in natural language processing (NLP) and machine learning (ML) research.

Data Collection

The dataset is composed of authentic search queries from Google Search, reflecting the wide range of information sought by users globally. This approach ensures a realistic and diverse set of questions for NLP applications.

Data Pre-processing

The NQ dataset underwent significant pre-processing to prepare it for NLP tasks:
- Removal of web-specific elements like URLs, hashtags, user mentions, and special characters using Python's "BeautifulSoup" and "regex" libraries.
- Grammatical error identification and correction using the "LanguageTool" library, an open-source grammar, style, and spell checker.

These steps were taken to clean and simplify the text while retaining the essence of the questions and their answers, divided into 'questions', 'long answers', and 'short answers'.
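The cleaning steps described above can be approximated with a short sketch. This is not the dataset author's code: it swaps BeautifulSoup for a plain regular expression so that it runs with the standard library alone, it omits the grammar-correction step, and the patterns and the function name `clean_text` are assumptions made for illustration.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")                   # HTML tags (rough stand-in for BeautifulSoup)
URL_RE = re.compile(r"https?://\S+|www\.\S+")     # URLs
HASHTAG_MENTION_RE = re.compile(r"[#@]\w+")       # hashtags and user mentions
SPECIAL_RE = re.compile(r"[^\w\s.,?!'-]")         # remaining special characters


def clean_text(raw: str) -> str:
    """Strip tags, URLs, hashtags, mentions, and special characters from raw text."""
    text = html.unescape(raw)           # decode entities such as &amp;
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = HASHTAG_MENTION_RE.sub(" ", text)
    text = SPECIAL_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


print(clean_text("<p>who wrote hamlet?</p> https://example.com #lit @bard"))
```

A regular expression is a blunt tool for HTML; for answers that embed tables or nested markup, a real HTML parser is the safer choice.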

Data Storage

The unprocessed data, including answers with embedded HTML, empty or complex long and short answers, is stored in "Natural-Questions-Base.csv". This version retains the raw structure of the data, featuring HTML elements in answers, and varied answer formats such as tables and lists, providing a comprehensive view for those interested in the original dataset's complexity and richness. The processed data is compiled into a single CSV file named "Natural-Questions-Filtered.csv". The file is structured for easy access and analysis, with each record containing the processed question, a detailed answer, and concise answer snippets.

Filtered Results

The filtered version is available where specific criteria, such as question length or answer complexity, were applied to refine the data further. This version allows for more focused research and application development.
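As an illustration of filtering by question length and answer presence, the sketch below applies two such criteria to CSV text. The column names `question`, `long_answers`, and `short_answers`, the thresholds, and the sample rows are assumptions; the actual files may use different headers and criteria.

```python
import csv
import io


def filter_rows(csv_text, max_question_words=12, require_short_answer=True):
    """Keep rows whose question is short enough and, optionally, has a short answer."""
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if len(row["question"].split()) > max_question_words:
            continue  # question too long
        if require_short_answer and not row["short_answers"].strip():
            continue  # no short answer available
        kept.append(row)
    return kept


# Hypothetical sample data in the assumed column layout.
SAMPLE = """question,long_answers,short_answers
who wrote hamlet,Hamlet is a tragedy written by William Shakespeare.,William Shakespeare
what is the plot of a very long novel that nobody has ever finished reading in full,It is long.,
"""

for row in filter_rows(SAMPLE):
    print(row["question"], "->", row["short_answers"])
```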

Flask CSV Reader App

The repository at 'https://github.com/fujoos/natural_questions' also includes a Flask-based CSV reader application designed to read and display contents from the "NaturalQuestions.csv" file. The app provides functionalities such as:
- Viewing questions and answers directly in your browser.
- Filtering results based on criteria like question keywords or answer length.
- See the live demo, which uses the CSV files converted to an SQLite database, at 'https://fujoos.pythonanywhere.com/'.
cpa-web-questions.app Website Traffic, Ranking, Analytics [October 2025]
semrush.ebundletools.com
Updated Nov 12, 2025
Cite
Semrush (2025). cpa-web-questions.app Website Traffic, Ranking, Analytics [October 2025] [Dataset]. https://semrush.ebundletools.com/website/cpa-web-questions.app/overview/
Dataset updated
Nov 12, 2025
Dataset authored and provided by
Semrush (https://fr.semrush.com/)
License
https://semrush.ebundletools.com/company/legal/terms-of-service/
Time period covered
Nov 12, 2025
Area covered
Worldwide
Variables measured
visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
Measurement technique
Semrush Traffic Analytics; Click-stream data
Description
cpa-web-questions.app is ranked #13037 in JP with 203.33K Traffic. Categories: . Learn more about website traffic, market share, and more!
text_mllama_spoken-web-questions
huggingface.co
Updated Feb 12, 2025
Cite
hsiao chiyuan (2025). text_mllama_spoken-web-questions [Dataset]. https://huggingface.co/datasets/chiyuanhsiao/text_mllama_spoken-web-questions
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 12, 2025
Authors
hsiao chiyuan
Description
chiyuanhsiao/text_mllama_spoken-web-questions dataset hosted on Hugging Face and contributed by the HF Datasets community
Wenhan Xiong, Hong Wang, William Yang Wang (2024). Dataset:...
service.tib.eu
Updated Dec 16, 2024
Cite
(2024). Wenhan Xiong, Hong Wang, William Yang Wang (2024). Dataset: NaturalQuestions-Open WebQuestions CuratedTREC. https://doi.org/10.57702/955nlkkc [Dataset]. https://service.tib.eu/ldmservice/dataset/naturalquestions-open-webquestions-curatedtrec
Dataset updated
Dec 16, 2024
Description
Open-domain QA datasets for testing the proposed method
Italian Closed Ended Question Answer Text Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Italian Closed Ended Question Answer Text Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/italian-closed-ended-question-answer-text-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
The Italian Closed-Ended Question Answering Dataset is a meticulously curated collection of 5000 comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and question-answering models in the Italian language, advancing the field of artificial intelligence.
Dataset Content
This closed-ended QA dataset comprises a diverse set of context paragraphs and questions paired with corresponding answers in Italian. There is a context paragraph given for each question to get the answer from. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.
Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native Italian speakers, and references were taken from diverse sources like books, news articles, websites, web forums, and other reliable references.
This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.
Question Diversity
To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. The QA dataset also contains questions with constraints, which makes it even more useful for LLM training.
Answer Formats
To accommodate varied learning experiences, the dataset incorporates different types of answer formats, including single words, short phrases, single sentences, and paragraphs. The answers also contain text strings, numerical values, and date and time formats. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.
Data Format and Annotation Details
This fully labeled Italian Closed-Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as a unique id, context paragraph, context reference link, question, question type, question complexity, question category, domain, prompt type, answer, answer type, and rich text presence.
Quality and Accuracy
The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.
The Italian version is grammatically accurate, without spelling or grammatical errors. No toxic or harmful content was used in building this dataset.
Continuous Updates and Customization
The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.
License:
The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy Italian Closed-Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.
WikiQA (Open-Domain Q&A)
kaggle.com
zip
Updated Nov 20, 2022
Cite
The Devastator (2022). WikiQA (Open-Domain Q&A) [Dataset]. https://www.kaggle.com/datasets/thedevastator/wikiquestionanswer-a-dataset-for-open-domain-que
Available download formats: zip (1785708 bytes)
Dataset updated
Nov 20, 2022
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
WikiQA (Open-Domain Q&A)

Discovering New Knowledge through Question and Sentence Pairs

About this dataset

The WikiQA dataset is a collection of question and sentence pairs, collected and annotated for research on open-domain question answering. The data fields are the same among all splits: question, document title, label. The questions come from different sources, including Wikipedia articles, news articles, and web forums. The sentences come from different sources as well, such as Wikipedia articles, news articles, web forums, and books. The labels indicate whether the answer is supported by the document

How to use this dataset

The data fields are the same among all splits.

Columns: question, document_title, label

The file test.csv in the WikiQA dataset is a collection of question and sentence pairs used to evaluate the performance of different question answering models

Research Ideas

The WikiQA dataset can be used to train a machine-learning model to answer questions automatically.

The dataset can be used to research the feasibility of open-domain question answering.

The dataset can be used to evaluate the performance of different question answering models

Acknowledgements

This dataset was proposed in WikiQA: A Challenge Dataset for Open-Domain Question Answering by Yang et al. The authors acknowledge the help of Aria Haghighi and Percy Liang in constructing the pairwise sentence similarity features, Wei Ying in providing additional insights about the dataset, Hannah Rashkin for helpful discussions, and Google for providing the computing infrastructure

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

Files: validation.csv, train.csv, test.csv (identical columns)

| Column name | Description |
|:---|:---|
| question | The question that was asked. (String) |
| document_title | The title of the Wikipedia article that the question was asked about. (String) |
| answer | The answer to the question. (String) |
| label | Whether or not the answer is relevant to the question. (String) |
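Given the four columns listed above, one way to use a split is to group the relevant candidate answers under each question. The sketch below assumes the `label` column holds the string "1" for a relevant answer and "0" otherwise; that encoding and the sample rows are assumptions, since the column description only says the label is a string.

```python
import csv
import io
from collections import defaultdict


def answers_by_question(csv_text):
    """Map each question to the list of answers labeled as relevant."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["label"].strip() == "1":  # assumed encoding of "relevant"
            grouped[row["question"]].append(row["answer"])
    return dict(grouped)


# Hypothetical rows in the documented column layout.
SAMPLE = """question,document_title,answer,label
how are glacier caves formed?,Glacier cave,A glacier cave is a cave formed within the ice of a glacier.,1
how are glacier caves formed?,Glacier cave,Glacier caves are often called ice caves.,0
"""

print(answers_by_question(SAMPLE))
```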
text_replay_spoken-web-questions
huggingface.co
Cite
hsiao chiyuan, text_replay_spoken-web-questions [Dataset]. https://huggingface.co/datasets/chiyuanhsiao/text_replay_spoken-web-questions
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Authors
hsiao chiyuan
Description
chiyuanhsiao/text_replay_spoken-web-questions dataset hosted on Hugging Face and contributed by the HF Datasets community
Data from: RESTBERTa: A Transformer-based Question Answering Approach for...
data-staging.niaid.nih.gov
data.niaid.nih.gov
Updated Jan 18, 2024
Cite
Kotstein, Sebastian; Decker, Christian (2024). RESTBERTa: A Transformer-based Question Answering Approach for Semantic Search in Web API Documentation [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_8349083
Dataset updated
Jan 18, 2024
Dataset provided by
Reutlingen University, Germany
Authors
Kotstein, Sebastian; Decker, Christian
Description
This repository contains the datasets and evaluation results of our study. For a detailed overview regarding the provided materials, please refer to README.md.
Hindi Closed Ended Question Answer Text Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Hindi Closed Ended Question Answer Text Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/hindi-closed-ended-question-answer-text-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
The Hindi Closed-Ended Question Answering Dataset is a meticulously curated collection of 5000 comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and question-answering models in the Hindi language, advancing the field of artificial intelligence.
Dataset Content
This closed-ended QA dataset comprises a diverse set of context paragraphs and questions paired with corresponding answers in Hindi. There is a context paragraph given for each question to get the answer from. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.
Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native Hindi speakers, and references were taken from diverse sources like books, news articles, websites, web forums, and other reliable references.
This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.
Question Diversity
To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. The QA dataset also contains questions with constraints, which makes it even more useful for LLM training.
Answer Formats
To accommodate varied learning experiences, the dataset incorporates different types of answer formats, including single words, short phrases, single sentences, and paragraphs. The answers also contain text strings, numerical values, and date and time formats. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.
Data Format and Annotation Details
This fully labeled Hindi Closed-Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as a unique id, context paragraph, context reference link, question, question type, question complexity, question category, domain, prompt type, answer, answer type, and rich text presence.
Quality and Accuracy
The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.
The Hindi version is grammatically accurate, without spelling or grammatical errors. No toxic or harmful content was used in building this dataset.
Continuous Updates and Customization
The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.
License:
The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy Hindi Closed-Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.
Research data supporting "Question Answering System for Chemistry -- a...
repository.cam.ac.uk
zip
Updated Jun 7, 2022
Cite
Zhou, Xiaochi; Nurkowski, Daniel; Menon, Angiras; Akroyd, Jethro; Mosbach, Sebastian; Kraft, Markus (2022). Research data supporting "Question Answering System for Chemistry -- a semantic agent extension" [Dataset]. http://doi.org/10.17863/CAM.78870
Available download formats: zip (18491 bytes)
Unique identifier
https://doi.org/10.17863/CAM.78870
Dataset updated
Jun 7, 2022
Dataset provided by
University of Cambridge
Apollo
Authors
Zhou, Xiaochi; Nurkowski, Daniel; Menon, Angiras; Akroyd, Jethro; Mosbach, Sebastian; Kraft, Markus
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset contains the evaluation set of the questions and the detailed responses to those evaluation questions.
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
hotpotqa.github.io
explagraphs.github.io
+1 more
json
Updated Jun 25, 2024
Cite
Carnegie Mellon University, Stanford University, Université de Montréal (2024). HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering [Dataset]. https://hotpotqa.github.io/
Available download formats: json
Dataset updated
Jun 25, 2024
Dataset authored and provided by
Carnegie Mellon University, Stanford University, Université de Montréal
Description
HotpotQA is a question answering dataset featuring natural, multi-hop questions, with strong supervision for supporting facts to enable more explainable question answering systems built based on Wikipedia.
Dataset for: Same Question, Different Answers? An Empirical Comparison of...
demo-b2find.dkrz.de
Updated Sep 22, 2025
Cite
(2025). Dataset for: Same Question, Different Answers? An Empirical Comparison of Web Data and Traditional Data - Dataset - B2FIND [Dataset]. http://demo-b2find.dkrz.de/dataset/bc684dad-c657-5013-b2d4-cc35b4a2e7ee
Dataset updated
Sep 22, 2025
Description
Psychological scientists increasingly study web data, such as user ratings or social media postings. However, whether research relying on such web data leads to the same conclusions as research based on traditional data is largely unknown. To test this, we (re)analyzed three datasets, thereby comparing web data with lab and online survey data. We calculated correlations across these different datasets (Study 1) and investigated identical, illustrative research questions in each dataset (Studies 2 to 4). Our results suggest that web and traditional data are not fundamentally different and usually lead to similar conclusions, but also that it is important to consider differences between data types such as populations and research settings. Web data can be a valuable tool for psychologists when accounting for such differences, as it allows for testing established research findings in new contexts, complementing them with insights from novel data sources.
QALD-9-Plus
figshare.com
opendatalab.com
txt
Updated Dec 21, 2021
Cite
Aleksandr Perevalov; Andreas Both; Dennis Diefenbach; Ricardo Usbeck (2021). QALD-9-Plus [Dataset]. http://doi.org/10.6084/m9.figshare.16864273.v7
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.16864273.v7
Dataset updated
Dec 21, 2021
Dataset provided by
Figshare (http://figshare.com/)
Authors
Aleksandr Perevalov; Andreas Both; Dennis Diefenbach; Ricardo Usbeck
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
QALD-9-Plus is a dataset for Knowledge Graph Question Answering (KGQA) based on the well-known QALD-9. QALD-9-Plus makes it possible to train and test KGQA systems over DBpedia and Wikidata using questions in 8 different languages. Some of the questions have several alternative phrasings in particular languages, which makes it possible to evaluate the robustness of KGQA systems and to train paraphrasing models. Because the questions' translations were provided by native speakers, they are considered a "gold standard"; therefore, machine translation tools can be trained and evaluated on the dataset. See also the GitHub repository: https://github.com/Perevalov/qald_9_plus

web_questions (stanfordnlp/web_questions): 48 scholarly articles cite this dataset (per Google Scholar).
