100+ datasets found
  1. web_questions

    • huggingface.co
    • opendatalab.com
    • +2 more
    Updated Jun 3, 2024
    Cite
    Stanford NLP (2024). web_questions [Dataset]. https://huggingface.co/datasets/stanfordnlp/web_questions
    Dataset updated
    Jun 3, 2024
    Dataset authored and provided by
    Stanford NLP
    License

    https://choosealicense.com/licenses/unknown/

    Description

    Dataset Card for "web_questions"

      Dataset Summary
    

    This dataset consists of 6,642 question/answer pairs. The questions are supposed to be answerable by Freebase, a large knowledge graph. The questions are mostly centered around a single named entity. The questions are popular ones asked on the web (at least in 2013).

      Supported Tasks and Leaderboards
    

    More Information Needed

      Languages
    

    More Information Needed

      Dataset Structure

      Data… See the full description on the dataset page: https://huggingface.co/datasets/stanfordnlp/web_questions.
    
  2. ComplexWebQuestions

    • opendatalab.com
    • huggingface.co
    zip
    Updated Mar 17, 2023
    Cite
    Allen Institute for Artificial Intelligence (2023). ComplexWebQuestions [Dataset]. https://opendatalab.com/OpenDataLab/ComplexWebQuestions
    Available download formats: zip
    Dataset updated
    Mar 17, 2023
    Dataset provided by
    Allen Institute for Artificial Intelligence (http://allenai.org/)
    Description

    ComplexWebQuestions is a dataset for answering complex questions that require reasoning over multiple web snippets. It contains a large set of complex questions in natural language, and can be used in multiple ways: 1) By interacting with a search engine, which is the focus of our paper (Talmor and Berant, 2018); 2) As a reading comprehension task: we release 12,725,989 web snippets that are relevant for the questions, and were collected during the development of our model; 3) As a semantic parsing task: each question is paired with a SPARQL query that can be executed against Freebase to retrieve the answer.

  3. WebQuestions - Dataset - LDM

    • service.tib.eu
    Updated Dec 16, 2024
    + more versions
    Cite
    (2024). WebQuestions - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/webquestions
    Dataset updated
    Dec 16, 2024
    Description

    The task of Question Answering over Linked Data (QALD) has received increased attention in recent years (see the surveys [14] and [36]). The task consists of mapping natural language questions into an executable form, e.g. a SPARQL query, that retrieves answers to the question from a given knowledge base.
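
    As a rough illustration of what "mapping a question into an executable form" can mean, the sketch below uses hand-written templates to turn two question patterns into SPARQL strings. It is not the method of any system listed here: the `ex:` namespace, the `ex:author` and `ex:capital` predicates, and the function names are assumptions made for the example, and real QALD systems typically rely on learned semantic parsers rather than regular expressions.

```python
import re

# Hypothetical namespace and predicates, for illustration only; real
# knowledge bases such as Freebase or DBpedia define their own vocabularies.
PREFIXES = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
    "PREFIX ex: <http://example.org/ontology/> "
)

# Each entry pairs a question pattern with a SPARQL template.
TEMPLATES = [
    (
        re.compile(r"^who (?:wrote|is the author of) (?P<label>.+?)\??$", re.IGNORECASE),
        'SELECT ?answer WHERE {{ ?work rdfs:label "{label}"@en . ?work ex:author ?answer . }}',
    ),
    (
        re.compile(r"^what is the capital of (?P<label>.+?)\??$", re.IGNORECASE),
        'SELECT ?answer WHERE {{ ?place rdfs:label "{label}"@en . ?place ex:capital ?answer . }}',
    ),
]


def escape_literal(text):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def question_to_sparql(question):
    """Return a SPARQL query string for a recognised question, else None."""
    for pattern, template in TEMPLATES:
        match = pattern.match(question.strip())
        if match:
            label = escape_literal(match.group("label"))
            return PREFIXES + template.format(label=label)
    return None


print(question_to_sparql("Who wrote Hamlet?"))
```

    The returned string would still have to be sent to a SPARQL endpoint to retrieve answers; questions that match no template yield `None`.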

  4. spoken-web-questions

    • huggingface.co
    Updated Sep 20, 2024
    Cite
    Ultravox.ai (2024). spoken-web-questions [Dataset]. https://huggingface.co/datasets/fixie-ai/spoken-web-questions
    Dataset updated
    Sep 20, 2024
    Dataset provided by
    Ultravox.ai
    Description

    fixie-ai/spoken-web-questions dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. speech-web-questions

    • huggingface.co
    Updated Jan 14, 2025
    + more versions
    Cite
    Shi Qundong (2025). speech-web-questions [Dataset]. https://huggingface.co/datasets/TwinkStart/speech-web-questions
    Dataset updated
    Jan 14, 2025
    Authors
    Shi Qundong
    Description

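    This dataset contains only test data and is integrated into the UltraEval-Audio (https://github.com/OpenBMB/UltraEval-Audio) framework.

    python audio_evals/main.py --dataset speech-web-questions --model gpt4o_speech

      🚀 An exceptional experience, all in UltraEval-Audio 🚀

    UltraEval-Audio is the world's first open-source framework that supports evaluating both speech understanding and speech generation. Built specifically for evaluating large speech models, it brings together 34 authoritative benchmarks covering four domains (speech, sound, medicine, and music), supports ten languages, and spans twelve task types. With UltraEval-Audio, you get unprecedented convenience and efficiency:

    One-click benchmark management 📥: say goodbye to tedious manual downloads and data processing; UltraEval-Audio automates all of this so you can easily obtain the benchmark data you need. Built-in evaluation tools… See the full description on the dataset page: https://huggingface.co/datasets/TwinkStart/speech-web-questions.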

  6. Yih, W.-t., Richardson, M., Meek, C., Chang, M.-W., Suh, J. (2025). Dataset:...

    • service.tib.eu
    • resodate.org
    Updated Jan 2, 2025
    Cite
    (2025). Yih, W.-t., Richardson, M., Meek, C., Chang, M.-W., Suh, J. (2025). Dataset: WebQuestions dataset for Google Suggest. https://doi.org/10.57702/7u5sfzs6 [Dataset]. https://service.tib.eu/ldmservice/dataset/webquestions-dataset-for-google-suggest
    Dataset updated
    Jan 2, 2025
    Description

    The WebQuestions dataset contains questions answerable using Google Suggest as the knowledge graph.

  7. inference_slm_spoken-web-questions

    • huggingface.co
    Updated Jul 1, 2025
    + more versions
    Cite
    hsiao chiyuan (2025). inference_slm_spoken-web-questions [Dataset]. https://huggingface.co/datasets/chiyuanhsiao/inference_slm_spoken-web-questions
    Dataset updated
    Jul 1, 2025
    Authors
    hsiao chiyuan
    Description

    chiyuanhsiao/inference_slm_spoken-web-questions dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. Natural Questions Dataset

    • kaggle.com
    zip
    Updated Mar 15, 2024
    Cite
    fujoos (2024). Natural Questions Dataset [Dataset]. https://www.kaggle.com/datasets/frankossai/natural-questions-dataset
    Available download formats: zip (116502047 bytes)
    Dataset updated
    Mar 15, 2024
    Authors
    fujoos
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Context

    The Natural Questions (NQ) dataset is a comprehensive collection of real user queries submitted to Google Search, with answers sourced from Wikipedia by expert annotators. Created by Google AI Research, this dataset aims to support the development and evaluation of advanced automated question-answering systems. The version provided here includes 89,312 meticulously annotated entries, tailored for ease of access and utility in natural language processing (NLP) and machine learning (ML) research.

    Data Collection

    The dataset is composed of authentic search queries from Google Search, reflecting the wide range of information sought by users globally. This approach ensures a realistic and diverse set of questions for NLP applications.

    Data Pre-processing

    The NQ dataset underwent significant pre-processing to prepare it for NLP tasks:
    - Removal of web-specific elements like URLs, hashtags, user mentions, and special characters using Python's "BeautifulSoup" and "regex" libraries.
    - Grammatical error identification and correction using the "LanguageTool" library, an open-source grammar, style, and spell checker.

    These steps were taken to clean and simplify the text while retaining the essence of the questions and their answers, divided into 'questions', 'long answers', and 'short answers'.
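
    A minimal sketch of the first cleaning step might look like the following. It is not the dataset author's actual code: it swaps in the standard-library `html.parser` and `re` modules in place of "BeautifulSoup" and "regex" so that it runs without extra installs, and the patterns and the `clean_text` name are assumptions chosen for illustration. The grammar-correction step is not shown.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text):
    """Drop HTML tags and keep the text between them."""
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


# Assumed patterns: what counts as a "special character" is a guess.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"[#@]\w+")            # hashtags and user mentions
SPECIAL_RE = re.compile(r"[^\w\s.,?!'-]")  # anything outside basic punctuation


def clean_text(text):
    """Remove HTML, URLs, hashtags, mentions, and special characters."""
    text = strip_html(text)
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    text = SPECIAL_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


print(clean_text("<p>Who wrote <b>Hamlet</b>? see https://example.com #lit @bob</p>"))
```

    URLs are removed before the special-character pass so that stray fragments such as "https" or "example.com" are not left behind.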

    Data Storage

    The unprocessed data, including answers with embedded HTML, empty or complex long and short answers, is stored in "Natural-Questions-Base.csv". This version retains the raw structure of the data, featuring HTML elements in answers, and varied answer formats such as tables and lists, providing a comprehensive view for those interested in the original dataset's complexity and richness. The processed data is compiled into a single CSV file named "Natural-Questions-Filtered.csv". The file is structured for easy access and analysis, with each record containing the processed question, a detailed answer, and concise answer snippets.

    Filtered Results

    A filtered version is also available, in which specific criteria, such as question length or answer complexity, were applied to refine the data further. This version allows for more focused research and application development.

    Flask CSV Reader App

    The repository at 'https://github.com/fujoos/natural_questions' also includes a Flask-based CSV reader application designed to read and display contents from the "NaturalQuestions.csv" file. The app provides functionalities such as:
    - Viewing questions and answers directly in your browser.
    - Filtering results based on criteria like question keywords or answer length.

    See the live demo, which uses the CSV files converted to an SQLite database, at 'https://fujoos.pythonanywhere.com/'.

  9. cpa-web-questions.app Website Traffic, Ranking, Analytics [October 2025]

    • semrush.ebundletools.com
    Updated Nov 12, 2025
    Cite
    Semrush (2025). cpa-web-questions.app Website Traffic, Ranking, Analytics [October 2025] [Dataset]. https://semrush.ebundletools.com/website/cpa-web-questions.app/overview/
    Dataset updated
    Nov 12, 2025
    Dataset authored and provided by
    Semrush (https://fr.semrush.com/)
    License

    https://semrush.ebundletools.com/company/legal/terms-of-service/

    Time period covered
    Nov 12, 2025
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    cpa-web-questions.app is ranked #13037 in JP with 203.33K traffic. Learn more about website traffic, market share, and more!

  10. text_mllama_spoken-web-questions

    • huggingface.co
    Updated Feb 12, 2025
    + more versions
    Cite
    hsiao chiyuan (2025). text_mllama_spoken-web-questions [Dataset]. https://huggingface.co/datasets/chiyuanhsiao/text_mllama_spoken-web-questions
    Dataset updated
    Feb 12, 2025
    Authors
    hsiao chiyuan
    Description

    chiyuanhsiao/text_mllama_spoken-web-questions dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. Wenhan Xiong, Hong Wang, William Yang Wang (2024). Dataset:...

    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    (2024). Wenhan Xiong, Hong Wang, William Yang Wang (2024). Dataset: NaturalQuestions-Open WebQuestions CuratedTREC. https://doi.org/10.57702/955nlkkc [Dataset]. https://service.tib.eu/ldmservice/dataset/naturalquestions-open-webquestions-curatedtrec
    Dataset updated
    Dec 16, 2024
    Description

    Open-domain QA datasets for testing the proposed method

  12. Italian Closed Ended Question Answer Text Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Italian Closed Ended Question Answer Text Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/italian-closed-ended-question-answer-text-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    The Italian Closed-Ended Question Answering Dataset is a meticulously curated collection of 5000 comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and question-answering models in the Italian language, advancing the field of artificial intelligence.

    Dataset Content

    This closed-ended QA dataset comprises a diverse set of context paragraphs and questions paired with corresponding answers in Italian. Each question comes with a context paragraph from which the answer is drawn. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.

    Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native Italian speakers, and references were taken from diverse sources like books, news articles, websites, web forums, and other reliable references.

    This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.

    Question Diversity

    To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. The QA dataset also contains questions with constraints, which makes it even more useful for LLM training.

    Answer Formats

    To accommodate varied learning experiences, the dataset incorporates different types of answer formats. These formats include single-word, short-phrase, single-sentence, and paragraph-length answers. The answers also contain text strings, numerical values, and date and time formats. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.

    Data Format and Annotation Details

    This fully labeled Italian Closed-Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as a unique id, context paragraph, context reference link, question, question type, question complexity, question category, domain, prompt type, answer, answer type, and rich text presence.

    Quality and Accuracy

    The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.

    The Italian version is grammatically accurate, with no spelling or grammatical errors. No toxic or harmful content was used while building this dataset.

    Continuous Updates and Customization

    The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.

    License:

    The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy Italian Closed-Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.

  13. WikiQA (Open-Domain Q&A)

    • kaggle.com
    zip
    Updated Nov 20, 2022
    Cite
    The Devastator (2022). WikiQA (Open-Domain Q&A) [Dataset]. https://www.kaggle.com/datasets/thedevastator/wikiquestionanswer-a-dataset-for-open-domain-que
    Available download formats: zip (1785708 bytes)
    Dataset updated
    Nov 20, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    WikiQA (Open-Domain Q&A)

    Discovering New Knowledge through Question and Sentence Pairs

    About this dataset

    The WikiQA dataset is a collection of question and sentence pairs, collected and annotated for research on open-domain question answering. The data fields are the same among all splits: question, document title, label. The questions come from different sources, including Wikipedia articles, news articles, and web forums. The sentences come from different sources as well, such as Wikipedia articles, news articles, web forums, and books. The labels indicate whether the answer is supported by the document

    How to use this dataset

    1. The WikiQA dataset is a collection of question and sentence pairs, collected and annotated for research on open-domain question answering.
    2. The data fields are the same among all splits.
    3. Columns: question, document_title, label.
    4. The file test.csv in the WikiQA dataset is a collection of question and sentence pairs used to evaluate the performance of different question answering models.

    Research Ideas

    • The WikiQA dataset can be used to train a machine-learning model to answer questions automatically.
    • The dataset can be used to research the feasibility of open-domain question answering.
    • The dataset can be used to evaluate the performance of different question answering models.

    Acknowledgements

    This dataset was proposed in WikiQA: A Challenge Dataset for Open-Domain Question Answering by Yang et al. The authors acknowledge the help of Aria Haghighi and Percy Liang in constructing the pairwise sentence similarity features, Wei Ying in providing additional insights about the dataset, Hannah Rashkin for helpful discussions, and Google for providing the computing infrastructure

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    Files: validation.csv, train.csv, test.csv (identical columns)

    | Column name    | Description                                                                    |
    |:---------------|:-------------------------------------------------------------------------------|
    | question       | The question that was asked. (String)                                          |
    | document_title | The title of the Wikipedia article that the question was asked about. (String) |
    | answer         | The answer to the question. (String)                                           |
    | label          | Whether or not the answer is relevant to the question. (String)                |
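
    To make the column layout concrete, here is a hedged sketch that reads rows with these four columns and groups the relevant answers by question. The sample rows are made up for illustration and are not taken from WikiQA; the assumption that the string "1" in `label` marks a relevant answer is also ours and should be checked against the actual files.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; they are not taken from WikiQA.
SAMPLE_CSV = """question,document_title,answer,label
what is the capital of france?,Paris,Paris is the capital of France.,1
what is the capital of france?,Paris,Paris hosted the 1900 Summer Olympics.,0
who wrote hamlet?,Hamlet,Hamlet is a tragedy written by William Shakespeare.,1
"""


def relevant_answers(csv_file, positive_label="1"):
    """Map each question to the answers whose label equals positive_label.

    positive_label="1" is an assumption about how relevance is encoded.
    """
    answers = defaultdict(list)
    for row in csv.DictReader(csv_file):
        if row["label"].strip() == positive_label:
            answers[row["question"]].append(row["answer"])
    return dict(answers)


result = relevant_answers(io.StringIO(SAMPLE_CSV))
for question, answers in result.items():
    print(question, "->", answers)
```

    With a real file, the `io.StringIO` object would be replaced by something like `open("train.csv", newline="", encoding="utf-8")`.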

  14. text_replay_spoken-web-questions

    • huggingface.co
    + more versions
    Cite
    hsiao chiyuan, text_replay_spoken-web-questions [Dataset]. https://huggingface.co/datasets/chiyuanhsiao/text_replay_spoken-web-questions
    Authors
    hsiao chiyuan
    Description

    chiyuanhsiao/text_replay_spoken-web-questions dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. Data from: RESTBERTa: A Transformer-based Question Answering Approach for...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Jan 18, 2024
    Cite
    Kotstein, Sebastian; Decker, Christian (2024). RESTBERTa: A Transformer-based Question Answering Approach for Semantic Search in Web API Documentation [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_8349083
    Dataset updated
    Jan 18, 2024
    Dataset provided by
    Reutlingen University, Germany
    Authors
    Kotstein, Sebastian; Decker, Christian
    Description

    This repository contains the datasets and evaluation results of our study. For a detailed overview regarding the provided materials, please refer to README.md.

  16. Hindi Closed Ended Question Answer Text Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Hindi Closed Ended Question Answer Text Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/hindi-closed-ended-question-answer-text-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    The Hindi Closed-Ended Question Answering Dataset is a meticulously curated collection of 5000 comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and question-answering models in the Hindi language, advancing the field of artificial intelligence.

    Dataset Content

    This closed-ended QA dataset comprises a diverse set of context paragraphs and questions paired with corresponding answers in Hindi. Each question comes with a context paragraph from which the answer is drawn. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.

    Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native Hindi speakers, and references were taken from diverse sources like books, news articles, websites, web forums, and other reliable references.

    This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.

    Question Diversity

    To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. The QA dataset also contains questions with constraints, which makes it even more useful for LLM training.

    Answer Formats

    To accommodate varied learning experiences, the dataset incorporates different types of answer formats. These formats include single-word, short-phrase, single-sentence, and paragraph-length answers. The answers also contain text strings, numerical values, and date and time formats. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.

    Data Format and Annotation Details

    This fully labeled Hindi Closed-Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as a unique id, context paragraph, context reference link, question, question type, question complexity, question category, domain, prompt type, answer, answer type, and rich text presence.

    Quality and Accuracy

    The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.

    The Hindi version is grammatically accurate, with no spelling or grammatical errors. No toxic or harmful content was used while building this dataset.

    Continuous Updates and Customization

    The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.

    License:

    The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy Hindi Closed-Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.

  17. Research data supporting "Question Answering System for Chemistry -- a...

    • repository.cam.ac.uk
    zip
    Updated Jun 7, 2022
    Cite
    Zhou, Xiaochi; Nurkowski, Daniel; Menon, Angiras; Akroyd, Jethro; Mosbach, Sebastian; Kraft, Markus (2022). Research data supporting "Question Answering System for Chemistry -- a semantic agent extension" [Dataset]. http://doi.org/10.17863/CAM.78870
    Available download formats: zip (18491 bytes)
    Dataset updated
    Jun 7, 2022
    Dataset provided by
    University of Cambridge
    Apollo
    Authors
    Zhou, Xiaochi; Nurkowski, Daniel; Menon, Angiras; Akroyd, Jethro; Mosbach, Sebastian; Kraft, Markus
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the evaluation set of the questions and the detailed responses to those evaluation questions.

  18. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

    • hotpotqa.github.io
    • explagraphs.github.io
    • +1 more
    json
    Updated Jun 25, 2024
    + more versions
    Cite
    Carnegie Mellon University, Stanford University, Université de Montréal (2024). HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering [Dataset]. https://hotpotqa.github.io/
    Available download formats: json
    Dataset updated
    Jun 25, 2024
    Dataset authored and provided by
    Carnegie Mellon University, Stanford University, Université de Montréal
    Description

    HotpotQA is a question answering dataset built on Wikipedia. It features natural, multi-hop questions with strong supervision for supporting facts, to enable more explainable question answering systems.

  19. Dataset for: Same Question, Different Answers? An Empirical Comparison of...

    • demo-b2find.dkrz.de
    Updated Sep 22, 2025
    Cite
    (2025). Dataset for: Same Question, Different Answers? An Empirical Comparison of Web Data and Traditional Data - Dataset - B2FIND [Dataset]. http://demo-b2find.dkrz.de/dataset/bc684dad-c657-5013-b2d4-cc35b4a2e7ee
    Dataset updated
    Sep 22, 2025
    Description

    Psychological scientists increasingly study web data, such as user ratings or social media postings. However, whether research relying on such web data leads to the same conclusions as research based on traditional data is largely unknown. To test this, we (re)analyzed three datasets, thereby comparing web data with lab and online survey data. We calculated correlations across these different datasets (Study 1) and investigated identical, illustrative research questions in each dataset (Studies 2 to 4). Our results suggest that web and traditional data are not fundamentally different and usually lead to similar conclusions, but also that it is important to consider differences between data types such as populations and research settings. Web data can be a valuable tool for psychologists when accounting for such differences, as it allows for testing established research findings in new contexts, complementing them with insights from novel data sources.

  20. QALD-9-Plus

    • figshare.com
    • opendatalab.com
    txt
    Updated Dec 21, 2021
    Cite
    Aleksandr Perevalov; Andreas Both; Dennis Diefenbach; Ricardo Usbeck (2021). QALD-9-Plus [Dataset]. http://doi.org/10.6084/m9.figshare.16864273.v7
    Available download formats: txt
    Dataset updated
    Dec 21, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Aleksandr Perevalov; Andreas Both; Dennis Diefenbach; Ricardo Usbeck
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    QALD-9-Plus is a dataset for Knowledge Graph Question Answering (KGQA) based on the well-known QALD-9. QALD-9-Plus makes it possible to train and test KGQA systems over DBpedia and Wikidata using questions in 8 different languages. Some of the questions have several alternative phrasings in particular languages, which makes it possible to evaluate the robustness of KGQA systems and to train paraphrasing models. Because the questions' translations were provided by native speakers, they are considered a "gold standard"; machine translation tools can therefore be trained and evaluated on the dataset. See also the GitHub repository: https://github.com/Perevalov/qald_9_plus

Cite
Stanford NLP (2024). web_questions [Dataset]. https://huggingface.co/datasets/stanfordnlp/web_questions

web_questions

WebQuestions

stanfordnlp/web_questions

Explore at:
48 scholarly articles cite this dataset (View in Google Scholar)