WebQA is a new benchmark for multimodal, multi-hop reasoning in which systems are presented with the same kind of data humans encounter when searching the web: text snippets and images. A system must identify which pieces of information are relevant across modalities and combine them with reasoning to answer the query. Systems are evaluated on both the correctness of their answers and the sources they cite.
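For concreteness, the sketch below shows the kind of record such a system consumes: a question, a pool of candidate snippets and images (relevant items mixed with distractors), a free-form answer, and the supporting sources. The field names and values are illustrative assumptions, not the benchmark's official schema.

```python
# Illustrative WebQA-style record; field names and values are assumptions for exposition.
webqa_example = {
    "question": "Are both towers taller than 300 metres?",
    "snippet_candidates": [            # candidate web snippets, relevant ones mixed with distractors
        {"id": "s1", "text": "Tower A is 324 m tall."},
        {"id": "s2", "text": "Tower B opened in 1931."},
    ],
    "image_candidates": [              # captioned images, again mixed with distractors
        {"id": "i1", "caption": "Tower B seen from the river."},
    ],
    "answer": "No, only Tower A exceeds 300 metres.",  # free-form answer to be scored
    "sources": ["s1", "i1"],           # the sources a system is also scored on retrieving
}
print(webqa_example["question"])
```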
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
suolyer/webqa dataset hosted on Hugging Face and contributed by the HF Datasets community
TreezzZ/WebQA dataset hosted on Hugging Face and contributed by the HF Datasets community
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Haofei Yu
Released under CC0: Public Domain
Dataset Summary
The Turku WebQA dataset is a Finnish question-answer dataset extracted from several CommonCrawl-derived sources (Parsebank, mC4-Fi, CC-Fi). It contains 237,000 question-answer pairs (290,000 questions in total, but not all have an answer); the unanswered questions can be discarded by filtering out rows whose answer is None (null). The codebase as well as the raw data can be found on GitHub. The extracted question-answer pairs include various topics from the… See the full description on the dataset page: https://huggingface.co/datasets/TurkuNLP/Turku-WebQA.
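A minimal sketch of that filtering step with the Hugging Face datasets library is shown below. The split name and the "answer" column name are assumptions; check the dataset card for the actual schema.

```python
from datasets import load_dataset

# Load the Turku WebQA data; "train" split and "answer" column are assumed names.
ds = load_dataset("TurkuNLP/Turku-WebQA", split="train")

# Keep only rows that actually have an answer (drop None/null answers).
answered = ds.filter(lambda row: row["answer"] is not None)
print(f"{len(answered)} answered questions out of {len(ds)} total")
```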
The RetVQA dataset is a large-scale dataset designed for Retrieval-Based Visual Question Answering (RetVQA). RetVQA is a more challenging task than traditional VQA, as it requires models to retrieve relevant images from a pool of images before answering a question. The need for RetVQA stems from the fact that information needed to answer a question may be spread across multiple images.
Here is a detailed summary of the RetVQA dataset:
It is 20 times larger than the closest dataset in this setting, WebQA. It was derived from the Visual Genome dataset, utilising its questions and image annotations. It has 418K unique questions and 16,205 unique precise answers. The questions are designed to be metadata-independent, meaning they do not rely on information such as captions or tags. The questions are divided into five categories: color, shape, count, object-attributes, and relation-based.
The dataset includes both binary (yes/no) questions and open-ended questions that require a generative answer. All answers are free-form and fluent, even for binary questions. For example, a binary question may be "Do the rose and sunflower share the same colour?", and a corresponding answer would be "No, the rose and sunflower do not share the same colour". Every question in RetVQA requires reasoning over multiple images to arrive at the answer. This contrasts with datasets like WebQA, where a majority of questions can be answered using a single image. The dataset has, on average, two relevant images and 24.5 irrelevant images per question. This makes it more challenging than datasets like ISVQA, where images are homogeneous and no explicit retrieval is needed.
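The retrieval-then-answer structure can be sketched as follows. The scorer and the answer generator here are hypothetical placeholders, not the RetVQA authors' models, and the record fields are assumptions; the top-k value of 2 reflects the roughly two relevant images per question noted above.

```python
from typing import Callable, Dict, List

def retrieve_top_k(question: str,
                   image_pool: List[str],
                   scorer: Callable[[str, str], float],
                   k: int = 2) -> List[str]:
    """Rank pooled images by a question-image relevance score and keep the top k."""
    ranked = sorted(image_pool, key=lambda img: scorer(question, img), reverse=True)
    return ranked[:k]

def answer_question(question: str, images: List[str]) -> str:
    """Placeholder generator: a real system would reason over the retrieved images."""
    return f"Answer to {question!r} based on {len(images)} retrieved image(s)."

# Toy usage with a dummy scorer; a real scorer would be, e.g., a CLIP-style model.
example: Dict = {
    "question": "Do the rose and sunflower share the same colour?",
    "image_pool": [f"img_{i}.jpg" for i in range(26)],  # ~2 relevant + ~24 irrelevant images
}
dummy_scorer = lambda q, img: float(hash((q, img)) % 100)
top_images = retrieve_top_k(example["question"], example["image_pool"], dummy_scorer)
print(answer_question(example["question"], top_images))
```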