Visual Question Answering (VQA) is a dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer. The first version of the dataset was released in October 2015. VQA v2.0 was released in April 2017.
Visual Question Answering (VQA) v2.0 is a dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer. It is the second version of the VQA dataset.
- 265,016 images (COCO and abstract scenes)
- At least 3 questions per image (5.4 questions on average)
- 10 ground-truth answers per question
- 3 plausible (but likely incorrect) answers per question
- Automatic evaluation metric (sketched below)
The first version of the dataset was released in October 2015.
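The automatic evaluation metric above is the standard VQA accuracy, which credits a predicted answer in proportion to how many of the 10 annotators gave it. A minimal Python sketch follows; the official script additionally averages over 9-annotator subsets and normalizes answer strings, which this simplified version omits.

def vqa_accuracy(predicted, human_answers):
    # Count exact (case-insensitive) matches against the 10 human answers.
    matches = sum(1 for a in human_answers if a.strip().lower() == predicted.strip().lower())
    # An answer given by 3 or more annotators counts as fully correct.
    return min(matches / 3.0, 1.0)

# Example: 4 of 10 annotators answered "red", so "red" scores 1.0.
print(vqa_accuracy("red", ["red"] * 4 + ["maroon"] * 6))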
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
JA-VG-VQA-500
Dataset Description
JA-VG-VQA-500 is a 500-sample subset of the Japanese Visual Genome VQA dataset. This dataset was used in the evaluation of EvoVLM-JP-v1-7B. Please refer to our report and blog for more details. We are grateful to the developers for making the dataset available under the Creative Commons Attribution 4.0 License.
Visual Genome
Japanese Visual Genome VQA dataset
Usage
Use the code below to get started with the dataset. from datasets… See the full description on the dataset page: https://huggingface.co/datasets/SakanaAI/JA-VG-VQA-500.
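A minimal loading sketch, assuming the standard Hugging Face datasets API; the canonical snippet is on the dataset page linked above.

from datasets import load_dataset

# Loading without a split returns a DatasetDict; printing it shows the
# available splits and column names (the exact schema is on the dataset page).
dataset = load_dataset("SakanaAI/JA-VG-VQA-500")
print(dataset)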
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
JA-Multi-Image-VQA
Dataset Description
JA-Multi-Image-VQA is a dataset for evaluating question-answering capabilities over multiple image inputs. We carefully collected a diverse set of 39 images with 55 questions in total. Some images depict Japanese culture and objects found in Japan. The Japanese questions and answers were created manually.
Usage
from datasets import load_dataset
dataset = load_dataset("SakanaAI/JA-Multi-Image-VQA", split="test")
… See the full description on the dataset page: https://huggingface.co/datasets/SakanaAI/JA-Multi-Image-VQA.
No license specified: https://academictorrents.com/nolicensespecified
A BitTorrent file for downloading the dataset titled 'VQA: Visual Question Answering Dataset'
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
WorldCuisines is a massive-scale visual question answering (VQA) benchmark for multilingual and multicultural understanding through global cuisines. The dataset contains text-image pairs across 30 languages and dialects, spanning 9 language families and featuring over 1 million data points, making it the largest multicultural VQA benchmark as of 17 October 2024.… See the full description on the dataset page: https://huggingface.co/datasets/worldcuisines/vqa-v1.1.
VQA-RAD consists of 3,515 question–answer pairs on 315 radiology images.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
man-made architecture and natural scenery.
Outside Knowledge Visual Question Answering (OK-VQA) includes more than 14,000 questions that require external knowledge to answer.
CC0 1.0: https://choosealicense.com/licenses/cc0-1.0/
Dataset Card for VQA-RAD
Dataset Description
VQA-RAD is a dataset of question-answer pairs on radiology images. The dataset is intended to be used for training and testing Medical Visual Question Answering (VQA) systems. The dataset includes both open-ended questions and binary "yes/no" questions. The dataset is built from MedPix, which is a free open-access online database of medical images. The question-answer pairs were manually generated by a team of clinicians.… See the full description on the dataset page: https://huggingface.co/datasets/flaviagiammarino/vqa-rad.
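A hedged loading sketch using the Hugging Face datasets library; the split name and the "answer" field name are assumptions, so check the dataset page for the actual schema.

from datasets import load_dataset

ds = load_dataset("flaviagiammarino/vqa-rad", split="train")  # split name assumed

# Separate the binary yes/no questions from the open-ended ones described above,
# assuming answers are stored as plain strings under an "answer" field.
closed = ds.filter(lambda ex: str(ex["answer"]).strip().lower() in {"yes", "no"})
open_ended = ds.filter(lambda ex: str(ex["answer"]).strip().lower() not in {"yes", "no"})
print(len(closed), len(open_ended))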
ST-VQA aims to highlight the importance of exploiting high-level semantic information present in images as textual cues in the VQA process.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The PitVQA dataset comprises 25 videos of endoscopic pituitary surgeries from the National Hospital of Neurology and Neurosurgery in London, United Kingdom, similar to the dataset used in the MICCAI PitVis challenge. All patients provided informed consent, and the study was registered with the local governance committee. The surgeries were recorded using a high-definition endoscope (Karl Storz Endoscopy) at a resolution of 720p and stored as MP4 files. All videos were annotated for surgical phases, steps, instruments present and operation notes, guided by a standardised annotation framework derived from a preceding international consensus study on pituitary surgery workflow. Annotation was performed collaboratively by two neurosurgical residents with operative pituitary experience and checked by an attending neurosurgeon.

We extracted image frames from each video at 1 fps and removed any frames that were blurred or occluded, yielding a total of 109,173 frames; the shortest and longest videos contributed 2,443 and 7,179 frames, respectively. We acquired frame-wise question-answer pairs for all categories of the annotation. Overall, there are 884,242 question-answer pairs over the 109,173 frames, around 8 pairs per frame. There are 59 annotation classes overall, including 4 phases, 15 steps, 18 instruments, 3 variations of instruments present in a frame, 5 positions of the instruments, and 14 operation notes. Question length ranges from a minimum of 7 words to a maximum of 12 words. A detailed description of the original videos is available from the MICCAI PitVis challenge, and the videos can be downloaded directly from the UCL HDR portal.
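As an illustration of the 1 fps sampling described above (not the authors' actual pipeline), the sketch below extracts one frame per second from an MP4 video with OpenCV and skips frames that a simple variance-of-Laplacian check flags as blurred; the paths and threshold are hypothetical.

import cv2

def extract_frames_1fps(video_path, out_dir, blur_threshold=100.0):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = int(round(fps))  # keep one frame per second of video
    kept, idx = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            # Low Laplacian variance is a common heuristic for blur; the
            # threshold here is illustrative, not the value used for PitVQA.
            if cv2.Laplacian(gray, cv2.CV_64F).var() >= blur_threshold:
                cv2.imwrite(f"{out_dir}/frame_{kept:06d}.png", frame)
                kept += 1
        idx += 1
    cap.release()
    return kept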
The VQA 2.0 dataset is used for the visual question answering task. It consists of three splits: a train set with 83k images and 444k questions, a validation set with 41k images and 214k questions, and a test set with 81k images and 448k questions.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Le Trong Hieu
Released under Apache 2.0
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
We present VQA-MHUG, a novel 49-participant dataset of multimodal human gaze on both images and questions during visual question answering (VQA), collected using a high-speed eye tracker. To the best of our knowledge, this is the first resource containing multimodal human gaze data over a textual question and the corresponding image, and as such it allows researchers to jointly study human and machine attention. Our corpus encompasses task-specific gaze on a subset of the benchmark dataset VQAv2 val2. We use our dataset to analyse the similarity between human and neural attentive strategies learned by five state-of-the-art VQA models: Modulated Co-Attention Network (MCAN) with either grid or region features, Pythia, Bilinear Attention Network (BAN), and the Multimodal Factorised Bilinear Pooling Network (MFB). While prior work has focused on studying the image modality, our analyses show, for the first time, that for all models a higher correlation with human attention on text is a significant predictor of VQA performance. This finding points to potential for improving VQA performance and, at the same time, calls for further research on neural text attention mechanisms and their integration into architectures for vision-and-language tasks, including but potentially also beyond VQA.
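As an illustrative sketch of the kind of human-machine attention comparison described (not the paper's exact analysis), one can correlate a model's per-token text attention with human gaze density over the same question tokens; the numbers below are hypothetical.

import numpy as np
from scipy.stats import spearmanr

human_gaze = np.array([0.05, 0.10, 0.40, 0.30, 0.15])  # hypothetical per-token gaze density
model_attn = np.array([0.08, 0.12, 0.35, 0.25, 0.20])  # hypothetical per-token model attention

rho, p_value = spearmanr(human_gaze, model_attn)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")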
https://www.archivemarketresearch.com/privacy-policy
The Visual Question Answering (VQA) technology market is experiencing robust growth, driven by increasing demand for advanced image analysis and AI-powered solutions across diverse industries. The market, estimated at $2 billion in 2025, is projected to expand at a Compound Annual Growth Rate (CAGR) of 25% from 2025 to 2033. This significant growth is fueled by several key factors. The proliferation of big data and the advancements in deep learning algorithms are enabling more accurate and efficient VQA systems. Furthermore, the rising adoption of VQA in sectors such as healthcare (for medical image analysis), retail (for enhanced customer experience), and autonomous vehicles (for scene understanding) is significantly boosting market expansion. The increasing availability of powerful cloud computing resources further facilitates the development and deployment of complex VQA models. While challenges such as data bias and the need for robust annotation techniques remain, the overall market outlook for VQA technology is extremely positive.

Segmentation analysis reveals strong growth across various application areas. The software industry currently leads in VQA adoption, followed by the computer and electronics industries. Within the technology itself, image classification and image identification are the dominant segments, indicating a strong focus on practical applications. Geographically, North America and Europe currently hold the largest market shares, but the Asia-Pacific region is expected to witness substantial growth in the coming years, driven by increasing investments in AI and technological advancements in countries like China and India.

Key players like Toshiba Corporation, Amazon Science, and Cognex are actively contributing to market growth through continuous innovation and strategic partnerships. The competitive landscape is dynamic, with both established tech giants and emerging startups vying for market share. The long-term outlook suggests that VQA technology will continue to be a critical component of various emerging technologies and will play a pivotal role in shaping the future of artificial intelligence.
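For reference, the figures above imply roughly a sixfold expansion by 2033; a quick back-of-the-envelope check, assuming the stated $2 billion 2025 base and a constant 25% CAGR:

base_2025 = 2.0                      # USD billions (stated 2025 estimate)
cagr = 0.25                          # stated compound annual growth rate
years = 2033 - 2025
projected_2033 = base_2025 * (1 + cagr) ** years
print(f"Implied 2033 market size: ${projected_2033:.1f}B")  # about $11.9B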
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The Kvasir-VQA dataset is an extended dataset derived from the HyperKvasir and Kvasir-Instrument datasets, augmented with question-and-answer annotations. This dataset is designed to facilitate advanced machine learning tasks in gastrointestinal (GI) diagnostics, including image captioning, Visual Question Answering (VQA), and text-based generation of synthetic medical images.
Homepage: https://datasets.simula.no/kvasir-vqa
Usage
You can use the Kvasir-VQA dataset directly from… See the full description on the dataset page: https://huggingface.co/datasets/SimulaMet-HOST/Kvasir-VQA.
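A minimal loading sketch, assuming the standard Hugging Face datasets API; the actual splits and columns are listed on the dataset page linked above.

from datasets import load_dataset

# Loading without a split returns a DatasetDict whose repr lists the
# available splits and fields for Kvasir-VQA.
dataset = load_dataset("SimulaMet-HOST/Kvasir-VQA")
print(dataset)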
VQA-E is a dataset for Visual Question Answering with Explanation, where models are required to generate an explanation along with the predicted answer. The VQA-E dataset is automatically derived from the VQA v2 dataset by synthesizing a textual explanation for each image-question-answer triple.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by MD Zeeshan Hassan
Released under Apache 2.0
Attribution-NoDerivs 4.0 (CC BY-ND 4.0): https://creativecommons.org/licenses/by-nd/4.0/
License information was derived automatically
PathVQA is sourced from He, Xuehai, et al. "PathVQA: 30000+ Questions for Medical Visual Question Answering." arXiv preprint arXiv:2003.10286 (2020). VQA-RAD is sourced from Lau, Jason J., et al. "A dataset of clinically generated visual questions and answers about radiology images." Scientific Data 5.1 (2018): 1-10.