100+ datasets found

P
Visual Question Answering Dataset
paperswithcode.com
library.toponeai.link
Updated Nov 5, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aishwarya Agrawal; Jiasen Lu; Stanislaw Antol; Margaret Mitchell; C. Lawrence Zitnick; Dhruv Batra; Devi Parikh (2023). Visual Question Answering Dataset [Dataset]. https://paperswithcode.com/dataset/visual-question-answering
Explore at:
Dataset updated
Nov 5, 2023
Authors
Aishwarya Agrawal; Jiasen Lu; Stanislaw Antol; Margaret Mitchell; C. Lawrence Zitnick; Dhruv Batra; Devi Parikh
Description
Visual Question Answering (VQA) is a dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer. The first version of the dataset was released in October 2015. VQA v2.0 was released in April 2017.
P
Visual Question Answering v2.0 Dataset
paperswithcode.com
Updated Mar 15, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yash Goyal; Tejas Khot; Douglas Summers-Stay; Dhruv Batra; Devi Parikh (2017). Visual Question Answering v2.0 Dataset [Dataset]. https://paperswithcode.com/dataset/visual-question-answering-v2-0
Explore at:
Dataset updated
Mar 15, 2017
Authors
Yash Goyal; Tejas Khot; Douglas Summers-Stay; Dhruv Batra; Devi Parikh
Description
Visual Question Answering (VQA) v2.0 is a dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer. It is the second version of the VQA dataset.

265,016 images (COCO and abstract scenes) At least 3 questions (5.4 questions on average) per image 10 ground truth answers per question 3 plausible (but likely incorrect) answers per question Automatic evaluation metric

The first version of the dataset was released in October 2015.
a
VQA: Visual Question Answering Dataset
academictorrents.com
bittorrent
Updated Oct 8, 2015
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Stanislaw Antol and Aishwarya Agrawal and Jiasen Lu and Margaret Mitchell and Dhruv Batra and C. Lawrence Zitnick and Devi Parikh (2015). VQA: Visual Question Answering Dataset [Dataset]. https://academictorrents.com/details/f075ad12eccbbd665aec68db5d208dc68e7a384f
Explore at:
bittorrent(7984418554)Available download formats
Dataset updated
Oct 8, 2015
Dataset authored and provided by
Stanislaw Antol and Aishwarya Agrawal and Jiasen Lu and Margaret Mitchell and Dhruv Batra and C. Lawrence Zitnick and Devi Parikh
License
https://academictorrents.com/nolicensespecifiedhttps://academictorrents.com/nolicensespecified
Description
A BitTorrent file to download data with the title 'VQA: Visual Question Answering Dataset'
Z
Toloka Visual Question Answering Dataset
data.niaid.nih.gov
Updated Oct 10, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ustalov, Dmitry (2023). Toloka Visual Question Answering Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7057740
Explore at:
Dataset updated
Oct 10, 2023
Dataset authored and provided by
Ustalov, Dmitry
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Our dataset consists of the images associated with textual questions. One entry (instance) in our dataset is a question-image pair labeled with the ground truth coordinates of a bounding box containing the visual answer to the given question. The images were obtained from a CC BY-licensed subset of the Microsoft Common Objects in Context dataset, MS COCO. All data labeling was performed on the Toloka crowdsourcing platform, https://toloka.ai/.

Our dataset has 45,199 instances split among three subsets: train (38,990 instances), public test (1,705 instances), and private test (4,504 instances). The entire train dataset was available for everyone since the start of the challenge. The public test dataset was available since the evaluation phase of the competition, but without any ground truth labels. After the end of the competition, public and private sets were released.

The datasets will be provided as files in the comma-separated values (CSV) format containing the following columns.

Column Type Description image string URL of an image on a public content delivery network width integer image width height integer image height left integer bounding box coordinate: left top integer bounding box coordinate: top right integer bounding box coordinate: right bottom integer bounding box coordinate: bottom question string question in English

This upload also contains a ZIP file with the images from MS COCO.
u
PitVQA: A Dataset of Visual Question Answering in Pituitary Surgery
rdr.ucl.ac.uk
bin
Updated Sep 18, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mobarack Islam; Matt Clarkson; Sophia Bano; Danail Stoyanov; Hani Marcus (2024). PitVQA: A Dataset of Visual Question Answering in Pituitary Surgery [Dataset]. http://doi.org/10.5522/04/27004666.v2
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.5522/04/27004666.v2
Dataset updated
Sep 18, 2024
Dataset provided by
University College London
Authors
Mobarack Islam; Matt Clarkson; Sophia Bano; Danail Stoyanov; Hani Marcus
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
PitVQA dataset comprises 25 videos of endoscopic pituitary surgeries from the National Hospital of Neurology and Neurosurgery in London, United Kingdom, similar to the dataset used in the MICCAI PitVis challenge. All patients provided informed consent, and the study was registered with the local governance committee. The surgeries were recorded using a high-definition endoscope (Karl Storz Endoscopy) with a resolution of 720p and stored as MP4 files. All videos were annotated for the surgical phases, steps, instruments present and operation notes guided by a standardised annotation framework, which was derived from a preceding international consensus study on pituitary surgery workflow. Annotation was performed collaboratively by 2 neurosurgical residents with operative pituitary experience and checked by an attending neurosurgeon. We extracted image frames from each video at 1 fps and removed any frames that were blurred or occluded. Ultimately, we obtained a total of 109,173 frames, with the videos of minimum and maximum length yielding 2,443 and 7,179 frames, respectively. We acquired frame-wise question-answer pairs for all the categories of the annotation. Overall, there are 884,242 question-answer pairs from 109,173 frames, which is around 8 pairs for each frame. There are 59 classes overall, including 4 phases, 15 steps, 18 instruments, 3 variations of instruments present in a frame, 5 positions of the instruments, and 14 operation notes in the annotation classes. The length of the questions ranges from a minimum of 7 words to a maximum of 12 words.The details description of the original videos can be found at the MICCAI PitVis challenge and the videos can be directly download from UCL HDR portal.
P
ST-VQA Dataset
paperswithcode.com
Updated Feb 20, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ali Furkan Biten; Ruben Tito; Andres Mafla; Lluis Gomez; Marçal Rusiñol; Ernest Valveny; C. V. Jawahar; Dimosthenis Karatzas (2021). ST-VQA Dataset [Dataset]. https://paperswithcode.com/dataset/st-vqa
Explore at:
Dataset updated
Feb 20, 2021
Authors
Ali Furkan Biten; Ruben Tito; Andres Mafla; Lluis Gomez; Marçal Rusiñol; Ernest Valveny; C. V. Jawahar; Dimosthenis Karatzas
Description
ST-VQA aims to highlight the importance of exploiting high-level semantic information present in images as textual cues in the VQA process.
P
DocCVQA Dataset
paperswithcode.com
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rubèn Tito; Dimosthenis Karatzas; Ernest Valveny, DocCVQA Dataset [Dataset]. https://paperswithcode.com/dataset/doccvqa
Explore at:
Authors
Rubèn Tito; Dimosthenis Karatzas; Ernest Valveny
Description
DocCVQA is a Document Visual Question Answering dataset, where the questions are posed over a whole collection of 14,362 scanned documents. Therefore, the task can be seen as a retrieval-style evidence seeking task where given a question, the aim is to identify and retrieve all the documents in a large document collection that are relevant to answering this question as well as provide the answer.
f
Visual question answering
figshare.com
bin
Updated Jan 29, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ali Nadian-Ghomsheh (2020). Visual question answering [Dataset]. http://doi.org/10.6084/m9.figshare.11763636.v1
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.11763636.v1
Dataset updated
Jan 29, 2020
Dataset provided by
figshare
Authors
Ali Nadian-Ghomsheh
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This are the image features extracted from Inception v3 network to be included for solving the VQA problem.
h
vqa-rad
huggingface.co
opendatalab.com
Updated Jun 3, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Flavia Giammarino (2023). vqa-rad [Dataset]. https://huggingface.co/datasets/flaviagiammarino/vqa-rad
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 3, 2023
Authors
Flavia Giammarino
License
https://choosealicense.com/licenses/cc0-1.0/https://choosealicense.com/licenses/cc0-1.0/
Description
Dataset Card for VQA-RAD

Dataset Description

VQA-RAD is a dataset of question-answer pairs on radiology images. The dataset is intended to be used for training and testing Medical Visual Question Answering (VQA) systems. The dataset includes both open-ended questions and binary "yes/no" questions. The dataset is built from MedPix, which is a free open-access online database of medical images. The question-answer pairs were manually generated by a team of clinicians.… See the full description on the dataset page: https://huggingface.co/datasets/flaviagiammarino/vqa-rad.
P
MP-DocVQA Dataset
paperswithcode.com
Updated Feb 12, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rubèn Tito; Dimosthenis Karatzas; Ernest Valveny (2025). MP-DocVQA Dataset [Dataset]. https://paperswithcode.com/dataset/mp-docvqa
Explore at:
Dataset updated
Feb 12, 2025
Authors
Rubèn Tito; Dimosthenis Karatzas; Ernest Valveny
Description
The dataset is aimed to perform Visual Question Answering on multipage industry scanned documents. The questions and answers are reused from Single Page DocVQA (SP-DocVQA) dataset. The images also corresponds to the same in original dataset with previous and posterior pages with a limit of up to 20 pages per document.
t
Image Captioning and Visual Question Answering - Dataset - LDM
service.tib.eu
Updated Dec 17, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Image Captioning and Visual Question Answering - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/image-captioning-and-visual-question-answering
Explore at:
Dataset updated
Dec 17, 2024
Description
The dataset is used for image captioning and visual question answering.
V
Visual Question Answering Technology Report
archivemarketresearch.com
doc, pdf, ppt
Updated Mar 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Archive Market Research (2025). Visual Question Answering Technology Report [Dataset]. https://www.archivemarketresearch.com/reports/visual-question-answering-technology-58313
Explore at:
doc, ppt, pdfAvailable download formats
Dataset updated
Mar 15, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policyhttps://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Visual Question Answering (VQA) technology market is experiencing robust growth, driven by increasing demand for advanced image analysis and AI-powered solutions across diverse industries. The market, estimated at $2 billion in 2025, is projected to expand at a Compound Annual Growth Rate (CAGR) of 25% from 2025 to 2033. This significant growth is fueled by several key factors. The proliferation of big data and the advancements in deep learning algorithms are enabling more accurate and efficient VQA systems. Furthermore, the rising adoption of VQA in sectors such as healthcare (for medical image analysis), retail (for enhanced customer experience), and autonomous vehicles (for scene understanding) is significantly boosting market expansion. The increasing availability of powerful cloud computing resources further facilitates the development and deployment of complex VQA models. While challenges such as data bias and the need for robust annotation techniques remain, the overall market outlook for VQA technology is extremely positive. Segmentation analysis reveals strong growth across various application areas. The software industry currently leads in VQA adoption, followed by the computer and electronics industries. Within the technology itself, image classification and image identification are the dominant segments, indicating a strong focus on practical applications. Geographically, North America and Europe currently hold the largest market shares, but the Asia-Pacific region is expected to witness substantial growth in the coming years, driven by increasing investments in AI and technological advancements in countries like China and India. Key players like Toshiba Corporation, Amazon Science, and Cognex are actively contributing to market growth through continuous innovation and strategic partnerships. The competitive landscape is dynamic, with both established tech giants and emerging startups vying for market share. The long-term outlook suggests that VQA technology will continue to be a critical component of various emerging technologies and will play a pivotal role in shaping the future of artificial intelligence.
V
Visual Question Answering Technology Report
archivemarketresearch.com
doc, pdf, ppt
Updated Feb 6, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Archive Market Research (2025). Visual Question Answering Technology Report [Dataset]. https://www.archivemarketresearch.com/reports/visual-question-answering-technology-12874
Explore at:
doc, pdf, pptAvailable download formats
Dataset updated
Feb 6, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policyhttps://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Visual Question Answering (VQA) Technology market is poised for significant growth due to its increasing adoption across various industries. With a market size of XXX million in 2025, the market is projected to grow at a CAGR of XX% during the forecast period 2025-2033. This growth is attributed to the rising demand for automated systems for complex tasks, advancements in artificial intelligence (AI), and the increasing availability of image and video data. VQA technology has applications in the software, computer, and electronic industries, providing solutions for image identification, image classification, and other tasks. Various factors are driving the growth of the VQA technology market. The increasing adoption of AI-powered solutions, the growing need for efficient and accurate image processing, and the rising demand for automated customer service are major factors driving the market. Moreover, the advancements in natural language processing (NLP) and computer vision technologies further enhance the capabilities of VQA systems. However, the availability of limited training data for VQA models and the need for specialized hardware for processing large datasets pose certain challenges to the market's growth. Despite these challenges, the increasing R&D investments by market players and the collaborative efforts to develop standardized datasets are expected to create new growth opportunities in the coming years.
Text VQA
kaggle.com
Updated Mar 15, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dmytro Kozii (2021). Text VQA [Dataset]. https://www.kaggle.com/dmytruto/textvqa/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 15, 2021
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Dmytro Kozii
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
TextVQA requires models to read and reason about text in an image to answer questions based on them. In order to perform well on this task, models need to first detect and read text in the images. Models then need to reason about this to answer the question. Current state-of-the-art models fail to answer questions in TextVQA because they do not have text reading and reasoning capabilities. See the examples in the image to compare ground truth answers and corresponding predictions by a state-of-the-art model. Challenge link: https://eval.ai/web/challenges/challenge-page/874/
D
Visual Analysis System for Scene-Graph-Based Visual Question Answering
darus.uni-stuttgart.de
Updated Jul 25, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Noel Schäfer; Pascal Tilli; Tanja Munz-Körner; Sebastian Künzel; Sandeep Vidyapu; Ngoc Thang Vu; Daniel Weiskopf (2023). Visual Analysis System for Scene-Graph-Based Visual Question Answering [Dataset]. http://doi.org/10.18419/DARUS-3589
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.18419/DARUS-3589
Dataset updated
Jul 25, 2023
Dataset provided by
DaRUS
Authors
Noel Schäfer; Pascal Tilli; Tanja Munz-Körner; Sebastian Künzel; Sandeep Vidyapu; Ngoc Thang Vu; Daniel Weiskopf
License
https://spdx.org/licenses/MIT.htmlhttps://spdx.org/licenses/MIT.html
Dataset funded by
DFG
Description
Source code of our visual analysis system to explore scene-graph-based visual question answering. This approach is built on top of the state-of-the-art GraphVQA framework which was trained on the GQA dataset. Instructions on how to use our system can be found in the README.
p
Visual Question Answering evaluation dataset for MIMIC CXR
physionet.org
Updated Jan 28, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Timo Kohlberger; Charles Lau; Tom Pollard; Andrew Sellergren; Atilla Kiraly; Fayaz Jamil (2025). Visual Question Answering evaluation dataset for MIMIC CXR [Dataset]. http://doi.org/10.13026/cvsk-ny21
Explore at:
Unique identifier
https://doi.org/10.13026/cvsk-ny21
Dataset updated
Jan 28, 2025
Authors
Timo Kohlberger; Charles Lau; Tom Pollard; Andrew Sellergren; Atilla Kiraly; Fayaz Jamil
License
https://github.com/MIT-LCP/license-and-dua/tree/master/draftshttps://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
MIMIC CXR [1] is a large publicly available dataset of chest radiographs in DICOM format with free-text radiology reports. In addition, labels for the presence of 12 different chest-related pathologies, as well as of any support devices, and overall normal/abnormal status were made available via the MIMIC Chest X-ray JPG (MIMIC-CXR-JPG) [2] labels, which were generated using the CheXpert and NegBio algorithms.

Based on these labels, we created a visual question answering dataset comprising 224 questions for 48 cases from the official test set, and 111 questions for 23 validation cases. A majority (68%) of the questions are close-ended (answerable with yes or no), and focus on the presence of one out of 15 chest pathologies, or any support device, or generically on any abnormality, whereas the remaining open-ended questions inquire about the location, size, severity or type of a pathology/device, if present in the specific case, indicated by the MIMIC-CXR-JPG labels.

For each question and case we also provide a reference answer, which was authored by a board-certified radiologist (with 17 years of post-residency experience) based on the chest X-ray and original radiology report
D
Extended Visual Analysis System for Scene-Graph-Based Visual Question...
darus.uni-stuttgart.de
Updated May 28, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Noel Schäfer; Sebastian Künzel; Pascal Tilli; Tanja Munz-Körner; Sandeep Vidyapu; Ngoc Thang Vu; Daniel Weiskopf (2025). Extended Visual Analysis System for Scene-Graph-Based Visual Question Answering [Dataset]. http://doi.org/10.18419/DARUS-3909
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.18419/DARUS-3909
Dataset updated
May 28, 2025
Dataset provided by
DaRUS
Authors
Noel Schäfer; Sebastian Künzel; Pascal Tilli; Tanja Munz-Körner; Sandeep Vidyapu; Ngoc Thang Vu; Daniel Weiskopf
License
https://spdx.org/licenses/MIT.htmlhttps://spdx.org/licenses/MIT.html
Dataset funded by
DFG
Description
Source code of our extended visual analysis system to explore scene-graph-based visual question answering. This approach is built on top of the state-of-the-art GraphVQA framework which was trained on the GQA dataset. Additionally, it is an improved version of our system that can be found here Instructions on how to use our system can be found in the README.
P
VQA-RAD Dataset
paperswithcode.com
Updated Jun 2, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jason J. Lau; Soumya Gayen; Asma Ben Abacha; Dina Demner-Fushman (2023). VQA-RAD Dataset [Dataset]. https://paperswithcode.com/dataset/vqa-rad
Explore at:
Dataset updated
Jun 2, 2023
Authors
Jason J. Lau; Soumya Gayen; Asma Ben Abacha; Dina Demner-Fushman
Description
VQA-RAD consists of 3,515 question–answer pairs on 315 radiology images.
visual-question-answering-checkpoint-downloads
huggingface.co
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Hugging Face OSS Metrics, visual-question-answering-checkpoint-downloads [Dataset]. https://huggingface.co/datasets/open-source-metrics/visual-question-answering-checkpoint-downloads
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset provided by
Hugging Facehttps://huggingface.co/
Authors
Hugging Face OSS Metrics
Description
open-source-metrics/visual-question-answering-checkpoint-downloads dataset hosted on Hugging Face and contributed by the HF Datasets community
P
OpenViVQA Dataset
paperswithcode.com
Updated May 9, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nghia Hieu Nguyen; Duong T. D. Vo; Kiet Van Nguyen; Ngan Luu-Thuy Nguyen (2023). OpenViVQA Dataset [Dataset]. https://paperswithcode.com/dataset/openvivqa
Explore at:
Dataset updated
May 9, 2023
Authors
Nghia Hieu Nguyen; Duong T. D. Vo; Kiet Van Nguyen; Ngan Luu-Thuy Nguyen
Description
In recent years, visual question answering (VQA) has attracted attention from the research community because of its highly potential applications (such as virtual assistance on intelligent cars, assistant devices for blind people, or information retrieval from document images using natural language as queries) and challenge. The VQA task requires methods that have the ability to fuse the information from questions and images to produce appropriate answers. Neural visual question answering models have achieved tremendous growth on large-scale datasets which are mostly for resource-rich languages such as English. However, available datasets narrow the VQA task as the answers selection task or answer classification task. We argue that this form of VQA is far from human ability and eliminates the challenge of the answering aspect in the VQA task by just selecting answers rather than generating them. In this paper, we introduce the OpenViVQA (Open-domain Vietnamese Visual Question Answering) dataset, the first large-scale dataset for VQA with open-ended answers in Vietnamese, consists of 11,000+ images associated with 37,000+ question–answer pairs (QAs).

Facebook

Twitter

Click to copy link

Link copied

Cite

Aishwarya Agrawal; Jiasen Lu; Stanislaw Antol; Margaret Mitchell; C. Lawrence Zitnick; Dhruv Batra; Devi Parikh (2023). Visual Question Answering Dataset [Dataset]. https://paperswithcode.com/dataset/visual-question-answering

Visual Question Answering Dataset

VQA

Explore at:

2 scholarly articles cite this dataset (View in Google Scholar)

Dataset updated

Nov 5, 2023

Authors

Aishwarya Agrawal; Jiasen Lu; Stanislaw Antol; Margaret Mitchell; C. Lawrence Zitnick; Dhruv Batra; Devi Parikh

Description

Visual Question Answering (VQA) is a dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer. The first version of the dataset was released in October 2015. VQA v2.0 was released in April 2017.

Clear search

Close search

Google apps

Main menu

Visual Question Answering Dataset

Visual Question Answering v2.0 Dataset

VQA: Visual Question Answering Dataset

Toloka Visual Question Answering Dataset

PitVQA: A Dataset of Visual Question Answering in Pituitary Surgery

ST-VQA Dataset

DocCVQA Dataset

Visual question answering

vqa-rad

MP-DocVQA Dataset

Image Captioning and Visual Question Answering - Dataset - LDM

Visual Question Answering Technology Report

Visual Question Answering Technology Report

Text VQA

Visual Analysis System for Scene-Graph-Based Visual Question Answering

Visual Question Answering evaluation dataset for MIMIC CXR

Extended Visual Analysis System for Scene-Graph-Based Visual Question...

VQA-RAD Dataset

visual-question-answering-checkpoint-downloads

OpenViVQA Dataset

Visual Question Answering Dataset

VQA