100+ datasets found
  1. Visual Question Answering Dataset

    • paperswithcode.com
    • library.toponeai.link
    Updated Nov 5, 2023
    Cite
    Aishwarya Agrawal; Jiasen Lu; Stanislaw Antol; Margaret Mitchell; C. Lawrence Zitnick; Dhruv Batra; Devi Parikh (2023). Visual Question Answering Dataset [Dataset]. https://paperswithcode.com/dataset/visual-question-answering
    Authors
    Aishwarya Agrawal; Jiasen Lu; Stanislaw Antol; Margaret Mitchell; C. Lawrence Zitnick; Dhruv Batra; Devi Parikh
    Description

    Visual Question Answering (VQA) is a dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer. The first version of the dataset was released in October 2015. VQA v2.0 was released in April 2017.

  2. Visual Question Answering v2.0 Dataset

    • paperswithcode.com
    Updated Mar 15, 2017
    Cite
    Yash Goyal; Tejas Khot; Douglas Summers-Stay; Dhruv Batra; Devi Parikh (2017). Visual Question Answering v2.0 Dataset [Dataset]. https://paperswithcode.com/dataset/visual-question-answering-v2-0
    Authors
    Yash Goyal; Tejas Khot; Douglas Summers-Stay; Dhruv Batra; Devi Parikh
    Description

    Visual Question Answering (VQA) v2.0 is a dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer. It is the second version of the VQA dataset.

    • 265,016 images (COCO and abstract scenes)
    • At least 3 questions per image (5.4 on average)
    • 10 ground-truth answers per question
    • 3 plausible (but likely incorrect) answers per question
    • Automatic evaluation metric (sketched below)

    The first version of the dataset was released in October 2015.
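
    The automatic evaluation metric is consensus-based: a predicted answer earns min(n/3, 1) credit, where n is the number of the 10 annotators who gave that exact answer. A minimal Python sketch (the function name is illustrative; the official scorer additionally normalises answer strings and averages over annotator subsets):

        def vqa_accuracy(predicted, human_answers):
            # min(n/3, 1): full credit once at least 3 of the 10 humans agree.
            n = sum(ans == predicted for ans in human_answers)
            return min(n / 3.0, 1.0)

        # 4 of 10 annotators answered "red", so the prediction scores 1.0
        print(vqa_accuracy("red", ["red"] * 4 + ["maroon"] * 6))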

  3. JA-VG-VQA-500

    • huggingface.co
    Updated May 18, 2024
    Cite
    Sakana AI (2024). JA-VG-VQA-500 [Dataset]. https://huggingface.co/datasets/SakanaAI/JA-VG-VQA-500
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset authored and provided by
    Sakana AI (https://sakana.ai/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    JA-VG-VQA-500

      Dataset Description
    

    JA-VG-VQA-500 is a 500-sample subset of the Japanese Visual Genome VQA dataset. This dataset was used in the evaluation of EvoVLM-JP-v1-7B. Please refer to our report and blog for more details. We are grateful to the developers for making the dataset available under the Creative Commons Attribution 4.0 License.

    Related: Visual Genome; Japanese Visual Genome VQA dataset

      Usage
    

    Use the code below to get started with the dataset; see the full description on the dataset page: https://huggingface.co/datasets/SakanaAI/JA-VG-VQA-500.
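
    A minimal loading sketch with the Hugging Face datasets library (the repo ID comes from the citation above; the split name "test" is an assumption, so check the dataset page):

        from datasets import load_dataset

        # Repo ID from the citation above; the split name is an assumption.
        dataset = load_dataset("SakanaAI/JA-VG-VQA-500", split="test")
        print(dataset[0])  # one sample: an image with Japanese question-answer annotations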

  4. JA-Multi-Image-VQA

    • huggingface.co
    Updated Aug 2, 2024
    Cite
    Sakana AI (2024). JA-Multi-Image-VQA [Dataset]. https://huggingface.co/datasets/SakanaAI/JA-Multi-Image-VQA
    Available download formats: Croissant
    Dataset authored and provided by
    Sakana AI (https://sakana.ai/)
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    JA-Multi-Image-VQA

      Dataset Description
    

    JA-Multi-Image-VQA is a dataset for evaluating question-answering capabilities over multiple image inputs. We carefully collected a diverse set of 39 images with 55 questions in total. Some images depict Japanese culture and objects found in Japan. The Japanese questions and answers were created manually.

      Usage
    

    from datasets import load_dataset
    dataset = load_dataset("SakanaAI/JA-Multi-Image-VQA", split="test")

    See the full description on the dataset page: https://huggingface.co/datasets/SakanaAI/JA-Multi-Image-VQA.

  5. VQA: Visual Question Answering Dataset

    • academictorrents.com
    bittorrent
    Updated Oct 8, 2015
    Cite
    Stanislaw Antol and Aishwarya Agrawal and Jiasen Lu and Margaret Mitchell and Dhruv Batra and C. Lawrence Zitnick and Devi Parikh (2015). VQA: Visual Question Answering Dataset [Dataset]. https://academictorrents.com/details/f075ad12eccbbd665aec68db5d208dc68e7a384f
    Available download formats: bittorrent (7984418554 bytes)
    Dataset authored and provided by
    Stanislaw Antol and Aishwarya Agrawal and Jiasen Lu and Margaret Mitchell and Dhruv Batra and C. Lawrence Zitnick and Devi Parikh
    License

    No license specified: https://academictorrents.com/nolicensespecified

    Description

    A BitTorrent file to download data with the title 'VQA: Visual Question Answering Dataset'

  6. vqa-v1.1

    • huggingface.co
    Updated Apr 20, 2025
    Cite
    vqa-v1.1 [Dataset]. https://huggingface.co/datasets/worldcuisines/vqa-v1.1
    Dataset authored and provided by
    World Cuisines
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines

    WorldCuisines is a massive-scale visual question answering (VQA) benchmark for multilingual and multicultural understanding through global cuisines. The dataset contains text-image pairs across 30 languages and dialects, spanning 9 language families and featuring over 1 million data points, making it the largest multicultural VQA benchmark as of 17 October 2024. … See the full description on the dataset page: https://huggingface.co/datasets/worldcuisines/vqa-v1.1.

  7. VQA-RAD Dataset

    • paperswithcode.com
    Updated Jun 2, 2023
    Cite
    Jason J. Lau; Soumya Gayen; Asma Ben Abacha; Dina Demner-Fushman (2023). VQA-RAD Dataset [Dataset]. https://paperswithcode.com/dataset/vqa-rad
    Authors
    Jason J. Lau; Soumya Gayen; Asma Ben Abacha; Dina Demner-Fushman
    Description

    VQA-RAD consists of 3,515 question–answer pairs on 315 radiology images.

  8. ITM-HDR-VQA DATASET

    • ieee-dataport.org
    Updated Nov 28, 2024
    Cite
    Liang Zhijie (2024). ITM-HDR-VQA DATASET [Dataset]. https://ieee-dataport.org/documents/itm-hdr-vqa-dataset
    Authors
    Liang Zhijie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset's scenes cover man-made architecture and natural scenery.

  9. OK-VQA Dataset

    • paperswithcode.com
    Updated Oct 13, 2023
    Cite
    Kenneth Marino; Mohammad Rastegari; Ali Farhadi; Roozbeh Mottaghi (2023). OK-VQA Dataset [Dataset]. https://paperswithcode.com/dataset/ok-vqa
    Authors
    Kenneth Marino; Mohammad Rastegari; Ali Farhadi; Roozbeh Mottaghi
    Description

    Outside Knowledge Visual Question Answering (OK-VQA) includes more than 14,000 questions that require external knowledge to answer.

  10. vqa-rad

    • huggingface.co
    • opendatalab.com
    Updated Jun 3, 2023
    Cite
    Flavia Giammarino (2023). vqa-rad [Dataset]. https://huggingface.co/datasets/flaviagiammarino/vqa-rad
    Available download formats: Croissant
    Authors
    Flavia Giammarino
    License

    CC0 1.0: https://choosealicense.com/licenses/cc0-1.0/

    Description

    Dataset Card for VQA-RAD

      Dataset Description
    

    VQA-RAD is a dataset of question-answer pairs on radiology images. The dataset is intended to be used for training and testing Medical Visual Question Answering (VQA) systems. The dataset includes both open-ended questions and binary "yes/no" questions. The dataset is built from MedPix, which is a free open-access online database of medical images. The question-answer pairs were manually generated by a team of clinicians.… See the full description on the dataset page: https://huggingface.co/datasets/flaviagiammarino/vqa-rad.
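
    A minimal loading sketch (the repo ID comes from the citation above; split names and column layout are assumptions to verify on the dataset card):

        from datasets import load_dataset

        # Repo ID from the citation above; splits and columns are assumptions.
        ds = load_dataset("flaviagiammarino/vqa-rad")
        print(ds)  # inspect splits; samples are expected to pair an image with a question and answer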

  11. ST-VQA Dataset

    • paperswithcode.com
    Updated Feb 20, 2021
    Cite
    Ali Furkan Biten; Ruben Tito; Andres Mafla; Lluis Gomez; Marçal Rusiñol; Ernest Valveny; C. V. Jawahar; Dimosthenis Karatzas (2021). ST-VQA Dataset [Dataset]. https://paperswithcode.com/dataset/st-vqa
    Authors
    Ali Furkan Biten; Ruben Tito; Andres Mafla; Lluis Gomez; Marçal Rusiñol; Ernest Valveny; C. V. Jawahar; Dimosthenis Karatzas
    Description

    ST-VQA aims to highlight the importance of exploiting high-level semantic information present in images as textual cues in the VQA process.

  12. PitVQA: A Dataset of Visual Question Answering in Pituitary Surgery

    • rdr.ucl.ac.uk
    bin
    Updated Sep 18, 2024
    Cite
    Mobarack Islam; Matt Clarkson; Sophia Bano; Danail Stoyanov; Hani Marcus (2024). PitVQA: A Dataset of Visual Question Answering in Pituitary Surgery [Dataset]. http://doi.org/10.5522/04/27004666.v2
    Available download formats: bin
    Dataset provided by
    University College London
    Authors
    Mobarack Islam; Matt Clarkson; Sophia Bano; Danail Stoyanov; Hani Marcus
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    The PitVQA dataset comprises 25 videos of endoscopic pituitary surgeries from the National Hospital of Neurology and Neurosurgery in London, United Kingdom, similar to the dataset used in the MICCAI PitVis challenge. All patients provided informed consent, and the study was registered with the local governance committee. The surgeries were recorded using a high-definition endoscope (Karl Storz Endoscopy) at a resolution of 720p and stored as MP4 files. All videos were annotated for surgical phases, steps, instruments present and operation notes, guided by a standardised annotation framework derived from a preceding international consensus study on pituitary surgery workflow. Annotation was performed collaboratively by 2 neurosurgical residents with operative pituitary experience and checked by an attending neurosurgeon.

    We extracted image frames from each video at 1 fps and removed any frames that were blurred or occluded, obtaining a total of 109,173 frames; the shortest and longest videos yielded 2,443 and 7,179 frames, respectively. We acquired frame-wise question-answer pairs for all categories of the annotation. Overall, there are 884,242 question-answer pairs from the 109,173 frames, around 8 pairs per frame. There are 59 annotation classes overall: 4 phases, 15 steps, 18 instruments, 3 variations of instruments present in a frame, 5 positions of the instruments, and 14 operation notes. Question length ranges from 7 to 12 words.

    A detailed description of the original videos can be found at the MICCAI PitVis challenge, and the videos can be downloaded directly from the UCL HDR portal.
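
    The 1 fps extraction and blur filtering described above could look like the sketch below (illustrative only: OpenCV is assumed, and the blur heuristic and threshold are not from the authors' published pipeline):

        import cv2

        def extract_frames(video_path, out_dir, blur_threshold=100.0):
            # Sample roughly one frame per second and drop visibly blurred frames.
            cap = cv2.VideoCapture(video_path)
            fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
            idx = kept = 0
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                if idx % int(round(fps)) == 0:
                    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
                    # Variance of the Laplacian is a common sharpness heuristic.
                    if cv2.Laplacian(gray, cv2.CV_64F).var() >= blur_threshold:
                        cv2.imwrite(f"{out_dir}/frame_{kept:06d}.png", frame)
                        kept += 1
                idx += 1
            cap.release()
            return kept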

  13. VQA 2.0 - Dataset - LDM

    • service.tib.eu
    Updated Nov 25, 2024
    Cite
    (2024). VQA 2.0 - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/vqa-2-0
    Description

    The VQA 2.0 dataset is used for the visual question answering task. It consists of three sets: a train set with 83k images and 444k questions, a validation set with 41k images and 214k questions, and a test set with 81k images and 448k questions.

  14. vqa-dataset

    • kaggle.com
    Updated Mar 16, 2024
    Cite
    Le Trong Hieu (2024). vqa-dataset [Dataset]. https://www.kaggle.com/datasets/backtracking/vqa-dataset/data
    Available download formats: Croissant
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Le Trong Hieu
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Le Trong Hieu

    Released under Apache 2.0


  15. VQA-MHUG

    • darus.uni-stuttgart.de
    Updated Mar 3, 2025
    Cite
    Ekta Sood; Fabian Kögel; Andreas Bulling (2025). VQA-MHUG [Dataset]. http://doi.org/10.18419/DARUS-4428
    Available download formats: Croissant
    Dataset provided by
    DaRUS
    Authors
    Ekta Sood; Fabian Kögel; Andreas Bulling
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    We present VQA-MHUG, a novel 49-participant dataset of multimodal human gaze over both images and questions during visual question answering (VQA), collected using a high-speed eye tracker. To the best of our knowledge, this is the first resource containing human gaze data on a textual question together with the corresponding image, allowing researchers to jointly study human and machine attention. The corpus encompasses task-specific gaze on a subset of the benchmark dataset VQAv2 val2.

    We use the dataset to analyse the similarity between human attention and the neural attentive strategies learned by five state-of-the-art VQA models: Modulated Co-Attention Network (MCAN) with either grid or region features, Pythia, Bilinear Attention Network (BAN), and the Multimodal Factorised Bilinear Pooling Network (MFB). While prior work has focused on the image modality, our analyses show, for the first time, that for all models a higher correlation with human attention on text is a significant predictor of VQA performance. This finding points to potential for improving VQA performance and calls for further research on neural text attention mechanisms and their integration into architectures for vision-and-language tasks, including but potentially also beyond VQA.
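
    The model-vs-human comparison described above can be pictured as rank-correlating per-token weights; a hedged sketch (the numbers and the choice of Spearman correlation are illustrative assumptions, not the paper's exact procedure):

        import numpy as np
        from scipy.stats import spearmanr

        # Hypothetical per-token weights for one question, same tokenisation assumed.
        human_gaze = np.array([0.05, 0.40, 0.30, 0.15, 0.10])  # normalised gaze duration
        model_attn = np.array([0.10, 0.35, 0.25, 0.20, 0.10])  # model text attention

        rho, p = spearmanr(human_gaze, model_attn)
        print(f"rank correlation: {rho:.2f} (p = {p:.3f})")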

  16. Visual Question Answering Technology Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Mar 15, 2025
    Cite
    Archive Market Research (2025). Visual Question Answering Technology Report [Dataset]. https://www.archivemarketresearch.com/reports/visual-question-answering-technology-58313
    Available download formats: doc, ppt, pdf
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Visual Question Answering (VQA) technology market is experiencing robust growth, driven by increasing demand for advanced image analysis and AI-powered solutions across diverse industries. The market, estimated at $2 billion in 2025, is projected to expand at a compound annual growth rate (CAGR) of 25% from 2025 to 2033. This growth is fueled by several key factors: the proliferation of big data and advances in deep learning algorithms are enabling more accurate and efficient VQA systems, and rising adoption of VQA in sectors such as healthcare (medical image analysis), retail (enhanced customer experience), and autonomous vehicles (scene understanding) is significantly boosting market expansion. The increasing availability of powerful cloud computing resources further facilitates the development and deployment of complex VQA models. While challenges such as data bias and the need for robust annotation techniques remain, the overall market outlook for VQA technology is extremely positive.

    Segmentation analysis reveals strong growth across application areas. The software industry currently leads in VQA adoption, followed by the computer and electronics industries. Within the technology itself, image classification and image identification are the dominant segments, indicating a strong focus on practical applications. Geographically, North America and Europe currently hold the largest market shares, but the Asia-Pacific region is expected to see substantial growth in the coming years, driven by increasing investments in AI and technological advances in countries like China and India. Key players such as Toshiba Corporation, Amazon Science, and Cognex contribute to market growth through continuous innovation and strategic partnerships. The competitive landscape is dynamic, with both established tech giants and emerging startups vying for market share, and the long-term outlook suggests that VQA technology will remain a critical component of emerging technologies and play a pivotal role in shaping the future of artificial intelligence.

  17. Kvasir-VQA

    • huggingface.co
    • paperswithcode.com
    Updated May 23, 2025
    Cite
    SimulaMet HOST Department (2025). Kvasir-VQA [Dataset]. https://huggingface.co/datasets/SimulaMet-HOST/Kvasir-VQA
    Available download formats: Croissant
    Dataset authored and provided by
    SimulaMet HOST Department
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The Kvasir-VQA dataset is an extended dataset derived from the HyperKvasir and Kvasir-Instrument datasets, augmented with question-and-answer annotations. It is designed to facilitate advanced machine learning tasks in gastrointestinal (GI) diagnostics, including image captioning, Visual Question Answering (VQA) and text-based generation of synthetic medical images. Homepage: https://datasets.simula.no/kvasir-vqa

      Usage
    

    You can use the Kvasir-VQA dataset directly from the Hugging Face Hub, as sketched below. See the full description on the dataset page: https://huggingface.co/datasets/SimulaMet-HOST/Kvasir-VQA.
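
    A minimal loading sketch (the repo ID comes from the citation above; split names are assumptions to verify on the dataset card):

        from datasets import load_dataset

        # Repo ID from the citation above; split names are assumptions.
        ds = load_dataset("SimulaMet-HOST/Kvasir-VQA")
        print(ds)  # inspect splits; samples pair a GI image with a question and answer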

  18. VQA-E Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Nov 17, 2021
    Cite
    Qing Li; Qingyi Tao; Shafiq Joty; Jianfei Cai; Jiebo Luo (2021). VQA-E Dataset [Dataset]. https://paperswithcode.com/dataset/vqa-e
    Authors
    Qing Li; Qingyi Tao; Shafiq Joty; Jianfei Cai; Jiebo Luo
    Description

    VQA-E is a dataset for Visual Question Answering with Explanation, where models are required to generate an explanation along with the predicted answer. The VQA-E dataset is automatically derived from the VQA v2 dataset by synthesizing a textual explanation for each image-question-answer triple.

  19. vqa-rad-visual-question-answering-radiology

    • kaggle.com
    Updated Oct 21, 2023
    Cite
    MD Zeeshan Hassan (2023). vqa-rad-visual-question-answering-radiology [Dataset]. https://www.kaggle.com/datasets/mdzeeshanhassan/vqa-rad-visual-question-answering-radiology
    Available download formats: Croissant
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    MD Zeeshan Hassan
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by MD Zeeshan Hassan

    Released under Apache 2.0


  20. PathVQA AND VQA-RAD

    • scidb.cn
    Updated Jul 26, 2024
    Cite
    张殿元 (2024). PathVQA AND VQA-RAD [Dataset]. http://doi.org/10.57760/sciencedb.18348
    Available download formats: Croissant
    Dataset provided by
    Science Data Bank
    Authors
    张殿元
    License

    Attribution-NoDerivs 4.0 (CC BY-ND 4.0): https://creativecommons.org/licenses/by-nd/4.0/
    License information was derived automatically

    Description

    PathVQA is sourced from He, Xuehai, et al. "PathVQA: 30000+ questions for medical visual question answering." arXiv preprint arXiv:2003.10286 (2020). VQA-RAD is sourced from Lau, Jason J., et al. "A dataset of clinically generated visual questions and answers about radiology images." Scientific Data 5.1 (2018): 1-10.
