100+ datasets found
  1. Visual Question Answering Dataset

    • paperswithcode.com
    • library.toponeai.link
    Updated Nov 5, 2023
    Cite
    Aishwarya Agrawal; Jiasen Lu; Stanislaw Antol; Margaret Mitchell; C. Lawrence Zitnick; Dhruv Batra; Devi Parikh (2023). Visual Question Answering Dataset [Dataset]. https://paperswithcode.com/dataset/visual-question-answering
    Explore at:
    Dataset updated
    Nov 5, 2023
    Authors
    Aishwarya Agrawal; Jiasen Lu; Stanislaw Antol; Margaret Mitchell; C. Lawrence Zitnick; Dhruv Batra; Devi Parikh
    Description

    Visual Question Answering (VQA) is a dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer. The first version of the dataset was released in October 2015. VQA v2.0 was released in April 2017.

  2. Visual Question Answering v2.0 Dataset

    • paperswithcode.com
    Updated Mar 15, 2017
    Cite
    Yash Goyal; Tejas Khot; Douglas Summers-Stay; Dhruv Batra; Devi Parikh (2017). Visual Question Answering v2.0 Dataset [Dataset]. https://paperswithcode.com/dataset/visual-question-answering-v2-0
    Explore at:
    Dataset updated
    Mar 15, 2017
    Authors
    Yash Goyal; Tejas Khot; Douglas Summers-Stay; Dhruv Batra; Devi Parikh
    Description

    Visual Question Answering (VQA) v2.0 is a dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer. It is the second version of the VQA dataset.

    • 265,016 images (COCO and abstract scenes)
    • At least 3 questions per image (5.4 on average)
    • 10 ground truth answers per question
    • 3 plausible (but likely incorrect) answers per question
    • Automatic evaluation metric

    The first version of the dataset was released in October 2015.
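
    The automatic evaluation metric mentioned above scores a predicted answer against the 10 human answers per question. A minimal sketch of the standard VQA accuracy formula (assuming answers are already normalized; the official evaluator additionally averages this score over all 9-annotator subsets):

        def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
            """An answer is fully correct if at least 3 annotators gave it:
            min(#matches / 3, 1)."""
            matches = sum(1 for ans in human_answers if ans == predicted)
            return min(matches / 3.0, 1.0)

        # Example: 4 of 10 annotators answered "red".
        print(vqa_accuracy("red", ["red"] * 4 + ["maroon"] * 6))   # -> 1.0
        print(vqa_accuracy("blue", ["red"] * 4 + ["maroon"] * 6))  # -> 0.0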

  3. VQA: Visual Question Answering Dataset

    • academictorrents.com
    bittorrent
    Updated Oct 8, 2015
    Cite
    Stanislaw Antol and Aishwarya Agrawal and Jiasen Lu and Margaret Mitchell and Dhruv Batra and C. Lawrence Zitnick and Devi Parikh (2015). VQA: Visual Question Answering Dataset [Dataset]. https://academictorrents.com/details/f075ad12eccbbd665aec68db5d208dc68e7a384f
    Explore at:
    Available download formats: bittorrent (7984418554)
    Dataset updated
    Oct 8, 2015
    Dataset authored and provided by
    Stanislaw Antol and Aishwarya Agrawal and Jiasen Lu and Margaret Mitchell and Dhruv Batra and C. Lawrence Zitnick and Devi Parikh
    License

    No license specified (https://academictorrents.com/nolicensespecified)

    Description

    A BitTorrent file to download data with the title 'VQA: Visual Question Answering Dataset'

  4. Toloka Visual Question Answering Dataset

    • data.niaid.nih.gov
    Updated Oct 10, 2023
    Cite
    Ustalov, Dmitry (2023). Toloka Visual Question Answering Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7057740
    Explore at:
    Dataset updated
    Oct 10, 2023
    Dataset authored and provided by
    Ustalov, Dmitry
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Our dataset consists of images associated with textual questions. One entry (instance) in the dataset is a question-image pair labeled with the ground truth coordinates of a bounding box containing the visual answer to the given question. The images were obtained from a CC BY-licensed subset of the Microsoft Common Objects in Context dataset, MS COCO. All data labeling was performed on the Toloka crowdsourcing platform, https://toloka.ai/.

    Our dataset has 45,199 instances split among three subsets: train (38,990 instances), public test (1,705 instances), and private test (4,504 instances). The entire train set was available to everyone from the start of the challenge. The public test set became available during the evaluation phase of the competition, without any ground truth labels. After the end of the competition, the public and private sets were released.

    The datasets will be provided as files in the comma-separated values (CSV) format containing the following columns.

        Column   | Type    | Description
        -------- | ------- | ----------------------------------------------------
        image    | string  | URL of an image on a public content delivery network
        width    | integer | image width
        height   | integer | image height
        left     | integer | bounding box coordinate: left
        top      | integer | bounding box coordinate: top
        right    | integer | bounding box coordinate: right
        bottom   | integer | bounding box coordinate: bottom
        question | string  | question in English
    

    This upload also contains a ZIP file with the images from MS COCO.
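
    A minimal sketch of working with one of these CSV files using pandas (train.csv is a placeholder filename; the column names follow the table above):

        import pandas as pd

        # Placeholder path; the actual split files ship with this upload.
        df = pd.read_csv("train.csv")

        # Derive bounding-box size from the corner coordinates above.
        df["box_width"] = df["right"] - df["left"]
        df["box_height"] = df["bottom"] - df["top"]

        print(df[["question", "box_width", "box_height"]].head())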

  5. PitVQA: A Dataset of Visual Question Answering in Pituitary Surgery

    • rdr.ucl.ac.uk
    bin
    Updated Sep 18, 2024
    Cite
    Mobarack Islam; Matt Clarkson; Sophia Bano; Danail Stoyanov; Hani Marcus (2024). PitVQA: A Dataset of Visual Question Answering in Pituitary Surgery [Dataset]. http://doi.org/10.5522/04/27004666.v2
    Explore at:
    Available download formats: bin
    Dataset updated
    Sep 18, 2024
    Dataset provided by
    University College London
    Authors
    Mobarack Islam; Matt Clarkson; Sophia Bano; Danail Stoyanov; Hani Marcus
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    The PitVQA dataset comprises 25 videos of endoscopic pituitary surgeries from the National Hospital of Neurology and Neurosurgery in London, United Kingdom, similar to the dataset used in the MICCAI PitVis challenge. All patients provided informed consent, and the study was registered with the local governance committee. The surgeries were recorded using a high-definition endoscope (Karl Storz Endoscopy) at 720p resolution and stored as MP4 files.

    All videos were annotated for surgical phases, steps, instruments present, and operation notes, guided by a standardised annotation framework derived from a preceding international consensus study on pituitary surgery workflow. Annotation was performed collaboratively by two neurosurgical residents with operative pituitary experience and checked by an attending neurosurgeon.

    We extracted image frames from each video at 1 fps and removed any frames that were blurred or occluded, ultimately obtaining a total of 109,173 frames; the shortest and longest videos yielded 2,443 and 7,179 frames, respectively. We acquired frame-wise question-answer pairs for all categories of the annotation: overall, there are 884,242 question-answer pairs from the 109,173 frames, around 8 pairs per frame. There are 59 classes overall, including 4 phases, 15 steps, 18 instruments, 3 variations of instruments present in a frame, 5 positions of the instruments, and 14 operation notes. The questions range in length from 7 to 12 words.

    A detailed description of the original videos can be found at the MICCAI PitVis challenge, and the videos can be downloaded directly from the UCL HDR portal.
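
    The 1 fps frame extraction described above can be reproduced along these lines with OpenCV; this is an illustrative sketch, not the authors' pipeline (blur/occlusion filtering is omitted):

        import cv2  # OpenCV

        def extract_frames_1fps(video_path: str, out_dir: str) -> int:
            """Save roughly one frame per second of video as PNG files."""
            cap = cv2.VideoCapture(video_path)
            fps = int(round(cap.get(cv2.CAP_PROP_FPS))) or 25  # fallback if metadata is missing
            count = saved = 0
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                if count % fps == 0:
                    cv2.imwrite(f"{out_dir}/frame_{saved:06d}.png", frame)
                    saved += 1
                count += 1
            cap.release()
            return saved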

  6. ST-VQA Dataset

    • paperswithcode.com
    Updated Feb 20, 2021
    Cite
    Ali Furkan Biten; Ruben Tito; Andres Mafla; Lluis Gomez; Marçal Rusiñol; Ernest Valveny; C. V. Jawahar; Dimosthenis Karatzas (2021). ST-VQA Dataset [Dataset]. https://paperswithcode.com/dataset/st-vqa
    Explore at:
    Dataset updated
    Feb 20, 2021
    Authors
    Ali Furkan Biten; Ruben Tito; Andres Mafla; Lluis Gomez; Marçal Rusiñol; Ernest Valveny; C. V. Jawahar; Dimosthenis Karatzas
    Description

    ST-VQA aims to highlight the importance of exploiting high-level semantic information present in images as textual cues in the VQA process.

  7. DocCVQA Dataset

    • paperswithcode.com
    Cite
    Rubèn Tito; Dimosthenis Karatzas; Ernest Valveny, DocCVQA Dataset [Dataset]. https://paperswithcode.com/dataset/doccvqa
    Explore at:
    Authors
    Rubèn Tito; Dimosthenis Karatzas; Ernest Valveny
    Description

    DocCVQA is a Document Visual Question Answering dataset in which the questions are posed over a whole collection of 14,362 scanned documents. The task can therefore be seen as a retrieval-style evidence-seeking task: given a question, the aim is to identify and retrieve all documents in the collection that are relevant to answering it, and to provide the answer.

  8. Visual question answering

    • figshare.com
    bin
    Updated Jan 29, 2020
    Cite
    Ali Nadian-Ghomsheh (2020). Visual question answering [Dataset]. http://doi.org/10.6084/m9.figshare.11763636.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    Jan 29, 2020
    Dataset provided by
    figshare
    Authors
    Ali Nadian-Ghomsheh
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These are image features extracted from the Inception v3 network, to be used for solving the VQA problem.

  9. vqa-rad

    • huggingface.co
    • opendatalab.com
    Updated Jun 3, 2023
    Cite
    Flavia Giammarino (2023). vqa-rad [Dataset]. https://huggingface.co/datasets/flaviagiammarino/vqa-rad
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 3, 2023
    Authors
    Flavia Giammarino
    License

    CC0 1.0 (https://choosealicense.com/licenses/cc0-1.0/)

    Description

    Dataset Card for VQA-RAD

      Dataset Description
    

    VQA-RAD is a dataset of question-answer pairs on radiology images, intended for training and testing Medical Visual Question Answering (VQA) systems. It includes both open-ended and binary "yes/no" questions, and is built from MedPix, a free open-access online database of medical images. The question-answer pairs were manually generated by a team of clinicians… See the full description on the dataset page: https://huggingface.co/datasets/flaviagiammarino/vqa-rad.
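
    A minimal sketch of loading this mirror with the Hugging Face datasets library (the field names "image", "question", and "answer" are assumed from the dataset card):

        from datasets import load_dataset

        ds = load_dataset("flaviagiammarino/vqa-rad", split="train")
        sample = ds[0]
        # Each record pairs a radiology image with a question and its answer.
        print(sample["question"], "->", sample["answer"])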

  10. MP-DocVQA Dataset

    • paperswithcode.com
    Updated Feb 12, 2025
    Cite
    Rubèn Tito; Dimosthenis Karatzas; Ernest Valveny (2025). MP-DocVQA Dataset [Dataset]. https://paperswithcode.com/dataset/mp-docvqa
    Explore at:
    Dataset updated
    Feb 12, 2025
    Authors
    Rubèn Tito; Dimosthenis Karatzas; Ernest Valveny
    Description

    The dataset is aimed at Visual Question Answering on multipage industry scanned documents. The questions and answers are reused from the Single Page DocVQA (SP-DocVQA) dataset. The images correspond to the same documents as in the original dataset, extended with the preceding and following pages, up to a limit of 20 pages per document.

  11. Image Captioning and Visual Question Answering - Dataset - LDM

    • service.tib.eu
    Updated Dec 17, 2024
    Cite
    (2024). Image Captioning and Visual Question Answering - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/image-captioning-and-visual-question-answering
    Explore at:
    Dataset updated
    Dec 17, 2024
    Description

    The dataset is used for image captioning and visual question answering.

  12. Visual Question Answering Technology Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Mar 15, 2025
    Cite
    Archive Market Research (2025). Visual Question Answering Technology Report [Dataset]. https://www.archivemarketresearch.com/reports/visual-question-answering-technology-58313
    Explore at:
    Available download formats: doc, ppt, pdf
    Dataset updated
    Mar 15, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Visual Question Answering (VQA) technology market is experiencing robust growth, driven by increasing demand for advanced image analysis and AI-powered solutions across diverse industries. The market, estimated at $2 billion in 2025, is projected to expand at a Compound Annual Growth Rate (CAGR) of 25% from 2025 to 2033. This significant growth is fueled by several key factors. The proliferation of big data and the advancements in deep learning algorithms are enabling more accurate and efficient VQA systems. Furthermore, the rising adoption of VQA in sectors such as healthcare (for medical image analysis), retail (for enhanced customer experience), and autonomous vehicles (for scene understanding) is significantly boosting market expansion. The increasing availability of powerful cloud computing resources further facilitates the development and deployment of complex VQA models. While challenges such as data bias and the need for robust annotation techniques remain, the overall market outlook for VQA technology is extremely positive.

    Segmentation analysis reveals strong growth across various application areas. The software industry currently leads in VQA adoption, followed by the computer and electronics industries. Within the technology itself, image classification and image identification are the dominant segments, indicating a strong focus on practical applications. Geographically, North America and Europe currently hold the largest market shares, but the Asia-Pacific region is expected to witness substantial growth in the coming years, driven by increasing investments in AI and technological advancements in countries like China and India. Key players like Toshiba Corporation, Amazon Science, and Cognex are actively contributing to market growth through continuous innovation and strategic partnerships. The competitive landscape is dynamic, with both established tech giants and emerging startups vying for market share. The long-term outlook suggests that VQA technology will continue to be a critical component of various emerging technologies and will play a pivotal role in shaping the future of artificial intelligence.

  13. Visual Question Answering Technology Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Feb 6, 2025
    Cite
    Archive Market Research (2025). Visual Question Answering Technology Report [Dataset]. https://www.archivemarketresearch.com/reports/visual-question-answering-technology-12874
    Explore at:
    Available download formats: doc, pdf, ppt
    Dataset updated
    Feb 6, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Visual Question Answering (VQA) Technology market is poised for significant growth due to its increasing adoption across various industries. With a market size of XXX million in 2025, the market is projected to grow at a CAGR of XX% during the forecast period 2025-2033. This growth is attributed to the rising demand for automated systems for complex tasks, advancements in artificial intelligence (AI), and the increasing availability of image and video data. VQA technology has applications in the software, computer, and electronic industries, providing solutions for image identification, image classification, and other tasks.

    Various factors are driving the growth of the VQA technology market. The increasing adoption of AI-powered solutions, the growing need for efficient and accurate image processing, and the rising demand for automated customer service are major factors driving the market. Moreover, the advancements in natural language processing (NLP) and computer vision technologies further enhance the capabilities of VQA systems. However, the availability of limited training data for VQA models and the need for specialized hardware for processing large datasets pose certain challenges to the market's growth. Despite these challenges, the increasing R&D investments by market players and the collaborative efforts to develop standardized datasets are expected to create new growth opportunities in the coming years.

  14. Text VQA

    • kaggle.com
    Updated Mar 15, 2021
    Cite
    Dmytro Kozii (2021). Text VQA [Dataset]. https://www.kaggle.com/dmytruto/textvqa/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 15, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Dmytro Kozii
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TextVQA requires models to read and reason about text in an image in order to answer questions about it. To perform well on this task, models first need to detect and read the text in the images, and then reason about it to answer the question. Current state-of-the-art models fail on TextVQA because they lack these text reading and reasoning capabilities. See the examples in the image to compare ground truth answers with the corresponding predictions of a state-of-the-art model. Challenge link: https://eval.ai/web/challenges/challenge-page/874/

  15. Visual Analysis System for Scene-Graph-Based Visual Question Answering

    • darus.uni-stuttgart.de
    Updated Jul 25, 2023
    Cite
    Noel Schäfer; Pascal Tilli; Tanja Munz-Körner; Sebastian Künzel; Sandeep Vidyapu; Ngoc Thang Vu; Daniel Weiskopf (2023). Visual Analysis System for Scene-Graph-Based Visual Question Answering [Dataset]. http://doi.org/10.18419/DARUS-3589
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 25, 2023
    Dataset provided by
    DaRUS
    Authors
    Noel Schäfer; Pascal Tilli; Tanja Munz-Körner; Sebastian Künzel; Sandeep Vidyapu; Ngoc Thang Vu; Daniel Weiskopf
    License

    MIT (https://spdx.org/licenses/MIT.html)

    Dataset funded by
    DFG
    Description

    Source code of our visual analysis system for exploring scene-graph-based visual question answering. The approach is built on top of the state-of-the-art GraphVQA framework, which was trained on the GQA dataset. Instructions on how to use our system can be found in the README.

  16. Visual Question Answering evaluation dataset for MIMIC CXR

    • physionet.org
    Updated Jan 28, 2025
    Cite
    Timo Kohlberger; Charles Lau; Tom Pollard; Andrew Sellergren; Atilla Kiraly; Fayaz Jamil (2025). Visual Question Answering evaluation dataset for MIMIC CXR [Dataset]. http://doi.org/10.13026/cvsk-ny21
    Explore at:
    Dataset updated
    Jan 28, 2025
    Authors
    Timo Kohlberger; Charles Lau; Tom Pollard; Andrew Sellergren; Atilla Kiraly; Fayaz Jamil
    License

    https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

    Description

    MIMIC CXR [1] is a large publicly available dataset of chest radiographs in DICOM format with free-text radiology reports. In addition, labels for the presence of 12 different chest-related pathologies, as well as of any support devices, and overall normal/abnormal status were made available via the MIMIC Chest X-ray JPG (MIMIC-CXR-JPG) [2] labels, which were generated using the CheXpert and NegBio algorithms.

    Based on these labels, we created a visual question answering dataset comprising 224 questions for 48 cases from the official test set, and 111 questions for 23 validation cases. A majority (68%) of the questions are close-ended (answerable with yes or no), and focus on the presence of one out of 15 chest pathologies, or any support device, or generically on any abnormality, whereas the remaining open-ended questions inquire about the location, size, severity or type of a pathology/device, if present in the specific case, indicated by the MIMIC-CXR-JPG labels.

    For each question and case, we also provide a reference answer, authored by a board-certified radiologist (with 17 years of post-residency experience) based on the chest X-ray and the original radiology report.
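
    Because most of the questions are close-ended, exact match against the reference answers is a natural baseline score. A minimal, hypothetical sketch (the dataset's actual file layout and field names may differ):

        def close_ended_accuracy(pairs: list[tuple[str, str]]) -> float:
            """pairs: (model_answer, reference_answer) strings for the
            yes/no questions; exact match after light normalization."""
            norm = lambda s: s.strip().lower().rstrip(".")
            hits = sum(norm(pred) == norm(ref) for pred, ref in pairs)
            return hits / len(pairs) if pairs else 0.0

        print(close_ended_accuracy([("Yes", "yes"), ("no", "Yes")]))  # -> 0.5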

  17. Extended Visual Analysis System for Scene-Graph-Based Visual Question Answering

    • darus.uni-stuttgart.de
    Updated May 28, 2025
    Cite
    Noel Schäfer; Sebastian Künzel; Pascal Tilli; Tanja Munz-Körner; Sandeep Vidyapu; Ngoc Thang Vu; Daniel Weiskopf (2025). Extended Visual Analysis System for Scene-Graph-Based Visual Question Answering [Dataset]. http://doi.org/10.18419/DARUS-3909
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 28, 2025
    Dataset provided by
    DaRUS
    Authors
    Noel Schäfer; Sebastian Künzel; Pascal Tilli; Tanja Munz-Körner; Sandeep Vidyapu; Ngoc Thang Vu; Daniel Weiskopf
    License

    MIT (https://spdx.org/licenses/MIT.html)

    Dataset funded by
    DFG
    Description

    Source code of our extended visual analysis system for exploring scene-graph-based visual question answering. The approach is built on top of the state-of-the-art GraphVQA framework, which was trained on the GQA dataset. It is an improved version of our earlier system (item 15 above). Instructions on how to use the system can be found in the README.

  18. VQA-RAD Dataset

    • paperswithcode.com
    Updated Jun 2, 2023
    Cite
    Jason J. Lau; Soumya Gayen; Asma Ben Abacha; Dina Demner-Fushman (2023). VQA-RAD Dataset [Dataset]. https://paperswithcode.com/dataset/vqa-rad
    Explore at:
    Dataset updated
    Jun 2, 2023
    Authors
    Jason J. Lau; Soumya Gayen; Asma Ben Abacha; Dina Demner-Fushman
    Description

    VQA-RAD consists of 3,515 question–answer pairs on 315 radiology images.

  19. visual-question-answering-checkpoint-downloads

    • huggingface.co
    Cite
    Hugging Face OSS Metrics, visual-question-answering-checkpoint-downloads [Dataset]. https://huggingface.co/datasets/open-source-metrics/visual-question-answering-checkpoint-downloads
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset provided by
    Hugging Face (https://huggingface.co/)
    Authors
    Hugging Face OSS Metrics
    Description

    The open-source-metrics/visual-question-answering-checkpoint-downloads dataset is hosted on Hugging Face and contributed by the HF Datasets community.

  20. OpenViVQA Dataset

    • paperswithcode.com
    Updated May 9, 2023
    Cite
    Nghia Hieu Nguyen; Duong T. D. Vo; Kiet Van Nguyen; Ngan Luu-Thuy Nguyen (2023). OpenViVQA Dataset [Dataset]. https://paperswithcode.com/dataset/openvivqa
    Explore at:
    Dataset updated
    May 9, 2023
    Authors
    Nghia Hieu Nguyen; Duong T. D. Vo; Kiet Van Nguyen; Ngan Luu-Thuy Nguyen
    Description

    In recent years, visual question answering (VQA) has attracted attention from the research community because of its many potential applications (such as virtual assistants in intelligent cars, assistive devices for blind people, or information retrieval from document images using natural language queries) and its difficulty. The VQA task requires methods that can fuse information from questions and images to produce appropriate answers. Neural VQA models have achieved tremendous growth on large-scale datasets, which are mostly for resource-rich languages such as English. However, available datasets narrow the VQA task to answer selection or answer classification. We argue that this form of VQA is far from human ability and eliminates the challenge of the answering aspect of the task by merely selecting answers rather than generating them. In this paper, we introduce the OpenViVQA (Open-domain Vietnamese Visual Question Answering) dataset, the first large-scale dataset for VQA with open-ended answers in Vietnamese, consisting of 11,000+ images associated with 37,000+ question–answer pairs (QAs).
