22 datasets found
  1. textvqa

    • huggingface.co
    • live.european-language-grid.eu
    Updated May 23, 2024
    Cite
    AI at Meta (2024). textvqa [Dataset]. https://huggingface.co/datasets/facebook/textvqa
    Dataset updated
    May 23, 2024
    Dataset authored and provided by
    AI at Meta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TextVQA requires models to read and reason about text in images to answer questions about them. Specifically, models need to incorporate a new modality of text present in the images and reason over it to answer TextVQA questions. TextVQA dataset contains 45,336 questions over 28,408 images from the OpenImages dataset.
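
    The official release can be pulled straight from the Hub; below is a minimal sketch using the datasets library, assuming the field names shown on the dataset page:

      # Minimal sketch: load the official TextVQA release from the Hub.
      # Assumes `datasets` is installed (pip install datasets); depending on
      # your datasets version, script-based datasets may additionally need
      # trust_remote_code=True.
      from datasets import load_dataset

      ds = load_dataset("facebook/textvqa", split="validation")
      sample = ds[0]
      print(sample["question"])  # question about text in the image
      print(sample["answers"])   # human-annotated ground-truth answers
      sample["image"].show()     # PIL image sourced from OpenImages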

  2. textvqa

    • huggingface.co
    Updated Apr 21, 2024
    Cite
    LMMs-Lab (2024). textvqa [Dataset]. https://huggingface.co/datasets/lmms-lab/textvqa
    Available formats: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Apr 21, 2024
    Dataset authored and provided by
    LMMs-Lab
    Description

    Large-scale Multi-modality Models Evaluation Suite

    Accelerating the development of large-scale multi-modality models (LMMs) with lmms-eval

    🏠 Homepage | 📚 Documentation | 🤗 Huggingface Datasets

      This Dataset

    This is a formatted version of TextVQA. It is used in our lmms-eval pipeline to allow for one-click evaluations of large multi-modality models. @inproceedings{singh2019towards, title={Towards vqa models that can read}, author={Singh, Amanpreet and… See the full description on the dataset page: https://huggingface.co/datasets/lmms-lab/textvqa.

  3. TextVQA

    • huggingface.co
    Updated Oct 22, 2024
    Cite
    Redactable.com (2024). TextVQA [Dataset]. https://huggingface.co/datasets/redactable-llm/TextVQA
    Available formats: Croissant
    Dataset updated
    Oct 22, 2024
    Dataset provided by
    Redactable Inc.
    Description

    redactable-llm/TextVQA dataset hosted on Hugging Face and contributed by the HF Datasets community
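
    Croissant-tagged entries like this one can also be read without the datasets library; here is a sketch with mlcroissant, assuming Hugging Face's documented Croissant metadata endpoint (record-set names should be discovered, not hard-coded):

      # Sketch: read a Croissant-described dataset with mlcroissant
      # (pip install mlcroissant). The endpoint URL follows the pattern
      # Hugging Face documents for Croissant metadata.
      import mlcroissant as mlc

      url = "https://huggingface.co/api/datasets/redactable-llm/TextVQA/croissant"
      ds = mlc.Dataset(jsonld=url)
      # .uuid in recent mlcroissant versions; older ones expose .name instead
      names = [rs.uuid for rs in ds.metadata.record_sets]
      print(names)  # discover the available record sets first
      for i, record in enumerate(ds.records(record_set=names[0])):
          print(record)
          if i >= 2:  # peek at the first few records only
              break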

  4. Text VQA

    • kaggle.com
    Updated Mar 15, 2021
    Cite
    Dmytro Kozii (2021). Text VQA [Dataset]. https://www.kaggle.com/dmytruto/textvqa/code
    Available formats: Croissant
    Dataset updated
    Mar 15, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Dmytro Kozii
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TextVQA requires models to read and reason about text in images to answer questions based on it. To perform well on this task, models first need to detect and read text in the images, then reason about it to answer the question. Current state-of-the-art models fail to answer questions in TextVQA because they lack text reading and reasoning capabilities. See the examples in the image to compare ground-truth answers with corresponding predictions by a state-of-the-art model. Challenge link: https://eval.ai/web/challenges/challenge-page/874/
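
    To fetch this Kaggle mirror locally, something like the following should work with kagglehub, Kaggle's download helper; the dataset handle is taken from the citation URL above:

      # Sketch: download the Kaggle mirror of TextVQA with kagglehub
      # (pip install kagglehub; Kaggle API credentials must be configured).
      import os
      import kagglehub

      # Handle taken from the citation URL above.
      path = kagglehub.dataset_download("dmytruto/textvqa")
      print("Downloaded to:", path)
      print(os.listdir(path))  # inspect what ships with the mirror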

  5. textvqa

    • huggingface.co
    Updated Mar 11, 2011
    Cite
    LESS IS MORE (2011). textvqa [Dataset]. https://huggingface.co/datasets/LIME-DATA/textvqa
    Available formats: Croissant
    Dataset updated
    Mar 11, 2011
    Authors
    LESS IS MORE
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    LIME-DATA/textvqa dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. Viet-ShareGPT-4o-Text-VQA

    • huggingface.co
    • kaggle.com
    Updated May 15, 2014
    Cite
    Fifth Civil Defender - 5CD (2014). Viet-ShareGPT-4o-Text-VQA [Dataset]. https://huggingface.co/datasets/5CD-AI/Viet-ShareGPT-4o-Text-VQA
    Dataset updated
    May 15, 2014
    Dataset authored and provided by
    Fifth Civil Defender - 5CD
    Area covered
    Vietnam
    Description

    Dataset Overview

    This dataset was created from 42,678 Vietnamese 🇻🇳 images with the latest GPT-4o. The dataset has superior quality compared to other existing datasets, with:

    Highly detailed descriptions, from the overall composition of the image down to each object, including its location, quantity, etc. Descriptions of text cover not only recognition but also the font style, color, position, and size of the text. Answers are very long and detailed, including… See the full description on the dataset page: https://huggingface.co/datasets/5CD-AI/Viet-ShareGPT-4o-Text-VQA.

  7. textvqa-sample

    • huggingface.co
    Updated Feb 17, 2025
    Cite
    Niels Rogge (2025). textvqa-sample [Dataset]. https://huggingface.co/datasets/nielsr/textvqa-sample
    Available formats: Croissant
    Dataset updated
    Feb 17, 2025
    Authors
    Niels Rogge
    Description

    nielsr/textvqa-sample dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. Comparative results based on the TextVQA.

    • plos.figshare.com
    xls
    Updated Aug 30, 2023
    Cite
    Xianli Sheng (2023). Comparative results based on the TextVQA. [Dataset]. http://doi.org/10.1371/journal.pone.0290315.t002
    Available download formats: xls
    Dataset updated
    Aug 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Xianli Sheng
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Existing visual question answering methods typically concentrate only on visual targets in images, ignoring the key textual content, thereby limiting the depth and accuracy of image content comprehension. Motivated by this, we address the task of text-based visual question answering, tackle the performance bottleneck caused by over-fitting risk in existing self-attention-based models, and propose a scene-text visual question answering method called INT2-VQA that fuses knowledge representations based on inter-modality and intra-modality collaborations. Specifically, we model the complementary prior knowledge of positional collaboration between visual targets and textual targets across modalities and of contextual semantic collaboration among textual word targets within a modality. On this basis, a universal knowledge-reinforced attention module is designed to encode both relations in a unified manner. Extensive ablation experiments, comparison experiments, and visual analyses demonstrate the effectiveness of the proposed method and prove its superiority over other state-of-the-art methods.

  9. TextOCR

    • opendatalab.com
    zip
    Updated Apr 20, 2023
    Cite
    Facebook AI Research (2023). TextOCR [Dataset]. https://opendatalab.com/OpenDataLab/TextOCR
    Available download formats: zip (9,222,216,147 bytes)
    Dataset updated
    Apr 20, 2023
    Dataset provided by
    Facebook AI Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TextOCR requires models to perform text recognition on arbitrarily shaped scene text in natural images. TextOCR provides ~1M high-quality word annotations on TextVQA images, allowing end-to-end reasoning on downstream tasks such as visual question answering or image captioning.
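
    The zip unpacks to images plus COCO-Text-style JSON annotation files; here is a sketch of walking the word annotations, where the key layout (imgs / anns / utf8_string) is an assumption to verify against the downloaded file:

      # Sketch: iterate over TextOCR word annotations. The JSON layout
      # (imgs / anns dicts, utf8_string and bbox per annotation) is an
      # assumption based on the release notes; the filename is illustrative.
      import json

      with open("TextOCR_0.1_train.json") as f:
          data = json.load(f)

      print(len(data["imgs"]), "images,", len(data["anns"]), "word annotations")
      for ann in list(data["anns"].values())[:5]:
          word = ann["utf8_string"]  # "." conventionally marks illegible words
          x, y, w, h = ann["bbox"]   # axis-aligned box
          print(repr(word), "at", (round(x), round(y), round(w), round(h)))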

  10. textvqa

    • huggingface.co
    Cite
    Pratham, textvqa [Dataset]. https://huggingface.co/datasets/yobro4619/textvqa
    Authors
    Pratham
    Description

    yobro4619/textvqa dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. spotlight-textvqa-enrichment

    • huggingface.co
    Updated Mar 13, 2024
    Cite
    spotlight-textvqa-enrichment [Dataset]. https://huggingface.co/datasets/renumics/spotlight-textvqa-enrichment
    Available formats: Croissant
    Dataset updated
    Mar 13, 2024
    Dataset authored and provided by
    Renumics
    Description

    Dataset Card for "spotlight-textvqa-enrichment"

    More Information needed

  12. textvqa-final

    • huggingface.co
    Cite
    Pratham, textvqa-final [Dataset]. https://huggingface.co/datasets/yobro4619/textvqa-final
    Authors
    Pratham
    Description

    yobro4619/textvqa-final dataset hosted on Hugging Face and contributed by the HF Datasets community

  13. redactable-text-vqa

    • huggingface.co
    Updated Oct 22, 2024
    Cite
    David DeLaurier (2024). redactable-text-vqa [Dataset]. https://huggingface.co/datasets/d-delaurier/redactable-text-vqa
    Available formats: Croissant
    Dataset updated
    Oct 22, 2024
    Authors
    David DeLaurier
    Description

    TextVQA

      Overview

    TextVQA requires models to read and reason about text in images to answer questions about them. Specifically, models need to incorporate a new modality of text present in the images and reason over it to answer TextVQA questions.

      Statistics

    28,408 images from OpenImages; 45,336 questions; 453,360 ground-truth answers (10 per question)

      Code and Papers

    TextVQA and LoRRA at https://github.com/facebookresearch/pythia. Iterative Answer Prediction with… See the full description on the dataset page: https://huggingface.co/datasets/d-delaurier/redactable-text-vqa.
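
    Those 453,360 answers (10 per question) feed the standard soft VQA accuracy that TextVQA reports. Below is a minimal sketch of the metric, omitting the official answer normalization (lowercasing, punctuation and article stripping):

      # Sketch of soft VQA accuracy as used for TextVQA: a prediction scores
      # min(#matching human answers / 3, 1), averaged over all leave-one-out
      # subsets of the 10 annotations. Official evaluation also normalizes
      # answers before matching; that step is omitted here for brevity.
      def vqa_accuracy(prediction: str, gt_answers: list[str]) -> float:
          scores = []
          for i in range(len(gt_answers)):  # leave each annotator out in turn
              others = gt_answers[:i] + gt_answers[i + 1:]
              matches = sum(a == prediction for a in others)
              scores.append(min(matches / 3.0, 1.0))
          return sum(scores) / len(scores)

      print(vqa_accuracy("coca cola", ["coca cola"] * 7 + ["coke"] * 3))  # 1.0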

  14. textvqa-chat

    • huggingface.co
    Updated May 29, 2025
    Cite
    Ananya Prakash Singh (2025). textvqa-chat [Dataset]. https://huggingface.co/datasets/PresentLogic/textvqa-chat
    Dataset updated
    May 29, 2025
    Authors
    Ananya Prakash Singh
    Description

    PresentLogic/textvqa-chat dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. TextVQA_GT_bbox

    • huggingface.co
    Updated May 11, 2025
    Cite
    jiarui zhang (2025). TextVQA_GT_bbox [Dataset]. https://huggingface.co/datasets/jrzhang/TextVQA_GT_bbox
    Dataset updated
    May 11, 2025
    Authors
    jiarui zhang
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    TextVQA validation set with ground-truth bounding boxes

    The dataset used in the paper MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs for studying MLLMs' attention patterns. The dataset is sourced from TextVQA and annotated manually with ground-truth bounding boxes. We consider questions with a single area of interest in the image, so 4,370 out of 5,000 samples are kept.

      Citation

    If you find our paper and code useful… See the full description on the dataset page: https://huggingface.co/datasets/jrzhang/TextVQA_GT_bbox.
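
    A sketch of loading the subset and cropping the annotated region follows; the split and field names ("image", "question", "bbox") are assumptions for illustration, so check the dataset viewer for the actual schema:

      # Sketch: crop the ground-truth region of interest for one sample.
      # Split and column names below are assumptions; verify them in the
      # dataset viewer before relying on this.
      from datasets import load_dataset

      ds = load_dataset("jrzhang/TextVQA_GT_bbox", split="validation")
      sample = ds[0]
      x, y, w, h = sample["bbox"]  # hypothetical field name
      crop = sample["image"].crop((x, y, x + w, y + h))
      print(sample["question"])
      crop.show()  # the small visual detail the MLLM should attend to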

  16. MMEB-eval-TextVQA-beir-v2

    • huggingface.co
    Updated Jul 27, 2025
    Cite
    Zilin Xiao (2025). MMEB-eval-TextVQA-beir-v2 [Dataset]. https://huggingface.co/datasets/MrZilinXiao/MMEB-eval-TextVQA-beir-v2
    Dataset updated
    Jul 27, 2025
    Authors
    Zilin Xiao
    Description

    MrZilinXiao/MMEB-eval-TextVQA-beir-v2 dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. mobile_capture_vqa

    • huggingface.co
    Updated Jun 24, 2024
    Cite
    Arnaud Stiegler (2024). mobile_capture_vqa [Dataset]. https://huggingface.co/datasets/arnaudstiegler/mobile_capture_vqa
    Available formats: Croissant
    Dataset updated
    Jun 24, 2024
    Authors
    Arnaud Stiegler
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Mobile Capture VQA

      What is it?

    This benchmark is an early effort to provide some evaluation of the different existing VLMs on mobile capture data. It contains:

    122 unique images; 871 question/answer pairs

    This dataset is a collection of "mobile capture" images, i.e. images taken with a cellphone. Most existing benchmarks rely on document scans/PDFs (DocVQA, ChartQA) or scene text recognition (TextVQA) but overlook the unique challenges that mobile capture poses:

    poor… See the full description on the dataset page: https://huggingface.co/datasets/arnaudstiegler/mobile_capture_vqa.

  18. VoQA

    • huggingface.co
    Cite
    An JiaNing, VoQA [Dataset]. https://huggingface.co/datasets/AJN-AI/VoQA
    Authors
    An JiaNing
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Resources

    Paper: VoQA: Visual-only Question Answering
    Evaluation code and brief task introduction: LuyangJ/VoQA (GitHub)
    Dataset folders: train (VoQA Training Dataset, 3.35M samples); test (VoQA Benchmark, 134k samples)

      VoQA Benchmark

      Sub-tasks

    The VoQA evaluation dataset (VoQA Benchmark) includes the following tasks:

    GQA, POPE, ScienceQA, TextVQA, and VQAv2

    Each task contains two data formats:

    VoQA Task: watermark-rendered images. Traditional VQA Task: original… See the full description on the dataset page: https://huggingface.co/datasets/AJN-AI/VoQA.

  19. test-public

    • huggingface.co
    Updated Apr 8, 2024
    Cite
    test-public [Dataset]. https://huggingface.co/datasets/taiseimatsuoka/test-public
    Available formats: Croissant
    Dataset updated
    Apr 8, 2024
    Authors
    taisei matsuoka
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    nanoLLaVA - Sub 1B Vision-Language Model

      Description

    nanoLLaVA is a "small but mighty" 1B vision-language model designed to run efficiently on edge devices.

    Base LLM: Quyen-SE-v0.1 (Qwen1.5-0.5B). Vision encoder: google/siglip-so400m-patch14-384

    Model   VQA v2  TextVQA  ScienceQA  POPE  MMMU (Test)  MMMU (Eval)  GQA    MM-VET
    Score   70.84   46.71    58.97      84.1  28.6         30.4         54.79  23.9

      Training Data

    Training Data will be released later as I am still writing a… See the full description on the dataset page: https://huggingface.co/datasets/taiseimatsuoka/test-public.

  20. WeThink-Multimodal-Reasoning-120K

    • huggingface.co
    Cite
    WeThink, WeThink-Multimodal-Reasoning-120K [Dataset]. https://huggingface.co/datasets/WeThink/WeThink-Multimodal-Reasoning-120K
    Authors
    WeThink
    Description

    WeThink-Multimodal-Reasoning-120K

      Image Type

    Image data can be accessed from https://huggingface.co/datasets/Xkev/LLaVA-CoT-100k

    Image Type              Source Dataset  Images
    General Images          COCO            25,344
                            SAM-1B          18,091
                            Visual Genome    4,441
                            GQA              3,251
                            PISC               835
                            LLaVA              134
    Text-Intensive Images   TextVQA         25,483
                            ShareTextVQA       538
                            DocVQA           4,709
                            OCR-VQA          5,142
                            ChartQA         21,781
    Scientific & Technical  GeoQA+           4,813
                            ScienceQA        4,990
                            AI2D             1,812
                            CLEVR-Math         677

    … See the full description on the dataset page: https://huggingface.co/datasets/WeThink/WeThink-Multimodal-Reasoning-120K.
