Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TextVQA requires models to read and reason about text in images to answer questions about them. Specifically, models need to incorporate a new modality of text present in the images and reason over it to answer TextVQA questions. The TextVQA dataset contains 45,336 questions over 28,408 images from the OpenImages dataset.
Large-scale Multi-modality Models Evaluation Suite
Accelerating the development of large-scale multi-modality models (LMMs) with lmms-eval
🏠 Homepage | 📚 Documentation | 🤗 Huggingface Datasets
This Dataset
This is a formatted version of TextVQA. It is used in our lmms-eval pipeline to allow for one-click evaluations of large multi-modality models.
@inproceedings{singh2019towards, title={Towards vqa models that can read}, author={Singh, Amanpreet and… See the full description on the dataset page: https://huggingface.co/datasets/lmms-lab/textvqa.
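To inspect this formatted copy outside the lmms-eval pipeline, a minimal loading sketch with the Hugging Face `datasets` library might look like the following; the split name and column names are assumptions to verify against the dataset page.

```python
# Minimal sketch: load the formatted TextVQA copy with Hugging Face `datasets`.
# The split name "validation" and the exact column layout are assumptions;
# check the dataset page for the splits actually published.
from datasets import load_dataset

ds = load_dataset("lmms-lab/textvqa", split="validation")

sample = ds[0]
# Inspect which TextVQA-style fields (question, image, reference answers) exist.
print(sample.keys())
```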
redactable-llm/TextVQA dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TextVQA requires models to read and reason about text in an image to answer questions based on it. To perform well on this task, models need to first detect and read the text in the images, then reason about it to answer the question. Current state-of-the-art models fail to answer TextVQA questions because they do not have text reading and reasoning capabilities. See the examples in the image to compare ground-truth answers with the corresponding predictions of a state-of-the-art model. Challenge link: https://eval.ai/web/challenges/challenge-page/874/
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
LIME-DATA/textvqa dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Overview
This dataset was created from 42,678 Vietnamese 🇻🇳 images with the latest GPT-4o. It offers superior quality compared to other existing datasets, with:
Highly detailed descriptions, from the overall composition of the image to descriptions of each object, including its location, quantity, and so on. Descriptions of text cover not only recognition but also the font style, color, position, and size of the text. Answers are very long and detailed, including… See the full description on the dataset page: https://huggingface.co/datasets/5CD-AI/Viet-ShareGPT-4o-Text-VQA.
nielsr/textvqa-sample dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Existing visual question answering methods typically concentrate only on visual targets in images and ignore the key textual content they contain, limiting the depth and accuracy of image comprehension. Motivated by this, we address the task of text-based visual question answering, tackle the performance bottleneck caused by the over-fitting risk of existing self-attention-based models, and propose a scene-text visual question answering method, INT2-VQA, that fuses knowledge representations based on inter-modality and intra-modality collaboration. Specifically, we model the complementary prior knowledge of locational collaboration between visual and textual targets across modalities, and of contextual semantic collaboration among textual word targets within a modality. On this basis, a universal knowledge-reinforced attention module is designed to encode both relations in a unified representation. Extensive ablation studies, comparison experiments, and visual analyses demonstrate the effectiveness of the proposed method and its superiority over other state-of-the-art methods.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TextOCR requires models to perform text-recognition on arbitrary shaped scene-text present on natural images. TextOCR provides ~1M high quality word annotations on TextVQA images allowing application of end-to-end reasoning on downstream tasks such as visual question answering or image captioning.
Dataset Card for "spotlight-textvqa-enrichment"
More Information needed
TextVQA
Overview
TextVQA requires models to read and reason about text in images to answer questions about them. Specifically, models need to incorporate a new modality of text present in the images and reason over it to answer TextVQA questions.
Statistics
28,408 images from OpenImages
45,336 questions
453,360 ground truth answers
Code and Papers
TextVQA and LoRRA at https://github.com/facebookresearch/pythia. Iterative Answer Prediction with… See the full description on the dataset page: https://huggingface.co/datasets/d-delaurier/redactable-text-vqa.
PresentLogic/textvqa-chat dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
TextVQA validation set with ground-truth bounding boxes
The dataset used in the paper MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs for studying MLLMs' attention patterns. The dataset is sourced from TextVQA and manually annotated with ground-truth bounding boxes. Only questions with a single area of interest in the image are considered, so 4,370 out of 5,000 samples are kept.
Citation
If you find our paper and code useful… See the full description on the dataset page: https://huggingface.co/datasets/jrzhang/TextVQA_GT_bbox.
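For experimenting with the annotated subset, a hedged loading sketch is shown below; the split name and column names (in particular the bounding-box field) are assumptions to check against the dataset page.

```python
# Minimal sketch: load the bounding-box-annotated TextVQA subset.
# The split name "validation" and the field layout are assumptions; consult
# the dataset page for the actual schema.
from datasets import load_dataset

ds = load_dataset("jrzhang/TextVQA_GT_bbox", split="validation")
print(len(ds))       # should be on the order of the 4,370 kept samples
print(ds[0].keys())  # inspect which columns (image, question, answers, bbox) exist
```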
MrZilinXiao/MMEB-eval-TextVQA-beir-v2 dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Mobile Capture VQA
What is it?
This benchmark is an early effort to evaluate existing VLMs on mobile capture data. It contains:
122 unique images
871 question/answer pairs
This dataset is a collection of "mobile capture" images, i.e. images taken with a cellphone. Most existing benchmarks rely on document scans/PDFs (DocVQA, ChartQA) or scene text recognition (TextVQA) but overlook the unique challenges that mobile capture poses:
poor… See the full description on the dataset page: https://huggingface.co/datasets/arnaudstiegler/mobile_capture_vqa.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Resources
Paper: VoQA: Visual-only Question Answering
Evaluation code and task brief introduction: LuyangJ/VoQA (GitHub)
Dataset folders: train (VoQA Training Dataset, 3.35M samples), test (VoQA Benchmark, 134k samples)
VoQA Benchmark
Sub-tasks
The VoQA evaluation dataset (VoQA Benchmark) includes the following tasks:
GQA, POPE, ScienceQA, TextVQA, VQAv2
Each task contains two data formats:
VoQA Task: watermark-rendered images
Traditional VQA Task: original… See the full description on the dataset page: https://huggingface.co/datasets/AJN-AI/VoQA.
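Because the repository is organized into train/test folders and several sub-tasks, listing the available configurations first avoids guessing names. A minimal exploration sketch follows; the existence of named configurations and the split names are assumptions to verify on the dataset page.

```python
# Minimal sketch: discover which configurations the VoQA repository exposes
# before loading anything. Config and split names are assumptions.
from datasets import get_dataset_config_names

configs = get_dataset_config_names("AJN-AI/VoQA")
print(configs)

# Once a configuration name is known, it can be loaded with
# datasets.load_dataset("AJN-AI/VoQA", <config>, split=...); verify the
# available splits on the dataset page first.
```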
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
nanoLLaVA - Sub 1B Vision-Language Model
Description
nanoLLaVA is a "small but mighty" 1B vision-language model designed to run efficiently on edge devices.
Base LLM: Quyen-SE-v0.1 (Qwen1.5-0.5B)
Vision Encoder: google/siglip-so400m-patch14-384
Benchmark scores:
VQA v2: 70.84
TextVQA: 46.71
ScienceQA: 58.97
POPE: 84.1
MMMU (Test): 28.6
MMMU (Eval): 30.4
GQA: 54.79
MM-VET: 23.9
Training Data
Training Data will be released later as I am still writing a… See the full description on the dataset page: https://huggingface.co/datasets/taiseimatsuoka/test-public.
WeThink-Multimodal-Reasoning-120K
Image Type
Image data can be accessed from https://huggingface.co/datasets/Xkev/LLaVA-CoT-100k
Image counts by source dataset, grouped by image type:
General Images: COCO (25,344), SAM-1B (18,091), Visual Genome (4,441), GQA (3,251), PISC (835), LLaVA (134)
Text-Intensive Images: TextVQA (25,483), ShareTextVQA (538), DocVQA (4,709), OCR-VQA (5,142), ChartQA (21,781)
Scientific & Technical: GeoQA+ (4,813), ScienceQA (4,990), AI2D (1,812), CLEVR-Math (677)… See the full description on the dataset page: https://huggingface.co/datasets/WeThink/WeThink-Multimodal-Reasoning-120K.