Dataset Overview
The dataset comprises over 137,000 images potentially containing Vietnamese 🇻🇳 textual content. It was curated using the Gemini 1.5 Flash model, currently Google model leading on the WildVision Arena Leaderboard for Visual Question Answering (VQA). Each image is accompanied by a detailed description and 5 self-generated questions and answers related to the textual content within the image. In total, there are more than 822,679 individual questions, encompassing… See the full description on the dataset page: https://huggingface.co/datasets/5CD-AI/Viet-OCR-VQA.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TextVQA requires models to read and reason about text in images to answer questions about them. Specifically, models need to incorporate a new modality of text present in the images and reason over it to answer TextVQA questions. TextVQA dataset contains 45,336 questions over 28,408 images from the OpenImages dataset.
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Dataset Overview
The dataset comprises over 137,000 images potentially containing Vietnamese 🇻🇳 textual content. It was curated using the Gemini 1.5 Flash model, currently Google model leading on the WildVision Arena Leaderboard for Visual Question Answering (VQA). Each image is accompanied by a detailed description and 5 self-generated questions and answers related to the textual content within the image. In total, there are more than 822,679 individual questions, encompassing… See the full description on the dataset page: https://huggingface.co/datasets/5CD-AI/Viet-OCR-VQA.