Dataset Overview
The dataset comprises over 137,000 images potentially containing Vietnamese 🇻🇳 textual content. It was curated using the Gemini 1.5 Flash model, currently Google model leading on the WildVision Arena Leaderboard for Visual Question Answering (VQA). Each image is accompanied by a detailed description and 5 self-generated questions and answers related to the textual content within the image. In total, there are more than 822,679 individual questions, encompassing… See the full description on the dataset page: https://huggingface.co/datasets/5CD-AI/Viet-OCR-VQA.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TextVQA requires models to read and reason about text in images to answer questions about them. Specifically, models need to incorporate a new modality of text present in the images and reason over it to answer TextVQA questions. TextVQA dataset contains 45,336 questions over 28,408 images from the OpenImages dataset.
Dataset Overview
This dataset is was collected from 2034 Vietnamese 🇻🇳 Receipts MC-OCR 2021 [1]. Each receipt has been analyzed and annotated using advanced Visual Question Answering (VQA) techniques to produce a comprehensive dataset. There is a set of 14,238 detailed descriptions, key information extraction (KIE), and query-based questions and answers generated by the Gemini 1.5 Flash model, currently Google's leading model on the WildVision Arena Leaderboard. This results in a… See the full description on the dataset page: https://huggingface.co/datasets/5CD-AI/Viet-Receipt-VQA.
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Dataset Overview
The dataset comprises over 137,000 images potentially containing Vietnamese 🇻🇳 textual content. It was curated using the Gemini 1.5 Flash model, currently Google model leading on the WildVision Arena Leaderboard for Visual Question Answering (VQA). Each image is accompanied by a detailed description and 5 self-generated questions and answers related to the textual content within the image. In total, there are more than 822,679 individual questions, encompassing… See the full description on the dataset page: https://huggingface.co/datasets/5CD-AI/Viet-OCR-VQA.