This dataset is designed for Visual Question Answering on multipage scanned industry documents. The questions and answers are reused from the Single Page DocVQA (SP-DocVQA) dataset. The images also correspond to those in the original dataset, extended with the previous and subsequent pages of each document, up to a limit of 20 pages per document.
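The page selection described above can be pictured as a contiguous window of consecutive pages around the page that SP-DocVQA originally used, capped at 20 pages. The sketch below is a minimal illustration under an assumption the description does not state: the window is centered on the original page and shifted inward at document boundaries. The function name `page_window` and its parameters are hypothetical and are not part of the dataset's released tooling.

```python
def page_window(answer_page: int, num_pages: int, max_pages: int = 20) -> list[int]:
    """Return 0-based indices of a contiguous page window containing answer_page.

    Hypothetical centering strategy (not confirmed by the dataset authors):
    split the page budget roughly evenly before and after the original page,
    then shift the window so it stays inside the document.
    """
    if not 0 <= answer_page < num_pages:
        raise ValueError("answer_page must lie inside the document")
    # Never take more pages than the document has, nor more than the cap.
    size = min(max_pages, num_pages)
    # Start roughly half a window before the original page...
    start = answer_page - (size - 1) // 2
    # ...clamped so the window stays within [0, num_pages).
    start = max(0, min(start, num_pages - size))
    return list(range(start, start + size))


# Example: page 30 of a 50-page document keeps pages 21..40 (20 pages).
print(page_window(30, 50))
```

Centering keeps roughly equal context on both sides of the original page; a different split, such as taking only following pages, would respect the same 20-page limit.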
Dataset Overview
This dataset continues the ongoing work on the Viet Document VQA dataset. It was collected from 64,765 pages of Vietnamese 🇻🇳 textbooks (workbooks, thematic books, and lesson-plan books from the Ministry of Education and Training (Bộ GDĐT), Cánh Diều, Chân trời sáng tạo, and Kết nối tri thức), spanning all subjects from grades 1 to 12. Each page has been analyzed and annotated using advanced Visual Question Answering (VQA) techniques to produce a comprehensive dataset. There is a set of 388,277 detailed… See the full description on the dataset page: https://huggingface.co/datasets/5CD-AI/Viet-Doc-VQA-II.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Large-scale Multi-modality Models Evaluation Suite
Accelerating the development of large-scale multi-modality models (LMMs) with lmms-eval
This Dataset
This is a formatted version of DocVQA. It is used in our lmms-eval pipeline to allow for one-click evaluations of large multi-modality models. @article{mathew2020docvqa, title={DocVQA: A Dataset for VQA on Document Images. CoRR abs/2007.00398 (2020)}… See the full description on the dataset page: https://huggingface.co/datasets/lmms-lab/DocVQA.
HuggingFaceM4/DocumentVQA dataset hosted on Hugging Face and contributed by the HF Datasets community
The Document Conversion and Retrieval System (DOCRS) is a repository of completed building-construction and real-property documents. The documents are archival in nature, and the system is accessed by CFM personnel and authorized station engineering personnel. Access to these documents is limited due to security concerns, because many of them are building plans for structures throughout the VA. DOCRS is a web-based system hosted within the VA intranet.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
JA-VG-VQA-500
Dataset Description
JA-VG-VQA-500 is a 500-sample subset of the Japanese Visual Genome VQA dataset. This dataset was used in the evaluation of EvoVLM-JP-v1-7B. Please refer to our report and blog for more details. We are grateful to the developers for making the dataset available under the Creative Commons Attribution 4.0 License.
Sources: Visual Genome; Japanese Visual Genome VQA dataset
Usage
Use the code below to get started with the dataset. from datasets… See the full description on the dataset page: https://huggingface.co/datasets/SakanaAI/JA-VG-VQA-500.
Beginning with the Government Paperwork Elimination Act of 1998 (GPEA), the Federal government has encouraged the use of electronic/digital signatures to enable electronic transactions with agencies while still providing a means of proving user consent and non-repudiation. To support this capability, some means of reliable user identity management must exist. Currently, Veterans have to physically print, sign, and mail various documents that, in turn, must be processed by VA. This process is a significant inconvenience for Veterans and a financial burden on VA. eSig enables veterans and their surrogates to digitally sign forms that require a high level of verification that the user signing the document is a legitimate, authorized user. In addition, eSig provides a mechanism for VA applications to verify the authenticity of user documents and the integrity of data on user forms. This capability is enabled by the eSig service. The eSig service signing process includes the following steps:
1. Form Signing Attestation: The user affirms their intent to electronically sign the document and understands that re-authentication is part of that process.
2. Re-Authentication: The user must refresh their authentication by repeating the authentication process.
3. Form Signing: The form and the identity of the user are presented to the eSig service, where they are digitally bound and secured.
4. Form Storage: The signed form must be stored for later validation.
In this process, the application is entirely responsible for steps 1, 2, and 4. In step 3, the application must use the eSig web service to request signing of the document. The following table lists the detailed functions offered by the eSig service.
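The division of responsibility in the four signing steps can be sketched as a small Python workflow in which the application enforces the order of steps 1, 2, and 4 and delegates step 3 to a signing call. This is a minimal sketch under loud assumptions: `esig_sign`, `esig_verify`, `SigningSession`, and `SERVICE_KEY` are hypothetical names, not the eSig web service's actual interface (which the description does not give), and HMAC-SHA256 with a shared demo key stands in for the real signing mechanism, which is not described here.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, field

# Hypothetical stand-in for the signing service's secret. HMAC-SHA256 is used
# only to keep the sketch self-contained; it is NOT eSig's actual mechanism.
SERVICE_KEY = b"demo-key-not-for-production"


def esig_sign(form: dict, user_id: str) -> str:
    """Step 3 (hypothetical service call): bind form contents and signer identity."""
    payload = json.dumps({"form": form, "user": user_id}, sort_keys=True).encode()
    return hmac.new(SERVICE_KEY, payload, hashlib.sha256).hexdigest()


def esig_verify(form: dict, user_id: str, signature: str) -> bool:
    """Later validation: recompute the tag and compare in constant time."""
    return hmac.compare_digest(esig_sign(form, user_id), signature)


@dataclass
class SigningSession:
    """Hypothetical application-side workflow; the application owns steps 1, 2, 4."""
    user_id: str
    attested: bool = False
    reauthenticated: bool = False
    store: dict = field(default_factory=dict)

    def attest(self) -> None:
        # Step 1: the user affirms intent to sign and accepts re-authentication.
        self.attested = True

    def reauthenticate(self, credentials_ok: bool) -> None:
        # Step 2: the user repeats the authentication process.
        if not self.attested:
            raise RuntimeError("attestation must come first")
        self.reauthenticated = credentials_ok

    def sign_and_store(self, form_id: str, form: dict) -> str:
        if not self.reauthenticated:
            raise RuntimeError("re-authentication required before signing")
        signature = esig_sign(form, self.user_id)  # Step 3: delegated signing
        self.store[form_id] = (form, signature)    # Step 4: keep for validation
        return signature


session = SigningSession("veteran-123")
session.attest()
session.reauthenticate(credentials_ok=True)
sig = session.sign_and_store("form-1", {"benefit": "example"})
print(esig_verify({"benefit": "example"}, "veteran-123", sig))
```

Verification fails if either the form contents or the signer identity changes, which mirrors the authenticity and data-integrity checks the description attributes to eSig.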