DocVQA consists of 50,000 questions defined on 12,000+ document images.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Large-scale Multi-modality Models Evaluation Suite
Accelerating the development of large-scale multi-modality models (LMMs) with lmms-eval
🏠 Homepage | 📚 Documentation | 🤗 Huggingface Datasets
This Dataset
This is a formatted version of DocVQA. It is used in our lmms-eval pipeline to allow for one-click evaluations of large multi-modality models. @article{mathew2020docvqa, title={DocVQA: A Dataset for VQA on Document Images. CoRR abs/2007.00398 (2020)}… See the full description on the dataset page: https://huggingface.co/datasets/lmms-lab/DocVQA.
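As a quick orientation, here is a minimal sketch of loading this formatted version with the datasets library. The config name, split name, and the "question"/"answers" field names are assumptions to verify against the dataset viewer, not confirmed by the card.

```python
from datasets import load_dataset

# Assumed config and split names; check the dataset viewer for the exact ones.
ds = load_dataset("lmms-lab/DocVQA", "DocVQA", split="validation")

sample = ds[0]
print(sample["question"])  # natural-language question about the document image (assumed field)
print(sample["answers"])   # list of accepted ground-truth answers (assumed field)
```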
The dataset is aimed at Visual Question Answering on multi-page industry scanned documents. The questions and answers are reused from the Single Page DocVQA (SP-DocVQA) dataset. The images correspond to the same documents as in the original dataset, extended with the preceding and following pages, up to a limit of 20 pages per document.
lmms-lab/MP-DocVQA dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License (https://opensource.org/licenses/MIT)
Dataset Description
This is the test set taken from the DocVQA dataset. It contains images collected from the UCSF Industry Documents Library; questions and answers were manually annotated.
Data Curation
To ensure homogeneity across our benchmarked datasets, we subsampled the original test set to 500 pairs and renamed its columns.
Load the dataset
from datasets import load_dataset ds =… See the full description on the dataset page: https://huggingface.co/datasets/vidore/docvqa_test_subsampled.
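The load snippet above is cut off on the card; below is a minimal completion, under the assumption that the 500 subsampled pairs live in a single test split.

```python
from datasets import load_dataset

# A single "test" split is an assumption; the card states 500 subsampled pairs.
ds = load_dataset("vidore/docvqa_test_subsampled", split="test")

print(len(ds))       # expected: 500 question-answer pairs
print(ds[0].keys())  # inspect the renamed columns
```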
MIT License (https://opensource.org/licenses/MIT)
Dataset Card for DocVQA Dataset
Dataset Summary
The DocVQA dataset, introduced in Mathew et al. (2021), consists of 50,000 questions defined on 12,000+ document images. Please visit the challenge page (https://rrc.cvc.uab.es/?ch=17) and the paper (https://arxiv.org/abs/2007.00398) for further information.
Usage
This dataset can be used with current releases of the Hugging Face datasets library. Here is an example using a custom collator to bundle… See the full description on the dataset page: https://huggingface.co/datasets/pixparse/docvqa-single-page-questions.
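The collator example is truncated on the card; the sketch below shows one plausible shape for it. The split name and the "image"/"question" column names are assumptions, since the card text is cut off.

```python
from datasets import load_dataset
from torch.utils.data import DataLoader

ds = load_dataset("pixparse/docvqa-single-page-questions", split="train")  # split name assumed

# Hypothetical collator: PIL images cannot be stacked into a tensor
# directly, so keep them in a list and batch the question strings alongside.
def collate_fn(batch):
    return {
        "images": [ex["image"] for ex in batch],        # assumed column name
        "questions": [ex["question"] for ex in batch],  # assumed column name
    }

loader = DataLoader(ds, batch_size=8, collate_fn=collate_fn)
batch = next(iter(loader))
print(len(batch["images"]), batch["questions"][0])
```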
vikhyatk/docvqa-val dataset hosted on Hugging Face and contributed by the HF Datasets community
https://rrc.cvc.uab.es/?ch=17&com=downloads
Document Visual Question Answering (DocVQA) seeks to inspire a “purpose-driven” point of view in Document Analysis and Recognition research, where the document content is extracted and used to respond to high-level tasks defined by the human consumers of this information. To this end, we organize a series of challenges and release datasets to enable machines to "understand" document images and thereby answer questions asked of them. There are 50K questions and 12K images in the dataset. Images are collected from the UCSF Industry Documents Library; questions and answers are manually annotated.
Dataset Description
This is a VQA dataset based on industrial documents, taken from the MP-DocVQA dataset.
Load the dataset
```python
from datasets import load_dataset
import csv

def load_beir_qrels(qrels_file):
    """Read BEIR-style relevance judgments from a TSV file."""
    qrels = {}
    with open(qrels_file) as f:
        tsvreader = csv.DictReader(f, delimiter="\t")
        for row in tsvreader:
            qid = row["query-id"]
            pid = row["corpus-id"]
            rel = int(row["score"])
            # The original snippet is truncated after this condition; the
            # branches below follow the standard BEIR qrels-loading pattern.
            if qid in qrels:
                qrels[qid][pid] = rel
            else:
                qrels[qid] = {pid: rel}
    return qrels
```

… See the full description on the dataset page: https://huggingface.co/datasets/openbmb/VisRAG-Ret-Test-MP-DocVQA.
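A short usage sketch for the helper above, assuming a BEIR-style qrels TSV with query-id / corpus-id / score columns; the file name here is hypothetical.

```python
# Hypothetical file name; the actual qrels TSV ships with the dataset repo.
qrels = load_beir_qrels("mp-docvqa-qrels.tsv")

# qrels maps each query id to {passage id: relevance score},
# which is the shape retrieval metrics such as nDCG@k expect.
for qid, judged in list(qrels.items())[:3]:
    print(qid, judged)
```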
The dataset used for testing the Vary-base model, combining the DocVQA and ChartQA datasets.
InfographicVQA is a dataset that comprises a diverse collection of infographics along with natural language questions and answers annotations. The collected questions require methods to jointly reason over the document layout, textual content, graphical elements, and data visualizations. We curate the dataset with emphasis on questions that require elementary reasoning and basic arithmetic skills.
TextVQA is a dataset to benchmark visual reasoning based on text in images. TextVQA requires models to read and reason about text in images to answer questions about them. Specifically, models need to incorporate a new modality of text present in the images and reason over it to answer TextVQA questions.
Statistics
- 28,408 images from OpenImages
- 45,336 questions
- 453,360 ground truth answers
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Dataset description
The doc-vqa dataset integrates images from the Infographic_vqa dataset, sourced from HuggingFaceM4's The Cauldron, as well as images from the AFTDB (Arxiv Figure Table Database) dataset curated by cmarkea. It consists of pairs of images and corresponding text, with each image linked to an average of five questions and answers available in both English and French. These questions and answers were generated using Gemini 1.5 Pro, thereby… See the full description on the dataset page: https://huggingface.co/datasets/cmarkea/doc-vqa.
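Given the card's description (each image paired with roughly five bilingual QA pairs), here is a loading sketch; the split name and the column layout shown ("qa" with "en"/"fr" keys) are assumptions to verify in the dataset viewer.

```python
from datasets import load_dataset

ds = load_dataset("cmarkea/doc-vqa", split="train")  # split name assumed

sample = ds[0]
print(sample["qa"]["en"][0])  # first English question-answer pair (assumed schema)
print(sample["qa"]["fr"][0])  # its French counterpart (assumed schema)
```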
RIPS-Goog-23/DocVQA dataset hosted on Hugging Face and contributed by the HF Datasets community
plaguss/docvqa-test dataset hosted on Hugging Face and contributed by the HF Datasets community
HuggingFaceM4/DocumentVQA dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for "docvqa"
More information needed.
Disclaimer
This dataset may contain publicly available images or text data. All data is provided for research and educational purposes only. If you are the rights holder of any content and have concerns regarding intellectual property or copyright, please contact us at "support-data (at) jina.ai" for removal. We do not collect or process personal, sensitive, or private information intentionally. If you believe this dataset includes such content (e.g., portraits, location-linked… See the full description on the dataset page: https://huggingface.co/datasets/jinaai/docvqa.
MIT License (https://opensource.org/licenses/MIT)
llamastack/docVQA dataset hosted on Hugging Face and contributed by the HF Datasets community