2 datasets found

omnidocbench
huggingface.co
Updated Dec 11, 2024
Sami (2024). omnidocbench [Dataset]. https://huggingface.co/datasets/samiuc/omnidocbench
Authors: Sami
Description
OmniDocBench

OmniDocBench is an evaluation dataset for diverse document parsing in real-world scenarios, with the following characteristics:

Diverse Document Types: The evaluation set contains 981 PDF pages, covering 9 document types, 4 layout types, and 3 language types. It has broad coverage, including academic papers, financial reports, newspapers, textbooks, and handwritten notes.

Rich Annotations: Contains location information for 15 block-level (text paragraphs…

See the full description on the dataset page: https://huggingface.co/datasets/samiuc/omnidocbench.
OmniDocBench
huggingface.co
Updated Dec 11, 2024
OpenDataLab (2024). OmniDocBench [Dataset]. https://huggingface.co/datasets/opendatalab/OmniDocBench
Dataset authored and provided by: OpenDataLab
Description
OmniDocBench

OmniDocBench is an evaluation dataset for diverse document parsing in real-world scenarios, with the following characteristics:

Diverse Document Types: The evaluation set contains 1355 PDF pages, covering 9 document types, 4 layout types, and 3 language types. It has broad coverage, including academic papers, financial reports, newspapers, textbooks, and handwritten notes.

Rich Annotations: Contains location information for 15 block-level (text paragraphs…

See the full description on the dataset page: https://huggingface.co/datasets/opendatalab/OmniDocBench.
samiuc/omnidocbench: 20 scholarly articles cite this dataset (per Google Scholar).