3 datasets found

OmniDocBench
huggingface.co
Updated Dec 11, 2024
Cite
OpenDataLab (2024). OmniDocBench [Dataset]. https://huggingface.co/datasets/opendatalab/OmniDocBench
Dataset updated: Dec 11, 2024
Dataset authored and provided by: OpenDataLab
Description
OmniDocBench

OmniDocBench is an evaluation dataset for diverse document parsing in real-world scenarios, with the following characteristics:

Diverse Document Types: The evaluation set contains 1355 PDF pages, covering 9 document types, 4 layout types, and 3 language types. It has broad coverage, including academic papers, financial reports, newspapers, textbooks, handwritten notes, etc.

Rich Annotations: Contains location information for 15 block-level (text paragraphs…

See the full description on the dataset page: https://huggingface.co/datasets/opendatalab/OmniDocBench.
OmniDocBench
huggingface.co
Updated Jan 31, 2025
Cite
Quivr (2025). OmniDocBench [Dataset]. https://huggingface.co/datasets/Quivr/OmniDocBench
Croissant: a format for machine-learning datasets (learn more at mlcommons.org/croissant).
Dataset updated: Jan 31, 2025
Dataset authored and provided by: Quivr
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Forked from opendatalab/OmniDocBench.

Sampler

We have added a simple Python tool for filtering and performing stratified sampling on OmniDocBench data.
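Stratified sampling groups entries by a combination of category values and draws from each group separately, so that small categories are not crowded out by large ones. The Python sketch below only illustrates the idea; it is not the Quivr sampler's code or API. The function name `stratified_sample`, the `stratum_of` callback, and the allocation rule (roughly `fraction` of each stratum, with at least one entry per non-empty stratum) are assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List

Entry = Dict[str, Any]


def stratified_sample(
    entries: List[Entry],
    stratum_of: Callable[[Entry], Hashable],
    fraction: float,
    seed: int = 0,
) -> List[Entry]:
    """Hypothetical helper: sample about `fraction` of every stratum.

    A stratum is whatever `stratum_of` returns, e.g. a tuple of several
    category values. Every non-empty stratum contributes at least one entry.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    # Group entries by their stratum key.
    strata: Dict[Hashable, List[Entry]] = defaultdict(list)
    for entry in entries:
        strata[stratum_of(entry)].append(entry)

    rng = random.Random(seed)  # local RNG keeps the sample reproducible
    sample: List[Entry] = []
    for key in sorted(strata, key=repr):  # fixed stratum order for determinism
        group = strata[key]
        k = max(1, round(len(group) * fraction))  # never exceeds len(group)
        sample.extend(rng.sample(group, k))
    return sample


if __name__ == "__main__":
    # Toy flat records for illustration; real entries may be nested JSON.
    pages = [{"id": i, "doc_type": "paper", "language": "en"} for i in range(4)]
    pages += [{"id": i, "doc_type": "newspaper", "language": "zh"} for i in range(4, 6)]
    picked = stratified_sample(
        pages, lambda e: (e["doc_type"], e["language"]), fraction=0.5, seed=42
    )
    print(sorted(e["id"] for e in picked))
```

Sampling a fixed number of entries per stratum is an equally common choice; proportional allocation is used here because it roughly preserves the overall category mix while still guaranteeing every combination appears.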

Features

Filter JSON entries based on custom criteria
Perform stratified sampling based on multiple categories
Handle nested JSON fields
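Filtering on nested JSON fields is commonly done by addressing each field with a dotted path. The minimal sketch below shows one way to do it; the helpers `get_nested` and `filter_entries`, the dotted-path convention, and the field names `page_info.page_attribute.language` and `data_source` are all hypothetical, loosely modeled on OmniDocBench page metadata, and should be checked against the actual annotation file and the Quivr tool before use.

```python
from collections.abc import Mapping
from typing import Any, Callable, Dict, List, Union

_MISSING = object()  # sentinel: distinguishes "field absent" from a stored None


def get_nested(entry: Mapping, path: str, default: Any = None) -> Any:
    """Follow a dotted path such as "page_info.page_attribute.language"."""
    current: Any = entry
    for part in path.split("."):
        if not isinstance(current, Mapping) or part not in current:
            return default
        current = current[part]
    return current


# A rule is a predicate, a single accepted string, or a collection of accepted values.
Rule = Union[Callable[[Any], bool], str, set, frozenset, list, tuple]


def filter_entries(
    entries: List[Dict[str, Any]], criteria: Dict[str, Rule]
) -> List[Dict[str, Any]]:
    """Keep entries that satisfy every (dotted path -> rule) criterion."""

    def matches(entry: Dict[str, Any]) -> bool:
        for path, rule in criteria.items():
            value = get_nested(entry, path, _MISSING)
            if value is _MISSING:
                return False  # entries lacking the field never match
            if callable(rule):
                ok = rule(value)
            elif isinstance(rule, str):
                ok = value == rule  # exact match, not substring matching via `in`
            else:
                ok = value in rule
            if not ok:
                return False
        return True

    return [entry for entry in entries if matches(entry)]


if __name__ == "__main__":
    # Hypothetical records shaped like nested page metadata.
    pages = [
        {"page_info": {"page_attribute": {"language": "english", "data_source": "academic_literature"}}},
        {"page_info": {"page_attribute": {"language": "simplified_chinese", "data_source": "newspaper"}}},
    ]
    english = filter_entries(pages, {"page_info.page_attribute.language": "english"})
    print(len(english))
```

Treating a bare string as an exact value avoids a subtle bug: `"english" in "simplified_english"`-style substring checks would otherwise match unintended entries.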

Installation

Local Development Install (Recommended)

git clone https://huggingface.co/datasets/Quivr/OmniDocBench.git
cd OmniDocBench
pip install -r requirements.txt #…

See the full description on the dataset page: https://huggingface.co/datasets/Quivr/OmniDocBench.
zh_en_rec_bench
huggingface.co
Updated Jun 26, 2025
Cite
puhuilab (2025). zh_en_rec_bench [Dataset]. https://huggingface.co/datasets/puhuilab/zh_en_rec_bench
Dataset updated: Jun 26, 2025
Authors: puhuilab
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
ZH_EN_RecBench

zh_en_rec_bench is a benchmark dataset designed to evaluate the robustness and generalization of text recognition models across multiple scenarios and both Chinese and English scripts. It was built by sampling subsets of OmniDocBench and TC-STR and manually correcting erroneous ground-truth labels to ensure high-quality evaluation.

Dataset Overview

This benchmark includes four distinct text recognition scenarios:… See the full description on the dataset page: https://huggingface.co/datasets/puhuilab/zh_en_rec_bench.
OmniDocBench (opendatalab/OmniDocBench): 20 scholarly articles cite this dataset, according to Google Scholar.
