OmniDocBench
OmniDocBench is an evaluation dataset for diverse document parsing in real-world scenarios, with the following characteristics:
Diverse Document Types: The evaluation set contains 981 PDF pages, covering 9 document types, 4 layout types and 3 language types. It has broad coverage, including academic papers, financial reports, newspapers, textbooks, handwritten notes, etc.
Rich Annotations: Contains location information for 15 block-level (text paragraphs…
See the full description on the dataset page: https://huggingface.co/datasets/samiuc/omnidocbench.
OmniDocBench
OmniDocBench is an evaluation dataset for diverse document parsing in real-world scenarios, with the following characteristics:
Diverse Document Types: The evaluation set contains 1355 PDF pages, covering 9 document types, 4 layout types and 3 language types. It has broad coverage, including academic papers, financial reports, newspapers, textbooks, handwritten notes, etc.
Rich Annotations: Contains location information for 15 block-level (text paragraphs…
See the full description on the dataset page: https://huggingface.co/datasets/opendatalab/OmniDocBench.