2 datasets found
  1. omnidocbench

    • huggingface.co
    Updated Dec 11, 2024
    Sami (2024). omnidocbench [Dataset]. https://huggingface.co/datasets/samiuc/omnidocbench
    Dataset updated: Dec 11, 2024
    Authors: Sami
    Description

    OmniDocBench

    OmniDocBench is an evaluation dataset for diverse document parsing in real-world scenarios, with the following characteristics:

    • Diverse Document Types: The evaluation set contains 981 PDF pages, covering 9 document types, 4 layout types and 3 language types. It has broad coverage, including academic papers, financial reports, newspapers, textbooks, handwritten notes, etc.
    • Rich Annotations: Contains location information for 15 block-level (text paragraphs…

    See the full description on the dataset page: https://huggingface.co/datasets/samiuc/omnidocbench.

  2. OmniDocBench

    • huggingface.co
    Updated Dec 11, 2024
    OpenDataLab (2024). OmniDocBench [Dataset]. https://huggingface.co/datasets/opendatalab/OmniDocBench
    Dataset updated: Dec 11, 2024
    Dataset authored and provided by: OpenDataLab
    Description

    OmniDocBench

    OmniDocBench is an evaluation dataset for diverse document parsing in real-world scenarios, with the following characteristics:

    • Diverse Document Types: The evaluation set contains 1355 PDF pages, covering 9 document types, 4 layout types and 3 language types. It has broad coverage, including academic papers, financial reports, newspapers, textbooks, handwritten notes, etc.
    • Rich Annotations: Contains location information for 15 block-level (text paragraphs…

    See the full description on the dataset page: https://huggingface.co/datasets/opendatalab/OmniDocBench.



omnidocbench

samiuc/omnidocbench

20 scholarly articles cite this dataset.