3 datasets found
  1. invoices-donut-data-v1-with-ocr

    • huggingface.co
    Updated Mar 23, 2019
    Cite
    Marco Pansa (2019). invoices-donut-data-v1-with-ocr [Dataset]. https://huggingface.co/datasets/MJPansa/invoices-donut-data-v1-with-ocr
    Dataset updated
    Mar 23, 2019
    Authors
    Marco Pansa
    Description

    The bbox column is [x, y, width, height]. ymean is the y position of the box's mean (its vertical center), and line is the line number calculated from ymean.
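    The description does not say exactly how ymean or line are derived. As a minimal sketch, assuming ymean = y + height / 2 and assuming a new line starts wherever consecutive sorted ymean values differ by more than a pixel tolerance, the two columns could be reproduced as below. The function names, the 10-pixel default tolerance, and the 0-based numbering are illustrative assumptions, not taken from the dataset.

```python
from typing import List, Sequence


def ymean(bbox: Sequence[float]) -> float:
    """Vertical center of an [x, y, width, height] box.

    Assumption: "y position of the mean of the box" is read as y + height / 2.
    """
    _x, y, _width, height = bbox
    return y + height / 2


def assign_lines(bboxes: List[Sequence[float]], tolerance: float = 10.0) -> List[int]:
    """Give each box a 0-based line number from its ymean (hypothetical rule).

    Boxes are visited in order of increasing ymean; a new line starts whenever
    the gap to the previously visited box's ymean exceeds `tolerance`.
    """
    order = sorted(range(len(bboxes)), key=lambda i: ymean(bboxes[i]))
    lines = [0] * len(bboxes)
    current_line = 0
    previous_y = None
    for i in order:
        y = ymean(bboxes[i])
        if previous_y is not None and y - previous_y > tolerance:
            current_line += 1
        lines[i] = current_line  # reported at the box's original position
        previous_y = y
    return lines


if __name__ == "__main__":
    boxes = [[10, 100, 50, 20], [70, 102, 40, 18], [10, 140, 60, 20]]
    print([ymean(b) for b in boxes])  # [110.0, 111.0, 150.0]
    print(assign_lines(boxes))        # [0, 0, 1]
```

    Comparing each box with the previous one (rather than with the first box on the line) lets slightly slanted text lines chain together; a real pipeline might prefer a tolerance scaled to the median box height.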

  2. synthdog-ko

    • huggingface.co
    Updated Dec 13, 2024
    Cite
    NAVER CLOVA INFORMATION EXTRACTION (2024). synthdog-ko [Dataset]. https://huggingface.co/datasets/naver-clova-ix/synthdog-ko
    Dataset updated
    Dec 13, 2024
    Dataset provided by
    Naver Corporation (http://www.navercorp.com/)
    Authors
    NAVER CLOVA INFORMATION EXTRACTION
    Description

    Donut 🍩: OCR-Free Document Understanding Transformer (ECCV 2022) -- SynthDoG datasets

    For more information, please visit https://github.com/clovaai/donut

    The SynthDoG-generated datasets are:

      • synthdog-en: English, 0.5M
      • synthdog-zh: Chinese, 0.5M
      • synthdog-ja: Japanese, 0.5M
      • synthdog-ko: Korean, 0.5M

    To generate synthetic datasets with SynthDoG, please see ./synthdog/README.md in the Donut GitHub repository and our paper for details.

    How to Cite

    If you find this work useful… See the full description on the dataset page: https://huggingface.co/datasets/naver-clova-ix/synthdog-ko.

  3. donut_vqa

    • huggingface.co
    Updated Jul 20, 2025
    Cite
    Jina AI (2025). donut_vqa [Dataset]. https://huggingface.co/datasets/jinaai/donut_vqa
    Dataset updated
    Jul 20, 2025
    Dataset authored and provided by
    Jina AI
    Description

    DonutVQA Dataset

    This dataset is derived from the donut-vqa dataset: it reformats the test split with modified field names so that it can be used in the ViDoRe benchmark. The text_description column contains OCR text extracted from the images using EasyOCR.

    Disclaimer

    This dataset may contain publicly available images or text data. All data is provided for research and educational purposes only. If you are the rights holder of any content and have concerns regarding… See the full description on the dataset page: https://huggingface.co/datasets/jinaai/donut_vqa.
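    The card does not say how the EasyOCR output was turned into the text_description column. A minimal sketch, assuming the recognized strings are simply joined with spaces in the order EasyOCR returns them, operates on the (box, text, confidence) tuples that easyocr.Reader.readtext produces by default. build_text_description and its min_confidence filter are hypothetical, not part of the dataset or of EasyOCR.

```python
from typing import List, Sequence, Tuple

# One EasyOCR readtext() result (default detail=1):
# (four [x, y] corner points, recognized text, confidence score)
OcrResult = Tuple[Sequence[Sequence[float]], str, float]


def build_text_description(results: List[OcrResult], min_confidence: float = 0.0) -> str:
    """Join OCR strings into one text field (hypothetical helper, not the dataset's code).

    Keeps EasyOCR's output order, drops empty strings and any result whose
    confidence is below `min_confidence`, and separates the rest with spaces.
    """
    kept = [
        text.strip()
        for _box, text, confidence in results
        if confidence >= min_confidence and text.strip()
    ]
    return " ".join(kept)


if __name__ == "__main__":
    # Hand-written sample in EasyOCR's result shape; no OCR is run here.
    sample = [
        ([[0, 0], [60, 0], [60, 12], [0, 12]], "Invoice", 0.98),
        ([[0, 20], [40, 20], [40, 32], [0, 32]], "Total", 0.40),
        ([[50, 20], [90, 20], [90, 32], [50, 32]], "42.00", 0.91),
    ]
    print(build_text_description(sample))                      # Invoice Total 42.00
    print(build_text_description(sample, min_confidence=0.5))  # Invoice 42.00
```

    On a real page the results would come from something like easyocr.Reader(["en"]).readtext(image_path). Sorting by box position or applying a confidence cutoff are equally plausible choices, since the original extraction pipeline is not documented.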
