9 datasets found
  1. ocr-receipts-text-detection

    • huggingface.co
    Updated Sep 19, 2023
    Cite
    Training Data (2023). ocr-receipts-text-detection [Dataset]. https://huggingface.co/datasets/TrainingDataPro/ocr-receipts-text-detection
    Dataset updated
    Sep 19, 2023
    Authors
    Training Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    The Grocery Store Receipts Dataset is a collection of photos captured from various grocery store receipts. This dataset is specifically designed for tasks related to Optical Character Recognition (OCR) and is useful for retail. Each image in the dataset is accompanied by bounding box annotations, indicating the precise locations of specific text segments on the receipts. The text segments are categorized into four classes: item, store, date_time and total.

  2. receipts-google-ocr

    • huggingface.co
    Updated May 15, 2024
    Cite
    Andrew Mayes (2024). receipts-google-ocr [Dataset]. https://huggingface.co/datasets/amaye15/receipts-google-ocr
    Dataset updated
    May 15, 2024
    Authors
    Andrew Mayes
    Description

    amaye15/receipts-google-ocr dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. CORU

    • huggingface.co
    Updated Jun 19, 2025
    Cite
    Abdelrahman Abdallah (2025). CORU [Dataset]. https://huggingface.co/datasets/abdoelsayed/CORU
    Dataset updated
    Jun 19, 2025
    Authors
    Abdelrahman Abdallah
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    ReceiptSense: Beyond Traditional OCR - A Dataset for Receipt Understanding

    🔥 News

    [2024] ReceiptSense dataset is now publicly available!
    [2024] Paper accepted and published

    📖 Abstract

    Multilingual OCR and information extraction from receipts remains challenging, particularly for complex scripts like Arabic. We introduce ReceiptSense, a comprehensive dataset designed for Arabic-English receipt understanding comprising:

    20,000 annotated receipts… See the full description on the dataset page: https://huggingface.co/datasets/abdoelsayed/CORU.

  4. SROIE_2019_text_recognition

    • huggingface.co
    Updated Apr 21, 2025
    Cite
    priyank (2025). SROIE_2019_text_recognition [Dataset]. https://huggingface.co/datasets/priyank-m/SROIE_2019_text_recognition
    Dataset updated
    Apr 21, 2025
    Authors
    priyank
    License

    https://choosealicense.com/licenses/undefined/

    Description

    This dataset was prepared using the Scanned Receipts OCR and Information Extraction (SROIE) dataset, which contains 973 scanned receipts in English. Cropping the bounding boxes from each receipt to generate this text-recognition dataset yielded 33,626 images for the train set and 18,704 images for the test set. The text annotations for all the images inside a split are stored in a metadata.jsonl file. Usage: from datasets import load_dataset data =… See the full description on the dataset page: https://huggingface.co/datasets/priyank-m/SROIE_2019_text_recognition.
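
The description says each split keeps its text annotations in a metadata.jsonl file. A minimal sketch of reading such a file with the Python standard library follows. The field names `file_name` and `text`, the sample records, and the helper names `read_metadata` and `demo` are assumptions made for illustration; they are not confirmed by the dataset page.

```python
import json
import tempfile
from pathlib import Path


def read_metadata(path):
    """Map each image file name to its text annotation.

    Assumes one JSON object per line with hypothetical keys
    "file_name" and "text"; adjust to the real schema.
    """
    annotations = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            annotations[record["file_name"]] = record["text"]
    return annotations


def demo():
    """Write a tiny made-up metadata.jsonl and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "metadata.jsonl"
        path.write_text(
            '{"file_name": "crop_0001.jpg", "text": "TOTAL"}\n'
            '{"file_name": "crop_0002.jpg", "text": "12.50"}\n',
            encoding="utf-8",
        )
        return read_metadata(path)


if __name__ == "__main__":
    print(demo())
```

The sample records are invented so the sketch runs offline; point `read_metadata` at a split's actual metadata.jsonl once the real keys are known.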

  5. invoices-and-receipts_ocr_v1

    • huggingface.co
    Updated Oct 28, 2023
    Cite
    minyang (2023). invoices-and-receipts_ocr_v1 [Dataset]. https://huggingface.co/datasets/mychen76/invoices-and-receipts_ocr_v1
    Dataset updated
    Oct 28, 2023
    Authors
    minyang
    Description

    Dataset Card for "invoices-and-receipts_ocr_v1"

    More Information needed

  6. Chinese-OCR

    • huggingface.co
    Updated Sep 12, 2024
    Cite
    longmaodata (2024). Chinese-OCR [Dataset]. https://huggingface.co/datasets/longmaodata/Chinese-OCR
    Dataset updated
    Sep 12, 2024
    Authors
    longmaodata
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Chinese OCR

    Scene Types: Natural, Reshot, Screenshots
    Collection Environments: Magazines, Newspapers, Books, Signage, Receipts, Maps, PPTs, Menus, Product Packaging, Train Tickets, Banners, Bulletin Boards, Cards
    Lighting Distribution: Normal… See the full description on the dataset page: https://huggingface.co/datasets/longmaodata/Chinese-OCR.

  7. invoice-ocr-json

    • huggingface.co
    Cite
    Gokul Raja R, invoice-ocr-json [Dataset]. https://huggingface.co/datasets/GokulRajaR/invoice-ocr-json
    Authors
    Gokul Raja R
    Description

    Invoice OCR Dataset

    This dataset contains annotated invoice images and their corresponding OCR-extracted text in structured JSON format. The data was originally sourced from an open-source invoice dataset and processed using the GPT-4o mini model to extract relevant fields such as invoice number, date, total amount, vendor, and line items.

    Dataset Details

    Dataset Description

    This dataset is designed to support training and evaluation of document understanding… See the full description on the dataset page: https://huggingface.co/datasets/GokulRajaR/invoice-ocr-json.

  8. Viet-Receipt-VQA

    • huggingface.co
    Updated Oct 14, 2024
    Cite
    Fifth Civil Defender - 5CD (2024). Viet-Receipt-VQA [Dataset]. https://huggingface.co/datasets/5CD-AI/Viet-Receipt-VQA
    Dataset updated
    Oct 14, 2024
    Dataset authored and provided by
    Fifth Civil Defender - 5CD
    Area covered
    Việt Nam
    Description

    Dataset Overview

    This dataset was collected from 2034 Vietnamese 🇻🇳 receipts from MC-OCR 2021 [1]. Each receipt has been analyzed and annotated using advanced Visual Question Answering (VQA) techniques to produce a comprehensive dataset. There is a set of 14,238 detailed descriptions, key information extraction (KIE), and query-based questions and answers generated by the Gemini 1.5 Flash model, currently Google's leading model on the WildVision Arena Leaderboard. This results in a… See the full description on the dataset page: https://huggingface.co/datasets/5CD-AI/Viet-Receipt-VQA.

  9. sroie_ner_instruct

    • huggingface.co
    Cite
    Shuja Toor, sroie_ner_instruct [Dataset]. https://huggingface.co/datasets/shujatoor/sroie_ner_instruct
    Authors
    Shuja Toor
    Description

    Instruction Dataset to Finetune model for Named Entity Recognition (NER)

    This instruction dataset can be used to fine-tune your model to perform Named Entity Recognition (NER). It contains 5.27k instruction examples and was created from the SROIE dataset, which contains 973 receipts. PaddleOCR was used to perform OCR on the original receipts.

    Summary:

    Original Receipts Used ~ 973
    Library used to perform OCR: PaddleOCR
    Dataset created… See the full description on the dataset page: https://huggingface.co/datasets/shujatoor/sroie_ner_instruct.
