9 datasets found

ocr-receipts-text-detection
huggingface.co
Updated Sep 19, 2023
Training Data (2023). ocr-receipts-text-detection [Dataset]. https://huggingface.co/datasets/TrainingDataPro/ocr-receipts-text-detection
Dataset updated
Sep 19, 2023
Authors
Training Data
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
The Grocery Store Receipts Dataset is a collection of photos captured from various grocery store receipts. This dataset is specifically designed for tasks related to Optical Character Recognition (OCR) and is useful for retail. Each image in the dataset is accompanied by bounding box annotations, indicating the precise locations of specific text segments on the receipts. The text segments are categorized into four classes: item, store, date_time and total.
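As a rough illustration of how the four-class bounding-box annotations described above might be consumed, the sketch below groups boxes by label. The record layout (the `label`, `xmin`, `ymin`, `xmax`, `ymax` keys), the helper name `group_boxes_by_class`, and the sample values are all assumptions made for illustration, not the dataset's actual schema; check the dataset page for the real annotation format.

```python
from collections import defaultdict

# The four classes named in the dataset description.
CLASSES = ("item", "store", "date_time", "total")


def group_boxes_by_class(annotations):
    """Group bounding boxes by class label, rejecting unknown labels."""
    grouped = defaultdict(list)
    for ann in annotations:
        label = ann["label"]
        if label not in CLASSES:
            raise ValueError(f"unexpected label: {label!r}")
        grouped[label].append((ann["xmin"], ann["ymin"], ann["xmax"], ann["ymax"]))
    return dict(grouped)


# Hypothetical annotations for one receipt image (made-up coordinates).
sample = [
    {"label": "store", "xmin": 40, "ymin": 10, "xmax": 300, "ymax": 60},
    {"label": "item", "xmin": 20, "ymin": 120, "xmax": 320, "ymax": 150},
    {"label": "item", "xmin": 20, "ymin": 160, "xmax": 320, "ymax": 190},
    {"label": "total", "xmin": 20, "ymin": 400, "xmax": 320, "ymax": 440},
]

boxes = group_boxes_by_class(sample)
print({label: len(found) for label, found in boxes.items()})
```

Validating labels against the fixed class list catches annotation typos early, before they reach a training loop.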
receipts-google-ocr
huggingface.co
Updated May 15, 2024
Andrew Mayes (2024). receipts-google-ocr [Dataset]. https://huggingface.co/datasets/amaye15/receipts-google-ocr
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
May 15, 2024
Authors
Andrew Mayes
Description
amaye15/receipts-google-ocr dataset hosted on Hugging Face and contributed by the HF Datasets community
CORU
huggingface.co
Updated Jun 19, 2025
Abdelrahman Abdallah (2025). CORU [Dataset]. https://huggingface.co/datasets/abdoelsayed/CORU
Dataset updated
Jun 19, 2025
Authors
Abdelrahman Abdallah
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
ReceiptSense: Beyond Traditional OCR - A Dataset for Receipt Understanding

🔥 News

[2024] ReceiptSense dataset is now publicly available!
[2024] Paper accepted and published

📖 Abstract

Multilingual OCR and information extraction from receipts remains challenging, particularly for complex scripts like Arabic. We introduce ReceiptSense, a comprehensive dataset designed for Arabic-English receipt understanding comprising:

20,000 annotated receipts… See the full description on the dataset page: https://huggingface.co/datasets/abdoelsayed/CORU.
SROIE_2019_text_recognition
huggingface.co
Updated Apr 21, 2025
priyank (2025). SROIE_2019_text_recognition [Dataset]. https://huggingface.co/datasets/priyank-m/SROIE_2019_text_recognition
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Apr 21, 2025
Authors
priyank
License
https://choosealicense.com/licenses/undefined/
Description
This dataset was prepared from the Scanned Receipts OCR and Information Extraction (SROIE) dataset, which contains 973 scanned receipts in English. Cropping the bounding boxes from each receipt to generate this text-recognition dataset produced 33,626 images for the train set and 18,704 images for the test set. The text annotations for all the images in a split are stored in a metadata.jsonl file. Usage: from dataset import load_dataset data =… See the full description on the dataset page: https://huggingface.co/datasets/priyank-m/SROIE_2019_text_recognition.
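Since the description says each split keeps its text annotations in a metadata.jsonl file, a minimal sketch of reading such a file might look like the following. The key names are assumptions: `file_name` follows the common Hugging Face image-folder convention, and `text` is a guess at the label key; the helper `read_metadata` and the sample records are hypothetical and not taken from the dataset.

```python
import io
import json


def read_metadata(lines):
    """Parse JSON Lines records into a {file_name: text} mapping."""
    labels = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumed keys: "file_name" for the cropped image, "text" for its transcription.
        labels[record["file_name"]] = record["text"]
    return labels


# Hypothetical stand-in for one split's metadata.jsonl (made-up names and text).
sample_file = io.StringIO(
    '{"file_name": "0001.jpg", "text": "TOTAL"}\n'
    '\n'
    '{"file_name": "0002.jpg", "text": "12.50"}\n'
)

labels = read_metadata(sample_file)
print(len(labels), "annotations loaded")
```

With a real split, the same function could be passed an open file handle instead of the in-memory sample.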
invoices-and-receipts_ocr_v1
huggingface.co
Updated Oct 28, 2023
minyang (2023). invoices-and-receipts_ocr_v1 [Dataset]. https://huggingface.co/datasets/mychen76/invoices-and-receipts_ocr_v1
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Oct 28, 2023
Authors
minyang
Description
Dataset Card for "invoices-and-receipts_ocr_v1"

More Information needed
Chinese-OCR
huggingface.co
Updated Sep 12, 2024
longmaodata (2024). Chinese-OCR [Dataset]. https://huggingface.co/datasets/longmaodata/Chinese-OCR
Dataset updated
Sep 12, 2024
Authors
longmaodata
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Chinese OCR

Scene Types: Natural, Reshot, Screenshots
Collection Environments: Magazines, Newspapers, Books, Signage, Receipts, Maps, PPTs, Menus, Product Packaging, Train Tickets, Banners, Bulletin Boards, Cards
Lighting Distribution: Normal… See the full description on the dataset page: https://huggingface.co/datasets/longmaodata/Chinese-OCR.
invoice-ocr-json
huggingface.co
Gokul Raja R, invoice-ocr-json [Dataset]. https://huggingface.co/datasets/GokulRajaR/invoice-ocr-json
Authors
Gokul Raja R
Description
Invoice OCR Dataset

This dataset contains annotated invoice images and their corresponding OCR-extracted text in structured JSON format. The data was originally sourced from an open-source invoice dataset and processed using the GPT-4o mini model to extract relevant fields such as invoice number, date, total amount, vendor, and line items.

Dataset Details

Dataset Description

This dataset is designed to support training and evaluation of document understanding… See the full description on the dataset page: https://huggingface.co/datasets/GokulRajaR/invoice-ocr-json.
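To make the structured-JSON idea above concrete, here is a small sketch that checks one record for the fields the description lists (invoice number, date, total amount, vendor, line items). The key names in `EXPECTED_FIELDS`, the helper `missing_fields`, and the sample record are hypothetical; the dataset's real keys may differ, so treat this as a shape check under stated assumptions rather than the dataset's schema.

```python
import json

# Hypothetical key names for the fields listed in the description.
EXPECTED_FIELDS = ("invoice_number", "date", "total_amount", "vendor", "line_items")


def missing_fields(raw_json):
    """Return the expected fields that are absent from one JSON record."""
    record = json.loads(raw_json)
    return [name for name in EXPECTED_FIELDS if name not in record]


# Made-up record for illustration only; it deliberately omits "line_items".
sample = json.dumps({
    "invoice_number": "INV-001",
    "date": "2024-01-31",
    "total_amount": 42.0,
    "vendor": "Example Vendor",
})

print(missing_fields(sample))
```

A check like this is mainly useful for spotting incomplete extractions before they are used for training or evaluation.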
Viet-Receipt-VQA
huggingface.co
Updated Oct 14, 2024
Fifth Civil Defender - 5CD (2024). Viet-Receipt-VQA [Dataset]. https://huggingface.co/datasets/5CD-AI/Viet-Receipt-VQA
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Oct 14, 2024
Dataset authored and provided by
Fifth Civil Defender - 5CD
Area covered
Việt Nam
Description
Dataset Overview

This dataset was collected from 2034 Vietnamese 🇻🇳 receipts from MC-OCR 2021 [1]. Each receipt has been analyzed and annotated using advanced Visual Question Answering (VQA) techniques to produce a comprehensive dataset. There is a set of 14,238 detailed descriptions, key information extraction (KIE), and query-based questions and answers generated by the Gemini 1.5 Flash model, currently Google's leading model on the WildVision Arena Leaderboard. This results in a… See the full description on the dataset page: https://huggingface.co/datasets/5CD-AI/Viet-Receipt-VQA.
sroie_ner_instruct
huggingface.co
Shuja Toor, sroie_ner_instruct [Dataset]. https://huggingface.co/datasets/shujatoor/sroie_ner_instruct
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Authors
Shuja Toor
Description
Instruction Dataset to Finetune model for Named Entity Recognition (NER)

This instruction dataset can be used to fine-tune a model for Named Entity Recognition (NER). It contains 5.27k instruction examples and was created from the SROIE dataset, which contains 973 receipts. PaddleOCR was used to perform OCR on the original receipts.

Summary:

Original receipts used: ~973
Library used to perform OCR: PaddleOCR
Dataset created… See the full description on the dataset page: https://huggingface.co/datasets/shujatoor/sroie_ner_instruct.