Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The Grocery Store Receipts Dataset is a collection of photos of various grocery store receipts. It is designed for Optical Character Recognition (OCR) tasks and is useful for retail applications. Each image is accompanied by bounding box annotations that mark the locations of specific text segments on the receipt. The text segments are categorized into four classes: item, store, date_time, and total.
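To make the annotation structure concrete, here is a minimal sketch of how boxes labeled with the four documented classes might be represented and grouped. The `TextBox` schema (a label plus `x_min`/`y_min`/`x_max`/`y_max` pixel corners) and the helper `group_by_class` are hypothetical; the dataset's actual annotation format may differ, so check the dataset page before relying on it.

```python
from collections import defaultdict
from dataclasses import dataclass

# The four text-segment classes named in the dataset description.
CLASSES = {"item", "store", "date_time", "total"}


@dataclass(frozen=True)
class TextBox:
    """One annotated text segment (hypothetical schema: class label plus
    axis-aligned pixel coordinates of the box corners)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # Clamp to zero so a malformed box (max < min) never yields a negative area.
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)


def group_by_class(boxes):
    """Group boxes by class label, rejecting labels outside the four known classes."""
    groups = defaultdict(list)
    for box in boxes:
        if box.label not in CLASSES:
            raise ValueError(f"unknown class: {box.label!r}")
        groups[box.label].append(box)
    return dict(groups)


boxes = [
    TextBox("store", 10, 5, 200, 30),
    TextBox("item", 10, 50, 180, 65),
    TextBox("item", 10, 70, 160, 85),
    TextBox("total", 10, 120, 120, 135),
]
groups = group_by_class(boxes)
print({label: len(found) for label, found in groups.items()})
# prints {'store': 1, 'item': 2, 'total': 1}
```

Validating labels up front catches annotation typos early; a receipt with no `date_time` box simply has no entry for that class rather than an empty list.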
amaye15/receipts-google-ocr dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
ReceiptSense: Beyond Traditional OCR - A Dataset for Receipt Understanding
🔥 News
[2024] The ReceiptSense dataset is now publicly available!
[2024] Paper accepted and published
📖 Abstract
Multilingual OCR and information extraction from receipts remains challenging, particularly for complex scripts like Arabic. We introduce ReceiptSense, a comprehensive dataset designed for Arabic-English receipt understanding comprising:
20,000 annotated receipts… See the full description on the dataset page: https://huggingface.co/datasets/abdoelsayed/CORU.
https://choosealicense.com/licenses/undefined/
This dataset was prepared from the Scanned Receipts OCR and Information Extraction (SROIE) dataset, which contains 973 scanned English-language receipts. Cropping the bounding boxes from each receipt to generate this text-recognition dataset produced 33,626 images for the train set and 18,704 images for the test set. The text annotations for all images in a split are stored in a metadata.jsonl file. Usage: from datasets import load_dataset data =… See the full description on the dataset page: https://huggingface.co/datasets/priyank-m/SROIE_2019_text_recognition.
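The card stores each split's transcriptions in a metadata.jsonl file, i.e. JSON Lines with one JSON object per line. The card's own usage snippet goes through the Hugging Face `datasets` library; as an alternative, the sketch below reads such a file directly with the standard library. The `file_name` key follows the Hugging Face imagefolder convention, while `text` is only an assumed name for the transcription field; the real column names are on the dataset page.

```python
import io
import json


def read_metadata(lines):
    """Parse a JSON Lines file: one JSON object per non-empty line."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"metadata line {lineno} is not valid JSON: {exc}") from exc
    return records


# Hypothetical sample: "file_name" follows the Hugging Face imagefolder
# convention; "text" is an assumed name for the transcription field.
sample = io.StringIO(
    '{"file_name": "train/00001.jpg", "text": "TOTAL"}\n'
    '{"file_name": "train/00002.jpg", "text": "12.50"}\n'
    "\n"
)
records = read_metadata(sample)
transcriptions = {r["file_name"]: r["text"] for r in records}
print(transcriptions["train/00002.jpg"])  # prints 12.50
```

With a real file, pass an open handle instead of the in-memory sample, e.g. `with open("metadata.jsonl", encoding="utf-8") as f: records = read_metadata(f)`.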
Dataset Card for "invoices-and-receipts_ocr_v1"
More Information needed
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Chinese OCR
Scene types: Natural, Reshot, Screenshots
Collection environments: Magazines, Newspapers, Books, Signage, Receipts, Maps, PPTs, Menus, Product Packaging, Train Tickets, Banners, Bulletin Boards, Cards
Lighting distribution: Normal… See the full description on the dataset page: https://huggingface.co/datasets/longmaodata/Chinese-OCR.
Invoice OCR Dataset
This dataset contains annotated invoice images and their corresponding OCR-extracted text in structured JSON format. The data was originally sourced from an open-source invoice dataset and processed using the GPT-4o mini model to extract relevant fields such as invoice number, date, total amount, vendor, and line items.
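The extracted fields listed above (invoice number, date, total amount, vendor, line items) map naturally onto typed records. The sketch below parses one such JSON object and checks whether the line items add up to the stated total. All key names (`invoice_number`, `date`, `vendor`, `total_amount`, `line_items`, `description`, `quantity`, `unit_price`) are hypothetical stand-ins; the dataset's real schema may name or nest them differently.

```python
import json
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class LineItem:
    description: str
    quantity: Decimal
    unit_price: Decimal


@dataclass
class Invoice:
    invoice_number: str
    date: str
    vendor: str
    total_amount: Decimal
    line_items: list = field(default_factory=list)


def to_decimal(value) -> Decimal:
    # Accepts ints, Decimals (produced by parse_float below) and numeric strings like "7.50".
    return Decimal(str(value))


def parse_invoice(raw: str) -> Invoice:
    """Parse one extracted-invoice JSON object (hypothetical key names)."""
    data = json.loads(raw, parse_float=Decimal)  # avoid binary-float rounding on money
    items = [
        LineItem(item["description"], to_decimal(item["quantity"]), to_decimal(item["unit_price"]))
        for item in data.get("line_items", [])
    ]
    return Invoice(
        invoice_number=str(data["invoice_number"]),
        date=str(data["date"]),
        vendor=str(data["vendor"]),
        total_amount=to_decimal(data["total_amount"]),
        line_items=items,
    )


def line_items_match_total(invoice: Invoice) -> bool:
    """True if the sum of quantity * unit_price equals the stated total."""
    computed = sum((i.quantity * i.unit_price for i in invoice.line_items), Decimal("0"))
    return computed == invoice.total_amount


raw = """{
  "invoice_number": "INV-001",
  "date": "2024-01-15",
  "vendor": "Acme Supplies",
  "total_amount": 17.50,
  "line_items": [
    {"description": "Widget", "quantity": 2, "unit_price": 5.00},
    {"description": "Gadget", "quantity": 1, "unit_price": 7.50}
  ]
}"""
invoice = parse_invoice(raw)
print(invoice.vendor, invoice.total_amount, line_items_match_total(invoice))
# prints Acme Supplies 17.50 True
```

Since these fields were extracted by a language model, a consistency check like this is one cheap way to flag records worth reviewing; note that real invoices often include tax, discounts, or shipping, so a mismatch is not necessarily an extraction error.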
Dataset Details
Dataset Description
This dataset is designed to support training and evaluation of document understanding… See the full description on the dataset page: https://huggingface.co/datasets/GokulRajaR/invoice-ocr-json.
Dataset Overview
This dataset was built from 2,034 Vietnamese 🇻🇳 receipts from MC-OCR 2021 [1]. Each receipt was analyzed and annotated using Visual Question Answering (VQA) techniques. The result is a set of 14,238 detailed descriptions, key information extraction (KIE) annotations, and query-based questions and answers, generated by the Gemini 1.5 Flash model, which the authors describe as Google's leading model on the WildVision Arena Leaderboard at the time of writing. This results in a… See the full description on the dataset page: https://huggingface.co/datasets/5CD-AI/Viet-Receipt-VQA.
Instruction Dataset to Finetune model for Named Entity Recognition (NER)
This instruction dataset can be used to fine-tune a model to perform Named Entity Recognition (NER). It contains 5.27k instruction examples and was created from the SROIE dataset, which contains 973 receipts. PaddleOCR was used to perform OCR on the original receipts.
Summary:
Original receipts used: ~973
Library used to perform OCR: PaddleOCR
Dataset created… See the full description on the dataset page: https://huggingface.co/datasets/shujatoor/sroie_ner_instruct.
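For readers new to instruction tuning, the sketch below shows one plausible way to turn a receipt's OCR lines and its gold entities into a single instruction/input/output record. It is not how this dataset was actually built: the prompt wording, the record keys, and the helper `build_ner_example` are hypothetical, and `ocr_lines` merely stands in for text that a tool such as PaddleOCR would produce. The default entity types are the four key fields of the SROIE key-information task (company, date, address, total).

```python
import json

# Key fields targeted by SROIE's key-information extraction task.
SROIE_FIELDS = ("company", "date", "address", "total")


def build_ner_example(ocr_lines, entities, entity_types=SROIE_FIELDS):
    """Turn OCR output for one receipt plus gold entities into one
    instruction/input/output record (hypothetical prompt and keys)."""
    instruction = (
        "Extract the following entities from the receipt text: "
        + ", ".join(entity_types)
        + ". Answer with a JSON object; use null for missing entities."
    )
    # Keep only the requested types; absent ones become None (JSON null).
    answer = {t: entities.get(t) for t in entity_types}
    return {
        "instruction": instruction,
        "input": "\n".join(ocr_lines),
        "output": json.dumps(answer, ensure_ascii=False),
    }


example = build_ner_example(
    ["ABC MART SDN BHD", "DATE: 12/03/2019", "TOTAL 9.90"],
    {"company": "ABC MART SDN BHD", "date": "12/03/2019", "total": "9.90"},
)
print(example["output"])
# prints {"company": "ABC MART SDN BHD", "date": "12/03/2019", "address": null, "total": "9.90"}
```

Emitting the answer as JSON with explicit nulls keeps the target format fixed, which makes the fine-tuned model's outputs straightforward to parse and score field by field.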