86 datasets found
  1. Invoices Dataset

    • kaggle.com
    zip
    Updated Jan 18, 2022
    Cite
    Cankat Saraç (2022). Invoices Dataset [Dataset]. https://www.kaggle.com/datasets/cankatsrc/invoices
    Available download formats: zip (574249 bytes)
    Dataset updated
    Jan 18, 2022
    Authors
    Cankat Saraç
    License

    Open Data Commons Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    The invoice dataset provided is a mock dataset generated using the Python Faker library. It has been designed to mimic the format of data collected from an online store. The dataset contains various fields, including first name, last name, email, product ID, quantity, amount, invoice date, address, city, and stock code. All of the data in the dataset is randomly generated and does not represent actual individuals or products. The dataset can be used for various purposes, including testing algorithms or models related to invoice management, e-commerce, or customer behavior analysis. The data in this dataset can be used to identify trends, patterns, or anomalies in online shopping behavior, which can help businesses to optimize their online sales strategies.
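
    The dataset itself was produced with the Faker library. The minimal sketch below shows the same idea of generating mock invoice rows, but it uses only the standard library's random module instead of Faker, so the name pools, value ranges, and field spellings are hypothetical stand-ins that only mirror the fields listed above.

```python
import random
from dataclasses import asdict, dataclass
from datetime import date, timedelta
from typing import List

# Hypothetical value pools; the real dataset draws these from Faker.
FIRST_NAMES = ["Anna", "Ben", "Chloe", "David"]
LAST_NAMES = ["Smith", "Jones", "Garcia", "Kim"]
CITIES = ["Springfield", "Riverton", "Lakeside"]


@dataclass
class Invoice:
    first_name: str
    last_name: str
    email: str
    product_id: int
    quantity: int
    amount: float
    invoice_date: str
    address: str
    city: str
    stock_code: str


def mock_invoice(rng: random.Random) -> Invoice:
    """Build one random invoice row; every value is synthetic."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    day = date(2021, 1, 1) + timedelta(days=rng.randrange(365))
    return Invoice(
        first_name=first,
        last_name=last,
        email=f"{first}.{last}@example.com".lower(),
        product_id=rng.randint(100, 999),
        quantity=rng.randint(1, 10),
        amount=round(rng.uniform(1.0, 500.0), 2),
        invoice_date=day.isoformat(),
        address=f"{rng.randint(1, 999)} Main Street",
        city=rng.choice(CITIES),
        stock_code=f"{rng.randint(10000, 99999)}{rng.choice('ABC')}",
    )


def mock_invoices(n: int, seed: int = 0) -> List[dict]:
    """Generate n rows reproducibly from a seeded RNG."""
    rng = random.Random(seed)
    return [asdict(mock_invoice(rng)) for _ in range(n)]
```

    Seeding a private random.Random instance keeps the rows reproducible without touching global random state.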

  2. invoices-example

    • huggingface.co
    Cite
    Parsee.ai, invoices-example [Dataset]. https://huggingface.co/datasets/parsee-ai/invoices-example
    Available formats: Croissant (a metadata format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset provided by
    Parsee.ai
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Invoices Sample Dataset

    This is a sample dataset of invoices generated on app.parsee.ai. The goal was to evaluate different LLMs on this RAG task using the Parsee evaluation tools. A full study can be found at https://github.com/parsee-ai/parsee-datasets/blob/main/datasets/invoices/parsee-loader/README.md (parsee-core version used: 0.1.3.11). This dataset was created from 15 sample invoices (PDF files). All PDF files are publicly accessible on parsee.ai, to access them… See the full description on the dataset page: https://huggingface.co/datasets/parsee-ai/invoices-example.

  3. Samples of electronic invoices

    • data.mendeley.com
    Updated Jun 1, 2021
    + more versions
    Cite
    Marek Kozłowski (2021). Samples of electronic invoices [Dataset]. http://doi.org/10.17632/tnj49gpmtz.2
    Dataset updated
    Jun 1, 2021
    Authors
    Marek Kozłowski
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Electronic invoices are a product of the information age, and their utility in today's market keeps increasing. Based on real electronic invoices from across the globe, we designed a suitable placement for the information. Each detail was generated programmatically using Python. Billing information is kept minimal to avoid or lower the chance of fraud detection. Products were collected by scraping popular online marketplaces, and the results were organized into categorized groups that imitate the manner of a persona. The intended reuse is as input to machine learning fraud detection algorithms or data extraction mechanisms. The dataset presents 1000 auto-generated invoice samples for each of the following variants:

    • valid information;
    • valid information with a colored IBAN background, where the RGB background color varies from (255,255,240) to (255,255,254);
    • valid information with modified spacing between IBAN characters, where the character-spacing coefficient varies from 0.001 to 1.

    The two ends of each special invoice modifier's range span from a factor detectable by the human eye to one that is not. Nomenclature: invoice_
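
    The three variants differ only in how the IBAN area is rendered. A minimal sketch of drawing per-invoice modifier values within the stated ranges might look like this; the uniform distributions, the variant names, and the InvoiceVariant type are assumptions for illustration, not the authors' generator.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class InvoiceVariant:
    kind: str  # hypothetical names: "valid", "colored_iban", "iban_charspace"
    iban_background: Optional[Tuple[int, int, int]] = None
    iban_charspace: Optional[float] = None


def sample_variant(kind: str, rng: random.Random) -> InvoiceVariant:
    """Draw modifier values in the ranges quoted in the description (uniform by assumption)."""
    if kind == "valid":
        return InvoiceVariant(kind)
    if kind == "colored_iban":
        # Only the blue channel changes between (255,255,240) and (255,255,254).
        return InvoiceVariant(kind, iban_background=(255, 255, rng.randint(240, 254)))
    if kind == "iban_charspace":
        # Character-spacing coefficient between 0.001 and 1.
        return InvoiceVariant(kind, iban_charspace=rng.uniform(0.001, 1.0))
    raise ValueError(f"unknown variant: {kind}")
```

    Values near (255,255,254) or a charspace near 0.001 sit at the "hard to notice" end of each range.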

  4. High-Quality Invoice Images for OCR

    • kaggle.com
    zip
    Updated May 9, 2025
    Cite
    osama hosam Abdellatif (2025). High-Quality Invoice Images for OCR [Dataset]. https://www.kaggle.com/datasets/osamahosamabdellatif/high-quality-invoice-images-for-ocr
    Available download formats: zip (1221649842 bytes)
    Dataset updated
    May 9, 2025
    Authors
    osama hosam Abdellatif
    License

    Open Data Commons Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    High-Quality Invoice Images for OCR

    Overview

    High-Quality Invoice Images for OCR is a curated dataset containing professionally scanned and digitally captured invoice documents. It is designed for training, fine-tuning, and evaluating OCR models, machine learning pipelines, and data extraction systems.

    This dataset focuses on clean, structured invoices to simulate real-world scenarios in financial document automation.

    What's Inside

    📄 Variety of invoice templates from multiple industries (e.g., retail, manufacturing, services)

    🖋️ Different currencies, tax formats, and layouts

    📸 High-resolution scanned and photographed invoices

    🏷️ Optional field annotations (e.g., invoice number, date, total amount, vendor name) for supervised training

    Key Applications

    Training and fine-tuning OCR and Document AI models

    Machine learning for structured and semi-structured data extraction

    Intelligent Document Processing (IDP) and Robotic Process Automation (RPA)

    Benchmarking table detection, key-value extraction, and layout analysis models

    Why Use This Dataset?

    ✅ High-quality images optimized for OCR and data extraction tasks

    ✅ Real-world invoice variations to improve model robustness

    ✅ Ideal for machine learning workflows in finance, ERP, and accounting systems

    ✅ Supports rapid prototyping for invoice understanding models

    Ideal For

    Researchers working on OCR and document understanding

    Developers building invoice processing systems

    Machine learning engineers fine-tuning models for data extraction

    Startups and enterprises automating financial workflows
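
    For the key-value extraction benchmarking mentioned above, one common baseline metric is per-field exact-match accuracy after light normalization. The sketch below assumes the optional annotations can be read as flat dicts keyed by hypothetical names (invoice_number, date, total_amount, vendor_name); the dataset's real annotation schema is not described here.

```python
from typing import Dict, List

# Hypothetical keys mirroring the annotation examples in the description.
FIELDS = ("invoice_number", "date", "total_amount", "vendor_name")


def normalize(value: str) -> str:
    """Collapse whitespace and lowercase before comparing."""
    return " ".join(value.split()).lower()


def field_accuracy(preds: List[Dict[str, str]], golds: List[Dict[str, str]]) -> Dict[str, float]:
    """Exact-match accuracy per field; a missing prediction counts as wrong."""
    if len(preds) != len(golds):
        raise ValueError("predictions and references must be aligned")
    scores: Dict[str, float] = {}
    for field in FIELDS:
        pairs = [(p.get(field, ""), g[field]) for p, g in zip(preds, golds) if field in g]
        if pairs:
            scores[field] = sum(normalize(p) == normalize(g) for p, g in pairs) / len(pairs)
    return scores
```

    Fields absent from the reference annotations are skipped rather than scored, which matters because the description calls the annotations optional.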

  5. invoices-donut-data-v1

    • huggingface.co
    • opendatalab.com
    Updated May 11, 2023
    + more versions
    Cite
    Katana ML (2023). invoices-donut-data-v1 [Dataset]. https://huggingface.co/datasets/katanaml-org/invoices-donut-data-v1
    Available formats: Croissant (a metadata format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    May 11, 2023
    Dataset authored and provided by
    Katana ML
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for Invoices (Sparrow)

    This dataset contains 500 invoice documents annotated and processed to be ready for fine-tuning the Donut ML model. The annotation and data preparation were done by the Katana ML team. Sparrow is an open-source data extraction solution by Katana ML. Original dataset info: Kozłowski, Marek; Weichbroth, Paweł (2021), “Samples of electronic invoices”, Mendeley Data, V2, doi: 10.17632/tnj49gpmtz.2
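
    Donut is trained to emit structured output as a flat sequence of XML-like tags. The sketch below shows the JSON-to-token serialization commonly used in public Donut fine-tuning tutorials; it assumes each record's ground truth is a JSON string with a "gt_parse" key, which is an assumption about this dataset's columns, and the sample field names are hypothetical.

```python
import json
from typing import Any


def json_to_donut_tokens(obj: Any, sort_keys: bool = True) -> str:
    """Serialize nested dicts/lists into <s_key>...</s_key> tags, lists joined by <sep/>."""
    if isinstance(obj, dict):
        keys = sorted(obj) if sort_keys else list(obj)
        return "".join(
            f"<s_{k}>{json_to_donut_tokens(obj[k], sort_keys)}</s_{k}>" for k in keys
        )
    if isinstance(obj, list):
        return "<sep/>".join(json_to_donut_tokens(item, sort_keys) for item in obj)
    return str(obj)


def target_from_ground_truth(ground_truth: str) -> str:
    """Assumed layout: a JSON string whose "gt_parse" value holds the fields."""
    return json_to_donut_tokens(json.loads(ground_truth)["gt_parse"])
```

    In full Donut pipelines the generated `<s_...>` tags are usually also added to the tokenizer as special tokens; that step is omitted here.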

  6. invoice_sample

    • kaggle.com
    zip
    Updated Feb 23, 2023
    + more versions
    Cite
    Prashanth Sheri (2023). invoice_sample [Dataset]. https://www.kaggle.com/datasets/prashanthsheri/invoice-sample
    Available download formats: zip (21646 bytes)
    Dataset updated
    Feb 23, 2023
    Authors
    Prashanth Sheri
    Description

    Dataset

    This dataset was created by Prashanth Sheri

    Contents

  7. Company Documents Dataset

    • kaggle.com
    zip
    Updated May 23, 2024
    + more versions
    Cite
    Ayoub Cherguelaine (2024). Company Documents Dataset [Dataset]. https://www.kaggle.com/datasets/ayoubcherguelaine/company-documents-dataset
    Available download formats: zip (9789538 bytes)
    Dataset updated
    May 23, 2024
    Authors
    Ayoub Cherguelaine
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Overview

    This dataset contains a collection of over 2,000 company documents, categorized into four main types: invoices, inventory reports, purchase orders, and shipping orders. Each document is provided in PDF format, accompanied by a CSV file that includes the text extracted from these documents, their respective labels, and the word count of each document. This dataset is ideal for various natural language processing (NLP) tasks, including text classification, information extraction, and document clustering.

    Dataset Content

    PDF Documents: The dataset includes 2,677 PDF files, each representing a unique company document. These documents are derived from the Northwind dataset, which is commonly used for demonstrating database functionalities.

    The document types are:

    • Invoices: Detailed records of transactions between a buyer and a seller.
    • Inventory Reports: Records of inventory levels, including items in stock and units sold.
    • Purchase Orders: Requests made by a buyer to a seller to purchase products or services.
    • Shipping Orders: Instructions for the delivery of goods to specified recipients.

    Example Entries

    Here are a few example entries from the CSV file:

    Shipping Order:

    • Order ID: 10718
    • Shipping Details: "Ship Name: Königlich Essen, Ship Address: Maubelstr. 90, Ship City: ..."
    • Word Count: 120

    Invoice:

    • Order ID: 10707
    • Customer Details: "Customer ID: Arout, Order Date: 2017-10-16, Contact Name: Th..."
    • Word Count: 66

    Purchase Order:

    • Order ID: 10892
    • Order Details: "Order Date: 2018-02-17, Customer Name: Catherine Dewey, Products: Product ..."
    • Word Count: 26

    Applications

    This dataset can be used for:

    • Text Classification: Train models to classify documents into their respective categories.
    • Information Extraction: Extract specific fields and details from the documents.
    • Document Clustering: Group similar documents together based on their content.
    • OCR and Text Mining: Improve OCR (Optical Character Recognition) models and text mining techniques using real-world data.
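
    As a baseline for the text-classification use case, a multinomial Naive Bayes model over the extracted text is a reasonable first sketch. The CSV column names "text" and "label" in load_csv are assumptions (the description only says the CSV holds the extracted text, labels, and word counts), and the tokenizer is deliberately simple.

```python
import csv
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def load_csv(path: str) -> Tuple[List[str], List[str]]:
    """Read (text, label) pairs; the column names are hypothetical."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    return [r["text"] for r in rows], [r["label"] for r in rows]


def tokenize(text: str) -> List[str]:
    """Lowercase and keep alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: List[str], labels: List[str]) -> "NaiveBayes":
        self.class_counts: Counter = Counter(labels)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {label: sum(c.values()) for label, c in self.word_counts.items()}
        return self

    def log_posterior(self, text: str, label: str) -> float:
        """Unnormalized log P(label | text); words outside the vocabulary are skipped."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.totals[label] + len(self.vocab)
        for word in tokenize(text):
            if word in self.vocab:
                score += math.log((self.word_counts[label][word] + 1) / denom)
        return score

    def predict(self, text: str) -> str:
        return max(self.class_counts, key=lambda label: self.log_posterior(text, label))
```

    Because the documents are rendered from Northwind templates, wording is highly regular per type, so even this simple model may separate the four classes well; a held-out split is still needed to confirm that.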

  8. Invoices Data

    • kaggle.com
    Updated Sep 3, 2023
    Cite
    Ghassen Khaled (2023). Invoices Data [Dataset]. https://www.kaggle.com/datasets/ghassenkhaled/invoices-data
    Available formats: Croissant (a metadata format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Sep 3, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ghassen Khaled
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset relates to financial transactions or invoices and includes information about the invoiced parties, services, and financial details. How it is used depends on your specific analysis or use case.

  9. Sample Receipt

    • catalog.data.gov
    • fisheries.noaa.gov
    • +1 more
    Updated Oct 19, 2024
    + more versions
    Cite
    (Point of Contact, Custodian) (2024). Sample Receipt [Dataset]. https://catalog.data.gov/dataset/sample-receipt1
    Dataset updated
    Oct 19, 2024
    Dataset provided by
    (Point of Contact, Custodian)
    Description

    Each sample received by NSIL is assigned a laboratory number, and the sample custodian initiates a case file. The case file contains all relevant paperwork for that sample, including the sample submission sheet, laboratory raw data worksheets, the final results report, and any other relevant documentation. The sample custodian enters the client information into the NSIL sample tracking system (Sample receipt database) and generates the appropriate client and sample receipt information. The laboratory analysts perform the appropriate analyses and record the results and whether they are compliant or non-compliant with the assigned acceptance levels. The analysts also record the charges and the analytical and quality assurance units used to complete all analyses. The database is used to track samples analyzed by NSIL from sample receipt to reporting of results. It tracks the number of samples, the number of analytical units, types of samples, the purpose of sampling, and analytical costs.
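
    The workflow above (assign a lab number, open a case file, record results with compliance and charges, then report) maps naturally onto a small record type. The sketch below is an illustrative model only: the lab-number format, the field names, and treating an acceptance level as a maximum allowed value are all hypothetical, not the actual NSIL database schema.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List


@dataclass
class Result:
    analysis: str
    value: float
    acceptance_limit: float  # hypothetical: treated as the maximum allowed value

    @property
    def compliant(self) -> bool:
        return self.value <= self.acceptance_limit


@dataclass
class CaseFile:
    lab_number: str
    client: str
    results: List[Result] = field(default_factory=list)
    analytical_units: int = 0
    charges: float = 0.0
    status: str = "received"


class SampleRegistry:
    """Tracks samples from receipt to reporting (illustrative only)."""

    def __init__(self, prefix: str = "NSIL") -> None:
        self.prefix = prefix
        self._next_id = count(1)
        self.cases: Dict[str, CaseFile] = {}

    def receive(self, client: str) -> CaseFile:
        # Assign a laboratory number and open a case file for the sample.
        lab_number = f"{self.prefix}-{next(self._next_id):05d}"
        case = CaseFile(lab_number, client)
        self.cases[lab_number] = case
        return case

    def record(self, lab_number: str, result: Result, units: int, charge: float) -> None:
        # Log an analysis result plus the analytical units and charges it consumed.
        case = self.cases[lab_number]
        case.results.append(result)
        case.analytical_units += units
        case.charges += charge
        case.status = "analyzed"

    def report(self, lab_number: str) -> bool:
        # Close the case; True only if every recorded result meets its acceptance level.
        case = self.cases[lab_number]
        case.status = "reported"
        return all(r.compliant for r in case.results)
```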

  10. Inv3d: a high-resolution 3d invoice dataset for template-guided single-image...

    • service.tib.eu
    Updated Nov 28, 2024
    + more versions
    Cite
    (2024). Inv3d: a high-resolution 3d invoice dataset for template-guided single-image document unwarping - meta data - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/rdr-doi-10-35097-1730
    Dataset updated
    Nov 28, 2024
    Description

    Abstract: Numerous business workflows involve printed forms, such as invoices or receipts, which are often digitized manually so that the data can be searched or stored persistently. As hardware scanners are costly and inflexible, smartphones are increasingly used for digitization. Here, processing algorithms need to deal with prevailing environmental factors, such as shadows or crumples. Current state-of-the-art approaches learn supervised image dewarping models based on pairs of raw images and rectification meshes. The available results show promising predictive accuracies for dewarping, but the remaining errors still lead to sub-optimal information retrieval. In this paper, we explore the potential of improving dewarping models using additional, structured information in the form of invoice templates. We provide two core contributions: (1) a novel dataset, referred to as Inv3D, comprising synthetic and real-world high-resolution invoice images with structural templates, rectification meshes, and a multiplicity of per-pixel supervision signals, and (2) a novel image dewarping algorithm, which extends the state-of-the-art approach GeoTr to leverage structural templates using attention. Our extensive evaluation includes an implementation of DewarpNet and shows that exploiting structured templates can improve the performance of image dewarping. We report superior performance for the proposed algorithm on our new benchmark for all metrics, including an improved local distortion of 26.1 %. We made our new dataset and all code publicly available at https://felixhertlein.github.io/inv3d.

    Technical remarks: each sample contains the following files:

    • "flat_document.png" (2200x1700x3, uint8, 0-255): the document in perfect condition.
    • "flat_information_delta.png" (2200x1700x3, uint8, 0-255): all texts that represent invoice data.
    • "flat_template.png" (2200x1700x3, uint8, 0-255): the empty invoice template.
    • "flat_text_mask.png" (2200x1700x3, uint8, 0-255): all texts shown in the given document.
    • "warped_angle.png" (1600x1600x2, float32, -Pi to Pi): warping-induced x- and y-axis angles.
    • "warped_albedo.png" (1600x1600x3, uint8, 0-255): albedo map.
    • "warped_BM.npz" (1600x1600x2, float32, 0-1): backward mapping, i.e. the relative pixel shift from the warped to the normalized image for each pixel.
    • "warped_curvature.npz" (1600x1600x1, float32, 0-inf): pixel-wise curvature of the warped document.
    • "warped_depth.npz" (1600x1600x3, float32, 0-inf): per-pixel depth between camera and document.
    • "warped_document.png" (1600x1600x3, uint8, 0-255): the warped document.
    • "warped_normal.npz" (1600x1600x3, float32, -inf to inf): warped document normals.
    • "warped_recon.png" (1600x1600x3, uint8, 0-255): chess-textured warped document.
    • "warped_text_mask.npz" (1600x1600x1, bool8, True/False): boolean text pixel mask.
    • "warped_UV.npz" (1600x1600x3, float32, 0-1): warped texture coordinates.
    • "warped_WC.npz" (1600x1600x3, float32, -inf to inf): document coordinates in 3D space.

    For more details see https://github.com/FelixHertlein/inv3d-generator. Released under CC BY-NC-SA 4.0. Excluded files are listed in 'restricted-license-files.txt' (located in the record with DOI 10.35097/1730, "Inv3D: a high-resolution 3D invoice dataset for template-driven Single-Image Document Unwarping - Metadata"). These are for academic use only.
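
    A backward map tells, for every pixel of the rectified output, where to sample in the warped photo. The sketch below applies such a map with nearest-neighbour sampling in NumPy. It assumes that channel 0 of warped_BM holds normalized x and channel 1 normalized y, both in [0, 1] over the warped image, and that the array is the first entry in the .npz archive; check these conventions against the inv3d-generator repository before relying on them. Real pipelines typically use bilinear sampling (for example cv2.remap or grid_sample) instead.

```python
import numpy as np


def load_npz_array(path: str) -> np.ndarray:
    """Return the first array stored in an .npz file (the key name is not assumed)."""
    with np.load(path) as data:
        return data[data.files[0]]


def unwarp_nearest(warped: np.ndarray, bm: np.ndarray) -> np.ndarray:
    """Sample `warped` at the normalized (x, y) positions stored in `bm` (H' x W' x 2)."""
    h, w = warped.shape[:2]
    # Assumed convention: bm[..., 0] is x (column), bm[..., 1] is y (row), both in [0, 1].
    xs = np.clip(np.rint(bm[..., 0] * (w - 1)).astype(int), 0, w - 1)
    ys = np.clip(np.rint(bm[..., 1] * (h - 1)).astype(int), 0, h - 1)
    # Fancy indexing gathers one source pixel per output pixel.
    return warped[ys, xs]
```

    An identity map (each output pixel pointing at its own location) returns the input unchanged, which is a quick sanity check for the channel-order assumption.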

  11. 7000_invoice_images_with_json

    • huggingface.co
    Updated Mar 6, 2025
    Cite
    Ananthakrishnan P V (2025). 7000_invoice_images_with_json [Dataset]. https://huggingface.co/datasets/Ananthu01/7000_invoice_images_with_json
    Dataset updated
    Mar 6, 2025
    Authors
    Ananthakrishnan P V
    Description

    This dataset contains 7000 invoice images and their corresponding JSON files. There are 7 types of invoices in this dataset, each with 1000 examples. The data content in the invoices was generated using Python Faker. If you do not want to download the dataset as parquet (the default download format) and instead want it in its original format (a folder containing the 2 subfolders, image and json), use the following code: from huggingface_hub import snapshot_download… See the full description on the dataset page: https://huggingface.co/datasets/Ananthu01/7000_invoice_images_with_json.
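
    Once the original image and json subfolders are on disk, each image has to be matched with its JSON label. A minimal sketch, assuming that an image and its label share the same filename stem (for example a hypothetical inv_1.png and inv_1.json); it fails loudly on any file without a partner rather than silently dropping it.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def pair_by_stem(image_paths: Iterable[Path], json_paths: Iterable[Path]) -> List[Tuple[Path, Path]]:
    """Match files that share a stem; raise if either side has an orphan."""
    images = {p.stem: p for p in image_paths}
    labels = {p.stem: p for p in json_paths}
    unmatched = sorted(images.keys() ^ labels.keys())
    if unmatched:
        raise ValueError(f"files without a partner: {unmatched}")
    return [(images[stem], labels[stem]) for stem in sorted(images)]


def pair_dataset_folder(root: Path) -> List[Tuple[Path, Path]]:
    """Apply pair_by_stem to the "image" and "json" subfolders named on the dataset card."""
    images = [p for p in (root / "image").iterdir() if p.is_file()]
    labels = list((root / "json").glob("*.json"))
    return pair_by_stem(images, labels)
```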

  12. FATURA Dataset

    • zenodo.org
    zip
    Updated Nov 21, 2023
    Cite
    Mahmoud Limam; Marwa Dhiaf; Yousri Kessentini (2023). FATURA Dataset [Dataset]. http://doi.org/10.5281/zenodo.8261508
    Available download formats: zip
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mahmoud Limam; Marwa Dhiaf; Yousri Kessentini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset consists of 10000 jpg images and 3x10000 json annotation files. The images are generated from 50 different templates. For each template, 200 images were generated. We provide annotations in three formats: our own original format, the COCO format and a format compatible with HuggingFace Transformers.

    In terms of objects, the dataset contains 24 different classes. The classes vary considerably in their numbers of occurrences and thus, the dataset is somewhat imbalanced.

    The annotations contain bounding box coordinates, bounding box text and object classes.

    We propose two strategies for training and evaluating models. The models were trained until convergence, i.e., until the model reached its best performance on the validation split and started overfitting. The model version used for evaluation is the one with the best validation performance.

    First Evaluation strategy:
    For each template, the generated images are randomly split into 3 subsets: training, validation and testing.
    In this scenario, the model trains on all templates and is thus tested on new images rather than new layouts.

    Second Evaluation strategy:
    The real templates are randomly split into a training set, and a common set of templates for validation and testing. All the variants created from the training templates are used as training dataset. The same is done to form the validation and testing datasets. The validation and testing sets are made up of the same templates but of different images.
    This approach tests the models' performance on different unseen templates/layouts, rather than the same templates with different content.

    We provide the data splits we used for every evaluation scenario. We also provide the background colors we used as augmentation for each template.
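
    The two evaluation strategies differ only in what is held out: images within every template, or whole templates. The sketch below builds both kinds of split over (template_id, image_index) pairs for 50 templates with 200 images each. The 70/15/15 image fractions, the 70% share of training templates, and the even val/test halving are hypothetical choices; the authors' own split files should be used for comparable results.

```python
import random
from typing import Dict, List, Sequence, Tuple

Item = Tuple[int, int]  # (template_id, image_index)
Split = Dict[str, List[Item]]


def _cut(items: List[Item], fractions: Sequence[float]) -> Tuple[List[Item], List[Item], List[Item]]:
    """Split a shuffled list into train/val/test by the first two fractions."""
    n = len(items)
    a = round(n * fractions[0])
    b = a + round(n * fractions[1])
    return items[:a], items[a:b], items[b:]


def split_by_image(n_templates: int = 50, per_template: int = 200,
                   fractions: Sequence[float] = (0.7, 0.15), seed: int = 0) -> Split:
    """Strategy 1: every template appears in all splits; test images are unseen."""
    rng = random.Random(seed)
    out: Split = {"train": [], "val": [], "test": []}
    for t in range(n_templates):
        imgs = [(t, i) for i in range(per_template)]
        rng.shuffle(imgs)
        train, val, test = _cut(imgs, fractions)
        out["train"] += train
        out["val"] += val
        out["test"] += test
    return out


def split_by_template(n_templates: int = 50, per_template: int = 200,
                      train_fraction: float = 0.7, seed: int = 0) -> Split:
    """Strategy 2: held-out templates are shared by val and test, with different images."""
    rng = random.Random(seed)
    templates = list(range(n_templates))
    rng.shuffle(templates)
    k = round(n_templates * train_fraction)
    out: Split = {"train": [(t, i) for t in templates[:k] for i in range(per_template)],
                  "val": [], "test": []}
    for t in templates[k:]:
        imgs = [(t, i) for i in range(per_template)]
        rng.shuffle(imgs)
        half = per_template // 2
        out["val"] += imgs[:half]
        out["test"] += imgs[half:]
    return out
```

    With these assumed fractions both strategies yield 7000/1500/1500 images, but only the second keeps the test layouts entirely unseen during training.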

  13. Company Invoice Data API | 88M+ Companies | 18 European Countries | 20+ Data...

    • datarade.ai
    .json
    Cite
    HitHorizons, Company Invoice Data API | 88M+ Companies | 18 European Countries | 20+ Data Points | Monthly Updated | GDPR-Compliant [Dataset]. https://datarade.ai/data-products/hithorizons-company-invoice-data-api-20m-companies-10-hithorizons
    Available download formats: .json
    Dataset authored and provided by
    HitHorizons
    Area covered
    Poland, France, United Kingdom
    Description

    HitHorizons Invoice Data API gives access to aggregated company data on 88M+ companies from 18 countries.

    Available countries:

    France, United Kingdom, Germany, Poland, Czech Republic, Hungary, Slovakia, Latvia, Estonia, Austria

    Parameters:

    Id, Company Name, Company Alternative Name, Street Address, Street Number, Location, District, Region, Postal Code, City, Country, Status, Incorporation Date, Dissolution Date, National ID, Tax ID, VAT ID, Parent ID, Idents, Inactive, Company Type, Company Type Normalized

    Parameters may vary depending on the country.

  14. Invoices for Open Market Order (OMO) Charges

    • catalog.data.gov
    • data.cityofnewyork.us
    Updated Nov 22, 2025
    + more versions
    Cite
    data.cityofnewyork.us (2025). Invoices for Open Market Order (OMO) Charges [Dataset]. https://catalog.data.gov/dataset/invoices-for-open-market-order-omo-charges
    Dataset updated
    Nov 22, 2025
    Dataset provided by
    data.cityofnewyork.us
    Description

    Contains information about invoices submitted to HPD (the NYC Department of Housing Preservation and Development) by private contractors under an OMO. This is part of the HPD Charge Data collection of data tables.

  15. Inv3d: a high-resolution 3d invoice dataset for template-guided single-image...

    • service.tib.eu
    Updated Nov 28, 2024
    + more versions
    Cite
    (2024). Inv3d: a high-resolution 3d invoice dataset for template-guided single-image document unwarping - train split part 2 of 4 - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/rdr-doi-10-35097-1694
    Dataset updated
    Nov 28, 2024
    Description

    Abstract and technical remarks: identical to those of item 10 (the Inv3D metadata record, DOI 10.35097/1730).

  16. OCR Receipts Text Detection - retail dataset

    • kaggle.com
    zip
    Updated Sep 18, 2023
    + more versions
    Cite
    Unique Data (2023). OCR Receipts Text Detection - retail dataset [Dataset]. https://www.kaggle.com/datasets/trainingdatapro/ocr-receipts-text-detection
    Available download formats: zip (55182311 bytes)
    Dataset updated
    Sep 18, 2023
    Authors
    Unique Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    OCR Receipts from Grocery Stores Text Detection - retail dataset

    The Grocery Store Receipts Dataset is a collection of photos captured from various grocery store receipts. This dataset is specifically designed for tasks related to Optical Character Recognition (OCR) and is useful for retail.

    💴 For commercial usage: to discuss your requirements, learn about pricing, and buy the dataset, leave a request on our website.

    Each image in the dataset is accompanied by bounding box annotations, indicating the precise locations of specific text segments on the receipts. The text segments are categorized into four classes: item, store, date_time and total.


    Dataset structure

    • images - contains the original images of receipts
    • boxes - includes bounding box labeling for the original images
    • annotations.xml - contains coordinates of the bounding boxes and detected text, created for the original photo

    Data Format

    Each image in the images folder is accompanied by an XML annotation in the annotations.xml file indicating the coordinates of the bounding boxes and the detected text. For each point, the x and y coordinates are provided.

    Classes:

    • store - name of the grocery store
    • item - item in the receipt
    • date_time - date and time of the receipt
    • total - total price of the receipt
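
    The exact layout of annotations.xml is not documented above. The sketch below assumes a CVAT-style structure (`<image name=...>` elements containing `<box label=... xtl=... ytl=... xbr=... ybr=...>` with an `<attribute name="text">` child). That structure is an assumption to verify against the actual file before use.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TextBox:
    label: str  # expected: store, item, date_time, or total
    xtl: float
    ytl: float
    xbr: float
    ybr: float
    text: str


def parse_annotations(xml_text: str) -> Dict[str, List[TextBox]]:
    """Map each image name to its labeled boxes (assumes CVAT-style <box> elements)."""
    root = ET.fromstring(xml_text)
    result: Dict[str, List[TextBox]] = {}
    for image in root.iter("image"):
        boxes = []
        for box in image.iter("box"):
            # ElementTree supports simple [@attr='value'] predicates in find().
            text_node = box.find("attribute[@name='text']")
            boxes.append(
                TextBox(
                    label=box.get("label", ""),
                    xtl=float(box.get("xtl")),
                    ytl=float(box.get("ytl")),
                    xbr=float(box.get("xbr")),
                    ybr=float(box.get("ybr")),
                    text=(text_node.text or "") if text_node is not None else "",
                )
            )
        result[image.get("name", "")] = boxes
    return result
```

    For a file on disk, `ET.tostring(ET.parse(path).getroot(), encoding="unicode")` yields a string for this function, or the loop can be run directly on the parsed root.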


    Text detection on the receipts can be tailored to your requirements.

    🧩 This is just an example of the data. Leave a request here to learn more

    🚀 You can learn more about our high-quality unique datasets here

    keywords: receipts reading, retail dataset, consumer goods dataset, grocery store dataset, supermarket dataset, deep learning, retail store management, pre-labeled dataset, annotations, text detection, text recognition, optical character recognition, document text recognition, detecting text-lines, object detection, scanned documents, deep-text-recognition, text area detection, text extraction, images dataset, image-to-text, object detection

  17. CCP Invoice Register

    • splitgraph.com
    • datacatalog.cookcountyil.gov
    • +2 more
    Updated Oct 1, 2024
    + more versions
    Cite
    datacatalog-cookcountyil-gov (2024). CCP Invoice Register [Dataset]. https://www.splitgraph.com/datacatalog-cookcountyil-gov/ccp-invoice-register-exta-e29u/
    Available download formats: json, application/openapi+json, application/vnd.splitgraph.image
    Dataset updated
    Oct 1, 2024
    Authors
    datacatalog-cookcountyil-gov
    Description

    This dataset provides a cumulative record of payment and invoice details for County suppliers, vendors and other payees. Data is from December 1, 2016 to present. Payment data prior to December 1, 2016 is archived in the Cook County Check Register here: https://datacatalog.cookcountyil.gov/Finance-Administration/Cook-County-Check-Register/gywr-fjeh

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power web applications. See the Splitgraph documentation for more information.

  18. Invoice Processing Software Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 22, 2025
    Cite
    Growth Market Reports (2025). Invoice Processing Software Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/invoice-processing-software-market
    Available download formats: pdf, pptx, csv
    Dataset updated
    Aug 22, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Invoice Processing Software Market Outlook

    According to our latest research, the global invoice processing software market size was valued at USD 4.2 billion in 2024, with a robust compound annual growth rate (CAGR) of 12.8% anticipated through 2033. By 2033, the market is forecasted to reach USD 12.4 billion, driven by rapid digital transformation initiatives, increasing demand for automation in finance departments, and a growing emphasis on operational efficiency. The adoption of cloud-based solutions and the integration of artificial intelligence into invoice management platforms are among the key factors fueling this consistent growth trajectory, as organizations across various sectors seek to streamline their accounts payable processes and reduce manual intervention.
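The headline figures follow the standard compound-growth relation, value_2033 = value_2024 * (1 + CAGR)^9. A short Python check, assuming annual compounding over the nine years from 2024 to 2033 (the report does not state its convention), confirms that USD 4.2 billion growing at 12.8% per year reaches roughly USD 12.4 billion, and that inverting the formula recovers the stated rate.

```python
def project_value(base: float, cagr: float, years: int) -> float:
    """Compound a starting value forward: base * (1 + cagr) ** years."""
    return base * (1.0 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Invert the compounding formula: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    years = 2033 - 2024  # nine annual compounding periods (assumed convention)
    # Figures from the report: USD 4.2 billion in 2024 and a 12.8% CAGR.
    print(f"{project_value(4.2, 0.128, years):.1f}")  # 12.4
    print(f"{implied_cagr(4.2, 12.4, years):.1%}")    # 12.8%
```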

    One of the primary growth drivers for the invoice processing software market is the accelerating shift towards digital transformation across enterprises globally. As organizations increasingly seek to automate their financial workflows, invoice processing software is emerging as a critical tool for enhancing accuracy, reducing errors, and minimizing the time required for invoice approvals and payments. The integration of advanced technologies such as artificial intelligence (AI), machine learning (ML), and robotic process automation (RPA) into invoice processing solutions is enabling businesses to extract data from invoices more efficiently, detect anomalies, and ensure compliance with regulatory requirements. This not only streamlines the overall process but also provides actionable insights for better financial decision-making, further propelling market growth.

    Another significant factor contributing to the expansion of the invoice processing software market is the rising adoption of cloud-based solutions. Cloud deployment offers several advantages, including scalability, cost-effectiveness, and remote accessibility, making it an attractive option for organizations of all sizes. The COVID-19 pandemic further accelerated the migration to cloud-based platforms, as businesses prioritized remote work capabilities and digital collaboration tools. As a result, vendors are increasingly focusing on developing cloud-native invoice processing solutions with enhanced security features and seamless integration capabilities. This trend is particularly pronounced among small and medium enterprises (SMEs), which often lack the resources to maintain complex on-premises infrastructure but require robust solutions to manage their invoicing workflows efficiently.

    The invoice processing software market is also benefiting from the growing need for compliance and risk management in the face of evolving regulatory landscapes. With stricter regulations around financial reporting, tax compliance, and data privacy, organizations are under pressure to implement systems that ensure transparency and auditability in their accounts payable processes. Invoice processing software provides automated audit trails, reduces the risk of fraud, and helps organizations adhere to local and international compliance standards. This is especially crucial for industries such as banking, financial services, and insurance (BFSI), healthcare, and government, where regulatory scrutiny is particularly high. Consequently, the demand for robust invoice processing solutions is expected to remain strong across these verticals.

    From a regional perspective, North America currently dominates the invoice processing software market, accounting for the largest share in 2024. This is attributed to the presence of major technology vendors, high adoption rates of automation solutions, and a mature digital infrastructure in the region. Europe follows closely, driven by stringent regulatory requirements and a strong focus on process optimization within enterprises. The Asia Pacific region is expected to exhibit the fastest growth over the forecast period, fueled by the rapid digitalization of businesses, increasing investments in cloud technology, and the proliferation of SMEs. Latin America and the Middle East & Africa are also witnessing steady adoption, supported by ongoing efforts to modernize financial operations and improve business efficiency.

  19. GCP-Cloud-Billing-Data

    • kaggle.com
    zip
    Updated Aug 30, 2024
    SAIRAM N (2024). GCP-Cloud-Billing-Data [Dataset]. https://www.kaggle.com/datasets/sairamn19/gcp-cloud-billing-data
    Available download formats: zip (59,956 bytes)
    Dataset updated
    Aug 30, 2024
    Authors
    SAIRAM N
    License

    https://cdla.io/sharing-1-0/

    Description

    This dataset highlights the importance of using Google Cloud Platform (GCP) billing data effectively to gain actionable insights into cloud spending. It emphasizes strategic cost management and offers guidance on analyzing billing data, optimizing resource usage, and applying best practices that minimize costs while maximizing the value derived from cloud services. It is aimed at businesses and technical teams that want to maintain financial control and improve their cloud operations.

    This dataset contains GCP cloud billing cost data. For an updated version, comment on the dataset page or contact the author.
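The description centres on analyzing billing data to see where cloud spend goes. As a minimal sketch of that first step, the Python below totals cost per service from a CSV export using only the standard library. The column names `Service Description` and `Cost` are hypothetical (the listing does not document the schema), and the three-row sample is illustrative, not taken from the dataset.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable

# Hypothetical column names: inspect the header of the actual download and adjust.
SERVICE_COL = "Service Description"
COST_COL = "Cost"


def cost_by_service(lines: Iterable[str]) -> Dict[str, float]:
    """Sum the cost column per service and return totals, largest first."""
    totals: Dict[str, float] = defaultdict(float)
    for row in csv.DictReader(lines):
        cost = (row.get(COST_COL) or "").strip()
        if not cost:
            continue  # skip rows with no cost value
        totals[row[SERVICE_COL]] += float(cost)
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Illustrative rows only, not real billing data.
    sample = io.StringIO(
        "Service Description,Cost\n"
        "Compute Engine,12.50\n"
        "Cloud Storage,3.25\n"
        "Compute Engine,7.50\n"
    )
    for service, total in cost_by_service(sample).items():
        print(f"{service}: {total:.2f}")
```

Ordering the totals from largest to smallest surfaces the biggest cost drivers first, a natural starting point for the cost reviews the description mentions.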

  20. Invoice

    • koncile.ai
    Updated Nov 20, 2025
    Koncile.ai (2025). Invoice [Dataset]. https://www.koncile.ai/en
    Dataset updated
    Nov 20, 2025
    Dataset provided by
    Koncile.ai
    License

    https://www.koncile.ai/en/termsandconditions

    Description

    Reliable, API-accessible AI software that extracts fields from PDF or image invoices and turns documents into actionable data.

