50 datasets found
  1. invoices-example

    • huggingface.co
    Cite
    Parsee.ai, invoices-example [Dataset]. https://huggingface.co/datasets/parsee-ai/invoices-example
    Dataset provided by
    Parsee.ai
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Invoices Sample Dataset

    This is a sample dataset generated on app.parsee.ai for invoices. The goal was to evaluate different LLMs on this RAG task using the Parsee evaluation tools. A full study can be found here: https://github.com/parsee-ai/parsee-datasets/blob/main/datasets/invoices/parsee-loader/README.md (parsee-core version used: 0.1.3.11). This dataset was created on the basis of 15 sample invoices (PDF files). All PDF files are publicly accessible on parsee.ai, to access them… See the full description on the dataset page: https://huggingface.co/datasets/parsee-ai/invoices-example.

  2. invoices-donut-data-v1

    • huggingface.co
    • opendatalab.com
    Updated May 11, 2023
    Cite
    Katana ML (2023). invoices-donut-data-v1 [Dataset]. https://huggingface.co/datasets/katanaml-org/invoices-donut-data-v1
    Dataset updated
    May 11, 2023
    Dataset authored and provided by
    Katana ML
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Card for Invoices (Sparrow)

    This dataset contains 500 invoice documents, annotated and processed so that they are ready for fine-tuning the Donut ML model. Annotation and data preparation were done by the Katana ML team. Sparrow is an open-source data extraction solution by Katana ML. Original dataset info: Kozłowski, Marek; Weichbroth, Paweł (2021), “Samples of electronic invoices”, Mendeley Data, V2, doi: 10.17632/tnj49gpmtz.2

  3. Invoice Management Dataset

    • universe.roboflow.com
    zip
    Updated Dec 28, 2024
    Cite
    CVIP Workspace (2024). Invoice Management Dataset [Dataset]. https://universe.roboflow.com/cvip-workspace/invoice-management
    Dataset updated
    Dec 28, 2024
    Dataset authored and provided by
    CVIP Workspace
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Variables measured
    Text Bounding Boxes
    Description

    Intelligent Invoice Management System

    Project Description:
    The Intelligent Invoice Management System is an advanced AI-powered platform designed to revolutionize traditional invoice processing. By automating the extraction, validation, and management of invoice data, this system addresses the inefficiencies, inaccuracies, and high costs associated with manual methods. It enables businesses to streamline operations, reduce human error, and expedite payment cycles.

    Problem Statement:
    Manual invoice processing involves labor-intensive tasks such as data entry, verification, and reconciliation. These processes are time-consuming, prone to errors, and can result in financial losses and delays. The diversity of invoice formats from various vendors adds complexity, making automation a critical need for efficiency and scalability.

    Proposed Solution:
    The Intelligent Invoice Management System automates the end-to-end process of invoice handling using AI and machine learning techniques. Core functionalities include:
    1. Invoice Generation: Automatically generate PDF invoices in at least four formats, populated with synthetic data.
    2. Data Development: Leverage a dataset containing fields such as receipt numbers, company details, sales tax information, and itemized tables to create realistic invoice samples.
    3. AI-Powered Labeling: Use Tesseract OCR to extract labeled data from invoice images, and train YOLO for label recognition, ensuring precise identification of fields.
    4. Database Integration: Store extracted information in a structured database for seamless retrieval and analysis.
    5. Web-Based Information System: Provide a user-friendly platform to upload invoices and retrieve key metrics, such as:
    - Total sales within a specified duration.
    - Total sales tax paid during a given timeframe.
    - Detailed invoice information in tabular form for specific date ranges.

    Key Features and Deliverables:
    1. Invoice Generation:
    - Generate 20,000 invoices using an automated script.
    - Include dummy logos, company details, and itemized tables for four items per invoice.
    2. Label Definition and Format:
    - Define structured labels (TBLR, CLASS Name, Recognized Text).
    - Provide labels in both XML and JSON formats for seamless integration.
    3. OCR and AI Training:
    - Automate labeling using Tesseract OCR for high-accuracy text recognition.
    - Train and test YOLO to detect and classify invoice fields (TBLR and CLASS).
    4. Database Management:
    - Store OCR-extracted labels and field data in a database.
    - Enable efficient search and aggregation of invoice data.
    5. Web-Based Interface:
    - Build a responsive system for users to upload invoices and retrieve data based on company name or NTN.
    - Display metrics and reports for total sales, tax paid, and invoice details over custom date ranges.

    Expected Outcomes:
    - Reduction in manual effort and operational costs.
    - Improved accuracy in invoice processing and financial reporting.
    - Enhanced scalability and adaptability for diverse invoice formats.
    - Faster turnaround time for invoice-related tasks.

    By automating critical aspects of invoice management, this system delivers a robust and intelligent solution to meet the evolving needs of businesses.
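
The label structure named in the project description (TBLR, CLASS Name, Recognized Text, delivered as XML and JSON) could look roughly like the sketch below. The field names, the reading of TBLR as top/bottom/left/right pixel coordinates, the XML element layout, and the sample values are all assumptions made for illustration; the dataset's actual label files may be organized differently.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict


@dataclass
class FieldLabel:
    # TBLR is read here as top, bottom, left, right pixel coordinates (assumed order).
    top: int
    bottom: int
    left: int
    right: int
    class_name: str
    text: str


def to_json(labels):
    """Serialize labels as a JSON array of objects."""
    return json.dumps([asdict(label) for label in labels], indent=2)


def to_xml(labels):
    """Serialize labels as <labels><label .../></labels> (hypothetical layout)."""
    root = ET.Element("labels")
    for label in labels:
        node = ET.SubElement(root, "label", {"class": label.class_name})
        for key in ("top", "bottom", "left", "right"):
            node.set(key, str(getattr(label, key)))
        node.text = label.text
    return ET.tostring(root, encoding="unicode")


# Made-up example label for an invoice-number field.
labels = [FieldLabel(40, 72, 310, 520, "invoice_number", "INV-0001")]
json_text = to_json(labels)
xml_text = to_xml(labels)
```

Keeping one record type and two serializers means the XML and JSON outputs cannot drift apart, which is the main point of defining the label structure up front.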

  4. Event Graph of BPI Challenge 2019

    • data.4tu.nl
    zip
    Updated Apr 22, 2021
    Cite
    Dirk Fahland (2021). Event Graph of BPI Challenge 2019 [Dataset]. http://doi.org/10.4121/14169614.v1
    Dataset updated
    Apr 22, 2021
    Dataset provided by
    4TU.ResearchData
    Authors
    Dirk Fahland
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Business process event data modeled as labeled property graphs

    Data Format
    -----------

    The dataset comprises one labeled property graph in two different file formats.

    #1) Neo4j .dump format

    A neo4j (https://neo4j.com) database dump that contains the entire graph and can be imported into a fresh neo4j database instance using the following command, see also the neo4j documentation: https://neo4j.com/docs/

    /bin/neo4j-admin.(bat|sh) load --database=graph.db --from=

    The .dump was created with Neo4j v3.5.

    #2) .graphml format

    A .zip file containing a .graphml file of the entire graph


    Data Schema
    -----------

    The graph is a labeled property graph over business process event data. Each graph uses the following concepts

    :Event nodes - each event node describes a discrete event, i.e., an atomic observation described by attribute "Activity" that occurred at the given "timestamp"

    :Entity nodes - each entity node describes an entity (e.g., an object or a user), it has an EntityType and an identifier (attribute "ID")

    :Log nodes - describes a collection of events that were recorded together, most graphs only contain one log node

    :Class nodes - each class node describes a type of observation that has been recorded, e.g., the different types of activities that can be observed, :Class nodes group events into sets of identical observations

    :CORR relationships - from :Event to :Entity nodes, describes whether an event is correlated to a specific entity; an event can be correlated to multiple entities

    :DF relationships - "directly-followed by" between two :Event nodes describes which event is directly-followed by which other event; both events in a :DF relationship must be correlated to the same entity node. All :DF relationships form a directed acyclic graph.

    :HAS relationship - from a :Log to an :Event node, describes which events had been recorded in which event log

    :OBSERVES relationship - from an :Event to a :Class node, describes to which event class an event belongs, i.e., which activity was observed in the graph

    :REL relationship - placeholder for any structural relationship between two :Entity nodes

    The concepts are further defined in Stefan Esser, Dirk Fahland: Multi-Dimensional Event Data in Graph Databases. CoRR abs/2005.14552 (2020) https://arxiv.org/abs/2005.14552
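
To make the :CORR and :DF concepts concrete, here is a minimal in-memory sketch. It assumes that :DF edges can be obtained by sorting the events correlated to one entity by timestamp and linking neighbours; the schema above only states that both events of a :DF relationship share an entity. The `Event` class, the `derive_df` helper, and the sample identifiers are hypothetical and not taken from the dataset.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str
    activity: str
    timestamp: int  # any sortable time value


def derive_df(events, corr):
    """Return (source_id, target_id, entity_id) triples.

    corr is a list of (event_id, entity_id) pairs, i.e. the :CORR edges.
    """
    by_id = {e.event_id: e for e in events}
    per_entity = defaultdict(list)
    for event_id, entity_id in corr:
        per_entity[entity_id].append(by_id[event_id])

    df = []
    for entity_id, evs in per_entity.items():
        # Order each entity's events in time, then link direct neighbours.
        evs.sort(key=lambda e: e.timestamp)
        for a, b in zip(evs, evs[1:]):
            df.append((a.event_id, b.event_id, entity_id))
    return df


# Illustrative events; one event can be correlated to several entities.
events = [
    Event("e1", "Create Purchase Order Item", 1),
    Event("e2", "Record Goods Receipt", 2),
    Event("e3", "Record Invoice Receipt", 3),
]
corr = [("e1", "item-1"), ("e2", "item-1"), ("e3", "item-1"),
        ("e1", "po-1"), ("e3", "po-1")]
df_edges = derive_df(events, corr)
```

Because e2 is not correlated to `po-1`, the purchase-order entity gets a direct e1 to e3 edge while the item entity sees e1, e2, e3 in sequence; this is why one event can have several outgoing :DF relationships, one per entity.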


    Data Contents
    -------------

    neo4j-bpic19-2021-02-17 (.dump|.graphml.zip)

    An integrated graph describing the raw event data of the entire BPI Challenge 2019 dataset.
    van Dongen, B.F. (Boudewijn) (2019): BPI Challenge 2019. 4TU.ResearchData. Collection. https://doi.org/10.4121/uuid:d06aff4b-79f0-45e6-8ec8-e19730c248f1

    This data originated from a large multinational company operating from The Netherlands in the area of coatings and paints, and we ask participants to investigate the purchase order handling process for some of its 60 subsidiaries. In particular, the process owner has compliance questions. In the data, each purchase order (or purchase document) contains one or more line items. For each line item, there are roughly four types of flows in the data:

    (1) 3-way matching, invoice after goods receipt: For these items, the value of the goods receipt message should be matched against the value of an invoice receipt message and the value put during creation of the item (indicated by both the GR-based IV flag and the Goods Receipt flag set to true).
    (2) 3-way matching, invoice before goods receipt: Purchase items that do require a goods receipt message, while they do not require GR-based invoicing (indicated by the GR-based IV flag set to false and the Goods Receipt flag set to true). For such purchase items, invoices can be entered before the goods are received, but they are blocked until goods are received. This unblocking can be done by a user, or by a batch process at regular intervals. Invoices should only be cleared if goods are received and the value matches the invoice and the value at creation of the item.
    (3) 2-way matching (no goods receipt needed): For these items, the value of the invoice should match the value at creation (in full or partially until PO value is consumed), but there is no separate goods receipt message required (indicated by both the GR-based IV flag and the Goods Receipt flag set to false).
    (4) Consignment: For these items, there are no invoices on PO level as this is handled fully in a separate process. Here the GR indicator is set to true but the GR IV flag is set to false, and the item type (consignment) also tells us that no invoice is expected against this item.

    Unfortunately, the complexity of the data goes further than just this division in four categories. For each purchase item, there can be many goods receipt messages and corresponding invoices which are subsequently paid. Consider for example the process of paying rent. There is a Purchase Document with one item for paying rent, but a total of 12 goods receipt messages with (cleared) invoices with a value equal to 1/12 of the total amount. For logistical services, there may even be hundreds of goods receipt messages for one line item. Overall, for each line item, the amounts of the line item, the goods receipt messages (if applicable) and the invoices have to match for the process to be compliant.

    Of course, the log is anonymized, but some semantics are left in the data, for example:
    - The resources are split between batch users and normal users, indicated by their name. The batch users are automated processes executed by different systems. The normal users refer to human actors in the process.
    - The monetary values of each event are anonymized from the original data using a linear translation respecting 0, i.e. addition of multiple invoices for a single item should still lead to the original item worth (although there may be small rounding errors for numerical reasons).
    - Company, vendor, system and document names and IDs are anonymized in a consistent way throughout the log. The company has the key, so any result can be translated by them to business insights about real customers and real purchase documents.
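
The four flow types and the amount-matching rule described above can be sketched as two small functions. This is only an illustration under stated assumptions: the function names, the boolean parameters, the rounding tolerance, and the sample amounts are made up here, and the 2-way case is simplified to a full match even though the description also allows partial consumption of the PO value.

```python
def item_category(gr_based_iv: bool, goods_receipt: bool, consignment: bool = False) -> str:
    """Map the two flags (and the item type) to one of the four described flows."""
    if consignment:
        # Described as: Goods Receipt true, GR-based IV false, no invoice expected.
        return "consignment"
    if gr_based_iv and goods_receipt:
        return "3-way match, invoice after GR"
    if goods_receipt:
        return "3-way match, invoice before GR"
    if not gr_based_iv:
        return "2-way match"
    # GR-based IV without Goods Receipt is not covered by the description.
    raise ValueError("flag combination not described in the dataset documentation")


def amounts_match(item_value, receipts, invoices, needs_receipt, tol=0.01):
    """Check that invoices (and goods receipts, if required) add up to the item value."""
    ok = abs(sum(invoices) - item_value) <= tol
    if needs_receipt:
        ok = ok and abs(sum(receipts) - item_value) <= tol
    return ok


# Rent example from the text: one item, 12 goods receipts and 12 invoices of 1/12 each.
rent_ok = amounts_match(1200.0, [100.0] * 12, [100.0] * 12, needs_receipt=True)
category = item_category(gr_based_iv=False, goods_receipt=True)
```

The tolerance parameter reflects the remark that anonymized monetary values may carry small rounding errors, so an exact equality test would be too strict.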

    The case ID is a combination of the purchase document and the purchase item. There is a total of 76,349 purchase documents containing in total 251,734 items, i.e. there are 251,734 cases. In these cases, there are 1,595,923 events relating to 42 activities performed by 627 users (607 human users and 20 batch users). Sometimes the user field is empty, or NONE, which indicates no user was recorded in the source system. For each purchase item (or case) the following attributes are recorded:

    - concept:name: A combination of the purchase document id and the item id
    - Purchasing Document: The purchasing document ID
    - Item: The item ID
    - Item Type: The type of the item
    - GR-Based Inv. Verif.: Flag indicating if GR-based invoicing is required (see above)
    - Goods Receipt: Flag indicating if 3-way matching is required (see above)
    - Source: The source system of this item
    - Doc. Category name: The name of the category of the purchasing document
    - Company: The subsidiary of the company from where the purchase originated
    - Spend classification text: A text explaining the class of purchase item
    - Spend area text: A text explaining the area for the purchase item
    - Sub spend area text: Another text explaining the area for the purchase item
    - Vendor: The vendor to which the purchase document was sent
    - Name: The name of the vendor
    - Document Type: The document type
    - Item Category: The category as explained above (3-way with GR-based invoicing, 3-way without, 2-way, consignment)

    The data contains the following entities and their events

    - PO - Purchase Order documents handled at a large multinational company operating from The Netherlands
    - POItem - an item in a Purchase Order document describing a specific item to be purchased
    - Resource - the user or worker handling the document or a specific item
    - Vendor - the external organization from which an item is to be purchased

    Data Size
    ---------

    BPIC19, nodes: 1926651, relationships: 15082099

  5. Inv3d: a high-resolution 3d invoice dataset for template-guided single-image...

    • service.tib.eu
    Updated Nov 28, 2024
    Cite
    (2024). Inv3d: a high-resolution 3d invoice dataset for template-guided single-image document unwarping - validation split - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/rdr-doi-10-35097-1686
    Dataset updated
    Nov 28, 2024
    Description

    Abstract: Numerous business workflows involve printed forms, such as invoices or receipts, which are often manually digitalized to persistently search or store the data. As hardware scanners are costly and inflexible, smartphones are increasingly used for digitalization. Here, processing algorithms need to deal with prevailing environmental factors, such as shadows or crumples. Current state-of-the-art approaches learn supervised image dewarping models based on pairs of raw images and rectification meshes. The available results show promising predictive accuracies for dewarping, but generated errors still lead to sub-optimal information retrieval. In this paper, we explore the potential of improving dewarping models using additional, structured information in the form of invoice templates. We provide two core contributions: (1) a novel dataset, referred to as Inv3D, comprising synthetic and real-world high-resolution invoice images with structural templates, rectification meshes, and a multiplicity of per-pixel supervision signals and (2) a novel image dewarping algorithm, which extends the state-of-the-art approach GeoTr to leverage structural templates using attention. Our extensive evaluation includes an implementation of DewarpNet and shows that exploiting structured templates can improve the performance for image dewarping. We report superior performance for the proposed algorithm on our new benchmark for all metrics, including an improved local distortion of 26.1 %. We made our new dataset and all code publicly available at https://felixhertlein.github.io/inv3d.

    TechnicalRemarks: Each sample contains the following files:

    - "flat_document.png" (2200x1700x3, uint8, 0-255), showcasing a document in perfect condition.
    - "flat_information_delta.png" displays all texts which represent invoice data (2200x1700x3, uint8, 0-255).
    - "flat_template.png" is an empty invoice template (2200x1700x3, uint8, 0-255).
    - "flat_text_mask.png" visually presents all texts shown in the given document (2200x1700x3, uint8, 0-255).
    - "warped_angle.png" shows warping-induced x- and y-axis angle (1600x1600x2, float32, -Pi to Pi).
    - "warped_albedo.png" is an albedo map (1600x1600x3, uint8, 0-255).
    - "warped_BM.npz" stores backward mapping, i.e. the relative pixel shift from warped to normalized image for each pixel (1600x1600x2, float32, 0-1).
    - "warped_curvature.npz" has pixel-wise curvature of the warped document (1600x1600x1, float32, 0-inf).
    - "warped_depth.npz" holds per-pixel depth between camera and document (1600x1600x3, float32, 0-inf).
    - "warped_document.png" displays the warped document (1600x1600x3, uint8, 0-255).
    - "warped_normal.npz" contains warped document normals (1600x1600x3, float32, -inf to inf).
    - "warped_recon.png" features a chess-textured warped document (1600x1600x3, uint8, 0-255).
    - "warped_text_mask.npz" is a boolean text pixel mask (1600x1600x1, bool8, True/False).
    - "warped_UV.npz" stores warped texture coordinates (1600x1600x3, float32, 0-1).
    - "warped_WC.npz" includes document coordinates in the 3D space (1600x1600x3, float32, -inf to inf).

    For more details see https://github.com/FelixHertlein/inv3d-generator. Released under CC BY-NC-SA 4.0. Excluded files are listed in 'restricted-license-files.txt' (located in record with DOI 10.35097/1730, "Inv3D: a high-resolution 3D invoice dataset for template-driven Single-Image Document Unwarping - Metadata"). These are for academic use only.

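As a rough illustration of how a backward mapping such as the one stored in "warped_BM.npz" is typically applied, the sketch below unwarps a tiny grid with nearest-neighbour lookup in plain Python. It assumes that each output pixel holds a normalized (x, y) position in the warped image, with values in the 0-1 range; the actual channel order, the meaning of "relative pixel shift", and the key names inside the .npz file are not stated in the description, so treat this as a sketch of the general technique and not as the dataset's reference code.

```python
def unwarp(warped, bm):
    """Nearest-neighbour backward mapping.

    warped: H_w x W_w grid (list of rows) of pixel values.
    bm:     H x W grid of (x, y) pairs in [0, 1]; assumed to give, for each
            output pixel, the normalized source position in the warped image.
    """
    h_w, w_w = len(warped), len(warped[0])
    out = []
    for bm_row in bm:
        row = []
        for x, y in bm_row:
            # Scale normalized coordinates to pixel indices and clamp to the image.
            col = min(w_w - 1, max(0, round(x * (w_w - 1))))
            r = min(h_w - 1, max(0, round(y * (h_w - 1))))
            row.append(warped[r][col])
        out.append(row)
    return out


# Toy 2x2 "image" and a mapping that mirrors it horizontally.
warped = [[1, 2],
          [3, 4]]
bm = [[(1.0, 0.0), (0.0, 0.0)],
      [(1.0, 1.0), (0.0, 1.0)]]
flat = unwarp(warped, bm)  # [[2, 1], [4, 3]]
```

For the real 1600x1600 arrays one would use a vectorized or bilinear variant; the loop form is kept here only to show the lookup direction, from output pixel back to source pixel.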
  6. Sample Receipt

    • fisheries.noaa.gov
    • catalog.data.gov
    Updated Mar 15, 2018
    Cite
    NMFS Office Of Sustainable Fisheries (2018). Sample Receipt [Dataset]. https://www.fisheries.noaa.gov/inport/item/28593
    Dataset updated
    Mar 15, 2018
    Dataset provided by
    National Marine Fisheries Service (https://www.fisheries.noaa.gov/)
    Time period covered
    2010 - May 29, 2125
    Area covered
    all over the continental US
    Description

    Each sample that is received by NSIL is assigned a laboratory number and a case file is initiated by the sample custodian. The case file will contain all relevant paperwork for that sample including the sample submission sheet, laboratory raw data worksheets, the final results report and any other relevant documentation. The sample custodian enters the client information into the NSIL Sample...

  7. Invoices for Open Market Order (OMO) Charges

    • catalog.data.gov
    • data.cityofnewyork.us
    Updated Jun 7, 2025
    Cite
    data.cityofnewyork.us (2025). Invoices for Open Market Order (OMO) Charges [Dataset]. https://catalog.data.gov/dataset/invoices-for-open-market-order-omo-charges
    Dataset updated
    Jun 7, 2025
    Dataset provided by
    data.cityofnewyork.us
    Description

    Contains information about invoices submitted to HPD by private contractors under an OMO. This is part of the HPD Charge Data collection of data tables.

  8. Dataset of invoices and receipts including annotation of relevant fields

    • zenodo.org
    zip
    Updated Apr 3, 2022
    Cite
    Francisco Cruz; Mauro Castelli (2022). Dataset of invoices and receipts including annotation of relevant fields [Dataset]. http://doi.org/10.5281/zenodo.6371710
    Dataset updated
    Apr 3, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Francisco Cruz; Mauro Castelli
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This is a dataset comprising 813 images of invoices and receipts of a private company in the Portuguese language. It also includes text files with the transcription of relevant fields for each document – seller name, seller address, seller tax identification, buyer tax identification, invoice date, invoice total amount, invoice tax amount, and document reference.

  9. Invoice and Credit Notes over £500 - Dataset - Angus Council Open Data

    • opendata.angus.gov.uk
    Updated Aug 3, 2011
    Cite
    (2011). Invoice and Credit Notes over £500 - Dataset - Angus Council Open Data [Dataset]. https://opendata.angus.gov.uk/dataset/invoice-and-credit-notes-over-f500
    Dataset updated
    Aug 3, 2011
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    Every six months, we publish a list of all invoices and credit notes over £500 we receive, providing details on supplier name, amount paid, invoice reference, gross amount and VAT amount. The list covers:
    - Spend on premises, transport, and supplies and services.
    - Payments to contractors who do work on our behalf.
    - Other spend we incur in carrying out our business.

    Some areas of spend are covered by the Data Protection Act and are not published in full. This includes:
    - Personal information, for example individual payments for adoption and fostering, and care-related payments.
    - Payments to staff.

    These entries have been omitted from the report.

  10. Company Invoice Data API | 20M+ Companies | 10 European Countries | 20+ Data...

    • datarade.ai
    .json
    Cite
    HitHorizons, Company Invoice Data API | 20M+ Companies | 10 European Countries | 20+ Data Points | Monthly Updated | GDPR-Compliant [Dataset]. https://datarade.ai/data-products/hithorizons-company-invoice-data-api-20m-companies-10-hithorizons
    Dataset authored and provided by
    HitHorizons
    Area covered
    United Kingdom
    Description

    HitHorizons Invoice Data API gives access to aggregated company data on 20M+ companies from 10 countries.

    Available countries:

    France, United Kingdom, Germany, Poland, Czech Republic, Hungary, Slovakia, Latvia, Estonia, Austria

    Parameters:

    Id, Company Name, Company Alternative Name, Street Address, Street Number, Location, District, Region, Postal Code, City, Country, Status, Incorporation Date, Dissolution Date, National ID, Tax ID, Vat ID, Parent ID, Idents, Inactive, Company Type, Company Type Normalized

    Parameters may vary depending on the country.

  11. Question Answers Label Dataset

    • universe.roboflow.com
    zip
    Updated Nov 30, 2022
    Cite
    Question Answer Labelling (2022). Question Answers Label Dataset [Dataset]. https://universe.roboflow.com/question-answer-labelling/question-answers-label
    Dataset updated
    Nov 30, 2022
    Dataset authored and provided by
    Question Answer Labelling
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Variables measured
    Objects Bounding Boxes
    Description

    Here are a few use cases for this project:

    1. Digital Document Management: This model can be used to effectively organize and manage digital documents. By identifying areas such as headers, addresses, and vendors, it could streamline workflows in companies dealing with large amounts of papers, forms or invoices.

    2. Automated Data Extraction: The model could be used in extracting pertinent information from documents automatically. For example, pulling out questions and answers from educational materials, extracting vendor or address information from invoices, or grabbing column headers from statistical reports.

    3. Augmented Reality (AR) Applications: "Question Answers Label" can be utilized in AR glasses to give real-time information about objects a user sees, especially in the realm of paper documents.

    4. Virtual Assistance: This model may be used to build a virtual assistant capable of reading and understanding physical documents. For instance, reading out a user's mail, helping learning from textbooks, or assisting in reviewing legal documents.

    5. Accessibility Tools for Visually Impaired: The tool could be utilized to interpret written documents for visually impaired people by identifying and vocalizing text based on their classes (answers, questions, headers, etc).

  12. OCR Receipts Text Detection - retail dataset

    • kaggle.com
    Updated Sep 19, 2023
    Cite
    Training Data (2023). OCR Receipts Text Detection - retail dataset [Dataset]. https://www.kaggle.com/datasets/trainingdatapro/ocr-receipts-text-detection
    Dataset updated
    Sep 19, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Training Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
    License information was derived automatically

    Description

    OCR Receipts from Grocery Stores Text Detection - retail dataset

    The Grocery Store Receipts Dataset is a collection of photos captured from various grocery store receipts. This dataset is specifically designed for tasks related to Optical Character Recognition (OCR) and is useful for retail.

    💴 For Commercial Usage: To discuss your requirements, learn about the price and buy the dataset, leave a request on TrainingData to buy the dataset

    Each image in the dataset is accompanied by bounding box annotations, indicating the precise locations of specific text segments on the receipts. The text segments are categorized into four classes: item, store, date_time and total.


    Dataset structure

    • images - contains the original images of receipts
    • boxes - includes bounding box labeling for the original images
    • annotations.xml - contains coordinates of the bounding boxes and detected text, created for the original photo

    Data Format

    Each image in the images folder is accompanied by an XML annotation in the annotations.xml file, indicating the coordinates of the bounding boxes and the detected text. For each point, the x and y coordinates are provided.

    Classes:

    • store - name of the grocery store
    • item - item in the receipt
    • date_time - date and time of the receipt
    • total - total price of the receipt
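
Reading bounding boxes and detected text out of an annotations.xml file might look like the sketch below. The element and attribute names (`image`, `box`, `label`, `xtl`, `ytl`, `xbr`, `ybr`, `text`) and the sample content are assumptions for illustration only, since the description does not show the file's schema; only the four class names come from the listing above.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation layout; the real file may use other tags and attributes.
SAMPLE = """<annotations>
  <image name="receipt_001.jpg">
    <box label="store" xtl="10" ytl="5" xbr="200" ybr="30" text="Fresh Mart"/>
    <box label="total" xtl="120" ytl="400" xbr="200" ybr="420" text="12.50"/>
  </image>
</annotations>"""

# The four classes named in the dataset description.
CLASSES = {"store", "item", "date_time", "total"}


def read_boxes(xml_text):
    """Return (image_name, label, (xtl, ytl, xbr, ybr), text) tuples."""
    rows = []
    for image in ET.fromstring(xml_text).iter("image"):
        for box in image.iter("box"):
            label = box.get("label")
            if label not in CLASSES:
                continue  # skip anything outside the documented classes
            coords = tuple(float(box.get(k)) for k in ("xtl", "ytl", "xbr", "ybr"))
            rows.append((image.get("name"), label, coords, box.get("text")))
    return rows


boxes = read_boxes(SAMPLE)
```

Filtering on the documented class set keeps downstream code from silently accepting labels that the description does not define.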


    Text Detection in the Receipts might be made in accordance with your requirements.

    💴 Buy the Dataset: This is just an example of the data. Leave a request on https://trainingdata.pro/datasets to discuss your requirements, learn about the price and buy the dataset

    TrainingData provides high-quality data annotation tailored to your needs

    keywords: receipts reading, retail dataset, consumer goods dataset, grocery store dataset, supermarket dataset, deep learning, retail store management, pre-labeled dataset, annotations, text detection, text recognition, optical character recognition, document text recognition, detecting text-lines, object detection, scanned documents, deep-text-recognition, text area detection, text extraction, images dataset, image-to-text

  13. City of Alavus purchase invoices 2022

    • data.europa.eu
    unknown
    Cite
    Alavuden kaupunki, City of Alavus purchase invoices 2022 [Dataset]. https://data.europa.eu/data/datasets/4874de5b-a120-4a7b-92dc-b5f9e15c946c?locale=en
    Dataset authored and provided by
    Alavuden kaupunki
    Area covered
    Alavus
    Description

    The data includes the City of Alavus's business ID purchase invoices. It covers, for example, the date, receipt and accounting information of each purchase invoice, the service category, and the supplier's name and business ID. Invoices of private traders are shown in anonymised form in the data.

  14. Inv3d: a high-resolution 3d invoice dataset for template-guided single-image...

    • service.tib.eu
    Updated Nov 28, 2024
    Cite
    (2024). Inv3d: a high-resolution 3d invoice dataset for template-guided single-image document unwarping - train split part 2 of 4 - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/rdr-doi-10-35097-1694
    Dataset updated
    Nov 28, 2024
    Description

    The abstract and technical remarks are identical to those of the validation split listed as entry 5 above.

  15. S

    Hospital Billing - Event Log

    • data.4tu.nl
    • figshare.com
    zip
    Updated Aug 1, 2017
    Cite
    Felix Mannhardt (2017). Hospital Billing - Event Log [Dataset]. http://doi.org/10.4121/uuid:76c46b83-c930-4798-a1c9-4be94dfeb741
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 1, 2017
    Dataset provided by
    Eindhoven University of Technology
    Authors
    Felix Mannhardt
    License

    https://doi.org/10.4121/resource:terms_of_use

    Time period covered
    Dec 13, 2012 - Jan 19, 2016
    Description

    The 'Hospital Billing' event log was obtained from the financial modules of the ERP system of a regional hospital. The event log contains events that are related to the billing of medical services that have been provided by the hospital. Each trace of the event log records the activities executed to bill a package of medical services that were bundled together. The event log does not contain information about the actual medical services provided by the hospital.

    The 100,000 traces in the event log are a random sample of process instances that were recorded over three years. Several attributes such as the 'state' of the process, the 'caseType', the underlying 'diagnosis' etc. are included in the event log. Events and attribute values have been anonymized. The time stamps of events have been randomized for this purpose, but the time between events within a trace has not been altered.

    More information about the event log can be found in the related publications.
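    The description above says that event time stamps were randomized while the time between events within a trace was left unchanged. The listing does not say how this was done; the sketch below assumes one scheme that has this property, namely shifting every event in a trace by the same offset. The function names and the sample trace are hypothetical and are not taken from the dataset.

```python
from datetime import datetime, timedelta


def shift_trace(timestamps, offset):
    """Shift every event time in one trace by the same offset (assumed scheme)."""
    return [t + offset for t in timestamps]


def gaps(timestamps):
    """Durations between consecutive events of a trace."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


# Hypothetical trace with three billing events.
trace = [
    datetime(2013, 1, 7, 9, 0),
    datetime(2013, 1, 9, 14, 30),
    datetime(2013, 2, 1, 8, 15),
]

# A single offset per trace moves the absolute dates
# but leaves the durations between events untouched.
shifted = shift_trace(trace, timedelta(days=-23, hours=5))
same_gaps = gaps(trace) == gaps(shifted)
```

    Under this assumption, analyses that rely on durations within a trace (for example time from billing to payment) remain meaningful, while absolute dates do not.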

  16. E-Invoicing Market Analysis, Size, and Forecast 2025-2029: Europe (Denmark,...

    • technavio.com
    Updated Jan 15, 2025
    Cite
    Technavio (2025). E-Invoicing Market Analysis, Size, and Forecast 2025-2029: Europe (Denmark, France, Germany, UK), APAC (China, India, Japan, South Korea), North America (US and Canada), South America , and Middle East and Africa [Dataset]. https://www.technavio.com/report/e-invoicing-market-industry-analysis
    Explore at:
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Global, United States
    Description


    E-Invoicing Market Size 2025-2029

    The e-invoicing market size is forecast to increase by USD 36.1 billion at a CAGR of 29.9% between 2024 and 2029.
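    The forecast above gives only an absolute increase and a growth rate. As a rough reading aid, the sketch below works out the starting market size these two figures would imply. It assumes the CAGR applies to total market size, compounded annually over the five periods from 2024 to 2029; the report excerpt does not state a base value, so the result is an inference under that assumption, not a reported number. The function name is hypothetical.

```python
def implied_base(increment, cagr, years):
    """Starting value B such that B * (1 + cagr) ** years - B == increment."""
    growth = (1 + cagr) ** years
    return increment / (growth - 1)


# Figures from the listing: USD 36.1 billion increase at a CAGR of 29.9%.
# Five compounding periods (2024 -> 2029) is an assumption.
base_2024 = implied_base(increment=36.1, cagr=0.299, years=5)
size_2029 = base_2024 + 36.1

print(f"implied 2024 size: USD {base_2024:.1f} billion")
print(f"implied 2029 size: USD {size_2029:.1f} billion")
```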

    The market is experiencing significant growth, driven by the convenience and easy accessibility of mobile payment systems. This trend is particularly prominent in regions with a high smartphone penetration and a growing preference for digital transactions. However, regulatory hurdles impact adoption in certain markets, as governments implement complex compliance requirements. These regulations aim to standardize invoicing processes and enhance tax revenue collection, but they add complexity to the market. Another key trend shaping the market is the increased security of documents using blockchain technology. This innovation addresses concerns around data privacy and security, which are increasingly important as businesses digitize their operations.
    Yet, the market faces challenges from the threat of cyber-attacks, which can compromise sensitive financial information. Companies must invest in robust security measures to protect their systems and maintain customer trust. Effective implementation of these strategies will enable businesses to capitalize on the market's growth potential and navigate challenges effectively.
    

    What will be the Size of the E-Invoicing Market during the forecast period?


    In today's business landscape, the digitalization of invoicing is a significant trend, with an increasing number of businesses adopting electronic formats for their invoicing operations. This shift towards digital platforms is driven by the adoption of technology, regulatory initiatives, and tax obligations. Recipients now prefer receiving invoices digitally, making it essential for businesses to keep up with this trend. Digital transformation in financial management tools and accounting software has enabled the standardization of invoice formats, enhancing efficiency and sustainability. However, this transition comes with challenges, including security issues and the need for regulatory compliance. Artificial intelligence (AI) and machine learning (ML) are revolutionizing invoicing processes, automating system design and billing operations.
    Cloud platforms provide a secure and accessible solution for businesses to manage their invoices, ensuring seamless electronic transactions. Despite the benefits, there are concerns around fraud and tax evasion. Businesses must invest in training and implementing automated systems to mitigate these risks. The future of invoicing lies in the digital method, as it streamlines business functions and enhances operational efficiency. Regulatory mandates require businesses to comply with strict guidelines for electronic invoicing, making it crucial for businesses to stay informed and adapt to the changing landscape. With the increasing importance of digital platforms, businesses must prioritize the security of their invoicing systems and invest in AI and ML technologies to stay competitive.
    

    How is this E-Invoicing Industry segmented?

    The e-invoicing industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    End-user
      B2B
      B2C
    Deployment
      Cloud-based
      On-premises
    Application
      Energy and Utilities
      FMCG
      E-Commerce
      BFSI
      Government
      Others
    Geography
      North America
        US
        Canada
      Europe
        Denmark
        France
        Germany
        UK
      APAC
        China
        India
        Japan
        South Korea
      Rest of World (ROW)

    By End-user Insights

    The B2B segment is estimated to witness significant growth during the forecast period.

    In the dynamic business landscape of 2024, the market experienced significant growth, fueled by the increasing globalization and the expanding presence of IT, banking, financial services and insurance (BFSI), and retail sectors. These industries sought centralized, Internet-based billing and invoicing solutions to streamline their operations. Regulatory requirements in banking and retail sectors, the rise of e-commerce, and the emergence of innovative mobile payment methods propelled the market forward. E-invoicing through email, websites, e-post briefs, fax, and text messages became increasingly popular. The adoption of e-invoicing was anticipated to surge, particularly among small and medium enterprises (SMEs) in emerging economies.

    Machine learning and artificial intelligence technologies enhanced invoice accuracy and processing efficiency, while cloud-based systems ensured secure records and financial transparency. Cross-border trade transactions and purchase orders were simplified, reducing payment delays and waste generation. Blockchain technology ensured secure transactions and regulatory compliance. Financia

  17. Accounts Receivable Automation Market Analysis North America, Europe, APAC,...

    • technavio.com
    Updated Nov 28, 2024
    Cite
    Technavio (2024). Accounts Receivable Automation Market Analysis North America, Europe, APAC, Middle East and Africa, South America - US, China, UK, Germany, Canada, India, Japan, South Korea, France, UAE - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/accounts-receivable-automation-market-industry-analysis
    Explore at:
    Dataset updated
    Nov 28, 2024
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    United States, Global
    Description


    Accounts Receivable Automation Market Size 2024-2028

    The accounts receivable automation market size is forecast to increase by USD 968.4 million at a CAGR of 9% between 2023 and 2028.

    The market is experiencing significant growth due to the increasing adoption of advanced technologies such as invoice automation and payment gateway integration. Businesses are seeking to optimize their financial processes by implementing AR best practices, including invoice processing and credit risk management. Digital payments and subscription models are becoming increasingly popular, leading to a decrease in Day Sales Outstanding (DSO). Furthermore, the emergence of Machine Learning (ML) and Artificial Intelligence (AI) solutions for AR automation is revolutionizing the industry. However, data privacy and security concerns remain a challenge, necessitating stringent compliance measures. AR audit and consulting services are also gaining traction to help businesses navigate the complexities of AR technology implementation. In summary, the AR automation market is driven by the need for efficient invoice processing, improved credit risk management, and the adoption of advanced technologies, while addressing data privacy and security concerns remains a priority.
    

    What will be the Size of the Market During the Forecast Period?


    Accounts receivable (AR) is a crucial business process that involves managing and collecting payments from customers for goods or services provided. Traditional AR methodologies have relied heavily on manual processing, which can lead to inefficiencies, payment issues, and increased overheads. However, the shift towards automation in AR processes has gained traction in the US business landscape, offering numerous benefits. Automated accounts receivable (AAR) systems streamline the AR process by integrating various functions such as invoice generation, payment processing, customer communication, and auditing. By automating these tasks, businesses can significantly reduce their accounting cycle time, improve cost-efficiency, and enhance customer experience.
    
    
    
    Furthermore, the market is poised for growth due to several factors. Firstly, the increasing complexity of financial systems necessitates the need for scalable and flexible solutions. Secondly, business leaders recognize the importance of business resilience in the face of identity frauds and cybercrimes. AAR systems offer strong data security measures, ensuring the protection of sensitive customer and financial information. Moreover, customer behavior and payment patterns continue to evolve, with a growing preference for digital payments and subscriptions. AAR systems can facilitate these payment methods, providing customers with convenient and flexible payment options. Additionally, AAR solutions offer deployment flexibility, enabling businesses to integrate them with their existing systems and processes.
    

    How is this market segmented and which is the largest segment?

    The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

    Component
      Solution
      Services
    Geography
      North America
        Canada
        US
      Europe
        Germany
        UK
      APAC
        China
      Middle East and Africa
      South America

    By Component Insights

    The solution segment is estimated to witness significant growth during the forecast period.
    

    Accounts receivable automation has become a priority for businesses seeking to streamline their financial processes and enhance customer satisfaction. Cloud-based accounting solutions have gained popularity due to their accessibility and flexibility, allowing teams to collaborate effectively and process invoices in real time. AR automation systems offer advanced functionalities such as cash application automation, dispute management, credit evaluation, and collection management. These solutions can process large volumes of customer invoices efficiently, reducing the workload on accounting teams and accelerating payment collections. The solution segment holds the largest share of the market, and this trend is expected to continue.

    Furthermore, patient services organizations, in particular, stand to benefit significantly from AR automation, as it can help protect patient data while ensuring invoice accuracy and cost-effective operations. Business process optimization is a key driver of AR automation adoption, as it enables organizations to reduce manual errors, improve cash flow, and enhance overall financial management.


    The solution segment was valued at USD 743.40 million in 2018 and showed a gradual increase during the forecast period.

    Regional Ana

  18. A

    Invoice and Credit Notes over £500

    • dtechtive.com
    • find.data.gov.scot
    xls
    Updated Dec 12, 2023
    Cite
    Angus Council (2023). Invoice and Credit Notes over £500 [Dataset]. https://dtechtive.com/datasets/44097
    Explore at:
    Available download formats: xls (0.8755 MB)
    Dataset updated
    Dec 12, 2023
    Dataset provided by
    Angus Council
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Area covered
    Scotland
    Description

    Every six months, we publish a list of all invoices and credit notes over £500 that we receive, providing details on supplier name, amount paid, invoice reference, gross amount and VAT amount. The list covers:
    • Spend on premises, transport, and supplies and services
    • Payments to contractors who do work on our behalf
    • Other spend we incur in carrying out our business

    Some areas of spend are covered by the Data Protection Act and are not published in full. This includes:
    • Personal information, for example individual payments for adoption and fostering, and care-related payments
    • Payments to staff

    These entries have been omitted from the report.

  19. k

    Inv3D: a high-resolution 3D invoice dataset for template-guided single-image...

    • radar.kit.edu
    • radar-service.eu
    tar
    Updated Sep 1, 2023
    Cite
    Patrick Philipp; Felix Hertlein; Alexander Naumann (2023). Inv3D: a high-resolution 3D invoice dataset for template-guided single-image document unwarping - Meta data [Dataset]. http://doi.org/10.35097/1730
    Explore at:
    Available download formats: tar (1727488 bytes)
    Dataset updated
    Sep 1, 2023
    Dataset provided by
    Karlsruhe Institute of Technology
    Hertlein, Felix
    Philipp, Patrick
    Naumann, Alexander
    Authors
    Patrick Philipp; Felix Hertlein; Alexander Naumann
    Description

    Each sample contains the following files:
    • "flat_document.png" (2200x1700x3, uint8, 0-255), showcasing a document in perfect condition.
    • "flat_information_delta.png" displays all texts which represent invoice data (2200x1700x3, uint8, 0-255).
    • "flat_template.png" is an empty invoice template (2200x1700x3, uint8, 0-255).
    • "flat_text_mask.png" visually presents all texts shown in the given document (2200x1700x3, uint8, 0-255).
    • "warped_angle.png" shows warping-induced x- and y-axis angle (1600x1600x2, float32, -Pi to Pi).
    • "warped_albedo.png" is an albedo map (1600x1600x3, uint8, 0-255).
    • "warped_BM.npz" stores the backward mapping, i.e. the relative pixel shift from warped to normalized image for each pixel (1600x1600x2, float32, 0-1).
    • "warped_curvature.npz" has pixel-wise curvature of the warped document (1600x1600x1, float32, 0-inf).
    • "warped_depth.npz" holds per-pixel depth between camera and document (1600x1600x3, float32, 0-inf).
    • "warped_document.png" displays the warped document (1600x1600x3, uint8, 0-255).
    • "warped_normal.npz" contains warped document normals (1600x1600x3, float32, -inf to inf).
    • "warped_recon.png" features a chess-textured warped document (1600x1600x3, uint8, 0-255).
    • "warped_text_mask.npz" is a boolean text pixel mask (1600x1600x1, bool8, True/False).
    • "warped_UV.npz" stores warped texture coordinates (1600x1600x3, float32, 0-1).
    • "warped_WC.npz" includes document coordinates in the 3D space (1600x1600x3, float32, -inf to inf).

    For more details see https://github.com/FelixHertlein/inv3d-generator. Released under CC BY-NC-SA 4.0. Excluded files are listed in 'restricted-license-files.txt' (located in record with DOI 10.35097/1730, "Inv3D: a high-resolution 3D invoice dataset for template-driven Single-Image Document Unwarping - Metadata"). These are for academic use only.
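    The per-sample file layout described above can be written down as a small lookup table, which is handy for checking that a downloaded sample is complete or for estimating its decoded in-memory size. This is only a sketch: the shapes and dtypes are transcribed from the listing as given, the one-byte size for "bool8" is an assumption, and the helper names `raw_bytes` and `missing_files` are hypothetical rather than part of the Inv3D tooling.

```python
from math import prod

# Bytes per element for the dtypes named in the listing.
# "bool8" is the listing's own label; one byte per element is assumed.
DTYPE_BYTES = {"uint8": 1, "float32": 4, "bool8": 1}

# File name -> (shape, dtype), transcribed from the description above.
INV3D_FILES = {
    "flat_document.png": ((2200, 1700, 3), "uint8"),
    "flat_information_delta.png": ((2200, 1700, 3), "uint8"),
    "flat_template.png": ((2200, 1700, 3), "uint8"),
    "flat_text_mask.png": ((2200, 1700, 3), "uint8"),
    "warped_angle.png": ((1600, 1600, 2), "float32"),
    "warped_albedo.png": ((1600, 1600, 3), "uint8"),
    "warped_BM.npz": ((1600, 1600, 2), "float32"),
    "warped_curvature.npz": ((1600, 1600, 1), "float32"),
    "warped_depth.npz": ((1600, 1600, 3), "float32"),
    "warped_document.png": ((1600, 1600, 3), "uint8"),
    "warped_normal.npz": ((1600, 1600, 3), "float32"),
    "warped_recon.png": ((1600, 1600, 3), "uint8"),
    "warped_text_mask.npz": ((1600, 1600, 1), "bool8"),
    "warped_UV.npz": ((1600, 1600, 3), "float32"),
    "warped_WC.npz": ((1600, 1600, 3), "float32"),
}


def raw_bytes(name):
    """Decoded (uncompressed) size of one file's array in bytes."""
    shape, dtype = INV3D_FILES[name]
    return prod(shape) * DTYPE_BYTES[dtype]


def missing_files(present):
    """Names from the listing that are absent from a sample directory listing."""
    return sorted(set(INV3D_FILES) - set(present))


total_raw_bytes = sum(raw_bytes(name) for name in INV3D_FILES)
```

    For example, `missing_files(os.listdir(sample_dir))` would report which of the fifteen files a sample directory lacks, where `sample_dir` is whatever path the sample was extracted to.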

  20. e

    Accounts payable 2023

    • data.europa.eu
    • gimi9.com
    csv, json
    Updated Feb 27, 2025
    + more versions
    Cite
    Södertälje kommun (2025). Accounts payable 2023 [Dataset]. https://data.europa.eu/data/datasets/https-catalog-sodertalje-se-store-12-resource-149/embed
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Feb 27, 2025
    Dataset authored and provided by
    Södertälje kommun
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Description

    The accounts payable data set contains information about purchases made, supplier names, purchasing organization, voucher numbers and more. The voucher number is a unique identifier that can be used if you wish to see the invoice to which the purchase relates.

    Some entries are marked [Contains personal data]. This means that the supplier is a sole proprietor and that the company's name and organization number are covered by the provisions of the General Data Protection Regulation. The marker [May contain personal data] means that it has not been possible to determine automatically whether the supplier is a sole proprietor; this may apply, for example, to foreign suppliers or to compensation for expenses paid by employees.

    The marker [Can be covered by confidentiality] means that information regarding the purchase may be confidential and that a more thorough examination must be made.

    To view the underlying invoices, use X municipality's e-service: https://service.organizationX.se/bestall-fakturabild and enter the voucher number for the invoice or invoices you are interested in.

    A description of the specification on which the data set is based is available via dataportal.se - see the field Fulfils for link.

Cite
Parsee.ai, invoices-example [Dataset]. https://huggingface.co/datasets/parsee-ai/invoices-example

invoices-example

parsee-ai/invoices-example

Explore at:
25 scholarly articles cite this dataset (View in Google Scholar)
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset provided by
Parsee.ai
License

MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically

Description

Invoices Sample Dataset

This is a sample dataset generated on app.parsee.ai for invoices. The goal was to evaluate different LLMs on this RAG task using the Parsee evaluation tools. A full study can be found here: https://github.com/parsee-ai/parsee-datasets/blob/main/datasets/invoices/parsee-loader/README.md parsee-core version used: 0.1.3.11 This dataset was created on the basis of 15 sample invoices (PDF files). All PDF files are publicly accessible on parsee.ai, to access them… See the full description on the dataset page: https://huggingface.co/datasets/parsee-ai/invoices-example.
