50 datasets found

invoices-example
huggingface.co
Cite
Parsee.ai, invoices-example [Dataset]. https://huggingface.co/datasets/parsee-ai/invoices-example
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset provided by
Parsee.ai
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Invoices Sample Dataset

This is a sample dataset generated on app.parsee.ai for invoices. The goal was to evaluate different LLMs on this RAG task using the Parsee evaluation tools. A full study can be found here: https://github.com/parsee-ai/parsee-datasets/blob/main/datasets/invoices/parsee-loader/README.md

parsee-core version used: 0.1.3.11

This dataset was created on the basis of 15 sample invoices (PDF files). All PDF files are publicly accessible on parsee.ai, to access them… See the full description on the dataset page: https://huggingface.co/datasets/parsee-ai/invoices-example.
invoices-donut-data-v1
huggingface.co
opendatalab.com
Updated May 11, 2023
+ more versions
Cite
Katana ML (2023). invoices-donut-data-v1 [Dataset]. https://huggingface.co/datasets/katanaml-org/invoices-donut-data-v1
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
May 11, 2023
Dataset authored and provided by
Katana ML
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset Card for Invoices (Sparrow)

This dataset contains 500 invoice documents, annotated and processed to be ready for fine-tuning the Donut ML model. The annotation and data preparation were done by the Katana ML team. Sparrow is an open-source data extraction solution by Katana ML. Original dataset info: Kozłowski, Marek; Weichbroth, Paweł (2021), “Samples of electronic invoices”, Mendeley Data, V2, doi: 10.17632/tnj49gpmtz.2
Invoice Management Dataset
universe.roboflow.com
zip
Updated Dec 28, 2024
Cite
CVIP Workspace (2024). Invoice Management Dataset [Dataset]. https://universe.roboflow.com/cvip-workspace/invoice-management
Explore at:
Available download formats: zip
Dataset updated
Dec 28, 2024
Dataset authored and provided by
CVIP Workspace
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Text Bounding Boxes
Description
Intelligent Invoice Management System

Project Description:
The Intelligent Invoice Management System is an advanced AI-powered platform designed to revolutionize traditional invoice processing. By automating the extraction, validation, and management of invoice data, this system addresses the inefficiencies, inaccuracies, and high costs associated with manual methods. It enables businesses to streamline operations, reduce human error, and expedite payment cycles.

Problem Statement:
Manual invoice processing involves labor-intensive tasks such as data entry, verification, and reconciliation. These processes are time-consuming, prone to errors, and can result in financial losses and delays. The diversity of invoice formats from various vendors adds complexity, making automation a critical need for efficiency and scalability.

Proposed Solution:
The Intelligent Invoice Management System automates the end-to-end process of invoice handling using AI and machine learning techniques. Core functionalities include:
1. Invoice Generation: Automatically generate PDF invoices in at least four formats, populated with synthetic data.
2. Data Development: Leverage a dataset containing fields such as receipt numbers, company details, sales tax information, and itemized tables to create realistic invoice samples.
3. AI-Powered Labeling: Use Tesseract OCR to extract labeled data from invoice images, and train YOLO for label recognition, ensuring precise identification of fields.
4. Database Integration: Store extracted information in a structured database for seamless retrieval and analysis.
5. Web-Based Information System: Provide a user-friendly platform to upload invoices and retrieve key metrics, such as:
- Total sales within a specified duration.
- Total sales tax paid during a given timeframe.
- Detailed invoice information in tabular form for specific date ranges.
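The three metrics listed above map naturally onto date-range queries against the structured database. The following is a minimal sketch using an in-memory SQLite database; the table name `invoices`, its columns, and the sample rows are assumptions made for illustration and are not taken from the project itself.

```python
import sqlite3

# Hypothetical sample rows: (receipt_no, company, invoice_date, total, sales_tax).
DEMO_ROWS = [
    ("R-001", "Acme", "2024-01-10", 100.0, 17.0),
    ("R-002", "Acme", "2024-02-05", 200.0, 34.0),
    ("R-003", "Globex", "2024-03-01", 50.0, 8.5),
]


def build_demo_db(rows=DEMO_ROWS):
    """Create an in-memory database with an assumed invoice schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE invoices (
               receipt_no   TEXT PRIMARY KEY,
               company      TEXT,
               invoice_date TEXT,  -- ISO 8601, so string order equals date order
               total        REAL,
               sales_tax    REAL
           )"""
    )
    conn.executemany("INSERT INTO invoices VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def totals_between(conn, start, end):
    """Return (total sales, total sales tax) for invoices dated start..end inclusive."""
    row = conn.execute(
        "SELECT COALESCE(SUM(total), 0), COALESCE(SUM(sales_tax), 0) "
        "FROM invoices WHERE invoice_date BETWEEN ? AND ?",
        (start, end),
    ).fetchone()
    return row[0], row[1]


def invoices_between(conn, start, end):
    """Return invoice rows in the date range as a list of tuples, oldest first."""
    return conn.execute(
        "SELECT receipt_no, company, invoice_date, total, sales_tax "
        "FROM invoices WHERE invoice_date BETWEEN ? AND ? ORDER BY invoice_date",
        (start, end),
    ).fetchall()


conn = build_demo_db()
print(totals_between(conn, "2024-01-01", "2024-02-28"))  # (300.0, 51.0)
```

Storing dates as ISO 8601 strings keeps the range filter a plain string comparison; a production system would more likely use a native date type.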

Key Features and Deliverables:
1. Invoice Generation:
- Generate 20,000 invoices using an automated script.
- Include dummy logos, company details, and itemized tables for four items per invoice.
2. Label Definition and Format:
- Define structured labels (TBLR, CLASS Name, Recognized Text).
- Provide labels in both XML and JSON formats for seamless integration.
3. OCR and AI Training:
- Automate labeling using Tesseract OCR for high-accuracy text recognition.
- Train and test YOLO to detect and classify invoice fields (TBLR and CLASS).
4. Database Management:
- Store OCR-extracted labels and field data in a database.
- Enable efficient search and aggregation of invoice data.
5. Web-Based Interface:
- Build a responsive system for users to upload invoices and retrieve data based on company name or NTN.
- Display metrics and reports for total sales, tax paid, and invoice details over custom date ranges.
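The label structure named in the deliverables (TBLR, class name, recognized text) and its two serializations can be sketched as follows. This is an illustration under assumptions: TBLR is read here as top, bottom, left, right pixel coordinates, and the `FieldLabel` type plus the XML and JSON element names are hypothetical, since the description does not specify the actual schema.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass


@dataclass
class FieldLabel:
    """One labeled invoice field (hypothetical layout)."""
    top: int
    bottom: int
    left: int
    right: int
    class_name: str
    text: str


def to_json(labels):
    """Serialize labels as a JSON array of objects."""
    return json.dumps([asdict(label) for label in labels], indent=2)


def to_xml(labels):
    """Serialize labels as XML with one <label> element per field."""
    root = ET.Element("labels")
    for label in labels:
        node = ET.SubElement(root, "label", {"class": label.class_name})
        box = ET.SubElement(node, "tblr")
        for key in ("top", "bottom", "left", "right"):
            box.set(key, str(getattr(label, key)))
        ET.SubElement(node, "text").text = label.text
    return ET.tostring(root, encoding="unicode")


demo_labels = [FieldLabel(10, 40, 20, 200, "invoice_number", "INV-0001")]
print(to_json(demo_labels))
print(to_xml(demo_labels))
```

Keeping one in-memory type and deriving both formats from it means the XML and JSON exports cannot drift apart.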

Expected Outcomes:
- Reduction in manual effort and operational costs.
- Improved accuracy in invoice processing and financial reporting.
- Enhanced scalability and adaptability for diverse invoice formats.
- Faster turnaround time for invoice-related tasks.

By automating critical aspects of invoice management, this system delivers a robust and intelligent solution to meet the evolving needs of businesses.
Event Graph of BPI Challenge 2019
data.4tu.nl
zip
Updated Apr 22, 2021
+ more versions
Cite
Dirk Fahland (2021). Event Graph of BPI Challenge 2019 [Dataset]. http://doi.org/10.4121/14169614.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.4121/14169614.v1
Dataset updated
Apr 22, 2021
Dataset provided by
4TU.ResearchData
Authors
Dirk Fahland
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Business process event data modeled as labeled property graphs

Data Format
-----------

The dataset comprises one labeled property graph in two different file formats.

#1) Neo4j .dump format

A neo4j (https://neo4j.com) database dump that contains the entire graph and can be imported into a fresh neo4j database instance using the following command, see also the neo4j documentation: https://neo4j.com/docs/

/bin/neo4j-admin.(bat|sh) load --database=graph.db --from=

The .dump was created with Neo4j v3.5.

#2) .graphml format

A .zip file containing a .graphml file of the entire graph

Data Schema
-----------

The graph is a labeled property graph over business process event data. Each graph uses the following concepts

:Event nodes - each event node describes a discrete event, i.e., an atomic observation described by attribute "Activity" that occurred at the given "timestamp"

:Entity nodes - each entity node describes an entity (e.g., an object or a user), it has an EntityType and an identifier (attribute "ID")

:Log nodes - describes a collection of events that were recorded together, most graphs only contain one log node

:Class nodes - each class node describes a type of observation that has been recorded, e.g., the different types of activities that can be observed, :Class nodes group events into sets of identical observations

:CORR relationships - from :Event to :Entity nodes, describes whether an event is correlated to a specific entity; an event can be correlated to multiple entities

:DF relationships - "directly-followed by" between two :Event nodes describes which event is directly-followed by which other event; both events in a :DF relationship must be correlated to the same entity node. All :DF relationships form a directed acyclic graph.

:HAS relationship - from a :Log to an :Event node, describes which events had been recorded in which event log

:OBSERVES relationship - from an :Event to a :Class node, describes to which event class an event belongs, i.e., which activity was observed in the graph

:REL relationship - placeholder for any structural relationship between two :Entity nodes

The concepts are further defined in Stefan Esser, Dirk Fahland: Multi-Dimensional Event Data in Graph Databases. CoRR abs/2005.14552 (2020) https://arxiv.org/abs/2005.14552
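To make the :CORR and :DF concepts concrete, the sketch below derives "directly-followed by" pairs from events and their entity correlations in plain Python. It only illustrates the schema described above and is not the authors' import procedure; the function name `derive_df` and the toy event and entity identifiers are hypothetical.

```python
from collections import defaultdict


def derive_df(timestamps, corr):
    """Derive directly-follows triples (earlier_event, later_event, entity).

    timestamps: dict mapping event id -> sortable timestamp
    corr:       iterable of (event id, entity id) pairs, i.e. :CORR relationships
    """
    events_by_entity = defaultdict(list)
    for event_id, entity_id in corr:
        events_by_entity[entity_id].append(event_id)

    df = []
    for entity_id, event_ids in events_by_entity.items():
        # Order the events of one entity by time; ties keep their input order.
        ordered = sorted(event_ids, key=lambda e: timestamps[e])
        # Each consecutive pair becomes one :DF relationship for this entity.
        for earlier, later in zip(ordered, ordered[1:]):
            df.append((earlier, later, entity_id))
    return df


# Toy data: three events, one purchase order item and one vendor.
demo_timestamps = {"e1": 1, "e2": 2, "e3": 3}
demo_corr = [
    ("e1", "POItem-1"), ("e2", "POItem-1"), ("e3", "POItem-1"),
    ("e1", "Vendor-1"), ("e3", "Vendor-1"),
]
demo_df = derive_df(demo_timestamps, demo_corr)
```

In the toy data, `e1` is directly followed by `e2` for the purchase order item but by `e3` for the vendor, which shows why a :DF relationship is always tied to the entity that both events are correlated to.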

Data Contents
-------------

neo4j-bpic19-2021-02-17 (.dump|.graphml.zip)

An integrated graph describing the raw event data of the entire BPI Challenge 2019 dataset.
van Dongen, B.F. (Boudewijn) (2019): BPI Challenge 2019. 4TU.ResearchData. Collection. https://doi.org/10.4121/uuid:d06aff4b-79f0-45e6-8ec8-e19730c248f1

This data originated from a large multinational company operating from The Netherlands in the area of coatings and paints and we ask participants to investigate the purchase order handling process for some of its 60 subsidiaries. In particular, the process owner has compliance questions. In the data, each purchase order (or purchase document) contains one or more line items. For each line item, there are roughly four types of flows in the data:

(1) 3-way matching, invoice after goods receipt: For these items, the value of the goods receipt message should be matched against the value of an invoice receipt message and the value put during creation of the item (indicated by both the GR-based flag and the Goods Receipt flags set to true).
(2) 3-way matching, invoice before goods receipt: Purchase items that do require a goods receipt message, while they do not require GR-based invoicing (indicated by the GR-based IV flag set to false and the Goods Receipt flags set to true). For such purchase items, invoices can be entered before the goods are received, but they are blocked until goods are received. This unblocking can be done by a user, or by a batch process at regular intervals. Invoices should only be cleared if goods are received and the value matches with the invoice and the value at creation of the item.
(3) 2-way matching (no goods receipt needed): For these items, the value of the invoice should match the value at creation (in full or partially until PO value is consumed), but there is no separate goods receipt message required (indicated by both the GR-based flag and the Goods Receipt flags set to false).
(4) Consignment: For these items, there are no invoices on PO level as this is handled fully in a separate process. Here we see GR indicator is set to true but the GR IV flag is set to false and also we know by item type (consignment) that we do not expect an invoice against this item.
Unfortunately, the complexity of the data goes further than just this division in four categories. For each purchase item, there can be many goods receipt messages and corresponding invoices which are subsequently paid. Consider for example the process of paying rent. There is a Purchase Document with one item for paying rent, but a total of 12 goods receipt messages with (cleared) invoices with a value equal to 1/12 of the total amount. For logistical services, there may even be hundreds of goods receipt messages for one line item. Overall, for each line item, the amounts of the line item, the goods receipt messages (if applicable) and the invoices have to match for the process to be compliant. Of course, the log is anonymized, but some semantics are left in the data, for example: The resources are split between batch users and normal users indicated by their name. The batch users are automated processes executed by different systems. The normal users refer to human actors in the process. The monetary values of each event are anonymized from the original data using a linear translation respecting 0, i.e. addition of multiple invoices for a single item should still lead to the original item worth (although there may be small rounding errors for numerical reasons). Company, vendor, system and document names and IDs are anonymized in a consistent way throughout the log. The company has the key, so any result can be translated by them to business insights about real customers and real purchase documents.
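The four flow types and the amount-matching rule described above can be sketched as two small functions. This is a reading of the description rather than code from the challenge: the function names, the returned labels, the tolerance, and the rent amounts are assumptions, and the flag combination that the description does not mention is reported as "unknown".

```python
from decimal import Decimal


def item_category(gr_based_iv, goods_receipt, item_type="Standard"):
    """Map the two flags (and the item type) to one of the four described flows."""
    if item_type.lower() == "consignment":
        return "Consignment"
    if goods_receipt and gr_based_iv:
        return "3-way match, invoice after GR"
    if goods_receipt and not gr_based_iv:
        return "3-way match, invoice before GR"
    if not goods_receipt and not gr_based_iv:
        return "2-way match"
    return "unknown"  # GR-based IV without goods receipt is not described


def amounts_match(item_value, goods_receipts, invoices, tolerance=Decimal("0.01")):
    """Check that goods receipts (if any) and invoices add up to the item value."""
    receipts_ok = (
        not goods_receipts or abs(sum(goods_receipts) - item_value) <= tolerance
    )
    invoices_ok = abs(sum(invoices) - item_value) <= tolerance
    return receipts_ok and invoices_ok


# Rent example: one line item, 12 goods receipts and 12 invoices of 1/12 each.
# The yearly amount of 12000 is a made-up value.
rent_total = Decimal("12000")
monthly = [Decimal("1000")] * 12
print(item_category(True, True))                    # 3-way match, invoice after GR
print(amounts_match(rent_total, monthly, monthly))  # True
```

The tolerance accounts for the small rounding errors that the description attributes to the anonymization of monetary values.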

The case ID is a combination of the purchase document and the purchase item. There is a total of 76,349 purchase documents containing in total 251,734 items, i.e. there are 251,734 cases. In these cases, there are 1,595,923 events relating to 42 activities performed by 627 users (607 human users and 20 batch users). Sometimes the user field is empty, or NONE, which indicates no user was recorded in the source system. For each purchase item (or case) the following attributes are recorded:

- concept:name: A combination of the purchase document id and the item id
- Purchasing Document: The purchasing document ID
- Item: The item ID
- Item Type: The type of the item
- GR-Based Inv. Verif.: Flag indicating if GR-based invoicing is required (see above)
- Goods Receipt: Flag indicating if 3-way matching is required (see above)
- Source: The source system of this item
- Doc. Category name: The name of the category of the purchasing document
- Company: The subsidiary of the company from where the purchase originated
- Spend classification text: A text explaining the class of purchase item
- Spend area text: A text explaining the area for the purchase item
- Sub spend area text: Another text explaining the area for the purchase item
- Vendor: The vendor to which the purchase document was sent
- Name: The name of the vendor
- Document Type: The document type
- Item Category: The category as explained above (3-way with GR-based invoicing, 3-way without, 2-way, consignment)

The data contains the following entities and their events

- PO - Purchase Order documents handled at a large multinational company operating from The Netherlands
- POItem - an item in a Purchase Order document describing a specific item to be purchased
- Resource - the user or worker handling the document or a specific item
- Vendor - the external organization from which an item is to be purchased

Data Size
---------

BPIC19, nodes: 1926651, relationships: 15082099
Inv3d: a high-resolution 3d invoice dataset for template-guided single-image...
service.tib.eu
Updated Nov 28, 2024
+ more versions
Cite
(2024). Inv3d: a high-resolution 3d invoice dataset for template-guided single-image document unwarping - validation split - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/rdr-doi-10-35097-1686
Explore at:
Dataset updated
Nov 28, 2024
Description
Abstract: Numerous business workflows involve printed forms, such as invoices or receipts, which are often manually digitalized to persistently search or store the data. As hardware scanners are costly and inflexible, smartphones are increasingly used for digitalization. Here, processing algorithms need to deal with prevailing environmental factors, such as shadows or crumples. Current state-of-the-art approaches learn supervised image dewarping models based on pairs of raw images and rectification meshes. The available results show promising predictive accuracies for dewarping, but generated errors still lead to sub-optimal information retrieval. In this paper, we explore the potential of improving dewarping models using additional, structured information in the form of invoice templates. We provide two core contributions: (1) a novel dataset, referred to as Inv3D, comprising synthetic and real-world high-resolution invoice images with structural templates, rectification meshes, and a multiplicity of per-pixel supervision signals and (2) a novel image dewarping algorithm, which extends the state-of-the-art approach GeoTr to leverage structural templates using attention. Our extensive evaluation includes an implementation of DewarpNet and shows that exploiting structured templates can improve the performance for image dewarping. We report superior performance for the proposed algorithm on our new benchmark for all metrics, including an improved local distortion of 26.1 %. We made our new dataset and all code publicly available at https://felixhertlein.github.io/inv3d.

TechnicalRemarks: Each sample contains the following files:

- "flat_document.png" (2200x1700x3, uint8, 0-255), showcasing a document in perfect condition.
- "flat_information_delta.png" displays all texts which represent invoice data (2200x1700x3, uint8, 0-255).
- "flat_template.png" is an empty invoice template (2200x1700x3, uint8, 0-255).
- "flat_text_mask.png" visually presents all texts shown in the given document (2200x1700x3, uint8, 0-255).
- "warped_angle.png" shows warping-induced x- and y-axis angle (1600x1600x2, float32, -Pi to Pi).
- "warped_albedo.png" is an albedo map (1600x1600x3, uint8, 0-255).
- "warped_BM.npz" stores backward mapping, i.e. the relative pixel shift from warped to normalized image for each pixel (1600x1600x2, float32, 0-1).
- "warped_curvature.npz" has pixel-wise curvature of the warped document (1600x1600x1, float32, 0-inf).
- "warped_depth.npz" holds per-pixel depth between camera and document (1600x1600x3, float32, 0-inf).
- "warped_document.png" displays the warped document (1600x1600x3, uint8, 0-255).
- "warped_normal.npz" contains warped document normals (1600x1600x3, float32, -inf to inf).
- "warped_recon.png" features a chess-textured warped document (1600x1600x3, uint8, 0-255).
- "warped_text_mask.npz" is a boolean text pixel mask (1600x1600x1, bool8, True/False).
- "warped_UV.npz" stores warped texture coordinates (1600x1600x3, float32, 0-1).
- "warped_WC.npz" includes document coordinates in the 3D space (1600x1600x3, float32, -inf to inf).

For more details see https://github.com/FelixHertlein/inv3d-generator. Released under CC BY-NC-SA 4.0. Excluded files are listed in 'restricted-license-files.txt' (located in record with DOI 10.35097/1730, "Inv3D: a high-resolution 3D invoice dataset for template-driven Single-Image Document Unwarping - Metadata"). These are for academic use only.
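The per-file shapes and dtypes listed in the technical remarks can be turned into a small manifest check, which may be handy after downloading a split. The sketch below is an assumption-laden illustration: the shapes and dtype names are transcribed as written in the remarks (without interpreting which axis is height or width), and the `check_sample` helper and its input format are hypothetical rather than part of the Inv3D tooling.

```python
# Expected (shape, dtype) per file, transcribed from the technical remarks.
EXPECTED = {
    "flat_document.png": ((2200, 1700, 3), "uint8"),
    "flat_information_delta.png": ((2200, 1700, 3), "uint8"),
    "flat_template.png": ((2200, 1700, 3), "uint8"),
    "flat_text_mask.png": ((2200, 1700, 3), "uint8"),
    "warped_angle.png": ((1600, 1600, 2), "float32"),
    "warped_albedo.png": ((1600, 1600, 3), "uint8"),
    "warped_BM.npz": ((1600, 1600, 2), "float32"),
    "warped_curvature.npz": ((1600, 1600, 1), "float32"),
    "warped_depth.npz": ((1600, 1600, 3), "float32"),
    "warped_document.png": ((1600, 1600, 3), "uint8"),
    "warped_normal.npz": ((1600, 1600, 3), "float32"),
    "warped_recon.png": ((1600, 1600, 3), "uint8"),
    "warped_text_mask.npz": ((1600, 1600, 1), "bool8"),
    "warped_UV.npz": ((1600, 1600, 3), "float32"),
    "warped_WC.npz": ((1600, 1600, 3), "float32"),
}


def check_sample(found):
    """Compare one sample against the manifest.

    found: dict mapping file name -> (shape, dtype name), as reported by
           whatever loader was used to read the files.
    Returns a list of human-readable problems; an empty list means a match.
    """
    problems = []
    for name, (shape, dtype) in EXPECTED.items():
        if name not in found:
            problems.append(f"missing: {name}")
            continue
        got_shape, got_dtype = found[name]
        if tuple(got_shape) != shape:
            problems.append(f"{name}: shape {tuple(got_shape)} != {shape}")
        if got_dtype != dtype:
            problems.append(f"{name}: dtype {got_dtype} != {dtype}")
    return problems
```

A caller would fill `found` from the arrays it actually loaded, mapping the loader's dtype names onto the names used in the remarks where they differ.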
Sample Receipt
fisheries.noaa.gov
catalog.data.gov
Updated Mar 15, 2018
+ more versions
Cite
NMFS Office Of Sustainable Fisheries (2018). Sample Receipt [Dataset]. https://www.fisheries.noaa.gov/inport/item/28593
Explore at:
Dataset updated
Mar 15, 2018
Dataset provided by
National Marine Fisheries Service: https://www.fisheries.noaa.gov/
Time period covered
2010 - May 29, 2125
Area covered
all over the continental US
Description
Each sample that is received by NSIL is assigned a laboratory number and a case file is initiated by the sample custodian. The case file will contain all relevant paperwork for that sample including the sample submission sheet, laboratory raw data worksheets, the final results report and any other relevant documentation. The sample custodian enters the client information into the NSIL Sample...
Invoices for Open Market Order (OMO) Charges
catalog.data.gov
data.cityofnewyork.us
Updated Jun 7, 2025
+ more versions
Cite
data.cityofnewyork.us (2025). Invoices for Open Market Order (OMO) Charges [Dataset]. https://catalog.data.gov/dataset/invoices-for-open-market-order-omo-charges
Explore at:
Dataset updated
Jun 7, 2025
Dataset provided by
data.cityofnewyork.us
Description
Contains information about invoices submitted to HPD by private contractors under an OMO. This is part of the HPD Charge Data collection of data tables.
Dataset of invoices and receipts including annotation of relevant fields
zenodo.org
zip
Updated Apr 3, 2022
Cite
Francisco Cruz; Mauro Castelli (2022). Dataset of invoices and receipts including annotation of relevant fields [Dataset]. http://doi.org/10.5281/zenodo.6371710
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.6371710
Dataset updated
Apr 3, 2022
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Francisco Cruz; Mauro Castelli
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is a dataset comprising 813 images of invoices and receipts of a private company in the Portuguese language. It also includes text files with the transcription of relevant fields for each document – seller name, seller address, seller tax identification, buyer tax identification, invoice date, invoice total amount, invoice tax amount, and document reference.
Invoice and Credit Notes over £500 - Dataset - Angus Council Open Data
opendata.angus.gov.uk
Updated Aug 3, 2011
Cite
(2011). Invoice and Credit Notes over £500 - Dataset - Angus Council Open Data [Dataset]. https://opendata.angus.gov.uk/dataset/invoice-and-credit-notes-over-f500
Explore at:
Dataset updated
Aug 3, 2011
License
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
Every six months, we publish a list of all invoices and credit notes over £500 we receive, providing details on supplier name, amount paid, invoice reference, gross amount and VAT amount. The list covers:
- Spend on premises, transport, and supplies and services.
- Payments to contractors who do work on our behalf.
- Other spend we incur in carrying out our business.

Some areas of spend are covered by the Data Protection Act and are not published in full. This includes:
- Personal information, for example individual payments for adoption and fostering, and care-related payments.
- Payments to staff.

These entries have been omitted from the report.
Company Invoice Data API | 20M+ Companies | 10 European Countries | 20+ Data...
datarade.ai
.json
Cite
HitHorizons, Company Invoice Data API | 20M+ Companies | 10 European Countries | 20+ Data Points | Monthly Updated | GDPR-Compliant [Dataset]. https://datarade.ai/data-products/hithorizons-company-invoice-data-api-20m-companies-10-hithorizons
Explore at:
Available download formats: .json
Dataset authored and provided by
HitHorizons
Area covered
United Kingdom
Description
HitHorizons Invoice Data API gives access to aggregated company data on 20M+ companies from 10 countries.

Available countries:

France, United Kingdom, Germany, Poland, Czech Republic, Hungary, Slovakia, Latvia, Estonia, Austria

Parameters:

Id, Company Name, Company Alternative Name, Street Address, Street Number, Location, District, Region, Postal Code, City, Country, Status, Incorporation Date, Dissolution Date, National ID, Tax ID, VAT ID, Parent ID, Idents, Inactive, Company Type, Company Type Normalized

Parameters may vary depending on the country.
Question Answers Label Dataset
universe.roboflow.com
zip
Updated Nov 30, 2022
Cite
Question Answer Labelling (2022). Question Answers Label Dataset [Dataset]. https://universe.roboflow.com/question-answer-labelling/question-answers-label
Explore at:
Available download formats: zip
Dataset updated
Nov 30, 2022
Dataset authored and provided by
Question Answer Labelling
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Objects Bounding Boxes
Description
Here are a few use cases for this project:

Digital Document Management: This model can be used to effectively organize and manage digital documents. By identifying areas such as headers, addresses, and vendors, it could streamline workflows in companies dealing with large amounts of papers, forms or invoices.

Automated Data Extraction: The model could be used in extracting pertinent information from documents automatically. For example, pulling out questions and answers from educational materials, extracting vendor or address information from invoices, or grabbing column headers from statistical reports.

Augmented Reality (AR) Applications: "Question Answers Label" can be utilized in AR glasses to give real-time information about objects a user sees, especially in the realm of paper documents.

Virtual Assistance: This model may be used to build a virtual assistant capable of reading and understanding physical documents. For instance, reading out a user's mail, helping learning from textbooks, or assisting in reviewing legal documents.

Accessibility Tools for Visually Impaired: The tool could be utilized to interpret written documents for visually impaired people by identifying and vocalizing text based on their classes (answers, questions, headers, etc).
OCR Receipts Text Detection - retail dataset
kaggle.com
Updated Sep 19, 2023
+ more versions
Cite
Training Data (2023). OCR Receipts Text Detection - retail dataset [Dataset]. https://www.kaggle.com/datasets/trainingdatapro/ocr-receipts-text-detection
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Sep 19, 2023
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Training Data
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
OCR Receipts from Grocery Stores Text Detection - retail dataset

The Grocery Store Receipts Dataset is a collection of photos captured from various grocery store receipts. This dataset is specifically designed for tasks related to Optical Character Recognition (OCR) and is useful for retail.

💴 For Commercial Usage: To discuss your requirements, learn about the price and buy the dataset, leave a request on TrainingData to buy the dataset

Each image in the dataset is accompanied by bounding box annotations, indicating the precise locations of specific text segments on the receipts. The text segments are categorized into four classes: item, store, date_time and total.

Dataset structure

- images - contains the original images of receipts
- boxes - includes bounding box labeling for the original images
- annotations.xml - contains coordinates of the bounding boxes and detected text, created for the original photo

Data Format

Each image from the images folder is accompanied by an XML annotation in the annotations.xml file indicating the coordinates of the bounding boxes and detected text. For each point, the x and y coordinates are provided.

Classes:

store - name of the grocery store

item - item in the receipt

date_time - date and time of the receipt

total - total price of the receipt
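Reading such an annotation file could look like the sketch below. The actual layout of annotations.xml is not shown in the description, so the element and attribute names in `SAMPLE_XML` (`image`, `box`, `label`, `points`, `text`) are purely hypothetical; only the four class names come from the description. The parsing pattern would need to be adapted to the real schema.

```python
import xml.etree.ElementTree as ET

# The four classes named in the dataset description.
CLASSES = {"store", "item", "date_time", "total"}

# Hypothetical annotation snippet; the real schema may differ.
SAMPLE_XML = """
<annotations>
  <image name="0.jpg">
    <box label="store" points="10,5;120,5;120,30;10,30"><text>FRESH MART</text></box>
    <box label="total" points="15,200;90,200;90,220;15,220"><text>12.50</text></box>
  </image>
</annotations>
"""


def read_boxes(xml_text):
    """Return one dict per annotated box: image name, label, points, text."""
    root = ET.fromstring(xml_text)
    rows = []
    for image in root.iter("image"):
        for box in image.iter("box"):
            label = box.get("label")
            if label not in CLASSES:
                continue  # ignore labels outside the documented classes
            # "x,y;x,y;..." -> [(x, y), ...] as floats
            points = [
                tuple(float(value) for value in pair.split(","))
                for pair in box.get("points").split(";")
            ]
            rows.append({
                "image": image.get("name"),
                "label": label,
                "points": points,
                "text": box.findtext("text", default=""),
            })
    return rows


boxes = read_boxes(SAMPLE_XML)
```

Each returned row carries the corner points and the detected text together, so downstream OCR evaluation can crop the region and compare against the transcription in one pass.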

Text Detection in the Receipts might be made in accordance with your requirements.

💴 Buy the Dataset: This is just an example of the data. Leave a request on https://trainingdata.pro/datasets to discuss your requirements, learn about the price and buy the dataset

TrainingData provides high-quality data annotation tailored to your needs

keywords: receipts reading, retail dataset, consumer goods dataset, grocery store dataset, supermarket dataset, deep learning, retail store management, pre-labeled dataset, annotations, text detection, text recognition, optical character recognition, document text recognition, detecting text-lines, object detection, scanned documents, deep-text-recognition, text area detection, text extraction, images dataset, image-to-text, object detection
City of Alavus purchase invoices 2022
data.europa.eu
unknown
Cite
Alavuden kaupunki, City of Alavus purchase invoices 2022 [Dataset]. https://data.europa.eu/data/datasets/4874de5b-a120-4a7b-92dc-b5f9e15c946c?locale=en
Explore at:
Available download formats: unknown (12580561)
Dataset authored and provided by
Alavuden kaupunki
Area covered
Alavus
Description
The data includes the City of Alavus's business ID purchase invoices. The data includes, for example, the date, receipt and accounting information of the purchase invoice, the service category and the supplier's name and business ID. Invoices of private traders are shown in anonymised form in the data.
Inv3d: a high-resolution 3d invoice dataset for template-guided single-image...
service.tib.eu
Updated Nov 28, 2024
+ more versions
Cite
(2024). Inv3d: a high-resolution 3d invoice dataset for template-guided single-image document unwarping - train split part 2 of 4 - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/rdr-doi-10-35097-1694
Explore at:
Dataset updated
Nov 28, 2024
Hospital Billing - Event Log
data.4tu.nl
figshare.com
zip
Updated Aug 1, 2017
Cite
Felix Mannhardt (2017). Hospital Billing - Event Log [Dataset]. http://doi.org/10.4121/uuid:76c46b83-c930-4798-a1c9-4be94dfeb741
Available download formats: zip
Unique identifier
https://doi.org/10.4121/uuid:76c46b83-c930-4798-a1c9-4be94dfeb741
Dataset updated
Aug 1, 2017
Dataset provided by
Eindhoven University of Technology
Authors
Felix Mannhardt
License
https://doi.org/10.4121/resource:terms_of_use
Time period covered
Dec 13, 2012 - Jan 19, 2016
Description
The 'Hospital Billing' event log was obtained from the financial modules of the ERP system of a regional hospital. The event log contains events that are related to the billing of medical services that have been provided by the hospital. Each trace of the event log records the activities executed to bill a package of medical services that were bundled together. The event log does not contain information about the actual medical services provided by the hospital.

The 100,000 traces in the event log are a random sample of process instances that were recorded over three years. Several attributes such as the 'state' of the process, the 'caseType', the underlying 'diagnosis' etc. are included in the event log. Events and attribute values have been anonymized. The time stamps of events have been randomized for this purpose, but the time between events within a trace has not been altered.
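
The description does not say how the time stamps were randomized. As a minimal sketch of one scheme with the stated property, the Python below shifts every event of a trace by the same random offset, which leaves the time between events within the trace unchanged. The function names, the `max_shift_days` parameter, and the sample trace are hypothetical and not taken from the dataset.

```python
import random
from datetime import datetime, timedelta


def shift_trace(timestamps, max_shift_days=365, rng=random):
    """Shift all timestamps of one trace by a single random offset.

    Assumption: one offset per trace. The actual anonymization
    procedure used for the event log is not documented here.
    """
    limit = max_shift_days * 86400  # offset bound in seconds
    offset = timedelta(seconds=rng.randint(-limit, limit))
    return [t + offset for t in timestamps]


def gaps(timestamps):
    """Return the durations between consecutive events of a trace."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


# Hypothetical trace with three events.
trace = [
    datetime(2013, 1, 1, 9, 0),
    datetime(2013, 1, 3, 12, 30),
    datetime(2013, 2, 1, 8, 15),
]
shifted = shift_trace(trace, rng=random.Random(0))
print(gaps(shifted) == gaps(trace))
```

Because the offset is constant within a trace, durations and event order are preserved while absolute dates no longer match the originals.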

More information about the event log can be found in the related publications.

E-Invoicing Market Analysis, Size, and Forecast 2025-2029: Europe (Denmark,...

technavio.com

Updated Jan 15, 2025

Cite

Technavio (2025). E-Invoicing Market Analysis, Size, and Forecast 2025-2029: Europe (Denmark, France, Germany, UK), APAC (China, India, Japan, South Korea), North America (US and Canada), South America , and Middle East and Africa [Dataset]. https://www.technavio.com/report/e-invoicing-market-industry-analysis

Dataset updated

Jan 15, 2025

Dataset provided by

TechNavio

Authors

Technavio

Time period covered

2021 - 2025

Area covered

Global, United States

Description

E-Invoicing Market Size 2025-2029

The e-invoicing market size is forecast to increase by USD 36.1 billion at a CAGR of 29.9% between 2024 and 2029.

The market is experiencing significant growth, driven by the convenience and easy accessibility of mobile payment systems. This trend is particularly prominent in regions with a high smartphone penetration and a growing preference for digital transactions. However, regulatory hurdles impact adoption in certain markets, as governments implement complex compliance requirements. These regulations aim to standardize invoicing processes and enhance tax revenue collection, but they add complexity to the market. Another key trend shaping the market is the increased security of documents using blockchain technology. This innovation addresses concerns around data privacy and security, which are increasingly important as businesses digitize their operations.
Yet, the market faces challenges from the threat of cyber-attacks, which can compromise sensitive financial information. Companies must invest in robust security measures to protect their systems and maintain customer trust. Effective implementation of these strategies will enable businesses to capitalize on the market's growth potential and navigate challenges effectively.

What will be the Size of the E-Invoicing Market during the forecast period?

In today's business landscape, the digitalization of invoicing is a significant trend, with an increasing number of businesses adopting electronic formats for their invoicing operations. This shift towards digital platforms is driven by the adoption of technology, regulatory initiatives, and tax obligations. Recipients now prefer receiving invoices digitally, making it essential for businesses to keep up with this trend. Digital transformation in financial management tools and accounting software has enabled the standardization of invoice formats, enhancing efficiency and sustainability. However, this transition comes with challenges, including security issues and the need for regulatory compliance. Artificial intelligence (AI) and machine learning (ML) are revolutionizing invoicing processes, automating system design and billing operations.
Cloud platforms provide a secure and accessible solution for businesses to manage their invoices, ensuring seamless electronic transactions. Despite the benefits, there are concerns around fraud and tax evasion. Businesses must invest in training and implementing automated systems to mitigate these risks. The future of invoicing lies in the digital method, as it streamlines business functions and enhances operational efficiency. Regulatory mandates require businesses to comply with strict guidelines for electronic invoicing, making it crucial for businesses to stay informed and adapt to the changing landscape. With the increasing importance of digital platforms, businesses must prioritize the security of their invoicing systems and invest in AI and ML technologies to stay competitive.

How is this E-Invoicing Industry segmented?

The e-invoicing industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

End-user

  B2B
  B2C


Deployment

  Cloud-based
  On-premises


Application

  Energy and Utilities
  FMCG
  E-Commerce
  BFSI
  Government
  Others


Geography

  North America

    US
    Canada


  Europe

    Denmark
    France
    Germany
    UK


  APAC

    China
    India
    Japan
    South Korea


  Rest of World (ROW)

By End-user Insights

The B2B segment is estimated to witness significant growth during the forecast period.

In the dynamic business landscape of 2024, the market experienced significant growth, fueled by the increasing globalization and the expanding presence of IT, banking, financial services and insurance (BFSI), and retail sectors. These industries sought centralized, Internet-based billing and invoicing solutions to streamline their operations. Regulatory requirements in banking and retail sectors, the rise of e-commerce, and the emergence of innovative mobile payment methods propelled the market forward. E-invoicing through email, websites, e-post briefs, fax, and text messages became increasingly popular. The adoption of e-invoicing was anticipated to surge, particularly among small and medium enterprises (SMEs) in emerging economies.

Machine learning and artificial intelligence technologies enhanced invoice accuracy and processing efficiency, while cloud-based systems ensured secure records and financial transparency. Cross-border trade transactions and purchase orders were simplified, reducing payment delays and waste generation. Blockchain technology ensured secure transactions and regulatory compliance. Financia

Accounts Receivable Automation Market Analysis North America, Europe, APAC,...

technavio.com

Updated Nov 28, 2024

Cite

Technavio (2024). Accounts Receivable Automation Market Analysis North America, Europe, APAC, Middle East and Africa, South America - US, China, UK, Germany, Canada, India, Japan, South Korea, France, UAE - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/accounts-receivable-automation-market-industry-analysis

Dataset updated

Nov 28, 2024

Dataset provided by

TechNavio

Authors

Technavio

Time period covered

2021 - 2025

Area covered

United States, Global

Description

Accounts Receivable Automation Market Size 2024-2028

The accounts receivable automation market size is forecast to increase by USD 968.4 million at a CAGR of 9% between 2023 and 2028.

The market is experiencing significant growth due to the increasing adoption of advanced technologies such as invoice automation and payment gateway integration. Businesses are seeking to optimize their financial processes by implementing AR best practices, including invoice processing and credit risk management. Digital payments and subscription models are becoming increasingly popular, leading to a decrease in Days Sales Outstanding (DSO). Furthermore, the emergence of Machine Learning (ML) and Artificial Intelligence (AI) solutions for AR automation is revolutionizing the industry. However, data privacy and security concerns remain a challenge, necessitating stringent compliance measures. AR audit and consulting services are also gaining traction to help businesses navigate the complexities of AR technology implementation. In summary, the AR automation market is driven by the need for efficient invoice processing, improved credit risk management, and the adoption of advanced technologies, while addressing data privacy and security concerns remains a priority.

What will be the Size of the Market During the Forecast Period?

Accounts receivable (AR) is a crucial business process that involves managing and collecting payments from customers for goods or services provided. Traditional AR methodologies have relied heavily on manual processing, which can lead to inefficiencies, payment issues, and increased overheads. However, the shift towards automation in AR processes has gained traction in the US business landscape, offering numerous benefits. Automated accounts receivable (AAR) systems streamline the AR process by integrating various functions such as invoice generation, payment processing, customer communication, and auditing. By automating these tasks, businesses can significantly reduce their accounting cycle time, improve cost-efficiency, and enhance customer experience.



Furthermore, the market is poised for growth due to several factors. First, the increasing complexity of financial systems calls for scalable and flexible solutions. Second, business leaders recognize the importance of business resilience in the face of identity fraud and cybercrime. AAR systems offer strong data security measures, ensuring the protection of sensitive customer and financial information. Moreover, customer behavior and payment patterns continue to evolve, with a growing preference for digital payments and subscriptions. AAR systems can facilitate these payment methods, providing customers with convenient and flexible payment options. Additionally, AAR solutions offer deployment flexibility, enabling businesses to integrate them with their existing systems and processes.

How is this market segmented and which is the largest segment?

The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

Component

  Solution
  Services


Geography

  North America

    Canada
    US


  Europe

    Germany
    UK


  APAC

    China


  Middle East and Africa



  South America

By Component Insights

The solution segment is estimated to witness significant growth during the forecast period.

Accounts receivable automation has become a priority for businesses seeking to streamline their financial processes and enhance customer satisfaction. Cloud-based accounting solutions have gained popularity due to their accessibility and flexibility, allowing teams to collaborate effectively and process invoices in real time. AR automation systems offer advanced functionalities such as cash application automation, dispute management, credit evaluation, and collection management. These solutions can process large volumes of customer invoices efficiently, reducing the workload on accounting teams and accelerating payment collections. The solution segment holds the largest market share in the market, and this trend is expected to continue.

Furthermore, patient services organizations, in particular, stand to benefit significantly from AR automation, as it can help protect patient data while ensuring invoice accuracy and cost-effective operations. Business process optimization is a key driver of AR automation adoption, as it enables organizations to reduce manual errors, improve cash flow, and enhance overall financial management.

The solution segment was valued at USD 743.40 million in 2018 and showed a gradual increase during the forecast period.

Regional Ana

Invoice and Credit Notes over £500
dtechtive.com
find.data.gov.scot
xls
Updated Dec 12, 2023
Cite
Angus Council (2023). Invoice and Credit Notes over £500 [Dataset]. https://dtechtive.com/datasets/44097
Explore at:
Available download formats: xls (0.8755 MB)
Dataset updated
Dec 12, 2023
Dataset provided by
Angus Council
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Area covered
Scotland
Description
Every six months, we publish a list of all invoices and credit notes over £500 we receive, providing details on supplier name, amount paid, invoice reference, gross amount and VAT amount. The list covers spend on premises, transport, and supplies and services; payments to contractors who do work on our behalf; and other spend we incur in carrying out our business. Some areas of spend are covered by the Data Protection Act and are not published in full. This includes personal information (for example, individual payments for adoption and fostering, and care-related payments) and payments to staff. These entries have been omitted from the report.
Inv3D: a high-resolution 3D invoice dataset for template-guided single-image...
radar.kit.edu
radar-service.eu
tar
Updated Sep 1, 2023
Cite
Patrick Philipp; Felix Hertlein; Alexander Naumann (2023). Inv3D: a high-resolution 3D invoice dataset for template-guided single-image document unwarping - Meta data [Dataset]. http://doi.org/10.35097/1730
Explore at:
Available download formats: tar (1727488 bytes)
Unique identifier
https://doi.org/10.35097/1730
Dataset updated
Sep 1, 2023
Dataset provided by
Karlsruhe Institute of Technology
Hertlein, Felix
Philipp, Patrick
Naumann, Alexander
Authors
Patrick Philipp; Felix Hertlein; Alexander Naumann
Description
Each sample contains the following files:

  "flat_document.png" (2200x1700x3, uint8, 0-255), showcasing a document in perfect condition.
  "flat_information_delta.png" displays all texts which represent invoice data (2200x1700x3, uint8, 0-255).
  "flat_template.png" is an empty invoice template (2200x1700x3, uint8, 0-255).
  "flat_text_mask.png" visually presents all texts shown in the given document (2200x1700x3, uint8, 0-255).
  "warped_angle.png" shows warping-induced x- and y-axis angle (1600x1600x2, float32, -Pi to Pi).
  "warped_albedo.png" is an albedo map (1600x1600x3, uint8, 0-255).
  "warped_BM.npz" stores the backward mapping, i.e. the relative pixel shift from the warped to the normalized image for each pixel (1600x1600x2, float32, 0-1).
  "warped_curvature.npz" has pixel-wise curvature of the warped document (1600x1600x1, float32, 0-inf).
  "warped_depth.npz" holds per-pixel depth between camera and document (1600x1600x3, float32, 0-inf).
  "warped_document.png" displays the warped document (1600x1600x3, uint8, 0-255).
  "warped_normal.npz" contains warped document normals (1600x1600x3, float32, -inf to inf).
  "warped_recon.png" features a chess-textured warped document (1600x1600x3, uint8, 0-255).
  "warped_text_mask.npz" is a boolean text pixel mask (1600x1600x1, bool8, True/False).
  "warped_UV.npz" stores warped texture coordinates (1600x1600x3, float32, 0-1).
  "warped_WC.npz" includes document coordinates in the 3D space (1600x1600x3, float32, -inf to inf).

For more details see https://github.com/FelixHertlein/inv3d-generator. Released under CC BY-NC-SA 4.0. Excluded files are listed in 'restricted-license-files.txt' (located in record with DOI 10.35097/1730, "Inv3D: a high-resolution 3D invoice dataset for template-driven Single-Image Document Unwarping - Metadata"). These are for academic use only.
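
The shapes and dtypes listed above can be checked after loading a sample. The following Python is a minimal sketch using NumPy; `NPZ_SPEC`, `matches_spec`, and `load_first_array` are hypothetical names, and the key under which each array is stored inside its .npz archive is not stated in the description, so the loader simply takes the first array it finds.

```python
import numpy as np

# Shape and dtype for a few of the .npz files, copied from the file list above.
NPZ_SPEC = {
    "warped_BM.npz": ((1600, 1600, 2), np.float32),
    "warped_curvature.npz": ((1600, 1600, 1), np.float32),
    "warped_UV.npz": ((1600, 1600, 3), np.float32),
    "warped_WC.npz": ((1600, 1600, 3), np.float32),
}


def matches_spec(file_name, array):
    """Return True if the array has the shape and dtype listed for file_name."""
    shape, dtype = NPZ_SPEC[file_name]
    return array.shape == shape and array.dtype == dtype


def load_first_array(source):
    """Load the first array stored in an .npz archive (path or file object).

    Assumption: each archive holds a single array. Its key name is not
    given in the description, so none is hard-coded here.
    """
    with np.load(source) as archive:
        return archive[archive.files[0]]
```

A typical use would be `matches_spec("warped_BM.npz", load_first_array("warped_BM.npz"))`, run from inside a sample directory. The inv3d-generator repository linked above is the authoritative source for the actual archive layout.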
Accounts payable 2023
data.europa.eu
gimi9.com
csv, json
Updated Feb 27, 2025
Cite
Södertälje kommun (2025). Accounts payable 2023 [Dataset]. https://data.europa.eu/data/datasets/https-catalog-sodertalje-se-store-12-resource-149/embed
Explore at:
Available download formats: csv, json
Dataset updated
Feb 27, 2025
Dataset authored and provided by
Södertälje kommun
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
The accounts payable data set contains information about purchases made, supplier names, purchasing organization, voucher numbers and more. The voucher number is a unique identifier that can be used if you wish to see the invoice to which the purchase relates.

In some cases, an entry reads [Contains personal data]. This means that the supplier is a sole proprietor and that the company's name and organization number are covered by the provisions of the General Data Protection Regulation. When it says [May contain personal data], it means that it has not been possible to determine automatically whether the supplier is a sole proprietor or not; this may apply, for example, to foreign suppliers or to compensation for expenses paid by employees.

When it says [Can be covered by confidentiality], it means that any information regarding the purchase can be covered by confidentiality and that a more thorough examination must be made.
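
The three bracketed markers above can be handled uniformly when reading the file. The Python below is a small sketch that maps a field value to a category. It assumes the markers appear as the exact field value in the English wording given here; the published files may use different (for example Swedish) wording, and the category labels and function name are hypothetical.

```python
# Marker text as quoted in the description, mapped to hypothetical labels.
MARKERS = {
    "[Contains personal data]": "personal_data",
    "[May contain personal data]": "possible_personal_data",
    "[Can be covered by confidentiality]": "possibly_confidential",
}


def classify_field(value):
    """Return the label for a marker value, or 'published' for ordinary values."""
    return MARKERS.get(value.strip(), "published")


print(classify_field("[May contain personal data]"))
```

Keeping the markers in one dictionary makes it easy to swap in the wording actually used in the CSV or JSON export.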

To access the underlying invoices, use X municipality's e-service: https://service.organizationX.se/bestall-fakturabild and enter the voucher number for the invoice or invoices you are interested in.

A description of the specification on which the data set is based is available via dataportal.se - see the field Fulfils for link.

Cite

Parsee.ai, invoices-example [Dataset]. https://huggingface.co/datasets/parsee-ai/invoices-example

invoices-example

parsee-ai/invoices-example

Explore at:

25 scholarly articles cite this dataset (View in Google Scholar)

Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset provided by

Parsee.ai

License

MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically

Description

Invoices Sample Dataset

This is a sample dataset generated on app.parsee.ai for invoices. The goal was to evaluate different LLMs on this RAG task using the Parsee evaluation tools. A full study can be found at https://github.com/parsee-ai/parsee-datasets/blob/main/datasets/invoices/parsee-loader/README.md (parsee-core version used: 0.1.3.11). This dataset was created on the basis of 15 sample invoices (PDF files). All PDF files are publicly accessible on parsee.ai, to access them… See the full description on the dataset page: https://huggingface.co/datasets/parsee-ai/invoices-example.
