15 datasets found

OCR Receipts Text Detection - retail dataset
kaggle.com
Updated Sep 19, 2023
Training Data (2023). OCR Receipts Text Detection - retail dataset [Dataset]. https://www.kaggle.com/datasets/trainingdatapro/ocr-receipts-text-detection
Dataset updated
Sep 19, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Training Data
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
OCR Receipts from Grocery Stores Text Detection - retail dataset

The Grocery Store Receipts Dataset is a collection of photos of receipts from various grocery stores. It is designed for tasks related to Optical Character Recognition (OCR) and is intended for retail applications.

💴 For Commercial Usage: to discuss your requirements, learn about pricing, and buy the dataset, leave a request on TrainingData.

Each image in the dataset is accompanied by bounding box annotations, indicating the precise locations of specific text segments on the receipts. The text segments are categorized into four classes: item, store, date_time and total.


Dataset structure

images - contains the original images of the receipts

boxes - includes bounding box labeling for the original images

annotations.xml - contains coordinates of the bounding boxes and detected text, created for the original photo

Data Format

Each image in the images folder is accompanied by an XML annotation in the annotations.xml file that gives the coordinates of the bounding boxes and the detected text. For each point, the x and y coordinates are provided.

Classes:

store - name of the grocery store

item - item in the receipt

date_time - date and time of the receipt

total - total price of the receipt
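The description does not show the exact layout of annotations.xml. The sketch below is a minimal reader that assumes a CVAT-style export: `<image>` elements containing `<box>` (with `xtl`, `ytl`, `xbr`, `ybr`) or `<polygon>` (with `points="x1,y1;x2,y2;..."`) shapes, each carrying a `label` from the four classes and the recognized text in a child `<attribute name="text">`. These element and attribute names are assumptions; check them against the actual file before relying on this.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# The four classes named in the dataset description.
CLASSES = {"store", "item", "date_time", "total"}


@dataclass
class TextRegion:
    image: str
    label: str
    text: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def _recognized_text(shape: ET.Element) -> str:
    # Assumption: the transcribed text sits in a child <attribute name="text">.
    for attr in shape.findall("attribute"):
        if attr.get("name") == "text":
            return (attr.text or "").strip()
    return ""


def parse_annotations(xml_text: str) -> list[TextRegion]:
    """Return one TextRegion per labelled box or polygon, as an axis-aligned box."""
    root = ET.fromstring(xml_text)
    regions = []
    for image in root.iter("image"):
        name = image.get("name", "")
        for shape in image:
            label = shape.get("label")
            if label not in CLASSES:
                continue
            if shape.tag == "box":
                coords = [float(shape.get(k)) for k in ("xtl", "ytl", "xbr", "ybr")]
            elif shape.tag == "polygon":
                # Assumed point format "x1,y1;x2,y2;..."; reduced to its bounding box.
                points = [tuple(map(float, p.split(","))) for p in shape.get("points").split(";")]
                xs = [x for x, _ in points]
                ys = [y for _, y in points]
                coords = [min(xs), min(ys), max(xs), max(ys)]
            else:
                continue
            regions.append(TextRegion(name, label, _recognized_text(shape), *coords))
    return regions


# Hypothetical sample in the assumed layout.
SAMPLE = """<annotations>
  <image id="0" name="receipt_0.jpg" width="600" height="1200">
    <box label="total" xtl="40" ytl="900" xbr="560" ybr="940">
      <attribute name="text">TOTAL 12.50</attribute>
    </box>
    <polygon label="store" points="50,20;300,20;300,60;50,60">
      <attribute name="text">FRESH MART</attribute>
    </polygon>
  </image>
</annotations>"""

for region in parse_annotations(SAMPLE):
    print(region.label, region.text, region.xmin, region.ymin, region.xmax, region.ymax)
```

Reducing polygons to axis-aligned boxes keeps the output uniform for detectors that expect rectangles; keep the raw points instead if you need the exact outline.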


Text detection on the receipts can be tailored to your requirements.

💴 Buy the Dataset: This is just an example of the data. Leave a request on https://trainingdata.pro/datasets to discuss your requirements, learn about the price and buy the dataset

TrainingData provides high-quality data annotation tailored to your needs

keywords: receipts reading, retail dataset, consumer goods dataset, grocery store dataset, supermarket dataset, deep learning, retail store management, pre-labeled dataset, annotations, text detection, text recognition, optical character recognition, document text recognition, detecting text-lines, object detection, scanned documents, deep-text-recognition, text area detection, text extraction, images dataset, image-to-text
Database of expense reports published on Ma Dada
gimi9.com
Database of expense reports published on Ma Dada [Dataset]. https://gimi9.com/dataset/eu_65787e43d4fce4bb2751cebc/
License
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
Motivations for the creation of the dataset

The main objective of this dataset was to build a database of the expense reports of mayors who responded positively to requests for access to administrative documents made on Madada.fr, a website for submitting such requests. The database lists only the expense reports that town halls have made public on Madada.fr. The aim was to study the exact content of the mayors' expense reports available on the platform since 2020. We received no funding; the database was created for an article written for a student magazine (https://chicane-lemag.fr/) of the Masters of Information of Sciences Po Aix and EJCAM. The article, entitled "What does the analysis of data on Mayors' expense reports in 2022 tell us?", is available at: https://chicane-lemag.fr/16/2023/11/27/analyse-donnees-notes-de-frais-elus/

Composition of the dataset

Each line of the dataset contains the information for one expense report between 2020 and 2023:
- Date (day/month/year)
- Community concerned
- Supplier's name
- SIRET number
- Main activity of the supplier
- Purpose of the expenditure
- Supporting document
- Amount in euros
- Category of expenditure

Data collection process

The data were collected manually by two students between 1 October 2023 and 10 November 2023, from requests for expense reports on Madada.fr. They therefore do not represent all the mayors of France, only those who have communicated their expense reports on the platform.

Data pre-processing

The data were cleaned and prepared by carefully reading the supporting documents submitted by the mayors to justify their expense reports. The information was collected and then transferred to an Airtable database designed to catalogue these expense reports.

Dissemination of the dataset

The data are available to all on data.gouv.fr under the ODbL license.

Dataset maintenance

There are no plans to update these data.

Legal and ethical considerations

The publication of this data is governed by the right of access to administrative documents, as provided for in Book III of the Code of Relations between the Public and the Administration. The Council of State ruled that "notes of expenses and receipts of travel as well as notes of catering expenses and receipts of representation expenses of local elected officials or public officials constitute administrative documents, which may be communicated to any person who so requests" (Council of State 52521, decision read on 8 February 2023).
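As a rough illustration of working with one row per expense report, the sketch below models the nine listed columns as a Python dataclass and totals the amounts per expenditure category. The field names (`note_date`, `amount_eur`, and so on) are hypothetical stand-ins for the actual column headers, and the file format of the published data is not assumed here.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class ExpenseNote:
    # Hypothetical field names mirroring the columns listed in the description.
    note_date: date
    community: str
    supplier: str
    siret: str
    supplier_activity: str
    purpose: str
    supporting_document: str
    amount_eur: float
    category: str


def parse_date(text: str) -> date:
    """Parse a day/month/year date such as '15/03/2022'."""
    return datetime.strptime(text, "%d/%m/%Y").date()


def total_by_category(notes, year=None):
    """Sum amounts in euros per expenditure category, optionally for a single year."""
    totals = defaultdict(float)
    for note in notes:
        if year is None or note.note_date.year == year:
            totals[note.category] += note.amount_eur
    return dict(totals)
```

Floats are enough for a quick exploratory sum; switch to `decimal.Decimal` if exact cent-level totals matter.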
Public Budget Database - Governmental receipts 1962-Current
catalog.data.gov
gimi9.com
Updated Mar 22, 2024
Office of Management and Budget (2024). Public Budget Database - Governmental receipts 1962-Current [Dataset]. https://catalog.data.gov/dataset/public-budget-database-governmental-receipts-1962-current
Dataset updated
Mar 22, 2024
Dataset provided by
United States Office of Management and Budget (http://www.whitehouse.gov/omb)
Description
This file contains governmental receipts for 1962 through the current budget year, as well as four years of projections. It can be used to reproduce many of the totals published in the Budget and examine unpublished details below the levels of aggregation published in the Budget.
Transactional E-receipt Data | Hotel, Travel, Hospitality
datarade.ai
Updated Jun 19, 2024
Measurable AI (2024). Transactional E-receipt Data | Hotel, Travel, Hospitality [Dataset]. https://datarade.ai/data-products/transactional-e-receipt-data-hotel-travel-hospitality-measurable-ai
Dataset updated
Jun 19, 2024
Dataset authored and provided by
Measurable AI
Area covered
Sri Lanka, Nigeria, Belgium, Jordan, Tunisia, Korea (Republic of), Argentina, United States of America, France, Germany
Description
The metrics that can be extracted are those contained in the booking invoice email, such as hotel name, room type, dates, check-in and check-out times, price paid, and length of stay. Up to five years of history is available.

We also have cancellation emails.

Data for any hotel vendor can also be requested; we will search our database to assess whether the volume justifies building a parser to extract the data.
Sample Receipt
s.cnmilf.com
fisheries.noaa.gov
Updated Oct 19, 2024
(Point of Contact, Custodian) (2024). Sample Receipt [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/sample-receipt1
Dataset updated
Oct 19, 2024
Dataset provided by
(Point of Contact, Custodian)
Description
Each sample received by NSIL is assigned a laboratory number, and the sample custodian opens a case file for it. The case file contains all relevant paperwork for that sample, including the sample submission sheet, laboratory raw data worksheets, the final results report, and any other relevant documentation. The sample custodian enters the client information into the NSIL sample tracking system (Sample receipt database) and generates the appropriate client and sample receipt information. The laboratory analysts perform the appropriate analyses and record the results and whether they are compliant or non-compliant with the assigned acceptance levels. The analysts also record the charges and the analytical and quality assurance units used to complete all analyses. The database is used to track samples analyzed by NSIL from sample receipt to reporting of results. It tracks the number of samples, the number of analytical units, the types of samples, the purpose of sampling, and analytical costs.
Hong Kong SAR, China Govt Consolidated Acc: ytd: Repayment of Bonds and...
ceicdata.com
Updated May 15, 2019
CEICdata.com (2019). Hong Kong SAR, China Govt Consolidated Acc: ytd: Repayment of Bonds and Notes [Dataset]. https://www.ceicdata.com/en/hong-kong/government-consolidated-account-receipts-and-payments/govt-consolidated-acc-ytd-repayment-of-bonds-and-notes
Dataset updated
May 15, 2019
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Apr 1, 2017 - Mar 1, 2018
Area covered
Hong Kong
Variables measured
Operating Statement
Description
Hong Kong Govt Consolidated Acc: Year to Date: Repayment of Bonds and Notes data was reported at 0.000 HKD mn in May 2018, unchanged from 0.000 HKD mn in Apr 2018. The series is updated monthly; its median from Jul 2014 to May 2018 is 0.000 HKD mn, based on 47 observations. The data reached an all-time high of 9,687.800 HKD mn in Mar 2015 and a record low of 0.000 HKD mn in May 2018. The series remains active in CEIC and is reported by The Treasury. The data is categorized under Global Database’s Hong Kong – Table HK.F002: Government Consolidated Account: Receipts and Payments.
Event Graph of BPI Challenge 2019
data.4tu.nl
zip
Updated Apr 22, 2021
Dirk Fahland (2021). Event Graph of BPI Challenge 2019 [Dataset]. http://doi.org/10.4121/14169614.v1
Available download formats: zip
Unique identifier
https://doi.org/10.4121/14169614.v1
Dataset updated
Apr 22, 2021
Dataset provided by
4TU.ResearchData
Authors
Dirk Fahland
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Business process event data modeled as labeled property graphs

Data Format
-----------

The dataset comprises one labeled property graph in two different file formats.

#1) Neo4j .dump format

A Neo4j (https://neo4j.com) database dump that contains the entire graph and can be imported into a fresh Neo4j database instance using the following command (see also the Neo4j documentation: https://neo4j.com/docs/):

/bin/neo4j-admin.(bat|sh) load --database=graph.db --from=

The .dump was created with Neo4j v3.5.

#2) .graphml format

A .zip file containing a .graphml file of the entire graph

Data Schema
-----------

The graph is a labeled property graph over business process event data. Each graph uses the following concepts:

:Event nodes - each event node describes a discrete event, i.e., an atomic observation described by attribute "Activity" that occurred at the given "timestamp"

:Entity nodes - each entity node describes an entity (e.g., an object or a user), it has an EntityType and an identifier (attribute "ID")

:Log nodes - each log node describes a collection of events that were recorded together; most graphs contain only one log node

:Class nodes - each class node describes a type of observation that has been recorded, e.g., the different types of activities that can be observed, :Class nodes group events into sets of identical observations

:CORR relationships - from :Event to :Entity nodes, describes whether an event is correlated to a specific entity; an event can be correlated to multiple entities

:DF relationships - "directly-followed by" between two :Event nodes describes which event is directly-followed by which other event; both events in a :DF relationship must be correlated to the same entity node. All :DF relationships form a directed acyclic graph.

:HAS relationship - from a :Log to an :Event node, describes which events had been recorded in which event log

:OBSERVES relationship - from an :Event to a :Class node, describes to which event class an event belongs, i.e., which activity was observed in the graph

:REL relationship - placeholder for any structural relationship between two :Entity nodes

The concepts are further defined in Stefan Esser, Dirk Fahland: Multi-Dimensional Event Data in Graph Databases. CoRR abs/2005.14552 (2020) https://arxiv.org/abs/2005.14552
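To make the :CORR / :DF relationship concrete, here is a minimal in-memory sketch (not the authors' import script) of how :DF edges can be derived from :CORR edges: for each entity, take the events correlated to it, order them by timestamp, and link each event to the next one. Plain dicts and tuples stand in for graph nodes, and breaking timestamp ties by event id is an assumption of this sketch.

```python
from collections import defaultdict


def build_df(event_time, corr):
    """Derive :DF edges from :CORR edges.

    event_time: dict mapping event id -> timestamp (any sortable value).
    corr: iterable of (event_id, entity_id) pairs, one per :CORR relationship.
    Returns a list of (earlier_event, later_event, entity_id) triples.
    """
    events_per_entity = defaultdict(list)
    for event_id, entity_id in corr:
        events_per_entity[entity_id].append(event_id)

    df_edges = []
    for entity_id, event_ids in events_per_entity.items():
        # Order this entity's events by timestamp (event id breaks ties).
        event_ids.sort(key=lambda e: (event_time[e], e))
        # Consecutive events of the same entity are "directly-followed".
        for earlier, later in zip(event_ids, event_ids[1:]):
            df_edges.append((earlier, later, entity_id))
    return df_edges


events = {"e1": 1, "e2": 2, "e3": 3}
corr = [("e1", "PO1"), ("e2", "PO1"), ("e3", "PO1"), ("e1", "Item1"), ("e3", "Item1")]
for edge in build_df(events, corr):
    print(edge)
```

Because an event can be correlated to several entities, e1 directly-follows into e3 for Item1 even though e2 sits between them for PO1. Every edge points forward in (timestamp, id) order, which is why the resulting :DF edges cannot form a cycle.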

Data Contents
-------------

neo4j-bpic19-2021-02-17 (.dump|.graphml.zip)

An integrated graph describing the raw event data of the entire BPI Challenge 2019 dataset.
van Dongen, B.F. (Boudewijn) (2019): BPI Challenge 2019. 4TU.ResearchData. Collection. https://doi.org/10.4121/uuid:d06aff4b-79f0-45e6-8ec8-e19730c248f1

This data originated from a large multinational company operating from The Netherlands in the area of coatings and paints, and we ask participants to investigate the purchase order handling process for some of its 60 subsidiaries. In particular, the process owner has compliance questions. In the data, each purchase order (or purchase document) contains one or more line items. For each line item, there are roughly four types of flows in the data: (1) 3-way matching, invoice after goods receipt: for these items, the value of the goods receipt message should be matched against the value of an invoice receipt message and the value entered at creation of the item (indicated by both the GR-based flag and the Goods Receipt flag set to true). (2) 3-way matching, invoice before goods receipt: purchase items that do require a goods receipt message but do not require GR-based invoicing (indicated by the GR-based IV flag set to false and the Goods Receipt flag set to true). For such purchase items, invoices can be entered before the goods are received, but they are blocked until the goods are received. This unblocking can be done by a user or by a batch process at regular intervals. Invoices should only be cleared if the goods are received and the value matches both the invoice and the value at creation of the item. (3) 2-way matching (no goods receipt needed): for these items, the value of the invoice should match the value at creation (in full, or partially until the PO value is consumed), but no separate goods receipt message is required (indicated by both the GR-based flag and the Goods Receipt flag set to false). (4) Consignment: for these items, there are no invoices at PO level, as this is handled fully in a separate process. Here the GR indicator is set to true but the GR-based IV flag is set to false, and the item type (consignment) tells us that no invoice is expected against this item.
Unfortunately, the complexity of the data goes further than just this division in four categories. For each purchase item, there can be many goods receipt messages and corresponding invoices which are subsequently paid. Consider for example the process of paying rent. There is a Purchase Document with one item for paying rent, but a total of 12 goods receipt messages with (cleared) invoices with a value equal to 1/12 of the total amount. For logistical services, there may even be hundreds of goods receipt messages for one line item. Overall, for each line item, the amounts of the line item, the goods receipt messages (if applicable) and the invoices have to match for the process to be compliant. Of course, the log is anonymized, but some semantics are left in the data, for example: The resources are split between batch users and normal users indicated by their name. The batch users are automated processes executed by different systems. The normal users refer to human actors in the process. The monetary values of each event are anonymized from the original data using a linear translation respecting 0, i.e. addition of multiple invoices for a single item should still lead to the original item worth (although there may be small rounding errors for numerical reasons). Company, vendor, system and document names and IDs are anonymized in a consistent way throughout the log. The company has the key, so any result can be translated by them to business insights about real customers and real purchase documents.
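The compliance rule above (per line item, the invoices must add up to the item value) and the anonymization property (sums are preserved up to small rounding errors) can be illustrated together. The sketch reads "linear translation respecting 0" as multiplication by a constant factor, which is one plausible interpretation rather than the documented method; the factor and the one-cent tolerance are hypothetical.

```python
import math


def amounts_match(item_value, invoice_values, abs_tol=0.01):
    """Check that the invoices of one line item add up to the item's value.

    abs_tol absorbs the small rounding errors mentioned in the description.
    """
    return math.isclose(sum(invoice_values), item_value, rel_tol=1e-9, abs_tol=abs_tol)


def anonymize(value, factor):
    # One reading of "linear translation respecting 0": multiply by a constant,
    # so f(0) == 0 and f(a + b) == f(a) + f(b), up to floating-point rounding.
    return value * factor


# Rent example: one item, twelve monthly invoices of 1/12 of the total each.
item = 1200.0
invoices = [item / 12] * 12
factor = 0.7321  # hypothetical secret scaling factor

print(amounts_match(anonymize(item, factor), [anonymize(v, factor) for v in invoices]))
print(amounts_match(item, invoices[:11]))  # one invoice missing
```

Since scaling distributes over addition, a check run on the anonymized values gives the same answer as on the originals, which is what lets results be translated back by the company.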

The case ID is a combination of the purchase document and the purchase item. There is a total of 76,349 purchase documents containing in total 251,734 items, i.e. there are 251,734 cases. In these cases, there are 1,595,923 events relating to 42 activities performed by 627 users (607 human users and 20 batch users). Sometimes the user field is empty, or NONE, which indicates no user was recorded in the source system. For each purchase item (or case) the following attributes are recorded: concept:name: A combination of the purchase document id and the item id, Purchasing Document: The purchasing document ID, Item: The item ID, Item Type: The type of the item, GR-Based Inv. Verif.: Flag indicating if GR-based invoicing is required (see above), Goods Receipt: Flag indicating if 3-way matching is required (see above), Source: The source system of this item, Doc. Category name: The name of the category of the purchasing document, Company: The subsidiary of the company from where the purchase originated, Spend classification text: A text explaining the class of purchase item, Spend area text: A text explaining the area for the purchase item, Sub spend area text: Another text explaining the area for the purchase item, Vendor: The vendor to which the purchase document was sent, Name: The name of the vendor, Document Type: The document type, Item Category: The category as explained above (3-way with GR-based invoicing, 3-way without, 2-way, consignment).
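The four flow types described earlier are determined by the "GR-Based Inv. Verif." and "Goods Receipt" flags plus the item type. A minimal classifier is sketched below; it assumes the flags have already been converted to booleans and that consignment items carry an item type spelled "Consignment", both of which should be checked against the log.

```python
def item_category(gr_based_iv: bool, goods_receipt: bool, item_type: str = "") -> str:
    """Map the two flags (plus item type) to the four flow types described above."""
    # Consignment shares its flags with case (2), so the item type is checked first.
    if item_type.strip().lower() == "consignment":
        return "consignment"
    if goods_receipt and gr_based_iv:
        return "3-way match, invoice after GR"
    if goods_receipt:
        return "3-way match, invoice before GR"
    if not gr_based_iv:
        return "2-way match"
    # GR-based IV without a goods receipt is not one of the four described flows.
    return "unknown"


print(item_category(True, True))
print(item_category(False, True, "Consignment"))
```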

The data contains the following entities and their events:

- PO - Purchase Order documents handled at a large multinational company operating from The Netherlands
- POItem - an item in a Purchase Order document describing a specific item to be purchased
- Resource - the user or worker handling the document or a specific item
- Vendor - the external organization from which an item is to be purchased

Data Size
---------

BPIC19: 1,926,651 nodes, 15,082,099 relationships
European State Finance Database; Seventeenth Century French Revenues and...
datacatalogue.cessda.eu
Updated Nov 28, 2024
Bonney, R., University of Leicester (2024). European State Finance Database; Seventeenth Century French Revenues and Expenditure, Malet Files [Dataset]. http://doi.org/10.5255/UKDA-SN-3068-1
Unique identifier
https://doi.org/10.5255/UKDA-SN-3068-1
Dataset updated
Nov 28, 2024
Dataset provided by
Department of History
Authors
Bonney, R., University of Leicester
Time period covered
Jan 1, 1993
Area covered
France
Variables measured
National, Economic indicators
Measurement technique
Compilation or synthesis of existing material
Description
Abstract copyright UK Data Service and data collection copyright owner.
The European State Finance Database (ESFD) is an international collaborative research project for the collection of data in European fiscal history. There are no strict geographical or chronological boundaries to the collection, although data for this collection cover the period between c.1200 and c.1815. The purpose of the ESFD was to establish a significant database of European financial and fiscal records. The data are drawn from the main extant sources of a number of European countries, as the evidence and the state of scholarship permit. The aim was to collect the data made available by scholars, whether drawing upon their published or unpublished archival research, or from other published material.
The ESFD project at the University of Leicester also serves to assist scholars working with the data by providing statistical manipulations of data and high-quality graphical outputs for publication. The broad aim of the project was to act as a facilitator for a general methodological and statistical advance in the area of European fiscal history, with data capture and the interpretation of data in key publications as the measurable indicators of that advance. The data were originally deposited at the UK Data Archive in SAS transport format and as ASCII files; however, data files in this new edition have been saved as tab-delimited files. Furthermore, this new edition features documentation in the form of a single file containing essential data file metadata, source details and notes of interest for particular files.
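Since the data files in this edition are tab-delimited, they can be loaded with Python's standard csv module. The sketch below assumes each file has a header row; the column names in the sample (year, expenditure) are hypothetical and not taken from the Malet files, and numeric-looking cells are converted to floats.

```python
import csv
import io


def _maybe_number(value):
    # Convert numeric-looking cells to float; leave everything else as text.
    try:
        return float(value)
    except (TypeError, ValueError):
        return value


def load_tab_file(handle):
    """Read a tab-delimited file with a header row into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [{key: _maybe_number(val) for key, val in row.items()} for row in reader]


# For a real file: open(path, newline="", encoding=...) and pass the handle.
sample = io.StringIO("year\texpenditure\n1600\t20.5\n1601\t21.0\n")
rows = load_tab_file(sample)
print(rows[0])  # {'year': 1600.0, 'expenditure': 20.5}
```

The file encoding is not stated in the description, so it is left to the caller.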

Main Topics:

The files in this dataset relate to the datafiles held in the Leicester database in the directory /rjb/malet/*.*, excluding the derived datafiles, which are held in SN 3096. These data on seventeenth century French revenues and expenditure supplied by the Project Director, Professor Richard Bonney, draw upon J.R. Malet, Comptes rendus de l'administration des finances du royaume de France (London, 1789). For a discussion of this source in English, consult Bonney, R.J., 'Jean Roland Malet: historian of the finances of the French monarchy', French History, 5 (1991), 180-233.
File Information:
g068md01.* Malet's figures for royal expenditure in France, 1600-10
g068md02.* Malet's figures for royal expenditure in France, 1611-42
g068md03.* Malet's figures for royal expenditure in France, 1643-56
g068md04.* Malet's figures for royal expenditure in France, 1661-88
g068md05.* Malet's figures for royal expenditure in France, 1689-95
g068md06.* Malet's figures for receipts from the pays d'elections, 1600-10
g068md07.* Malet's figures for receipts from the pays d'elections, 1611-42
g068md08.* Malet's figures for receipts from the pays d'elections, 1643-56
g068md09.* Malet's figures for receipts from the pays d'elections, 1661-88
g068md10.* Malet's figures for receipts from the pays d'elections, 1661-88 (charges)
g068md11.* Malet's figures for receipts from the pays d'elections, 1661-88 (net to Treasury)
g068md12.* Malet's figures for receipts from the pays d'elections, 1689-95
g068md13.* Malet's figures for receipts from the pays d'elections, 1689-95 (charges)
g068md14.* Malet's figures for receipts from the pays d'elections, 1689-95 (net to Treasury)
g068md15.* Malet's figures for receipts from the pays d'etats, 1600-10
g068md16.* Malet's figures for receipts from the pays d'etats, 1611-42
g068md17.* Malet's figures for receipts from the pays d'etats, 1643-56
g068md18.* Malet's figures for receipts from the pays d'etats, 1661-88
g068md19.* Malet's figures for receipts from the pays d'etats, 1661-88 (charges)
g068md20.* Malet's figures for receipts from the pays d'etats, 1661-88 (net to Treasury)
g068md21.* Malet's figures for receipts from the pays d'etats, 1689-95
g068md22.* Malet's figures for receipts from the pays d'etats, 1689-95 (charges)
g068md23.* Malet's figures for receipts from the pays d'etats, 1689-95 (net to Treasury)
g068md24.* Malet's figures for dons gratuits from the pays d'etats, 1661-88
g068md25.* Malet's figures for dons gratuits from the pays d'etats, 1661-88 (charges)
g068md26.* Malet's figures for dons gratuits from the pays d'etats, 1661-88 (net to Treasury)
g068md27.* Malet's figures for dons gratuits from the pays d'etats, 1689-95
g068md28.* Malet's figures for dons gratuits from the pays d'etats, 1689-95 (charges)
g068md29.* Malet's figures for dons gratuits from the pays d'etats, 1689-95 (net to Treasury)
g068md30.* Malet's figures for receipts from the revenue farms, 1600-10
g068md31.* Malet's figures for receipts from the revenue farms, 1611-42
g068md32.* Malet's figures for receipts from the revenue farms, 1643-56
g068md33.* Malet's figures for receipts from the revenue...
IDSEM Dataset
zenodo.org
zip
Updated Dec 2, 2022
Javier Sánchez; Agustín Salgado; Alejandro García; Nelson Monzón (2022). IDSEM Dataset [Dataset]. http://doi.org/10.5281/zenodo.6373179
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.6373179
Dataset updated
Dec 2, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Javier Sánchez; Agustín Salgado; Alejandro García; Nelson Monzón
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This database contains electricity bills related to energy consumption in Spanish households. The contents of the bills are automatically generated based on statistics from official bodies. The main purpose of the dataset is to train machine learning algorithms, especially for designing new methods for extracting information from invoices. There are 86 different labels, related to topics such as the customer and marketer, the contract, energy consumption, and billing.

The total number of invoices is 75,000. The files are organized in two directories: a training directory with six subdirectories, each containing 5,000 invoices in PDF format and the corresponding labels in JSON files; and a test directory with nine subdirectories, each containing 5,000 invoices in PDF format.
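Six training plus nine test subdirectories of 5,000 invoices each give the stated 75,000 invoices. The sketch below pairs each training PDF with its JSON labels; it assumes the label file shares the PDF's base name (for example `invoice_0001.pdf` and `invoice_0001.json`), which is a guess about the naming scheme, and the `total` key in the demo is hypothetical.

```python
import json
import tempfile
from pathlib import Path

# 6 training + 9 test subdirectories, 5,000 invoices each.
TOTAL_INVOICES = (6 + 9) * 5_000


def iter_labeled_invoices(subdir):
    """Yield (pdf_path, labels) for each PDF that has a JSON file with the same stem.

    The same-stem pairing is an assumption about the file naming.
    """
    for pdf_path in sorted(Path(subdir).glob("*.pdf")):
        label_path = pdf_path.with_suffix(".json")
        if label_path.exists():
            with label_path.open(encoding="utf-8") as handle:
                yield pdf_path, json.load(handle)


if __name__ == "__main__":
    # Tiny self-contained demo with fake files; point this at a real training subdirectory.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "invoice_0001.pdf").write_bytes(b"%PDF-1.4")
        (root / "invoice_0001.json").write_text('{"total": 42.0}', encoding="utf-8")
        for pdf_path, labels in iter_labeled_invoices(root):
            print(pdf_path.name, labels)
```

Test subdirectories contain PDFs only, so the generator yields nothing there by design.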

There are two main zip files containing the test and training sets (test.zip and training.zip). In addition, separate files with a subset of the directories in each set are included, so the data can be downloaded in parts. There is also a reduced version of the dataset with 100 invoices per directory, useful for previewing the content before downloading the full dataset.

IDSEM is an acronym for "an Invoices Database for the Spanish Electricity Market". More information can be found at https://idsem.ulpgc.es/ and in the following article:

[1] Javier Sánchez, Agustín Salgado, Alejandro García, and Nelson Monzón, "IDSEM, an invoices database of the Spanish electricity market", Sci. Data, (2022).
Invoice Management Dataset
universe.roboflow.com
zip
Updated Dec 28, 2024
CVIP Workspace (2024). Invoice Management Dataset [Dataset]. https://universe.roboflow.com/cvip-workspace/invoice-management
Available download formats: zip
Dataset updated
Dec 28, 2024
Dataset authored and provided by
CVIP Workspace
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Text Bounding Boxes
Description
Intelligent Invoice Management System

Project Description:
The Intelligent Invoice Management System is an advanced AI-powered platform designed to revolutionize traditional invoice processing. By automating the extraction, validation, and management of invoice data, this system addresses the inefficiencies, inaccuracies, and high costs associated with manual methods. It enables businesses to streamline operations, reduce human error, and expedite payment cycles.

Problem Statement:
Manual invoice processing involves labor-intensive tasks such as data entry, verification, and reconciliation. These processes are time-consuming, prone to errors, and can result in financial losses and delays. The diversity of invoice formats from various vendors adds complexity, making automation a critical need for efficiency and scalability.

Proposed Solution:
The Intelligent Invoice Management System automates the end-to-end process of invoice handling using AI and machine learning techniques. Core functionalities include:
1. Invoice Generation: Automatically generate PDF invoices in at least four formats, populated with synthetic data.
2. Data Development: Leverage a dataset containing fields such as receipt numbers, company details, sales tax information, and itemized tables to create realistic invoice samples.
3. AI-Powered Labeling: Use Tesseract OCR to extract labeled data from invoice images, and train YOLO for label recognition, ensuring precise identification of fields.
4. Database Integration: Store extracted information in a structured database for seamless retrieval and analysis.
5. Web-Based Information System: Provide a user-friendly platform to upload invoices and retrieve key metrics, such as:
- Total sales within a specified duration.
- Total sales tax paid during a given timeframe.
- Detailed invoice information in tabular form for specific date ranges.
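The first two metrics in the list above amount to filtering extracted invoices by date range (and optionally by company name or NTN) and summing two fields. A minimal sketch follows; the `InvoiceRecord` shape is hypothetical and stands in for whatever schema the database actually uses, and `Decimal` is used so currency sums stay exact.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class InvoiceRecord:
    # Hypothetical shape of one extracted invoice row in the database.
    company: str
    ntn: str
    issued: date
    total_sales: Decimal
    sales_tax: Decimal


def totals_in_range(records, start, end, company=None, ntn=None):
    """Return (total sales, total sales tax) for invoices issued from start to end inclusive."""
    selected = [
        r for r in records
        if start <= r.issued <= end
        and (company is None or r.company == company)
        and (ntn is None or r.ntn == ntn)
    ]
    return (sum((r.total_sales for r in selected), Decimal("0")),
            sum((r.sales_tax for r in selected), Decimal("0")))
```

In a real deployment the same filter would usually be pushed into a database query rather than done in memory.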

Key Features and Deliverables:
1. Invoice Generation:
- Generate 20,000 invoices using an automated script.
- Include dummy logos, company details, and itemized tables for four items per invoice.

Label Definition and Format:

Define structured labels (TBLR, CLASS Name, Recognized Text).

Provide labels in both XML and JSON formats for seamless integration.

OCR and AI Training:

Automate labeling using Tesseract OCR for high-accuracy text recognition.

Train and test YOLO to detect and classify invoice fields (TBLR and CLASS).
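Training YOLO requires the labels in YOLO's text format: one line per box with `class x_center y_center width height`, all coordinates normalized by the image width and height. The sketch below converts a pixel box to that line, assuming TBLR means top, bottom, left, right in pixels; the class-id mapping is hypothetical.

```python
def tblr_to_yolo(top, bottom, left, right, img_w, img_h, class_id):
    """Convert a pixel box given as (top, bottom, left, right) to a YOLO label line.

    YOLO txt labels store: class x_center y_center width height,
    with all four coordinates normalized to [0, 1] by the image size.
    """
    if right <= left or bottom <= top:
        raise ValueError("box must have positive width and height")
    x_center = (left + right) / 2 / img_w
    y_center = (top + bottom) / 2 / img_h
    width = (right - left) / img_w
    height = (bottom - top) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical field-to-class mapping; the dataset's real class list is not given here.
CLASS_IDS = {"company_name": 0, "receipt_number": 1, "sales_tax": 2, "total": 3}

print(tblr_to_yolo(100, 200, 50, 250, img_w=1000, img_h=1000, class_id=CLASS_IDS["total"]))
# prints "3 0.150000 0.150000 0.200000 0.100000"
```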

Database Management:

Store OCR-extracted labels and field data in a database.

Enable efficient search and aggregation of invoice data.

Web-Based Interface:

Build a responsive system for users to upload invoices and retrieve data based on company name or NTN.

Display metrics and reports for total sales, tax paid, and invoice details over custom date ranges.

Expected Outcomes:
- Reduction in manual effort and operational costs.
- Improved accuracy in invoice processing and financial reporting.
- Enhanced scalability and adaptability for diverse invoice formats.
- Faster turnaround time for invoice-related tasks.

By automating critical aspects of invoice management, this system delivers a robust and intelligent solution to meet the evolving needs of businesses.
Hong Kong SAR, China Govt Consolidated Acc: CE: Interest & Expenses on Bonds...
ceicdata.com
Updated Mar 15, 2023
CEICdata.com (2023). Hong Kong SAR, China Govt Consolidated Acc: CE: Interest & Expenses on Bonds & Notes [Dataset]. https://www.ceicdata.com/en/hong-kong/government-consolidated-account-receipts-and-payments-annual/govt-consolidated-acc-ce-interest--expenses-on-bonds--notes
Dataset updated
Mar 15, 2023
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Mar 1, 2006 - Mar 1, 2017
Area covered
Hong Kong
Variables measured
Operating Statement
Description
Hong Kong Govt Consolidated Acc: CE: Interest & Expenses on Bonds & Notes data was reported at 76.669 HKD mn in 2017, a decrease from 77.301 HKD mn in 2016. The series is updated yearly; its median from Mar 2005 to 2017 is 574.844 HKD mn, based on 13 observations. The data reached an all-time high of 850.524 HKD mn in 2006 and a record low of 76.669 HKD mn in 2017. The series remains active in CEIC and is reported by The Treasury. The data is categorized under Global Database’s Hong Kong – Table HK.F003: Government Consolidated Account: Receipts and Payments: Annual.
European State Finance Database; French Revenues and Expenditure, Derived...
datacatalogue.cessda.eu
Updated Nov 28, 2024
Cite
Bonney, R., University of Leicester (2024). European State Finance Database; French Revenues and Expenditure, Derived Files, 1594-1785 [Dataset]. http://doi.org/10.5255/UKDA-SN-3096-1
Unique identifier
https://doi.org/10.5255/UKDA-SN-3096-1
Dataset updated
Nov 28, 2024
Dataset provided by
Department of History
Authors
Bonney, R., University of Leicester
Area covered
France
Variables measured
National, Economic indicators
Measurement technique
Compilation or synthesis of existing material
Description
Abstract copyright UK Data Service and data collection copyright owner.
The European State Finance Database (ESFD) is an international collaborative research project for the collection of data in European fiscal history. There are no strict geographical or chronological boundaries to the collection, although data for this collection cover the period between c.1200 and c.1815. The purpose of the ESFD was to establish a significant database of European financial and fiscal records. The data are drawn from the main extant sources of a number of European countries, as the evidence and the state of scholarship permit. The aim was to collect the data made available by scholars, whether drawn from their published or unpublished archival research or from other published material.
The ESFD project at the University of Leicester also serves to assist scholars working with the data by providing statistical manipulations of data and high-quality graphical outputs for publication. The broad aim of the project was to act as a facilitator for a general methodological and statistical advance in the area of European fiscal history, with data capture and the interpretation of data in key publications as the measurable indicators of that advance. The data were originally deposited at the UK Data Archive in SAS transport format and as ASCII files; the data files in this new edition have been saved as tab-delimited files. This new edition also features documentation in the form of a single file containing essential data file metadata, source details and notes of interest for particular files.
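This edition stores the data as tab-delimited files, so Python's standard `csv` module can read them. The sketch below assumes that the first row of each file holds column headers and that the files are UTF-8 encoded. The documentation here confirms neither assumption, and the demo column names are hypothetical.

```python
import csv
import io
from pathlib import Path


def read_tab_delimited(source, encoding="utf-8"):
    """Read a tab-delimited table into a list of dicts keyed by the header row.

    `source` may be a file path or an already-open text stream.
    Assumes the first row contains column names; the encoding is a guess
    (older deposits may need e.g. "latin-1").
    """
    if isinstance(source, (str, Path)):
        with open(source, newline="", encoding=encoding) as handle:
            return list(csv.DictReader(handle, delimiter="\t"))
    return list(csv.DictReader(source, delimiter="\t"))


if __name__ == "__main__":
    # Illustrative in-memory table; the column names are hypothetical.
    demo = io.StringIO("year\tvalue\n1600\t20.5\n1601\t21.0\n")
    for row in read_tab_delimited(demo):
        print(row["year"], row["value"])
```

To read a real file, pass its path instead. The file list below gives names only as `g096mm01.*` and so on, and the actual extensions are not stated, so any concrete file name is a placeholder.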

Main Topics:

The files in this dataset relate to the derived datafiles held in the Leicester database in the directory /rjb/malet/*.*. These data on seventeenth-century French revenues and expenditure, supplied by the Project Director, Professor Richard Bonney, draw upon J-R Malet, Comptes rendus de l'administration des finances du royaume de France (London, 1789). For a discussion of this source in English, consult Bonney, R.J., 'Jean-Roland Malet: historian of the finances of the French monarchy', French History, 5 (1991), 180-233.
File Information
g096mm01.* Malet's figures for royal expenditure in France, 1600-95
g096mm04.* Malet's figures for receipts from the pays d'elections and the pays d'etats, 1600-95
g096mm09.* Malet's figures for receipts from the pays d'elections, 1600-1695
g096mm14.* Malet's figures for ordinary and extraordinary royal expenditure in France compared with other sources, 1600-95
g096mm15.* Malet's figures for receipts from the revenue farms, 1600-95
g096mm20.* Malet's figures for other ordinary receipts and deniers extraordinaires, 1600-95
g096mm21.* Malet's recapitulation tables, 1600-95
g096mm23.* Malet's figures for royal revenues in France, 1600-95
g096mm24.* Malet's figures for receipts from the pays d'elections, 1661-95, compared with other sources
g096mm25.* Malet's figures for receipts from the pays d'etats compared with other sources, 1661-99
g096mm26.* Malet's figures for receipts from the revenue farms compared with another source, 1661-99
g096mm27.* Malet's figures for ordinary and extraordinary royal revenue in France, compared with another source, 1600-99
g096mm28.* Malet's figures for royal expenditure in France, 1600-95
g096mm29.* Malet's figures for royal revenues in France, 1610-44
g096mm30.* Malet's figures for receipts from the revenue farms compared with rents owed by revenue farmers according to their leases, 1610-45
g096mm31.* Malet's figures for receipts from the pays d'elections and from the pays d'etats, 1600-95
g096mm32.* Malet's figures for receipts from the pays d'elections compared with tax levied, 1594-1643
g096mm33.* Richelieu's reform plan, 1640, compared with Malet's figures for the same year
g096mm34.* Figures for royal expenditure and revenue in France (total and ordinary) from various sources, 1600-1785
g096mm35.* Royal expenditure in France, 1600-1710, from various sources
g096mm36.* Ordinary revenues of the French monarchy, 1661-99
g096mm37.* Total revenues of the French monarchy set against coinage output, 1600-1715
g096mm38.* Eleven year centred moving averages of index numbers for total royal revenue in France, 1600-1785 (base index 1600-30)
g096mm39.* Eleven year centred moving averages of 'real' (ie. deflated) index numbers for total royal revenue in France, 1600-1785 (base index 1600-30)
g096mm40.* Total royal revenue in France from various sources, 1660-1775, converted into pounds sterling
g096mm41.* Royal expenditure in France, 1600-1716
g096mm42.* Eleven year centred moving averages of index numbers for total royal expenditure in France, 1600-1715 (base index 1600-30)
g096mm43.* Eleven year centred moving averages of 'real' (ie. deflated) index numbers for total royal...
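Several of the derived files (for example g096mm38, g096mm39 and g096mm42) are described as eleven-year centred moving averages of index numbers with base 1600-30. Below is a minimal sketch of that calculation. It assumes, without confirmation from the documentation, that "base 1600-30" means the years 1600 to 1630 inclusive, that the index sets the base-period mean to 100, and that 'real' figures come from dividing nominal values by a price index before indexing. The project's exact procedure may differ.

```python
from statistics import fmean


def deflate(nominal, price_index):
    """Convert nominal values to 'real' terms using a price index (base = 100).

    ASSUMPTION: simple division by the price index; years missing from
    either series are dropped.
    """
    return {y: 100.0 * nominal[y] / price_index[y] for y in nominal if y in price_index}


def index_numbers(series, base_start=1600, base_end=1630):
    """Express each year's value as an index where the base-period mean = 100.

    `series` maps year -> value. Assumes every base year is present.
    """
    base = fmean(series[y] for y in range(base_start, base_end + 1))
    return {year: 100.0 * value / base for year, value in series.items()}


def centred_moving_average(series, window=11):
    """Centred moving average over an odd-length window of consecutive years.

    A year gets a value only if all `window` years centred on it are present,
    so the first and last (window // 2) years of a series are left out.
    """
    if window % 2 == 0:
        raise ValueError("a centred window must have odd length")
    half = window // 2
    result = {}
    for year in sorted(series):
        span = range(year - half, year + half + 1)
        if all(y in series for y in span):
            result[year] = fmean(series[y] for y in span)
    return result
```

Chaining `centred_moving_average(index_numbers(deflate(revenue, prices)))` would give a 'real' smoothed index in the spirit of g096mm39, under the assumptions above.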
Jordan Tourism Receipts: Arab
ceicdata.com
Updated Jan 15, 2025
Cite
CEICdata.com (2025). Jordan Tourism Receipts: Arab [Dataset]. https://www.ceicdata.com/en/jordan/tourism-receipts-and-expenditures/tourism-receipts-arab
Dataset updated
Jan 15, 2025
Dataset provided by
CEIC Data
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 2017 - Dec 1, 2017
Area covered
Jordan
Variables measured
Tourism Statistics
Description
Jordan Tourism Receipts: Arab data was reported at 65.446 JOD mn in Jun 2018, an increase from the previous figure of 59.879 JOD mn for May 2018. The series is updated monthly, with a median of 49.691 JOD mn over 198 observations from Jan 2002 to Jun 2018. The data reached an all-time high of 97.444 JOD mn in Jul 2012 and a record low of 13.600 JOD mn in Apr 2003. The series remains active in CEIC and is reported by the Central Bank of Jordan. The data is categorized under Global Database’s Jordan – Table JO.Q012: Tourism Receipts and Expenditures.
Hong Kong SAR, China Govt Consolidated Acc: OR: IR: IT: SD: Contract Notes
ceicdata.com
Updated Jan 15, 2025
Cite
CEICdata.com (2025). Hong Kong SAR, China Govt Consolidated Acc: OR: IR: IT: SD: Contract Notes [Dataset]. https://www.ceicdata.com/en/hong-kong/government-consolidated-account-receipts-and-payments-annual/govt-consolidated-acc-or-ir-it-sd-contract-notes
Dataset updated
Jan 15, 2025
Dataset provided by
CEIC Data
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Mar 1, 2006 - Mar 1, 2017
Area covered
Hong Kong
Variables measured
Operating Statement
Description
Hong Kong Govt Consolidated Acc: OR: IR: IT: SD: Contract Notes data was reported at 23,567.300 HKD mn in 2017, a decrease from the previous figure of 33,410.000 HKD mn for 2016. The series is updated yearly, with a median of 6,948.700 HKD mn over 27 observations from Mar 1991 to 2017. The data reached an all-time high of 35,447.000 HKD mn in 2008 and a record low of 2,145.500 HKD mn in 1991. The series remains active in CEIC and is reported by the Inland Revenue Department. The data is categorized under Global Database’s Hong Kong – Table HK.F003: Government Consolidated Account: Receipts and Payments: Annual.
Iraq IQ: BOP: Current Account: Personal Transfers: Receipts
ceicdata.com
Updated May 13, 2018
Cite
CEICdata.com (2018). Iraq IQ: BOP: Current Account: Personal Transfers: Receipts [Dataset]. https://www.ceicdata.com/en/iraq/balance-of-payments-current-account/iq-bop-current-account-personal-transfers-receipts
Dataset updated
May 13, 2018
Dataset provided by
CEIC Data
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Dec 1, 2005 - Dec 1, 2016
Area covered
Iraq
Variables measured
Balance of Payment
Description
Iraq IQ: BOP: Current Account: Personal Transfers: Receipts data was reported at 941.300 USD mn in 2016, a decrease from the previous figure of 954.100 USD mn for 2015. The series is updated yearly, with a median of 250.100 USD mn over 12 observations from Dec 2005 to 2016. The data reached an all-time high of 954.100 USD mn in 2015 and a record low of 2.500 USD mn in 2007. The series remains active in CEIC and is reported by the World Bank. The data is categorized under Global Database’s Iraq – Table IQ.World Bank.WDI: Balance of Payments: Current Account. Personal transfers consist of all current transfers in cash or in kind made or received by resident households to or from nonresident households. Personal transfers thus include all current transfers between resident and nonresident individuals. Data are in current U.S. dollars. Source: International Monetary Fund, Balance of Payments Statistics Yearbook and data files. Aggregation method: Sum. Note: Data are based on the sixth edition of the IMF's Balance of Payments Manual (BPM6) and are only available from 2005 onwards.
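Each CEIC description above reports a latest value together with the preceding one. As a worked check, the sketch below computes the absolute and percentage change for each series. It uses only the figures quoted in those descriptions, and the series labels are shortened for readability.

```python
from decimal import Decimal, ROUND_HALF_UP

# (latest, previous) pairs as quoted in the CEIC descriptions above.
REPORTED = {
    "HK CE: Interest & Expenses on Bonds & Notes (HKD mn, 2017 vs 2016)": ("76.669", "77.301"),
    "Jordan Tourism Receipts: Arab (JOD mn, Jun 2018 vs May 2018)": ("65.446", "59.879"),
    "HK OR: IR: IT: SD: Contract Notes (HKD mn, 2017 vs 2016)": ("23567.300", "33410.000"),
    "Iraq BOP Personal Transfers: Receipts (USD mn, 2016 vs 2015)": ("941.300", "954.100"),
}


def change(latest, previous):
    """Return (absolute change, percentage change rounded to 2 d.p.)."""
    latest, previous = Decimal(latest), Decimal(previous)
    diff = latest - previous
    pct = (diff / previous * 100).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return diff, pct


if __name__ == "__main__":
    for name, (latest, previous) in REPORTED.items():
        diff, pct = change(latest, previous)
        print(f"{name}: {diff:+} ({pct:+}%)")
```

For the Hong Kong bond-interest series, for example, this gives a change of -0.632 HKD mn, or about -0.82%. The Jordan series rose by 5.567 JOD mn, about +9.30%.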
OCR Receipts Text Detection - retail dataset

Photos of the receipts and text detection - ocr dataset

Cite

Training Data (2023). OCR Receipts Text Detection - retail dataset [Dataset]. https://www.kaggle.com/datasets/trainingdatapro/ocr-receipts-text-detection

Explore at: Croissant (Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.)

Dataset updated

Sep 19, 2023

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Training Data

License

Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically

Description

OCR Receipts from Grocery Stores Text Detection - retail dataset

The Grocery Store Receipts Dataset is a collection of photos of receipts from various grocery stores. It is designed for Optical Character Recognition (OCR) tasks in retail.

💴 For Commercial Usage: To discuss your requirements, learn about the price, and buy the dataset, leave a request on TrainingData.

Each image in the dataset is accompanied by bounding box annotations, indicating the precise locations of specific text segments on the receipts. The text segments are categorized into four classes: item, store, date_time and total.

Dataset structure

images - contains the original images of the receipts
boxes - contains bounding-box labeling for the original images
annotations.xml - contains the coordinates of the bounding boxes and the detected text for each original photo

Data Format

Each image in the images folder is accompanied by an XML annotation in the annotations.xml file that gives the coordinates of the bounding boxes and the detected text. For each point, the x and y coordinates are provided.
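The page does not show the exact schema of annotations.xml. The sketch below assumes a CVAT-style layout in which `<image>` elements contain labeled `<box>` elements (corner attributes `xtl`, `ytl`, `xbr`, `ybr`) or point-based shapes (a `points="x1,y1;x2,y2;..."` attribute), with the recognized text stored in an `<attribute name="text">` child. Every element and attribute name here is an assumption, so adjust them to the actual file.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class TextRegion:
    image: str
    label: str   # e.g. "item", "store", "date_time", "total"
    text: str
    bbox: tuple  # (x_min, y_min, x_max, y_max) in pixels


def _points_bbox(points):
    """Bounding rectangle of an 'x1,y1;x2,y2;...' point string."""
    coords = [tuple(map(float, p.split(","))) for p in points.split(";") if p]
    if not coords:
        raise ValueError("shape has no points")
    xs, ys = zip(*coords)
    return (min(xs), min(ys), max(xs), max(ys))


def parse_annotations(xml_text):
    """Parse annotation XML into TextRegion records.

    ASSUMED (CVAT-style) layout, not confirmed by the dataset page:
      <image name="0.jpg">
        <box label="total" xtl=".." ytl=".." xbr=".." ybr="..">
          <attribute name="text">...</attribute>
        </box>
        <polygon label="item" points="x1,y1;x2,y2;...">...</polygon>
      </image>
    """
    root = ET.fromstring(xml_text)
    regions = []
    for image in root.iter("image"):
        name = image.get("name", "")
        for shape in image:
            if shape.tag == "box":
                bbox = tuple(float(shape.get(k)) for k in ("xtl", "ytl", "xbr", "ybr"))
            elif shape.tag in ("polygon", "polyline", "points"):
                bbox = _points_bbox(shape.get("points", ""))
            else:
                continue  # ignore elements that are not shapes
            text_attr = shape.find("attribute[@name='text']")
            text = (text_attr.text or "").strip() if text_attr is not None else ""
            regions.append(TextRegion(name, shape.get("label", ""), text, bbox))
    return regions
```

Point-based shapes are reduced to their bounding rectangle so that every region has the same `(x_min, y_min, x_max, y_max)` form.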

Classes:

store - name of the grocery store
item - item in the receipt
date_time - date and time of the receipt
total - total price of the receipt
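After extraction, the labeled text segments can be grouped by the four classes into one structured record per receipt. The sketch below works on hypothetical `(label, text)` pairs. Its total parser is a best-effort heuristic that assumes `.` or `,` as the decimal separator and no thousands separators, and it is not part of the dataset.

```python
import re
from collections import defaultdict
from decimal import Decimal

CLASSES = ("store", "item", "date_time", "total")


def group_by_class(segments):
    """Group (label, text) pairs by the four receipt classes."""
    grouped = defaultdict(list)
    for label, text in segments:
        if label not in CLASSES:
            raise ValueError(f"unexpected class: {label!r}")
        grouped[label].append(text)
    return dict(grouped)


def parse_total(text):
    """Best-effort: return the last number in a 'total' segment, or None.

    Assumes '.' or ',' as the decimal separator and no thousands separators.
    """
    matches = re.findall(r"\d+(?:[.,]\d+)?", text)
    if not matches:
        return None
    return Decimal(matches[-1].replace(",", "."))


def receipt_record(segments):
    """Build one record per receipt from its labeled text segments."""
    grouped = group_by_class(segments)
    totals = grouped.get("total", [])
    return {
        "store": " ".join(grouped.get("store", [])) or None,
        "date_time": " ".join(grouped.get("date_time", [])) or None,
        "items": grouped.get("item", []),
        "total": parse_total(totals[-1]) if totals else None,
    }
```

Unknown labels raise an error rather than being dropped silently, so annotation mistakes surface early.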

Text detection on the receipts can be customized to your requirements.

💴 Buy the Dataset: This is just an example of the data. Leave a request on https://trainingdata.pro/datasets to discuss your requirements, learn about the price and buy the dataset

TrainingData provides high-quality data annotation tailored to your needs

keywords: receipts reading, retail dataset, consumer goods dataset, grocery store dataset, supermarket dataset, deep learning, retail store management, pre-labeled dataset, annotations, text detection, text recognition, optical character recognition, document text recognition, detecting text-lines, object detection, scanned documents, deep-text-recognition, text area detection, text extraction, images dataset, image-to-text
