Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a dataset comprising 813 images of invoices and receipts of a private company in the Portuguese language. It also includes text files with the transcription of relevant fields for each document – seller name, seller address, seller tax identification, buyer tax identification, invoice date, invoice total amount, invoice tax amount, and document reference.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for Invoices (Sparrow)
This dataset contains 500 invoice documents, annotated and processed to be ready for fine-tuning the Donut ML model. Annotation and data preparation were done by the Katana ML team. Sparrow is an open-source data extraction solution by Katana ML. Original dataset info: Kozłowski, Marek; Weichbroth, Paweł (2021), “Samples of electronic invoices”, Mendeley Data, V2, doi: 10.17632/tnj49gpmtz.2
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Invoice is a dataset for object detection tasks - it contains Tables annotations for 319 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database contains electricity bills related to energy consumption in Spanish households. The contents of bills are automatically generated following some statistics from official bodies. The main purpose of the dataset is for training machine learning algorithms, especially for designing new methods for extracting information from invoices. There are 86 different labels, which are related to several topics, such as the customer and marketer, the contract, energy consumption, or billing.
The total number of invoices is 75,000. The files are organized in two directories: a training directory with six subdirectories, each containing 5,000 invoices in PDF format and the corresponding labels in JSON files, and a test directory with nine subdirectories, each containing 5,000 invoices in PDF format.
There are two main zip files that contain the test and training sets (test.zip and training.zip). In addition, we have included separate files with a subset of the directories in each set, so that each set can be downloaded in parts. There is also a reduced version of the dataset with 100 invoices per directory, which is useful for users who want to preview the content of the dataset before downloading it.
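The training layout described above could be traversed roughly as follows. This is a minimal sketch, not the authors' tooling: the function name `pair_invoices_with_labels` is hypothetical, and it assumes that each JSON label file sits next to its PDF and shares the same file stem, which the description does not confirm.

```python
from pathlib import Path


def pair_invoices_with_labels(training_dir):
    """Return (pdf_path, json_path) pairs found under training_dir.

    Assumption (not confirmed by the dataset description): a label file
    has the same stem as its invoice, e.g. 0001.pdf and 0001.json, and
    lives in the same subdirectory.
    """
    pairs = []
    for pdf_path in sorted(Path(training_dir).rglob("*.pdf")):
        json_path = pdf_path.with_suffix(".json")
        if json_path.exists():
            pairs.append((pdf_path, json_path))
    return pairs


# Sanity check of the counts stated in the description:
# 6 training + 9 test subdirectories, 5,000 invoices each.
TRAIN_SUBDIRS, TEST_SUBDIRS, PER_SUBDIR = 6, 9, 5000
EXPECTED_TOTAL = (TRAIN_SUBDIRS + TEST_SUBDIRS) * PER_SUBDIR
```

Invoices without a matching label file are skipped, so the same function can be pointed at the reduced preview version of the dataset to check the pairing assumption before downloading the full archives.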
IDSEM is an acronym for "an Invoices Database for the Spanish Electricity Market". More information can be found at https://idsem.ulpgc.es/ and in the following article:
[1] Javier Sánchez, Agustín Salgado, Alejandro García, and Nelson Monzón, "IDSEM, an invoices database of the Spanish electricity market", Sci. Data, (2022).
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Invoices Sample Dataset
This is a sample dataset generated on app.parsee.ai for invoices. The goal was to evaluate different LLMs on this RAG task using the Parsee evaluation tools. A full study can be found here: https://github.com/parsee-ai/parsee-datasets/blob/main/datasets/invoices/parsee-loader/README.md
parsee-core version used: 0.1.3.11
This dataset was created on the basis of 15 sample invoices (PDF files). All PDF files are publicly accessible on parsee.ai, to access them… See the full description on the dataset page: https://huggingface.co/datasets/parsee-ai/invoices-example.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The Grocery Store Receipts Dataset is a collection of photos captured from various grocery store receipts. This dataset is specifically designed for tasks related to Optical Character Recognition (OCR) and is useful for retail.
Each image in the dataset is accompanied by bounding box annotations, indicating the precise locations of specific text segments on the receipts. The text segments are categorized into four classes: item, store, date_time and total.
Each image in the images folder is accompanied by an XML annotation in the annotations.xml file, indicating the coordinates of the bounding boxes and the detected text. For each point, the x and y coordinates are provided.
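Reading such an annotation file might look like the sketch below. The exact schema of annotations.xml is not given in the description, so the element and attribute names used here (`image`, `polygon`, `label`, `points`, and a nested `attribute` holding the text) are assumptions modeled on a CVAT-style export, and `SAMPLE_XML` is invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real annotations.xml may be structured differently.
SAMPLE_XML = """
<annotations>
  <image name="images/0.jpg">
    <polygon label="total" points="10.0,20.0;110.0,20.0;110.0,40.0;10.0,40.0">
      <attribute name="text">12.50</attribute>
    </polygon>
  </image>
</annotations>
"""


def parse_annotations(xml_text):
    """Collect one record per annotated text segment.

    Each record holds the image name, the class label (item, store,
    date_time or total), the list of (x, y) points, and the text.
    """
    records = []
    root = ET.fromstring(xml_text.strip())
    for image in root.iter("image"):
        for shape in image.iter("polygon"):
            # Assumed point encoding: "x1,y1;x2,y2;..."
            points = [
                tuple(float(v) for v in pair.split(","))
                for pair in shape.get("points").split(";")
            ]
            text_node = shape.find("attribute")
            records.append({
                "image": image.get("name"),
                "label": shape.get("label"),
                "points": points,
                "text": text_node.text if text_node is not None else None,
            })
    return records
```

If the real file uses different tag names, only the `iter`, `get` and `find` arguments would need to change.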
keywords: receipts reading, retail dataset, consumer goods dataset, grocery store dataset, supermarket dataset, deep learning, retail store management, pre-labeled dataset, annotations, text detection, text recognition, optical character recognition, document text recognition, detecting text-lines, object detection, scanned documents, deep-text-recognition, text area detection, text extraction, images dataset, image-to-text
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Tax Invoice is a dataset for object detection tasks - it contains Tax Invoices annotations for 773 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
It is a collection of all e-invoices received by public sector entities for payment. The collection contains envelopes of e-invoices and an attachment to the e-style. E-invoice metadata are available for download.
This dataset consists of 1,000 whole scanned receipt images and annotations for the competition on scanned receipts OCR and key information extraction (SROIE).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Electronic Invoice is a dataset for object detection tasks - it contains Invoice annotations for 217 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
## Overview
Fold Invoice is a dataset for object detection tasks - it contains Paper annotations for 5,939 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
R1 Freight Invoice is a dataset for object detection tasks - it contains Invoice annotations for 800 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Invoices is a dataset for classification tasks - it contains Invoices annotations for 900 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
ReceiptSense: Beyond Traditional OCR - A Dataset for Receipt Understanding
🔥 News
[2024] ReceiptSense dataset is now publicly available!
[2024] Paper accepted and published
📖 Abstract
Multilingual OCR and information extraction from receipts remains challenging, particularly for complex scripts like Arabic. We introduce ReceiptSense, a comprehensive dataset designed for Arabic-English receipt understanding comprising:
20,000 annotated receipts… See the full description on the dataset page: https://huggingface.co/datasets/abdoelsayed/CORU.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Invoice Processing V2 is a dataset for object detection tasks - it contains Invoice Lines annotations for 1,000 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Patients Table:
This table stores information about individual patients, including their names and contact details.
Doctors Table:
This table contains details about healthcare providers, including their names, specializations, and contact information.
Appointments Table:
This table records scheduled appointments, linking patients to doctors.
MedicalProcedure Table:
This table stores details about medical procedures associated with specific appointments.
Billing Table:
This table maintains records of billing transactions, associating them with specific patients.
demo Table:
This table appears to be a demonstration or testing table, possibly unrelated to the healthcare management system.
This dataset schema is designed to capture comprehensive information about patients, doctors, appointments, medical procedures, and billing transactions in a healthcare management system. Adjustments can be made based on specific requirements, and additional attributes can be included as needed.
https://choosealicense.com/licenses/cc0-1.0/
Dataset Card for "billsum"
Dataset Summary
BillSum, summarization of US Congressional and California state bills. There are several features:
text: bill text. summary: summary of the bill. title: title of the bill. The following features are available for US bills only; CA bills do not have them. text_len: number of characters in text. sum_len: number of characters in summary.
Supported Tasks and Leaderboards
More Information Needed
Languages
More Information Needed
Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/FiscalNote/billsum.
NIO’s percentage of invoices paid within 5 days and within 30 days of receipt. The data is in yearly quarters.
The data covers:
This data is also available on data.gov.uk: https://www.data.gov.uk/dataset/ace2523b-46de-465a-ad3f-3fe356003995/northern-ireland-office-nio-prompt-payment-data
Market Inside is a global leader in providing import-export data and analytics for the major industries and markets. We accelerate business progress by delivering essential intelligence that unlocks opportunities and fosters growth.
Our database contains:
• 220+ Countries’ Global Trade Data
• 2+ Billion Importer-Exporter Shipment Records
• 100+ Million Import-Export Companies
• 40+ Million Decision Maker Direct Phone Numbers
• 50+ Million Decision Maker Direct Email Addresses
By using our dashboard, customers can access:
• Bill of lading data by HS Code or product description.
• Product specifications like brand, model, type, etc.
• Company information like name, size, location and so on.
• Companies’ business information such as market share, industry, etc.
• Contacts data including name & profile of employees and phone numbers & email addresses of key decision makers.
• Location data like origin country, destination and port of loading & unloading.
Our import-export data can be used by investment firms and banks, private equity firms, hedge funding corporations, government agencies, insurance firms, corporations, exporters and importers, environmental studies agencies, academics, logistics, pharma, FMCG, consulting, law firms and other private players to track, analyze, research and gain better insights into global trade in more than 50M commodities.
For data authenticity, we collect raw data from the most genuine sources, such as Federal & State Government, Embassies, Customs & Taxes Departments, Ministries, and Ports Authorities, and filter the information to remove duplicates and make every detail clean and error-free. Our database is updated frequently, and customers can directly view and download shipment records on keywords of their choice in seconds, anywhere and anytime.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Invoice Receipt is a dataset for object detection tasks - it contains Invoice annotations for 544 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).