100+ datasets found

Sample Sales Data (5 million transactions)
kaggle.com
zip
Updated Jul 8, 2021
Cite
Chris Chua (2021). Sample Sales Data (5 million transactions) [Dataset]. https://www.kaggle.com/datasets/weitat/sample-sales
Available download formats: zip (201186399 bytes)
Dataset updated
Jul 8, 2021
Authors
Chris Chua
Description
Dataset

This dataset was created by Chris Chua

Warehouse and Retail Sales
catalog.data.gov
data.montgomerycountymd.gov
+4 more
Updated Nov 8, 2025
+ more versions
Cite
data.montgomerycountymd.gov (2025). Warehouse and Retail Sales [Dataset]. https://catalog.data.gov/dataset/warehouse-and-retail-sales
Dataset updated
Nov 8, 2025
Dataset provided by
data.montgomerycountymd.gov
Description
This dataset contains sales and movement data by item and department, appended monthly. Update frequency: monthly.
Retail sales quality tables
ons.gov.uk
cy.ons.gov.uk
xlsx
Updated Nov 21, 2025
Cite
Office for National Statistics (2025). Retail sales quality tables [Dataset]. https://www.ons.gov.uk/businessindustryandtrade/retailindustry/datasets/retailsalesqualitytables
Available download formats: xlsx
Dataset updated
Nov 21, 2025
Dataset provided by
Office for National Statistics
License
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
Standard error reference tables for the Retail Sales Index in Great Britain.
Sample Sales Dataset
cubig.ai
zip
Updated Jun 15, 2025
Cite
CUBIG (2025). Sample Sales Dataset [Dataset]. https://cubig.ai/store/products/477/sample-sales-dataset
Available download formats: zip
Dataset updated
Jun 15, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
Description
1) Data Introduction
• The Sample Sales Data is a retail sales dataset of 2,823 orders and 25 columns covering order numbers, product information, quantity, unit price, sales, order date, order status, and customer and delivery information.

2) Data Utilization
(1) Characteristics of the Sample Sales Data:
• The dataset consists of numerical (sales, quantity, unit price, etc.), categorical (product, country, city, customer name, transaction size, etc.), and date (order date) variables, with missing values in some columns (STATE, ADDRESSLINE2, POSTALCODE, etc.).
(2) The Sample Sales Data can be used for:
• Analysis of sales trends and performance by product: Key variables such as order date, product line, and country can be used to visualize and analyze monthly and yearly sales trends, the proportion of sales by product line, and top sales by country and region.
• Segmentation and marketing strategies: Customer groups can be segmented by customer information, transaction size, and regional data, and the segments used to design targeted marketing and customized promotion strategies.

Retail Store Sales: Dirty for Data Cleaning

kaggle.com

zip

Updated Jan 18, 2025

Cite

Ahmed Mohamed (2025). Retail Store Sales: Dirty for Data Cleaning [Dataset]. https://www.kaggle.com/datasets/ahmedmohamed2003/retail-store-sales-dirty-for-data-cleaning

Available download formats: zip (226740 bytes)

Dataset updated

Jan 18, 2025

Authors

Ahmed Mohamed

License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically

Description

Dirty Retail Store Sales Dataset

Overview

The Dirty Retail Store Sales dataset contains 12,575 rows of synthetic data representing sales transactions from a retail store. The dataset includes eight product categories with 25 items per category, each having static prices. It is designed to simulate real-world sales data, including intentional "dirtiness" such as missing or inconsistent values. This dataset is suitable for practicing data cleaning, exploratory data analysis (EDA), and feature engineering.

File Information

File Name: retail_store_sales.csv
Number of Rows: 12,575
Number of Columns: 11

Columns Description

Column Name	Description	Example Values
`Transaction ID`	A unique identifier for each transaction. Always present and unique.	`TXN_1234567`
`Customer ID`	A unique identifier for each customer. 25 unique customers.	`CUST_01`
`Category`	The category of the purchased item.	`Food`, `Furniture`
`Item`	The name of the purchased item. May contain missing values or `None`.	`Item_1_FOOD`, `None`
`Price Per Unit`	The static price of a single unit of the item. May contain missing or `None` values.	`4.00`, `None`
`Quantity`	The quantity of the item purchased. May contain missing or `None` values.	`1`, `None`
`Total Spent`	The total amount spent on the transaction. Calculated as `Quantity * Price Per Unit`.	`8.00`, `None`
`Payment Method`	The method of payment used. May contain missing or invalid values.	`Cash`, `Credit Card`
`Location`	The location where the transaction occurred. May contain missing or invalid values.	`In-store`, `Online`
`Transaction Date`	The date of the transaction. Always present and valid.	`2023-01-15`
`Discount Applied`	Indicates if a discount was applied to the transaction. May contain missing values.	`True`, `False`, `None`
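
Because `Total Spent` is described as `Quantity * Price Per Unit`, any one of the three fields can in principle be recovered when the other two are present. The following is a minimal sketch of that idea using only the Python standard library. The helper names `parse_number` and `fill_missing`, and the choice to treat empty cells and the literal `None` as missing, are assumptions made for illustration and are not part of the dataset's documentation.

```python
from typing import Optional

# Assumed spellings of a missing cell (not confirmed by the dataset page).
MISSING = {"", "none", "nan"}


def parse_number(cell: Optional[str]) -> Optional[float]:
    """Return the cell as a float, or None when it is missing or unparsable."""
    if cell is None or cell.strip().lower() in MISSING:
        return None
    try:
        return float(cell)
    except ValueError:
        return None


def fill_missing(row: dict) -> dict:
    """Recover one missing field from Total Spent = Quantity * Price Per Unit."""
    qty = parse_number(row.get("Quantity"))
    price = parse_number(row.get("Price Per Unit"))
    total = parse_number(row.get("Total Spent"))

    if total is None and qty is not None and price is not None:
        total = qty * price
    elif qty is None and total is not None and price:
        # `price` must be non-zero to divide by it.
        qty = total / price
    elif price is None and total is not None and qty:
        price = total / qty

    return {**row, "Quantity": qty, "Price Per Unit": price, "Total Spent": total}


example = {
    "Transaction ID": "TXN_1234567",
    "Quantity": "2",
    "Price Per Unit": "4.00",
    "Total Spent": "None",
}
print(fill_missing(example)["Total Spent"])  # 8.0
```

Rows with two or more of the three fields missing are left as they are, since the relationship alone cannot recover them.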

Categories and Items

The dataset includes the following categories, each containing 25 items with corresponding codes, names, and static prices:

Electric Household Essentials

Item Code	Item Name	Price
Item_1_EHE	Blender	5.0
Item_2_EHE	Microwave	6.5
Item_3_EHE	Toaster	8.0
Item_4_EHE	Vacuum Cleaner	9.5
Item_5_EHE	Air Purifier	11.0
Item_6_EHE	Electric Kettle	12.5
Item_7_EHE	Rice Cooker	14.0
Item_8_EHE	Iron	15.5
Item_9_EHE	Ceiling Fan	17.0
Item_10_EHE	Table Fan	18.5
Item_11_EHE	Hair Dryer	20.0
Item_12_EHE	Heater	21.5
Item_13_EHE	Humidifier	23.0
Item_14_EHE	Dehumidifier	24.5
Item_15_EHE	Coffee Maker	26.0
Item_16_EHE	Portable AC	27.5
Item_17_EHE	Electric Stove	29.0
Item_18_EHE	Pressure Cooker	30.5
Item_19_EHE	Induction Cooktop	32.0
Item_20_EHE	Water Dispenser	33.5
Item_21_EHE	Hand Blender	35.0
Item_22_EHE	Mixer Grinder	36.5
Item_23_EHE	Sandwich Maker	38.0
Item_24_EHE	Air Fryer	39.5
Item_25_EHE	Juicer	41.0

Furniture

Item Code	Item Name	Price
Item_1_FUR	Office Chair	5.0
Item_2_FUR	Sofa	6.5
Item_3_FUR	Coffee Table	8.0
Item_4_FUR	Dining Table	9.5
Item_5_FUR	Bookshelf	11.0
Item_6_FUR	Bed F...

New 1000 Sales Records Data 2
kaggle.com
zip
Updated Jan 12, 2023
Cite
Calvin Oko Mensah (2023). New 1000 Sales Records Data 2 [Dataset]. https://www.kaggle.com/datasets/calvinokomensah/new-1000-sales-records-data-2
Available download formats: zip (49305 bytes)
Dataset updated
Jan 12, 2023
Authors
Calvin Oko Mensah
Description
This dataset was downloaded from excelbianalytics.com, where it was generated with random VBA logic. I recently performed an extensive exploratory data analysis on it and added new columns, namely Unit margin, Order year, Order month, Order weekday, and Order_Ship_Days, which I think can help with analysis of the data. I shared it because I thought it was a great dataset for newbies like myself to practice analytical processes on.
Dairy Supply Chain Sales Dataset
data.niaid.nih.gov
zenodo.org
Updated Jul 12, 2024
+ more versions
Cite
Dimitris Iatropoulos; Konstantinos Georgakidis; Ilias Siniosoglou; Christos Chaschatzis; Anna Triantafyllou; Athanasios Liatifis; Dimitrios Pliatsios; Thomas Lagkas; Vasileios Argyriou; Panagiotis Sarigiannidis (2024). Dairy Supply Chain Sales Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7853252
Dataset updated
Jul 12, 2024
Authors
Dimitris Iatropoulos; Konstantinos Georgakidis; Ilias Siniosoglou; Christos Chaschatzis; Anna Triantafyllou; Athanasios Liatifis; Dimitrios Pliatsios; Thomas Lagkas; Vasileios Argyriou; Panagiotis Sarigiannidis
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
1. Introduction

Sales data collection is a crucial aspect of any manufacturing industry as it provides valuable insights about the performance of products, customer behaviour, and market trends. By gathering and analysing this data, manufacturers can make informed decisions about product development, pricing, and marketing strategies in Internet of Things (IoT) business environments like the dairy supply chain.

One of the most important benefits of the sales data collection process is that it allows manufacturers to identify their most successful products and focus their efforts on those areas. For example, if a manufacturer notices that a particular product is selling well in a certain region, this information can be used to develop new products, optimise the supply chain, or improve existing products to meet the changing needs of customers.

This dataset includes information about 7 of MEVGAL’s products [1]. The published data will help researchers understand the dynamics of the dairy market and its consumption patterns, creating fertile ground for synergies between academia and industry and ultimately helping the industry make informed decisions regarding product development, pricing, and market strategies in the IoT playground. The dataset could also be used to study the impact of external factors on the dairy market, such as economic, environmental, and technological factors, and to understand the current state of the dairy industry and identify potential opportunities for growth and development.

Citation

Please cite the following papers when using this dataset:

I. Siniosoglou, K. Xouveroudis, V. Argyriou, T. Lagkas, S. K. Goudos, K. E. Psannis and P. Sarigiannidis, "Evaluating the Effect of Volatile Federated Timeseries on Modern DNNs: Attention over Long/Short Memory," in the 12th International Conference on Circuits and Systems Technologies (MOCAST 2023), April 2023, Accepted

Dataset Modalities

The dataset includes data regarding the daily sales of a series of dairy product codes offered by MEVGAL. In particular, the dataset includes information gathered by the logistics division and agencies within the industrial infrastructures overseeing the production of each product code. The products included in this dataset represent the daily sales and logistics of a variety of yogurt-based stock. Each file contains the daily logistics for one product over one year, and the files together cover three years, from 2020 to 2022.

3.1 Data Collection

The process of building this dataset involves several steps to ensure that the data is accurate, comprehensive and relevant.

The first step is to determine the specific data that is needed to support the business objectives of the industry, i.e., in this publication’s case the daily sales data.

Once the data requirements have been identified, the next step is to implement an effective sales data collection method. In MEVGAL’s case this is conducted through direct communication and reports generated each day by representatives & selling points.

It is also important for MEVGAL to ensure that the data collection process is conducted in an ethical and compliant manner, adhering to data privacy laws and regulations. The industry also has a data management plan in place to ensure that the data is securely stored and protected from unauthorised access.

The published dataset consists of 13 features providing information about the date and the number of products sold. Finally, the dataset was anonymised in consideration of the privacy requirements of the data owner (MEVGAL).

File	Period	Number of Samples (days)
product 1 2020.xlsx	01/01/2020–31/12/2020	363
product 1 2021.xlsx	01/01/2021–31/12/2021	364
product 1 2022.xlsx	01/01/2022–31/12/2022	365
product 2 2020.xlsx	01/01/2020–31/12/2020	363
product 2 2021.xlsx	01/01/2021–31/12/2021	364
product 2 2022.xlsx	01/01/2022–31/12/2022	365
product 3 2020.xlsx	01/01/2020–31/12/2020	363
product 3 2021.xlsx	01/01/2021–31/12/2021	364
product 3 2022.xlsx	01/01/2022–31/12/2022	365
product 4 2020.xlsx	01/01/2020–31/12/2020	363
product 4 2021.xlsx	01/01/2021–31/12/2021	364
product 4 2022.xlsx	01/01/2022–31/12/2022	364
product 5 2020.xlsx	01/01/2020–31/12/2020	363
product 5 2021.xlsx	01/01/2021–31/12/2021	364
product 5 2022.xlsx	01/01/2022–31/12/2022	365
product 6 2020.xlsx	01/01/2020–31/12/2020	362
product 6 2021.xlsx	01/01/2021–31/12/2021	364
product 6 2022.xlsx	01/01/2022–31/12/2022	365
product 7 2020.xlsx	01/01/2020–31/12/2020	362
product 7 2021.xlsx	01/01/2021–31/12/2021	364
product 7 2022.xlsx	01/01/2022–31/12/2022	365

3.2 Dataset Overview

The following table enumerates and explains the features included across all of the included files.

Feature	Description	Unit
Day	Day of the month	-
Month	Month	-
Year	Year	-
daily_unit_sales	Daily sales: the number of products, measured in units, sold on that day	units
previous_year_daily_unit_sales	Previous year's sales: the number of products, measured in units, sold on the same day of the previous year	units
percentage_difference_daily_unit_sales	The percentage difference between the two values above	%
daily_unit_sales_kg	The amount of product, measured in kilograms, sold on that day	kg
previous_year_daily_unit_sales_kg	Previous year's sales: the amount of product, measured in kilograms, sold on the same day of the previous year	kg
percentage_difference_daily_unit_sales_kg	The percentage difference between the two values above	%
daily_unit_returns_kg	The percentage of the products shipped to selling points that were returned	%
previous_year_daily_unit_returns_kg	The percentage of the products shipped to selling points that were returned in the previous year	%
points_of_distribution	The number of sales representatives through which the product was sold to the market this year
previous_year_points_of_distribution	The number of sales representatives through which the product was sold to the market on the same day of the previous year

Table 1 – Dataset Feature Description
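
The two `percentage_difference_*` features are described only as the percentage difference between the current and previous-year values. As a rough sketch, one common reading is the relative change against the previous year. The formula below and the function name `percentage_difference` are assumptions, since the dataset documentation does not state the exact definition.

```python
from typing import Optional


def percentage_difference(current: float, previous: float) -> Optional[float]:
    """Relative change of `current` against `previous`, in percent.

    Assumed formula: (current - previous) * 100 / previous.
    Returns None when the previous-year value is zero, where the ratio is undefined.
    """
    if previous == 0:
        return None
    return (current - previous) * 100.0 / previous


print(percentage_difference(120, 100))  # 20.0
```

Comparing the result against the published column for a few rows would confirm or rule out this reading before relying on it.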

Structure and Format

4.1 Dataset Structure

The provided dataset has the following structure:

Where:

Name	Type	Property
Readme.docx	Report	A file that contains the documentation of the dataset.
product X	Folder	A folder containing the data of a product X.
product X YYYY.xlsx	Data file	An Excel file containing the sales data of product X for year YYYY.

Table 2 - Dataset File Description
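
The naming pattern in Table 2 can be expanded into the full list of expected data files. The sketch below assumes that each `product X YYYY.xlsx` file sits inside its `product X` folder and that everything lives under a root directory called `dataset`. Both the nesting and the root name are hypothetical, because the structure diagram did not survive on this page.

```python
from pathlib import PurePosixPath

PRODUCTS = range(1, 8)      # products 1 to 7
YEARS = range(2020, 2023)   # 2020, 2021, 2022


def expected_files(root: str = "dataset") -> list:
    """List the data files implied by the 'product X YYYY.xlsx' naming pattern."""
    return [
        PurePosixPath(root) / f"product {p}" / f"product {p} {y}.xlsx"
        for p in PRODUCTS
        for y in YEARS
    ]


files = expected_files()
print(len(files))  # 21
print(files[0])    # dataset/product 1/product 1 2020.xlsx
```

The count of 21 matches the seven products and three yearly files each listed in the file table above.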

Acknowledgement

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 957406 (TERMINET).

References

[1] MEVGAL is a Greek dairy production company
sales dataset
kaggle.com
zip
Updated Feb 18, 2025
Cite
VINOTH KANNA S (2025). sales dataset [Dataset]. https://www.kaggle.com/datasets/vinothkannaece/sales-dataset
Available download formats: zip (27634 bytes)
Dataset updated
Feb 18, 2025
Authors
VINOTH KANNA S
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Sales Data Description

This dataset represents synthetic sales data generated for practice purposes only. It is not real-time or based on actual business operations, and should be used solely for educational or testing purposes. The dataset contains information that simulates sales transactions across different products, regions, and customers. Each row represents an individual sale event with various details associated with it.

Columns in the Dataset

Product_ID: Unique identifier for each product sold. Randomly generated for practice purposes.

Sale_Date: The date when the sale occurred. Randomly selected from the year 2023.

Sales_Rep: The sales representative responsible for the transaction. The dataset includes five random sales representatives (Alice, Bob, Charlie, David, Eve).

Region: The region where the sale took place. The possible regions are North, South, East, and West.

Sales_Amount: The total sales amount for the transaction, including discounts if any. Values range from 100 to 10,000 (in currency units).

Quantity_Sold: The number of units sold in that transaction, randomly generated between 1 and 50.

Product_Category: The category of the product sold. Categories include Electronics, Furniture, Clothing, and Food.

Unit_Cost: The cost per unit of the product sold, randomly generated between 50 and 5000 currency units.

Unit_Price: The selling price per unit of the product, calculated to be higher than the unit cost.

Customer_Type: Indicates whether the customer is a New or Returning customer.

Discount: The discount applied to the sale, randomly chosen between 0% and 30%.

Payment_Method: The method of payment used by the customer (e.g., Credit Card, Cash, Bank Transfer).

Sales_Channel: The channel through which the sale occurred. Either Online or Retail.

Region_and_Sales_Rep: A combined column that pairs the region and sales representative for easier tracking.

Disclaimer

Please note: This data was randomly generated and is intended solely for practice, learning, or testing. It does not reflect real-world sales, customers, or businesses, and should not be considered reliable for any real-time analysis or decision-making.
Annual Retail Store Data, 2000 [Canada] [Excel]
dataverse.scholarsportal.info
borealisdata.ca
pdf, xls
Updated Nov 17, 2021
+ more versions
Cite
Scholars Portal Dataverse (2021). Annual Retail Store Data, 2000 [Canada] [Excel] [Dataset]. https://dataverse.scholarsportal.info/dataset.xhtml;jsessionid=1283d69ee2dd528c9011fe4a2fe3?persistentId=hdl%3A10864%2F11351&version=&q=&fileTypeGroupFacet=&fileAccess=&fileTag=%22Tables%22&fileSortField=&fileSortOrder=
Available download formats: xls(2165760), xls(29696), xls(2920448), pdf(76787), pdf(158404), xls(34816), xls(2754048), pdf(81084), pdf(71183), xls(34304), xls(625664), xls(2707968), xls(695808), pdf(70673), pdf(72585), xls(576512), xls(609792), xls(28672), pdf(60236), pdf(30338), pdf(87181), pdf(84140), pdf(92012), xls(610304), pdf(74439), xls(2471424), pdf(73788), xls(30208), pdf(74478), pdf(53645)
Dataset updated
Nov 17, 2021
Dataset provided by
Scholars Portal Dataverse
Area covered
Canada
Description
The annual Retail store data CD-ROM is an easy-to-use tool for quickly discovering retail trade patterns and trends. The current product presents results from the 1999 and 2000 Annual Retail Store and Annual Retail Chain surveys. This product contains numerous cross-classified data tables using the North American Industry Classification System (NAICS). The data tables provide access to a wide range of financial variables, such as revenues, expenses, inventory, sales per square footage (chain stores only) and the number of stores. Most data tables contain detailed information on industry (as low as 5-digit NAICS codes), geography (Canada, provinces and territories) and store type (chains, independents, franchises). The electronic product also contains survey metadata, questionnaires, information on industry codes and definitions, and the list of retail chain store respondents.
BigMart Retail Sales
data.niaid.nih.gov
Updated May 2, 2022
Cite
Dataman (2022). BigMart Retail Sales [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6509954
Explore at:
Dataset updated
May 2, 2022
Authors
Dataman
License
Attribution 1.0 (CC BY 1.0): https://creativecommons.org/licenses/by/1.0/
License information was derived automatically
Description
Nothing ever becomes real till it is experienced.

-John Keats

While we don't know the context in which John Keats mentioned this, we are sure about its implication in data science. While you would have enjoyed and gained exposure to real-world problems in this challenge, here is another opportunity to get your hands dirty with this practice problem.

Problem Statement :

The data scientists at BigMart have collected 2013 sales data for 1559 products across 10 stores in different cities. Also, certain attributes of each product and store have been defined. The aim is to build a predictive model and find out the sales of each product at a particular store.

Using this model, BigMart will try to understand the properties of products and stores which play a key role in increasing sales.

Please note that the data may have missing values as some stores might not report all the data due to technical glitches. Hence, it will be required to treat them accordingly.

Data :

The data set contains 14204 samples.

Variable Description

Item Identifier: A code provided for the item of sale.

Item Weight: Weight of the item.

Item Fat Content: A categorical column describing how much fat is present in the item: ‘Low Fat’, ‘Regular’, ‘low fat’, ‘LF’, ‘reg’.

Item Visibility: Numeric value for how visible the item is.

Item Type: The category the item belongs to: ‘Dairy’, ‘Soft Drinks’, ‘Meat’, ‘Fruits and Vegetables’, ‘Household’, ‘Baking Goods’, ‘Snack Foods’, ‘Frozen Foods’, ‘Breakfast’, ‘Health and Hygiene’, ‘Hard Drinks’, ‘Canned’, ‘Breads’, ‘Starchy Foods’, ‘Others’, ‘Seafood’.

Item MRP: The maximum retail price (MRP) of the item.

Outlet Identifier: The outlet at which the item was sold. This is a categorical column.

Outlet Establishment Year: The year in which the outlet was established.

Outlet Size: A categorical column describing the size of the outlet: ‘Medium’, ‘High’, ‘Small’.

Outlet Location Type: A categorical column describing the location of the outlet: ‘Tier 1’, ‘Tier 2’, ‘Tier 3’.

Outlet Type: A categorical column for the type of outlet: ‘Supermarket Type1’, ‘Supermarket Type2’, ‘Supermarket Type3’, ‘Grocery Store’.

Item Outlet Sales: The number of sales for an item.
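
The Item Fat Content column lists five spellings that appear to describe only two categories. A minimal sketch of collapsing them before modelling is shown below. The alias mapping and the function name `normalize_fat_content` are assumptions for illustration, as the problem statement does not say which labels are equivalent.

```python
# Assumed mapping: the five listed spellings collapse to two categories.
FAT_CONTENT_ALIASES = {
    "low fat": "Low Fat",
    "lf": "Low Fat",
    "regular": "Regular",
    "reg": "Regular",
}


def normalize_fat_content(label: str) -> str:
    """Map a raw Item Fat Content label to a canonical spelling.

    Unknown labels are returned stripped but otherwise unchanged,
    so unexpected values stay visible instead of being silently merged.
    """
    key = label.strip().lower()
    return FAT_CONTENT_ALIASES.get(key, label.strip())


raw = ["Low Fat", "Regular", "low fat", "LF", "reg"]
print(sorted(set(normalize_fat_content(v) for v in raw)))  # ['Low Fat', 'Regular']
```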

Evaluation Metric:

We will use the Root Mean Square Error (RMSE) value to judge your response.
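
For reference, Root Mean Square Error is the square root of the mean of the squared differences between actual and predicted values. The following standard-library sketch shows the calculation. The function name `rmse` and the sample numbers are illustrative and are not taken from the challenge data.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Square Error between two equally long, non-empty sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Errors of -10 and +10 give a mean squared error of 100, so the RMSE is 10.
print(rmse([100.0, 200.0], [110.0, 190.0]))  # 10.0
```

Because errors are squared before averaging, a few large misses raise the score more than many small ones.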
Electrical Product Sample Sales Data
kaggle.com
zip
Updated Jan 18, 2022
Cite
Murat Mutlu (2022). Electrical Product Sample Sales Data [Dataset]. https://www.kaggle.com/datasets/muratmutlubi/electrical-product-sample-sales-data
Available download formats: zip (43201106 bytes)
Dataset updated
Jan 18, 2022
Authors
Murat Mutlu
Description
Dataset

This dataset was created by Murat Mutlu

Released under Data files © Original Authors

Online Sales Dataset
gts.ai
json
Updated Jun 25, 2024
Cite
GTS (2024). Online Sales Dataset [Dataset]. https://gts.ai/dataset-download/online-sales-dataset/
Available download formats: json
Dataset updated
Jun 25, 2024
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The Online Sales Dataset provides a detailed overview of global online sales transactions across various product categories. It includes transaction details such as order ID, date, product category, product name, quantity, unit price, total price, region, and payment method.
Data Cleaning Sample
borealisdata.ca
dataone.org
Updated Jul 13, 2023
Cite
Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.5683/SP3/ZCN177
Dataset updated
Jul 13, 2023
Dataset provided by
Borealis
Authors
Rong Luo
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Sample data for exercises in Further Adventures in Data Cleaning.

Cafe Sales - Dirty Data for Cleaning Training

kaggle.com

zip

Updated Jan 17, 2025

Cite

Ahmed Mohamed (2025). Cafe Sales - Dirty Data for Cleaning Training [Dataset]. https://www.kaggle.com/datasets/ahmedmohamed2003/cafe-sales-dirty-data-for-cleaning-training

Available download formats: zip (113510 bytes)

Dataset updated

Jan 17, 2025

Authors

Ahmed Mohamed

License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically

Description

Dirty Cafe Sales Dataset

Overview

The Dirty Cafe Sales dataset contains 10,000 rows of synthetic data representing sales transactions in a cafe. This dataset is intentionally "dirty," with missing values, inconsistent data, and errors introduced to provide a realistic scenario for data cleaning and exploratory data analysis (EDA). It can be used to practice cleaning techniques, data wrangling, and feature engineering.

File Information

File Name: dirty_cafe_sales.csv
Number of Rows: 10,000
Number of Columns: 8

Columns Description

Column Name	Description	Example Values
`Transaction ID`	A unique identifier for each transaction. Always present and unique.	`TXN_1234567`
`Item`	The name of the item purchased. May contain missing or invalid values (e.g., "ERROR").	`Coffee`, `Sandwich`
`Quantity`	The quantity of the item purchased. May contain missing or invalid values.	`1`, `3`, `UNKNOWN`
`Price Per Unit`	The price of a single unit of the item. May contain missing or invalid values.	`2.00`, `4.00`
`Total Spent`	The total amount spent on the transaction. Calculated as `Quantity * Price Per Unit`.	`8.00`, `12.00`
`Payment Method`	The method of payment used. May contain missing or invalid values (e.g., `None`, "UNKNOWN").	`Cash`, `Credit Card`
`Location`	The location where the transaction occurred. May contain missing or invalid values.	`In-store`, `Takeaway`
`Transaction Date`	The date of the transaction. May contain missing or incorrect values.	`2023-01-01`

Data Characteristics

Missing Values:
- Some columns (e.g., Item, Payment Method, Location) may contain missing values represented as None or empty cells.
Invalid Values:
- Some rows contain invalid entries like "ERROR" or "UNKNOWN" to simulate real-world data issues.
Price Consistency:
- Prices for menu items are consistent but may have missing or incorrect values introduced.

Menu Items

The dataset includes the following menu items with their respective prices:

Item	Price($)
Coffee	2
Tea	1.5
Sandwich	4
Salad	5
Cake	3
Cookie	1
Smoothie	4
Juice	3

Use Cases

This dataset is suitable for:
- Practicing data cleaning techniques such as handling missing values, removing duplicates, and correcting invalid entries.
- Exploring EDA techniques like visualizations and summary statistics.
- Performing feature engineering for machine learning workflows.

Cleaning Steps Suggestions

To clean this dataset, consider the following steps:

1. Handle Missing Values:
- Fill missing numeric values with the median or mean.
- Replace missing categorical values with the mode or "Unknown."
2. Handle Invalid Values:
- Replace invalid entries like "ERROR" and "UNKNOWN" with NaN or appropriate values.
3. Date Consistency:
- Ensure all dates are in a consistent format.
- Fill missing dates with plausible values based on nearby records.
4. Feature Engineering:
- Create new columns, such as Day of the Week or Transaction Month, for further analysis.
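
The suggested steps can be sketched with the Python standard library alone. The code below is an illustrative outline rather than a reference cleaning pipeline. The function names `clean_cell` and `clean_rows`, the set of invalid markers, the median fill for `Quantity`, and the `%Y-%m-%d` date format are all assumptions based on the examples in the column description. It does not attempt to fill missing dates from nearby records.

```python
import statistics
from datetime import datetime
from typing import Optional

# Markers treated as missing or invalid (assumed from the description's examples).
INVALID = {"", "none", "error", "unknown"}


def clean_cell(cell: Optional[str]) -> Optional[str]:
    """Turn missing or invalid markers into None and keep everything else."""
    if cell is None or cell.strip().lower() in INVALID:
        return None
    return cell.strip()


def clean_rows(rows: list) -> list:
    """Apply the invalid-value, missing-value, and feature-engineering steps."""
    cleaned = [{k: clean_cell(v) for k, v in row.items()} for row in rows]

    # Missing numeric values: fill Quantity with the median of observed values.
    observed = [float(r["Quantity"]) for r in cleaned if r["Quantity"] is not None]
    median_qty = statistics.median(observed) if observed else None

    for r in cleaned:
        r["Quantity"] = float(r["Quantity"]) if r["Quantity"] is not None else median_qty
        # Missing categorical values: replace with "Unknown".
        r["Payment Method"] = r["Payment Method"] or "Unknown"
        # Feature engineering: derive the weekday when the date parses.
        r["Day of Week"] = None
        if r["Transaction Date"] is not None:
            try:
                parsed = datetime.strptime(r["Transaction Date"], "%Y-%m-%d")
                r["Day of Week"] = parsed.strftime("%A")
            except ValueError:
                r["Transaction Date"] = None  # unparsable date treated as missing
    return cleaned


sample = [
    {"Quantity": "1", "Payment Method": "Cash", "Transaction Date": "2023-01-01"},
    {"Quantity": "UNKNOWN", "Payment Method": "ERROR", "Transaction Date": "2023-01-02"},
    {"Quantity": "3", "Payment Method": "None", "Transaction Date": ""},
]
result = clean_rows(sample)
print(result[1]["Quantity"], result[1]["Payment Method"])  # 2.0 Unknown
```

Using the median rather than the mean keeps the fill value robust to a few extreme quantities. Either choice is consistent with the suggestion above.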

License

This dataset is released under the CC BY-SA 4.0 License. You are free to use, share, and adapt it, provided you give appropriate credit and distribute any adaptations under the same license.

Feedback

If you have any questions or feedback, feel free to reach out through the dataset's discussion board on Kaggle.

Power BI Sample Dataset
cubig.ai
zip
Updated May 29, 2025
Cite
CUBIG (2025). Power BI Sample Dataset [Dataset]. https://cubig.ai/store/products/389/power-bi-sample-dataset
Available download formats: zip
Dataset updated
May 29, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
Description
1) Data Introduction
• The Power BI Sample Data is a financial sample dataset provided for Power BI practice and data visualization exercises. It includes a variety of financial metrics and transaction information, including sales, profits, and expenses.

2) Data Utilization
(1) Characteristics of the Power BI Sample Data:
• The dataset consists of numerical and categorical variables such as transaction date, region, product category, sales, profit, and cost, optimized for aggregation, analysis, and visualization.
(2) The Power BI Sample Data can be used for:
• Revenue and profit analysis: Analyze sales and profit data by region, product, and period to understand business performance and trends.
• Power BI dashboard practice: Use the financial metrics and transaction data to design and practice dashboards, reports, and visualization charts directly in Power BI.
Sample Leads Dataset
kaggle.com
zip
Updated Jun 24, 2022
Cite
ThatSean (2022). Sample Leads Dataset [Dataset]. https://www.kaggle.com/datasets/thatsean/sample-leads-dataset
Available download formats: zip (22640 bytes)
Dataset updated
Jun 24, 2022
Authors
ThatSean
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset is based on the Sample Leads Dataset and is intended to allow some simple filtering by lead source. I modified this dataset to support an upcoming Towards Data Science article walking through the process. Link to be shared once published.
sales-transcripts
huggingface.co
Updated Sep 24, 2024
Cite
Gwen Shapira (2024). sales-transcripts [Dataset]. https://huggingface.co/datasets/gwenshap/sales-transcripts
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Sep 24, 2024
Authors
Gwen Shapira
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This dataset was generated for use with Nile's Sales Assistant example: https://github.com/niledatabase/niledatabase/tree/main/examples/ai/sales_insight

It includes:
- Simulated sales conversations for 5 different fictional companies.
- Chunked and embedded versions of these conversations (embeddings use OpenAI's text-embedding-3-small model).

The chunks and embeddings can be loaded directly into a vector database and searched using vector similarity methods. The example's ./ingest directory… See the full description on the dataset page: https://huggingface.co/datasets/gwenshap/sales-transcripts.
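
As a rough illustration of what searching by vector similarity means, the sketch below ranks text chunks by cosine similarity to a query vector. The function names `cosine_similarity` and `top_k` are hypothetical, and the three-dimensional vectors are made-up toy values. Real embeddings from the dataset are much longer, and a vector database would normally perform this ranking itself.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          chunks: List[Tuple[str, Sequence[float]]],
          k: int = 2) -> List[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy chunks with invented embeddings, for illustration only.
toy_chunks = [
    ("pricing discussion", [1.0, 0.0, 0.0]),
    ("onboarding call", [0.0, 1.0, 0.0]),
    ("discount request", [0.9, 0.1, 0.0]),
]
print(top_k([1.0, 0.0, 0.0], toy_chunks))  # ['pricing discussion', 'discount request']
```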
US Retail Sales
ycharts.com
html
Updated Sep 16, 2025
+ more versions
Cite
Census Bureau (2025). US Retail Sales [Dataset]. https://ycharts.com/indicators/us_retail_sales
Explore at:
Available download formats: html
Dataset updated
Sep 16, 2025
Dataset provided by
YCharts
Authors
Census Bureau
License
https://www.ycharts.com/terms
Time period covered
Jan 31, 1992 - Aug 31, 2025
Area covered
United States
Variables measured
US Retail Sales
Description
View monthly updates and historical trends for US Retail Sales from the United States. Source: Census Bureau. Track economic data with YCharts analytics.
United States Existing Home Sales
tradingeconomics.com
ru.tradingeconomics.com
+13 more
csv, excel, json, xml
Updated Nov 20, 2025
Cite
TRADING ECONOMICS (2025). United States Existing Home Sales [Dataset]. https://tradingeconomics.com/united-states/existing-home-sales
Explore at:
Available download formats: csv, json, xml, excel
Dataset updated
Nov 20, 2025
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 31, 1968 - Oct 31, 2025
Area covered
United States
Description
Existing Home Sales in the United States increased to 4,100 thousand in October 2025 from 4,050 thousand in September 2025. This dataset provides the latest reported value for United States Existing Home Sales, plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus, and news.
Real manufacturing sales, orders, inventory owned and inventory to sales...
open.canada.ca
www150.statcan.gc.ca
+1 more
csv, html, xml
Updated Nov 14, 2025
+ more versions
Cite
Statistics Canada (2025). Real manufacturing sales, orders, inventory owned and inventory to sales ratio, 2017 dollars, seasonally adjusted [Dataset]. https://open.canada.ca/data/dataset/7bf43dd1-af41-4c6f-871e-4c653aad27d0
Explore at:
Available download formats: csv, html, xml
Dataset updated
Nov 14, 2025
Dataset provided by
Statistics Canada
License
Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
License information was derived automatically
Description
Canadian sales of goods manufactured (shipments), new orders, unfilled orders, inventories, raw materials, goods or work in process, finished goods, and inventory to sales ratios for durable and non-durable goods, by North American Industry Classification System (NAICS), for reference periods from January 2002 to the current reference month. Not all combinations are available. Values are in constant dollars.

