Facebook
TwitterThis dataset was created by Chris Chua
Facebook
TwitterThis dataset contains a list of sales and movement data by item and department appended monthly. Update Frequency : Monthly
Facebook
TwitterOpen Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Standard error reference tables for the Retail Sales Index in Great Britain.
Facebook
Twitterhttps://cubig.ai/store/terms-of-servicehttps://cubig.ai/store/terms-of-service
1) Data Introduction • The Sample Sales Data is a retail sales dataset of 2,823 orders and 25 columns that includes a variety of sales-related data, including order numbers, product information, quantity, unit price, sales, order date, order status, customer and delivery information.
2) Data Utilization (1) Sample Sales Data has characteristics that: • This dataset consists of numerical (sales, quantity, unit price, etc.), categorical (product, country, city, customer name, transaction size, etc.), and date (order date) variables, with missing values in some columns (STATE, ADDRESSLINE2, POSTALCODE, etc.). (2) Sample Sales Data can be used to: • Analysis of sales trends and performance by product: Key variables such as order date, product line, and country can be used to visualize and analyze monthly and yearly sales trends, the proportion of sales by product line, and top sales by country and region. • Segmentation and marketing strategies: Segmentation of customer groups based on customer information, transaction size, and regional data, and use them to design targeted marketing and customized promotion strategies.
Facebook
TwitterAttribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The Dirty Retail Store Sales dataset contains 12,575 rows of synthetic data representing sales transactions from a retail store. The dataset includes eight product categories with 25 items per category, each having static prices. It is designed to simulate real-world sales data, including intentional "dirtiness" such as missing or inconsistent values. This dataset is suitable for practicing data cleaning, exploratory data analysis (EDA), and feature engineering.
retail_store_sales.csv| Column Name | Description | Example Values |
|---|---|---|
Transaction ID | A unique identifier for each transaction. Always present and unique. | TXN_1234567 |
Customer ID | A unique identifier for each customer. 25 unique customers. | CUST_01 |
Category | The category of the purchased item. | Food, Furniture |
Item | The name of the purchased item. May contain missing values or None. | Item_1_FOOD, None |
Price Per Unit | The static price of a single unit of the item. May contain missing or None values. | 4.00, None |
Quantity | The quantity of the item purchased. May contain missing or None values. | 1, None |
Total Spent | The total amount spent on the transaction. Calculated as Quantity * Price Per Unit. | 8.00, None |
Payment Method | The method of payment used. May contain missing or invalid values. | Cash, Credit Card |
Location | The location where the transaction occurred. May contain missing or invalid values. | In-store, Online |
Transaction Date | The date of the transaction. Always present and valid. | 2023-01-15 |
Discount Applied | Indicates if a discount was applied to the transaction. May contain missing values. | True, False, None |
The dataset includes the following categories, each containing 25 items with corresponding codes, names, and static prices:
| Item Code | Item Name | Price |
|---|---|---|
| Item_1_EHE | Blender | 5.0 |
| Item_2_EHE | Microwave | 6.5 |
| Item_3_EHE | Toaster | 8.0 |
| Item_4_EHE | Vacuum Cleaner | 9.5 |
| Item_5_EHE | Air Purifier | 11.0 |
| Item_6_EHE | Electric Kettle | 12.5 |
| Item_7_EHE | Rice Cooker | 14.0 |
| Item_8_EHE | Iron | 15.5 |
| Item_9_EHE | Ceiling Fan | 17.0 |
| Item_10_EHE | Table Fan | 18.5 |
| Item_11_EHE | Hair Dryer | 20.0 |
| Item_12_EHE | Heater | 21.5 |
| Item_13_EHE | Humidifier | 23.0 |
| Item_14_EHE | Dehumidifier | 24.5 |
| Item_15_EHE | Coffee Maker | 26.0 |
| Item_16_EHE | Portable AC | 27.5 |
| Item_17_EHE | Electric Stove | 29.0 |
| Item_18_EHE | Pressure Cooker | 30.5 |
| Item_19_EHE | Induction Cooktop | 32.0 |
| Item_20_EHE | Water Dispenser | 33.5 |
| Item_21_EHE | Hand Blender | 35.0 |
| Item_22_EHE | Mixer Grinder | 36.5 |
| Item_23_EHE | Sandwich Maker | 38.0 |
| Item_24_EHE | Air Fryer | 39.5 |
| Item_25_EHE | Juicer | 41.0 |
| Item Code | Item Name | Price |
|---|---|---|
| Item_1_FUR | Office Chair | 5.0 |
| Item_2_FUR | Sofa | 6.5 |
| Item_3_FUR | Coffee Table | 8.0 |
| Item_4_FUR | Dining Table | 9.5 |
| Item_5_FUR | Bookshelf | 11.0 |
| Item_6_FUR | Bed F... |
Facebook
TwitterThis is a dataset downloaded off excelbianalytics.com created off of random VBA logic. I recently performed an extensive exploratory data analysis on it and I included new columns to it, namely: Unit margin, Order year, Order month, Order weekday and Order_Ship_Days which I think can help with analysis on the data. I shared it because I thought it was a great dataset to practice analytical processes on for newbies like myself.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1.Introduction
Sales data collection is a crucial aspect of any manufacturing industry as it provides valuable insights about the performance of products, customer behaviour, and market trends. By gathering and analysing this data, manufacturers can make informed decisions about product development, pricing, and marketing strategies in Internet of Things (IoT) business environments like the dairy supply chain.
One of the most important benefits of the sales data collection process is that it allows manufacturers to identify their most successful products and target their efforts towards those areas. For example, if a manufacturer could notice that a particular product is selling well in a certain region, this information could be utilised to develop new products, optimise the supply chain or improve existing ones to meet the changing needs of customers.
This dataset includes information about 7 of MEVGAL’s products [1]. According to the above information the data published will help researchers to understand the dynamics of the dairy market and its consumption patterns, which is creating the fertile ground for synergies between academia and industry and eventually help the industry in making informed decisions regarding product development, pricing and market strategies in the IoT playground. The use of this dataset could also aim to understand the impact of various external factors on the dairy market such as the economic, environmental, and technological factors. It could help in understanding the current state of the dairy industry and identifying potential opportunities for growth and development.
Please cite the following papers when using this dataset:
I. Siniosoglou, K. Xouveroudis, V. Argyriou, T. Lagkas, S. K. Goudos, K. E. Psannis and P. Sarigiannidis, "Evaluating the Effect of Volatile Federated Timeseries on Modern DNNs: Attention over Long/Short Memory," in the 12th International Conference on Circuits and Systems Technologies (MOCAST 2023), April 2023, Accepted
The dataset includes data regarding the daily sales of a series of dairy product codes offered by MEVGAL. In particular, the dataset includes information gathered by the logistics division and agencies within the industrial infrastructures overseeing the production of each product code. The products included in this dataset represent the daily sales and logistics of a variety of yogurt-based stock. Each of the different files include the logistics for that product on a daily basis for three years, from 2020 to 2022.
3.1 Data Collection
The process of building this dataset involves several steps to ensure that the data is accurate, comprehensive and relevant.
The first step is to determine the specific data that is needed to support the business objectives of the industry, i.e., in this publication’s case the daily sales data.
Once the data requirements have been identified, the next step is to implement an effective sales data collection method. In MEVGAL’s case this is conducted through direct communication and reports generated each day by representatives & selling points.
It is also important for MEVGAL to ensure that the data collection process conducted is in an ethical and compliant manner, adhering to data privacy laws and regulation. The industry also has a data management plan in place to ensure that the data is securely stored and protected from unauthorised access.
The published dataset is consisted of 13 features providing information about the date and the number of products that have been sold. Finally, the dataset was anonymised in consideration to the privacy requirement of the data owner (MEVGAL).
File
Period
Number of Samples (days)
product 1 2020.xlsx
01/01/2020–31/12/2020
363
product 1 2021.xlsx
01/01/2021–31/12/2021
364
product 1 2022.xlsx
01/01/2022–31/12/2022
365
product 2 2020.xlsx
01/01/2020–31/12/2020
363
product 2 2021.xlsx
01/01/2021–31/12/2021
364
product 2 2022.xlsx
01/01/2022–31/12/2022
365
product 3 2020.xlsx
01/01/2020–31/12/2020
363
product 3 2021.xlsx
01/01/2021–31/12/2021
364
product 3 2022.xlsx
01/01/2022–31/12/2022
365
product 4 2020.xlsx
01/01/2020–31/12/2020
363
product 4 2021.xlsx
01/01/2021–31/12/2021
364
product 4 2022.xlsx
01/01/2022–31/12/2022
364
product 5 2020.xlsx
01/01/2020–31/12/2020
363
product 5 2021.xlsx
01/01/2021–31/12/2021
364
product 5 2022.xlsx
01/01/2022–31/12/2022
365
product 6 2020.xlsx
01/01/2020–31/12/2020
362
product 6 2021.xlsx
01/01/2021–31/12/2021
364
product 6 2022.xlsx
01/01/2022–31/12/2022
365
product 7 2020.xlsx
01/01/2020–31/12/2020
362
product 7 2021.xlsx
01/01/2021–31/12/2021
364
product 7 2022.xlsx
01/01/2022–31/12/2022
365
3.2 Dataset Overview
The following table enumerates and explains the features included across all of the included files.
Feature
Description
Unit
Day
day of the month
-
Month
Month
-
Year
Year
-
daily_unit_sales
Daily sales - the amount of products, measured in units, that during that specific day were sold
units
previous_year_daily_unit_sales
Previous Year’s sales - the amount of products, measured in units, that during that specific day were sold the previous year
units
percentage_difference_daily_unit_sales
The percentage difference between the two above values
%
daily_unit_sales_kg
The amount of products, measured in kilograms, that during that specific day were sold
kg
previous_year_daily_unit_sales_kg
Previous Year’s sales - the amount of products, measured in kilograms, that during that specific day were sold, the previous year
kg
percentage_difference_daily_unit_sales_kg
The percentage difference between the two above values
kg
daily_unit_returns_kg
The percentage of the products that were shipped to selling points and were returned
%
previous_year_daily_unit_returns_kg
The percentage of the products that were shipped to selling points and were returned the previous year
%
points_of_distribution
The amount of sales representatives through which the product was sold to the market for this year
previous_year_points_of_distribution
The amount of sales representatives through which the product was sold to the market for the same day for the previous year
Table 1 – Dataset Feature Description
4.1 Dataset Structure
The provided dataset has the following structure:
Where:
Name
Type
Property
Readme.docx
Report
A File that contains the documentation of the Dataset.
product X
Folder
A folder containing the data of a product X.
product X YYYY.xlsx
Data file
An excel file containing the sales data of product X for year YYYY.
Table 2 - Dataset File Description
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 957406 (TERMINET).
References
[1] MEVGAL is a Greek dairy production company
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Sales Data Description This dataset represents synthetic sales data generated for practice purposes only. It is not real-time or based on actual business operations, and should be used solely for educational or testing purposes. The dataset contains information that simulates sales transactions across different products, regions, and customers. Each row represents an individual sale event with various details associated with it.
Columns in the Dataset
Disclaimer
Please note: This data was randomly generated and is intended solely for practice, learning, or testing. It does not reflect real-world sales, customers, or businesses, and should not be considered reliable for any real-time analysis or decision-making.
Facebook
TwitterThe annual Retail store data CD-ROM is an easy-to-use tool for quickly discovering retail trade patterns and trends. The current product presents results from the 1999 and 2000 Annual Retail Store and Annual Retail Chain surveys. This product contains numerous cross-classified data tables using the North American Industry Classification System (NAICS). The data tables provide access to a wide range of financial variables, such as revenues, expenses, inventory, sales per square footage (chain stores only) and the number of stores. Most data tables contain detailed information on industry (as low as 5-digit NAICS codes), geography (Canada, provinces and territories) and store type (chains, independents, franchises). The electronic product also contains survey metadata, questionnaires, information on industry codes and definitions, and the list of retail chain store respondents.
Facebook
TwitterAttribution 1.0 (CC BY 1.0)https://creativecommons.org/licenses/by/1.0/
License information was derived automatically
Nothing ever becomes real till it is experienced.
-John Keats
While we don't know the context in which John Keats mentioned this, we are sure about its implication in data science. While you would have enjoyed and gained exposure to real world problems in this challenge, here is another opportunity to get your hand dirty with this practice problem.
Problem Statement :
The data scientists at BigMart have collected 2013 sales data for 1559 products across 10 stores in different cities. Also, certain attributes of each product and store have been defined. The aim is to build a predictive model and find out the sales of each product at a particular store.
Using this model, BigMart will try to understand the properties of products and stores which play a key role in increasing sales.
Please note that the data may have missing values as some stores might not report all the data due to technical glitches. Hence, it will be required to treat them accordingly.
Data :
We have 14204 samples in data set.
Variable Description
Item Identifier: A code provided for the item of sale
Item Weight: Weight of item
Item Fat Content: A categorical column of how much fat is present in the item: ‘Low Fat’, ‘Regular’, ‘low fat’, ‘LF’, ‘reg’
Item Visibility: Numeric value for how visible the item is
Item Type: What category does the item belong to: ‘Dairy’, ‘Soft Drinks’, ‘Meat’, ‘Fruits and Vegetables’, ‘Household’, ‘Baking Goods’, ‘Snack Foods’, ‘Frozen Foods’, ‘Breakfast’, ’Health and Hygiene’, ‘Hard Drinks’, ‘Canned’, ‘Breads’, ‘Starchy Foods’, ‘Others’, ‘Seafood’.
Item MRP: The MRP price of item
Outlet Identifier: Which outlet was the item sold. This will be categorical column
Outlet Establishment Year: Which year was the outlet established
Outlet Size: A categorical column to explain size of outlet: ‘Medium’, ‘High’, ‘Small’.
Outlet Location Type: A categorical column to describe the location of the outlet: ‘Tier 1’, ‘Tier 2’, ‘Tier 3’
Outlet Type: Categorical column for type of outlet: ‘Supermarket Type1’, ‘Supermarket Type2’, ‘Supermarket Type3’, ‘Grocery Store’
Item Outlet Sales: The number of sales for an item.
Evaluation Metric:
We will use the Root Mean Square Error value to judge your response
Facebook
TwitterThis dataset was created by Murat Mutlu
Released under Data files © Original Authors
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Online Sales Dataset provides a detailed overview of global online sales transactions across various product categories. It includes transaction details such as order ID, date, product category, product name, quantity, unit price, total price, region, and payment method.
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Sample data for exercises in Further Adventures in Data Cleaning.
Facebook
TwitterAttribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The Dirty Cafe Sales dataset contains 10,000 rows of synthetic data representing sales transactions in a cafe. This dataset is intentionally "dirty," with missing values, inconsistent data, and errors introduced to provide a realistic scenario for data cleaning and exploratory data analysis (EDA). It can be used to practice cleaning techniques, data wrangling, and feature engineering.
dirty_cafe_sales.csv| Column Name | Description | Example Values |
|---|---|---|
Transaction ID | A unique identifier for each transaction. Always present and unique. | TXN_1234567 |
Item | The name of the item purchased. May contain missing or invalid values (e.g., "ERROR"). | Coffee, Sandwich |
Quantity | The quantity of the item purchased. May contain missing or invalid values. | 1, 3, UNKNOWN |
Price Per Unit | The price of a single unit of the item. May contain missing or invalid values. | 2.00, 4.00 |
Total Spent | The total amount spent on the transaction. Calculated as Quantity * Price Per Unit. | 8.00, 12.00 |
Payment Method | The method of payment used. May contain missing or invalid values (e.g., None, "UNKNOWN"). | Cash, Credit Card |
Location | The location where the transaction occurred. May contain missing or invalid values. | In-store, Takeaway |
Transaction Date | The date of the transaction. May contain missing or incorrect values. | 2023-01-01 |
Missing Values:
Item, Payment Method, Location) may contain missing values represented as None or empty cells.Invalid Values:
"ERROR" or "UNKNOWN" to simulate real-world data issues.Price Consistency:
The dataset includes the following menu items with their respective price ranges:
| Item | Price($) |
|---|---|
| Coffee | 2 |
| Tea | 1.5 |
| Sandwich | 4 |
| Salad | 5 |
| Cake | 3 |
| Cookie | 1 |
| Smoothie | 4 |
| Juice | 3 |
This dataset is suitable for: - Practicing data cleaning techniques such as handling missing values, removing duplicates, and correcting invalid entries. - Exploring EDA techniques like visualizations and summary statistics. - Performing feature engineering for machine learning workflows.
To clean this dataset, consider the following steps: 1. Handle Missing Values: - Fill missing numeric values with the median or mean. - Replace missing categorical values with the mode or "Unknown."
Handle Invalid Values:
"ERROR" and "UNKNOWN" with NaN or appropriate values.Date Consistency:
Feature Engineering:
Day of the Week or Transaction Month, for further analysis.This dataset is released under the CC BY-SA 4.0 License. You are free to use, share, and adapt it, provided you give appropriate credit.
If you have any questions or feedback, feel free to reach out through the dataset's discussion board on Kaggle.
Facebook
Twitterhttps://cubig.ai/store/terms-of-servicehttps://cubig.ai/store/terms-of-service
1) Data Introduction • The Power BI Sample Data is a financial sample dataset provided for Power BI practice and data visualization exercises that includes a variety of financial metrics and transaction information, including sales, profits, and expenses.
2) Data Utilization (1) Power BI Sample Data has characteristics that: • This dataset consists of numerical and categorical variables such as transaction date, region, product category, sales, profit, and cost, optimized for aggregation, analysis, and visualization. (2) Power BI Sample Data can be used to: • Revenue and Revenue Analysis: Analyze sales and profit data by region, product, and period to understand business performance and trends. • Power BI Dashboard Practice: Utilize a variety of financial metrics and transaction data to design and practice dashboards, reports, visualization charts, and more directly at Power BI.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset is based on the Sample Leads Dataset and is intended to allow some simple filtering by lead source. I had modified this dataset to support an upcoming Towards Data Science article walking through the process. Link to be shared once published.
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was generated for use with Nile's Sales Assistant example: https://github.com/niledatabase/niledatabase/tree/main/examples/ai/sales_insight It includes:
Simulated sales conversations for 5 different fictional companies. Chunked and embedded version of these conversations (embeddings use OpenAI's text-embedding-3-small model).
The chunks and embeddings can be directly loaded to a vector databases and searched using vector similarity methods. The example's ./ingest directory… See the full description on the dataset page: https://huggingface.co/datasets/gwenshap/sales-transcripts.
Facebook
Twitterhttps://www.ycharts.com/termshttps://www.ycharts.com/terms
View monthly updates and historical trends for US Retail Sales. from United States. Source: Census Bureau. Track economic data with YCharts analytics.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Existing Home Sales in the United States increased to 4100 Thousand in October from 4050 Thousand in September of 2025. This dataset provides the latest reported value for - United States Existing Home Sales - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Facebook
TwitterOpen Government Licence - Canada 2.0https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Canadian Sales of goods manufactured (shipments), new orders, unfilled orders, inventories, raw materials, goods or work in process, finished goods, and inventory to sales ratios for durable and non-durable goods by North American Industry Classification System (NAICS) for reference periods January 2002 to the current reference month. Not all combinations are available. Values are in constant dollars.
Facebook
TwitterThis dataset was created by Chris Chua