http://opendatacommons.org/licenses/dbcl/1.0/
The Orders database contains information on the following variables.
• Continuous variables: Row ID, Order ID, Order Date, Ship Date, Customer ID, Product ID, Sales, Quantity, Discount, Profit, Shipping Cost
• Categorical variables: Ship Mode, Customer Name, Segment, Postal Code, City, State, Country, Region, Market, Category, Subcategory, Product Name, Order Priority
The purpose of this project is to:
1. Use descriptive statistics to assess sales performance across segments, markets, product categories and subcategories;
2. Use diagnostic analytics to understand the statistical significance of the factors that influence sales;
3. Use predictive analytics (regression) to understand the strength of the relationship between sales and sales drivers and to generate a regression formula to predict sales;
4. Develop a sales forecasting model based on the insights.
Descriptive analytics
Descriptive statistics for sales
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F848f47b38b7f2360163bb2221703c658%2FPicture2.png?generation=1715109635788424&alt=media
Frequency distribution for sales
Around 44,500 transactions of value >=USD 500.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F39cfd8ffd8fdf296300bb9f1fa5243e2%2FPicture3.png?generation=1715109667755923&alt=media
Sales values across markets
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F3385959d11b6daafae24c848b4b00f13%2FPicture4.png?generation=1715109744629587&alt=media
Sales increased across all markets throughout 2012-2015.
We have high sales volumes in the USCA and LATAM markets:
• USCA: USD 757,108 in 2015;
• LATAM: USD 706,632 in 2015.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F4aa59b5a5b980aad6873c8a4af4cd223%2FPicture1.png?generation=1715109770510368&alt=media
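The yearly totals per market quoted above can be reproduced with a simple group-and-sum. The following is a minimal sketch using only the Python standard library; the sample rows, the ISO date format, and the function name `sales_by_market_and_year` are assumptions made for illustration, not the author's actual workflow.

```python
from collections import defaultdict

# Hypothetical sample rows; the real Orders data has many more columns and rows.
orders = [
    {"Market": "USCA", "Order Date": "2015-03-01", "Sales": 120.0},
    {"Market": "USCA", "Order Date": "2015-07-15", "Sales": 80.0},
    {"Market": "LATAM", "Order Date": "2015-01-20", "Sales": 50.0},
    {"Market": "LATAM", "Order Date": "2014-11-02", "Sales": 30.0},
]


def sales_by_market_and_year(rows):
    """Sum Sales for each (Market, year) pair."""
    totals = defaultdict(float)
    for row in rows:
        year = int(row["Order Date"][:4])  # assumes ISO dates (YYYY-MM-DD)
        totals[(row["Market"], year)] += row["Sales"]
    return dict(totals)


totals = sales_by_market_and_year(orders)
for (market, year), amount in sorted(totals.items()):
    print(f"{market} {year}: USD {amount:,.0f}")
```

Keying the totals by a `(market, year)` tuple keeps the result easy to sort and to pivot into a table later.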
Sales across product categories
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F867cbe622bf94d25a25a1c4b9281656d%2FPicture5.png?generation=1715109794950614&alt=media
Office supplies was the most-sold product category by quantity in 2012-2015, and Technology was the least-sold. However, the Technology category yields high sales.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F5c74664f77cce2bc2f7c77c7b01e9890%2FPicture6.png?generation=1715109834309500&alt=media
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2Fd3bb766183e9f58fbf009a998c01adf6%2FPicture7.png?generation=1715109872961254&alt=media
Further analysis of profitable products reveals that phones and copiers demonstrate high sales.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F109c4c3eab81fa581c19a5c09beff839%2FPicture9.png?generation=1715109914590660&alt=media
Sales across segments
The data reveals that there are high sales in the Consumer segment across all product categories.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F65075cc20028a37a1aff6932fa89d3d5%2FPicture10.png?generation=1715109992655572&alt=media
Diagnostic analytics
Two-sample t-test
Using a t-test, we can evaluate how sales differ across segments, regions, and product types. The test lets us assess whether the difference in mean sales between two samples is statistically significant.
The two-sample t-test of sales numbers across markets resulted in the statistical significance of sales in USCA and LATAM markets with p-values >0.05.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F7b7264d5f44a9a79b352028b28d1c618%2FPicture11.png?generation=1715110082746375&alt=media
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F4061ef38ea83d7e3bbd252a802863e8f%2FPicture12.png?generation=1715110097203251&alt=media
The two-sample t-test of sales numbers across product categories resulted in the statistical significance of sales in Office supplies and Technology categories with p-values >0.05.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2Fd9994377d605222d77ef67af3e273771%2FPicture13.png?generation=1715110126112322&alt=media
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F669779e9aad19d51a28fb44e7c484bc7%2FPicture14.png?generation=1715110140543290&alt=media
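For readers who want to run a comparable test themselves, here is a minimal sketch of a two-sample (Welch) t-test using only the Python standard library. It is not the author's procedure: the sample values are made up, the function name `welch_t_test` is hypothetical, and the p-value uses a normal approximation that is reasonable only for large samples. By convention, a difference is called statistically significant when the p-value falls below the chosen level (commonly 0.05).

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_test(a, b):
    """Return (t, df, approx_p) for two independent samples."""
    n_a, n_b = len(a), len(b)
    # Squared standard errors of the two sample means
    se_a, se_b = variance(a) / n_a, variance(b) / n_b
    t = (mean(a) - mean(b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite degrees of freedom
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    # Two-sided p-value from the normal approximation (large samples only)
    approx_p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, approx_p


usca_sales = [120.0, 95.5, 210.0, 80.0, 150.0, 175.25]  # made-up values
latam_sales = [60.0, 110.0, 45.5, 90.0, 70.0, 130.0]    # made-up values

t, df, p = welch_t_test(usca_sales, latam_sales)
print(f"t = {t:.3f}, df = {df:.1f}, approximate p = {p:.4f}")
```

For small samples, the p-value should come from the t distribution with `df` degrees of freedom rather than from the normal approximation used here.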
Pearson correlation
The correlation of continuous values in the dataset allows us to see the relationship between sales, quantity sold, shipping costs and profit.
The data tables provide access to a wide range of financial variables, such as revenues, expenses, inventory, sales per square footage (chain stores only) and the number of stores. Most data tables contain detailed information on industry (as low as 5-digit NAICS codes), geography (Canada, provinces and territories) and store type (chains, independents, franchises). The electronic product also contains survey metadata, questionnaires, information on industry codes and definitions, and the list of retail chain store respondents.
The Adventure Works dataset is a comprehensive and widely used sample database provided by Microsoft for educational and testing purposes. It's designed to represent a fictional company, Adventure Works Cycles, which is a global manufacturer of bicycles and related products. The dataset is often used for learning and practicing various data management, analysis, and reporting skills.
1. Company Overview:
- Industry: Bicycle manufacturing
- Operations: Global presence with various departments such as sales, production, and human resources.
2. Data Structure:
- Tables: The dataset includes a variety of tables, typically organized into categories such as:
- Sales: Information about sales orders, products, and customer details.
- Production: Data on manufacturing processes, inventory, and product specifications.
- Human Resources: Employee details, departments, and job roles.
- Purchasing: Vendor information and purchase orders.
3. Sample Tables:
- Sales.SalesOrderHeader: Contains information about sales orders, including order dates, customer IDs, and total amounts.
- Sales.SalesOrderDetail: Details of individual items within each sales order, such as product ID, quantity, and unit price.
- Production.Product: Information about the products being manufactured, including product names, categories, and prices.
- Production.ProductCategory: Data on product categories, such as bicycles and accessories.
- Person.Person: Contains personal information about employees and contacts, including names and addresses.
- Purchasing.Vendor: Information on vendors that supply the company with materials.
4. Usage:
- Training and Education: It's widely used for teaching SQL, data analysis, and database management.
- Testing and Demonstrations: Useful for testing software features and demonstrating data-related functionalities.
5. Tools:
- The dataset is often used with Microsoft SQL Server, but it's also compatible with other relational database systems.
The Adventure Works dataset provides a rich and realistic environment for practicing a range of data-related tasks, from querying and reporting to data modeling and analysis.
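As an illustration of the kind of querying practice described here, the sketch below joins an order-header table to its order-detail table and totals each order. It uses Python's built-in `sqlite3` module with tiny in-memory stand-ins rather than the real database; the simplified table names, the column names, and the sample rows are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for Sales.SalesOrderHeader and Sales.SalesOrderDetail
    CREATE TABLE SalesOrderHeader (
        SalesOrderID INTEGER PRIMARY KEY, OrderDate TEXT, CustomerID INTEGER);
    CREATE TABLE SalesOrderDetail (
        SalesOrderID INTEGER, ProductID INTEGER, OrderQty INTEGER, UnitPrice REAL);
    INSERT INTO SalesOrderHeader VALUES (1, '2014-01-05', 100), (2, '2014-01-06', 101);
    INSERT INTO SalesOrderDetail VALUES (1, 10, 2, 25.0), (1, 11, 1, 50.0), (2, 10, 3, 25.0);
""")

# Total value of each order, computed from its detail lines
order_totals = conn.execute("""
    SELECT h.SalesOrderID, h.CustomerID, SUM(d.OrderQty * d.UnitPrice) AS LineTotal
    FROM SalesOrderHeader AS h
    JOIN SalesOrderDetail AS d ON d.SalesOrderID = h.SalesOrderID
    GROUP BY h.SalesOrderID, h.CustomerID
    ORDER BY h.SalesOrderID
""").fetchall()
conn.close()

for order_id, customer_id, total in order_totals:
    print(order_id, customer_id, total)
```

The same join-and-aggregate pattern should carry over to SQL Server, where the tables would be referenced with their schema prefix.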
https://cdla.io/sharing-1-0/
This dataset was created by Andrés Armando Sánchez Martín
Released under Community Data License Agreement - Sharing - Version 1.0
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1. Introduction
Sales data collection is a crucial aspect of any manufacturing industry, as it provides valuable insights into product performance, customer behaviour, and market trends. By gathering and analysing this data, manufacturers can make informed decisions about product development, pricing, and marketing strategies in Internet of Things (IoT) business environments such as the dairy supply chain.
One of the most important benefits of sales data collection is that it allows manufacturers to identify their most successful products and target their efforts towards those areas. For example, if a manufacturer notices that a particular product is selling well in a certain region, this information can be used to develop new products, optimise the supply chain, or improve existing products to meet the changing needs of customers.
This dataset includes information about 7 of MEVGAL’s products [1]. The published data can help researchers understand the dynamics of the dairy market and its consumption patterns, creating fertile ground for synergies between academia and industry and eventually helping the industry make informed decisions regarding product development, pricing and market strategies in the IoT playground. The dataset could also be used to understand the impact of external factors on the dairy market, such as economic, environmental, and technological factors. It could help in understanding the current state of the dairy industry and identifying potential opportunities for growth and development.
Please cite the following papers when using this dataset:
I. Siniosoglou, K. Xouveroudis, V. Argyriou, T. Lagkas, S. K. Goudos, K. E. Psannis and P. Sarigiannidis, "Evaluating the Effect of Volatile Federated Timeseries on Modern DNNs: Attention over Long/Short Memory," in the 12th International Conference on Circuits and Systems Technologies (MOCAST 2023), April 2023, Accepted
The dataset includes data on the daily sales of a series of dairy product codes offered by MEVGAL. In particular, it includes information gathered by the logistics division and agencies within the industrial infrastructures overseeing the production of each product code. The products covered are a variety of yogurt-based stock, described through their daily sales and logistics. Each file contains the daily logistics of one product for one year, and the files together cover three years, from 2020 to 2022.
3.1 Data Collection
The process of building this dataset involves several steps to ensure that the data is accurate, comprehensive and relevant.
The first step is to determine the specific data needed to support the business objectives of the industry, which in this publication’s case is the daily sales data.
Once the data requirements have been identified, the next step is to implement an effective sales data collection method. In MEVGAL’s case, this is done through direct communication and reports generated each day by representatives and selling points.
It is also important for MEVGAL to ensure that the data collection process is conducted in an ethical and compliant manner, adhering to data privacy laws and regulations. The industry also has a data management plan in place to ensure that the data is securely stored and protected from unauthorised access.
The published dataset consists of 13 features providing information about the date and the number of products that have been sold. Finally, the dataset was anonymised in consideration of the privacy requirements of the data owner (MEVGAL).
| File | Period | Number of Samples (days) |
|---|---|---|
| product 1 2020.xlsx | 01/01/2020–31/12/2020 | 363 |
| product 1 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
| product 1 2022.xlsx | 01/01/2022–31/12/2022 | 365 |
| product 2 2020.xlsx | 01/01/2020–31/12/2020 | 363 |
| product 2 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
| product 2 2022.xlsx | 01/01/2022–31/12/2022 | 365 |
| product 3 2020.xlsx | 01/01/2020–31/12/2020 | 363 |
| product 3 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
| product 3 2022.xlsx | 01/01/2022–31/12/2022 | 365 |
| product 4 2020.xlsx | 01/01/2020–31/12/2020 | 363 |
| product 4 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
| product 4 2022.xlsx | 01/01/2022–31/12/2022 | 364 |
| product 5 2020.xlsx | 01/01/2020–31/12/2020 | 363 |
| product 5 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
| product 5 2022.xlsx | 01/01/2022–31/12/2022 | 365 |
| product 6 2020.xlsx | 01/01/2020–31/12/2020 | 362 |
| product 6 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
| product 6 2022.xlsx | 01/01/2022–31/12/2022 | 365 |
| product 7 2020.xlsx | 01/01/2020–31/12/2020 | 362 |
| product 7 2021.xlsx | 01/01/2021–31/12/2021 | 364 |
| product 7 2022.xlsx | 01/01/2022–31/12/2022 | 365 |
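Several files list fewer samples than there are calendar days in the period (2020, for instance, is a leap year with 366 days). A minimal sketch for locating such gaps is shown below; the function name `missing_days` and the three removed dates are hypothetical and do not reflect which days are actually absent from the MEVGAL files.

```python
from datetime import date, timedelta


def missing_days(year, observed):
    """Return the calendar days of `year` that are absent from `observed`."""
    seen = set(observed)
    day = date(year, 1, 1)
    missing = []
    while day.year == year:
        if day not in seen:
            missing.append(day)
        day += timedelta(days=1)
    return missing


# Hypothetical example: every day of 2020 except three arbitrary dates
observed_2020 = [date(2020, 1, 1) + timedelta(days=i) for i in range(366)]
for gap in (date(2020, 4, 19), date(2020, 4, 20), date(2020, 12, 25)):
    observed_2020.remove(gap)

gaps = missing_days(2020, observed_2020)
print(len(observed_2020), "samples,", len(gaps), "missing days")
```

In practice, the observed dates would be built from the Day, Month and Year columns of each file.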
3.2 Dataset Overview
The following table lists and explains the features included in all of the files.

| Feature | Description | Unit |
|---|---|---|
| Day | Day of the month | - |
| Month | Month | - |
| Year | Year | - |
| daily_unit_sales | Daily sales: the number of products, measured in units, sold on that specific day | units |
| previous_year_daily_unit_sales | Previous year’s sales: the number of products, measured in units, sold on that specific day the previous year | units |
| percentage_difference_daily_unit_sales | The percentage difference between the two above values | % |
| daily_unit_sales_kg | The amount of products, measured in kilograms, sold on that specific day | kg |
| previous_year_daily_unit_sales_kg | Previous year’s sales: the amount of products, measured in kilograms, sold on that specific day the previous year | kg |
| percentage_difference_daily_unit_sales_kg | The percentage difference between the two above values | % |
| daily_unit_returns_kg | The percentage of the products that were shipped to selling points and were returned | % |
| previous_year_daily_unit_returns_kg | The percentage of the products that were shipped to selling points and were returned the previous year | % |
| points_of_distribution | The number of sales representatives through which the product was sold to the market for this year | - |
| previous_year_points_of_distribution | The number of sales representatives through which the product was sold to the market on the same day of the previous year | - |

Table 1 – Dataset Feature Description
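The percentage-difference features relate each day's value to the same day of the previous year. The source does not state the exact formula, so the sketch below assumes the common definition, (current - previous) / previous × 100; the function name `pct_difference` and the sample row are hypothetical.

```python
from datetime import date


def pct_difference(current, previous):
    """Percentage change from `previous` to `current`; None if `previous` is 0."""
    if previous == 0:
        return None  # undefined when there were no sales the previous year
    return (current - previous) * 100 / previous


# Hypothetical row using the feature names from Table 1
row = {
    "Day": 15, "Month": 3, "Year": 2021,
    "daily_unit_sales": 110,
    "previous_year_daily_unit_sales": 100,
}

# Combine the three date features into a single date value
row_date = date(row["Year"], row["Month"], row["Day"])
change = pct_difference(row["daily_unit_sales"], row["previous_year_daily_unit_sales"])
print(row_date.isoformat(), change)
```

Comparing a recomputed value against the published percentage column is one way to check which definition the dataset actually uses.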
4.1 Dataset Structure
The provided dataset has the following structure:
Where:
| Name | Type | Property |
|---|---|---|
| Readme.docx | Report | A file that contains the documentation of the dataset. |
| product X | Folder | A folder containing the data of a product X. |
| product X YYYY.xlsx | Data file | An Excel file containing the sales data of product X for year YYYY. |

Table 2 - Dataset File Description
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 957406 (TERMINET).
References
[1] MEVGAL is a Greek dairy production company
This is a dataset downloaded off excelbianalytics.com created off of random VBA logic. I recently performed an extensive exploratory data analysis on it and I included new columns to it, namely: Unit margin, Order year, Order month, Order weekday and Order_Ship_Days which I think can help with analysis on the data. I shared it because I thought it was a great dataset to practice analytical processes on for newbies like myself.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Sample data for exercises in Further Adventures in Data Cleaning.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Standard error reference tables for the Retail Sales Index in Great Britain.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sample Excel dataset for the "Retail Sales Performance Dashboard" use-case template. Contains real-world sample data for AI-powered data visualization with ChartGen.
This dataset contains a list of sales and movement data by item and department, appended monthly.
Update Frequency: Monthly
https://creativecommons.org/publicdomain/zero/1.0/
Using the Europe bikes dataset, extract insights into sales in each country and in each state within those countries using Excel.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sample Excel dataset for the "Monthly Sales Trend by Category" use-case template. Contains real-world sample data for AI-powered data visualization with ChartGen.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset illustrates sales data from a company and its three product lines - boats, cars, and planes. It contains information such as historical and sales data. This is fictional data, created and used for data exploration and profit margin analysis.
The link for the Excel project to download can be found at this GitHub Repository. It includes the raw data, statistical analysis, Pivot Tables, and a dashboard with Pivot Charts for interaction.
Below is a screenshot of the charts for ease.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10624788%2Fc945ef4223f1b0b6c2dfe7ade798e34e%2FWeekly%20Revenue%20by%20Product%20Line.png?generation=1722385095875351&alt=media
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10624788%2Fd3be2fd1f741b0899e79b9c50c7e29a0%2FRevenue%20and%20Profit%20by%20Quarter.png?generation=1722385108310009&alt=media
This is a dataset of finances, which is also available in Power BI. Use this dataset to practice Power BI.
This dataset was created by Shiva Vashishtha
https://creativecommons.org/publicdomain/zero/1.0/
A data analyst collects and stores data on sales numbers, market research, logistics, linguistics, or other behaviors. They bring technical expertise to ensure the quality and accuracy of that data, then process, design, and present it in ways that help people, businesses, and organizations make better decisions. The aim here is to contribute to business success by using data analysis techniques such as sales forecasting.
Download data CSV files: https://drive.google.com/drive/folders/1HDkNHNslI3rgCv9LZzGtxag8JvYzss-b
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains global sales and profit-related information along with customer, product, and regional details. It is suitable for business analytics, sales performance tracking, and profitability insights.
📊 Included Files:
- Excel file (.xlsx) → Contains both the dataset (Sheet 1) and an Excel Dashboard (Sheet 2).
- Power BI Dashboard (.pbix) → Built using the same dataset (shared via GitHub/Drive link below).
- Screenshots → Sample visuals from the dashboards for quick preview.
📌 Columns in the Dataset:
- Customer ID, Customer Name
- Quantity Ordered
- MSRP, Cost Price, Selling Price
- Sales, Profit per Unit, Total Profit/Loss
- Status (Completed/Cancelled/Returned)
- Order Date, Month, Year
- Product, Product Code
- City, Country
- Deal Size (Small/Medium/Large)
📈 Possible Use Cases:
- Sales and profit trend analysis (monthly/yearly)
- Customer profitability & segmentation
- Regional performance (city & country-level)
- Product-wise profitability and sales performance
- Deal size impact on revenue and profit
- Dashboard creation in Excel and Power BI
👉 Note: This dataset has been used to build both Excel and Power BI Dashboards.
- Excel Dashboard is included inside the .xlsx file.
- Power BI Dashboard (.pbix) is also provided in PDF format.
"This dataset can be used for Business Analytics, Customer Analysis, and building Dashboards in Power BI & Excel."
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sample Excel dataset for the "Top 5 Bubble Tea Monthly Sales Table" use-case template. Contains real-world sample data for AI-powered data visualization with ChartGen.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sample Excel dataset for the "Retail Multi-Region Product Sales Analysis Report (Jul-Oct 2025)" use-case template. Contains real-world sample data for AI-powered data visualization with ChartGen.
This is a small dataset of bike sales from a bike shop. It includes columns such as the customer's income, marital status, education, etc. Afterwards, a dashboard was created to filter by a number of different categories.