100+ datasets found

Dairy Supply Chain Sales Dataset
data.niaid.nih.gov
zenodo.org
Updated Jul 12, 2024
Cite
Athanasios Liatifis (2024). Dairy Supply Chain Sales Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7853252
Explore at:
Dataset updated
Jul 12, 2024
Dataset provided by
Panagiotis Sarigiannidis
Christos Chaschatzis
Thomas Lagkas
Athanasios Liatifis
Ilias Siniosoglou
Anna Triantafyllou
Dimitris Iatropoulos
Dimitrios Pliatsios
Konstantinos Georgakidis
Vasileios Argyriou
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
1. Introduction

Sales data collection is a crucial aspect of any manufacturing industry as it provides valuable insights about the performance of products, customer behaviour, and market trends. By gathering and analysing this data, manufacturers can make informed decisions about product development, pricing, and marketing strategies in Internet of Things (IoT) business environments like the dairy supply chain.

One of the most important benefits of collecting sales data is that it allows manufacturers to identify their most successful products and target their efforts towards those areas. For example, if a manufacturer notices that a particular product is selling well in a certain region, this information can be used to develop new products, improve existing ones, or optimise the supply chain to meet the changing needs of customers.

This dataset includes information about 7 of MEVGAL’s products [1]. The published data will help researchers understand the dynamics of the dairy market and its consumption patterns, creating fertile ground for synergies between academia and industry and eventually helping the industry make informed decisions regarding product development, pricing and market strategies in the IoT playground. The dataset could also be used to study the impact of external factors on the dairy market, such as economic, environmental, and technological factors, and to understand the current state of the dairy industry and identify potential opportunities for growth and development.

2. Citation

Please cite the following papers when using this dataset:

I. Siniosoglou, K. Xouveroudis, V. Argyriou, T. Lagkas, S. K. Goudos, K. E. Psannis and P. Sarigiannidis, "Evaluating the Effect of Volatile Federated Timeseries on Modern DNNs: Attention over Long/Short Memory," in the 12th International Conference on Circuits and Systems Technologies (MOCAST 2023), April 2023, Accepted

3. Dataset Modalities

The dataset includes data regarding the daily sales of a series of dairy product codes offered by MEVGAL. In particular, the dataset includes information gathered by the logistics division and agencies within the industrial infrastructures overseeing the production of each product code. The data represent the daily sales and logistics of a variety of yogurt-based stock. Each file contains the daily logistics for one product; together the files span three years, from 2020 to 2022.

3.1 Data Collection

The process of building this dataset involves several steps to ensure that the data is accurate, comprehensive and relevant.

The first step is to determine the specific data needed to support the business objectives of the industry, which in this publication’s case is the daily sales data.

Once the data requirements have been identified, the next step is to implement an effective sales data collection method. In MEVGAL’s case this is conducted through direct communication and reports generated each day by representatives and selling points.

It is also important for MEVGAL to ensure that the data collection process is conducted in an ethical and compliant manner, adhering to data privacy laws and regulations. The industry also has a data management plan in place to ensure that the data is securely stored and protected from unauthorised access.

The published dataset consists of 13 features providing information about the date and the number of products that have been sold. Finally, the dataset was anonymised in consideration of the privacy requirements of the data owner (MEVGAL).

File | Period | Number of Samples (days)
product 1 2020.xlsx | 01/01/2020–31/12/2020 | 363
product 1 2021.xlsx | 01/01/2021–31/12/2021 | 364
product 1 2022.xlsx | 01/01/2022–31/12/2022 | 365
product 2 2020.xlsx | 01/01/2020–31/12/2020 | 363
product 2 2021.xlsx | 01/01/2021–31/12/2021 | 364
product 2 2022.xlsx | 01/01/2022–31/12/2022 | 365
product 3 2020.xlsx | 01/01/2020–31/12/2020 | 363
product 3 2021.xlsx | 01/01/2021–31/12/2021 | 364
product 3 2022.xlsx | 01/01/2022–31/12/2022 | 365
product 4 2020.xlsx | 01/01/2020–31/12/2020 | 363
product 4 2021.xlsx | 01/01/2021–31/12/2021 | 364
product 4 2022.xlsx | 01/01/2022–31/12/2022 | 364
product 5 2020.xlsx | 01/01/2020–31/12/2020 | 363
product 5 2021.xlsx | 01/01/2021–31/12/2021 | 364
product 5 2022.xlsx | 01/01/2022–31/12/2022 | 365
product 6 2020.xlsx | 01/01/2020–31/12/2020 | 362
product 6 2021.xlsx | 01/01/2021–31/12/2021 | 364
product 6 2022.xlsx | 01/01/2022–31/12/2022 | 365
product 7 2020.xlsx | 01/01/2020–31/12/2020 | 362
product 7 2021.xlsx | 01/01/2021–31/12/2021 | 364
product 7 2022.xlsx | 01/01/2022–31/12/2022 | 365

3.2 Dataset Overview

The following table lists and explains the features that appear in every file.

Feature | Description | Unit
Day | Day of the month | -
Month | Month | -
Year | Year | -
daily_unit_sales | Daily sales: the number of products, measured in units, sold on that specific day | units
previous_year_daily_unit_sales | Previous year’s sales: the number of products, measured in units, sold on that specific day of the previous year | units
percentage_difference_daily_unit_sales | The percentage difference between the two values above | %
daily_unit_sales_kg | The amount of product, measured in kilograms, sold on that specific day | kg
previous_year_daily_unit_sales_kg | Previous year’s sales: the amount of product, measured in kilograms, sold on that specific day of the previous year | kg
percentage_difference_daily_unit_sales_kg | The percentage difference between the two values above | %
daily_unit_returns_kg | The percentage of the products shipped to selling points that were returned | %
previous_year_daily_unit_returns_kg | The percentage of the products shipped to selling points that were returned the previous year | %
points_of_distribution | The number of sales representatives through which the product was sold to the market this year | -
previous_year_points_of_distribution | The number of sales representatives through which the product was sold to the market on the same day of the previous year | -

Table 1 – Dataset Feature Description
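
The feature table does not state the formula behind the two percentage-difference features. The following is a minimal sketch, assuming the conventional year-over-year definition (current value minus previous-year value, divided by the previous-year value, times 100). The function name and the handling of a zero previous-year value are illustrative assumptions, not part of the dataset documentation.

```python
from typing import Optional


def percentage_difference(current: float, previous: float) -> Optional[float]:
    """Year-over-year change in percent (assumed definition).

    Returns None when the previous-year value is zero, because the
    ratio is undefined in that case.
    """
    if previous == 0:
        return None
    return (current - previous) / previous * 100.0


# Hypothetical values standing in for daily_unit_sales and
# previous_year_daily_unit_sales on a single day.
print(percentage_difference(120, 100))  # 20.0
print(percentage_difference(80, 100))   # -20.0
```

Comparing this value against the published percentage_difference_daily_unit_sales column is one way to check whether the assumed definition matches the one used by the data owner.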

4. Structure and Format

4.1 Dataset Structure

The provided dataset has the following structure:

Where:

Name | Type | Property
Readme.docx | Report | A file that contains the documentation of the dataset.
product X | Folder | A folder containing the data of product X.
product X YYYY.xlsx | Data file | An Excel file containing the sales data of product X for year YYYY.

Table 2 – Dataset File Description
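
As a rough illustration of the naming scheme in the file description, the sketch below enumerates the data files one would expect for 7 products over 2020 to 2022. It assumes each yearly file sits inside its product folder; the root directory name "dataset" is a placeholder, and the helper name is hypothetical.

```python
from pathlib import Path
from typing import List

PRODUCTS = range(1, 8)       # products 1..7
YEARS = (2020, 2021, 2022)   # three years of daily records


def expected_files(root: Path) -> List[Path]:
    """List the data files implied by the naming scheme under a root folder.

    Assumes the layout <root>/product X/product X YYYY.xlsx.
    """
    return [
        root / f"product {p}" / f"product {p} {y}.xlsx"
        for p in PRODUCTS
        for y in YEARS
    ]


files = expected_files(Path("dataset"))  # "dataset" is a placeholder root
print(len(files))      # 21
print(files[0].name)   # product 1 2020.xlsx
```

A list like this can be compared against the extracted archive to spot missing or misnamed files before loading anything.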

Acknowledgement

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 957406 (TERMINET).

References

[1] MEVGAL is a Greek dairy production company
Online Sales Dataset - Popular Marketplace Data
kaggle.com
Updated May 25, 2024
Cite
ShreyanshVerma27 (2024). Online Sales Dataset - Popular Marketplace Data [Dataset]. https://www.kaggle.com/datasets/shreyanshverma27/online-sales-dataset-popular-marketplace-data
Explore at:
Dataset updated
May 25, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
ShreyanshVerma27
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset provides a comprehensive overview of online sales transactions across different product categories. Each row represents a single transaction with detailed information such as the order ID, date, category, product name, quantity sold, unit price, total price, region, and payment method.

Columns:

Order ID: Unique identifier for each sales order.

Date: Date of the sales transaction.

Category: Broad category of the product sold (e.g., Electronics, Home Appliances, Clothing, Books, Beauty Products, Sports).

Product Name: Specific name or model of the product sold.

Quantity: Number of units of the product sold in the transaction.

Unit Price: Price of one unit of the product.

Total Price: Total revenue generated from the sales transaction (Quantity * Unit Price).

Region: Geographic region where the transaction occurred (e.g., North America, Europe, Asia).

Payment Method: Method used for payment (e.g., Credit Card, PayPal, Debit Card).

Insights:

1. Analyze sales trends over time to identify seasonal patterns or growth opportunities.

2. Explore the popularity of different product categories across regions.

3. Investigate the impact of payment methods on sales volume or revenue.

4. Identify top-selling products within each category to optimize inventory and marketing strategies.

5. Evaluate the performance of specific products or categories in different regions to tailor marketing campaigns accordingly.
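
A minimal sketch of two checks suggested by the column list and insights above: verifying that Total Price equals Quantity * Unit Price, and summing revenue per category or region. The rows are made-up examples that only reuse the documented column names, and the helper names and tolerance are assumptions.

```python
from collections import defaultdict

# Hypothetical rows using the documented column names.
rows = [
    {"Category": "Electronics", "Region": "Europe", "Quantity": 2,
     "Unit Price": 100.0, "Total Price": 200.0},
    {"Category": "Books", "Region": "Asia", "Quantity": 3,
     "Unit Price": 10.0, "Total Price": 30.0},
    {"Category": "Electronics", "Region": "Asia", "Quantity": 1,
     "Unit Price": 150.0, "Total Price": 150.0},
]


def inconsistent_rows(rows, tol=0.01):
    """Return rows where Total Price differs from Quantity * Unit Price."""
    return [
        r for r in rows
        if abs(r["Quantity"] * r["Unit Price"] - r["Total Price"]) > tol
    ]


def revenue_by(rows, key):
    """Sum Total Price per distinct value of the given column."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[key]] += r["Total Price"]
    return dict(totals)


print(inconsistent_rows(rows))       # []
print(revenue_by(rows, "Category"))  # {'Electronics': 350.0, 'Books': 30.0}
```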
Real Estate Sales 2001-2023 GL
catalog.data.gov
data.ct.gov
Updated Aug 23, 2025
Cite
data.ct.gov (2025). Real Estate Sales 2001-2023 GL [Dataset]. https://catalog.data.gov/dataset/real-estate-sales-2001-2018
Explore at:
Dataset updated
Aug 23, 2025
Dataset provided by
data.ct.gov
Description
The Office of Policy and Management maintains a listing of all real estate sales with a sales price of $2,000 or greater that occur between October 1 and September 30 of each year. For each sale record, the file includes: town, property address, date of sale, property type (residential, apartment, commercial, industrial or vacant land), sales price, and property assessment. Data are collected in accordance with Connecticut General Statutes, section 10-261a and 10-261b: https://www.cga.ct.gov/current/pub/chap_172.htm#sec_10-261a and https://www.cga.ct.gov/current/pub/chap_172.htm#sec_10-261b. Annual real estate sales are reported by grand list year (October 1 through September 30 each year). For instance, sales from 2018 GL are from 10/01/2018 through 9/30/2019. Some municipalities may not report data for certain years because when a municipality implements a revaluation, they are not required to submit sales data for the twelve months following implementation.
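
The grand list year convention described above (October 1 through September 30) can be sketched as a small date mapping. The function name is illustrative and not part of the dataset.

```python
from datetime import date


def grand_list_year(sale_date: date) -> int:
    """Map a sale date to its grand list (GL) year.

    A GL year runs from October 1 through September 30 of the following
    calendar year, so sales from January to September belong to the
    previous calendar year's grand list.
    """
    return sale_date.year if sale_date.month >= 10 else sale_date.year - 1


print(grand_list_year(date(2018, 10, 1)))  # 2018
print(grand_list_year(date(2019, 9, 30)))  # 2018
print(grand_list_year(date(2019, 10, 1)))  # 2019
```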
Superstore Sales Analysis
kaggle.com
Updated Oct 21, 2023
Cite
Ali Reda Elblgihy (2023). Superstore Sales Analysis [Dataset]. https://www.kaggle.com/datasets/aliredaelblgihy/superstore-sales-analysis
Explore at:
Dataset updated
Oct 21, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ali Reda Elblgihy
Description
Analyzing sales data is essential for any business looking to make informed decisions and optimize its operations. In this project, we will utilize Microsoft Excel and Power Query to conduct a comprehensive analysis of Superstore sales data. Our primary objectives will be to establish meaningful connections between various data sheets, ensure data quality, and calculate critical metrics such as the Cost of Goods Sold (COGS) and discount values. Below are the key steps and elements of this analysis:

1- Data Import and Transformation:

Gather and import relevant sales data from various sources into Excel.

Utilize Power Query to clean, transform, and structure the data for analysis.

Merge and link different data sheets to create a cohesive dataset, ensuring that all data fields are connected logically.

2- Data Quality Assessment:

Perform data quality checks to identify and address issues like missing values, duplicates, outliers, and data inconsistencies.

Standardize data formats and ensure that all data is in a consistent, usable state.

3- Calculating COGS:

Determine the Cost of Goods Sold (COGS) for each product sold by considering factors like purchase price, shipping costs, and any additional expenses.

Apply appropriate formulas and calculations to determine COGS accurately.

4- Discount Analysis:

Analyze the discount values offered on products to understand their impact on sales and profitability.

Calculate the average discount percentage, identify trends, and visualize the data using charts or graphs.

5- Sales Metrics:

Calculate and analyze various sales metrics, such as total revenue, profit margins, and sales growth.

Utilize Excel functions to compute these metrics and create visuals for better insights.

6- Visualization:

Create visualizations, such as charts, graphs, and pivot tables, to present the data in an understandable and actionable format.

Visual representations can help identify trends, outliers, and patterns in the data.

7- Report Generation:

Compile the findings and insights into a well-structured report or dashboard, making it easy for stakeholders to understand and make informed decisions.

Throughout this analysis, the goal is to provide a clear and comprehensive understanding of the Superstore's sales performance. By using Excel and Power Query, we can efficiently manage and analyze the data, ensuring that the insights gained contribute to the store's growth and success.
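
The project itself works in Excel and Power Query and does not spell out its formulas. Purely as an illustration of the metrics named above, the sketch below assumes a simple per-unit cost model for COGS, profit margin as profit divided by revenue, and discounts stored as fractions. All function names and sample numbers are hypothetical.

```python
def cogs(quantity, purchase_price, shipping_cost=0.0, other_costs=0.0):
    """Cost of goods sold for one order line (assumed per-unit cost model)."""
    return quantity * (purchase_price + shipping_cost + other_costs)


def profit_margin(revenue, cost):
    """Profit as a fraction of revenue; None when revenue is zero."""
    if revenue == 0:
        return None
    return (revenue - cost) / revenue


def average_discount_pct(discounts):
    """Mean of discount fractions, expressed in percent."""
    return 100.0 * sum(discounts) / len(discounts)


line_cogs = cogs(quantity=4, purchase_price=10.0, shipping_cost=2.5)
print(line_cogs)                               # 50.0
print(profit_margin(80.0, line_cogs))          # 0.375
print(average_discount_pct([0.0, 0.25, 0.5]))  # 25.0
```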
Candy sales 2020
figshare.com
txt
Updated Oct 22, 2020
Cite
cj lortie (2020). Candy sales 2020 [Dataset]. http://doi.org/10.6084/m9.figshare.13125551.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.13125551.v1
Dataset updated
Oct 22, 2020
Dataset provided by
Figshare (http://figshare.com/)
Authors
cj lortie
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data scraped from National Retail Federation webpage for 2020.
Sales Tax Collections by State
catalog.data.gov
data.bts.gov
+2more
Updated Aug 9, 2024
Cite
Bureau of Transportation Statistics (2024). Sales Tax Collections by State [Dataset]. https://catalog.data.gov/dataset/sales-tax-collections-by-state
Explore at:
Dataset updated
Aug 9, 2024
Dataset provided by
Bureau of Transportation Statistics
Description
Monthly state sales tax collections is an experimental dataset published by the U.S. Census Bureau. It provides data for collections from sales taxes including motor fuel taxes. Data reported for a specific month generally represent sales taxes collected on sales made during the prior month. Tax collections primarily rely on unaudited data collected from existing state reports or state data sources available from and posted on the Internet. Secondarily, states report the data via the Quarterly Survey of State and Local Tax Revenue. Data are updated monthly, but due to differing reporting cycles data for some states may lag.
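
Because reported collections generally reflect sales made during the prior month, analyses that align collections with economic activity may want to shift the reporting month back by one. A small sketch of that shift follows; the one-month lag is the general rule stated above rather than a guarantee for every state, and the function name is illustrative.

```python
def sales_month(report_year: int, report_month: int):
    """Return the (year, month) in which the underlying sales were made,
    assuming the general one-month lag between sales and reported collections."""
    if report_month == 1:
        return report_year - 1, 12
    return report_year, report_month - 1


print(sales_month(2024, 8))  # (2024, 7)
print(sales_month(2024, 1))  # (2023, 12)
```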
Europe Bike Store Sales
kaggle.com
Updated Mar 21, 2023
Cite
PrepInsta Technologies (2023). Europe Bike Store Sales [Dataset]. https://www.kaggle.com/datasets/prepinstaprime/europe-bike-store-sales
Explore at:
Dataset updated
Mar 21, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
PrepInsta Technologies
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Europe
Description
Using the Europe bikes dataset, extract insights into sales in each country, and in each state within those countries, using Excel.
European plant-based foods sales data 2017-2020 (Nielsen Market Track)
zenodo.org
Updated Apr 4, 2022
Cite
www.smartproteinproject.eu (2022). European plant-based foods sales data 2017-2020 (Nielsen Market Track) [Dataset]. http://doi.org/10.5281/zenodo.6411841
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.6411841
Dataset updated
Apr 4, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
www.smartproteinproject.eu
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset consists of Excel (.xlsx) files with data on sales of plant-based food products between 2017 and 2020 in a number of European countries (Austria, Belgium, Denmark, France, Germany, Italy, the Netherlands, Poland, Romania, Spain and the UK).

The data are clearly labelled within each file. The key variables (common across datasets) are Value in Euros, Volume in KG/LIT and Volume in Selling Units for a number of meat and dairy substitute food products specific to the retail region.

The data were originally collected by Nielsen Market Track. They were analysed on the Smart Protein project in 2021 and used to publish an extensive market data report and to host a public webinar, both entitled Plant-based foods in Europe: how big is the market?
Market Analysis for X-FILES
nsc.onl
Updated Aug 18, 2025
Cite
(2025). Market Analysis for X-FILES [Dataset]. https://nsc.onl/cards/tag/13702/x-files
Explore at:
Dataset updated
Aug 18, 2025
Variables measured
Countries, Price Range, Median Price, Average Price, Sold Listings, Total Listings, Active Listings, Unsold Listings, Number of Sellers, Sell-Through Rate
Description
Comprehensive market data and analytics for X-FILES including pricing distribution, seller metrics, and market trends.
Realistic Sales Revenue Dataset
kaggle.com
Updated Jul 16, 2025
Cite
Shoukat khan (2025). Realistic Sales Revenue Dataset [Dataset]. https://www.kaggle.com/datasets/drisrarahmad/realistic-sales-revenue-dataset
Explore at:
Dataset updated
Jul 16, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Shoukat khan
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
📄 Description: This synthetic dataset is designed for practising regression tasks, particularly for predicting Sales Revenue based on product, market, and economic factors. It contains both categorical (nominal) and numerical features, simulating real-world sales data across various product categories and regions.

📌 Dataset Summary:

Rows: 2000

Columns: 12 features + 1 target (SalesRevenue)

🏷️ Columns Description:

Column Name | Type | Description
ProductCategory | Categorical | Type of product: Electronics, Clothing, Furniture, Toys
Region | Categorical | Sales region: North, South, East, West
CustomerSegment | Categorical | Customer income group: Low, Middle, High
IsPromotionApplied | Categorical | Whether promotion was applied: Yes/No
ProductionCost | Numerical | Cost to produce the product
MarketingSpend | Numerical | Money spent on marketing
SeasonalDemandIndex | Numerical | Factor representing seasonal demand
CompetitorPrice | Numerical | Average price of competing products
CustomerRating | Numerical | Average customer rating (out of 5)
EconomicIndex | Numerical | Indicator of overall economic conditions
StoreCount | Numerical | Number of stores selling the product
OnlinePresence | Numerical | Online presence score of the product
SalesRevenue | Numerical | Target Variable: Revenue from product sales
‘Big Mart Sales’ analyzed by Analyst-2
analyst-2.ai
Updated Nov 12, 2021
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Big Mart Sales’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-big-mart-sales-132a/55ae27c6/?iid=037-342&v=presentation
Explore at:
Dataset updated
Nov 12, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Big Mart Sales’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/akashdeepkuila/big-mart-sales on 12 November 2021.

--- Dataset description provided by original source is as follows ---

Context

The data scientists at Big Mart have collected 2013 sales data for 1559 products across 10 stores in different cities. Also, certain attributes of each product and store have been defined. The aim is to build a predictive model and predict the sales of each product at a particular outlet.

Using this model, Big Mart will try to understand the properties of products and outlets which play a key role in increasing sales.

Please note that the data may have missing values as some stores might not report all the data due to technical glitches. Hence, it will be required to treat them accordingly.

Content

The dataset provides the product details and the outlet information of the products purchased, together with their sales value, split into a train set (8523) and a test set (5681).

Train file: CSV containing the item outlet information with sales value

Test file: CSV containing item outlet combinations for which sales need to be forecasted

Variable Description

ProductID : unique product ID

Weight : weight of products

FatContent : specifies whether the product is low on fat or not

Visibility : percentage of total display area of all products in a store allocated to the particular product

ProductType : the category to which the product belongs

MRP : Maximum Retail Price (listed price) of the products

OutletID : unique store ID

EstablishmentYear : year of establishment of the outlets

OutletSize : the size of the store in terms of ground area covered

LocationType : the type of city in which the store is located

OutletType : specifies whether the outlet is just a grocery store or some sort of supermarket

OutletSales : (target variable) sales of the product in the particular store
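
The source notes that some values may be missing and must be treated before modelling, without prescribing a method. One common option is mean imputation, sketched below on made-up records that reuse the ProductID and Weight names from the variable list. The choice of mean imputation and the helper name are assumptions, not the dataset author's method.

```python
from statistics import mean

# Hypothetical records; None marks a value a store failed to report.
records = [
    {"ProductID": "P1", "Weight": 9.0},
    {"ProductID": "P2", "Weight": None},
    {"ProductID": "P3", "Weight": 12.0},
]


def impute_mean(records, field):
    """Return copies of the records with missing `field` values replaced
    by the mean of the observed values for that field."""
    observed = [r[field] for r in records if r[field] is not None]
    fill = mean(observed)
    return [
        {**r, field: fill if r[field] is None else r[field]}
        for r in records
    ]


print(impute_mean(records, "Weight")[1])  # {'ProductID': 'P2', 'Weight': 10.5}
```

For categorical fields such as OutletSize, the most frequent value is a typical substitute for the mean.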

Inspiration

Sales of a given product at a retail store can depend both on store attributes as well as product attributes. The dataset is ideal to explore and build a data science model to predict the future sales.

--- Original source retains full ownership of the source dataset ---
Evaluating FAIR Models for Rossmann Store Sales Prediction: Insights and...
test.researchdata.tuwien.ac.at
bin, csv, json +1
Updated Apr 28, 2025
Cite
Dilara Çakmak (2025). Evaluating FAIR Models for Rossmann Store Sales Prediction: Insights and Performance Analysis [Dataset]. http://doi.org/10.70124/f5t2d-xt904
Explore at:
Available download formats: csv, text/markdown, json, bin
Unique identifier
https://doi.org/10.70124/f5t2d-xt904
Dataset updated
Apr 28, 2025
Dataset provided by
TU Wien
Authors
Dilara Çakmak
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Apr 2025
Description
Context and Methodology

Research Domain:
The dataset is part of a project focused on retail sales forecasting. Specifically, it is designed to predict daily sales for Rossmann, a chain of over 3,000 drug stores operating across seven European countries. The project falls under the broader domain of time series analysis and machine learning applications for business optimization. The goal is to apply machine learning techniques to forecast future sales based on historical data, which includes factors like promotions, competition, holidays, and seasonal trends.

Purpose:
The primary purpose of this dataset is to help Rossmann store managers predict daily sales for up to six weeks in advance. By making accurate sales predictions, Rossmann can improve inventory management, staffing decisions, and promotional strategies. This dataset serves as a training set for machine learning models aimed at reducing forecasting errors and supporting decision-making processes across the company’s large network of stores.

How the Dataset Was Created:
The dataset was compiled from several sources, including historical sales data from Rossmann stores, promotional calendars, holiday schedules, and external factors such as competition. The data is split into multiple features, such as the store's location, promotion details, whether the store was open or closed, and weather information. The dataset is publicly available on platforms like Kaggle and was initially created for the Kaggle Rossmann Store Sales competition. The data is made accessible via an API for further analysis and modeling, and it is structured to help machine learning models predict future sales based on various input variables.

Technical Details

Dataset Structure:

The dataset consists of three main files, each with its specific role:

Train:
This file contains the historical sales data, which is used to train machine learning models. It includes daily sales information for each store, as well as various features that could influence the sales (e.g., promotions, holidays, store type, etc.).

https://handle.test.datacite.org/10.82556/yb6j-jw41
PID: b1c59499-9c6e-42c2-af8f-840181e809db

Test2:
The test dataset mirrors the structure of train.csv but does not include the actual sales values (i.e., the target variable). This file is used for making predictions using the trained machine learning models. It is used to evaluate the accuracy of predictions when the true sales data is unknown.

https://handle.test.datacite.org/10.82556/jerg-4b84
PID: 7cbb845c-21dd-4b60-b990-afa8754a0dd9

Store:
This file provides metadata about each store, including information such as the store’s location, type, and assortment level. This data is essential for understanding the context in which the sales data is gathered.

https://handle.test.datacite.org/10.82556/nqeg-gy34
PID: 9627ec46-4ee6-4969-b14a-bda555fe34db

Data Fields Description:

Id: A unique identifier for each (Store, Date) combination within the test set.

Store: A unique identifier for each store.

Sales: The daily turnover (target variable) for each store on a specific day (this is what you are predicting).

Customers: The number of customers visiting the store on a given day.

Open: An indicator of whether the store was open (1 = open, 0 = closed).

StateHoliday: Indicates if the day is a state holiday, with values like:

'a' = public holiday,

'b' = Easter holiday,

'c' = Christmas,

'0' = no holiday.

SchoolHoliday: Indicates whether the store is affected by school closures (1 = yes, 0 = no).

StoreType: Differentiates between four types of stores: 'a', 'b', 'c', 'd'.

Assortment: Describes the level of product assortment in the store:

'a' = basic,

'b' = extra,

'c' = extended.

CompetitionDistance: Distance (in meters) to the nearest competitor store.

CompetitionOpenSince[Month/Year]: The month and year when the nearest competitor store opened.

Promo: Indicates whether the store is running a promotion on a particular day (1 = yes, 0 = no).

Promo2: Indicates whether the store is participating in Promo2, a continuing promotion for some stores (1 = participating, 0 = not participating).

Promo2Since[Year/Week]: The year and calendar week when the store started participating in Promo2.

PromoInterval: Describes the months when Promo2 is active, e.g., "Feb,May,Aug,Nov" means the promotion starts in February, May, August, and November.
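
As an example of turning the Promo2 and PromoInterval fields into a usable feature, the sketch below splits the interval string and checks whether a new Promo2 round starts in a given month. It compares month abbreviations exactly as they appear in the string; the function names are hypothetical.

```python
def promo2_start_months(promo_interval: str):
    """Split a PromoInterval string such as "Feb,May,Aug,Nov" into a list."""
    return [m.strip() for m in promo_interval.split(",") if m.strip()]


def promo2_starts_in(promo2: int, promo_interval: str, month_abbr: str) -> bool:
    """True if the store participates in Promo2 and a new round starts
    in the month with the given abbreviation."""
    return promo2 == 1 and month_abbr in promo2_start_months(promo_interval)


print(promo2_start_months("Feb,May,Aug,Nov"))         # ['Feb', 'May', 'Aug', 'Nov']
print(promo2_starts_in(1, "Feb,May,Aug,Nov", "May"))  # True
print(promo2_starts_in(0, "", "May"))                 # False
```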

Software Requirements

To work with this dataset, you will need to have specific software installed, including:

DBRepo Authorization: This is required to access the datasets via the DBRepo API. You may need to authenticate with an API key or login credentials to retrieve the datasets.

Python Libraries: Key libraries for working with the dataset include:

pandas for data manipulation,

numpy for numerical operations,

matplotlib and seaborn for data visualization,

scikit-learn for machine learning algorithms.

Additional Resources

Several additional resources are available for working with the dataset:

Presentation:
A presentation summarizing the exploratory data analysis (EDA), feature engineering process, and key insights from the analysis is provided. This presentation also includes visualizations that help in understanding the dataset’s trends and relationships.

Jupyter Notebook:
A Jupyter notebook, titled Retail_Sales_Prediction_Capstone_Project.ipynb, is provided, which details the entire machine learning pipeline, from data loading and cleaning to model training and evaluation.

Model Evaluation Results:
The project includes a detailed evaluation of various machine learning models, including their performance metrics like training and testing scores, Mean Absolute Percentage Error (MAPE), and Root Mean Squared Error (RMSE). This allows for a comparison of model effectiveness in forecasting sales.
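
For reference, the two error metrics named above can be computed as in the following sketch. Skipping days with zero actual sales in MAPE (for example, days when a store is closed) is an assumption made here to avoid division by zero; the project's own handling is not stated.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error over paired actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error in percent.

    Pairs with a zero actual value are skipped (assumption) because the
    percentage error is undefined for them.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


# Hypothetical daily sales and forecasts; the third day has no sales.
actual = [100.0, 200.0, 0.0]
predicted = [75.0, 250.0, 0.0]
print(rmse(actual, predicted))
print(mape(actual, predicted))  # 25.0
```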

Trained Models (.pkl files):
The models trained during the project are saved as .pkl files. These files contain the trained machine learning models (e.g., Random Forest, Linear Regression, etc.) that can be loaded and used to make predictions without retraining the models from scratch.

sample_submission.csv:
This file is a sample submission file that demonstrates the format of predictions expected when using the trained model. The sample_submission.csv contains predictions made on the test dataset using the trained Random Forest model. It provides an example of how the output should be structured for submission.

These resources provide a comprehensive guide to implementing and analyzing the sales forecasting model, helping you understand the data, methods, and results in greater detail.
Connecticut Sales and Use Tax Data
catalog.data.gov
data.ct.gov
+1more
Updated Jun 14, 2025
Cite
data.ct.gov (2025). Connecticut Sales and Use Tax Data [Dataset]. https://catalog.data.gov/dataset/connecticut-sales-and-use-tax-data
Explore at:
Dataset updated
Jun 14, 2025
Dataset provided by
data.ct.gov
Area covered
Connecticut
Description
The Sales and Use Tax is a state-imposed tax on various transactions, including the sale, rental, or lease of goods, sale of taxable services, and operation of lodging establishments within Connecticut. Individuals and businesses engaging in these activities are required to register with the Department of Revenue Services (DRS) and obtain a Sales and Use Tax Permit. Tax rates vary depending on the type of transaction, with special rates applying to certain sales, such as meals, luxury items, and other specific goods. Businesses are required to electronically file Form OS-114 to report all sales activity, regardless of whether taxes are due.
All Permitted Sales Tax Locations and Local Sales Tax Responsibility
data.texas.gov
s.cnmilf.com
+1more
application/rdfxml +5
Updated Sep 1, 2025
+ more versions
Cite
Texas Comptroller of Public Accounts (2025). All Permitted Sales Tax Locations and Local Sales Tax Responsibility [Dataset]. https://data.texas.gov/dataset/All-Permitted-Sales-Tax-Locations-and-Local-Sales-/3kx8-uryv
Explore at:
Available download formats: tsv, csv, application/rdfxml, application/rssxml, json, xml
Dataset updated
Sep 1, 2025
Dataset authored and provided by
Texas Comptroller of Public Accounts
License
U.S. Government Works (https://www.usa.gov/government-works)
License information was derived automatically
Description
This file lists all sales tax outlets in Texas that have been active during the last four years, together with their local tax responsibility. Inactive outlets include an Out-of-Business date.
Electronic Sales
kaggle.com
Updated Dec 19, 2023
Cite
Anshul Pachauri (2023). Electronic Sales [Dataset]. https://www.kaggle.com/datasets/anshulpachauri/electronic-sales
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Dec 19, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Anshul Pachauri
Description
The provided Python code analyzes sales data for a business. It merges monthly sales data, cleans and augments the dataset, and performs various analytical tasks. Here's a breakdown of the code:

Data Preparation and Merging:

The code begins by importing necessary libraries and filtering out warnings. It merges sales data from 12 months into a single file named "all_data.csv."

Data Cleaning:

Rows with NaN values are dropped, and any entries starting with 'Or' in the 'Order Date' column are removed. Columns like 'Quantity Ordered' and 'Price Each' are converted to numeric types for further analysis.

Data Augmentation:

Additional columns such as 'Month,' 'Sales,' and 'City' are added to the dataset. The 'City' column is derived from the 'Purchase Address' column.

Analysis:

Several analyses are conducted, answering questions such as:
- The best month for sales and total earnings.
- The city with the highest number of sales.
- The ideal time for advertisements based on the number of orders per hour.
- Products that are often sold together.
- The best-selling products and their correlation with price.

Visualization:

Bar charts and line plots are used for visualizing the analysis results, making it easier to interpret trends and patterns. Matplotlib is employed for creating visualizations.

Summary:

The code concludes with a visualization that combines the quantity ordered and average price for each product, shedding light on product performance. The code is structured to offer insights into sales patterns, customer behavior, and product performance, providing information for strategic decision-making in the business.
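The dataset's own code is not reproduced on this page, so the following is only a minimal sketch of the cleaning and augmentation steps described above, written with the Python standard library. The sample rows are made up, and the date layout (MM/DD/YY HH:MM), the address layout ("street, city, state zip"), and the function names `clean_and_augment` and `total_sales_by` are assumptions for illustration.

```python
from collections import defaultdict


def clean_and_augment(rows):
    """Drop unusable rows, convert types, and add Month, Sales and City."""
    cleaned = []
    for row in rows:
        # Drop rows with any missing value (the analogue of dropping NaN rows).
        if any(v is None or str(v).strip() == "" for v in row.values()):
            continue
        # Drop rows whose 'Order Date' starts with 'Or'
        # (presumably header rows repeated by the merge).
        if row["Order Date"].startswith("Or"):
            continue
        quantity = int(row["Quantity Ordered"])
        price = float(row["Price Each"])
        # Assumed date layout: MM/DD/YY HH:MM
        month = int(row["Order Date"].split("/")[0])
        # Assumed address layout: "street, city, state zip"
        city = row["Purchase Address"].split(",")[1].strip()
        cleaned.append({
            **row,
            "Quantity Ordered": quantity,
            "Price Each": price,
            "Month": month,
            "Sales": quantity * price,
            "City": city,
        })
    return cleaned


def total_sales_by(rows, key):
    """Sum the 'Sales' column grouped by the given column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["Sales"]
    return dict(totals)


# Made-up rows: two valid orders, one repeated header row, one empty row.
raw_rows = [
    {"Order Date": "04/19/19 08:46", "Quantity Ordered": "2",
     "Price Each": "10.0", "Purchase Address": "1 Example St, Dallas, TX 75001"},
    {"Order Date": "Order Date", "Quantity Ordered": "Quantity Ordered",
     "Price Each": "Price Each", "Purchase Address": "Purchase Address"},
    {"Order Date": "", "Quantity Ordered": "",
     "Price Each": "", "Purchase Address": ""},
    {"Order Date": "12/30/19 09:27", "Quantity Ordered": "1",
     "Price Each": "2.5", "Purchase Address": "2 Sample Ave, Boston, MA 02215"},
]

rows = clean_and_augment(raw_rows)
monthly = total_sales_by(rows, "Month")
best_month = max(monthly, key=monthly.get)
```

Grouping by "City" instead of "Month" with the same helper would answer the question about the city with the highest sales.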
Prospect Data | 148MM+ US Contacts for B2B Sales Prospecting, Sales...
datarade.ai
.json, .csv, .xls
Updated Jul 15, 2023
+ more versions
Cite
Salutary Data (2023). Prospect Data | 148MM+ US Contacts for B2B Sales Prospecting, Sales Intelligence, and Sales Outreach [Dataset]. https://datarade.ai/data-products/salutary-data-prospect-data-62m-us-contacts-for-b2b-sale-salutary-data
Explore at:
Available download formats: .json, .csv, .xls
Dataset updated
Jul 15, 2023
Dataset authored and provided by
Salutary Data
Area covered
United States of America
Description
Salutary Data is a boutique B2B contact and company data provider that's committed to delivering high-quality data for sales intelligence, lead generation, marketing, recruiting / HR, identity resolution, and ML / AI. Our database currently consists of 148MM+ highly curated B2B contacts (US only), along with 4MM+ companies, and is updated regularly to ensure we have the most up-to-date information.

We can enrich your in-house data (CRM Enrichment, Lead Enrichment, etc.) and provide you with a custom dataset (such as a lead list) tailored to your target audience specifications and data use case. We also support large-scale data licensing to software providers and agencies that intend to redistribute our data to their customers and end users.

What makes Salutary unique?
- We offer our clients a one-stop aggregation of best-of-breed data sources. Our supplier network consists of numerous established, high-quality suppliers that are rigorously vetted.
- We leverage third-party verification vendors to ensure phone numbers and emails are accurate and connect to the right person. Additionally, we deploy automated and manual verification techniques to ensure we have the latest job information for contacts.
- We're reasonably priced and easy to work with.

Products:
- API Suite
- Web UI
- Full and Custom Data Feeds

Services:
- Data Enrichment: We assess the fill rate gaps and profile your customer file for the purpose of appending fields, updating information, and/or rendering net new “look alike” prospects for your campaigns.
- ABM Match & Append: Send us your domain or other company-related files, and we’ll match your Account Based Marketing targets and provide you with B2B contacts to campaign. Optionally include your suppression file to avoid any redundant records.
- Verification (“Cleaning/Hygiene”) Services: Address the 2% per month aging issue on contact records! We will identify duplicate records and contacts no longer at the company, remove email hard bounces, and update or replace titles and phone numbers. This leverages our existing internal and external processes and systems.
Forecast: Office Files, Storage Units and Tables Sales in the US 2024 - 2028...
reportlinker.com
Updated Apr 11, 2024
Cite
ReportLinker (2024). Forecast: Office Files, Storage Units and Tables Sales in the US 2024 - 2028 [Dataset]. https://www.reportlinker.com/dataset/257349bbb197eea280d0a9e4e03cad29fdf45a88
Explore at:
Dataset updated
Apr 11, 2024
Dataset provided by
Reportlinker
Authors
ReportLinker
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Area covered
United States
Description
Forecast: Office Files, Storage Units and Tables Sales in the US 2024 - 2028 Discover more data with ReportLinker!
UK House Price Index: data downloads September 2021
gov.uk
Updated Nov 17, 2021
+ more versions
Cite
HM Land Registry (2021). UK House Price Index: data downloads September 2021 [Dataset]. https://www.gov.uk/government/statistical-data-sets/uk-house-price-index-data-downloads-september-2021
Explore at:
Dataset updated
Nov 17, 2021
Dataset provided by
GOV.UK (http://gov.uk/)
Authors
HM Land Registry
Area covered
United Kingdom
Description
The UK House Price Index is a National Statistic.

Create your report

Download the full UK House Price Index data below, or use our tool to create your own bespoke reports: https://landregistry.data.gov.uk/app/ukhpi?utm_medium=GOV.UK&utm_source=datadownload&utm_campaign=tool&utm_term=9.30_17_11_21

Download the data

Datasets are available as CSV files. Find out about republishing and making use of the data.

Google Chrome is blocking downloads of our UK HPI data files (Chrome 88 onwards). Please use another internet browser while we resolve this issue. We apologise for any inconvenience caused.

Full file

This file includes a derived back series for the new UK HPI. Under the UK HPI, data is available from 1995 for England and Wales, 2004 for Scotland and 2005 for Northern Ireland. A longer back series has been derived by using the historic path of the Office for National Statistics HPI to construct a series back to 1968.

Download the full UK HPI background file:

UK HPI full file (CSV, 58.6MB): http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/UK-HPI-full-file-2021-09.csv?utm_medium=GOV.UK&utm_source=datadownload&utm_campaign=full_fil&utm_term=9.30_17_11_21

Individual attributes files

If you are interested in a specific attribute, we have separated them into these CSV files:

Average price (CSV, 9.2MB): http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/Average-prices-2021-09.csv?utm_medium=GOV.UK&utm_source=datadownload&utm_campaign=average_price&utm_term=9.30_17_11_21

Average price by property type (CSV, 27.8MB): http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/Average-prices-Property-Type-2021-09.csv?utm_medium=GOV.UK&utm_source=datadownload&utm_campaign=average_price_property_price&utm_term=9.30_17_11_21

Sales (CSV, 4.7MB): http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/Sales-2021-09.csv?utm_medium=GOV.UK&utm_source=datadownload&utm_campaign=sales&utm_term=9.30_17_11_21

Cash mortgage sales (CSV, 6.2MB): http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/Cash-mortgage-sales-2021-09.csv?utm_medium=GOV.UK&utm_source=datadownload&utm_campaign=cash_mortgage-sales&utm_term=9.30_17_11_21

First time buyer and former owner occupier (CSV, 5.9MB): http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/First-Time-Buyer-Former-Owner-Occupied-2021-09.csv?utm_medium=GOV.UK&utm_source=datadownload&utm_campaign=FTNFOO&utm_term=9.30_17_11_21

New build and existing resold property (CSV, 16.9MB): http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/New-and-Old-2021-09.csv?utm_medium=GOV.UK&utm_source=datadownload&utm_campaign=new_build&utm_term=9.30_17_11_21

Index (CSV, 5.9MB): http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/Indices-2021-09.csv?utm_medium=GOV.UK&utm_source=datadownload&utm_campaign=index&utm_term=9.30_17_11_21

Index seasonally adjusted (CSV, 194KB): http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/Indices-seasonally-adjusted-2021-09.csv?utm_medium=GOV.UK&utm_source=datadownload&utm_campaign=index_season_adjusted&utm_term=9.30_17_11_21

Average price seasonally a: http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/Average-price-seasonally-adjusted-2021-09.csv?utm_medium=GOV.UK&utm_source=datadownload&utm_campaign=average-price_season_adjusted&utm_term=9.30_17_11_21
File Folders sales volume on TikTok Shop
ecommerce.aftership.com
Updated Apr 24, 2025
+ more versions
Cite
AfterShip (2025). File Folders sales volume on TikTok Shop [Dataset]. https://ecommerce.aftership.com/product-trends/file-folders
Explore at:
Dataset updated
Apr 24, 2025
Dataset authored and provided by
AfterShip (https://www.aftership.com/)
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Description
Outsmart competitors: Analyze File Folders sales cycles on TikTok Shop. Compare your growth trajectory against category averages (presented as logarithmic values) to identify underutilized promotion windows and stock positioning gaps.
Mixed Beverage Sales Receipts
data.texas.gov
datasets.ai
+2more
application/rdfxml +5
Updated Aug 13, 2025
+ more versions
Cite
Texas Comptroller of Public Accounts (2025). Mixed Beverage Sales Receipts [Dataset]. https://data.texas.gov/dataset/Mixed-Beverage-Sales-Receipts/g5bj-yb6k
Explore at:
Available download formats: application/rssxml, csv, xml, application/rdfxml, tsv, json
Dataset updated
Aug 13, 2025
Dataset authored and provided by
Texas Comptroller of Public Accounts
License
U.S. Government Works (https://www.usa.gov/government-works)
License information was derived automatically
Description
This file contains a list of taxpayers required to file mixed beverage sales tax reports under Tax Code Chapter 183, Subchapter B-1. The list provides taxpayer names, amounts reported, and other public information.

See https://comptroller.texas.gov/about/policies/privacy.php for more information on our agency’s privacy and security policies.

Cite

Athanasios Liatifis (2024). Dairy Supply Chain Sales Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7853252

Dairy Supply Chain Sales Dataset

Explore at:

Dataset updated

Jul 12, 2024

Dataset provided by

Panagiotis Sarigiannidis
Christos Chaschatzis
Thomas Lagkas
Athanasios Liatifis
Ilias Siniosoglou
Anna Triantafyllou
Dimitris Iatropoulos
Dimitrios Pliatsios
Konstantinos Georgakidis
Vasileios Argyriou

License

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically

Description

1. Introduction

Sales data collection is a crucial aspect of any manufacturing industry as it provides valuable insights about the performance of products, customer behaviour, and market trends. By gathering and analysing this data, manufacturers can make informed decisions about product development, pricing, and marketing strategies in Internet of Things (IoT) business environments like the dairy supply chain.

One of the most important benefits of the sales data collection process is that it allows manufacturers to identify their most successful products and target their efforts towards those areas. For example, if a manufacturer notices that a particular product is selling well in a certain region, this information can be used to develop new products, optimise the supply chain, or improve existing products to meet the changing needs of customers.

This dataset includes information about 7 of MEVGAL’s products [1]. The published data will help researchers understand the dynamics of the dairy market and its consumption patterns, creating fertile ground for synergies between academia and industry, and will eventually help the industry make informed decisions regarding product development, pricing, and market strategies in the IoT playground. The dataset could also be used to study the impact of external factors, such as economic, environmental, and technological factors, on the dairy market. It could help in understanding the current state of the dairy industry and identifying potential opportunities for growth and development.

2. Citation

Please cite the following paper when using this dataset:

I. Siniosoglou, K. Xouveroudis, V. Argyriou, T. Lagkas, S. K. Goudos, K. E. Psannis and P. Sarigiannidis, "Evaluating the Effect of Volatile Federated Timeseries on Modern DNNs: Attention over Long/Short Memory," in the 12th International Conference on Circuits and Systems Technologies (MOCAST 2023), April 2023, Accepted

3. Dataset Modalities

The dataset contains the daily sales of a series of dairy product codes offered by MEVGAL. In particular, it includes information gathered by the logistics division and agencies within the industrial infrastructures overseeing the production of each product code. The products included in this dataset represent the daily sales and logistics of a variety of yogurt-based stock. Each file contains the daily logistics for one product for one year; together the files cover three years, from 2020 to 2022.

3.1 Data Collection

The process of building this dataset involves several steps to ensure that the data is accurate, comprehensive and relevant.

The first step is to determine the specific data needed to support the business objectives of the industry; in this publication’s case, the daily sales data.

Once the data requirements have been identified, the next step is to implement an effective sales data collection method. In MEVGAL’s case, this is done through direct communication and reports generated each day by representatives and selling points.

It is also important for MEVGAL to ensure that the data collection process is conducted in an ethical and compliant manner, adhering to data privacy laws and regulations. The industry also has a data management plan in place to ensure that the data is securely stored and protected from unauthorised access.

The published dataset consists of 13 features providing information about the date and the number of products that have been sold. Finally, the dataset was anonymised in consideration of the privacy requirements of the data owner (MEVGAL).

File | Period | Number of Samples (days)
product 1 2020.xlsx | 01/01/2020–31/12/2020 | 363
product 1 2021.xlsx | 01/01/2021–31/12/2021 | 364
product 1 2022.xlsx | 01/01/2022–31/12/2022 | 365
product 2 2020.xlsx | 01/01/2020–31/12/2020 | 363
product 2 2021.xlsx | 01/01/2021–31/12/2021 | 364
product 2 2022.xlsx | 01/01/2022–31/12/2022 | 365
product 3 2020.xlsx | 01/01/2020–31/12/2020 | 363
product 3 2021.xlsx | 01/01/2021–31/12/2021 | 364
product 3 2022.xlsx | 01/01/2022–31/12/2022 | 365
product 4 2020.xlsx | 01/01/2020–31/12/2020 | 363
product 4 2021.xlsx | 01/01/2021–31/12/2021 | 364
product 4 2022.xlsx | 01/01/2022–31/12/2022 | 364
product 5 2020.xlsx | 01/01/2020–31/12/2020 | 363
product 5 2021.xlsx | 01/01/2021–31/12/2021 | 364
product 5 2022.xlsx | 01/01/2022–31/12/2022 | 365
product 6 2020.xlsx | 01/01/2020–31/12/2020 | 362
product 6 2021.xlsx | 01/01/2021–31/12/2021 | 364
product 6 2022.xlsx | 01/01/2022–31/12/2022 | 365
product 7 2020.xlsx | 01/01/2020–31/12/2020 | 362
product 7 2021.xlsx | 01/01/2021–31/12/2021 | 364
product 7 2022.xlsx | 01/01/2022–31/12/2022 | 365
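Several of the sample counts above are smaller than the number of calendar days in the stated period, so some days have no record. The sketch below quantifies the gap for product 1, using the counts from the table. The source does not say which days are missing or why, and the helper name `calendar_days` is an illustrative assumption.

```python
from datetime import date

# Sample counts for product 1, copied from the table above.
samples = {2020: 363, 2021: 364, 2022: 365}


def calendar_days(year):
    """Number of days from 1 January to 31 December of the given year, inclusive."""
    return (date(year, 12, 31) - date(year, 1, 1)).days + 1


# Days in each period that have no corresponding sample.
missing = {year: calendar_days(year) - count for year, count in samples.items()}
print(missing)  # {2020: 3, 2021: 1, 2022: 0}
```

Because 2020 is a leap year with 366 days, its files are missing more days than the raw counts suggest. Any time-series model that assumes evenly spaced daily observations would need to handle these gaps, for example by reindexing to a full calendar first.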

3.2 Dataset Overview

The following table enumerates and explains the features included across all of the included files.

Feature | Description | Unit
Day | Day of the month |
Month | |
Year | |
daily_unit_sales | Daily sales: the number of products, measured in units, sold on that day | units
previous_year_daily_unit_sales | Previous year’s sales: the number of products, measured in units, sold on the same day of the previous year | units
percentage_difference_daily_unit_sales | The percentage difference between the two values above |
daily_unit_sales_kg | The amount of product, measured in kilograms, sold on that day |
previous_year_daily_unit_sales_kg | Previous year’s sales: the amount of product, measured in kilograms, sold on the same day of the previous year |
percentage_difference_daily_unit_sales_kg | The percentage difference between the two values above |
daily_unit_returns_kg | The percentage of the products that were shipped to selling points and were returned |
previous_year_daily_unit_returns_kg | The percentage of the products that were shipped to selling points and were returned the previous year |
points_of_distribution | The number of sales representatives through which the product was sold to the market for this year |
previous_year_points_of_distribution | The number of sales representatives through which the product was sold to the market on the same day of the previous year |

Table 1 – Dataset Feature Description
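Table 1 does not state the formula behind the percentage_difference_* features. The sketch below assumes the common definition, the change relative to the previous year's value. This is an assumption and should be checked against the actual columns before relying on it. The function name `percentage_difference` is hypothetical.

```python
def percentage_difference(current, previous):
    """Percentage change of `current` relative to `previous`.

    Assumed definition: (current - previous) / previous * 100.
    Returns None when `previous` is zero, since the ratio is undefined.
    """
    if previous == 0:
        return None
    return (current - previous) * 100.0 / previous


# Example with made-up values: 110 units today versus 100 units a year earlier.
change = percentage_difference(110, 100)
```

Recomputing this value from daily_unit_sales and previous_year_daily_unit_sales and comparing it with the published column is a quick way to confirm or reject the assumed definition.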

4. Structure and Format

4.1 Dataset Structure

The provided dataset has the following structure:

Where:

Name | Type | Property
Readme.docx | Report | A File that contains the documentation of the Dataset.
product X | Folder | A folder containing the data of a product X.
product X YYYY.xlsx | Data file | An excel file containing the sales data of product X for year YYYY.

Table 2 - Dataset File Description
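Based on the naming pattern in Table 2 and the seven products and three years listed earlier, the expected data files can be enumerated as in the sketch below. The root folder name `dataset` and the function name `expected_files` are hypothetical, and the exact folder names should be verified against the downloaded archive.

```python
from pathlib import PurePosixPath


def expected_files(root="dataset", products=range(1, 8), years=(2020, 2021, 2022)):
    """Build the paths 'product X/product X YYYY.xlsx' for every product and year."""
    return [
        PurePosixPath(root) / f"product {p}" / f"product {p} {y}.xlsx"
        for p in products
        for y in years
    ]


files = expected_files()
print(len(files))  # 21
print(files[0])    # dataset/product 1/product 1 2020.xlsx
```

Comparing this list with the files actually present on disk gives a simple completeness check before loading the data.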

Acknowledgement

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 957406 (TERMINET).

References

[1] MEVGAL is a Greek dairy production company
