AbhayBhan/SalesData dataset hosted on Hugging Face and contributed by the HF Datasets community
https://creativecommons.org/publicdomain/zero/1.0/
This synthetic dataset simulates daily-level FMCG sales transactions for three consecutive years (2022, 2023, 2024), designed for practicing time series forecasting, demand planning, and machine learning in realistic business conditions.
Inspired by real-world scenarios (e.g. Nestlé, Unilever, P&G), it includes:
- Product hierarchy: SKU → Brand → Segment → Category
- Sales channels: Retail / Discount / E-commerce
- Regions: Central, North, and South (Poland)
- Daily sales quantities, prices, promotions, stock, delivery lag (lead time)
- Pack types: Single / Multipack / Carton
- Seasonality and product introductions:
  - New SKUs are introduced in 2024 only
  - Prices gradually increase over the years
Possible Use Cases
- Weekly sales forecasting
- Promotion effect analysis
- Seasonality and trend modeling
- New product forecasting (cold start)
- Feature engineering for ML models
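As a rough illustration of the promotion effect analysis use case, the sketch below compares average daily quantity on promotion days against non-promotion days. The `(quantity, on_promo)` row layout and the function name `promo_uplift` are assumptions made for this example, not the dataset's actual schema, and the comparison is deliberately naive: it ignores seasonality, price changes, and stock-outs.

```python
from statistics import mean

def promo_uplift(rows):
    """Relative uplift of mean quantity on promo days vs. non-promo days.

    rows: iterable of (quantity, on_promo) pairs -- a hypothetical layout.
    Returns 0.5 for a 50% uplift, for example.
    """
    promo = [qty for qty, on_promo in rows if on_promo]
    base = [qty for qty, on_promo in rows if not on_promo]
    return mean(promo) / mean(base) - 1.0

# Toy data: three ordinary days and two promotion days for one SKU.
rows = [(100, False), (120, False), (80, False), (150, True), (150, True)]
uplift = promo_uplift(rows)
print(f"Promo uplift: {uplift:.0%}")
```

A more careful analysis would compare like with like, for example the same SKU, channel, and week of year, before attributing the difference to the promotion.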
Created by: Beata Faron
Data Scientist working on demand forecasting, NLP, and business-oriented ML.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides comprehensive insights into US regional sales data across different sales channels, including In-Store, Online, Distributor, and Wholesale. With a total of 17,992 rows and 15 columns, this dataset encompasses a wide range of information, from order and product details to sales performance metrics. It offers a comprehensive overview of sales transactions and customer interactions, enabling deep analysis of sales patterns, trends, and potential opportunities.
Columns in the dataset:
- OrderNumber: A unique identifier for each order.
- Sales Channel: The channel through which the sale was made (In-Store, Online, Distributor, Wholesale).
- WarehouseCode: Code representing the warehouse involved in the order.
- ProcuredDate: Date when the products were procured.
- OrderDate: Date when the order was placed.
- ShipDate: Date when the order was shipped.
- DeliveryDate: Date when the order was delivered.
- SalesTeamID: Identifier for the sales team involved.
- CustomerID: Identifier for the customer.
- StoreID: Identifier for the store.
- ProductID: Identifier for the product.
- Order Quantity: Quantity of products ordered.
- Discount Applied: Applied discount for the order.
- Unit Cost: Cost of a single unit of the product.
- Unit Price: Price at which the product was sold.
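To show how these columns might combine into sales performance metrics, here is a minimal sketch that derives net revenue, margin, and delivery lead time for a single order. It assumes that "Discount Applied" is stored as a fraction of the unit price (0.25 meaning 25%) and that dates are already parsed; neither is confirmed by the description above, and the helper name `order_metrics` is illustrative.

```python
from datetime import date

def order_metrics(order):
    """Derive revenue, margin, and delivery lead time for one order row."""
    qty = order["Order Quantity"]
    # Assumption: the discount is a fraction of the unit price, not an amount.
    net_unit_price = order["Unit Price"] * (1 - order["Discount Applied"])
    revenue = qty * net_unit_price
    cost = qty * order["Unit Cost"]
    # Lead time here means days from order placement to delivery.
    lead_days = (order["DeliveryDate"] - order["OrderDate"]).days
    return {"revenue": revenue, "margin": revenue - cost, "lead_days": lead_days}

# Toy order with made-up values.
example_order = {
    "Order Quantity": 4,
    "Discount Applied": 0.25,
    "Unit Cost": 50.0,
    "Unit Price": 100.0,
    "OrderDate": date(2020, 1, 1),
    "DeliveryDate": date(2020, 1, 11),
}
metrics = order_metrics(example_order)
print(metrics)
```

Aggregating these per-order values by Sales Channel or WarehouseCode would give the channel and region comparisons the description mentions.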
This dataset serves as a valuable resource for analysing sales trends, identifying popular products, assessing the performance of different sales channels, and optimising pricing strategies for different regions.
Visualization Ideas:
Data Modelling and Machine Learning Ideas (Price Prediction):
- Linear Regression: Build a linear regression model to predict the unit price based on features such as order quantity, discount applied, and unit cost.
- Random Forest Regression: Use a random forest regression model to predict the price, taking into account multiple features and their interactions.
- Neural Networks: Train a neural network to predict unit price using deep learning techniques, which can capture complex relationships in the data.
- Feature Importance Analysis: Identify the most influential features affecting price prediction using techniques like feature importance scores from tree-based models.
- Time Series Forecasting: Develop a time series forecasting model to predict future prices based on historical sales data.

These visualisation and modelling ideas can help you gain valuable insights from the sales data and create predictive models to optimise pricing strategies and improve sales performance.
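The linear regression idea can be sketched with ordinary least squares on a single feature. This simplified example uses only unit cost as the predictor, whereas the idea above suggests several features; the numbers are toy values, not rows from the dataset, and `fit_line` is a hypothetical helper written for this illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data constructed so that price = 2 * cost + 5 exactly.
unit_cost = [10.0, 20.0, 30.0, 40.0]
unit_price = [25.0, 45.0, 65.0, 85.0]

slope, intercept = fit_line(unit_cost, unit_price)
predicted_price = slope * 50.0 + intercept  # prediction for a unit cost of 50
print(slope, intercept, predicted_price)
```

For the multi-feature case, a library regressor fitted on order quantity, discount, and unit cost together would be the more natural choice.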
An aggregate collection of Commercial Platforms sales across all platforms
tonyassi/clothing-sales-data dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1. Introduction
Sales data collection is a crucial aspect of any manufacturing industry as it provides valuable insights about the performance of products, customer behaviour, and market trends. By gathering and analysing this data, manufacturers can make informed decisions about product development, pricing, and marketing strategies in Internet of Things (IoT) business environments like the dairy supply chain.
One of the most important benefits of the sales data collection process is that it allows manufacturers to identify their most successful products and target their efforts towards those areas. For example, if a manufacturer notices that a particular product is selling well in a certain region, this information can be used to develop new products, optimise the supply chain, or improve existing products to meet the changing needs of customers.
This dataset includes information about 7 of MEVGAL’s products [1]. The published data will help researchers understand the dynamics of the dairy market and its consumption patterns, creating fertile ground for synergies between academia and industry, and will eventually help the industry make informed decisions regarding product development, pricing and market strategies in the IoT playground. The dataset could also be used to understand the impact of external factors on the dairy market, such as economic, environmental, and technological factors, to assess the current state of the dairy industry, and to identify potential opportunities for growth and development.
2. Citation
Please cite the following papers when using this dataset:
3. Dataset Modalities
The dataset includes data regarding the daily sales of a series of dairy product codes offered by MEVGAL. In particular, the dataset includes information gathered by the logistics division and agencies within the industrial infrastructures overseeing the production of each product code. The products included in this dataset represent the daily sales and logistics of a variety of yogurt-based stock. Each file includes the logistics for one product on a daily basis for three years, from 2020 to 2022.
3.1 Data Collection
The process of building this dataset involves several steps to ensure that the data is accurate, comprehensive and relevant.
The first step is to determine the specific data that is needed to support the business objectives of the industry, i.e., in this publication’s case the daily sales data.
Once the data requirements have been identified, the next step is to implement an effective sales data collection method. In MEVGAL’s case this is conducted through direct communication and reports generated each day by representatives & selling points.
It is also important for MEVGAL to ensure that the data collection process is conducted in an ethical and compliant manner, adhering to data privacy laws and regulations. The industry also has a data management plan in place to ensure that the data is securely stored and protected from unauthorised access.
The published dataset consists of 13 features providing information about the date and the number of products that have been sold. Finally, the dataset was anonymised in consideration of the privacy requirements of the data owner (MEVGAL).
File |
Period |
Number of Samples (days) |
product 1 2020.xlsx |
01/01/2020–31/12/2020 |
363 |
product 1 2021.xlsx |
01/01/2021–31/12/2021 |
364 |
product 1 2022.xlsx |
01/01/2022–31/12/2022 |
365 |
product 2 2020.xlsx |
01/01/2020–31/12/2020 |
363 |
product 2 2021.xlsx |
01/01/2021–31/12/2021 |
364 |
product 2 2022.xlsx |
01/01/2022–31/12/2022 |
365 |
product 3 2020.xlsx |
01/01/2020–31/12/2020 |
363 |
product 3 2021.xlsx |
01/01/2021–31/12/2021 |
364 |
product 3 2022.xlsx |
01/01/2022–31/12/2022 |
365 |
product 4 2020.xlsx |
01/01/2020–31/12/2020 |
363 |
product 4 2021.xlsx |
01/01/2021–31/12/2021 |
364 |
product 4 2022.xlsx |
01/01/2022–31/12/2022 |
364 |
product 5 2020.xlsx |
01/01/2020–31/12/2020 |
363 |
product 5 2021.xlsx |
01/01/2021–31/12/2021 |
364 |
product 5 2022.xlsx |
01/01/2022–31/12/2022 |
365 |
product 6 2020.xlsx |
01/01/2020–31/12/2020 |
362 |
product 6 2021.xlsx |
01/01/2021–31/12/2021 |
364 |
product 6 2022.xlsx |
01/01/2022–31/12/2022 |
365 |
product 7 2020.xlsx |
01/01/2020–31/12/2020 |
362 |
product 7 2021.xlsx |
01/01/2021–31/12/2021 |
364 |
product 7 2022.xlsx |
01/01/2022–31/12/2022 |
365 |
3.2 Dataset Overview
The following table enumerates and explains the features included across all of the included files.
Feature | Description | Unit
Day | Day of the month | -
Month | Month | -
Year | Year | -
daily_unit_sales | Daily sales - the amount of products, measured in units, that during that specific day were sold | units
previous_year_daily_unit_sales | Previous Year’s sales - the amount of products, measured in units, that during that specific day were sold the previous year | units
percentage_difference_daily_unit_sales | The percentage difference between the two above values | %
daily_unit_sales_kg | The amount of products, measured in kilograms, that during that specific day were sold | kg
previous_year_daily_unit_sales_kg | Previous Year’s sales - the amount of products, measured in kilograms, that during that specific day were sold, the previous year | kg
percentage_difference_daily_unit_sales_kg | The percentage difference between the two above values | kg
daily_unit_returns_kg | The percentage of the products that were shipped to selling points and were returned | %
previous_year_daily_unit_returns_kg | The percentage of the products that were shipped to
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Sample Sales Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/kyanyoga/sample-sales-data on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Sample Sales Data, Order Info, Sales, Customer, Shipping, etc., Used for Segmentation, Customer Analytics, Clustering and More. Inspired for retail analytics. This was originally used for Pentaho DI Kettle, But I found the set could be useful for Sales Simulation training.
Originally Written by María Carina Roldán, Pentaho Community Member, BI consultant (Assert Solutions), Argentina. This work is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License. Modified by Gus Segura June 2014.
--- Original source retains full ownership of the source dataset ---
This table contains property sales information including sale date, price, and amounts for properties within Fairfax County. There is a one to many relationship to the parcel data. Refer to this document for descriptions of the data in the table.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Update Frequency: Yearly
Access to Residential, Condominium, Commercial, Apartment properties and vacant land sales history data.
To download XML and JSON files, click the CSV option below and click the down arrow next to the Download button in the upper right on its page.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Harry Kelleher
Released under MIT
https://www.datainsightsmarket.com/privacy-policy
The global Sales Data Fusion market is experiencing robust growth, driven by the increasing need for businesses to leverage disparate data sources for improved sales performance and strategic decision-making. The market's expansion is fueled by the rising adoption of cloud-based solutions, advancements in artificial intelligence (AI) and machine learning (ML) for data integration and analysis, and the growing demand for real-time sales insights. Key players like Thomson Reuters, AGT International, and LexisNexis are leading the charge, offering comprehensive platforms that consolidate data from CRM systems, marketing automation tools, and other relevant sources. This consolidation provides a holistic view of customer interactions, sales performance, and market trends, enabling businesses to optimize sales strategies, improve forecasting accuracy, and ultimately enhance revenue generation. The market is segmented by deployment (cloud, on-premise), by industry (BFSI, retail, healthcare, manufacturing), and by component (software, services). While data security and privacy concerns represent a potential restraint, the overall market outlook remains positive, indicating continued growth driven by technological advancements and the ever-increasing value placed on data-driven decision-making within organizations. The forecast period of 2025-2033 is expected to witness significant expansion, building upon a strong historical period (2019-2024). Assuming a conservative CAGR of 15% (a reasonable estimate given the growth drivers mentioned), we can expect substantial market expansion. This growth will be particularly evident in regions with high technological adoption rates and robust digital infrastructures. The competitive landscape is characterized by both established players and emerging technology companies, creating a dynamic and innovative ecosystem. 
Future growth will likely be shaped by advancements in big data analytics, improved data integration capabilities, and the increasing availability of sophisticated sales intelligence tools. The market will continue to attract investments as businesses recognize the critical role of effective sales data fusion in achieving a competitive advantage.
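To make the assumed 15% CAGR concrete, the short sketch below compounds an indexed market size over the forecast period. The text gives no absolute market size, so the base is normalised to 1.0; treating 2025 as the base year with eight compounding periods to 2033 is also an assumption of this example.

```python
def project(value, cagr, years):
    """Compound a starting value at a constant annual growth rate."""
    return value * (1 + cagr) ** years

# Indexed market size: 1.0 in the assumed base year 2025.
base_year, end_year = 2025, 2033
growth_multiple = project(1.0, 0.15, end_year - base_year)
print(f"Growth multiple {base_year}-{end_year}: {growth_multiple:.3f}x")
```

Under these assumptions the market would roughly triple over the forecast period.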
Autos include all passenger cars, including station wagons. The U.S. Bureau of Economic Analysis releases auto and truck sales data, which are used in the preparation of estimates of personal consumption expenditures.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The FCA has published the latest edition of its Product Sales Data (PSD) statistics. The FCA publishes the aggregated PSD received from firms operating in the mortgages, retail investments 1 January 2018 to 31 December 2023.
The FCA uses this data to assist it in regulating firms and to spot trends in the products sold in the UK market. It publishes this data so that consumers and market participants can see what firms are selling and understand the trends.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Total Vehicle Sales in the United States increased to 16.41 Million in July from 15.32 Million in June of 2025. This dataset provides the latest reported value for - United States Total Vehicle Sales - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Envestnet®| Yodlee®'s Ecommerce Sales Data (Aggregate/Row) Panels consist of de-identified, near-real time (T+1) USA credit/debit/ACH transaction level data – offering a wide view of the consumer activity ecosystem. The underlying data is sourced from end users leveraging the aggregation portion of the Envestnet®| Yodlee®'s financial technology platform.
Envestnet | Yodlee Consumer Panels (Aggregate/Row) include data relating to millions of transactions, including ticket size and merchant location. The dataset includes de-identified credit/debit card and bank transactions (such as a payroll deposit, account transfer, or mortgage payment). Our coverage offers insights into areas such as consumer, TMT, energy, REITs, internet, utilities, ecommerce, MBS, CMBS, equities, credit, commodities, FX, and corporate activity. We apply rigorous data science practices to deliver key KPIs daily that are focused, relevant, and ready to put into production.
We offer free trials. Our team is available to provide support for loading, validation, sample scripts, or other services you may need to generate insights from our data.
Investors, corporate researchers, and corporates can use our data to answer some key business questions such as:
- How much are consumers spending with specific merchants/brands and how is that changing over time?
- Is the share of consumer spend at a specific merchant increasing or decreasing?
- How are consumers reacting to new products or services launched by merchants?
- For loyal customers, how is the share of spend changing over time?
- What is the company’s market share in a region for similar customers?
- Is the company’s loyal user base increasing or decreasing?
- Is the lifetime customer value increasing or decreasing?

Additional Use Cases:
- Use spending data to analyze sales/revenue broadly (sector-wide) or granularly (company-specific). Historically, our tracked consumer spend has correlated above 85% with company-reported data from thousands of firms. Users can sort and filter by many metrics and KPIs, such as sales and transaction growth rates and online or offline transactions, as well as view customer behavior within a geographic market at a state or city level.
- Reveal cohort consumer behavior to decipher long-term behavioral consumer spending shifts. Measure market share, wallet share, loyalty, consumer lifetime value, retention, demographics, and more.
- Study the effects of inflation rates via such metrics as increased total spend, ticket size, and number of transactions.
- Seek out alpha-generating signals or manage your business strategically with essential, aggregated transaction and spending data analytics.
Use Cases Categories (Our data provides an innumerable amount of use cases, and we look forward to working with new ones):
1. Market Research: Company Analysis, Company Valuation, Competitive Intelligence, Competitor Analysis, Competitor Analytics, Competitor Insights, Customer Data Enrichment, Customer Data Insights, Customer Data Intelligence, Demand Forecasting, Ecommerce Intelligence, Employee Pay Strategy, Employment Analytics, Job Income Analysis, Job Market Pricing, Marketing, Marketing Data Enrichment, Marketing Intelligence, Marketing Strategy, Payment History Analytics, Price Analysis, Pricing Analytics, Retail, Retail Analytics, Retail Intelligence, Retail POS Data Analysis, and Salary Benchmarking
2. Investment Research: Financial Services, Hedge Funds, Investing, Mergers & Acquisitions (M&A), Stock Picking, Venture Capital (VC)
3. Consumer Analysis: Consumer Data Enrichment, Consumer Intelligence
4. Market Data: Analytics, B2C Data Enrichment, Bank Data Enrichment, Behavioral Analytics, Benchmarking, Customer Insights, Customer Intelligence, Data Enhancement, Data Enrichment, Data Intelligence, Data Modeling, Ecommerce Analysis, Ecommerce Data Enrichment, Economic Analysis, Financial Data Enrichment, Financial Intelligence, Local Economic Forecasting, Location-based Analytics, Market Analysis, Market Analytics, Market Intelligence, Market Potential Analysis, Market Research, Market Share Analysis, Sales, Sales Data Enrichment, Sales Enablement, Sales Insights, Sales Intelligence, Spending Analytics, Stock Market Predictions, and Trend Analysis
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Pharma sales data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/milanzdravkovic/pharma-sales-data on 28 January 2022.
--- Dataset description provided by original source is as follows ---
The dataset is built from an initial dataset of 600,000 transactional records collected over 6 years (2014-2019), indicating the date and time of sale, the pharmaceutical drug brand name, and the sold quantity, exported from the Point-of-Sale system of an individual pharmacy. A selected group of drugs from the dataset (57 drugs) is classified into the following Anatomical Therapeutic Chemical (ATC) Classification System categories:
- M01AB - Anti-inflammatory and antirheumatic products, non-steroids, Acetic acid derivatives and related substances
- M01AE - Anti-inflammatory and antirheumatic products, non-steroids, Propionic acid derivatives
- N02BA - Other analgesics and antipyretics, Salicylic acid and derivatives
- N02BE/B - Other analgesics and antipyretics, Pyrazolones and Anilides
- N05B - Psycholeptics drugs, Anxiolytic drugs
- N05C - Psycholeptics drugs, Hypnotics and sedatives drugs
- R03 - Drugs for obstructive airway diseases
- R06 - Antihistamines for systemic use

Sales data are resampled to hourly, daily, weekly and monthly periods. The data is already pre-processed; processing included outlier detection and treatment and missing data imputation.
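The resampling step described above can be sketched as grouping timestamped sales by a period key and summing the quantities. The `(timestamp, quantity)` record layout, the helper name `resample`, and the sample values are assumptions for illustration; the original author's actual pipeline is not described in enough detail to reproduce.

```python
from collections import defaultdict
from datetime import datetime

def resample(records, period_key):
    """Sum quantities per period; period_key maps a timestamp to its bucket."""
    totals = defaultdict(float)
    for timestamp, quantity in records:
        totals[period_key(timestamp)] += quantity
    return dict(totals)

# Toy transactions for one ATC category (made-up values).
sales = [
    (datetime(2019, 1, 1, 9), 2.0),
    (datetime(2019, 1, 1, 15), 1.0),
    (datetime(2019, 1, 2, 10), 4.0),
    (datetime(2019, 2, 1, 11), 3.0),
]

daily = resample(sales, lambda ts: ts.date())
monthly = resample(sales, lambda ts: (ts.year, ts.month))
print(daily)
print(monthly)
```

Hourly and weekly series follow the same pattern with a different key, for example `ts.replace(minute=0, second=0, microsecond=0)` or `ts.isocalendar()[:2]`.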
--- Original source retains full ownership of the source dataset ---
DES is publishing Statewide Contract (Master Contract) spend as data becomes available. The spend is reported by vendors and is reported by contract and customer. Includes OMWBE, Vet and Small Business status as well.
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Sample Sales Data is a retail sales dataset of 2,823 orders and 25 columns that includes a variety of sales-related data, including order numbers, product information, quantity, unit price, sales, order date, order status, customer and delivery information.
2) Data Utilization
(1) Sample Sales Data has the following characteristics:
• This dataset consists of numerical (sales, quantity, unit price, etc.), categorical (product, country, city, customer name, transaction size, etc.), and date (order date) variables, with missing values in some columns (STATE, ADDRESSLINE2, POSTALCODE, etc.).
(2) Sample Sales Data can be used for:
• Analysis of sales trends and performance by product: Key variables such as order date, product line, and country can be used to visualize and analyze monthly and yearly sales trends, the proportion of sales by product line, and top sales by country and region.
• Segmentation and marketing strategies: Customer groups can be segmented based on customer information, transaction size, and regional data, and the segments used to design targeted marketing and customized promotion strategies.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains every wholesale purchase of liquor in the State of Iowa by retailers for sale to individuals since January 1, 2012. The State of Iowa controls the wholesale distribution of liquor intended for retail sale, which means this dataset offers a complete view of retail liquor sales in the entire state. The dataset contains every wholesale order of liquor by all grocery stores, liquor stores, convenience stores, etc., with details about the store and location, the exact liquor brand and size, and the number of bottles ordered.
In addition to being an excellent dataset for analyzing liquor sales, this is a large and clean public dataset of retail sales data. It can be used to explore problems like stockout prediction, retail demand forecasting, and other retail supply chain problems. The data dictionary is available from the State of Iowa's Alcoholic Beverages Division, within the Iowa Department of Commerce. There are some minor discrepancies in the data, discussed in the web view of the data.
This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
https://creativecommons.org/publicdomain/zero/1.0/
Exploring Online Sales Data with Power BI !!
Another productive day diving into an online sales dataset! Here’s a roundup of the insights I uncovered today:
Revenue by Category: Analyzed revenue distribution across different product categories to identify high-performing sectors.
Revenue by Sub-Category: Drilled down into sub-categories for a more granular view of revenue streams.
Revenue by Payment Mode: Examined revenue patterns based on payment methods to understand customer preferences.
Revenue by State: Mapped out revenue by state to pinpoint geographical strengths and opportunities.
Profit by Category: Evaluated profitability across product categories to assess which categories yield the highest profit margins.
Profit by Sub-Category: Explored profit levels at a sub-category level to identify the most profitable segments.
Profit by Payment Mode: Analyzed profit distribution across different payment methods.
Top 5 States by Revenue and Profit: Highlighted the top 5 states driving the most revenue and profit, offering insights into regional performance.
Sales Map by State: Visualized sales data on a map to provide a geographical perspective on sales distribution.
Total Quantity, Revenue, and Profit: Aggregated data to give an overview of total quantities sold, overall revenue, and total profit.
Filter by Category: Added a filter functionality to focus on specific categories and refine data analysis.