MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The dataset contains sales transactions captured at various retail stores across the United States during the Black Friday shopping event. It includes a comprehensive set of features that provide insights into customer demographics, product categories, and sales patterns. The dataset is designed to help retailers and e-commerce businesses optimize their sales strategies and maximize profits during this critical shopping period.
The dataset consists of approximately 550,000 records, providing a robust and representative sample of Black Friday sales data.
The primary goal of this dataset is to help retailers and e-commerce businesses predict sales and optimize their pricing strategies to maximize profits during the Black Friday shopping event. This dataset can be used to develop machine learning models that can accurately forecast sales and identify trends in customer behavior.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The goal of this dataset is to perform a data analysis project to investigate customer purchase behaviour during Black Friday at Walmart, specifically focusing on understanding if there are differences in spending habits between male and female customers. The analysis aims to provide insights to assist Walmart's management team in making informed business decisions.
Comprehensive dataset tracking U.S. online Black Friday sales from 2020-2024 with projections through 2026, including year-over-year growth rates, mobile commerce share, BNPL transaction volumes, and compound annual growth rate analysis. Data sourced from Adobe Analytics, National Retail Federation, and industry research.
The Black Friday Sales dataset is a comprehensive collection of sales transaction data from a major retail store during a Black Friday event. This dataset includes over 550,000 observations and 12 key variables, offering valuable insights into customer purchasing behavior during one of the biggest shopping days of the year.
Key Features:
- User ID: Unique ID for each customer.
- Product ID: Unique ID for each product.
- Gender: Gender of the customer, either male or female.
- Age: The age group of the customer, represented in categories (e.g., 18-25, 26-35, etc.).
- Occupation: Occupation category code of the customer.
- City_Category: The category of the city where the customer resides, classified as A, B, or C.
- Stay_In_Current_City_Years: Number of years the customer has lived in the current city.
- Marital_Status: Indicates whether the customer is married (1) or not (0).
- Product_Category 1, 2, 3: Product categories associated with the purchased item.
- Purchase: The amount spent by the customer on the product.
This dataset can be utilized for analyzing patterns in consumer behavior, demographic-based purchasing tendencies, and predicting future sales trends. It's widely used in data science projects for regression, classification, and recommendation systems, making it ideal for feature engineering, model building, and data visualization.
Comprehensive dataset tracking mobile device share of Black Friday ecommerce sales from 2020 to 2024, including conversion rates, traffic percentages, year-over-year growth, and demographic breakdowns by generation. Data sourced from Adobe Analytics, Salesforce, and Digital Commerce 360.
Comprehensive dataset tracking Black Friday online order volumes, revenue, mobile vs desktop sales share, product category performance, and year-over-year growth metrics from 2020-2024, compiled from Adobe Analytics, Salesforce Commerce Cloud, and National Retail Federation sources covering over 1 trillion retail site visits.
Comprehensive dataset analyzing Black Friday discount percentages across major retailers and product categories from 2019 to 2024, including retailer-specific averages, category breakdowns, and year-over-year trends based on verified studies from Adobe Analytics, WalletHub, and ITMAGINATION.
This Black Friday dataset offers a deep dive into the world of consumer shopping habits during the biggest retail event of the year. It contains thousands of records capturing customer demographics, product preferences, and purchase behavior, all designed to help data enthusiasts explore real-world patterns. Whether you're building a machine learning model, analyzing customer segments, or visualizing trends in spending, this dataset provides a rich and versatile playground. Perfect for regression, classification, and recommendation system projects, it simulates the high-stakes world of retail with clean, structured data that's ready to explore.
https://creativecommons.org/publicdomain/zero/1.0/
Dataset History
A retail company “ABC Private Limited” wants to understand the customer purchase behaviour (specifically, purchase amount) against various products of different categories. They have shared purchase summaries of various customers for selected high-volume products from last month. The data set also contains customer demographics (age, gender, marital status, city type, stay in the current city), product details (productid and product category) and Total purchase amount from last month.
Now, they want to build a model to predict the purchase amount of customers against various products which will help them to create a personalized offer for customers against different products.
Tasks to perform
The Purchase column is the target variable. Perform univariate analysis and bivariate analysis with respect to Purchase.
"Masked" in the column description means the column has already been converted from categorical values to numerical values.
The points below are only meant to get you started with the dataset; you do not have to follow them in the same sequence.
DATA PREPROCESSING
Check the basic statistics of the dataset
Check for missing values in the data
Check for unique values in data
Perform EDA
Purchase Distribution
Check for outliers
Analysis by Gender, Marital Status, occupation, occupation vs purchase, purchase by city, purchase by age group, etc
Drop unnecessary fields
Convert categorical data into integers using the map function (e.g. the 'Gender' column)
Missing value treatment
Rename columns
Fill NaN values
Map range variables into integers (e.g. the 'Age' column)
Data Visualisation
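As a rough illustration of the two mapping steps above (categorical 'Gender' and range-valued 'Age' into integers), here is a minimal standard-library sketch. The sample rows, the `F`/`M` labels, and the age bins are assumptions made for illustration; check the unique values in the real data before building the dictionaries.

```python
# Hypothetical sample rows; the values are illustrative, not taken from the dataset.
rows = [
    {"Gender": "F", "Age": "0-17", "Purchase": 8370},
    {"Gender": "M", "Age": "26-35", "Purchase": 15200},
    {"Gender": "M", "Age": "55+", "Purchase": 7969},
]

# Assumed category labels and age bins; verify them against the actual columns.
GENDER_MAP = {"F": 0, "M": 1}
AGE_MAP = {"0-17": 0, "18-25": 1, "26-35": 2, "36-45": 3,
           "46-50": 4, "51-55": 5, "55+": 6}


def encode(row):
    """Return a copy of the row with Gender and Age replaced by integer codes."""
    encoded = dict(row)  # copy, so the original row is left unchanged
    encoded["Gender"] = GENDER_MAP[row["Gender"]]
    encoded["Age"] = AGE_MAP[row["Age"]]
    return encoded


encoded_rows = [encode(r) for r in rows]
print(encoded_rows)
```

If the data is loaded with pandas, `Series.map` accepts the same dictionaries, for example `df["Gender"].map(GENDER_MAP)`.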
All the Best!!
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Celebrate Black Friday in July with BudgetPetCare and enjoy 25% OFF on all pet supplies plus Free Shipping – no minimum required! 🐾 Whether your pet needs flea and tick treatments, heartworm preventives, or daily essentials, this is your chance to stock up at unbeatable prices.
Trusted Pet Brands
No Prescription Required
Flat 25% Discount
Free Shipping Across the USA
Hurry! Offers valid for a limited time only.
Shop now at budgetpetcare.com
A retail company “ABC Private Limited” wants to understand the customer purchase behaviour (specifically, purchase amount) against various products of different categories. They have shared purchase summaries of various customers for selected high-volume products from last month. The data set also contains customer demographics (age, gender, marital status, city_type, stay_in_current_city), product details (product_id and product category) and Total purchase_amount from last month.
Now, they want to build a model to predict the purchase amount of customers against various products, which will help them to create personalized offers for customers against different products.
Synthetic E-Commerce Sales Dataset (2025)
Realistic, clean, and ready-to-use synthetic dataset for machine learning, forecasting, and data analysis.
Overview
This dataset contains 100,000 simulated e-commerce transactions generated with Python’s Faker and NumPy libraries. It replicates realistic global online shopping behavior between 2023 and 2025, including product categories, customer feedback, payment preferences, and delivery times.
The dataset is fully synthetic — no real user data, privacy-friendly, and designed for AI, analytics, and visualization projects.
Dataset Highlights
Global coverage: Sales from six regions (Europe, Asia, North America, etc.)
Diverse payment methods: CreditCard, PayPal, BankTransfer, Cash
Product variety: 7 major categories such as Electronics, Fashion, and Home
Seasonal patterns: November sales spike (Black Friday effect)
Realistic return rates: Fashion products have a higher return ratio
Date range: January 2023 – December 2025
Suitable for: Regression, classification, feature engineering, and forecasting
| Column | Description | Example |
|---|---|---|
| order_id | Unique ID for each order | 82374 |
| customer_id | Random UUID per customer | e8b0-45dc-... |
| product_category | Product type | Electronics |
| product_price | Price per unit (€) | 249.99 |
| quantity | Quantity ordered | 3 |
| order_date | Order date (2023–2025) | 2024-11-25 |
| region | Sales region | Europe |
| payment_method | Payment type | CreditCard |
| delivery_days | Days until delivery | 4 |
| is_returned | Whether the product was returned (0/1) | 0 |
| customer_rating | Customer satisfaction (1–5) | 4.3 |
| discount_percent | Discount rate (%) | 10 |
| revenue | Final revenue = price × quantity × (1 - discount/100) | 674.9 |
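The revenue formula in the last row can be checked with a small sketch. The function name is hypothetical, and rounding to two decimals is an assumption; the table does not say how the dataset rounds.

```python
def revenue(price, quantity, discount_percent):
    """Final revenue = price × quantity × (1 - discount/100).

    Rounding to cents is an assumption, not something the table specifies.
    """
    return round(price * quantity * (1 - discount_percent / 100), 2)


# Example values taken from the table's example column.
print(revenue(249.99, 3, 10))
```

With the example price, quantity, and discount from the table, this gives about 674.97, close to the 674.9 shown in the revenue row.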
https://creativecommons.org/publicdomain/zero/1.0/
This is a dataset about Christmas sales and trends. It contains features, such as PaymentType, TotalPrice, Events, Weather, PromotionApplied, ProductNames, Category, etc.
Date (Date of the transaction, format: YYYY-MM-DD)
Time (Time of the transaction, format: HH:MM:SS)
CustomerID (Unique identifier for each customer)
Age (Age of the customer)
Gender (Gender of the customer: Male, Female, Other)
Location (City or town where the purchase was made)
StoreID (Unique identifier for the store, if applicable)
OnlineOrderFlag (Boolean: True if online, False if in-store)
ProductID (Unique identifier for the product)
ProductName (Name of the product)
Category (Category of the product, e.g., Electronics, Clothing, Toys, Food, Decorations)
Quantity (Number of items purchased in the transaction)
UnitPrice (Price per unit of the product)
TotalPrice (Total price for the product, calculated as Quantity * UnitPrice)
PaymentType (Type of payment, e.g., Credit Card, Debit Card, Cash, Online Payment)
PromotionApplied (Boolean: True if any promotion was applied, False otherwise)
DiscountAmount (The amount of discount, if any)
GiftWrap (Boolean: True if the product was gift-wrapped, False otherwise)
ShippingMethod (Method of shipping, e.g., Standard, Express, Overnight, if online)
DeliveryTime (Number of days taken for delivery, if online)
Weather (General weather condition on the day of purchase, e.g., Snowy, Rainy, Sunny)
Event (Special events on the purchase day, e.g., Christmas Market, Black Friday)
CustomerSatisfaction (Customer satisfaction rating, on a scale of 1-5)
ReturnFlag (Boolean: True if the product was returned, False otherwise)
Acknowledgments: The dataset was made available for the Onyx Data Challenge for December 2023.
There are many seasons in which sales are significantly higher or lower than average. If a company is not aware of these seasons, it can lose a great deal of money. Predicting future sales is one of the most crucial parts of a company's planning. Sales forecasting helps the company arrange stock, calculate revenue, and decide whether to make new investments. Another advantage of knowing future sales is that meeting predetermined targets from the beginning of a season can have a positive effect on stock prices and investors' perceptions. Conversely, failing to reach the projected target could significantly damage stock prices. This would be a big problem, especially for a company as large as Walmart.
My aim in this project is to build a model that predicts the stores' sales. With this model, Walmart's decision-makers can plan ahead, which is very important for arranging stock, calculating revenue, and deciding whether or not to make new investments.
With the accurate prediction company can;
Understanding, Cleaning and Exploring Data
Preparing Data to Modeling
Random Forest Regressor
ARIMA/ExponentialSmoothing/ARCH Models
The competition metric is the weighted mean absolute error (WMAE). The weight of an error changes when the week is a holiday.
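A weighted mean absolute error can be sketched as follows. The text only says that the weight changes on holidays, so the holiday weight of 5 (against 1 for other weeks) is an assumption here, as are the function name and the sample numbers.

```python
def wmae(actual, predicted, is_holiday, holiday_weight=5.0):
    """Weighted mean absolute error.

    Each absolute error is multiplied by a weight, and the sum is divided
    by the total weight. holiday_weight=5.0 is an assumed value.
    """
    weights = [holiday_weight if h else 1.0 for h in is_holiday]
    weighted_errors = sum(
        w * abs(a - p) for w, a, p in zip(weights, actual, predicted)
    )
    return weighted_errors / sum(weights)


# Hypothetical weekly sales; the third week is flagged as a holiday week.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
is_holiday = [False, False, True]
print(wmae(actual, predicted, is_holiday))
```

Because the holiday error carries the larger weight, a miss in a holiday week moves the score more than the same miss in an ordinary week.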
Understanding, Cleaning and Exploring Data: The first challenge of this data is that there are too many seasonal effects on sales. Some departments have higher sales in some seasons, but on average the best departments are different. To analyze these effects, the data was divided into weeks of the year and the holiday dates were categorized.
Preparing Data to Modeling: Boolean and string features were encoded, and whole columns were encoded.
Random Forest Regressor: Feature selection was done according to feature importance, and the best result was an error of 1801.
ARIMA/ExponentialSmoothing/ARCH Models: The second challenge in this data is that it is not stationary. To make the data more stationary, differencing, log, and shift techniques were applied. The lowest error, 821, was found with ExponentialSmoothing.
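One common way to combine the log, shift, and difference steps is to take the logarithm of the series and subtract a lagged (shifted) copy from it. The sketch below shows that combination on made-up numbers; it is not taken from the project's notebooks, and the function name and sample series are hypothetical.

```python
import math


def log_diff(series, lag=1):
    """Log-transform a positive series, then subtract its copy shifted by `lag`."""
    logged = [math.log(x) for x in series]
    return [logged[i] - logged[i - lag] for i in range(lag, len(logged))]


# Hypothetical weekly sales growing by 10% per week (a trending series).
weekly_sales = [100.0, 110.0, 121.0, 133.1]
print(log_diff(weekly_sales))
```

For a series with steady percentage growth, the log-differenced values are roughly constant, which is the kind of trend removal the step aims for. A seasonal lag (for example `lag=52` for weekly data) would be the analogous choice for yearly seasonality.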
More detailed findings can be found in the notebooks with explorations.
Data will be made more stationary with different techniques.
More detailed feature engineering and feature selection will be done.
More data can be gathered to observe holiday effects on sales, and different holidays such as Easter, Halloween, and Back to School times will be added.
The handling of markdown effects in the model will be improved according to department sales.
Different models can be built for specific stores or departments.
Market basket analysis can be done to find higher demand items of departments.
A retail company “ABC Private Limited” wants to understand the customer purchase behaviour (specifically, purchase amount) against various products of different categories. They have shared purchase summaries of various customers for selected high-volume products from last month. The data set also contains customer demographics (age, gender, marital status, city_type, stay_in_current_city), product details (product_id and product category) and Total purchase_amount from last month.
Now, they want to build a model to predict the purchase amount of customers against various products, which will help them to create personalized offers for customers against different products.
Variable -> Definition
User_ID -> User ID
Product_ID -> Product ID
Gender -> Sex of User
Age -> Age in bins
Occupation -> Occupation (Masked)
City_Category -> Category of the City (A, B, C)
Stay_In_Current_City_Years -> Number of years of stay in the current city
Marital_Status -> Marital Status
Product_Category_1 -> Product Category (Masked)
Product_Category_2 -> Product may also belong to another category (Masked)
Product_Category_3 -> Product may also belong to another category (Masked)
Purchase -> Purchase Amount (Target Variable)