https://creativecommons.org/publicdomain/zero/1.0/
One of the leading retail stores in the US, Walmart, would like to predict sales and demand accurately. Certain events and holidays affect sales on particular days. Sales data are available for 45 Walmart stores. The business faces unforeseen demand and sometimes runs out of stock because its current machine learning algorithm is not well suited to the task. An ideal ML algorithm would predict demand accurately and take in economic factors such as CPI and the Unemployment Index.
Walmart runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest of which are the Super Bowl, Labour Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the evaluation than non-holiday weeks. Part of the challenge presented by this competition is modeling the effects of markdowns on these holiday weeks in the absence of complete/ideal historical data. Historical sales data for 45 Walmart stores located in different regions are available.
The dataset is taken from Kaggle.
This dataset focuses on predicting weekly store sales at Walmart by examining holiday effects, temporal patterns, and other influential factors. The goal is to enable efficient stock planning, revenue calculations, and strategic decision-making by understanding patterns related to seasonal sales fluctuations. This machine learning model is developed based on resources from: https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting/overview/evaluation
1. Test Data: contains 115,064 rows with the fields Store, Department, Date, and IsHoliday. "IsHoliday" indicates whether the week includes a special holiday. Holiday weeks tend to show higher average sales than non-holiday weeks.
2. Train Data: contains 421,570 rows with the fields Store, Department, Date, IsHoliday, and Weekly Sales. Weekly Sales is the recorded weekly sales for a specific department at a given store.
3. Features Data: consists of 8,190 rows with variables such as Temperature, Fuel Price, CPI, Unemployment, Markdown 1-5, and IsHoliday.
* Temperature: Average temperature (Fahrenheit) in a region.
* Fuel Price: Can impact consumer spending and sales.
* Markdowns 1-5: Promotional markdowns (missing values marked as NA).
* CPI: Consumer Price Index (reflects inflation/deflation).
* Unemployment: Unemployment rate in a region that affects consumer spending.
4. Store Data: includes details about Walmart stores such as store numbers, store types, and store sizes. Walmart has 45 stores categorized into 3 types:
* Type A: Sizes from 39,690 to 219,622
* Type B: Sizes from 34,875 to 140,167
* Type C: Sizes from 39,690 to 42,988
The main variables used for prediction are weekly sales, IsHoliday, and date. The other features are explored to identify patterns and generate insights to build accurate prediction models.
The goal is to predict the impact of holidays on weekly store sales. To achieve this, a time series modeling approach was applied with XGBoost, using variables such as date, weekly sales, IsHoliday, lag features, and rolling averages. The evaluation metric used was Weighted Mean Absolute Error (WMAE), which emphasizes periods of higher significance, such as holidays.
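As an illustration of the lag and rolling-average features mentioned above, here is a minimal pandas sketch. The column names `Store`, `Dept`, `Date`, and `Weekly_Sales`, the helper name `add_lag_features`, and the chosen lags and window are assumptions for the example, not details confirmed by the description.

```python
import pandas as pd


def add_lag_features(df, lags=(1, 2), window=4):
    """Add per-series lag and rolling-mean columns (hypothetical helper)."""
    out = df.sort_values(["Store", "Dept", "Date"]).copy()
    grouped = out.groupby(["Store", "Dept"])["Weekly_Sales"]
    for lag in lags:
        # Sales from `lag` weeks earlier within the same store/department.
        out[f"lag_{lag}"] = grouped.shift(lag)
    # Shift by one week before rolling so the current week never leaks
    # into its own feature.
    out[f"roll_mean_{window}"] = grouped.transform(
        lambda s: s.shift(1).rolling(window, min_periods=1).mean()
    )
    return out


# Tiny made-up series for one store and department.
sales = pd.DataFrame({
    "Store": [1] * 5,
    "Dept": [1] * 5,
    "Date": pd.date_range("2010-02-05", periods=5, freq="W-FRI"),
    "Weekly_Sales": [10.0, 20.0, 30.0, 40.0, 50.0],
})
features = add_lag_features(sales)
```

The first rows of each series have missing lag values; whether to drop or impute them is a modeling choice the description does not specify.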
Final Model Metrics:
* Weighted Mean Absolute Error = 211
* Error rate relative to average weekly sales = ~1.32%
The low error percentage highlights the model's accuracy in forecasting weekly sales and assessing seasonal fluctuations.
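For reference, WMAE can be sketched as follows. This assumes the five-fold holiday weight described for the Kaggle competition; the function name `wmae` and the sample numbers are made up for illustration and are not the model's actual data.

```python
import numpy as np


def wmae(y_true, y_pred, is_holiday, holiday_weight=5.0):
    """Weighted mean absolute error; holiday weeks get `holiday_weight`."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    weights = np.where(np.asarray(is_holiday, dtype=bool), holiday_weight, 1.0)
    return float(np.sum(weights * np.abs(y_true - y_pred)) / np.sum(weights))


# Made-up values: two ordinary weeks and one holiday week.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 280.0]
holiday = [False, False, True]

score = wmae(actual, predicted, holiday)
# Error expressed relative to average weekly sales, as in the metrics above.
relative_error = score / np.mean(actual)
```

Because the holiday week carries five times the weight, its 20-unit miss dominates the score even though it is only one of three observations.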
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains detailed information about 500 Walmart store records, providing insights into various store characteristics. The data is focused on store-related attributes, which are crucial for analyzing store performance and customer demographics.
Columns in Store Data Sheet:
* Discount_Rate: The discount rate offered by the store (in percentage).
* Customer_Age: The average age of customers visiting the store.
* Store_Size: The size of the store (in square feet).
* Inventory_Level: The level of inventory available in the store.
* Number_of_Employees: The number of employees working in the store.
* Marketing_Spend: The amount spent on marketing (in USD).
* Family: Indicates if the store targets family customers (Yes/No).
* Kids: Indicates if the store targets customers with kids (Yes/No).
* Weekend: Indicates if the data was collected on a weekend (Yes/No).
* Holiday: Indicates if the data was collected during a holiday (Yes/No).
* Foot_Traffic: The number of customers visiting the store.
* Average_Transaction_Value: The average value of transactions (in USD).
* Online_Sales: The online sales generated by the store (in USD).
Purpose: The dataset is designed to help analyze various factors that influence store performance at Walmart. It can be used for statistical analysis, machine learning, and business strategy development to optimize store operations and marketing efforts.
Usage: Researchers, analysts, and business strategists can leverage this dataset to:
* Identify patterns and trends in store performance.
* Develop predictive models for understanding customer demographics.
* Evaluate the impact of different variables on store operations.
* Optimize inventory and staffing levels based on customer behavior.
stores.csv
This file contains anonymized information about the 45 stores, indicating the type and size of store.
train.csv
This is the historical training data, which covers 2010-02-05 to 2012-11-01. Within this file you will find the following fields:
test.csv
This file is identical to train.csv, except we have withheld the weekly sales. You must predict the sales for each triplet of store, department, and date in this file.
features.csv
This file contains additional data related to the store, department, and regional activity for the given dates. It contains the following fields:
For convenience, the four holidays fall within the following weeks in the dataset (not all holidays are in the data):
* Super Bowl: 12-Feb-10, 11-Feb-11, 10-Feb-12, 8-Feb-13
* Labor Day: 10-Sep-10, 9-Sep-11, 7-Sep-12, 6-Sep-13
* Thanksgiving: 26-Nov-10, 25-Nov-11, 23-Nov-12, 29-Nov-13
* Christmas: 31-Dec-10, 30-Dec-11, 28-Dec-12, 27-Dec-13
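The holiday weeks listed above can be turned into a lookup for labeling dates. The sketch below is one possible way to do this; the names `HOLIDAY_WEEKS`, `DATE_TO_HOLIDAY`, and `label_holidays` are hypothetical, and the dates are copied from the list above.

```python
import pandas as pd

# Week-ending dates for each holiday, taken from the list above.
HOLIDAY_WEEKS = {
    "Super Bowl": ["2010-02-12", "2011-02-11", "2012-02-10", "2013-02-08"],
    "Labor Day": ["2010-09-10", "2011-09-09", "2012-09-07", "2013-09-06"],
    "Thanksgiving": ["2010-11-26", "2011-11-25", "2012-11-23", "2013-11-29"],
    "Christmas": ["2010-12-31", "2011-12-30", "2012-12-28", "2013-12-27"],
}

# Reverse lookup: ISO date string -> holiday name.
DATE_TO_HOLIDAY = {
    date: name for name, dates in HOLIDAY_WEEKS.items() for date in dates
}


def label_holidays(dates):
    """Return a frame with the holiday name (if any) and a boolean flag."""
    parsed = pd.to_datetime(pd.Series(dates))
    names = parsed.dt.strftime("%Y-%m-%d").map(DATE_TO_HOLIDAY)
    return pd.DataFrame(
        {"Date": parsed, "Holiday": names, "IsHoliday": names.notna()}
    )


labeled = label_holidays(["2010-02-05", "2010-02-12", "2011-11-25"])
```

Keeping the holiday name alongside the flag makes it possible to model each holiday separately rather than treating all holiday weeks alike.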
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
About the Dataset
This dataset contains 50,000 customer transactions from Walmart, capturing essential details about consumer shopping behavior. It includes demographic information, product categories, purchase amounts, discounts, and ratings, making it useful for data analysis, customer segmentation, and sales forecasting.
Dataset Features
* Customer_ID: Unique identifier for each customer.
* Age: Age of the customer.
* Gender: Gender of the customer (Male/Female/Other).
* City: City where the purchase was made.
* Category: Product category (e.g., Electronics, Clothing, Groceries).
* Product_Name: Name of the purchased product.
* Purchase_Date: Date of purchase.
* Purchase_Amount: Total amount spent on the purchase.
* Payment_Method: Mode of payment (Credit Card, Cash, Digital Wallet, etc.).
* Discount_Applied: Whether a discount was applied (Yes/No).
* Rating: Customer rating of the purchase (1-5).
* Repeat_Customer: Whether the customer has purchased before (Yes/No).
Potential Use Cases
* Customer Segmentation: Grouping customers based on age, gender, and purchase patterns.
* Market Basket Analysis: Identifying products that are frequently purchased together.
* Sales Forecasting: Predicting future sales trends using time-series analysis.
* Customer Loyalty Analysis: Understanding repeat customer behavior.
* Discount Impact Analysis: Evaluating how discounts influence purchasing decisions.
* Product Performance Evaluation: Analyzing ratings and sales of different products.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Walmart, the world's largest retailer, is facing operational challenges in managing inventory efficiently across its thousands of stores worldwide. With fluctuating customer demand, promotions, and external factors like weather impacting sales, Walmart's operations team is under pressure to ensure optimal stock levels without overspending on excess inventory or losing sales due to stockouts.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Subhadip Hensh
Released under Apache 2.0
Conventional retail stores still play a prominent role in a world dominated by e-commerce. Retail is the process of selling consumer goods or services to customers through multiple channels of distribution to earn a profit. From groceries to clothing to electronics, customers keep flooding the gates of retail stores to satisfy their needs. As time has passed, retailers have had to evolve in order to keep up with changes in demand and the ever-changing mindset of customers. One retail industry juggernaut that has kept up with the demands of customers as well as changed the face of the retail industry for the better is Walmart Inc.
Walmart Inc. is an American multinational retail corporation that operates a chain of hypermarkets, discount department stores, and grocery stores, headquartered in Bentonville, Arkansas. It has many stores across the globe and is the largest retail company by revenue.
We have historical sales data for 45 Walmart stores located in different regions. Each store contains a number of departments. In addition, weekly data on fuel price, holidays, temperature, and some other features are present in the data set.
In addition, Walmart runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest of which are the Super Bowl, Labour Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the evaluation than non-holiday weeks.
The data consists of 421,570 records of weekly sales from stores, spanning 05-Feb-2010 to 26-Oct-2012. This comprises 143 weeks of sales data.
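The week count can be checked with a short calculation. This sketch assumes one record date per week, falling on Friday, between the two stated endpoints; the variable names are illustrative.

```python
import pandas as pd

# Every Friday from the first to the last week in the stated range, inclusive.
weeks = pd.date_range("2010-02-05", "2012-10-26", freq="W-FRI")
n_weeks = len(weeks)

# Average number of store/department records per week, given 421,570 rows.
records_per_week = 421570 / n_weeks
```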
A total of 16 attributes are provided in the data set, including the target variable. The attribute definitions are:
* Store: The store number
* Size: Size of the store
* Dept: Department of the store
* Date: Specifies the week (Friday of every week)
* Temperature: Average temperature in the region (in °F)
* FuelPrice: Cost of fuel in the region
* MarkDown1-5: Anonymized data related to promotional markdowns that Walmart is running. Markdown data is only available after November 2011, and is not available for all stores all the time. Any missing value is marked with Null.
* CPI: Consumer price index
* Unemployment: Unemployment rate
* IsHoliday: Whether the week is a special holiday week
Using the above given features, we have to predict the weekly sales of the store with given parameters.
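Before modeling, the Null markdown values need some treatment. The sketch below shows one option, filling them with zero and keeping a flag for whether any markdown was recorded. This is an assumption for illustration, not a step described in the dataset; the helper name `prepare_features` and the sample rows are made up.

```python
import numpy as np
import pandas as pd

MARKDOWN_COLS = [f"MarkDown{i}" for i in range(1, 6)]


def prepare_features(df):
    """Fill missing markdowns and derive simple calendar features."""
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"])
    # Record whether any markdown was present before filling the gaps.
    out["HasMarkDown"] = out[MARKDOWN_COLS].notna().any(axis=1)
    # Assumption: a missing markdown is treated as no promotion (0.0).
    out[MARKDOWN_COLS] = out[MARKDOWN_COLS].fillna(0.0)
    out["IsHoliday"] = out["IsHoliday"].astype(int)
    out["WeekOfYear"] = out["Date"].dt.isocalendar().week.astype(int)
    return out


# Made-up rows: one week before markdown data exists, one week with it.
raw = pd.DataFrame({
    "Store": [1, 1],
    "Dept": [1, 1],
    "Date": ["2011-10-28", "2011-11-25"],
    "IsHoliday": [False, True],
    "MarkDown1": [np.nan, 1000.0],
    "MarkDown2": [np.nan, np.nan],
    "MarkDown3": [np.nan, np.nan],
    "MarkDown4": [np.nan, np.nan],
    "MarkDown5": [np.nan, np.nan],
})
prepared = prepare_features(raw)
```

Filling with zero conflates "no promotion" with "not recorded", which is why the sketch keeps the `HasMarkDown` flag alongside the filled values.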
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The dataset comprises transactional information from the previous five years at Walmart retail stores, covering customer demographics, order specifics, product attributes, and sales logistics. It includes data on the city where purchases were made, customer age, names, and segments, along with any applied discounts and the quantity of products ordered. Each transaction is uniquely identified by an order ID, accompanied by order date, priority, and shipping details such as mode, cost, and dates. Product-related information encompasses base margins, categories, containers, names, and sub-categories, enabling insights into profitability, sales, and regional performance. The dataset also provides granular details such as profit margins, unit prices, and ZIP codes, supporting analysis of customer behavior, product performance, and operational efficiency within Walmart's retail ecosystem.
The columns in the dataset are:
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Adarsh Siddi
Released under CC0: Public Domain
This dataset was created by Raman Singh
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Pritam Chakraborty
Released under Apache 2.0
https://www.couponbirds.com/us/terms-of-use
Weekly statistics showing how many Walmart Canada coupon codes were verified by the CouponBirds team. This dataset reflects real-time coupon validation activity to ensure coupon accuracy and reliability.
https://creativecommons.org/publicdomain/zero/1.0/
Context
Walmart Inc. is a global US retail group that dominates a large part of the US market. Walmart consistently ranks among the top companies in the Fortune Global 500 list of the highest-revenue businesses worldwide.
As of March 2025, Walmart has a market cap of $679.5 billion USD. This makes Walmart one of the most valuable companies in the world by market capitalization. Market capitalization, commonly called market cap, is the total market value of a publicly traded company's outstanding shares and is commonly used to measure how much a company is worth.
Content
Geography: USA
Time period: January 2000 to March 2025
Unit of analysis: Walmart Stock Data 2025
| Variable | Description |
|---|---|
| date | Date |
| open | The price at market open. |
| high | The highest price for that day. |
| low | The lowest price for that day. |
| close | The price at market close, adjusted for splits. |
| adj_close | The closing price after adjustments for all applicable splits and dividend distributions. Data is adjusted using appropriate split and dividend multipliers, adhering to Center for Research in Security Prices (CRSP) standards. |
| volume | The number of shares traded on that day. |
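As a small illustration of working with these columns, the sketch below computes day-over-day returns from `adj_close`. The helper name `add_daily_return` and the prices are made up for the example and are not taken from the dataset.

```python
import pandas as pd


def add_daily_return(prices):
    """Append the day-over-day percentage change of the adjusted close."""
    out = prices.sort_values("date").copy()
    out["daily_return"] = out["adj_close"].pct_change()
    return out


# Hypothetical prices, for illustration only.
prices = pd.DataFrame({
    "date": pd.to_datetime(["2025-03-03", "2025-03-04", "2025-03-05"]),
    "adj_close": [100.0, 102.0, 99.96],
})
returns = add_daily_return(prices)
```

Using `adj_close` rather than `close` keeps returns comparable across dividend dates, since only the former is adjusted for dividend distributions.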
Acknowledgements
This dataset belongs to me. I'm sharing it here for free. You may use it as you wish.
This dataset was created by Tanuja sreekanth
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Shriya Singh
Released under Apache 2.0
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Mohammed Hamza Khalifa
Released under Apache 2.0
https://creativecommons.org/publicdomain/zero/1.0/
This is a simple dataset containing the locations of 4,639 Walmart stores in the United States.
This dataset was created by Abdullah Al Sufi
This dataset was created by Vignesh Murali