https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Pratyusha Kar
Released under CC0: Public Domain
This dataset was created by Umesh Narkhede
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Chase Carlson
Released under MIT
https://creativecommons.org/publicdomain/zero/1.0/
Walmart, one of the leading retail chains in the US, would like to predict sales and demand accurately. Certain events and holidays affect sales on particular days. Sales data are available for 45 Walmart stores. The business faces unforeseen demand and sometimes runs out of stock because its current machine learning algorithm is poorly suited to the problem. An ideal ML algorithm would predict demand accurately and take into account economic factors such as CPI and the Unemployment Index.
Walmart runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest of which are the Super Bowl, Labour Day, Thanksgiving, and Christmas. The weeks that include these holidays are weighted five times higher in the evaluation than non-holiday weeks. Part of the challenge presented by this competition is modeling the effects of markdowns on these holiday weeks in the absence of complete/ideal historical data. Historical sales data are available for 45 Walmart stores located in different regions.
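The five-times weighting of holiday weeks can be illustrated with a small sketch. It assumes the evaluation is a weighted mean absolute error, which the description above does not state explicitly; the function name and the toy numbers are hypothetical.

```python
def weighted_mae(actual, predicted, is_holiday, holiday_weight=5.0):
    """Weighted mean absolute error.

    Assumption: holiday weeks get weight `holiday_weight`, all other weeks 1.
    """
    if not actual:
        raise ValueError("inputs must not be empty")
    if not (len(actual) == len(predicted) == len(is_holiday)):
        raise ValueError("inputs must have the same length")
    weights = [holiday_weight if flag else 1.0 for flag in is_holiday]
    total_error = sum(
        w * abs(a - p) for w, a, p in zip(weights, actual, predicted)
    )
    return total_error / sum(weights)


# Toy numbers for illustration only, not real Walmart data.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 250.0]
is_holiday = [False, False, True]
score = weighted_mae(actual, predicted, is_holiday)
```

With this weighting, an error in a holiday week moves the score as much as the same error repeated in five ordinary weeks, so a model that misses holiday peaks is penalized heavily.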
The dataset is taken from Kaggle.
This dataset was created by DoVanThang
This dataset was created by Ali Khalid 1
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Walmart Dataset (Retail)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/rutuspatel/walmart-dataset-retail on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Dataset Description :
This is the historical data that covers sales from 2010-02-05 to 2012-11-01, in the file Walmart_Store_sales. Within this file you will find the following fields:
Store - the store number
Date - the week of sales
Weekly_Sales - sales for the given store
Holiday_Flag - whether the week is a special holiday week (1 = holiday week, 0 = non-holiday week)
Temperature - Temperature on the day of sale
Fuel_Price - Cost of fuel in the region
CPI - Prevailing consumer price index
Unemployment - Prevailing unemployment rate
Holiday Events
Super Bowl: 12-Feb-10, 11-Feb-11, 10-Feb-12, 8-Feb-13
Labour Day: 10-Sep-10, 9-Sep-11, 7-Sep-12, 6-Sep-13
Thanksgiving: 26-Nov-10, 25-Nov-11, 23-Nov-12, 29-Nov-13
Christmas: 31-Dec-10, 30-Dec-11, 28-Dec-12, 27-Dec-13
Analysis Tasks
Basic Statistics tasks
1) Which store has maximum sales
2) Which store has the maximum standard deviation, i.e., the store whose sales vary the most. Also, find the coefficient of variation (the ratio of standard deviation to mean)
3) Which store/s has good quarterly growth rate in Q3’2012
4) Some holidays have a negative impact on sales. Find out holidays which have higher sales than the mean sales in non-holiday season for all stores together
5) Provide a monthly and semester view of sales in units and give insights
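The first two statistics tasks can be sketched with the standard library alone. This is a minimal illustration on made-up rows, not the real file. It interprets the coefficient in task 2 as the coefficient of variation (standard deviation divided by mean), which is an assumption, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def store_summary(rows):
    """rows: iterable of (store, weekly_sales) pairs -> {store: stats}."""
    by_store = defaultdict(list)
    for store, sales in rows:
        by_store[store].append(float(sales))

    summary = {}
    for store, values in by_store.items():
        avg = mean(values)
        # Sample standard deviation; it needs at least two observations.
        sd = stdev(values) if len(values) > 1 else 0.0
        summary[store] = {
            "total": sum(values),
            "mean": avg,
            "std": sd,
            # Coefficient of variation: std relative to the mean.
            "cv": sd / avg if avg else float("nan"),
        }
    return summary


# Made-up (Store, Weekly_Sales) rows for illustration only.
rows = [(1, 100.0), (1, 120.0), (1, 110.0), (2, 50.0), (2, 150.0), (2, 100.0)]
summary = store_summary(rows)
top_store = max(summary, key=lambda s: summary[s]["total"])
most_variable = max(summary, key=lambda s: summary[s]["std"])
```

The same grouping can be reused for the quarterly and monthly tasks by keying on (store, period) instead of store alone.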
Statistical Model
For Store 1 – Build prediction models to forecast demand
Linear Regression – Utilize variables like date and restructure dates as 1 for 5 Feb 2010 (starting from the earliest date in order). Hypothesize if CPI, unemployment, and fuel price have any impact on sales.
Change dates into days by creating new variable.
Select the model which gives best accuracy.
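A minimal sketch of the date restructuring and a one-variable linear fit follows. It assumes dates are stored as DD-MM-YYYY strings, which the description does not confirm. The sales values are invented and the helper names are hypothetical. Adding CPI, unemployment, and fuel price would require a multiple-regression routine, which is not shown here.

```python
from datetime import datetime


def date_index(date_strings, fmt="%d-%m-%Y"):
    """Map each date to its 1-based rank among the sorted unique dates."""
    parsed = [datetime.strptime(s, fmt).date() for s in date_strings]
    rank = {d: i for i, d in enumerate(sorted(set(parsed)), start=1)}
    return [rank[d] for d in parsed]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Invented weekly sales for one store; the earliest date gets index 1.
dates = ["05-02-2010", "12-02-2010", "19-02-2010", "26-02-2010"]
sales = [1000.0, 1100.0, 1200.0, 1300.0]
x = date_index(dates)
intercept, slope = fit_line(x, sales)
forecast_week_5 = intercept + slope * 5
```

Ranking the sorted dates keeps the index contiguous even if a week is missing from the file; counting days since the first date is the alternative suggested by the "change dates into days" step.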
--- Original source retains full ownership of the source dataset ---
This dataset was created by Shrijeet16
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
BigMart Sales Prediction Challenge
BigMart, a leading retail chain, aims to enhance its sales strategy by analyzing historical sales data. The goal is to develop a predictive model that estimates the sales of various products across different outlets, helping BigMart understand the key factors influencing sales performance.
BigMart has gathered sales data from 2013 for 1,559 products sold across 10 stores in different cities. Along with sales figures, various product and store attributes have been recorded. The objective is to build a machine learning model that can accurately forecast the sales of products at specific outlets.
By leveraging this predictive model, BigMart can gain insights into product and store characteristics that drive sales growth, enabling better business decisions.
The dataset may contain missing values due to unreported data from certain stores, requiring appropriate data preprocessing techniques.
The train dataset includes both input features and the target variable (Item_Outlet_Sales).
Product Features
Item_Identifier: Unique product ID
Item_Weight: Weight of the product
Item_Fat_Content: Fat level (low-fat or regular)
Item_Visibility: Percentage of display area allocated to the product
Item_Type: Category of the product
Item_MRP: Maximum Retail Price
Store Features
Outlet_Identifier: Unique store ID
Outlet_Establishment_Year: Year the store was established
Outlet_Size: Store size (small, medium, large)
Outlet_Location_Type: City tier classification
Outlet_Type: Type of outlet (grocery store, supermarket, etc.)
Target Variable
Item_Outlet_Sales: Sales of the product at a particular store (to be predicted)
The test dataset contains the same features as the train dataset except for Item_Outlet_Sales, which needs to be predicted.
Your model should generate a CSV file with the following columns:
- Item_Identifier: Unique product ID
- Outlet_Identifier: Unique store ID
- Item_Outlet_Sales: Predicted sales value
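A submission file with these three columns could be written along the following lines. This is only a sketch: the identifiers and values are hypothetical placeholders, and the two-decimal formatting is an assumption rather than a stated requirement.

```python
import csv
import io


def write_submission(predictions, out):
    """Write (item_id, outlet_id, predicted_sales) rows as a CSV."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Item_Identifier", "Outlet_Identifier", "Item_Outlet_Sales"])
    for item_id, outlet_id, sales in predictions:
        # Two decimals is an illustrative choice, not a documented rule.
        writer.writerow([item_id, outlet_id, f"{sales:.2f}"])


# Placeholder identifiers and values, not taken from the real dataset.
buffer = io.StringIO()
write_submission(
    [("ITEM_A", "OUTLET_1", 1234.5), ("ITEM_B", "OUTLET_2", 98.0)],
    buffer,
)
csv_text = buffer.getvalue()
```

To write to disk instead of memory, pass a file opened with `open(path, "w", newline="")` as `out`.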
For more details, visit: Analytics Vidhya BigMart Sales III
One of the largest retail chains in the world wants to use its vast data source to build an efficient forecasting model that predicts sales for each SKU in its portfolio at its 76 stores, using week-by-week historical sales data for the past 3 years. Sales and promotional information is also available for each week, by product and store.
However, no other information regarding stores and products is available. Can you still accurately forecast the sales values for every such product/SKU-store combination for the next 12 weeks? If yes, then dive right in!
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview: This dataset was collected and curated to support research on predicting real estate prices using machine learning algorithms, specifically Support Vector Regression (SVR) and Gradient Boosting Machine (GBM). The dataset includes comprehensive information on residential properties, enabling the development and evaluation of predictive models for accurate and transparent real estate appraisals.
Data Source: The data was sourced from Department of Lands and Survey real estate listings.
Features: The dataset contains the following key attributes for each property:
Area (in square meters): The total living area of the property.
Floor Number: The floor on which the property is located.
Location: Geographic coordinates or city/region where the property is situated.
Type of Apartment: The classification of the property, such as studio, one-bedroom, two-bedroom, etc.
Number of Bathrooms: The total number of bathrooms in the property.
Number of Bedrooms: The total number of bedrooms in the property.
Property Age (in years): The number of years since the property was constructed.
Property Condition: A categorical variable indicating the condition of the property (e.g., new, good, fair, needs renovation).
Proximity to Amenities: The distance to nearby amenities such as schools, hospitals, shopping centers, and public transportation.
Market Price (target variable): The actual sale price or listed price of the property.
Data Preprocessing:
Normalization: Numeric features such as area and proximity to amenities were normalized to ensure consistency and improve model performance.
Categorical Encoding: Categorical features like property condition and type of apartment were encoded using one-hot encoding or label encoding, depending on the specific model requirements.
Missing Values: Missing data points were handled using appropriate imputation techniques or by excluding records with significant missing information.
Usage: This dataset was utilized to train and test machine learning models, aiming to predict the market price of residential properties based on the provided attributes. The models developed using this dataset demonstrated improved accuracy and transparency over traditional appraisal methods.
Dataset Availability: The dataset is available for public use under the [CC BY 4.0]. Users are encouraged to cite the related publication when using the data in their research or applications.
Citation: If you use this dataset in your research, please cite the following publication: [Real Estate Decision-Making: Precision in Price Prediction through Advanced Machine Learning Algorithms].
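The normalization and categorical-encoding steps described for this dataset can be sketched as below. The description does not say which normalization was used, so min-max scaling is an assumption here. The function names and sample values are hypothetical.

```python
def min_max_scale(values):
    """Rescale numbers to [0, 1]; min-max scaling is an assumed choice."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical property records: area in square meters and condition.
areas = [50.0, 75.0, 100.0]
conditions = ["new", "good", "new"]

scaled_areas = min_max_scale(areas)
categories, encoded = one_hot(conditions)
```

In practice the minimum, maximum, and category list should be computed on the training split only and then reused on the test split, so that test records do not influence the preprocessing.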
This dataset was created by Furqan Javed
Released under Data files © Original Authors
This dataset was created by Adarsh Jha
The data scientists at BigMart have collected 2013 sales data for 1559 products across 10 stores in different cities. Also, certain attributes of each product and store have been defined. The aim is to build a predictive model and predict the sales of each product at a particular outlet.
Using this model, BigMart will try to understand the properties of products and outlets which play a key role in increasing sales.
Please note that the data may have missing values as some stores might not report all the data due to technical glitches. Hence, it will be required to treat them accordingly.
Source: (2020). Retrieved 30 June 2020, from https://datahack.analyticsvidhya.com/contest/big-mart-sales-prediction/#ProblemStatement
https://creativecommons.org/publicdomain/zero/1.0/
This dataset includes all wholesale liquor purchases by Iowa retailers since January 1, 2012. As Iowa controls liquor distribution, it provides a complete view of retail liquor sales statewide. It details orders from grocery stores, liquor stores, and convenience stores, including store locations, liquor brands, sizes, and quantities.
There are 30,000 records spread over 24 attributes in this dataset.
The 24 attributes are:
Beyond liquor sales analysis, this clean public dataset is valuable for exploring stockout prediction, retail demand forecasting, and supply chain challenges.