Market basket analysis with Apriori algorithm
The retailer wants to target customers with suggestions of itemsets that they are most likely to purchase. I was given a dataset containing a retailer's transaction data, which covers all the transactions that happened over a period of time. The retailer will use the results to grow its business and to suggest itemsets to customers, which should increase customer engagement, improve the customer experience, and help identify customer behavior. I will solve this problem using association rules, a type of unsupervised learning technique that checks for the dependency of one data item on another data item.
Association rule mining is most often used to find associations between different objects in a set, that is, frequent patterns in a transaction database. It can tell you which items customers frequently buy together, and it allows the retailer to identify relationships between the items.
Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat":
- support = P(mouse & mat) = 8/100 = 0.08
- confidence = support / P(mouse) = 0.08/0.10 = 0.80
- lift = confidence / P(mat) = 0.80/0.09 ≈ 8.9

This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
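The three measures can be checked with a few lines of Python. This is a minimal sketch that uses the standard definitions for a rule A => B (confidence divides by the antecedent's probability, lift divides the confidence by the consequent's probability). The function name `rule_metrics` and its parameters are illustrative and not part of the original analysis, which was done in R.

```python
def rule_metrics(n_total, n_a, n_b, n_both):
    """Return (support, confidence, lift) for the rule A => B.

    n_total: number of transactions (customers)
    n_a:     transactions containing A (the antecedent)
    n_b:     transactions containing B (the consequent)
    n_both:  transactions containing both A and B
    """
    support = n_both / n_total             # P(A and B)
    confidence = n_both / n_a              # P(B | A) = P(A and B) / P(A)
    lift = confidence / (n_b / n_total)    # P(B | A) / P(B)
    return support, confidence, lift


# Counts from the example: A = computer mouse, B = mouse mat
support, confidence, lift = rule_metrics(n_total=100, n_a=10, n_b=9, n_both=8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift well above 1 means the two items are bought together far more often than they would be if the purchases were independent.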
Number of Attributes: 7
https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png
First, we need to load the required libraries. I briefly describe each of them.
https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png
Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.
https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png
After that, we clean the data frame by removing missing values.
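The original step is done in R and is only visible as a screenshot. As a rough Python illustration of the same idea, the sketch below drops every row that has a missing value in a required column. The column names (`invoice`, `item`, `customer_id`) and the sample rows are hypothetical and are not taken from the dataset.

```python
# Hypothetical sample rows; the column names are assumptions for illustration.
rows = [
    {"invoice": "1001", "item": "mouse", "customer_id": "C1"},
    {"invoice": "1001", "item": None, "customer_id": "C1"},        # missing item
    {"invoice": "1002", "item": "mouse mat", "customer_id": ""},   # missing customer only
    {"invoice": "1003", "item": "keyboard", "customer_id": "C2"},
]


def drop_missing(rows, required):
    """Keep only the rows where every required column has a non-empty value."""
    def present(value):
        return value is not None and str(value).strip() != ""

    return [row for row in rows if all(present(row.get(col)) for col in required)]


# Only the columns needed to build baskets are required here.
clean = drop_missing(rows, required=["invoice", "item"])
print(len(rows), "rows before,", len(clean), "rows after")
```

Which columns count as required is a design choice: for basket analysis the invoice and item are essential, while a missing customer ID may be tolerable.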
https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png
To apply association rule mining, we need to convert the data frame into transaction data, so that all items that are bought together in one invoice will be in ...
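The conversion into transaction data can be sketched in Python as grouping item rows by invoice, assuming that each invoice corresponds to one basket. The helper names `to_transactions` and `pair_support` and the sample rows are hypothetical. The pair count is only the first counting step that an Apriori-style search would build on, not the full algorithm.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (invoice, item) rows; one row per purchased line item.
rows = [
    ("1001", "mouse"), ("1001", "mouse mat"), ("1001", "mouse"),
    ("1002", "mouse"), ("1002", "mouse mat"), ("1002", "keyboard"),
    ("1003", "keyboard"),
]


def to_transactions(rows):
    """Group items by invoice so that each invoice becomes one basket (a set)."""
    baskets = defaultdict(set)
    for invoice, item in rows:
        baskets[invoice].add(item)   # a set ignores repeated items in one invoice
    return dict(baskets)


def pair_support(transactions):
    """Return the support of every item pair that occurs in at least one basket."""
    n = len(transactions)
    counts = defaultdict(int)
    for items in transactions.values():
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair: count / n for pair, count in counts.items()}


transactions = to_transactions(rows)
support = pair_support(transactions)
print(transactions)
print(support)
```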
This dataset contains retail sales records from a superstore, including detailed information on orders, products, categories, sales, discounts, profits, customers, and regions.
It is widely used for business intelligence, data visualization, and machine learning projects. With features such as order date, ship mode, customer segment, and geographic region, the dataset is excellent for:
Sales forecasting
Profitability analysis
Market basket analysis
Customer segmentation
Data visualization practice (Tableau, Power BI, Excel, Python, R)
Inspiration:
Great dataset for learning how to build dashboards.
Commonly used in case studies for predictive analytics and decision-making.
Source: Originally inspired by a sample dataset frequently used in Tableau training and BI case studies.
https://creativecommons.org/publicdomain/zero/1.0/
This repository contains the customer purchase history of a retail store chain in the United Kingdom. The dataset records every product sold, along with the invoice number, stock code, description, quantity, invoice date, price, customer ID, and country. It is ideal for projects related to:
Market Basket Analysis
Recommendation Systems
Customer Lifetime Value & RFM Segmentation
Sales Forecasting and Time-Series Modelling
The dataset is provided in three formats.
1) online_retail_II.xlsx:
The main Excel file contains two sheets, with each sheet containing all transactions that occurred in the span of 12 months (one year).
This is the raw and original format of the dataset.
This is ideal for anyone looking to perform Exploratory Data Analysis (EDA), Market Basket Analysis, Recommendation Systems, RFM Segmentation, Customer Lifetime Value (CLV) modeling, and various Machine Learning or Business Intelligence projects.
2) raw_data.parquet:
The complete raw dataset in Parquet format, which is an efficient format for loading a large tabular dataset.
3) data_processed.parquet:
A completely cleaned dataset with Data cleaning + Feature Engineering.
All missing values have been imputed, column data cleaned, duplicates removed, descriptions cleaned, invalid transactions removed, etc.
A Notebook with all preprocessing and EDA is also provided in the Code section.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset is a cleaned and merged version of the original UCI Online Retail and Online Retail II datasets. It contains transaction data from a UK-based online retailer, covering a period from December 2009 to December 2011.
The original UCI Online Retail II dataset contains two separate sheets:
- Year 2009-2010
- Year 2010-2011
These have been merged with the original UCI Online Retail dataset to create a unified and continuous dataset.
- quantity, price, customer_id
- total_price column (quantity × price)
- is_cancelled column based on invoice format or return flag
- invoicedate formatting

| Column | Description |
|---|---|
| invoice | Invoice number (returns start with 'C') |
| stockcode | Product code |
| description | Description of product |
| quantity | Number of items purchased |
| invoicedate | Date and time of invoice |
| price | Unit price in GBP |
| customer_id | Unique identifier for each customer |
| country | Customer's country |
| is_cancelled | Boolean flag for cancelled transactions |
| total_price | Computed total (quantity × price) for each line item |
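The two derived columns can be illustrated with a short Python sketch. It assumes that a cancellation is detected only from an invoice number starting with 'C', as the table states for returns. The description also mentions a return flag, which is not modelled here. The function name `add_derived_columns` and the sample rows are hypothetical.

```python
def add_derived_columns(row):
    """Return a copy of the row with is_cancelled and total_price added."""
    out = dict(row)
    # Assumption: a cancelled or returned invoice is marked by a leading 'C'.
    out["is_cancelled"] = str(row["invoice"]).upper().startswith("C")
    # Line-item total: quantity multiplied by unit price (GBP).
    out["total_price"] = row["quantity"] * row["price"]
    return out


# Hypothetical line items, not taken from the dataset.
sale = add_derived_columns({"invoice": "489434", "quantity": 4, "price": 2.5})
cancelled = add_derived_columns({"invoice": "C489449", "quantity": -1, "price": 2.5})
print(sale)
print(cancelled)
```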
In addition to the cleaned dataset, this dataset includes complete analysis artifacts:

| File | Type | Description |
|---|---|---|
| online_retail_cleaned.csv | Data | Cleaned and merged retail transactions from 2009-2011 |
| rfm_final_score.csv | Output | Final RFM scores for each customer with segment labels |
| Retail_Data_Analysis_Dashboard.xlsx | Excel | Interactive Excel dashboard with KPIs, CLV, monthly trends |
| Retail_Data_Analysis_Dashboard.png | Image | Visual preview of the Excel dashboard |
| RFM_Segmentation.sql | SQL | SQL logic to calculate RFM scores and assign segments |
| Cohort_Analysis_on_Customer.sql | SQL | Cohort analysis based on acquisition month |
| Cohort_Analysis_on_Revenue.sql | SQL | Cohort revenue tracking over time |

These files are provided in .xlsx and .sql formats and can be used for further business analysis or modeling.
Original datasets: - UCI Online Retail II: https://archive.ics.uci.edu/ml/datasets/Online+Retail+II
This version was cleaned and merged by: Md Shah Nawaj
retail, ecommerce, customer segmentation, transactions, time series, data cleaning, rfm, python, pandas, online retail