4 datasets found
  1. Market Basket Analysis

    • kaggle.com
    zip
    Updated Dec 9, 2021
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Available download formats: zip (23875170 bytes)
    Dataset updated
    Dec 9, 2021
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a dataset containing a retailer's transaction data, which covers all the transactions that happened over a period of time. The retailer will use the results to grow in its industry and to suggest itemsets to customers, which should increase customer engagement, improve the customer experience, and help identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another.

    Introduction

    Association rule mining is most often used to find associations between different objects in a set, in particular frequent patterns in a transaction database. It can tell you which items customers frequently buy together, and it allows the retailer to identify relationships between items.

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought a Computer Mouse, 9 bought a Mat for Mouse, and 8 bought both.

    • Rule: bought Computer Mouse => bought Mat for Mouse
    • support = P(Mouse & Mat) = 8/100 = 0.08
    • confidence = support/P(Computer Mouse) = 0.08/0.10 = 0.80
    • lift = confidence/P(Mat for Mouse) = 0.80/0.09 = 8.9

    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
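The arithmetic of this example can be checked with a few lines of Python. This is only a sketch using the standard definitions for a rule "antecedent => consequent", where confidence divides the support by the probability of the antecedent (Computer Mouse) and lift divides the confidence by the probability of the consequent (Mat for Mouse). The variable names are illustrative and are not part of the dataset.

```python
# Counts taken from the example: 100 customers in total.
n_customers = 100
n_mouse = 10  # bought Computer Mouse (antecedent)
n_mat = 9     # bought Mat for Mouse (consequent)
n_both = 8    # bought both items

support = n_both / n_customers                  # P(Mouse & Mat)
confidence = support / (n_mouse / n_customers)  # P(Mat | Mouse)
lift = confidence / (n_mat / n_customers)       # confidence relative to P(Mat)

print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift well above 1 suggests that the two items are bought together far more often than they would be if purchases were independent.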

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of the rules
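The "running" and "filtering" steps of this strategy can be illustrated with a small Python sketch. The author works in R with the arules package, and that code is not shown here, so the example below is not the author's method: it is a plain-Python illustration restricted to rules between pairs of items, using the Apriori idea that a pair can only be frequent if both of its items are frequent. The baskets, the function name `pair_rules`, and the thresholds are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: one set of item names per bill.
baskets = [
    {"mouse", "mat"},
    {"mouse", "mat", "keyboard"},
    {"mouse", "keyboard"},
    {"mat"},
    {"keyboard"},
]

def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence, lift) for item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    # Apriori pruning: only items that are frequent on their own can form frequent pairs.
    frequent = {item for item, c in item_counts.items() if c / n >= min_support}
    pair_counts = Counter(
        pair
        for basket in baskets
        for pair in combinations(sorted(basket & frequent), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # filter out infrequent pairs
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                lift = confidence / (item_counts[rhs] / n)
                rules.append((lhs, rhs, support, confidence, lift))
    return rules

rules = pair_rules(baskets)
for lhs, rhs, support, confidence, lift in rules:
    print(f"{lhs} => {rhs}: support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
```

Raising `min_support` or `min_confidence` shrinks the rule set, which is the kind of filtering the strategy refers to.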

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    Image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png

    Libraries in R

    First, we need to load the required libraries. Each one is described briefly below.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a forward-pipe operator, %>%. This operator forwards a value, or the result of an expression, into the next function call or expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data-frame-like objects, both in memory and out of memory.
    • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

    Image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png

    Data Pre-processing

    Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

    Image: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
    Image: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png

    Next, we clean the data frame by removing missing values.

    Image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png
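Removing missing values can be sketched in plain Python as follows. The author's R code is not reproduced in this description, so the sample records, the helper name `drop_missing`, and the choice of which columns count as required are all assumptions made for illustration. Only the column names come from the dataset description.

```python
# Hypothetical records using a subset of the dataset's columns; None marks a missing value.
records = [
    {"BillNo": "100001", "Itemname": "Computer Mouse", "CustomerID": "10001"},
    {"BillNo": "100001", "Itemname": None, "CustomerID": "10001"},
    {"BillNo": "100002", "Itemname": "Mat for Mouse", "CustomerID": None},
]

def drop_missing(records, required):
    """Keep only the records in which every required column has a value."""
    return [
        r for r in records
        if all(r.get(col) not in (None, "") for col in required)
    ]

# Assumption: only BillNo and Itemname are needed to build baskets later on.
clean = drop_missing(records, required=("BillNo", "Itemname"))
print(len(records), "->", len(clean))
```

Requiring CustomerID as well would drop more rows, so the list of required columns is worth choosing deliberately.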

    To apply association rule mining, we need to convert the data frame into transaction data, so that all items that are bought together in one invoice will be in ...
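The description is cut off at this point. One common reading is that every bill becomes a single basket holding all of its items, and the sketch below assumes exactly that. The sample rows and the helper name `to_transactions` are hypothetical, and only the BillNo and Itemname column names come from the dataset description.

```python
from collections import defaultdict

# Hypothetical (BillNo, Itemname) rows; the real rows come from Assignment-1_Data.xlsx.
rows = [
    ("100001", "Computer Mouse"),
    ("100001", "Mat for Mouse"),
    ("100002", "Computer Mouse"),
    ("100001", "Computer Mouse"),  # repeated line on the same bill
]

def to_transactions(rows):
    """Group item names by bill number, giving one set of items per bill."""
    baskets = defaultdict(set)
    for bill_no, item in rows:
        baskets[bill_no].add(item)
    return dict(baskets)

transactions = to_transactions(rows)
for bill_no, items in sorted(transactions.items()):
    print(bill_no, sorted(items))
```

Using a set per bill means repeated lines for the same item collapse into one entry, which suits rule mining that only asks whether an item is present in a basket.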

  2. Store Sales Dataset

    • kaggle.com
    zip
    Updated Sep 22, 2025
    Nimisha Davis (2025). Store Sales Dataset [Dataset]. https://www.kaggle.com/datasets/drnimishadavis/store-sales-dataset
    Available download formats: zip (562846 bytes)
    Dataset updated
    Sep 22, 2025
    Authors
    Nimisha Davis
    Description

    This dataset contains retail sales records from a superstore, including detailed information on orders, products, categories, sales, discounts, profits, customers, and regions.

    It is widely used for business intelligence, data visualization, and machine learning projects. With features such as order date, ship mode, customer segment, and geographic region, the dataset is excellent for:

    • Sales forecasting
    • Profitability analysis
    • Market basket analysis
    • Customer segmentation
    • Data visualization practice (Tableau, Power BI, Excel, Python, R)

    Inspiration:

    Great dataset for learning how to build dashboards.

    Commonly used in case studies for predictive analytics and decision-making.

    Source: Originally inspired by a sample dataset frequently used in Tableau training and BI case studies.

  3. Retail purchase history

    • kaggle.com
    zip
    Updated Oct 29, 2025
    Koelin (2025). Retail purchase history [Dataset]. https://www.kaggle.com/datasets/koelin/retail-purchase-history
    Available download formats: zip (62508711 bytes)
    Dataset updated
    Oct 29, 2025
    Authors
    Koelin
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This repo contains the customer purchase history of a retail store chain in the United Kingdom. The dataset records every product sold, along with the invoice number, stock code, description, quantity, invoice date, price, customer ID, and country. It is ideal for projects related to:

    • 🛒 Market Basket Analysis

    • 🎯 Recommendation Systems

    • 💰 Customer Lifetime Value & RFM Segmentation

    • 🧠 Sales Forecasting and Time-Series Modelling

    The dataset is provided in three formats.

    1) online_retail_II.xlsx: The main Excel file contains two sheets, each holding all transactions that occurred over a span of 12 months (one year). This is the raw, original format of the dataset. It is ideal for anyone looking to perform Exploratory Data Analysis (EDA), Market Basket Analysis, Recommendation Systems, RFM Segmentation, Customer Lifetime Value (CLV) modeling, and various Machine Learning or Business Intelligence projects.

    2) raw_data.parquet: The complete raw dataset in Parquet format, an efficient format for loading a large tabular dataset.

    3) data_processed.parquet: A fully cleaned dataset with data cleaning and feature engineering applied. All missing values have been imputed, column data cleaned, duplicates removed, descriptions cleaned, invalid transactions removed, etc.

    A Notebook with all preprocessing and EDA is also provided in the Code section.

  4. Online_Retail_II

    • kaggle.com
    zip
    Updated Jul 2, 2025
    Shah Nawaj (2025). Online_Retail_II [Dataset]. https://www.kaggle.com/datasets/shahnawaj9/online-retail
    Available download formats: zip (71343848 bytes)
    Dataset updated
    Jul 2, 2025
    Authors
    Shah Nawaj
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Cleaned & Merged UCI Online Retail Dataset (Dec 2009 – Dec 2011)

    This dataset is a cleaned and merged version of the original UCI Online Retail and Online Retail II datasets. It contains transaction data from a UK-based online retailer, covering a period from December 2009 to December 2011.

    Description

    The original UCI Online Retail II dataset contains two separate sheets:

    • Year 2009–2010
    • Year 2010–2011

    These have been merged with the original UCI Online Retail dataset to create a unified and continuous dataset.

    Cleaning and Preprocessing Performed

    • Merged all sheets into a single dataset
    • Removed:
      • Rows with negative or zero quantity
      • Rows with negative or zero price
      • Rows with missing customer_id
    • Created:
      • total_price column (quantity × price)
      • is_cancelled column based on invoice format or return flag
    • Standardized:
      • invoicedate formatting
      • Column names and data types
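The removal and derivation steps listed above can be sketched in plain Python. This is not the uploader's actual pipeline, which is not shown: the sample rows are hypothetical, the "return flag" mentioned above is not modeled (only the 'C' invoice prefix is), and rounding total_price to two decimals is an assumption. Column names follow the dataset's own definitions.

```python
# Hypothetical raw rows using the dataset's column names.
raw = [
    {"invoice": "100001", "quantity": 12, "price": 6.95, "customer_id": "20001"},
    {"invoice": "C100002", "quantity": -1, "price": 2.50, "customer_id": "20002"},
    {"invoice": "100003", "quantity": 3, "price": 0.0, "customer_id": "20001"},
    {"invoice": "100004", "quantity": 2, "price": 1.25, "customer_id": None},
    {"invoice": "C100005", "quantity": 1, "price": 4.00, "customer_id": "20002"},
]

def clean_rows(rows):
    """Drop invalid rows, then add the is_cancelled and total_price columns."""
    cleaned = []
    for row in rows:
        # Remove rows with non-positive quantity or price, or a missing customer_id.
        if row["quantity"] <= 0 or row["price"] <= 0 or row["customer_id"] is None:
            continue
        out = dict(row)
        # Assumption: a leading 'C' in the invoice number marks a cancellation.
        out["is_cancelled"] = str(row["invoice"]).startswith("C")
        out["total_price"] = round(row["quantity"] * row["price"], 2)
        cleaned.append(out)
    return cleaned

cleaned = clean_rows(raw)
for row in cleaned:
    print(row["invoice"], row["is_cancelled"], row["total_price"])
```

Because rows with negative quantities are dropped first, a cancellation only survives in this sketch when its quantity and price are positive.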

    Column Definitions

    • invoice: Invoice number (returns start with 'C')
    • stockcode: Product code
    • description: Description of product
    • quantity: Number of items purchased
    • invoicedate: Date and time of invoice
    • price: Unit price in GBP
    • customer_id: Unique identifier for each customer
    • country: Customer's country
    • is_cancelled: Boolean flag for cancelled transactions
    • total_price: Computed total (quantity × price) for each line item

    Included Files and Descriptions

    • online_retail_cleaned.csv (Data): Cleaned and merged retail transactions from 2009–2011
    • rfm_final_score.csv (Output): Final RFM scores for each customer with segment labels
    • Retail_Data_Analysis_Dashboard.xlsx (Excel): Interactive Excel dashboard with KPIs, CLV, monthly trends
    • Retail_Data_Analysis_Dashboard.png (Image): Visual preview of the Excel dashboard
    • RFM_Segmentation.sql (SQL): SQL logic to calculate RFM scores and assign segments
    • Cohort_Analysis_on_Customer.sql (SQL): Cohort analysis based on acquisition month
    • Cohort_Analysis_on_Revenue.sql (SQL): Cohort revenue tracking over time

    Dataset Summary

    • Time range: December 2009 – December 2011
    • Data combined from all three sheets (original and Online Retail II)
    • Most customers are from the United Kingdom
    • Fully cleaned and ready for use in analysis or modeling

    Applications

    • Market basket analysis
    • RFM segmentation
    • Cohort and retention analysis
    • Customer lifetime value modeling
    • Time series forecasting

    Included Analysis & Dashboards

    In addition to the cleaned dataset, this dataset includes complete analysis artifacts:

    1. Excel Dashboard

    • Summary metrics: Total Revenue, Orders, Customers, AOV
    • Turnover by year
    • Customer Lifetime Value segmentation (High, Medium, Low)
    • Monthly customer acquisition and churn trend
    • Country-wise revenue
    • Key business recommendations

    2. SQL-Based RFM Segmentation

    • RFM scores (1–5 scale)
    • Segment grouping (e.g., Champions, At Risk, Loyal Customers)
    • Monetary value distributions
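For readers unfamiliar with RFM, the sketch below shows one way recency, frequency, and monetary values might be turned into scores from 1 to 5. It is not a translation of RFM_Segmentation.sql, whose logic is not shown in this description. The sample line items, the snapshot date, the rank-based bucketing, and the helper names `rfm` and `score` are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical line items: (customer_id, invoice, invoice date, total_price).
lines = [
    ("A", "1", date(2011, 12, 1), 50.0),
    ("A", "2", date(2011, 11, 1), 30.0),
    ("B", "3", date(2011, 6, 1), 20.0),
    ("C", "4", date(2010, 1, 15), 500.0),
    ("D", "5", date(2011, 9, 1), 75.0),
    ("E", "6", date(2011, 3, 1), 10.0),
]
snapshot = date(2011, 12, 10)  # assumed reference date for measuring recency

def rfm(lines, snapshot):
    """Return {customer_id: (recency_days, n_invoices, monetary)}."""
    last, invoices, money = {}, {}, {}
    for cust, inv, day, amount in lines:
        last[cust] = max(last.get(cust, day), day)
        invoices.setdefault(cust, set()).add(inv)
        money[cust] = money.get(cust, 0.0) + amount
    return {c: ((snapshot - last[c]).days, len(invoices[c]), money[c]) for c in last}

def score(values, higher_is_better=True, bins=5):
    """Rank-based score from 1 to `bins`; ties are broken by input order."""
    order = sorted(values, key=values.get, reverse=not higher_is_better)
    return {c: 1 + (i * bins) // len(order) for i, c in enumerate(order)}

table = rfm(lines, snapshot)
r = score({c: v[0] for c, v in table.items()}, higher_is_better=False)
f = score({c: v[1] for c, v in table.items()})
m = score({c: v[2] for c, v in table.items()})
for c in sorted(table):
    print(c, table[c], f"R{r[c]} F{f[c]} M{m[c]}")
```

A lower recency is treated as better, so the most recent buyer receives the highest R score. With fewer customers than bins, the top score would not be reached.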

    3. SQL-Based Cohort Analysis

    • Monthly cohorts based on acquisition date
    • Retention matrix for month-over-month analysis
    • Supports churn and lifecycle evaluation

    These files are provided in .xlsx and .sql formats and can be used for further business analysis or modeling.

    Source

    Original datasets:

    • UCI Online Retail II: https://archive.ics.uci.edu/ml/datasets/Online+Retail+II

    This version was cleaned and merged by: Md Shah Nawaj

    Tags

    retail, ecommerce, customer segmentation, transactions, time series, data cleaning, rfm, python, pandas, online retail


Market Basket Analysis

Analyzing Consumer Behaviour Using MBA Association Rule Mining

2 scholarly articles cite this dataset.