10 datasets found
  1. Market Basket Analysis

    • kaggle.com
    zip
    Updated Dec 9, 2021
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Available download formats: zip (23875170 bytes)
    Dataset updated
    Dec 9, 2021
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a retailer's dataset; the transaction data covers all the transactions that happened over a period of time. The retailer will use the results to grow in its industry and to suggest itemsets to customers, which should increase customer engagement, improve the customer experience, and help identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another.

    Introduction

    Association rule mining is most often used to find associations between different objects in a set, that is, frequent patterns in a transaction database. It can tell you which items customers frequently buy together, and it allows the retailer to identify relationships between the items.

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat":

    • support = P(mouse & mat) = 8/100 = 0.08
    • confidence = support / P(mouse) = 0.08/0.10 = 0.80
    • lift = confidence / P(mat) = 0.80/0.09 = 8.9

    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
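
The three measures for a rule A => B follow the usual definitions: support is P(A & B), confidence is P(A & B) / P(A), and lift is confidence / P(B). A minimal Python sketch of that arithmetic is shown here; the function name `rule_metrics` and its parameters are illustrative and do not come from the dataset or from any library.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Return (support, confidence, lift) for the rule antecedent => consequent."""
    support = n_both / n_total                     # P(A & B)
    confidence = n_both / n_antecedent             # P(B | A) = P(A & B) / P(A)
    lift = confidence / (n_consequent / n_total)   # P(B | A) / P(B)
    return support, confidence, lift


# Counts from the example: 100 customers, 10 bought a mouse,
# 9 bought a mouse mat, 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift well above 1 suggests the two items are bought together far more often than they would be if purchases were independent.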

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data, so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of the rules
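
The middle steps of this strategy (running, exploring, and filtering rules) can be sketched in a few lines of Python. This is not the Apriori algorithm itself, and it is not the author's R workflow: it counts item pairs by brute force without candidate pruning, and the function name `mine_pair_rules`, the thresholds, and the sample baskets are all made up for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Find rules A => B between single items using brute-force pair counting."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n
        if support < min_support:
            continue  # filter: the pair is not frequent enough
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / item_counts[lhs]
            if confidence >= min_confidence:
                lift = confidence / (item_counts[rhs] / n)
                rules.append((lhs, rhs, support, confidence, lift))
    # Explore the strongest rules first.
    return sorted(rules, key=lambda r: r[4], reverse=True)


# Hypothetical baskets, one list per transaction.
transactions = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "milk"],
    ["bread", "butter", "jam"],
]
for lhs, rhs, support, confidence, lift in mine_pair_rules(transactions):
    print(f"{lhs} => {rhs}: support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
```

On a dataset with hundreds of thousands of rows, a dedicated implementation with candidate pruning would be the more realistic choice; the sketch only shows how the thresholds act as filters.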

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    Image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png

    Libraries in R

    First, we need to load the required libraries. Each one is described briefly below.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
    • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

    Image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png

    Data Pre-processing

    Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

    Image: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
    Image: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png

    Next, we clean the data frame by removing missing values.

    Image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png

    To apply association rule mining, we need to convert the data frame into transaction data, so that all items that are bought together in one invoice will be in ...
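
The cleaning and transformation steps described here (dropping missing values, then grouping line items by invoice) might look roughly like the following in Python. This is a sketch and not the author's R code: the column names `BillNo` and `Itemname` come from the dataset description, while the function name `rows_to_transactions` and the sample rows are invented for illustration.

```python
from collections import defaultdict


def rows_to_transactions(rows):
    """Group item names by bill number, skipping rows with missing values."""
    baskets = defaultdict(set)
    for row in rows:
        bill, item = row.get("BillNo"), row.get("Itemname")
        if bill is None or not item:
            continue  # drop rows with a missing bill number or item name
        baskets[bill].add(item)
    return {bill: sorted(items) for bill, items in baskets.items()}


# Made-up rows for illustration; real rows would be read from the .xlsx file.
rows = [
    {"BillNo": 536365, "Itemname": "WHITE MUG"},
    {"BillNo": 536365, "Itemname": "RED TEAPOT"},
    {"BillNo": 536366, "Itemname": "WHITE MUG"},
    {"BillNo": 536366, "Itemname": None},
]
print(rows_to_transactions(rows))
```

Each resulting basket is one transaction, which is the shape that association rule algorithms expect as input.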

  2. Market basket analysis

    • kaggle.com
    zip
    Updated Feb 17, 2024
    Boopathi M (2024). Market basket analysis [Dataset]. https://www.kaggle.com/datasets/boopathi09945/market-basket-analysis
    Available download formats: zip (72860 bytes)
    Dataset updated
    Feb 17, 2024
    Authors
    Boopathi M
    Description

    Explore market basket analysis with Python as we uncover hidden patterns and relationships within transactional data. Discover how algorithms like Apriori can reveal valuable insights into customer behavior, product associations, and purchasing trends. Explore the power of data-driven decision-making in retail, marketing, and beyond.

  3. Retail Transactions Dataset

    • kaggle.com
    zip
    Updated May 18, 2024
    Prasad Patil (2024). Retail Transactions Dataset [Dataset]. https://www.kaggle.com/datasets/prasad22/retail-transactions-dataset/code
    Available download formats: zip (37330179 bytes)
    Dataset updated
    May 18, 2024
    Authors
    Prasad Patil
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset was created to simulate a market basket dataset, providing insights into customer purchasing behavior and store operations. The dataset facilitates market basket analysis, customer segmentation, and other retail analytics tasks. Here's more information about the context and inspiration behind this dataset:

    Context:

    Retail businesses, from supermarkets to convenience stores, are constantly seeking ways to better understand their customers and improve their operations. Market basket analysis, a technique used in retail analytics, explores customer purchase patterns to uncover associations between products, identify trends, and optimize pricing and promotions. Customer segmentation allows businesses to tailor their offerings to specific groups, enhancing the customer experience.

    Inspiration:

    The inspiration for this dataset comes from the need for accessible and customizable market basket datasets. While real-world retail data is sensitive and often restricted, synthetic datasets offer a safe and versatile alternative. Researchers, data scientists, and analysts can use this dataset to develop and test algorithms, models, and analytical tools.

    Dataset Information:

    The columns provide information about the transactions, customers, products, and purchasing behavior, making the dataset suitable for various analyses, including market basket analysis and customer segmentation. Here's a brief explanation of each column in the Dataset:

    • Transaction_ID: A unique identifier for each transaction, represented as a 10-digit number. This column is used to uniquely identify each purchase.
    • Date: The date and time when the transaction occurred. It records the timestamp of each purchase.
    • Customer_Name: The name of the customer who made the purchase. It provides information about the customer's identity.
    • Product: A list of products purchased in the transaction. It includes the names of the products bought.
    • Total_Items: The total number of items purchased in the transaction. It represents the quantity of products bought.
    • Total_Cost: The total cost of the purchase, in currency. It represents the financial value of the transaction.
    • Payment_Method: The method used for payment in the transaction, such as credit card, debit card, cash, or mobile payment.
    • City: The city where the purchase took place. It indicates the location of the transaction.
    • Store_Type: The type of store where the purchase was made, such as a supermarket, convenience store, department store, etc.
    • Discount_Applied: A binary indicator (True/False) representing whether a discount was applied to the transaction.
    • Customer_Category: A category representing the customer's background or age group.
    • Season: The season in which the purchase occurred, such as spring, summer, fall, or winter.
    • Promotion: The type of promotion applied to the transaction, such as "None," "BOGO (Buy One Get One)," or "Discount on Selected Items."
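
Because the Product column holds a list per transaction, it usually has to be parsed before any basket analysis. The Python sketch below assumes each cell is stored as a Python-style list literal such as "['Milk', 'Bread']"; that storage format is an assumption, not something the description states, and the helper name `parse_products` and the sample cells are hypothetical.

```python
import ast
from collections import Counter
from itertools import combinations


def parse_products(cell):
    """Parse a cell such as "['Milk', 'Bread']" into a list of product names."""
    products = ast.literal_eval(cell)
    if not isinstance(products, list):
        raise ValueError(f"expected a list, got {type(products).__name__}")
    return [str(p) for p in products]


# Made-up cells standing in for the Product column.
cells = ["['Milk', 'Bread']", "['Bread', 'Eggs', 'Milk']", "['Eggs']"]

# Count how often each pair of products appears in the same transaction.
pair_counts = Counter()
for cell in cells:
    pair_counts.update(combinations(sorted(set(parse_products(cell))), 2))
print(pair_counts.most_common(1))
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for text read from a file.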

    Use Cases:

    • Market Basket Analysis: Discover associations between products and uncover buying patterns.
    • Customer Segmentation: Group customers based on purchasing behavior.
    • Pricing Optimization: Optimize pricing strategies and identify opportunities for discounts and promotions.
    • Retail Analytics: Analyze store performance and customer trends.

    Note: This dataset is entirely synthetic and was generated using the Python Faker library, which means it doesn't contain real customer data. It's designed for educational and research purposes.

  4. Mall Customer Segmentation Data

    • kaggle.com
    Updated Aug 11, 2018
    + more versions
    Vijay Choudhary (2018). Mall Customer Segmentation Data [Dataset]. https://www.kaggle.com/datasets/vjchoudhary7/customer-segmentation-tutorial-in-python/discussion?sort=undefined
    Available download formats: Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 11, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Vijay Choudhary
    Description

    Context

    This dataset was created only for learning customer segmentation concepts, also known as market basket analysis. I will demonstrate this using an unsupervised ML technique (the KMeans clustering algorithm) in its simplest form.

    Content

    You own a supermarket mall and, through membership cards, you have some basic data about your customers, such as customer ID, age, gender, annual income, and spending score. The spending score is something you assign to each customer based on parameters you define, such as customer behavior and purchasing data.

    Problem Statement

    You own the mall and want to understand which customers can be converted easily [target customers], so that this insight can be given to the marketing team to plan its strategy accordingly.

    Acknowledgements

    From Udemy's Machine Learning A-Z course.

    I am new to the data science field and want to share my knowledge with others.

    https://github.com/SteffiPeTaffy/machineLearningAZ/blob/master/Machine%20Learning%20A-Z%20Template%20Folder/Part%204%20-%20Clustering/Section%2025%20-%20Hierarchical%20Clustering/Mall_Customers.csv

    Inspiration

    By the end of this case study, you will be able to answer the questions below:

    1. How to achieve customer segmentation using a machine learning algorithm (KMeans clustering) in Python in the simplest way.
    2. Who your target customers are, with whom you can start a marketing strategy [easy to convert].
    3. How the marketing strategy works in the real world.
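
To make the KMeans idea concrete, here is a bare-bones Python sketch that clusters customers on two features, annual income and spending score. It is an illustration under stated assumptions: the customer points are invented, the centroids are seeded with the first k points (a simplification chosen for reproducibility), and the function name `kmeans` is not from the course or from any library.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on 2-D points; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids


# Hypothetical (annual income in k$, spending score) pairs.
customers = [(15, 39), (90, 85), (16, 35), (88, 90), (20, 30), (95, 80)]
labels, centroids = kmeans(customers, k=2)
print(labels)
print(centroids)
```

With real data, a library implementation with a more careful initialization would normally be preferred; the sketch only shows the assignment and update steps that the algorithm alternates between.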

  5. Online Retail & E-Commerce Dataset

    • kaggle.com
    zip
    Updated Mar 20, 2025
    Ertuğrul EŞOL (2025). Online Retail & E-Commerce Dataset [Dataset]. https://www.kaggle.com/datasets/ertugrulesol/online-retail-data
    Available download formats: zip (26067 bytes)
    Dataset updated
    Mar 20, 2025
    Authors
    Ertuğrul EŞOL
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Overview:

    This dataset contains 1000 rows of synthetic online retail sales data, mimicking transactions from an e-commerce platform. It includes information about customer demographics, product details, purchase history, and (optional) reviews. This dataset is suitable for a variety of data analysis, data visualization and machine learning tasks, including but not limited to: customer segmentation, product recommendation, sales forecasting, market basket analysis, and exploring general e-commerce trends. The data was generated using the Python Faker library, ensuring realistic values and distributions, while maintaining no privacy concerns as it contains no real customer information.

    Data Source:

    This dataset is entirely synthetic. It was generated using the Python Faker library and does not represent any real individuals or transactions.

    Data Content:

    Column Name | Data Type | Description
    customer_id | Integer | Unique customer identifier (ranging from 10000 to 99999)
    order_date | Date | Order date (a random date within the last year)
    product_id | Integer | Product identifier (ranging from 100 to 999)
    category_id | Integer | Product category identifier (10, 20, 30, 40, or 50)
    category_name | String | Product category name (Electronics, Fashion, Home & Living, Books & Stationery, Sports & Outdoors)
    product_name | String | Product name (randomly selected from a list of products within the corresponding category)
    quantity | Integer | Quantity of the product ordered (ranging from 1 to 5)
    price | Float | Unit price of the product (ranging from 10.00 to 500.00, with two decimal places)
    payment_method | String | Payment method used (Credit Card, Bank Transfer, Cash on Delivery)
    city | String | Customer's city (generated using Faker's city() method, so the locations will depend on the Faker locale you used)
    review_score | Integer | Customer's product rating (ranging from 1 to 5, or None with a 20% probability)
    gender | String | Customer's gender (M/F, or None with a 10% probability)
    age | Integer | Customer's age (ranging from 18 to 75)

    Potential Use Cases (Inspiration):

    Customer Segmentation: Group customers based on demographics, purchasing behavior, and preferences.

    Product Recommendation: Build a recommendation system to suggest products to customers based on their past purchases and browsing history.

    Sales Forecasting: Predict future sales based on historical trends.

    Market Basket Analysis: Identify products that are frequently purchased together.

    Price Optimization: Analyze the relationship between price and demand.

    Geographic Analysis: Explore sales patterns across different cities.

    Time Series Analysis: Investigate sales trends over time.

    Educational Purposes: Great for practicing data cleaning, EDA, feature engineering, and modeling.

  6. Store Sales Dataset

    • kaggle.com
    zip
    Updated Sep 22, 2025
    Nimisha Davis (2025). Store Sales Dataset [Dataset]. https://www.kaggle.com/datasets/drnimishadavis/store-sales-dataset
    Available download formats: zip (562846 bytes)
    Dataset updated
    Sep 22, 2025
    Authors
    Nimisha Davis
    Description

    This dataset contains retail sales records from a superstore, including detailed information on orders, products, categories, sales, discounts, profits, customers, and regions.

    It is widely used for business intelligence, data visualization, and machine learning projects. With features such as order date, ship mode, customer segment, and geographic region, the dataset is excellent for:

    Sales forecasting

    Profitability analysis

    Market basket analysis

    Customer segmentation

    Data visualization practice (Tableau, Power BI, Excel, Python, R)

    Inspiration:

    Great dataset for learning how to build dashboards.

    Commonly used in case studies for predictive analytics and decision-making.

    Source: Originally inspired by a sample dataset frequently used in Tableau training and BI case studies.

  7. E-commerce_dataset

    • kaggle.com
    zip
    Updated Nov 16, 2025
    Abhay Ayare (2025). E-commerce_dataset [Dataset]. https://www.kaggle.com/datasets/abhayayare/e-commerce-dataset
    Available download formats: zip (644123 bytes)
    Dataset updated
    Nov 16, 2025
    Authors
    Abhay Ayare
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    E-commerce_dataset

    This dataset is a synthetic yet realistic E-commerce retail dataset generated programmatically using Python (Faker + NumPy + Pandas).
    It is designed to closely mimic real-world online shopping behavior, user patterns, product interactions, seasonal trends, and marketplace events.
    
    

    You can use this dataset for:

    Machine Learning & Deep Learning
    Recommender Systems
    Customer Segmentation
    Sales Forecasting
    A/B Testing
    E-commerce Behaviour Analysis
    Data Cleaning / Feature Engineering Practice
    SQL practice
    

    📁 Dataset Contents

    The dataset contains 6 CSV files:

    File | Rows | Description
    users.csv | ~10,000 | User profiles, demographics & signup info
    products.csv | ~2,000 | Product catalog with rating and pricing
    orders.csv | ~20,000 | Order-level transactions
    order_items.csv | ~60,000 | Items purchased per order
    reviews.csv | ~15,000 | Customer-written product reviews
    events.csv | ~80,000 | User event logs: view, cart, wishlist, purchase

    🧬 Data Dictionary

    1. Users (users.csv)
    Column | Description
    user_id | Unique user identifier
    name | Full customer name
    email | Email (synthetic, no real emails)
    gender | Male / Female / Other
    city | City of residence
    signup_date | Account creation date

    2. Products (products.csv)
    Column | Description
    product_id | Unique product identifier
    product_name | Product title
    category | Electronics, Clothing, Beauty, Home, Sports, etc.
    price | Actual selling price
    rating | Average product rating

    3. Orders (orders.csv)
    Column | Description
    order_id | Unique order identifier
    user_id | User who placed the order
    order_date | Timestamp of the order
    order_status | Completed / Cancelled / Returned
    total_amount | Total order value

    4. Order Items (order_items.csv)
    Column | Description
    order_item_id | Unique identifier
    order_id | Associated order
    product_id | Purchased product
    quantity | Quantity purchased
    item_price | Price per unit

    5. Reviews (reviews.csv)
    Column | Description
    review_id | Unique review identifier
    user_id | User who submitted review
    product_id | Reviewed product
    rating | 1–5 star rating
    review_text | Short synthetic review
    review_date | Submission date

    6. Events (events.csv)
    Column | Description
    event_id | Unique event identifier
    user_id | User performing event
    product_id | Viewed/added/purchased product
    event_type | view/cart/wishlist/purchase
    event_timestamp | Timestamp of event
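
Since the items of an order live in order_items.csv and the product names in products.csv, a basket per order can be assembled by joining the two on `product_id`. The Python sketch below uses the column names from the data dictionary; the sample rows, the in-memory CSV strings, and the function name `build_baskets` are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Tiny made-up excerpts using the column names from the data dictionary.
PRODUCTS_CSV = """product_id,product_name,category,price,rating
1,Yoga Mat,Sports,25.0,4.5
2,Water Bottle,Sports,10.0,4.2
3,Desk Lamp,Home,30.0,4.0
"""
ORDER_ITEMS_CSV = """order_item_id,order_id,product_id,quantity,item_price
1,100,1,1,25.0
2,100,2,2,10.0
3,101,3,1,30.0
"""


def build_baskets(products_file, order_items_file):
    """Map each order_id to the list of product names bought in that order."""
    names = {
        row["product_id"]: row["product_name"]
        for row in csv.DictReader(products_file)
    }
    baskets = defaultdict(list)
    for row in csv.DictReader(order_items_file):
        baskets[row["order_id"]].append(names[row["product_id"]])
    return dict(baskets)


baskets = build_baskets(io.StringIO(PRODUCTS_CSV), io.StringIO(ORDER_ITEMS_CSV))
print(baskets)
```

With the real files, the `io.StringIO` objects would be replaced by files opened with `open(..., newline="")`.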
    

    🧠 Possible Use Cases (Ideas & Projects)

    🔍 Machine Learning

    Customer churn prediction
    Review sentiment analysis (NLP)
    Recommendation engines
    Price optimization models
    Demand forecasting (Time-series)
    

    📦 Business Analytics

    Market basket analysis
    RFM segmentation
    Cohort analysis
    Funnel conversion tracking
    A/B testing simulations
    

    🧮 SQL Practice

    Joins
    Window functions
    Aggregations
    CTE-based funnels
    Complex queries
    

    🛠 How the Dataset Was Generated

    The dataset was generated entirely in Python using:

    Faker for realistic user and review generation
    NumPy for probability-based event modeling
    Pandas for data processing
    

    Custom logic for:

    demand variation
    user behavior simulation
    return/cancel probabilities
    seasonal order timestamp distribution
    The dataset does not include any real personal data.
    Everything is generated synthetically.
    

    ⚠️ License

    This dataset is released under CC BY 4.0 — free to use for:
    Research
    Education
    Commercial projects
    Kaggle competitions
    Machine learning pipelines
    Just provide attribution.
    

    ⭐ If you found this dataset helpful, please:

    Upvote the dataset
    Leave a comment
    Share your notebooks using it
    
  8. Mall_CustomerData_with_Nulls

    • kaggle.com
    zip
    Updated Nov 21, 2022
    panda (2022). Mall_CustomerData_with_Nulls [Dataset]. https://www.kaggle.com/datasets/jangid6/mall-customerdata-with-nulls
    Available download formats: zip (1603 bytes)
    Dataset updated
    Nov 21, 2022
    Authors
    panda
    Description

    This is an extension of the public dataset https://www.kaggle.com/datasets/vjchoudhary7/customer-segmentation-tutorial-in-python. Changelog: I have explicitly added NaN/null values to the Annual Income and Spending Score columns.

    About Dataset

    Context

    This dataset was created only for learning customer segmentation concepts, also known as market basket analysis. I will demonstrate this using an unsupervised ML technique (the KMeans clustering algorithm) in its simplest form.

    Content

    You own a supermarket mall and, through membership cards, you have some basic data about your customers, such as customer ID, age, gender, annual income, and spending score. The spending score is something you assign to each customer based on parameters you define, such as customer behavior and purchasing data.

    Problem Statement

    You own the mall and want to understand which customers can be converted easily [target customers], so that this insight can be given to the marketing team to plan its strategy accordingly.

    Acknowledgements

    From Udemy's Machine Learning A-Z course.

    I am new to the data science field and want to share my knowledge with others.

    https://github.com/SteffiPeTaffy/machineLearningAZ/blob/master/Machine%20Learning%20A-Z%20Template%20Folder/Part%204%20-%20Clustering/Section%2025%20-%20Hierarchical%20Clustering/Mall_Customers.csv

    Inspiration

    By the end of this case study, you will be able to answer the questions below:

    1. How to achieve customer segmentation using a machine learning algorithm (KMeans clustering) in Python in the simplest way.
    2. Who your target customers are, with whom you can start a marketing strategy [easy to convert].
    3. How the marketing strategy works in the real world.

  9. 👟 Sneakers & Streetwear Sales (2022)

    • kaggle.com
    zip
    Updated Jul 29, 2025
    Atharva Soundankar (2025). 👟 Sneakers & Streetwear Sales (2022) [Dataset]. https://www.kaggle.com/datasets/atharvasoundankar/sneakers-and-streetwear-sales-2022
    Available download formats: zip (6320 bytes)
    Dataset updated
    Jul 29, 2025
    Authors
    Atharva Soundankar
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    👟 Sneakers & Streetwear Sales Data (2022)

    This dataset offers a detailed snapshot of global retail sales from the fast-growing sneaker and streetwear market between January and August 2022. It captures essential sales insights from multiple countries, spanning brands like Nike, Adidas, Supreme, Yeezy, and Off-White, along with high-demand categories such as sneakers, hoodies, joggers, and graphic tees.

    The data has been carefully simulated to mirror real-world patterns in retail e-commerce — including seasonality, gender preferences, price bands, and payment behaviors. Each record represents a successful transaction, making this dataset ideal for sales analytics, business intelligence projects, and predictive modeling.

    📦 What’s Inside?

    • 500 clean, non-null, and unique sales records
    • Covering 10 countries and 30+ product names
    • Fields include: Order Date, Country, Gender, Product, Category, Quantity Sold, Unit Price, Total Sale, and Payment Method

    💡 Why This Dataset?

    Sneakers and streetwear aren't just fashion — they're a data-rich ecosystem of global trends, influencer impact, resale value, and cultural relevance. Whether you're working on:

    • EDA & trend visualization
    • Time-series forecasting
    • Market basket analysis
    • Customer segmentation
    • Sales dashboards

    … this dataset gives you everything you need to explore, model, and tell a data story.

    ✅ Key Features

    • Realistic sales simulation for 2022
    • Useful for beginners and advanced practitioners alike
    • Cleaned and curated — ready for analysis, dashboards, and ML
    • Ideal for Power BI, Tableau, Python (Pandas, Seaborn, Plotly), and ML libraries
  10. Online Retail Transaction Records

    • kaggle.com
    zip
    Updated Dec 21, 2023
    The Devastator (2023). Online Retail Transaction Records [Dataset]. https://www.kaggle.com/datasets/thedevastator/online-retail-transaction-records
    Available download formats: zip (9098240 bytes)
    Dataset updated
    Dec 21, 2023
    Authors
    The Devastator
    Description

    Online Retail Transaction Records

    Online Retail Sales: Product Transactions and Customer Details

    By Ali Prasla [source]

    About this dataset

    The Online Retail Sales Dataset, often referred to as the Online Retail.csv file, is an extensive and comprehensive collection of data points relating to e-commerce transactions. This dataset provides a detailed view of sales activities within the online retail sector, covering numerous essential attributes necessary for a quantitative understanding of consumer behavior and the overall business performance.

    One of the key elements covered in this dataset is 'InvoiceNo', which is a unique identifier for each transaction taking place in this retail environment. Given its uniqueness, it serves as a primary key for distinguishing individual transactions. It's worthwhile to note that these Invoice Numbers are numerical values.

    Another important attribute included here is 'StockCode'. Each product listed or sold on this online retail platform has been assigned its own unique identification code, or StockCode. These codes are also numerical values that offer another way to classify items and distinguish one from another.

    For further understanding, every product comes with a basic description noted under the 'Description' column. In textual form, these descriptions provide insights into what exactly each product item entails. Aside from aiding identification efforts, they can potentially open avenues for text-based analysis such as sentiment analysis or keyword flagging based on product trends.

    Moving on to the transactions themselves, we have two crucial columns: 'Quantity' and 'UnitPrice'. As their names suggest, these show, respectively, how many units of an item were sold per transaction and at what price each unit was sold.

    Further adding detail to our transactions information comes 'InvoiceDate', which records when each separate purchase occurred down to accurate date & time records. This data can be pivotal in recognizing sales patterns throughout different periods or predicting future trends based on historical timing behavior.

    Last but not least comes our global indicator: the 'Country' column specifies the countries where the customers who regularly make purchases on this online platform reside. This gives us insight into the geographical dispersion of the user base across various countries, and potentially into regional preferences or global market segmentation.

    With such a wealth of detailed transaction records and customer information, the Online Retail.csv dataset stands as an invaluable tool for those looking to delve deep into online retail sales data analysis. The possibilities with this dataset are vast, ranging from shaping efficient marketing strategies based on geographical data to predicting sales and growth metrics using historical behavior, and much more.

    How to use the dataset

    Here's how to make best use of this dataset:

    Getting Started

    Before you start analyzing your data, you'll have to load it into a statistical environment such as Python (using the pandas library) or R. The dataset is saved in .csv format, which most data manipulation software can read easily.

    Understand The Fields

    • InvoiceNo: Each transaction made has an associated unique numerical identifier called InvoiceNo. Consider it like a receipt code - these allow for tracking individual transactions.

    • StockCode: To identify each product uniquely during analysis, refer to each StockCode value which is essentially a product identification code.

    • Description: A brief textual description about each product that can be invaluable when dealing with categories for market-basket type analysis.

    • Quantity: Each row lists out how many units of a particular item were involved in a single transaction - watch out for very large values as they might represent bulk orders.
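
A first pass over these fields could be sketched in Python as follows: read the CSV, total Quantity times UnitPrice per country, and flag unusually large quantities. The sample rows, the `bulk_threshold` value, the timestamp format, and the function name `summarize` are all assumptions made for illustration; only the column names come from the description.

```python
import csv
import io
from collections import defaultdict

# Made-up rows using the column names listed in the description.
SAMPLE = """InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,Country
536365,85123,WHITE MUG,6,2010-12-01 08:26,2.50,United Kingdom
536366,22633,HAND WARMER,1200,2010-12-01 08:28,1.00,United Kingdom
536367,84879,BIRD ORNAMENT,4,2010-12-01 08:34,1.50,France
"""


def summarize(csv_file, bulk_threshold=1000):
    """Return revenue per country and the invoices with very large quantities."""
    revenue = defaultdict(float)
    bulk_invoices = []
    for row in csv.DictReader(csv_file):
        quantity = int(row["Quantity"])
        revenue[row["Country"]] += quantity * float(row["UnitPrice"])
        if quantity >= bulk_threshold:
            bulk_invoices.append(row["InvoiceNo"])  # possible bulk order
    return dict(revenue), bulk_invoices


revenue, bulk_invoices = summarize(io.StringIO(SAMPLE))
print(revenue)
print(bulk_invoices)
```

Whether a flagged row is a genuine bulk order or a data-entry problem is a judgment call that depends on the product, so the threshold should be tuned after looking at the actual distribution of quantities.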



Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis

Market Basket Analysis

Analyzing Consumer Behaviour Using MBA Association Rule Mining

Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
Available download formats
zip (23875170 bytes)
Dataset updated
Dec 9, 2021
Authors
Aslan Ahmedov
Description

Market Basket Analysis

Market basket analysis with Apriori algorithm

The retailer wants to target customers with suggestions on the itemsets that a customer is most likely to purchase. I was given a dataset containing a retailer's data; the transaction data covers all the transactions that happened over a period of time. The retailer will use the results to grow in the industry and to offer customers itemset suggestions, so that we can increase customer engagement, improve customer experience, and identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another.

Introduction

Association rule mining is most often used when you want to find associations between different objects in a set. It works by finding frequent patterns in a transaction database. It can tell you which items customers frequently buy together, and it allows the retailer to identify relationships between the items.

An Example of Association Rules

Assume there are 100 customers: 10 of them bought a Computer Mouse, 9 bought a Mat for Mouse, and 8 bought both.

  • Rule: bought Computer Mouse => bought Mat for Mouse
  • support = P(Mouse & Mat) = 8/100 = 0.08
  • confidence = support/P(Computer Mouse) = 0.08/0.10 = 0.80
  • lift = confidence/P(Mat for Mouse) = 0.80/0.09 = 8.9

This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
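
The counts from this example can be checked with a few lines of Python. The sketch uses the standard definitions for a rule A => B: support = P(A and B), confidence = P(A and B) / P(A), and lift = confidence / P(B). The function name rule_metrics is hypothetical.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Return (support, confidence, lift) for the rule antecedent => consequent."""
    support = n_both / n_total                    # P(A and B)
    confidence = n_both / n_antecedent            # P(B given A)
    lift = confidence / (n_consequent / n_total)  # confidence relative to P(B)
    return support, confidence, lift

# 100 customers, 10 bought the mouse, 9 bought the mat, 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# prints: support=0.08 confidence=0.80 lift=8.89
```

A lift well above 1 means the two items are bought together far more often than they would be if the purchases were independent.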

Strategy

  • Data Import
  • Data Understanding and Exploration
  • Transformation of the data, so that it is ready to be consumed by the association rules algorithm
  • Running association rules
  • Exploring the rules generated
  • Filtering the generated rules
  • Visualization of Rules
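
The middle steps of this strategy (running, exploring, and filtering rules) can be sketched in plain Python. This is not the Apriori implementation from the arules package that the author uses in R; it is a brute-force illustration restricted to item pairs, run on hypothetical toy transactions, with made-up thresholds and function names.

```python
from itertools import combinations

# Hypothetical toy transactions (one set of item names per bill).
transactions = [
    {"mouse", "mat"},
    {"mouse", "mat", "keyboard"},
    {"mouse", "keyboard"},
    {"mat"},
    {"keyboard"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def mine_rules(min_support, min_confidence):
    """Enumerate item pairs and keep rules A => B that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue  # infrequent pair: no rule is generated from it
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs})
            if confidence >= min_confidence:
                lift = confidence / support({rhs})
                rules.append((lhs, rhs, pair_support, confidence, lift))
    return rules

# Filter the rules by thresholds, then explore them sorted by lift.
rules = mine_rules(min_support=0.3, min_confidence=0.6)
for lhs, rhs, s, c, lift_value in sorted(rules, key=lambda r: -r[4]):
    print(f"{lhs} => {rhs}: support={s:.2f} confidence={c:.2f} lift={lift_value:.2f}")
```

Enumerating every pair is only workable for tiny inputs; Apriori avoids this by extending only the itemsets that are already known to be frequent.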

Dataset Description

  • File name: Assignment-1_Data
  • List name: retaildata
  • File format: .xlsx
  • Number of Rows: 522065
  • Number of Attributes: 7

    • BillNo: 6-digit number assigned to each transaction. Nominal.
    • Itemname: Product name. Nominal.
    • Quantity: The quantities of each product per transaction. Numeric.
    • Date: The day and time when each transaction was generated. Numeric.
    • Price: Product price. Numeric.
    • CustomerID: 5-digit number assigned to each customer. Nominal.
    • Country: Name of the country where each customer resides. Nominal.

Image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png

Libraries in R

First, we need to load the required libraries. Each one is described briefly below.

  • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
  • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
  • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
  • readxl - Read Excel Files in R.
  • plyr - Tools for Splitting, Applying and Combining Data.
  • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
  • knitr - Dynamic Report generation in R.
  • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
  • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
  • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

Image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png

Data Pre-processing

Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

Image: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
Image: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png

Next, we clean the data frame by removing missing values.

Image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png

To apply association rule mining, we need to convert the data frame into transaction data, so that all items that are bought together in one invoice will be in ...
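
One way to picture the cleaning and conversion steps, sketched in Python rather than the author's R: drop rows with missing values, then collect the item names that share a bill number. The rows and the helper name to_transactions are hypothetical; only the BillNo and Itemname column names come from the dataset description.

```python
from collections import defaultdict

# Hypothetical rows using the BillNo and Itemname columns; None marks a missing value.
rows = [
    {"BillNo": "536365", "Itemname": "SAMPLE MUG"},
    {"BillNo": "536365", "Itemname": "SAMPLE PLATE"},
    {"BillNo": "536366", "Itemname": None},
    {"BillNo": "536367", "Itemname": "SAMPLE MUG"},
]

def to_transactions(records):
    """Drop rows with missing fields, then collect item names per bill."""
    baskets = defaultdict(set)
    for record in records:
        if any(value in (None, "") for value in record.values()):
            continue  # skip incomplete rows
        baskets[record["BillNo"]].add(record["Itemname"])
    return dict(baskets)

transactions = to_transactions(rows)
print(transactions)
```

Each value in the resulting dictionary is one basket, which is the shape an association rules algorithm expects as input.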
