Market basket analysis with Apriori algorithm
The retailer wants to target customers with suggestions on the itemsets they are most likely to purchase. I was given a retailer's dataset; the transaction data covers all the transactions that happened over a period of time. The retailer will use the results to grow its business and to offer customers itemset suggestions, which should increase customer engagement, improve the customer experience, and help identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another.
Association rule mining is most often used to find associations between different objects in a set, that is, to find frequent patterns in a transaction database. It can tell you which items customers frequently buy together, and it allows the retailer to identify relationships between items.
Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat":
- support = P(mouse & mat) = 8/100 = 0.08
- confidence = support / P(mouse) = 0.08/0.10 = 0.80
- lift = confidence / P(mat) = 0.80/0.09 = 8.89
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
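The three measures can be checked with a few lines of Python. This is a minimal sketch using the conventional definitions for a rule A => B (confidence divides by the antecedent's probability, lift divides confidence by the consequent's probability); the function name `rule_metrics` and its parameter names are made up for illustration.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Return (support, confidence, lift) for the rule antecedent => consequent."""
    support = n_both / n_total                     # P(A and B)
    confidence = n_both / n_antecedent             # P(B | A) = P(A and B) / P(A)
    lift = confidence / (n_consequent / n_total)   # P(B | A) / P(B)
    return support, confidence, lift


# Counts from the example: 100 customers, 10 bought the mouse (antecedent),
# 9 bought the mat (consequent), 8 bought both.
support, confidence, lift = rule_metrics(
    n_total=100, n_antecedent=10, n_consequent=9, n_both=8
)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift well above 1, as here, suggests the two items are bought together far more often than they would be if the purchases were independent.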
Number of Attributes: 7
First, we need to load the required libraries. I briefly describe each of them.
Next, we need to upload Assignment-1_Data.xlsx to R to read the dataset. Now we can see our data in R.
After that, we clean the data frame by removing missing values.
To apply association rule mining, we need to convert the data frame into transaction data so that all items bought together in one invoice will be in ...
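The original walkthrough uses R; the same two ideas (grouping rows into one transaction per invoice, then searching level by level for frequent itemsets) can be sketched in plain Python. The rows, item names, and the `frequent_itemsets` helper below are hypothetical and only illustrate an Apriori-style search; they are not the author's code or data.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (invoice, item) rows standing in for the cleaned data frame.
rows = [
    (1001, "bread"), (1001, "butter"), (1001, "milk"),
    (1002, "bread"), (1002, "butter"),
    (1003, "milk"),
    (1004, "bread"), (1004, "milk"),
]

# Group items by invoice: each invoice becomes one transaction (a set of items).
baskets = defaultdict(set)
for invoice, item in rows:
    baskets[invoice].add(item)
transactions = list(baskets.values())


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for itemsets with support >= min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    result = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        kept = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(kept)
        # Build (k+1)-item candidates from the kept k-itemsets, and prune any
        # candidate that has an infrequent k-item subset (the Apriori property).
        k += 1
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        current = [
            c for c in candidates
            if all(frozenset(s) in kept for s in combinations(c, k - 1))
        ]
    return result


itemsets = frequent_itemsets(transactions, min_support=0.5)
for itemset, support in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The pruning step is what keeps the search tractable: an itemset can only be frequent if every one of its subsets is frequent, so most larger candidates are discarded without being counted.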
Market basket analysis with Python as we uncover hidden patterns and relationships within transactional data. Discover how algorithms like Apriori can reveal valuable insights into customer behavior, product associations, and purchasing trends. Explore the power of data-driven decision-making in retail, marketing, and beyond, as we navigate through the fascinating realm of market basket analysis.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created to simulate a market basket dataset, providing insights into customer purchasing behavior and store operations. The dataset facilitates market basket analysis, customer segmentation, and other retail analytics tasks. Here's more information about the context and inspiration behind this dataset:
Context: Retail businesses, from supermarkets to convenience stores, are constantly seeking ways to better understand their customers and improve their operations. Market basket analysis, a technique used in retail analytics, explores customer purchase patterns to uncover associations between products, identify trends, and optimize pricing and promotions. Customer segmentation allows businesses to tailor their offerings to specific groups, enhancing the customer experience.
Inspiration: The inspiration for this dataset comes from the need for accessible and customizable market basket datasets. While real-world retail data is sensitive and often restricted, synthetic datasets offer a safe and versatile alternative. Researchers, data scientists, and analysts can use this dataset to develop and test algorithms, models, and analytical tools.
Dataset Information: The columns provide information about the transactions, customers, products, and purchasing behavior, making the dataset suitable for various analyses, including market basket analysis and customer segmentation. Here's a brief explanation of each column in the Dataset:
Use Cases:
Note: This dataset is entirely synthetic and was generated using the Python Faker library, which means it doesn't contain real customer data. It's designed for educational and research purposes.
This data set was created only for learning the concepts of customer segmentation, also known as market basket analysis. I will demonstrate this using an unsupervised ML technique (the KMeans clustering algorithm) in its simplest form.
You own a supermarket mall and, through membership cards, you have some basic data about your customers such as customer ID, age, gender, annual income and spending score. Spending score is something you assign to the customer based on your defined parameters, such as customer behavior and purchasing data.
Problem Statement: You own the mall and want to understand which customers can be easily converted [Target Customers], so that this insight can be given to the marketing team to plan its strategy accordingly.
From Udemy's Machine Learning A-Z course.
I am new to the data science field and want to share my knowledge with others.
By the end of this case study, you will be able to answer the questions below.
1- How to achieve customer segmentation using a machine learning algorithm (KMeans clustering) in Python in the simplest way.
2- Who your target customers are, with whom you can start a marketing strategy [easy to convert].
3- How the marketing strategy works in the real world.
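To make the clustering step concrete, here is a minimal sketch of KMeans (Lloyd's algorithm) written with only the standard library so it runs anywhere; in practice a library implementation such as scikit-learn's `KMeans` would normally be used. The customer values are hypothetical (annual income, spending score) pairs, not rows from the dataset, and the `kmeans` function is illustrative.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])),
            )
            for pt in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical (annual income, spending score) pairs for six customers.
customers = [(15, 20), (16, 25), (18, 22), (85, 90), (88, 85), (90, 95)]
labels, centroids = kmeans(customers, k=2)
print(labels)
print(centroids)
```

On real data the two features usually sit on different scales, so standardizing them before clustering, and trying several values of `k`, would be sensible next steps.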
https://creativecommons.org/publicdomain/zero/1.0/
Overview:
This dataset contains 1000 rows of synthetic online retail sales data, mimicking transactions from an e-commerce platform. It includes information about customer demographics, product details, purchase history, and (optional) reviews. This dataset is suitable for a variety of data analysis, data visualization and machine learning tasks, including but not limited to: customer segmentation, product recommendation, sales forecasting, market basket analysis, and exploring general e-commerce trends. The data was generated using the Python Faker library, ensuring realistic values and distributions, while maintaining no privacy concerns as it contains no real customer information.
Data Source:
This dataset is entirely synthetic. It was generated using the Python Faker library and does not represent any real individuals or transactions.
Data Content:
| Column Name | Data Type | Description |
|---|---|---|
| customer_id | Integer | Unique customer identifier (ranging from 10000 to 99999) |
| order_date | Date | Order date (a random date within the last year) |
| product_id | Integer | Product identifier (ranging from 100 to 999) |
| category_id | Integer | Product category identifier (10, 20, 30, 40, or 50) |
| category_name | String | Product category name (Electronics, Fashion, Home & Living, Books & Stationery, Sports & Outdoors) |
| product_name | String | Product name (randomly selected from a list of products within the corresponding category) |
| quantity | Integer | Quantity of the product ordered (ranging from 1 to 5) |
| price | Float | Unit price of the product (ranging from 10.00 to 500.00, with two decimal places) |
| payment_method | String | Payment method used (Credit Card, Bank Transfer, Cash on Delivery) |
| city | String | Customer's city (generated using Faker's city() method, so the locations will depend on the Faker locale you used) |
| review_score | Integer | Customer's product rating (ranging from 1 to 5, or None with a 20% probability) |
| gender | String | Customer's gender (M/F, or None with a 10% probability) |
| age | Integer | Customer's age (ranging from 18 to 75) |
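As a rough illustration of how rows matching this schema could be produced, the sketch below uses only Python's `random` and `datetime` modules rather than Faker, which the dataset itself reports using. The product names, city names, reference date, and the `make_row` helper are placeholders invented for the example; the pairing of category IDs with names in listing order is also an assumption.

```python
import random
from datetime import date, timedelta

# Assumed pairing of category_id and category_name, in the order listed above.
CATEGORIES = {
    10: "Electronics",
    20: "Fashion",
    30: "Home & Living",
    40: "Books & Stationery",
    50: "Sports & Outdoors",
}


def make_row(rng, today):
    """Build one synthetic order row following the column ranges in the table."""
    category_id = rng.choice(list(CATEGORIES))
    return {
        "customer_id": rng.randint(10000, 99999),
        "order_date": today - timedelta(days=rng.randint(0, 364)),
        "product_id": rng.randint(100, 999),
        "category_id": category_id,
        "category_name": CATEGORIES[category_id],
        "product_name": f"{CATEGORIES[category_id]} item",  # placeholder name
        "quantity": rng.randint(1, 5),
        "price": round(rng.uniform(10.0, 500.0), 2),
        "payment_method": rng.choice(
            ["Credit Card", "Bank Transfer", "Cash on Delivery"]
        ),
        "city": rng.choice(["Springfield", "Riverton"]),  # placeholder cities
        # Missing values appear with the probabilities stated in the table.
        "review_score": None if rng.random() < 0.2 else rng.randint(1, 5),
        "gender": None if rng.random() < 0.1 else rng.choice(["M", "F"]),
        "age": rng.randint(18, 75),
    }


rng = random.Random(42)  # fixed seed so the output is reproducible
rows = [make_row(rng, date(2024, 1, 1)) for _ in range(1000)]
print(rows[0])
```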
Potential Use Cases (Inspiration):
Customer Segmentation: Group customers based on demographics, purchasing behavior, and preferences.
Product Recommendation: Build a recommendation system to suggest products to customers based on their past purchases and browsing history.
Sales Forecasting: Predict future sales based on historical trends.
Market Basket Analysis: Identify products that are frequently purchased together.
Price Optimization: Analyze the relationship between price and demand.
Geographic Analysis: Explore sales patterns across different cities.
Time Series Analysis: Investigate sales trends over time.
Educational Purposes: Great for practicing data cleaning, EDA, feature engineering, and modeling.
This dataset contains retail sales records from a superstore, including detailed information on orders, products, categories, sales, discounts, profits, customers, and regions.
It is widely used for business intelligence, data visualization, and machine learning projects. With features such as order date, ship mode, customer segment, and geographic region, the dataset is excellent for:
Sales forecasting
Profitability analysis
Market basket analysis
Customer segmentation
Data visualization practice (Tableau, Power BI, Excel, Python, R)
Inspiration:
Great dataset for learning how to build dashboards.
Commonly used in case studies for predictive analytics and decision-making.
Source: Originally inspired by a sample dataset frequently used in Tableau training and BI case studies.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset is a synthetic yet realistic E-commerce retail dataset generated programmatically using Python (Faker + NumPy + Pandas).
It is designed to closely mimic real-world online shopping behavior, user patterns, product interactions, seasonal trends, and marketplace events.
Machine Learning & Deep Learning
Recommender Systems
Customer Segmentation
Sales Forecasting
A/B Testing
E-commerce Behaviour Analysis
Data Cleaning / Feature Engineering Practice
SQL practice
The dataset contains 6 CSV files:

| File | Rows | Description |
|---|---|---|
| users.csv | ~10,000 | User profiles, demographics & signup info |
| products.csv | ~2,000 | Product catalog with rating and pricing |
| orders.csv | ~20,000 | Order-level transactions |
| order_items.csv | ~60,000 | Items purchased per order |
| reviews.csv | ~15,000 | Customer-written product reviews |
| events.csv | ~80,000 | User event logs: view, cart, wishlist, purchase |
1. Users (users.csv)

| Column | Description |
|---|---|
| user_id | Unique user identifier |
| name | Full customer name |
| email | Email (synthetic, no real emails) |
| gender | Male / Female / Other |
| city | City of residence |
| signup_date | Account creation date |

2. Products (products.csv)

| Column | Description |
|---|---|
| product_id | Unique product identifier |
| product_name | Product title |
| category | Electronics, Clothing, Beauty, Home, Sports, etc. |
| price | Actual selling price |
| rating | Average product rating |

3. Orders (orders.csv)

| Column | Description |
|---|---|
| order_id | Unique order identifier |
| user_id | User who placed the order |
| order_date | Timestamp of the order |
| order_status | Completed / Cancelled / Returned |
| total_amount | Total order value |

4. Order Items (order_items.csv)

| Column | Description |
|---|---|
| order_item_id | Unique identifier |
| order_id | Associated order |
| product_id | Purchased product |
| quantity | Quantity purchased |
| item_price | Price per unit |

5. Reviews (reviews.csv)

| Column | Description |
|---|---|
| review_id | Unique review identifier |
| user_id | User who submitted review |
| product_id | Reviewed product |
| rating | 1–5 star rating |
| review_text | Short synthetic review |
| review_date | Submission date |

6. Events (events.csv)

| Column | Description |
|---|---|
| event_id | Unique event identifier |
| user_id | User performing event |
| product_id | Viewed/added/purchased product |
| event_type | view/cart/wishlist/purchase |
| event_timestamp | Timestamp of event |
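The event log lends itself to a simple funnel count. The sketch below assumes rows shaped like events.csv, reduced to (user_id, product_id, event_type), and counts how many users reach each stage of a view, cart, purchase funnel. The sample rows and the choice of funnel stages are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows shaped like events.csv: (user_id, product_id, event_type).
events = [
    (1, 10, "view"), (1, 10, "cart"), (1, 10, "purchase"),
    (2, 10, "view"), (2, 11, "view"), (2, 11, "cart"),
    (3, 12, "view"),
    (4, 10, "view"), (4, 10, "wishlist"),
]

FUNNEL = ["view", "cart", "purchase"]  # assumed stage order

# Collect the set of users seen at each event type.
users_by_stage = defaultdict(set)
for user_id, _product_id, event_type in events:
    users_by_stage[event_type].add(user_id)

# A user counts at a stage only if they also reached every earlier stage.
reached = None
funnel_counts = {}
for stage in FUNNEL:
    stage_users = users_by_stage[stage]
    reached = stage_users if reached is None else reached & stage_users
    funnel_counts[stage] = len(reached)

print(funnel_counts)
```

This version works at the user level and ignores timestamps; a stricter funnel would also require that the events occur in order and for the same product.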
Customer churn prediction
Review sentiment analysis (NLP)
Recommendation engines
Price optimization models
Demand forecasting (Time-series)
Market basket analysis
RFM segmentation
Cohort analysis
Funnel conversion tracking
A/B testing simulations
Joins
Window functions
Aggregations
CTE-based funnels
Complex queries
Faker for realistic user and review generation
NumPy for probability-based event modeling
Pandas for data processing
demand variation
user behavior simulation
return/cancel probabilities
seasonal order timestamp distribution
The dataset does not include any real personal data.
Everything is generated synthetically.
This dataset is released under CC BY 4.0 — free to use for:
Research
Education
Commercial projects
Kaggle competitions
Machine learning pipelines
Just provide attribution.
Upvote the dataset
Leave a comment
Share your notebooks using it
This is an extension of the public dataset https://www.kaggle.com/datasets/vjchoudhary7/customer-segmentation-tutorial-in-python. Changelog: I have explicitly added NaN/null values to the Annual Income and Spending Score columns.
About Dataset
Context
This data set was created only for learning the concepts of customer segmentation, also known as market basket analysis. I will demonstrate this using an unsupervised ML technique (the KMeans clustering algorithm) in its simplest form.
Content
You own a supermarket mall and, through membership cards, you have some basic data about your customers such as customer ID, age, gender, annual income and spending score. Spending score is something you assign to the customer based on your defined parameters, such as customer behavior and purchasing data.
Problem Statement: You own the mall and want to understand which customers can be easily converted [Target Customers], so that this insight can be given to the marketing team to plan its strategy accordingly.
Acknowledgements: From Udemy's Machine Learning A-Z course.
I am new to the data science field and want to share my knowledge with others.
Inspiration: By the end of this case study, you will be able to answer the questions below.
1- How to achieve customer segmentation using a machine learning algorithm (KMeans clustering) in Python in the simplest way.
2- Who your target customers are, with whom you can start a marketing strategy [easy to convert].
3- How the marketing strategy works in the real world.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset offers a detailed snapshot of global retail sales from the fast-growing sneaker and streetwear market between January and August 2022. It captures essential sales insights from multiple countries, spanning brands like Nike, Adidas, Supreme, Yeezy, and Off-White, along with high-demand categories such as sneakers, hoodies, joggers, and graphic tees.
The data has been carefully simulated to mirror real-world patterns in retail e-commerce — including seasonality, gender preferences, price bands, and payment behaviors. Each record represents a successful transaction, making this dataset ideal for sales analytics, business intelligence projects, and predictive modeling.
Sneakers and streetwear aren't just fashion — they're a data-rich ecosystem of global trends, influencer impact, resale value, and cultural relevance. Whether you're working on:
… this dataset gives you everything you need to explore, model, and tell a data story.
By Ali Prasla [source]
The Online Retail Sales Dataset, often referred to as the Online Retail.csv file, is an extensive and comprehensive collection of data points relating to e-commerce transactions. This dataset provides a detailed view of sales activities within the online retail sector, covering numerous essential attributes necessary for a quantitative understanding of consumer behavior and the overall business performance.
One of the key elements covered in this dataset is 'InvoiceNo', which is a unique identifier for each transaction taking place in this retail environment. Given its uniqueness, it serves as a primary key for distinguishing individual transactions. It's worthwhile to note that these Invoice Numbers are numerical values.
Another important attribute included here is 'StockCode'. Each product listed or sold on this online retail platform has been assigned with its unique identification code or StockCode. These codes are also numerical values that offer another layer to clearly classify items and distinguish one from another.
For further understanding, every product comes with a basic description noted under the 'Description' column. In textual form, these descriptions provide insights into what exactly each product item entails. Aside from aiding identification efforts, they can potentially open avenues for text-based analysis such as sentiment analysis or keyword flagging based on product trends.
Moving on to the transactions themselves, we have two crucial columns: 'Quantity' and 'UnitPrice'. As their names suggest, these show, respectively, how many units of an item were sold in a transaction and the price per unit.
Further adding detail to our transactions information comes 'InvoiceDate', which records when each separate purchase occurred down to accurate date & time records. This data can be pivotal in recognizing sales patterns throughout different periods or predicting future trends based on historical timing behavior.
Last but not least comes our global indicator: the 'Country' column specifies the countries where the customers who make purchases on this platform reside. This gives us insight into the geographical dispersion of the user base across countries, potentially revealing regional preferences or supporting global market segmentation.
With such a wealth of detailed transaction records and customer information, the Online Retail.csv dataset stands as an invaluable tool for those looking to delve deep into online retail sales data analysis. The possibilities with this dataset are vast, ranging from shaping efficient marketing strategies based on geographical data to predicting sales and growth metrics using historical behavior, and much more.
Here's how to make best use of this dataset:
Getting Started: Before you start analyzing the data, you'll have to load it into a statistical environment such as Python (using the pandas library) or R. The dataset is saved in .csv format, which most data manipulation software can read easily.
Understand The Fields
InvoiceNo: Each transaction made has an associated unique numerical identifier called InvoiceNo. Consider it like a receipt code - these allow for tracking individual transactions.
StockCode: To identify each product uniquely during analysis, refer to each StockCode value which is essentially a product identification code.
Description: A brief textual description about each product that can be invaluable when dealing with categories for market-basket type analysis.
Quantity: Each row lists out how many units of a particular item were involved in a single transaction - watch out for very large values as they might represent bulk orders.
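The fields above can be exercised with a short loading sketch. It uses Python's built-in `csv` module on an inline sample so that it runs without the real file; pandas' `read_csv` is the more common choice for the full dataset. The sample rows, the date format, and the bulk-order threshold of 100 units are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the column layout described above (not real rows).
SAMPLE = """\
InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,Country
1001,2001,MUG,6,2011-01-05 09:15,2.50,United Kingdom
1001,2002,TEA TOWEL,2,2011-01-05 09:15,4.00,United Kingdom
1002,2001,MUG,500,2011-01-06 14:02,2.00,France
"""

BULK_THRESHOLD = 100  # assumed cut-off for flagging possible bulk orders

invoice_totals = defaultdict(float)  # InvoiceNo -> total value of the invoice
bulk_rows = []                       # rows whose quantity looks like a bulk order

for row in csv.DictReader(io.StringIO(SAMPLE)):
    quantity = int(row["Quantity"])
    unit_price = float(row["UnitPrice"])
    invoice_totals[row["InvoiceNo"]] += quantity * unit_price
    if quantity >= BULK_THRESHOLD:
        bulk_rows.append((row["InvoiceNo"], row["StockCode"], quantity))

print(dict(invoice_totals))
print(bulk_rows)
```

To run this against the actual file, replace `io.StringIO(SAMPLE)` with an open file handle; grouping rows by InvoiceNo in the same loop also yields the baskets needed for market-basket analysis.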