License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Simulated Dataset of Customer Purchase Behavior
This dataset contains simulated data representing customer purchase behavior. It includes various features such as age, gender, income, education, region, loyalty status, purchase frequency, purchase amount, product category, promotion usage, and satisfaction score.
- age: Age of the customer.
- gender: Gender of the customer (0 for Male, 1 for Female).
- income: Annual income of the customer.
- education: Education level of the customer.
- region: Region where the customer resides.
- loyalty_status: Loyalty status of the customer.
- purchase_frequency: Frequency of purchases made by the customer.
- purchase_amount: Amount spent by the customer in each purchase.
- product_category: Category of the purchased product.
- promotion_usage: Indicates whether the customer used promotional offers (0 for No, 1 for Yes).
- satisfaction_score: Satisfaction score of the customer.
The dataset was simulated using the simstudy package in R. Various distributions and formulas were used to generate synthetic data representing customer purchase behavior. The data is organized to mimic real-world scenarios, but it does not represent actual customer data.
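Because gender and promotion_usage are stored as 0/1 codes, they need to be mapped back to labels before reporting. The following is a minimal Python sketch that does this and computes the promotion-usage rate per gender. It assumes the rows have already been loaded as dicts with integer values under the column names listed above. The sample rows are hypothetical.

```python
from collections import defaultdict

# Code table as stated in the dataset description.
GENDER = {0: "Male", 1: "Female"}

def promotion_rate_by_gender(rows):
    """Share of customers in each gender who used a promotional offer.

    Each row is a dict with integer 'gender' and 'promotion_usage' values.
    """
    used = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        label = GENDER[row["gender"]]
        total[label] += 1
        used[label] += row["promotion_usage"]  # 1 counts as a use, 0 does not
    return {label: used[label] / total[label] for label in total}

# Hypothetical rows, not taken from the dataset.
rows = [
    {"gender": 0, "promotion_usage": 1},
    {"gender": 0, "promotion_usage": 0},
    {"gender": 1, "promotion_usage": 1},
    {"gender": 1, "promotion_usage": 1},
]
print(promotion_rate_by_gender(rows))  # {'Male': 0.5, 'Female': 1.0}
```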
By Marc Szafraniec [source]
The InvoiceNo column holds a unique identifier for each transaction. It makes individual sales or purchases easy to identify and provides a reference for record keeping.
The InvoiceDate column, which accompanies the invoice number, provides a date-time stamp for every transaction. It can reveal patterns in purchasing behaviour over time and supports record-keeping requirements.
The StockCode column holds an alphanumeric code assigned uniquely to each item in stock. Individual products can therefore be identified unambiguously, and inventory records stay consistent.
The Description field gives a brief description of each product. It adds context beyond the stock code, helping customers understand products and make more informed choices.
The Quantity column records the number of units in each transaction. It is needed to calculate the total cost of each sale or purchase. It also helps track inventory levels by showing how products move out of stock over a given period.
Retail is not only about what you sell but also about the price you sell it at. The UnitPrice column gives the unit price of each item sold in a transaction, which drives metrics such as gross revenue.
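The revenue calculation described above multiplies Quantity by UnitPrice for each line item and then sums the results per invoice. Here is a minimal Python sketch. It assumes rows are dicts whose Quantity and UnitPrice values have already been converted from CSV strings to numbers. The invoice numbers and stock codes are hypothetical.

```python
from collections import defaultdict

def revenue_per_invoice(rows):
    """Sum Quantity * UnitPrice over the line items of each InvoiceNo."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["InvoiceNo"]] += row["Quantity"] * row["UnitPrice"]
    return dict(totals)

# Hypothetical line items; real rows would come from online_retail.csv.
rows = [
    {"InvoiceNo": "INV001", "StockCode": "A100", "Quantity": 6, "UnitPrice": 2.5},
    {"InvoiceNo": "INV001", "StockCode": "B200", "Quantity": 2, "UnitPrice": 4.0},
    {"InvoiceNo": "INV002", "StockCode": "A100", "Quantity": 3, "UnitPrice": 1.5},
]
per_invoice = revenue_per_invoice(rows)
gross_revenue = sum(per_invoice.values())
print(per_invoice)    # {'INV001': 23.0, 'INV002': 4.5}
print(gross_revenue)  # 27.5
```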
Finally, the Country column records the country where each transaction was conducted and where the customer resides. This lets businesses map their customer bases, track regional performance metrics, extend localization efforts, and build effective segmentation strategies.
All of this information is in a single CSV file, online_retail.csv. The dataset is useful for anyone researching online sales trends, developing customer profiles, or studying effective inventory management practices.
- Identifying Products: StockCode is the unique identifier for each product. Use it to identify individual products, track their sales, or discover patterns related to specific items.
- Assessing Sales Volume: The Quantity column gives the number of units of a product in each transaction. Combined with InvoiceNo, it lets you analyze overall sales volume or individual purchases over a selected period.
- Observing Price Fluctuations: Multiplying UnitPrice by Quantity gives the cost of each line item, and summing these per InvoiceNo gives the transaction total. UnitPrice also supports analyses such as tracking price fluctuations over time or identifying the highest-revenue items.
- Analyzing Description Patterns/Trends: The Description field shows what kinds of products are being traded. It lends itself to text analysis, such as term frequency-inverse document frequency (TF-IDF) or sentiment analysis of descriptions, to find popular trends at given times.
- Analyzing Geographical Trends: The Country column makes it easy to compare sales across nations. For example, you can see which country has more customers, or which orders larger quantities or more expensive units (using the Quantity and UnitPrice columns, respectively).
Keep in mind that each column should be extracted and transformed according to its data type (textual, alphanumeric, or numeric).
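The TF-IDF analysis suggested for the Description field scores each word higher when it is frequent in one description but rare across all descriptions. The following is a minimal stdlib-only sketch of the plain textbook variant (tf = count / length, idf = log(N / document frequency)). It assumes whitespace tokenization, and the descriptions are hypothetical. Libraries such as scikit-learn's TfidfVectorizer use a smoothed idf and normalization by default, so their numbers will differ.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = count of term in document / number of terms in document
    idf = log(number of documents / number of documents containing term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        if not tokens:  # empty description: nothing to weight
            weights.append({})
            continue
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical product descriptions.
descriptions = ["WHITE HEART HOLDER", "RED HEART CUSHION", "WHITE LANTERN"]
scores = tf_idf(descriptions)
for desc, weights in zip(descriptions, scores):
    print(desc, "->", max(weights, key=weights.get))
```

A term that appears in every description gets idf = log(1) = 0, which is why generic words drop out of the ranking.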
This dataset not only gives retailers an immediate understanding of their operations but could also serve as a base dataset for machine learning work on predicting future transactions.
- Inventory Management: By tracking the 'Quantity' and 'StockCode' over time, a business could use this data to notice if certain products are frequently purchased together or in specific seasons, allowing them to better stock their inventory.
- Pricing Strategy:...
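The Inventory Management idea of spotting products that are frequently purchased together can start with a simple co-occurrence count. Group StockCodes by InvoiceNo, then count every unordered pair of items that share an invoice. This is a minimal sketch with hypothetical rows. It is only the counting step, not a full association-rule algorithm such as Apriori.

```python
from collections import Counter, defaultdict
from itertools import combinations

def co_purchase_counts(rows):
    """Count how many invoices contain each unordered pair of StockCodes."""
    baskets = defaultdict(set)
    for row in rows:
        baskets[row["InvoiceNo"]].add(row["StockCode"])
    pairs = Counter()
    for items in baskets.values():
        # sorted() gives each pair a canonical order, so (A, B) and (B, A) match
        pairs.update(combinations(sorted(items), 2))
    return pairs

# Hypothetical invoice lines.
rows = [
    {"InvoiceNo": "INV001", "StockCode": "A100"},
    {"InvoiceNo": "INV001", "StockCode": "B200"},
    {"InvoiceNo": "INV002", "StockCode": "A100"},
    {"InvoiceNo": "INV002", "StockCode": "B200"},
    {"InvoiceNo": "INV002", "StockCode": "C300"},
]
print(co_purchase_counts(rows).most_common(1))  # [(('A100', 'B200'), 2)]
```

Adding InvoiceDate to the grouping key, for example by month, would let the same counts reveal seasonal pairings.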
License: http://opendatacommons.org/licenses/dbcl/1.0/
Randomized dataset intended for trial ETL work using tools such as Microsoft Fabric, Azure Data Factory, Azure Synapse Analytics, Databricks, etc. Contains 3 CSV files: products.csv (dim), customers.csv (dim), sales.csv (facts).
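A typical first ETL step on this layout joins the sales fact table to its two dimension tables. Here is a minimal Python sketch of an inner join using dictionary lookups. The key and column names (product_id, customer_id, product_name, country, and so on) are hypothetical and should be checked against the actual CSV headers. In practice, each list could come from csv.DictReader.

```python
def inner_join_sales(sales, products, customers):
    """Attach product and customer attributes to each sales fact row.

    Facts whose product_id or customer_id has no matching dimension row are
    dropped, mirroring an inner join.
    """
    product_by_id = {p["product_id"]: p for p in products}
    customer_by_id = {c["customer_id"]: c for c in customers}
    joined = []
    for sale in sales:
        product = product_by_id.get(sale["product_id"])
        customer = customer_by_id.get(sale["customer_id"])
        if product is None or customer is None:
            continue
        joined.append({**sale, **product, **customer})
    return joined

# Hypothetical rows (all values are strings, as csv.DictReader would return).
products = [{"product_id": "1", "product_name": "Mug", "category": "Kitchen"}]
customers = [{"customer_id": "10", "country": "Germany"}]
sales = [
    {"sale_id": "100", "product_id": "1", "customer_id": "10", "quantity": "2"},
    {"sale_id": "101", "product_id": "99", "customer_id": "10", "quantity": "1"},
]
rows = inner_join_sales(sales, products, customers)
print(len(rows), rows[0]["product_name"], rows[0]["country"])  # 1 Mug Germany
```

Tools such as Azure Data Factory or Databricks would express the same join declaratively. The sketch only shows the fact-to-dimension relationship.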
In a business context, clustering is a technique that supports customer segmentation, the process of placing similar customers in the same segment. Clustering helps a business understand its customers in terms of both static demographics and dynamic behaviors. Customers with comparable characteristics often interact with the business in similar ways. The business can therefore benefit from creating a tailored marketing strategy for each segment, for example by giving each segment different discounts, offers, or promo codes. As a simple example, suppose a bank wants to give credit card offers to its customers. Currently, it looks at the details of each customer and, based on this information, decides which offer to give to which customer. The bank may have millions of customers. Does it make sense to look at each customer's details separately and then make a decision? Certainly not: it is a manual process and would take a huge amount of time. So what can the bank do? One option is to segment its customers into groups. For instance, the bank can group its customers into three groups based on their income.
The bank can now make three different strategies or offers, one for each group. Here, instead of creating different strategies for individual customers, they only have to make 3 strategies. This will reduce the effort as well as the time. Clustering is the process of dividing the entire data into groups (also known as clusters) based on the patterns in the data.
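The three-group income example can be sketched with k-means, one common clustering algorithm (the text above does not name a specific one). Here is a minimal stdlib-only Python version of Lloyd's algorithm on a single feature. It starts deterministically from values spread evenly across the sorted data. The incomes are hypothetical, in thousands. A real analysis would usually cluster on several standardized features with a library such as scikit-learn's KMeans.

```python
def kmeans_1d(values, k=3, max_iter=100):
    """Lloyd's k-means on a single numeric feature such as income."""
    if not 2 <= k <= len(values):
        raise ValueError("need 2 <= k <= number of values")
    ordered = sorted(values)
    # Deterministic start: pick k values spread evenly across the sorted data.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]

    def nearest(v):
        return min(range(k), key=lambda j: abs(v - centroids[j]))

    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[nearest(v)].append(v)
        # Keep the old centroid if a group ends up empty.
        updated = [sum(g) / len(g) if g else centroids[j] for j, g in enumerate(groups)]
        if updated == centroids:  # assignments are stable: converged
            break
        centroids = updated
    return centroids, [nearest(v) for v in values]

# Hypothetical annual incomes in thousands.
incomes = [20, 22, 25, 50, 55, 60, 110, 120, 130]
centroids, labels = kmeans_1d(incomes, k=3)
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Each resulting label corresponds to one group that would receive its own offer strategy.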
The purpose of this notebook: Competition in the financial industry is expected to intensify over the next decade. One of the industry's main sources of revenue is interest income, which is earned by providing loans or credit facilities to customers. The more credit is extended, the more interest is earned. Because data is collected on every credit activity, the company hopes to gain insight by analyzing it. Here, we have a dataset summarizing the usage behavior of about 9,000 active credit card holders over the last 6 months, including transaction frequency, amount, tenure, etc. The bank's marketing team would like to leverage AI/ML to launch a targeted ad campaign tailored to specific groups of customers. For this campaign to succeed, the bank has to divide its customers into at least 3 distinct groups. This process is known as "marketing segmentation" and is crucial for maximizing campaign conversion rates. We will apply unsupervised learning to segment the customers, looking for patterns that distinguish each segment. Then we will analyze each segment and plan the marketing approach that works best for it.
Problem Statement
This case requires developing a customer segmentation to define a marketing strategy. The problem is to extract customer segments based on the behaviour patterns in the dataset, so that the company can focus its marketing strategy on a particular segment.
What is customer segmentation?
One way for a marketing team to understand its customers is to divide them by their characteristics, which is called customer segmentation. Customer segmentation is the process of dividing customers into groups based on common characteristics, such as demographics or behaviours, so you can market to those customers more effectively.
License: https://cdla.io/sharing-1-0/
Synthetic Dataset of Customer Transactions with Demographic and Shopping Behavior Information
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Beginner-friendly dataset for clustering.
You can train a model to cluster customers into segments (High, Medium, Low) based on 'Avg_Order_Value' and 'Total_Spending'.
The actual segment is also provided.
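Because the actual segment is provided, cluster labels can be scored against it. One simple measure is purity: for each cluster, count the customers belonging to its most common actual segment, then divide the total by the number of customers. Here is a minimal sketch with hypothetical labels. Purity rises trivially as the number of clusters grows, so it is most meaningful with k fixed at three to match the High, Medium, and Low segments.

```python
from collections import Counter, defaultdict

def purity(cluster_labels, true_segments):
    """Fraction of customers whose cluster's majority segment matches their own.

    purity = (1 / N) * sum over clusters of the largest true-segment count
    """
    members = defaultdict(list)
    for cluster, segment in zip(cluster_labels, true_segments):
        members[cluster].append(segment)
    majority_total = sum(Counter(segs).most_common(1)[0][1] for segs in members.values())
    return majority_total / len(true_segments)

# Hypothetical cluster assignments and actual segments for six customers.
clusters = [0, 0, 1, 1, 2, 2]
actual = ["Low", "Low", "Medium", "High", "High", "High"]
print(purity(clusters, actual))  # 0.8333333333333334
```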
License: http://opendatacommons.org/licenses/dbcl/1.0/
This dataset was created by tusharatkare06
Released under Database: Open Database, Contents: Database Contents
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides comprehensive customer data suitable for segmentation analysis. It includes anonymized demographic, transactional, and behavioral attributes, allowing for detailed exploration of customer segments. Leveraging this dataset, marketers, data scientists, and business analysts can uncover valuable insights to optimize targeted marketing strategies and enhance customer engagement. Whether you're looking to understand customer behavior or improve campaign effectiveness, this dataset offers a rich resource for actionable insights and informed decision-making.
- Anonymized demographic, transactional, and behavioral data.
- Suitable for customer segmentation analysis.
- Opportunities to optimize targeted marketing strategies.
- Valuable insights for improving campaign effectiveness.
- Ideal for marketers, data scientists, and business analysts.
- Segmenting customers based on demographic attributes.
- Analyzing purchase behavior to identify high-value customer segments.
- Optimizing marketing campaigns for targeted engagement.
- Understanding customer preferences and tailoring product offerings accordingly.
- Evaluating the effectiveness of marketing strategies and iterating for improvement.
Explore this dataset to unlock actionable insights and drive success in your marketing initiatives!
This dataset was created by Hassan Abbas
Released under Other (specified in description)
By Joseph Nowicki [source]
This dataset contains demographic information about customers who made purchases in a store, including their name, IP address, region, age, items purchased, and total amount spent. It can provide insight into customers' shopping behaviour at the store, from their geographic location to how many items they buy. With detailed demographic data like this, it is possible to make strategic decisions about target customers and to develop marketing campaigns or promotions tailored to their needs and interests. A deeper understanding of customer habits can help businesses achieve higher engagement with shoppers.
This dataset includes customers' names, IP addresses, regions, in-store or online purchase channel, ages, items purchased, and amounts spent. It can be used to uncover patterns in shoppers' spending behavior across regions and age groups.
- Analyze customer shopping trends by age and region to improve targeted advertising.
- Compare the spending habits of in-store and online customers.
- Use IP addresses to track geographic trends in items purchased from a particular online store and identify new markets for targeted expansion.
If you use this dataset in your research, please credit the original authors. Data Source
See the dataset description for more information.
File: Demographic_Data_Orig.csv
| Column name | Description |
|:--|:--|
| full.name | The full name of the customer. (String) |
| ip.address | The IP address of the customer. (String) |
| region | The region of residence of the customer. (String) |
| in.store | A boolean value indicating whether the customer made the purchase in-store or online. (Boolean) |
| age | The age of the customer. (Integer) |
| items | The number of items purchased by the customer. (Integer) |
| amount | The total amount spent by the customer. (Float) |
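The store-versus-online comparison suggested above reduces to averaging amount grouped by in.store, optionally together with region. Here is a minimal stdlib-only sketch. It assumes in.store has already been parsed into a Python bool and amount into a float. The region names and values are hypothetical.

```python
from collections import defaultdict

def mean_amount_by(rows, *keys):
    """Average 'amount' for each combination of the given grouping columns."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = tuple(row[key] for key in keys)
        sums[group] += row["amount"]
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}

# Hypothetical rows with in.store already parsed to a boolean.
rows = [
    {"region": "North", "in.store": True, "age": 34, "items": 3, "amount": 200.0},
    {"region": "North", "in.store": False, "age": 29, "items": 1, "amount": 100.0},
    {"region": "North", "in.store": True, "age": 51, "items": 5, "amount": 300.0},
    {"region": "South", "in.store": False, "age": 42, "items": 2, "amount": 50.0},
]
print(mean_amount_by(rows, "in.store"))  # {(True,): 250.0, (False,): 75.0}
print(mean_amount_by(rows, "region", "in.store"))
```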
If you use this dataset in your research, please credit Joseph Nowicki.
License: http://www.gnu.org/licenses/lgpl-3.0.html
The dataset provided here is a rich compilation of various data files gathered to support diverse analytical challenges and education in data science. It is especially curated to provide researchers, data enthusiasts, and students with real-world data across different domains, including biostatistics, travel, real estate, sports, media viewership, and more.
Below is a brief overview of what each CSV file contains:
- Addresses: Practical examples of string manipulation and address data formatting in CSV.
- Air Travel: Historical dataset suitable for analyzing trends in air travel over a period of three years.
- Biostats: A dataset of office workers' biometrics, ideal for introductory statistics and biology.
- Cities: Geographic and administrative data for urban analysis or socio-demographic studies.
- Car Crashes in Catalonia: Weekly traffic accident data from Catalonia, providing a base for public policy research.
- De Niro's Film Ratings: Analyze trends in film ratings over time with this entertainment-focused dataset.
- Ford Escort Sales: Pre-owned vehicle sales data, perfect for regression analysis or price prediction models.
- Old Faithful Geyser: Geological data for pattern recognition and prediction in natural phenomena.
- Freshman Year Weights and BMIs: Dataset depicting weight and BMI changes for health and lifestyle studies.
- Grades: Education performance data which can be correlated with demographics or study patterns.
- Home Sales: A dataset reflecting the housing market dynamics, useful for economic analysis or real estate appraisal.
- Hooke's Law Demonstration: Physics data illustrating the classic principle of elasticity in springs.
- Hurricanes and Storm Data: Climate data on hurricane and storm frequency for environmental risk assessments.
- Height and Weight Measurements: Public health research dataset on anthropometric data.
- Lead Shot Specs: Detailed engineering data for material sciences and manufacturing studies.
- Alphabet Letter Frequency: Text analysis dataset for frequency distribution studies in large text samples.
- MLB Player Statistics: Comprehensive athletic dataset for analysis of performance metrics in sports.
- MLB Teams' Seasonal Performance: A dataset combining financial and sports performance data from the 2012 MLB season.
- TV News Viewership: Media consumption data which can be used to analyze viewing patterns and trends.
- Historical Nile Flood Data: A unique environmental dataset for historical trend analysis in flood levels.
- Oscar Winner Ages: A dataset to explore age trends among Oscar-winning actors and actresses.
- Snakes and Ladders Statistics: Data from the game outcomes useful in studying probability and game theory.
- Tallahassee Cab Fares: Price modeling data from the real-world pricing of taxi services.
- Taxable Goods Data: A snapshot of economic data concerning taxation impact on prices.
- Tree Measurements: Ecological and environmental science data related to tree growth and forest management.
- Real Estate Prices from Zillow: Market analysis dataset for those interested in housing price determinants.
The enclosed files conform to the comma-separated values (CSV) format, so they are compatible with most data processing libraries in Python, R, and other languages. They are ready for import into Jupyter notebooks, RStudio, or any other integrated development environment (IDE) used for data science.
The data is pre-checked for common issues such as missing values, duplicate records, and inconsistent entries, offering a clean and reliable dataset for various analytical exercises. Some CSV files include a header line, so users can identify the dataset fields and start their analysis without additional header cleanup.
The dataset adheres to the GNU LGPL license, making it freely available for modification and distribution, provided that the original source is cited. This opens up possibilities for educators to integrate real-world data into curricula, researchers to validate models against diverse datasets, and practitioners to refine their analytical skills with hands-on data.
This dataset has been compiled from https://people.sc.fsu.edu/~jburkardt/data/csv/csv.html, with gratitude to the authors and maintainers for their dedication to providing open data resources for educational and research purposes.
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset was created by Madhura Atmaram Bhagat
Released under Apache 2.0
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Dataset Name: Mall Customer Dataset
Purpose: To analyze customer demographics, income, and spending scores for market segmentation and business decision-making.
Data Format: CSV file containing customer data with labels for gender, age, income, and spending score.
Fields:
- CustomerID: Unique identifier for customers.
- Gender: Gender of the customer (Male/Female).
- Age: Age group of the customer (e.g., 18-23, 24-28).
- Annual Income (k$): Annual income of the customer (in thousands of dollars).
- Spending Score (1-100): Customer's spending score, ranging from 1 to 100.
Total Entries: 200 customers with various demographic and spending information.
Missing Data: None. All fields are filled for every customer.
Unique Values: Unique values for each attribute like gender, age, income range, and spending score range.
Suggested Tasks:
Customer Segmentation
- Task: Use clustering algorithms (e.g., K-means, DBSCAN, Agglomerative Clustering) to segment customers into different groups based on their demographics and spending behaviors.
- Approach: Segment customers into clusters such as "High spenders," "Middle-income, low spenders," etc.
Predict Spending Behavior
- Task: Build a predictive model to forecast the spending score of a customer based on demographic features (age, gender, income).
- Approach: Train a regression model (e.g., Linear Regression, Random Forest, or Gradient Boosting) to predict the spending score.
Customer Behavior Analysis
- Task: Analyze spending patterns across different age groups, income brackets, or genders.
- Approach: Use visualization techniques (e.g., bar charts, box plots) to analyze relationships between income, age, and spending behavior.
Market Basket Analysis
- Task: Apply market basket analysis to identify which customer segments are likely to purchase similar products or services.
- Approach: Use association rule learning (e.g., Apriori) to identify frequent itemsets and customer purchase patterns.
Personalized Marketing Strategy
- Task: Based on customer segmentation, develop personalized marketing campaigns targeting specific customer groups.
- Approach: Use targeted advertising or personalized product recommendations for each customer segment.
Customer Churn Prediction
- Task: Use the dataset to predict the likelihood of customers leaving or reducing their spending.
- Approach: Use classification algorithms (e.g., Logistic Regression, Decision Trees) to predict customer churn based on demographics and spending history.
Customer Lifetime Value (CLV) Prediction
- Task: Estimate the future revenue a customer might bring based on their current spending behavior and demographic profile.
- Approach: Use machine learning models to estimate CLV, incorporating spending score, income, and age.
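The simplest version of the "Predict Spending Behavior" task is a one-feature linear regression of Spending Score on Annual Income. Here is a minimal sketch using the closed-form ordinary least squares solution. It uses income only, because Age is described as an age group and would need encoding first. The values are hypothetical, not rows from the Mall Customer file. A multi-feature model would normally be fitted with a library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical values, not rows from the Mall Customer file.
income = [15, 40, 60, 80]    # Annual Income (k$)
spending = [39, 50, 56, 70]  # Spending Score (1-100)
intercept, slope = fit_line(income, spending)
predicted = intercept + slope * 50  # predicted score for a 50 k$ income
print(round(intercept, 2), round(slope, 3), round(predicted, 1))
```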
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Customer surveys are invaluable tools for businesses to gauge satisfaction and improve their offerings. Analyzing various metrics provides a comprehensive understanding of customer experiences.
Each aspect of the survey holds significance. "Overall Satisfaction" encapsulates the holistic perception customers have toward the brand. "Product Quality" and "Service Speed" pinpoint core areas needing attention. "Support Helpfulness" measures the effectiveness of customer service, while "Website Ease of Use" reflects on user experience. "Delivery Speed" and "Price Competitiveness" directly impact customer satisfaction and loyalty. "Recommendation Likelihood" indicates potential advocacy, a key driver for growth.
The "Experience with Brand" section delves into the emotional connection customers have with the brand, which goes beyond mere transactions. Open-ended questions like "Feedback Comments" uncover nuanced insights that quantitative data might miss, providing qualitative depth. "Contact Channel" analysis highlights preferred communication methods.
Effective use of this data involves understanding correlations and trends between variables. For instance, a correlation between "Product Quality" and "Overall Satisfaction" might emphasize the importance of product excellence. Similarly, identifying a discrepancy between "Service Speed" and "Support Helpfulness" could prompt improvements in customer service training.
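The correlation check between "Product Quality" and "Overall Satisfaction" can be computed with the Pearson coefficient. Here is a minimal stdlib-only sketch. The ratings are hypothetical, and the sketch assumes both columns are numeric scores for the same respondents. Survey ratings are ordinal, so a rank-based measure such as Spearman's correlation is a reasonable alternative.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical ratings from five respondents.
product_quality = [5, 4, 3, 2, 1]
overall_satisfaction = [5, 4, 4, 2, 1]
r = pearson(product_quality, overall_satisfaction)
print(round(r, 3))
```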
Moreover, trends over time help identify improvements or declines, guiding strategic decisions. If "Website Ease of Use" scores drop, it might signal the need for website optimization.
Acting on customer feedback is crucial. Resolving issues highlighted in "Feedback Comments" can improve customer experience and loyalty. Recognizing high satisfaction areas aids in emphasizing and promoting these strengths.
Ultimately, interpreting this data holistically shapes actionable strategies. Prioritizing areas that significantly impact overall satisfaction while fostering a positive emotional connection with the brand strengthens customer relationships and bolsters business growth. Regular surveys ensure continual alignment with evolving customer preferences and market dynamics, fostering a customer-centric approach.
License: http://opendatacommons.org/licenses/dbcl/1.0/
Overview
This dataset comprises detailed records of customer support tickets, providing valuable insights into various aspects of customer service operations. It is designed to aid in the analysis and modeling of customer support processes, offering a wealth of information for data scientists, machine learning practitioners, and business analysts.
Dataset Description
The dataset includes the following features:
- Ticket ID: Unique identifier for each support ticket.
- Customer Name: Name of the customer who submitted the ticket.
- Customer Email: Email address of the customer.
- Customer Age: Age of the customer.
- Customer Gender: Gender of the customer.
- Product Purchased: Product for which the customer has requested support.
- Date of Purchase: Date when the product was purchased.
- Ticket Type: Type of support ticket (e.g., Technical Issue, Billing Inquiry).
- Ticket Subject: Brief subject or title of the ticket.
- Ticket Description: Detailed description of the issue or inquiry.
- Ticket Status: Current status of the ticket (e.g., Open, Closed, Pending).
- Resolution: Description of how the ticket was resolved.
- Ticket Priority: Priority level of the ticket (e.g., High, Medium, Low).
- Ticket Channel: The channel through which the ticket was submitted (e.g., Email, Phone, Web).
- First Response Time: Time taken for the first response to the ticket.
- Time to Resolution: Total time taken to resolve the ticket.
- Customer Satisfaction Rating: Customer satisfaction rating for the support received.
Usage
This dataset can be utilized for various analytical and modeling purposes, including but not limited to:
- Customer Support Analysis: Understand trends and patterns in customer support requests, and analyze ticket volumes, response times, and resolution effectiveness.
- NLP for Ticket Categorization: Develop natural language processing models to automatically classify tickets based on their content.
- Customer Satisfaction Prediction: Build predictive models to estimate customer satisfaction based on ticket attributes.
- Ticket Resolution Time Prediction: Predict the time required to resolve tickets based on historical data.
- Customer Segmentation: Segment customers based on their support interactions and demographics.
- Recommender Systems: Develop systems to recommend products or solutions based on past support tickets.
Potential Applications:
- Enhancing customer support workflows by identifying bottlenecks and areas for improvement.
- Automating the ticket triaging process to ensure timely responses.
- Improving customer satisfaction through predictive analytics.
- Personalizing customer support based on segmentation and past interactions.
File information: The dataset is provided in CSV format and contains 8470 records and [number of columns] features.
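Resolution-time analysis usually starts with an average time to resolution per Ticket Priority. Here is a minimal sketch. It assumes that First Response Time and Time to Resolution both hold ISO-format timestamps and that open tickets have an empty resolution value. Neither the timestamp format nor the empty-value handling is stated in the description. If the columns already hold durations, the subtraction step should be dropped. The ticket values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def mean_resolution_hours(tickets):
    """Mean hours from first response to resolution, per Ticket Priority.

    Assumes both time columns are ISO-format timestamps; tickets with an
    empty 'Time to Resolution' (still open) are skipped.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for t in tickets:
        if not t.get("Time to Resolution"):
            continue
        start = datetime.fromisoformat(t["First Response Time"])
        end = datetime.fromisoformat(t["Time to Resolution"])
        total[t["Ticket Priority"]] += (end - start).total_seconds() / 3600
        count[t["Ticket Priority"]] += 1
    return {p: total[p] / count[p] for p in total}

# Hypothetical tickets.
tickets = [
    {"Ticket Priority": "High", "First Response Time": "2023-06-01 09:00:00",
     "Time to Resolution": "2023-06-01 13:00:00"},
    {"Ticket Priority": "High", "First Response Time": "2023-06-02 10:00:00",
     "Time to Resolution": "2023-06-02 12:00:00"},
    {"Ticket Priority": "Low", "First Response Time": "2023-06-01 08:00:00",
     "Time to Resolution": ""},
]
print(mean_resolution_hours(tickets))  # {'High': 3.0}
```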
License: https://creativecommons.org/publicdomain/zero/1.0/
Description:
This dataset offers a rich source of information for understanding sales trends, patterns, and customer demographics. It is structured to provide detailed insights at different temporal levels (daily, monthly, and yearly) and offers a comprehensive view of various products ordered by customers across different demographics.
Key Features:
Temporal Insights:
- Daily Data: Provides a granular view of sales activity on a day-to-day basis, allowing users to track daily fluctuations and identify short-term trends.
- Monthly Data: Summarizes sales data on a monthly basis, enabling users to observe monthly trends, seasonality, and growth patterns.
- Yearly Data: Offers a high-level overview of sales performance on an annual basis, making it easier to identify long-term trends and growth.
Product-Level Information:
- Product IDs: Unique identifiers for each product in the dataset.
- Sales Quantity: The number of units sold for each product.
- Sales Revenue: The total revenue generated by each product.
Customer Demographics:
- Country: The country where the customer is located.
- State: The state or region within the country.
- Age: The age of the customer.
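The daily, monthly, and yearly views described above can all be derived from daily records by formatting each date at the desired granularity and summing. Here is a minimal stdlib-only sketch. The field names (date, product_id, revenue) and values are hypothetical, and dates are assumed to be parsed into datetime.date objects.

```python
from collections import defaultdict
from datetime import date

def roll_up(daily_rows, period):
    """Aggregate daily revenue to 'month' (YYYY-MM) or 'year' (YYYY) totals."""
    fmt = {"month": "%Y-%m", "year": "%Y"}[period]
    totals = defaultdict(float)
    for row in daily_rows:
        totals[row["date"].strftime(fmt)] += row["revenue"]
    return dict(sorted(totals.items()))

# Hypothetical daily sales records.
daily = [
    {"date": date(2023, 1, 30), "product_id": "P1", "revenue": 10.0},
    {"date": date(2023, 1, 31), "product_id": "P2", "revenue": 5.0},
    {"date": date(2023, 2, 1), "product_id": "P1", "revenue": 7.5},
    {"date": date(2024, 1, 2), "product_id": "P1", "revenue": 2.5},
]
print(roll_up(daily, "month"))  # {'2023-01': 15.0, '2023-02': 7.5, '2024-01': 2.5}
print(roll_up(daily, "year"))   # {'2023': 22.5, '2024': 2.5}
```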
License: http://opendatacommons.org/licenses/dbcl/1.0/
This dataset contains information about Product name, Product price, Rate, Reviews, Summary, and Sentiment in CSV format. It covers 104 different types of products from flipkart.com, such as electronics; men's, women's, and kids' clothing; home decor items; automated systems; and so on. It has 205,053 rows and 6 columns. If a product has no review but a summary is present, the blank review has been filled with a NaN value.
The Sentiment column is a multiclass label: positive, neutral, or negative. The labels were generated from the Summary column using NLP with the VADER model and then checked manually and moved to the appropriate category where needed. For example, summaries such as "okay" or "just ok", or summaries containing both a positive and a negative remark, were labeled neutral. The Summary and price columns have already been cleaned with the Python libraries NumPy and pandas.
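VADER produces a compound score between -1 and 1, which then has to be cut into three classes. Here is a minimal sketch of that mapping using the ±0.05 thresholds suggested in the VADER documentation. The thresholds the dataset authors actually used are not stated, so treat them as an assumption. In practice, the compound value would come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` in the vaderSentiment or NLTK packages. The scores below are hypothetical.

```python
from collections import Counter

def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a three-class label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Hypothetical compound scores for four summaries.
compounds = [0.81, 0.0, -0.62, 0.03]
labels = [label_from_compound(c) for c in compounds]
print(labels)  # ['positive', 'neutral', 'negative', 'neutral']
print(Counter(labels))
```

The manual relabeling step described above would then override these automatic labels for borderline summaries.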
Data was collected by web scraping flipkart.com with the Beautiful Soup library. The scraping was done in December 2022.
Usage
Sentiment Analysis: The text of customer reviews and the associated labels (such as positive, negative, or neutral) can be used to train machine learning models to automatically classify the sentiment of customer reviews.
Predictive Modeling: Customer ratings, summary and reviews, along with their associated labels, can be used as features to build predictive models for various outcomes, such as customer behavior, purchasing patterns, product preferences and so on.
Text Classification: The labeled customer reviews or summaries can be used to train machine learning models for text classification tasks, such as spam detection, topic classification, and intent recognition.
Natural Language Processing (NLP): It can be used to train NLP algorithms, such as sentiment analysis models, for applications in other domains.
Evaluating Machine Learning Models: This dataset can be used to evaluate the performance of machine learning models for sentiment analysis and other NLP tasks.
Customer Service: Customer reviews, summary and labels can provide insight into customer complaints, issues, and suggestions, which can help companies improve their customer service.
However, the applications of this type of data will depend on the specific dataset and the problem it is being used to solve.
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset contains 500 records of customer transactions across five distinct bakeries, providing a rich source of information for analyzing consumer behavior in the bakery industry. Each record is characterized by several key features:
This dataset is designed to facilitate various analyses, including spending patterns, payment method preferences, and overall consumer trends in the bakery sector. By utilizing this dataset, stakeholders can derive actionable insights to enhance customer engagement, optimize product offerings, and inform marketing strategies.
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset, downloaded from Kaggle, contains detailed customer sales transactions, including age, annual income, gender, number of purchases, and product details. It is useful for analyzing buying behavior, predicting trends, and optimizing sales strategies.
Dataset Features
Age_Group – Customer age category (Young, Middle Age, Old)
Gender – Male/Female
Annual_Income – Customer's annual income in USD
Number_of_Purchases – Total purchases made
Purchase_Amount – Value of the purchase in USD
Product_Category – Type of product purchased

Source & Context
This dataset is based on anonymized retail transaction records from an e-commerce platform. It can help businesses understand customer segmentation, analyze spending habits, and build machine learning models for customer prediction.

Potential Use Cases
✔️ Customer segmentation analysis
✔️ Sales trend forecasting
✔️ Price sensitivity analysis
✔️ Predicting customer churn
✔️ Recommender system development

Example Questions to Explore
Which age group makes the most purchases?
What is the average purchase amount by income level?
Do certain product categories sell better to specific demographics?
How does income impact buying frequency?
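The first two example questions reduce to simple group-by aggregations. The sketch below uses only the Python standard library and the column names listed under Dataset Features; the sample rows and the income bands in the hypothetical `income_band` helper are invented for illustration, since the dataset itself does not define income levels.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical rows using the documented column names; values are invented.
SAMPLE_CSV = """Age_Group,Gender,Annual_Income,Number_of_Purchases,Purchase_Amount,Product_Category
Young,Female,32000,4,120.50,Clothing
Young,Male,28000,6,80.00,Electronics
Middle Age,Female,65000,3,310.00,Home
Middle Age,Male,72000,5,450.25,Electronics
Old,Female,54000,2,95.75,Books
"""

def income_band(income):
    # Hypothetical cut-offs; pick bands that suit the real income distribution.
    if income < 40000:
        return "low"
    if income < 70000:
        return "medium"
    return "high"

def load_rows(text):
    """Parse CSV text into dicts, converting numeric columns."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rec["Annual_Income"] = float(rec["Annual_Income"])
        rec["Number_of_Purchases"] = int(rec["Number_of_Purchases"])
        rec["Purchase_Amount"] = float(rec["Purchase_Amount"])
        rows.append(rec)
    return rows

def purchases_by_age_group(rows):
    """Total Number_of_Purchases per Age_Group."""
    totals = defaultdict(int)
    for r in rows:
        totals[r["Age_Group"]] += r["Number_of_Purchases"]
    return dict(totals)

def avg_amount_by_income_band(rows):
    """Mean Purchase_Amount per income band."""
    groups = defaultdict(list)
    for r in rows:
        groups[income_band(r["Annual_Income"])].append(r["Purchase_Amount"])
    return {band: mean(vals) for band, vals in groups.items()}

rows = load_rows(SAMPLE_CSV)
counts = purchases_by_age_group(rows)
print(max(counts, key=counts.get))  # prints: Young
print(avg_amount_by_income_band(rows))
```

The same pattern extends to the remaining questions, for example grouping by (Age_Group, Product_Category) pairs to compare category sales across demographics.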
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
The "tips" dataset is a popular dataset often used for demonstration and practice in data analysis and visualization. It contains information about various attributes of customers in a restaurant, including the total bill amount, tip amount, gender, whether the customer smokes or not, the day of the week, time of day, and the size of the party.
total_bill: This attribute represents the total amount of the bill paid by the customer, including the cost of the meal, taxes, and any additional charges.
tip: This attribute denotes the amount of tip left by the customer. It's typically calculated as a percentage of the total bill and is often discretionary.
sex: This attribute indicates the gender of the customer. It could be either male or female.
smoker: This attribute indicates whether the customer is a smoker or a non-smoker. It's a categorical variable with two possible values: "Yes" for smokers and "No" for non-smokers.
day: This attribute represents the day of the week when the meal was consumed. In the commonly distributed version of the dataset, only four days appear: "Thur", "Fri", "Sat", and "Sun".
time: This attribute denotes the time of the day when the meal was consumed. It's often categorized into two values: "Lunch" for meals consumed during the day and "Dinner" for meals consumed in the evening.
size: This attribute indicates the size of the party dining together. It represents the number of people included in the bill.
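Because tips are usually reasoned about as a share of the bill, a common first step is to compute the tip percentage, 100 × tip / total_bill, and compare it across groups such as time or smoker. Note that total_bill, as described above, already includes taxes and other charges. The minimal standard-library sketch below uses invented `Visit` records, not rows from the dataset, and the helper names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

@dataclass
class Visit:
    total_bill: float
    tip: float
    sex: str
    smoker: str   # "Yes" or "No"
    day: str
    time: str     # "Lunch" or "Dinner"
    size: int     # number of people on the bill

# Hypothetical records for illustration only.
VISITS = [
    Visit(20.00, 3.00, "Female", "No", "Sun", "Dinner", 2),
    Visit(40.00, 8.00, "Male", "Yes", "Sat", "Dinner", 4),
    Visit(10.00, 1.00, "Male", "No", "Thur", "Lunch", 1),
    Visit(25.00, 5.00, "Female", "Yes", "Fri", "Lunch", 2),
]

def tip_pct(v):
    """Tip as a percentage of the total bill."""
    return 100.0 * v.tip / v.total_bill

def bill_per_person(v):
    """Average spend per diner, using the party size."""
    return v.total_bill / v.size

def mean_tip_pct_by(visits, attr):
    """Mean tip percentage grouped by one attribute, e.g. "time" or "smoker"."""
    groups = defaultdict(list)
    for v in visits:
        groups[getattr(v, attr)].append(tip_pct(v))
    return {key: mean(vals) for key, vals in groups.items()}

print(mean_tip_pct_by(VISITS, "time"))  # prints: {'Dinner': 17.5, 'Lunch': 15.0}
```

Normalizing by the bill (and, via size, by the number of diners) keeps large parties from dominating comparisons that would otherwise be driven by raw tip amounts.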