Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset contains information about bank customers and their churn status, which indicates whether they have exited the bank or not. It is suitable for exploring and analyzing factors influencing customer churn in banking institutions and for building predictive models to identify customers at risk of churning.
RowNumber: The sequential number assigned to each row in the dataset.
CustomerId: A unique identifier for each customer.
Surname: The surname of the customer.
CreditScore: The credit score of the customer.
Geography: The geographical location of the customer (e.g., country or region).
Gender: The gender of the customer.
Age: The age of the customer.
Tenure: The number of years the customer has been with the bank.
Balance: The account balance of the customer.
NumOfProducts: The number of bank products the customer has.
HasCrCard: Indicates whether the customer has a credit card (binary: yes/no).
IsActiveMember: Indicates whether the customer is an active member (binary: yes/no).
EstimatedSalary: The estimated salary of the customer.
Exited: Indicates whether the customer has exited the bank (binary: yes/no).
This dataset is made available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
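The column list above suggests a common first step: drop the identifier columns (RowNumber, CustomerId, Surname) and compare churn rates across a categorical feature. The sketch below is one possible way to do that with the standard library. The sample rows are invented for illustration, and encoding the binary columns as 0/1 is an assumption, since the description only says "yes/no".

```python
import csv
import io

# Hypothetical sample rows; the values are invented and not taken from the dataset.
SAMPLE = """RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
1,1001,Smith,620,France,Female,42,2,0.0,1,1,1,101000.0,1
2,1002,Jones,710,Spain,Male,35,5,83000.5,2,0,1,56000.0,0
3,1003,Brown,580,France,Male,51,8,120500.0,1,1,0,78000.0,1
4,1004,Khan,690,Germany,Female,29,3,64000.0,2,1,1,91000.0,0
"""

# Identifier columns carry no predictive signal and are dropped before analysis.
ID_COLUMNS = {"RowNumber", "CustomerId", "Surname"}


def load_rows(text):
    """Parse CSV text, drop identifier columns, and cast the target to int."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {key: value for key, value in raw.items() if key not in ID_COLUMNS}
        row["Exited"] = int(row["Exited"])  # assumes 0/1 encoding
        rows.append(row)
    return rows


def churn_rate_by(rows, column):
    """Return the share of exited customers for each value of `column`."""
    totals, exits = {}, {}
    for row in rows:
        key = row[column]
        totals[key] = totals.get(key, 0) + 1
        exits[key] = exits.get(key, 0) + row["Exited"]
    return {key: exits[key] / totals[key] for key in totals}


rows = load_rows(SAMPLE)
rates = churn_rate_by(rows, "Geography")
```

The same `churn_rate_by` helper can be pointed at Gender, NumOfProducts, or IsActiveMember to get a quick view of which groups exit more often.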

"Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs." [IBM Sample Data Sets]
Each row represents a customer; each column contains a customer attribute described in the column metadata.
The data set includes information about:
The dataset is intended for exploring this type of model and learning more about the subject.
New version from IBM: https://community.ibm.com/community/user/businessanalytics/blogs/steven-macko/2019/07/11/telco-customer-churn-1113

Context: This dataset is part of a data science project focused on customer churn prediction for a subscription-based service. Customer churn, the rate at which customers cancel their subscriptions, is a vital metric for businesses offering subscription services. Predictive analytics techniques are employed to anticipate which customers are likely to churn, enabling companies to take proactive measures for customer retention.
Content: This dataset contains anonymized information about customer subscriptions and their interaction with the service. The data includes features such as subscription type, payment method, viewing preferences, customer support interactions, and other relevant attributes. It consists of three files: "test.csv", "train.csv", and "data_descriptions.csv".
Columns:
CustomerID: Unique identifier for each customer
SubscriptionType: Type of subscription plan chosen by the customer (e.g., Basic, Premium, Deluxe)
PaymentMethod: Method used for payment (e.g., Credit Card, Electronic Check, PayPal)
PaperlessBilling: Whether the customer uses paperless billing (Yes/No)
ContentType: Type of content accessed by the customer (e.g., Movies, TV Shows, Documentaries)
MultiDeviceAccess: Whether the customer has access on multiple devices (Yes/No)
DeviceRegistered: Device registered by the customer (e.g., Smartphone, Smart TV, Laptop)
GenrePreference: Genre preference of the customer (e.g., Action, Drama, Comedy)
Gender: Gender of the customer (Male/Female)
ParentalControl: Whether parental control is enabled (Yes/No)
SubtitlesEnabled: Whether subtitles are enabled (Yes/No)
AccountAge: Age of the customer's subscription account (in months)
MonthlyCharges: Monthly subscription charges
TotalCharges: Total charges incurred by the customer
ViewingHoursPerWeek: Average number of viewing hours per week
SupportTicketsPerMonth: Number of customer support tickets raised per month
AverageViewingDuration: Average duration of each viewing session
ContentDownloadsPerMonth: Number of content downloads per month
UserRating: Customer satisfaction rating (1 to 5)
WatchlistSize: Size of the customer's content watchlist
Acknowledgments: The dataset used in this project was obtained from the Data Science Challenge on Coursera and is used for educational and research purposes. Any resemblance to real persons or entities is purely coincidental.

Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Customer churn is one of the most critical challenges for subscription-based and service-oriented businesses. Retaining existing customers is significantly more cost-effective than acquiring new ones, making churn prediction a key business analytics problem.
This dataset is a synthetic but business-realistic customer churn dataset designed for machine learning, data science, and predictive analytics use cases. The data simulates real-world customer behavior by incorporating customer demographics, product usage patterns, billing and payment history, customer support interactions, and engagement metrics.
The target variable, churn, indicates whether a customer is likely to discontinue the service. Churn labels are generated using business-driven rules combined with probabilistic noise, ensuring realistic feature correlations rather than random labeling.
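As a rough illustration of "business-driven rules combined with probabilistic noise", the sketch below scores each customer with a simple rule, converts the score to a probability, samples a label, and then flips a small fraction of labels at random. The feature names, weights, and noise rate are all hypothetical; the dataset's actual generation rules are not given in the description.

```python
import math
import random


def churn_probability(tenure_months, support_tickets, late_payments):
    """Hypothetical business rule: short tenure, many tickets, and late
    payments raise the churn score; a logistic function maps it to (0, 1)."""
    score = -1.0 - 0.05 * tenure_months + 0.4 * support_tickets + 0.6 * late_payments
    return 1.0 / (1.0 + math.exp(-score))


def make_label(rng, noise=0.05, **features):
    """Sample a churn label from the rule, then flip it with probability `noise`."""
    label = 1 if rng.random() < churn_probability(**features) else 0
    if rng.random() < noise:
        label = 1 - label  # probabilistic noise so the rule is not perfectly learnable
    return label


def generate(n, seed=0):
    """Generate n (features, label) pairs reproducibly from a seed."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        features = {
            "tenure_months": rng.randint(1, 72),
            "support_tickets": rng.randint(0, 5),
            "late_payments": rng.randint(0, 3),
        }
        data.append((features, make_label(rng, **features)))
    return data
```

Because labels are sampled from a probability rather than assigned by a hard threshold, features stay correlated with churn without determining it exactly.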
This dataset is ideal for:
churn (0 = No, 1 = Yes)
This dataset is synthetically generated for educational, research, and portfolio purposes. While it reflects realistic business patterns, it does not represent real customer data.

https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides a comprehensive overview of customer interactions with an online retail store, aiming to predict customer churn based on various behavioral and demographic features. It includes data on customer demographics, spending behavior, satisfaction levels, and engagement with marketing campaigns. The dataset is designed for analysis and development of predictive models to identify customers at risk of churn, enabling targeted customer retention strategies.
- Customer_ID: A unique identifier for each customer.
- Age: The customer's age.
- Gender: The customer's gender (Male, Female, Other).
- Annual_Income: The annual income of the customer in thousands of dollars.
- Total_Spend: The total amount spent by the customer in the last year.
- Years_as_Customer: The number of years the individual has been a customer of the store.
- Num_of_Purchases: The number of purchases the customer made in the last year.
- Average_Transaction_Amount: The average amount spent per transaction.
- Num_of_Returns: The number of items the customer returned in the last year.
- Num_of_Support_Contacts: The number of times the customer contacted support in the last year.
- Satisfaction_Score: A score from 1 to 5 indicating the customer's satisfaction with the store.
- Last_Purchase_Days_Ago: The number of days since the customer's last purchase.
- Email_Opt_In: Whether the customer has opted in to receive marketing emails.
- Promotion_Response: The customer's response to the last promotional campaign (Responded, Ignored, Unsubscribed).
- Target_Churn: Indicates whether the customer churned (True or False).

https://creativecommons.org/publicdomain/zero/1.0/
The Customer Churn Prediction Dataset is designed for predicting customer churn from behavioral and demographic features. It contains information about 1,000 customers and includes the following key features:
Customer_ID: A unique identifier for each customer.
Age: The age of the customer (ranging from 18 to 70 years).
Gender: The gender of the customer (0 = Male, 1 = Female).
Monthly_Spending: The amount of money spent monthly by the customer (ranging from 50 to 500 units).
Subscription_Length: The number of years the customer has been subscribed to the service (ranging from 1 to 10 years).
Support_Interactions: The number of times the customer has interacted with customer support (ranging from 0 to 5).
Churn: The target variable indicating whether the customer has churned (1) or remained (0).
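The documented ranges above can double as a simple sanity check before modeling. The sketch below is a minimal validator, assuming each record is a dict of numeric values keyed by the column names listed; the `RANGES` table and the `out_of_range` helper are illustrative names, not part of the dataset.

```python
# Documented (inclusive) ranges taken from the feature list above.
RANGES = {
    "Age": (18, 70),
    "Gender": (0, 1),
    "Monthly_Spending": (50, 500),
    "Subscription_Length": (1, 10),
    "Support_Interactions": (0, 5),
    "Churn": (0, 1),
}


def out_of_range(record):
    """Return the names of fields that are missing or outside their documented range."""
    problems = []
    for name, (low, high) in RANGES.items():
        value = record.get(name)
        if value is None or not (low <= value <= high):
            problems.append(name)
    return problems


# Hypothetical records used only to exercise the check.
good = {"Customer_ID": 1, "Age": 30, "Gender": 1, "Monthly_Spending": 120,
        "Subscription_Length": 4, "Support_Interactions": 2, "Churn": 0}
bad = dict(good, Age=17, Support_Interactions=9)
```

Calling `out_of_range(good)` returns an empty list, while `out_of_range(bad)` reports the two offending fields.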

This dataset is for ABC Multistate Bank, with the following columns:
The aim is to predict customer churn for ABC Bank.

https://creativecommons.org/publicdomain/zero/1.0/
This is a synthetic dataset created to simulate customer behavior in a subscription-based service. It includes 15,000 rows, with each row representing a single customer.
tenure_months: How long (in months) the customer has been using the service.
monthly_usage_hours: Average number of hours the customer uses the service per month.
has_multiple_devices: Binary value (1 = yes, 0 = no). Whether the customer uses more than one device.
customer_support_calls: Number of times the customer contacted customer support.
payment_failures: Binary value (1 = yes, 0 = no). Whether the customer had recent payment issues.
is_premium_plan: Binary value (1 = yes, 0 = no). Whether the customer is on a premium subscription.

Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
🇬🇧 English:
This synthetic dataset was designed for those who want to practice customer churn prediction using structured tabular data. It includes 1,000 customer records, each containing features such as age, service tenure, service type, monthly fee, and churn status.
Use this dataset to:
Features:
🇹🇷 Turkish (translated to English):
This synthetic dataset was created for researchers and students who want to work on customer churn prediction. It contains fake but realistic data on 1,000 customers, covering age, service tenure, service type, monthly payment, and subscription status.
With this dataset:
🧾 Variables:

https://creativecommons.org/publicdomain/zero/1.0/
Context
Patient churn (attrition) is a critical challenge in healthcare, costing providers billions in lost revenue and disrupting continuity of care. Studies show:
Understanding which patients are at risk of leaving—and why—enables healthcare organizations to:
Content
The dataset contains 2,000 patient records with detailed behavioral, satisfaction, and engagement metrics:
Patient Demographics:
Service Utilization:
Satisfaction Metrics:
Financial & Engagement Factors:
Target Variable:
Inspiration & Use Cases
This dataset enables healthcare organizations to:
Ideal for: healthcare analysts, data scientists, patient experience managers, and students learning classification algorithms.

MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This synthetic dataset simulates customer behavior data for an online retail company. It is designed for Exploratory Data Analysis (EDA) and machine learning tasks such as:
Customer segmentation
Churn prediction
Recommendation systems
Customer lifetime value estimation
🔍 Dataset Overview: Each row represents a unique customer, and the columns provide information on their demographics, shopping habits, engagement with the website, and satisfaction.
| Column | Description |
|---|---|
| CustomerID | Unique identifier for each customer |
| Age | Customer's age |
| Gender | Gender of the customer |
| Annual_Income_USD | Annual income in US dollars |
| Spending_Score | Score based on spending behavior (1–100) |
| Membership_Status | Customer loyalty level (Bronze to Platinum) |
| Preferred_Payment_Method | Payment method most often used |
| Region | Geographical region (e.g., North, South) |
| Total_Purchases | Total number of purchases made |
| Avg_Purchase_Value | Average value of each purchase |
| Last_Purchase_Date | Date of the most recent purchase |
| Churn | Whether the customer has churned (0 = No, 1 = Yes) |
| Satisfaction_Score | Satisfaction score (1–5 scale) |
| Website_Visits_Last_Month | Number of visits to the website last month |
| Avg_Time_Per_Visit_Minutes | Average time spent on website per visit |
| Support_Tickets_Last_6_Months | Number of support tickets raised |
| Referred_Friends | Number of friends referred to the platform |
✅ Use Cases:
Churn Prediction: Predict if a customer will churn based on behavior and demographics.
Segmentation: Use clustering to segment customers by behavior (e.g., income, spending, satisfaction).
Classification/Regression: Predict customer satisfaction or spending score.
Recommendation Engines: Based on purchase history and behavior patterns.

https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a synthetic customer churn dataset designed to simulate real-world telecom customer behavior. It is generated using business-driven rules based on customer tenure, billing amount, contract type, service usage, and support interactions. Controlled randomness and noise are added to avoid perfect patterns and make the dataset suitable for realistic machine learning classification tasks. The dataset is ideal for beginners to practice exploratory data analysis, feature engineering, and customer churn prediction using machine learning models.

https://creativecommons.org/publicdomain/zero/1.0/
"TechFlow" is a fictional SaaS company providing project management software. Like many SaaS companies, they are experiencing customer churn (users cancelling subscriptions). The company has collected data on user usage, account age, and the textual content of their latest customer support interaction.
The dataset contains 2,500 records (split into train.csv and test.csv). Each row represents a unique customer.
Login_Frequency: categorical (Daily, Weekly, Rarely).
Churn: target (1 = Churned/Left, 0 = Retained/Stayed).
Most churn datasets are purely numerical. This dataset challenges you to combine numerical analysis with Natural Language Processing (NLP).
1. EDA: Does Login_Frequency correlate with Churn?
2. NLP: Can you perform sentiment analysis on Last_Support_Ticket to see if angry customers are more likely to churn?
3. Modeling: Build a model that uses both the usage metrics and the text data to predict churn with high accuracy.
This is a synthetic dataset generated using Python's Faker library. Real-world patterns (e.g., angry support tickets leading to higher churn) were simulated using weighted probabilities to make the data useful for machine learning practice.
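To make the NLP task concrete, here is a minimal sketch that scores each support ticket against a tiny negative-word lexicon and compares churn rates between customers with and without negative wording. The lexicon, the sample rows, and the helper names are hypothetical; a real analysis would more likely use a trained sentiment model. Only the column names Login_Frequency, Last_Support_Ticket, and Churn come from the description.

```python
import re

# Hypothetical lexicon of words treated as signs of an unhappy customer.
NEGATIVE_WORDS = {"angry", "terrible", "cancel", "refund", "broken", "frustrated"}


def negativity(text):
    """Fraction of words in `text` that appear in the negative lexicon."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(word in NEGATIVE_WORDS for word in words) / len(words)


def churn_rate(rows):
    """Share of rows with Churn == 1 (0.0 for an empty list)."""
    return sum(row["Churn"] for row in rows) / len(rows) if rows else 0.0


# Invented rows for illustration; they are not taken from train.csv.
rows = [
    {"Login_Frequency": "Rarely", "Last_Support_Ticket": "The app is broken and I am frustrated", "Churn": 1},
    {"Login_Frequency": "Daily", "Last_Support_Ticket": "Thanks for the quick help", "Churn": 0},
    {"Login_Frequency": "Weekly", "Last_Support_Ticket": "I want to cancel and get a refund", "Churn": 1},
    {"Login_Frequency": "Daily", "Last_Support_Ticket": "How do I export a report?", "Churn": 0},
]

angry = [row for row in rows if negativity(row["Last_Support_Ticket"]) > 0]
calm = [row for row in rows if negativity(row["Last_Support_Ticket"]) == 0]
```

The `negativity` score can also be added as a numeric column next to the usage metrics, which is one simple way to feed both text and tabular signals into a single model.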

Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains synthetic data simulating customer behavior for a Netflix-like video streaming service. It includes 5,000 records with 14 carefully engineered features designed for churn prediction modeling, business insights, and customer segmentation.
The dataset is ideal for:
Machine learning classification tasks (churn vs. non-churn)
Exploratory data analysis (EDA)
Customer behavior modeling in OTT platforms

https://creativecommons.org/publicdomain/zero/1.0/
A comprehensive, production-scale synthetic dataset for customer churn prediction in the telecommunications industry. This dataset contains 1,000,000 customer records with 28 features designed to simulate real-world churn behavior while maintaining privacy compliance. Perfect for exploring machine learning, feature engineering, and business analytics in customer retention scenarios.
Telecommunications companies face significant customer attrition (churn) that directly impacts revenue. Predicting which customers are likely to churn allows businesses to implement targeted retention strategies, optimize marketing spend, and improve customer lifetime value.
| Column | Type | Description | Missing Values |
|---|---|---|---|
| customer_id | String | Unique customer identifier | 0% |
| signup_date | Date | Customer registration date | 0% |
| age | Integer | Customer age (18-90) | 0% |
| gender | Categorical | Gender: Male, Female, Other | 0% |
| annual_income | Float | Annual income in USD | 3% |
| education | Categorical | Education level | 0% |
| marital_status | Categorical | Marital status | 0% |
| dependents | Integer | Number of dependents (0-5) | 0% |
| senior_citizen | Binary | 1 if age ≥ 65, else 0 | 0% |

| Column | Type | Description | Missing Values |
|---|---|---|---|
| tenure | Integer | Months with company (1-72) | 0% |
| contract | Categorical | Contract type: month_to_month, one_year, two_year | 0% |
| payment_method | Categorical | Payment method | 0% |
| paperless_billing | Categorical | Paperless billing: Yes/No | 0% |

| Column | Type | Description | Missing Values |
|---|---|---|---|
| monthlycharges | Float | Monthly service charges ($20-$200) | 0% |
| totalcharges | Float | Total charges to date | 0% |
| num_services | Integer | Number of subscribed services (1-6) | 0% |
| has_phone_service | Binary | 1 if has phone service | 0% |
| has_internet_service | Binary | 1 if has internet service | 0% |
| has_online_security | Binary | 1 if has online security | 0% |
| has_online_backup | Binary | 1 if has online backup | 0% |
| has_device_protection | Binary | 1 if has device protection | 0% |
| has_tech_support | Binary | 1 if has tech support | 0% |
| has_streaming_tv | Binary | 1 if has streaming TV | 0% |
| has_streaming_movies | Binary | 1 if has streaming movies | 0% |

| Column | Type | Description | Missing Values |
|---|---|---|---|
| customer_satisfaction | Integer | Satisfaction score (1-10) | 2% |
| num_complaints | Integer | Complaints in last year (0-8) | 3% |
| num_service_calls | Integer | Service calls last month (0-12) | 0% |
| late_payments | Integer | Late payments last 3 months (0-5) | 0% |
| avg_monthly_gb | Float | Average monthly data usage (GB) | 5% |
| days_since_last_interaction | Integer | Days since last contact (1-365) | 0% |
| credit_score | Integer | Credit score (300-850) | 4% |

| Column | Type | Description | Missing Values |
|---|---|---|---|
| churn | Binary | Target: 1 if churned, 0 if retained | 0% |
totalcharges ≈ monthlycharges × tenure
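The relationship above, together with the `senior_citizen` rule and the missing-value percentages in the tables, can be checked with a few small helpers. This is a sketch under assumptions: records are plain dicts, missing values are represented as `None`, the 10% tolerance is arbitrary, and median imputation is just one common option rather than anything the dataset prescribes.

```python
import math
import statistics


def charges_consistent(record, rel_tol=0.10):
    """True if totalcharges is within rel_tol of monthlycharges * tenure."""
    expected = record["monthlycharges"] * record["tenure"]
    return math.isclose(record["totalcharges"], expected, rel_tol=rel_tol)


def senior_flag_consistent(record):
    """True if senior_citizen matches the documented rule (1 if age >= 65, else 0)."""
    return record["senior_citizen"] == (1 if record["age"] >= 65 else 0)


def impute_median(records, column):
    """Return copies of the records with None in `column` replaced by the column median."""
    observed = [r[column] for r in records if r[column] is not None]
    median = statistics.median(observed)
    return [dict(r, **{column: median if r[column] is None else r[column]}) for r in records]


# Invented records used only to exercise the helpers.
ok = {"monthlycharges": 50.0, "tenure": 10, "totalcharges": 510.0, "age": 70, "senior_citizen": 1}
off = {"monthlycharges": 50.0, "tenure": 10, "totalcharges": 900.0, "age": 40, "senior_citizen": 1}
scores = [{"credit_score": 700}, {"credit_score": None}, {"credit_score": 600}]
filled = impute_median(scores, "credit_score")
```

Columns with documented gaps (annual_income, customer_satisfaction, num_complaints, avg_monthly_gb, credit_score) are the natural candidates for `impute_median` or a similar strategy.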

This dataset was created by Study Mart

https://creativecommons.org/publicdomain/zero/1.0/
The "Real World Customer Churn Dataset in Telco Domain" is a comprehensive collection of anonymized data that provides insights into customer behavior and churn prediction within the telecommunications industry.
The dataset contains data on over 60,000 customers across more than 10 distinct usage categories. Some of the key usage categories include:
The dataset consists of the following key files:
The "Real World Customer Churn Dataset in Telco Domain" offers a range of potential use cases, including:
This dataset's real-world aspect is of significant importance. It reflects actual customer interactions with a major telecommunications company in Sri Lanka, offering insights that can be directly applied to real-world scenarios. The dataset is sourced from one of the largest telco companies in the country, adding credibility and relevance to the insights it provides.
Understanding customer churn and usage behavior is pivotal for the telecommunications industry, and this dataset empowers researchers, data scientists, and businesses to gain deeper insights into these aspects.
The dataset is anonymized to protect customer privacy, and all data used is in compliance with privacy regulations and agreements. Users are encouraged to explore and contribute to the "Real World Customer Churn Dataset in Telco Domain."
Thank you for your valuable contributions to this dataset.

http://opendatacommons.org/licenses/dbcl/1.0/
The customer churn dataset is a collection of customer data for predicting customer churn, the tendency of customers to stop using a company's products or services. Its features describe each customer: credit score, country, gender, age, tenure, balance, number of products, credit card status, active membership, estimated salary, and churn status. The churn status indicates whether a customer has churned or not. The dataset is used to analyze the factors that contribute to customer churn and to build predictive models that identify customers at risk of churning. The goal is to develop strategies and interventions to reduce churn and improve customer retention.

https://creativecommons.org/publicdomain/zero/1.0/
A tour and travels company wants to predict whether a customer will churn or not, based on the indicators given below. Help build predictive models and save the company's money. Perform fascinating EDAs. The data was used for practice purposes and also during a mini hackathon; it is completely free to use.

Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
"Telecom Customer Churn Prediction Dataset" is a synthetic dataset designed to simulate customer data for a telecommunications company. This dataset is created for the purpose of predicting customer churn, which refers to the phenomenon of customers discontinuing their services with the company. The dataset contains a variety of features that capture different aspects of customer behavior and characteristics.
The dataset includes information such as customer age, gender, contract type, monthly charges, total amount spent, number of devices connected, and the number of customer support calls made. The key focus of this dataset is the binary target variable "Churn," which indicates whether a customer has churned (1) or not (0). This variable is essential for training and evaluating predictive models aimed at identifying customers who are likely to leave the service.