63 datasets found

Predictive Analytics for Customer Churn: Dataset
kaggle.com
Updated Oct 6, 2023
Cite
Safrin S (2023). Predictive Analytics for Customer Churn: Dataset [Dataset]. https://www.kaggle.com/datasets/safrin03/predictive-analytics-for-customer-churn-dataset
Dataset updated
Oct 6, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Safrin S
Description
Context: This dataset is part of a data science project focused on customer churn prediction for a subscription-based service. Customer churn, the rate at which customers cancel their subscriptions, is a vital metric for businesses offering subscription services. Predictive analytics techniques are employed to anticipate which customers are likely to churn, enabling companies to take proactive measures for customer retention.

Content: This dataset contains anonymized information about customer subscriptions and their interaction with the service. The data includes various features such as subscription type, payment method, viewing preferences, customer support interactions, and other relevant attributes. It consists of three files: "test.csv", "train.csv", and "data_descriptions.csv".

Columns:

CustomerID: Unique identifier for each customer

SubscriptionType: Type of subscription plan chosen by the customer (e.g., Basic, Premium, Deluxe)

PaymentMethod: Method used for payment (e.g., Credit Card, Electronic Check, PayPal)

PaperlessBilling: Whether the customer uses paperless billing (Yes/No)

ContentType: Type of content accessed by the customer (e.g., Movies, TV Shows, Documentaries)

MultiDeviceAccess: Whether the customer has access on multiple devices (Yes/No)

DeviceRegistered: Device registered by the customer (e.g., Smartphone, Smart TV, Laptop)

GenrePreference: Genre preference of the customer (e.g., Action, Drama, Comedy)

Gender: Gender of the customer (Male/Female)

ParentalControl: Whether parental control is enabled (Yes/No)

SubtitlesEnabled: Whether subtitles are enabled (Yes/No)

AccountAge: Age of the customer's subscription account (in months)

MonthlyCharges: Monthly subscription charges

TotalCharges: Total charges incurred by the customer

ViewingHoursPerWeek: Average number of viewing hours per week

SupportTicketsPerMonth: Number of customer support tickets raised per month

AverageViewingDuration: Average duration of each viewing session

ContentDownloadsPerMonth: Number of content downloads per month

UserRating: Customer satisfaction rating (1 to 5)

WatchlistSize: Size of the customer's content watchlist

Acknowledgments: The dataset used in this project was obtained from the Data Science Challenge on Coursera and is used for educational and research purposes. Any resemblance to real persons or entities is purely coincidental.
Customer Churn Prediction Dataset
kaggle.com
Updated Apr 12, 2025
Cite
AJ (2025). Customer Churn Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/smayanj/customer-churn-prediction-dataset
Dataset updated
Apr 12, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
AJ
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This is a synthetic dataset created to simulate customer behavior in a subscription-based service. It includes 15,000 rows, with each row representing a single customer.

Features:

tenure_months
How long (in months) the customer has been using the service.

monthly_usage_hours
Average number of hours the customer uses the service per month.

has_multiple_devices
Binary value (1 = yes, 0 = no). Whether the customer uses more than one device.

customer_support_calls
Number of times the customer contacted customer support.

payment_failures
Binary value (1 = yes, 0 = no). Whether the customer had recent payment issues.

is_premium_plan
Binary value (1 = yes, 0 = no). Whether the customer is on a premium subscription.

Target:

churn
Binary value (1 = customer will leave, 0 = customer will stay).
This is calculated based on a rule-based formula that considers factors like low tenure, low usage, support calls, and payment issues. Some randomness is added to mimic real-world uncertainty.
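The description does not give the actual formula, so the following is only a minimal sketch of how a rule-based churn label with added randomness could be produced. The thresholds, weights, and the function name `simulate_churn` are all assumptions made for illustration and are not taken from the dataset.

```python
import random


def simulate_churn(tenure_months, monthly_usage_hours, customer_support_calls,
                   payment_failures, rng):
    """Return 1 (churn) or 0 (stay) from a hypothetical rule-based risk score."""
    score = 0.0
    if tenure_months < 6:            # low tenure raises risk (threshold assumed)
        score += 0.3
    if monthly_usage_hours < 10:     # low usage raises risk (threshold assumed)
        score += 0.3
    if customer_support_calls > 3:   # frequent support contact raises risk (assumed)
        score += 0.2
    if payment_failures == 1:        # recent payment issues raise risk
        score += 0.2
    # Randomness: the score is turned into a probability, so even low-risk
    # customers occasionally churn and high-risk customers occasionally stay.
    probability = min(1.0, 0.05 + 0.9 * score)
    return 1 if rng.random() < probability else 0


rng = random.Random(42)  # fixed seed so the simulation is repeatable
label = simulate_churn(tenure_months=2, monthly_usage_hours=3,
                       customer_support_calls=6, payment_failures=1, rng=rng)
```

Because the label is drawn from a probability rather than set by a hard rule, a model trained on such data cannot reach perfect accuracy, which is the "real-world uncertainty" the description refers to.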
Data from: Customer Churn
kaggle.com
Updated Oct 14, 2024
Cite
willian oliveira gibin (2024). Customer Churn [Dataset]. http://doi.org/10.34740/kaggle/dsv/9626375
Unique identifier
https://doi.org/10.34740/kaggle/dsv/9626375
Dataset updated
Oct 14, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
willian oliveira gibin
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The Customer Churn Classification dataset is a vital resource for businesses seeking to understand and predict customer churn, a critical metric that represents the rate at which customers stop doing business with a company over a given period. Understanding churn is essential for any customer-focused company, as retaining customers is generally more cost-effective than acquiring new ones. The dataset is designed to provide a detailed view of customer characteristics and behaviors that could potentially lead to churn, allowing companies to take preemptive action to improve customer retention.

Breakdown of Dataset Features

This dataset includes several features, each contributing valuable information for analyzing customer behaviors and identifying potential churn risks:

Customer ID: A unique identifier for each customer. This column is useful for keeping track of individual customers without revealing personal details like names or contact information. It is essential for organizing data and ensuring that individual records can be tracked over time.

Surname: This column contains the surname of the customer. While it might not directly influence churn, it could be used in personalized marketing strategies. For example, companies could address customers by their last names in emails or other forms of communication to foster a sense of personal connection.

Credit Score: A key financial indicator, the credit score reflects a customer's creditworthiness and financial health. A low credit score might indicate a higher likelihood of churn, as these customers may be more prone to financial difficulties or more likely to switch to competitors offering better financial terms.

Geography: The geographical location of customers. This feature helps businesses understand regional patterns in customer behavior, such as churn rates varying between different countries or cities. Geographic data might reveal that certain areas have more competitive markets, which could lead to higher churn.

Gender: This feature identifies the gender of customers, which can be useful in understanding churn trends across different demographics. Some studies suggest that churn rates can differ between men and women due to varying expectations, needs, and preferences in service.

Age: Age plays a significant role in customer churn, as different age groups tend to have distinct purchasing habits and loyalty tendencies. Younger customers might be more open to exploring competitor options, while older customers might exhibit more loyalty but could churn if they feel underappreciated.

Tenure: This feature reflects how long a customer has been with the company. Longer tenure typically correlates with greater loyalty, as these customers have built a more robust relationship with the company. However, if long-tenured customers churn, it could signal deeper issues with service quality or product offerings.

Balance: The account balance of customers, which provides insight into their financial involvement with the company. Customers with higher balances may be less likely to churn, as they are more financially invested in the company, while customers with lower balances may have less at stake and are more likely to switch to competitors.

Number of Products Held: The number of products or services the customer is subscribed to. Generally, customers who use multiple products are more likely to remain loyal, as switching would involve more effort and a higher cost in terms of time and disruption to their routine.

Credit Card Status: This feature identifies whether the customer has a credit card issued by the company. Customers who own a credit card might have a stronger financial relationship with the company and, as a result, could exhibit lower churn rates. However, if customers are dissatisfied with their credit card, it might lead to a higher chance of churn.

Active Membership Status: Indicates whether the customer is actively using their membership or account. Customers with active accounts are usually more engaged with the company's products or services and are less likely to churn. In contrast, customers with inactive memberships might be at risk of churn due to disinterest or dissatisfaction.

Estimated Salary: A customer's estimated salary provides an indication of their financial well-being. Higher-income customers may have different expectations of service quality and could churn if they feel that the company isn't meeting their standards. Conversely, lower-income customers might be more sensitive to pricing and more prone to switch for better deals.

Exited: This is the target column, which indicates whether the customer has churned (1 for churned and 0 for not churned). This is the dependent variable that is predicted based on the other features, and it forms the basis of churn prediction models.

Importance of Churn Prediction The Custo...
‘JB Link Telco Customer Churn’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘JB Link Telco Customer Churn’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-jb-link-telco-customer-churn-742f/5fbf9511/?iid=042-751&v=presentation
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘JB Link Telco Customer Churn’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/johnflag/jb-link-telco-customer-churn on 28 January 2022.

--- Dataset description provided by original source is as follows ---

This is a customized version of the widely known IBM Telco Customer Churn dataset. I've added a few more columns and modified others in order to make it a little more realistic.

My customizations are based on the following version: Telco customer churn (11.1.3+)

Below you may find a fictional business problem I created. You may use it in order to start developing something around this dataset.

JB Link Customer Churn Problem

JB Link is a small telecom company located in the state of California that provides phone and internet services to customers in more than 1,000 cities and 1,600 zip codes.

The company has been in the market for just 6 years and has grown quickly by investing in infrastructure to bring internet and phone networks to regions that had poor or no coverage.

The company also has a very skilled sales team that consistently performs well at attracting new customers. The number of new customers acquired in the past quarter represents 15% of the total.

However, by the end of this same period, only 43% of these customers stayed with the company, and most of them decided not to renew their contracts after a few months, meaning the customer churn rate is very high and the company is now facing a big challenge in retaining its customers.

The total customer churn rate last quarter was around 27%, resulting in a decrease of almost 12% in the total number of customers.
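The quarterly figures can be checked with simple arithmetic. The sketch below assumes that the acquisition share and the churn rate are both measured against the same starting customer base, which the description implies but does not state; the function name is hypothetical.

```python
def net_customer_change(acquired_share, churn_rate):
    """Net change in the customer base as a fraction of the starting base."""
    return acquired_share - churn_rate


# Figures quoted in the business problem: 15% acquired, about 27% churned.
change = net_customer_change(acquired_share=0.15, churn_rate=0.27)
print(f"Net change: {change:+.0%}")  # prints "Net change: -12%"
```

Under that assumption the two percentages are consistent with the stated decrease of almost 12% in the total number of customers.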

The executive leadership of JB Link is aware that some competitors are investing in new technologies and in the expansion of their network coverage, and they believe this is one of the main drivers of the high customer churn rate.

Therefore, as an action plan, they have decided to create a task force inside the company that will be responsible for working on a customer retention strategy.

The task force will involve members from different areas of the company, including Sales, Finance, Marketing, Customer Service, Tech Support and a recently formed Data Science team.

The Data Science team will play a key role in this process and was assigned some very important tasks that will support the decisions and actions the other teams will be taking:

- Gather insights from the data to understand what is driving the high customer churn rate.
- Develop a Machine Learning model that can accurately predict the customers that are more likely to churn.
- Prescribe customized actions that could be taken in order to retain each of those customers.

The Data Science team was given a dataset with a random sample of 7,043 customers that can help in achieving this task.

The executives are aware that the cost of acquiring a new customer can be up to five times higher than the cost of retaining a customer, so they expect the results of this project to save the company a lot of money and make it start growing again.

--- Original source retains full ownership of the source dataset ---
Auto Insurance churn analysis dataset
kaggle.com
Updated Apr 30, 2023
Cite
Merishna Singh Suwal (2023). Auto Insurance churn analysis dataset [Dataset]. https://www.kaggle.com/datasets/merishnasuwal/auto-insurance-churn-analysis-dataset
Dataset updated
Apr 30, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Merishna Singh Suwal
Description
The provided data asset is relational and consists of four distinct data files, plus a fifth file that merges them.

1. address.csv: contains address information

2. customer.csv: contains customer information.

3. demographic.csv: contains demographic data

4. termination.csv: includes customer termination information.

5. autoinsurance_churn.csv: includes merged customer churn data generated from this notebook.

All data sets are linked using either ADDRESS_ID or INDIVIDUAL_ID. The ADDRESS_ID pertains to a specific postal service address, while the INDIVIDUAL_ID is unique to each individual. It is important to note that multiple customers may be assigned to the same address, and not all customers have demographic information available.
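As a rough illustration of the linkage described above, the sketch below left-joins customer rows to demographic rows on INDIVIDUAL_ID, so that customers without demographic information are kept rather than dropped. Only INDIVIDUAL_ID and ADDRESS_ID come from the description; the INCOME column, the HAS_DEMOGRAPHICS flag, and the sample rows are hypothetical.

```python
def join_customers(customers, demographics):
    """Left-join customer rows to demographic rows on INDIVIDUAL_ID."""
    demo_by_id = {row["INDIVIDUAL_ID"]: row for row in demographics}
    joined = []
    for customer in customers:
        individual_id = customer["INDIVIDUAL_ID"]
        # Customers without a demographic record keep only their own columns.
        merged = {**demo_by_id.get(individual_id, {}), **customer}
        merged["HAS_DEMOGRAPHICS"] = individual_id in demo_by_id
        joined.append(merged)
    return joined


customers = [
    {"INDIVIDUAL_ID": 1, "ADDRESS_ID": 10},
    {"INDIVIDUAL_ID": 2, "ADDRESS_ID": 10},  # two customers can share an address
    {"INDIVIDUAL_ID": 3, "ADDRESS_ID": 11},
]
demographics = [
    {"INDIVIDUAL_ID": 1, "INCOME": 52000},   # INCOME is a hypothetical column
    {"INDIVIDUAL_ID": 3, "INCOME": 71000},
]
joined = join_customers(customers, demographics)
```

A left join is chosen here because an inner join would silently discard every customer who lacks demographic data.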

Size of the data set

The data set includes 1,536,673 unique addresses and 2,280,321 unique customers, of which 2,112,579 have demographic information. Additionally, 269,259 customers cancelled their policies within the previous year.

Note

Please note that the data is synthetic, and all customer information provided is fictitious. While the latitude-longitude information can be mapped at a high level and generally refers to the Dallas-Fort Worth Metroplex in North Texas, it is important to note that drilling down too far may result in some data points that are located in the middle of Jerry World, DFW Airport, or Lake Grapevine. The physical addresses provided are fake and are unrelated to the corresponding lat/long.

The termination table includes the ACCT_SUSPD_DATE field, which can be used to derive a binary churn/did not churn variable. The data set is modelable, meaning that the other data available can be used to predict which customers churned and which did not. The underlying logic used to make these predictions should align with predicting auto insurance churn in the real world.
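One possible way to derive the binary target is sketched below. It assumes that termination rows carry an INDIVIDUAL_ID, that a non-empty ACCT_SUSPD_DATE means the customer churned, and that customers missing from the termination table did not churn. None of these rules is confirmed by the description, and the date format in the sample is made up.

```python
def derive_churn_labels(individual_ids, termination_rows):
    """Map each customer to 1 (churned) or 0 (did not churn)."""
    suspended = {
        row["INDIVIDUAL_ID"]
        for row in termination_rows
        # Treat a missing or blank suspension date as "not churned".
        if (row.get("ACCT_SUSPD_DATE") or "").strip()
    }
    return {ind_id: int(ind_id in suspended) for ind_id in individual_ids}


labels = derive_churn_labels(
    [1, 2, 3],
    [
        {"INDIVIDUAL_ID": 2, "ACCT_SUSPD_DATE": "2022-03-14"},
        {"INDIVIDUAL_ID": 3, "ACCT_SUSPD_DATE": ""},
    ],
)
```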
‘Customer Churn’ analyzed by Analyst-2
analyst-2.ai
Updated Mar 5, 2018
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2018). ‘Customer Churn’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-customer-churn-4f0b/a31eb722/?iid=005-065&v=presentation
Dataset updated
Mar 5, 2018
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Customer Churn’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/hassanamin/customer-churn on 14 February 2022.

--- Dataset description provided by original source is as follows ---

Binary Customer Churn

A marketing agency has many customers that use their service to produce ads for the client/customer websites. They've noticed that they have quite a bit of churn in clients. They basically randomly assign account managers right now, but want you to create a machine learning model that will help predict which customers will churn (stop buying their service) so that they can correctly assign the customers most at risk to churn an account manager. Luckily they have some historical data, can you help them out? Create a classification algorithm that will help classify whether or not a customer churned. Then the company can test this against incoming data for future customers to predict which customers will churn and assign them an account manager.

Content

The data is saved as customer_churn.csv. Here are the fields and their definitions:

Name : Name of the latest contact at Company

Age: Customer Age

Total_Purchase: Total Ads Purchased

Account_Manager: Binary 0=No manager, 1= Account manager assigned

Years: Total years as a customer

Num_sites: Number of websites that use the service.

Onboard_date: Date that the name of the latest contact was onboarded

Location: Client HQ Address

Company: Name of Client Company

Once you've created the model and evaluated it, test out the model on some new data (you can think of this almost like a hold-out set) that your client has provided, saved under new_customers.csv. The client wants to know which customers are most likely to churn given this data (they don't have the label yet).


--- Original source retains full ownership of the source dataset ---
‘Telco Customer Churn’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Telco Customer Churn’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-telco-customer-churn-d8d8/latest
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Telco Customer Churn’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/blastchar/telco-customer-churn on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

"Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs." [IBM Sample Data Sets]

Content

Each row represents a customer, each column contains customer’s attributes described on the column Metadata.

The data set includes information about:

Customers who left within the last month – the column is called Churn

Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies

Customer account information – how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges

Demographic info about customers – gender, age range, and if they have partners and dependents

Inspiration

To explore these types of models and learn more about the subject.

New version from IBM: https://community.ibm.com/community/user/businessanalytics/blogs/steven-macko/2019/07/11/telco-customer-churn-1113

--- Original source retains full ownership of the source dataset ---
Data from: CUSTOMER CHURN DATASET
kaggle.com
Updated Jun 4, 2024
Cite
joy11117 (2024). CUSTOMER CHURN DATASET [Dataset]. https://www.kaggle.com/datasets/joy11117/customer-churn-dataset/code
Dataset updated
Jun 4, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
joy11117
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
The customer log dataset is a 12.5 GB JSON file containing 18 columns and 26,259,199 records. There are 12 string columns and 6 numeric columns, which may also contain null or NaN values. The columns are userId, artist, auth, firstName, gender, itemInSession, lastName, length, level, location, method, page, registration, sessionId, song, status, ts and userAgent. As the column names suggest, the dataset contains various user-related information, such as user identifiers, demographic details (firstName, lastName, gender), interaction details (artist, song, length, itemInSession, sessionId, registration, lastinteraction) and technical details (userAgent, method, page, location, status, level, auth).
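A file of this size is usually easier to scan one record at a time than to load whole. The sketch below counts null or NaN values per column while streaming, under the assumption that the file stores one JSON object per line; the description does not confirm that layout. The function names and the in-memory sample are illustrative only.

```python
import io
import json
import math
from collections import Counter


def is_missing(value):
    """True for JSON null and for NaN floats."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_missing(lines, columns):
    """Stream records line by line and count missing values per column."""
    missing = Counter()
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        total += 1
        for column in columns:
            if is_missing(record.get(column)):
                missing[column] += 1
    return total, missing


# Small in-memory stand-in for the real file (assumed newline-delimited JSON).
sample = io.StringIO(
    '{"userId": "1", "artist": "A", "length": 201.5}\n'
    '{"userId": "2", "artist": null, "length": NaN}\n'
)
total, missing = count_missing(sample, ["userId", "artist", "length"])
```

With a real file, the same function could be called on an open file handle, since iterating over a file object also yields one line at a time.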

Acknowledgment

The current dataset was collected from the Udacity Data Science Nanodegree program, so big thanks to them.

Related paper

https://ieeexplore.ieee.org/abstract/document/10530632/

Citation

IEEE Dataport Link: https://dx.doi.org/10.21227/wc9d-b672

IEEE

Usman JOY, June 1, 2023, "Customer Churn Dataset", IEEE Dataport, doi: https://dx.doi.org/10.21227/wc9d-b672.

Bibtex:

@data{wc9d-b672-23, doi = {10.21227/wc9d-b672}, url = {https://dx.doi.org/10.21227/wc9d-b672}, author = {JOY, Usman}, publisher = {IEEE Dataport}, title = {Customer Churn Dataset}, year = {2023} }
‘Internet Service Provider Customer Churn’ analyzed by Analyst-2
analyst-2.ai
Updated Sep 28, 2021
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Internet Service Provider Customer Churn’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-internet-service-provider-customer-churn-3c5a/d575cd68/?iid=005-018&v=presentation
Dataset updated
Sep 28, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Internet Service Provider Customer Churn’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mehmetsabrikunt/internet-service-churn on 28 January 2022.

--- Dataset description provided by original source is as follows ---

If you like the dataset, please upvote it.

Context

There is strong competition between internet providers. If a provider wants to increase its revenue it needs more subscribers, but keeping existing customers is more important than gaining new ones. Providers therefore want to know which customers are likely to cancel their service. We call this churn. If they know who is going to leave, they may be able to retain them with promotions.

Content

We collected data for customers who use internet services and labeled each customer as churned or not. You can use this dataset to create a churn model and predict the churn probability.

If you use and like the dataset, please give me feedback. Thanks.

--- Original source retains full ownership of the source dataset ---
‘Bank Turnover Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Bank Turnover Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-bank-turnover-dataset-db8f/latest
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Bank Turnover Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/barelydedicated/bank-customer-churn-modeling on 28 January 2022.

--- No further description of dataset provided by original source ---

--- Original source retains full ownership of the source dataset ---
‘Churn for Bank Customers’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Churn for Bank Customers’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-churn-for-bank-customers-2e90/7961ea42/?iid=013-142&v=presentation
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Churn for Bank Customers’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mathchi/churn-for-bank-customers on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Content

RowNumber—corresponds to the record (row) number and has no effect on the output.

CustomerId—contains random values and has no effect on customer leaving the bank.

Surname—the surname of a customer has no impact on their decision to leave the bank.

CreditScore—can have an effect on customer churn, since a customer with a higher credit score is less likely to leave the bank.

Geography—a customer’s location can affect their decision to leave the bank.

Gender—it’s interesting to explore whether gender plays a role in a customer leaving the bank.

Age—this is certainly relevant, since older customers are less likely to leave their bank than younger ones.

Tenure—refers to the number of years that the customer has been a client of the bank. Normally, older clients are more loyal and less likely to leave a bank.

Balance—also a very good indicator of customer churn, as people with a higher balance in their accounts are less likely to leave the bank compared to those with lower balances.

NumOfProducts—refers to the number of products that a customer has purchased through the bank.

HasCrCard—denotes whether or not a customer has a credit card. This column is also relevant, since people with a credit card are less likely to leave the bank.

IsActiveMember—active customers are less likely to leave the bank.

EstimatedSalary—as with balance, people with lower salaries are more likely to leave the bank compared to those with higher salaries.

Exited—whether or not the customer left the bank.

Acknowledgements

As we know, it is much more expensive to sign up a new client than to keep an existing one.

It is advantageous for banks to know what leads a client towards the decision to leave the company.

Churn prevention allows companies to develop loyalty programs and retention campaigns to keep as many customers as possible.

--- Original source retains full ownership of the source dataset ---
Details of feature variables of the data set.
plos.figshare.com
xls
Updated Dec 8, 2023
Cite
Ke Peng; Yan Peng; Wenguang Li (2023). Details of feature variables of the data set. [Dataset]. http://doi.org/10.1371/journal.pone.0289724.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0289724.t002
Dataset updated
Dec 8, 2023
Dataset provided by
PLOS ONE
Authors
Ke Peng; Yan Peng; Wenguang Li
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In recent years, with the continuous improvement of the financial system and the rapid development of the banking industry, the competition of the banking industry itself has intensified. At the same time, with the rapid development of information technology and Internet technology, customers’ choice of financial products is becoming more and more diversified, and customers’ dependence and loyalty to banking institutions is becoming less and less, and the problem of customer churn in commercial banks is becoming more and more prominent. How to predict customer behavior and retain existing customers has become a major challenge for banks to solve. Therefore, this study takes a bank’s business data on Kaggle platform as the research object, uses multiple sampling methods to compare the data for balancing, constructs a bank customer churn prediction model for churn identification by GA-XGBoost, and conducts interpretability analysis on the GA-XGBoost model to provide decision support and suggestions for the banking industry to prevent customer churn. The results show that: (1) The applied SMOTEENN is more effective than SMOTE and ADASYN in dealing with the imbalance of banking data. (2) The F1 and AUC values of the model improved and optimized by XGBoost using genetic algorithm can reach 90% and 99%, respectively, which are optimal compared to other six machine learning models. The GA-XGBoost classifier was identified as the best solution for the customer churn problem. (3) Using Shapley values, we explain how each feature affects the model results, and analyze the features that have a high impact on the model prediction, such as the total number of transactions in the past year, the amount of transactions in the past year, the number of products owned by customers, and the total sales balance. 
The contribution of this paper is mainly in two aspects: (1) this study can provide useful information from the black box model based on the accurate identification of churned customers, which can provide reference for commercial banks to improve their service quality and retain customers; (2) it can provide reference for customer churn early warning models of other related industries, which can help the banking industry to maintain customer stability, maintain market position and reduce corporate losses.
Comparison of GA-XGBoost with XGBoost and LightGBM test results.
figshare.com
xls
Updated Dec 8, 2023
Cite
Ke Peng; Yan Peng; Wenguang Li (2023). Comparison of GA-XGBoost with XGBoost and LightGBM test results. [Dataset]. http://doi.org/10.1371/journal.pone.0289724.t008
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0289724.t008
Dataset updated
Dec 8, 2023
Dataset provided by
PLOS ONE
Authors
Ke Peng; Yan Peng; Wenguang Li
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Comparison of GA-XGBoost with XGBoost and LightGBM test results.
‘Client churn rate in Telecom sector’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 18, 2016
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2016). ‘Client churn rate in Telecom sector’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-client-churn-rate-in-telecom-sector-72d0/latest
Dataset updated
Feb 18, 2016
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Client churn rate in Telecom sector’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sagnikpatra/edadata on 13 February 2022.

--- Dataset description provided by original source is as follows ---

Context: "Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs."

Content: The Orange Telecom's Churn Dataset, which consists of cleaned customer activity data (features), along with a churn label specifying whether a customer canceled the subscription, will be used to develop predictive models. Two datasets are made available for download: churn-80 and churn-20.

The two sets are from the same batch, but have been split by an 80/20 ratio. As more data is often desirable for developing ML models, let's use the larger set (that is, churn-80) for training and cross-validation purposes, and the smaller set (that is, churn-20) for final testing and model performance evaluation.
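
The cross-validation step described above can be sketched as follows, using only the standard library. The row count, fold count, and function name are placeholders chosen for illustration; the actual size of churn-80 is not stated here. The churn-20 rows would stay out of this loop entirely and be used once, for the final evaluation.

```python
import random


def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs that split range(n_rows) into k shuffled folds."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)          # fixed seed keeps the folds reproducible
    folds = [idx[i::k] for i in range(k)]     # every row lands in exactly one fold
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


n_train_rows = 20  # placeholder standing in for the number of rows in churn-80
splits = list(k_fold_indices(n_train_rows, k=5))

for fold_no, (train_idx, val_idx) in enumerate(splits):
    print(fold_no, len(train_idx), len(val_idx))
```

Each row is used for validation exactly once and for training in the other folds, so model selection never touches the held-out test set.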

Inspiration: To explore this type of model and learn more about the subject.

--- Original source retains full ownership of the source dataset ---
Customer Churn Telecom Industry Dataset
kaggle.com
Updated May 28, 2025
Cite
ShyamBhojak1994 (2025). Customer Churn Telecom Industry Dataset [Dataset]. https://www.kaggle.com/datasets/shyambhojak1994/customer-churn-telecom-industry-dataset
Dataset updated
May 28, 2025
Dataset provided by
Kaggle, http://kaggle.com/
Authors
ShyamBhojak1994
Description
Dataset

This dataset was created by Shyam Bhojak

Confusion matrix.
plos.figshare.com
xls
Updated Dec 8, 2023
Cite
Ke Peng; Yan Peng; Wenguang Li (2023). Confusion matrix. [Dataset]. http://doi.org/10.1371/journal.pone.0289724.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0289724.t004
Dataset updated
Dec 8, 2023
Dataset provided by
PLOS ONE
Authors
Ke Peng; Yan Peng; Wenguang Li
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Churn Classification
kaggle.com
Updated Feb 21, 2023
Cite
synful (2023). Churn Classification [Dataset]. https://www.kaggle.com/datasets/synful/churn-classification
Dataset updated
Feb 21, 2023
Dataset provided by
Kaggle, http://kaggle.com/
Authors
synful
License
CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
Description
Customer Churn Analysis

Customer churn, also known as customer attrition, is when a customer stops being a customer, i.e., they choose to stop using your products or services. Customer churn is one of the most important and challenging problems for businesses such as credit card companies, cable service providers, SaaS companies, and telecommunication companies worldwide.

What is Churn Analysis? Customer churn analysis is the process of using your churn data to understand:

Which customers are leaving?

Why are they leaving?

What can you do to reduce churn?

As you may have guessed, churn analysis goes beyond just looking at your customer churn rate. It’s about discovering the underlying causes behind your numbers.

Ultimately, successful churn analysis will give you the valuable insights you need to start reducing your business’s customer attrition rate.
Data from: Customer Churn Dataset
kaggle.com
Updated Mar 2, 2024
Cite
Panda-monium (2024). Customer Churn Dataset [Dataset]. https://www.kaggle.com/datasets/divanshu22/customer-churn-dataset/versions/1
Dataset updated
Mar 2, 2024
Dataset provided by
Kaggle, http://kaggle.com/
Authors
Panda-monium
Description
Dataset

This dataset was created by Panda-monium

Customer Churn Analysis of Kiwibank
kaggle.com
Updated Jun 15, 2024
Cite
smmmmmmmmmmmm (2024). Customer Churn Analysis of Kiwibank [Dataset]. https://www.kaggle.com/datasets/smmmmmmmmmmmm/customer-churn-analysis-of-kiwibank
Dataset updated
Jun 15, 2024
Dataset provided by
Kaggle, http://kaggle.com/
Authors
smmmmmmmmmmmm
License
CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset provides insights into customer churn patterns and behaviors for Kiwibank, a leading New Zealand-owned financial institution. It includes demographic information (such as age, gender, geography), banking metrics (credit score, balance, products), and customer activity indicators. The dataset is suitable for predictive modeling tasks (e.g., predicting customer churn using machine learning algorithms like Naive Bayes, Random Forest, and Decision Tree) and clustering analysis (e.g., K-Means clustering to identify customer segments). Analyzing this dataset can help financial analysts, data scientists, and business strategists understand factors influencing customer retention and optimize strategies to improve customer satisfaction and loyalty.

Key Features:

Customer demographics: Age, gender, geography.

Banking metrics: Credit score, balance, number of products.

Customer activity: Tenure, usage of credit cards, activity level.

Target variable: Churn (1 if the customer has churned, 0 otherwise).

Potential Use Cases:

Predictive modeling for customer churn prevention.

Segmentation analysis to target marketing campaigns.

Insights for enhancing customer retention strategies.
Bank Customer Churn
kaggle.com
Updated Mar 14, 2025
Cite
CAT Reloaded || Data Science circle (2025). Bank Customer Churn [Dataset]. https://www.kaggle.com/datasets/cat-reloaded-data-science/bank-customer-churn
Dataset updated
Mar 14, 2025
Dataset provided by
Kaggle, http://kaggle.com/
Authors
CAT Reloaded || Data Science circle
License
Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Bank Customer Churn Dataset is a collection of data related to customers of a bank who have either left (churned) or stayed with the bank. This dataset is typically used for predictive modeling to identify patterns and factors that lead to customer churn, enabling banks to take proactive measures to retain customers.

id: Unique identifier for each customer.

CustomerId: Unique identifier for the customer account.

Surname: Last name of the customer.

CreditScore: Numeric representation of the customer's creditworthiness.

Geography: Country or region where the customer resides (string).

Gender: Gender of the customer (e.g., Male, Female) (string).

Age: Age of the customer.

Tenure: Number of years the customer has been with the bank.

Balance: Current balance in the customer's account.

NumOfProducts: Number of bank products the customer uses.

HasCrCard: Binary indicator (0 or 1) for whether the customer has a credit card.

IsActiveMember: Binary indicator (0 or 1) for whether the customer is an active member.

EstimatedSalary: Estimated salary of the customer.

Exited: Binary indicator (0 or 1) for whether the customer has churned (the target).
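
To show how the columns above might be split into identifiers, features, and target before modeling, here is a minimal standard-library sketch. The four rows are made up to match the column list and are not taken from the dataset; treating id, CustomerId, and Surname as non-predictive identifiers is an assumption, as are the helper and constant names.

```python
import csv
import io

# Made-up rows that follow the column list above; not real records.
SAMPLE = """id,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1001,Smith,650,France,Female,42,3,0.0,2,1,1,50000.0,0
1,1002,Jones,580,Germany,Male,51,7,120000.5,1,1,0,72000.0,1
2,1003,Lee,710,Spain,Male,35,5,64000.0,1,0,1,38000.0,0
3,1004,Khan,495,Germany,Female,60,1,98000.0,3,1,0,91000.0,1
"""

ID_COLUMNS = {"id", "CustomerId", "Surname"}  # assumed to carry no predictive signal
TEXT_COLUMNS = {"Geography", "Gender"}        # categorical strings, left unencoded here
TARGET = "Exited"


def load_rows(text):
    """Parse CSV text into a list of (features, label) pairs."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        features = {}
        for name, value in raw.items():
            if name in ID_COLUMNS or name == TARGET:
                continue  # identifiers and the target are not features
            features[name] = value if name in TEXT_COLUMNS else float(value)
        rows.append((features, int(raw[TARGET])))
    return rows


rows = load_rows(SAMPLE)
churn_rate = sum(label for _, label in rows) / len(rows)
print(f"{len(rows)} rows, churn rate {churn_rate:.2f}")
```

Checking the churn rate first is a quick way to see how imbalanced the target is before choosing a metric or a resampling method.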

Cite

Safrin S (2023). Predictive Analytics for Customer Churn: Dataset [Dataset]. https://www.kaggle.com/datasets/safrin03/predictive-analytics-for-customer-churn-dataset

Predictive Analytics for Customer Churn: Dataset

Analyzing Customer Behavior to Predict Churn: A Subscription Service Case Study

19 scholarly articles cite this dataset (View in Google Scholar)

Dataset updated

Oct 6, 2023

Dataset provided by

Kaggle, http://kaggle.com/

Authors

Safrin S

Description

Context : This dataset is part of a data science project focused on customer churn prediction for a subscription-based service. Customer churn, the rate at which customers cancel their subscriptions, is a vital metric for businesses offering subscription services. Predictive analytics techniques are employed to anticipate which customers are likely to churn, enabling companies to take proactive measures for customer retention.

Content : This dataset contains anonymized information about customer subscriptions and their interaction with the service. The data includes various features such as subscription type, payment method, viewing preferences, customer support interactions, and other relevant attributes. It consists of three files: "test.csv", "train.csv", and "data_descriptions.csv".

Columns :

CustomerID: Unique identifier for each customer

SubscriptionType: Type of subscription plan chosen by the customer (e.g., Basic, Premium, Deluxe)

PaymentMethod: Method used for payment (e.g., Credit Card, Electronic Check, PayPal)

PaperlessBilling: Whether the customer uses paperless billing (Yes/No)

ContentType: Type of content accessed by the customer (e.g., Movies, TV Shows, Documentaries)

MultiDeviceAccess: Whether the customer has access on multiple devices (Yes/No)

DeviceRegistered: Device registered by the customer (e.g., Smartphone, Smart TV, Laptop)

GenrePreference: Genre preference of the customer (e.g., Action, Drama, Comedy)

Gender: Gender of the customer (Male/Female)

ParentalControl: Whether parental control is enabled (Yes/No)

SubtitlesEnabled: Whether subtitles are enabled (Yes/No)

AccountAge: Age of the customer's subscription account (in months)

MonthlyCharges: Monthly subscription charges

TotalCharges: Total charges incurred by the customer

ViewingHoursPerWeek: Average number of viewing hours per week

SupportTicketsPerMonth: Number of customer support tickets raised per month

AverageViewingDuration: Average duration of each viewing session

ContentDownloadsPerMonth: Number of content downloads per month

UserRating: Customer satisfaction rating (1 to 5)

WatchlistSize: Size of the customer's content watchlist
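
Several of the columns above are Yes/No flags or named categories, which most models need as numbers. The sketch below shows one common way to encode them in plain Python. The two rows are made up, the function names are hypothetical, and the choice of one-hot encoding for SubscriptionType is an assumption rather than anything the dataset prescribes.

```python
# Yes/No columns named in the column list above.
YES_NO_COLUMNS = ["PaperlessBilling", "MultiDeviceAccess", "ParentalControl", "SubtitlesEnabled"]


def encode_yes_no(row):
    """Return a copy of row with each Yes/No column mapped to 1/0."""
    out = dict(row)
    for col in YES_NO_COLUMNS:
        if row[col] not in ("Yes", "No"):
            raise ValueError(f"unexpected value in {col}: {row[col]!r}")
        out[col] = 1 if row[col] == "Yes" else 0
    return out


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator per observed category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        out = {k: v for k, v in r.items() if k != column}
        for c in categories:
            out[f"{column}_{c}"] = 1 if r[column] == c else 0
        encoded.append(out)
    return encoded


# Made-up rows for illustration; values are not taken from train.csv.
sample_rows = [
    {"SubscriptionType": "Basic", "PaperlessBilling": "Yes", "MultiDeviceAccess": "No",
     "ParentalControl": "No", "SubtitlesEnabled": "Yes", "MonthlyCharges": 9.99},
    {"SubscriptionType": "Premium", "PaperlessBilling": "No", "MultiDeviceAccess": "Yes",
     "ParentalControl": "Yes", "SubtitlesEnabled": "No", "MonthlyCharges": 15.99},
]

encoded_rows = one_hot([encode_yes_no(r) for r in sample_rows], "SubscriptionType")
print(encoded_rows[0])
```

The categories are collected from the rows passed in, so the same category list should be reused for test.csv to keep the encoded columns aligned with train.csv.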

Acknowledgments : The dataset used in this project is obtained from Data Science Challenge on Coursera and is used for educational and research purposes. Any resemblance to real persons or entities is purely coincidental.
