65 datasets found

Data from: Telco Customer Churn Dataset
cubig.ai
Updated May 28, 2025
Cite
CUBIG (2025). Telco Customer Churn Dataset [Dataset]. https://cubig.ai/store/products/312/telco-customer-churn-dataset
Dataset updated
May 28, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
Description
1) Data Introduction
• The Telco Customer Churn Dataset covers carrier customers' service usage, account information, demographics, and churn status, and can be used to predict and analyze customer churn.

2) Data Utilization
(1) Characteristics of the Telco Customer Churn Dataset:
• It includes a range of customer and service attributes: gender, age group, partner and dependents, service subscription status (telephone, Internet, security, backup, device protection, technical support, streaming, etc.), contract type, payment method, monthly fee, total fee, and churn.
(2) Uses of the Telco Customer Churn Dataset:
• Developing a customer churn prediction model: using customer service usage patterns and account information, a machine-learning churn prediction model can be built to proactively identify customers at risk of churn.
Customer Churn - Decision Tree & Random Forest
kaggle.com
Updated Jul 6, 2023
Cite
vikram amin (2023). Customer Churn - Decision Tree & Random Forest [Dataset]. https://www.kaggle.com/datasets/vikramamin/customer-churn-decision-tree-and-random-forest
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jul 6, 2023
Dataset provided by
Kaggle
Authors
vikram amin
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Main objective: Identify which customers will churn and which will not.

Methodology: It is a classification problem. We will use decision tree and random forest to predict the outcome.

Steps Involved

1. Read the data

2. Check for data types

3. Change character vectors to factor vectors, as this is a classification problem

4. Drop the variable which is not significant for the analysis. We drop "customerID".

5. Check for missing values. None are found.

6. Split the data into train and test sets so we can use the train data for building the model and the test data for prediction. We split in an 80-20 ratio (train/test) using the sample function.

7. Install and load the libraries (rpart, rpart.plot, rattle, RColorBrewer, caret)

8. Run the decision tree using the rpart function. The dependent variable is Churn, with 19 independent variables.

9. Plot the decision tree

Average customer churn is 27%. According to the tree, churn can take place when tenure is >= 7.5 and there is no internet service.
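
The preparation steps above (read the data, drop the ID column, check for missing values, split 80-20) can be sketched in Python as follows. This is an illustration only: the original analysis was done in R, and the inline CSV, its reduced column set, and the helper names are all hypothetical.

```python
import csv
import io
import random

# Hypothetical miniature stand-in for the churn file (the real data has 19 predictors).
RAW_CSV = "\n".join([
    "customerID,tenure,InternetService,Churn",
    "C001,1,Fiber optic,Yes",
    "C002,34,DSL,No",
    "C003,2,DSL,Yes",
    "C004,45,DSL,No",
    "C005,8,Fiber optic,Yes",
    "C006,22,Fiber optic,No",
    "C007,10,DSL,No",
    "C008,28,Fiber optic,Yes",
    "C009,62,DSL,No",
    "C010,13,No,No",
])


def load_rows(text):
    """Read CSV text into a list of dicts, one per customer."""
    return list(csv.DictReader(io.StringIO(text)))


def drop_column(rows, name):
    """Return copies of the rows without the given column."""
    return [{k: v for k, v in row.items() if k != name} for row in rows]


def count_missing(rows):
    """Count empty or absent cells across all rows."""
    return sum(1 for row in rows for v in row.values() if v is None or v.strip() == "")


def split_train_test(rows, train_fraction=0.8, seed=42):
    """Randomly assign train_fraction of the rows to train and the rest to test."""
    rng = random.Random(seed)
    n_train = int(len(rows) * train_fraction)
    train_idx = set(rng.sample(range(len(rows)), n_train))
    train = [r for i, r in enumerate(rows) if i in train_idx]
    test = [r for i, r in enumerate(rows) if i not in train_idx]
    return train, test


rows = drop_column(load_rows(RAW_CSV), "customerID")
missing = count_missing(rows)
train, test = split_train_test(rows)
print(f"missing cells: {missing}, train rows: {len(train)}, test rows: {len(test)}")
```

Fixing the seed keeps the split reproducible, which makes later accuracy comparisons between models easier to interpret.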

Tuning the model

Define the search grid using the expand.grid function

Set up the control parameters through 5-fold cross-validation

When we print the model we get the best CP = 0.01 and an accuracy of 79.00%

Predict with the model

Find out which variables are most and least significant.

The most significant variables are Internet Service and Tenure; the least significant are Streaming Movies and Tech Support.

USE RANDOM FOREST

Run library(randomForest). Here we are using the default ntree (500) and the default mtry, which for classification is floor(sqrt(p)), where p is the number of independent variables. With p = 19, this gives mtry = 4.

The confusion matrix gives an accuracy of 79.27%, marginally higher than that of the decision tree (79.00%). The error rate is pretty low when predicting "No" and much higher when predicting "Yes".
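
To make the relationship between overall accuracy and per-class error concrete, here is a small sketch. The counts are hypothetical (a made-up 1,000-row test set), not the author's results; they are only chosen so that accuracy is near 79% while the "Yes" class is predicted much less reliably than the "No" class.

```python
# Hypothetical confusion-matrix counts, keyed by (actual, predicted).
# These are illustrative numbers, not the results reported above.
confusion = {
    ("No", "No"): 680,    # true negatives
    ("No", "Yes"): 60,    # false positives
    ("Yes", "No"): 150,   # false negatives
    ("Yes", "Yes"): 110,  # true positives
}


def accuracy(cm):
    """Share of all rows whose predicted class equals the actual class."""
    correct = sum(n for (actual, predicted), n in cm.items() if actual == predicted)
    return correct / sum(cm.values())


def class_error(cm, label):
    """Share of rows with actual class `label` that were misclassified."""
    total = sum(n for (actual, _), n in cm.items() if actual == label)
    wrong = sum(
        n for (actual, predicted), n in cm.items()
        if actual == label and predicted != label
    )
    return wrong / total


print(f"accuracy:          {accuracy(confusion):.4f}")
print(f"class error (No):  {class_error(confusion, 'No'):.4f}")
print(f"class error (Yes): {class_error(confusion, 'Yes'):.4f}")
```

With an imbalanced target (roughly 27% churners), a high overall accuracy can coexist with a high error rate on the minority class, so the per-class figures are worth reading alongside the headline number.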

Plot the model showing which variables reduce the Gini impurity the most and the least. Total charges and tenure reduce the Gini impurity the most, while phone service has the least impact.
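
As a reminder of what is being measured, the Gini impurity of a node with class proportions p_k is 1 - sum(p_k^2), and a split is credited with the drop from the parent's impurity to the size-weighted impurity of its children. The sketch below uses the 27% churn rate quoted above for the parent node; the child counts are hypothetical.

```python
def gini_impurity(counts):
    """Gini impurity 1 - sum(p_k^2) for a node with the given class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def impurity_decrease(parent, children):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = sum(parent)
    weighted = sum(sum(child) / n * gini_impurity(child) for child in children)
    return gini_impurity(parent) - weighted


# Parent node: 27% churners out of 1,000 customers, as [churned, stayed].
parent = [270, 730]
# Hypothetical split into two children (counts are illustrative only).
left, right = [200, 200], [70, 530]

print(f"parent impurity:   {gini_impurity(parent):.4f}")
print(f"impurity decrease: {impurity_decrease(parent, [left, right]):.4f}")
```

A forest's Gini-based importance score for a variable accumulates decreases of this kind over the splits that use that variable.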

Predict with the model and create a new data frame showing the actual vs predicted values

Plot the model to find where the OOB (out-of-bag) error stops decreasing and levels off. The error stops decreasing between 100 and 200 trees, so we take ntree = 200 when we tune the model.

Tune the model: mtry = 2 has the lowest OOB error rate

Use random forest with mtry = 2 and ntree = 200


The confusion matrix gives an accuracy of 79.71%, marginally higher than that of the default forest (ntree = 500, mtry = 4) at 79.27% and of the decision tree at 79.00%. The error rate is pretty low when predicting "No" and m...
Telco_Customer_churn_Data
test.researchdata.tuwien.at
bin, csv, png
Updated Apr 28, 2025
Cite
Erum Naz (2025). Telco_Customer_churn_Data [Dataset]. http://doi.org/10.82556/b0ch-cn44
Explore at:
Available download formats: png, csv, bin
Unique identifier
https://doi.org/10.82556/b0ch-cn44
Dataset updated
Apr 28, 2025
Dataset provided by
TU Wien
Authors
Erum Naz
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Apr 28, 2025
Description
Context and Methodology

The dataset originates from the research domain of Customer Churn Prediction in the Telecom Industry. It was created as part of the project "Data-Driven Churn Prediction: ML Solutions for the Telecom Industry," completed within the Data Stewardship course (Master programme Data Science, TU Wien).

The primary purpose of this dataset is to support machine learning model development for predicting customer churn based on customer demographics, service usage, and account information.
The dataset enables the training, testing, and evaluation of classification algorithms, allowing researchers and practitioners to explore techniques for customer retention optimization.

The dataset was originally obtained from the IBM Accelerator Catalog and adapted for academic use. It was uploaded to TU Wien’s DBRepo test system and accessed via SQLAlchemy connections to the MariaDB environment.

Technical Details

The dataset has a tabular structure and was initially stored in CSV format. It contains:

Rows: 7,043 customer records

Columns: 21 features including customer attributes (gender, senior citizen status, partner status), account information (tenure, contract type, payment method), service usage (internet service, streaming TV, tech support), and the target variable (Churn: Yes/No).

Naming Convention:

The table in the database is named telco_customer_churn_data.

Software Requirements:

To open and work with the dataset, any standard database client or programming language that supports MariaDB connections can be used (e.g., Python).

For machine learning applications, libraries such as pandas, scikit-learn, and joblib are typically used.
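
As a rough illustration of querying the table from Python, the sketch below uses the standard-library sqlite3 module with an in-memory database as a stand-in for the MariaDB environment described above; it is not the project's actual loading code. The table name telco_customer_churn_data comes from the description, while the reduced column set and the rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the MariaDB instance (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE telco_customer_churn_data (
        customerID TEXT PRIMARY KEY,
        tenure     INTEGER,
        Contract   TEXT,
        Churn      TEXT
    )
    """
)

# Hypothetical sample rows; the real table holds 7,043 records and 21 columns.
conn.executemany(
    "INSERT INTO telco_customer_churn_data VALUES (?, ?, ?, ?)",
    [
        ("C001", 1, "Month-to-month", "Yes"),
        ("C002", 34, "One year", "No"),
        ("C003", 2, "Month-to-month", "Yes"),
        ("C004", 45, "One year", "No"),
        ("C005", 60, "Two year", "No"),
    ],
)

# Class balance of the target variable.
churn_counts = dict(
    conn.execute(
        "SELECT Churn, COUNT(*) FROM telco_customer_churn_data GROUP BY Churn"
    ).fetchall()
)
conn.close()
print(churn_counts)
```

Checking the class balance first is a sensible habit for a binary churn target, since it affects how accuracy figures should be read.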

Additional Resources:

Source code for data loading, preprocessing, model training, and evaluation is available at the associated GitHub repository: https://github.com/nazerum/fair-ml-customer-churn

Further Details

When reusing the dataset, users should be aware:

Licensing: The dataset is shared under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Use Case Suitability: The dataset is best suited for classification tasks, particularly binary classification (churn vs. no churn).

Metadata Standards: Metadata describing the dataset adheres to FAIR principles and is supplemented by CodeMeta and Croissant standards for improved interoperability.
Synthetic Telecom Customer Churn Data
kaggle.com
Updated May 27, 2025
Cite
Abdulrahman Qaten (2025). Synthetic Telecom Customer Churn Data [Dataset]. https://www.kaggle.com/datasets/abdulrahmanqaten/synthetic-customer-churn/suggestions
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
May 27, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Abdulrahman Qaten
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
If you found the dataset useful, your upvote will help others discover it. Thanks for your support!

This dataset simulates customer behavior for a fictional telecommunications company. It contains demographic information, account details, services subscribed to, and whether the customer ultimately churned (stopped using the service) or not. The data is synthetically generated but designed to reflect realistic patterns often found in telecom churn scenarios.

Purpose:

The primary goal of this dataset is to provide a clean and straightforward resource for beginners learning about:

Exploratory Data Analysis (EDA): Understanding customer characteristics and identifying potential drivers of churn through visualization and statistical summaries.

Data Preprocessing: Handling categorical features (like converting text to numbers) and scaling numerical features.

Classification Modeling: Building and evaluating simple machine learning models (like Logistic Regression or Decision Trees) to predict customer churn.
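
The preprocessing step mentioned above (turning text categories into numbers and scaling numeric features) might look like the following minimal sketch. It uses only the Python standard library; the column names ContractType and MonthlyCharges follow this dataset's feature list, and the four sample rows are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical sample rows using two of the dataset's columns.
customers = [
    {"ContractType": "Month-to-month", "MonthlyCharges": 70.0},
    {"ContractType": "One year", "MonthlyCharges": 50.0},
    {"ContractType": "Two year", "MonthlyCharges": 30.0},
    {"ContractType": "Month-to-month", "MonthlyCharges": 90.0},
]


def one_hot(values):
    """Map each category to a 0/1 indicator vector (one column per category)."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


categories, contract_encoded = one_hot([c["ContractType"] for c in customers])
charges_scaled = standardize([c["MonthlyCharges"] for c in customers])

print(categories)
print(contract_encoded)
print([round(x, 3) for x in charges_scaled])
```

One-hot encoding avoids implying an order between categories; tree-based models usually do not need the scaling step, while logistic regression tends to benefit from it.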

Features:

The dataset includes the following columns:

CustomerID: Unique identifier for each customer.

Age: Customer's age in years.

Gender: Customer's gender (Male/Female).

Location: General location of the customer (e.g., New York, Los Angeles).

SubscriptionDurationMonths: How many months the customer has been subscribed.

MonthlyCharges: The amount the customer is charged each month.

TotalCharges: The total amount the customer has been charged over their subscription period.

ContractType: The type of contract the customer has (Month-to-month, One year, Two year).

PaymentMethod: How the customer pays their bill (e.g., Electronic check, Credit card).

OnlineSecurity: Whether the customer has online security service (Yes, No, No internet service).

TechSupport: Whether the customer has tech support service (Yes, No, No internet service).

StreamingTV: Whether the customer has TV streaming service (Yes, No, No internet service).

StreamingMovies: Whether the customer has movie streaming service (Yes, No, No internet service).

Churn: (Target Variable) Whether the customer churned (1 = Yes, 0 = No).

Data Quality:

This dataset is intentionally clean with no missing values, making it easy for beginners to focus on analysis and modeling concepts without complex data cleaning steps.

Inspiration:

Understanding customer churn is crucial for many businesses. This dataset provides a sandbox environment to practice the fundamental techniques used in churn analysis and prediction.
Comparison of the proposed algorithms with other ensemble models.
plos.figshare.com
xls
Updated Jun 6, 2024
Cite
Kaveh Faraji Googerdchi; Shahrokh Asadi; Seyed Mohammadbagher Jafari (2024). Comparison of the proposed algorithms with other ensemble models. [Dataset]. http://doi.org/10.1371/journal.pone.0303881.t009
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0303881.t009
Dataset updated
Jun 6, 2024
Dataset provided by
PLOS ONE
Authors
Kaveh Faraji Googerdchi; Shahrokh Asadi; Seyed Mohammadbagher Jafari
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Comparison of the proposed algorithms with other ensemble models.
Synthetic Customer Churn Prediction Dataset
opendatabay.com
Updated May 6, 2025
Cite
Opendatabay Labs (2025). Synthetic Customer Churn Prediction Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/5d7ef013-5848-4367-bf3b-2ce359587b43
Dataset updated
May 6, 2025
Dataset provided by
Buy & Sell Data | Opendatabay - AI & Synthetic Data Marketplace
Authors
Opendatabay Labs
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Retail & Consumer Behavior
Description
This Synthetic Customer Churn Prediction Dataset has been designed as an educational resource for exploring data science, machine learning, and predictive modelling techniques in a customer retention context. The dataset simulates key attributes relevant to customer churn analysis, such as service usage, contract details, and customer demographics. It allows users to practice data manipulation, visualization, and the development of models to predict churn behaviour in industries like telecommunications, subscription services, or utilities.

Dataset Features:

Customer_Id: Unique identifier for each customer (not included in this dataset for privacy).

Gender: Gender of the customer (e.g., "Male," "Female").

Partner: Whether the customer has a partner (e.g., "Yes," "No").

Dependents: Whether the customer has dependents (e.g., "Yes," "No").

Tenure (Months): The number of months the customer has been with the company.

PhoneService: Whether the customer has a phone service (e.g., "Yes," "No").

MultipleLines: Whether the customer has multiple phone lines (e.g., "Yes," "No phone service").

InternetService: Type of internet service (e.g., "DSL," "Fiber optic," "No").

OnlineSecurity: Whether the customer has online security services (e.g., "Yes," "No," "No internet service").

OnlineBackup: Whether the customer has online backup services (e.g., "Yes," "No," "No internet service").

DeviceProtection: Whether the customer has device protection services (e.g., "Yes," "No," "No internet service").

TechSupport: Whether the customer has tech support services (e.g., "Yes," "No," "No internet service").

StreamingTV: Whether the customer has streaming TV services (e.g., "Yes," "No," "No internet service").

StreamingMovies: Whether the customer has streaming movies services (e.g., "Yes," "No," "No internet service").

Contract: Type of contract the customer has (e.g., "Month-to-month," "One year," "Two year").

PaperlessBilling: Whether the customer uses paperless billing (e.g., "Yes," "No").

PaymentMethod: The payment method used by the customer (e.g., "Electronic check," "Credit card," "Bank transfer").

MonthlyCharges: Monthly charges billed to the customer.

TotalCharges: Total charges incurred by the customer over their tenure.

Churn: Whether the customer has churned (e.g., "Yes," "No").

Distribution:

[Figure: Synthetic Customer Churn Prediction Dataset Distribution]

Usage:

This dataset is useful for a variety of applications, including:

Customer Behavior Analysis: To understand factors influencing customer retention and churn.

Educational Training: To practice data cleaning, feature engineering, and visualization techniques in customer analytics.

Predictive Modeling: To build machine learning models for predicting customer churn based on service usage patterns and demographic information.

Coverage:

This dataset is synthetic and anonymized, making it a safe tool for experimentation and learning without compromising real customer privacy.

License:

CC0 (Public Domain)

Who can use it:

Data scientists and enthusiasts: For developing customer analytics skills and predictive modelling expertise.

Business analysts: To understand customer churn drivers and improve retention strategies.

Educators and students: For teaching and learning applications in data science and machine learning.
Literature review of papers on churn prediction in telecommunication.
plos.figshare.com
xls
Updated Jun 6, 2024
Cite
Kaveh Faraji Googerdchi; Shahrokh Asadi; Seyed Mohammadbagher Jafari (2024). Literature review of papers on churn prediction in telecommunication. [Dataset]. http://doi.org/10.1371/journal.pone.0303881.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0303881.t001
Dataset updated
Jun 6, 2024
Dataset provided by
PLOS ONE
Authors
Kaveh Faraji Googerdchi; Shahrokh Asadi; Seyed Mohammadbagher Jafari
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Literature review of papers on churn prediction in telecommunication.
Data from: telco-customer-churn
huggingface.co
Updated Feb 18, 2025
Cite
aai510-group1 (2025). telco-customer-churn [Dataset]. https://huggingface.co/datasets/aai510-group1/telco-customer-churn
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Feb 18, 2025
Dataset authored and provided by
aai510-group1
Description
Dataset Card for Telco Customer Churn

This dataset contains information about customers of a fictional telecommunications company, including demographic information, services subscribed to, location details, and churn behavior. This merged dataset combines the information from the original Telco Customer Churn dataset with additional details.

Dataset Details Dataset Description

This merged Telco Customer Churn dataset provides a comprehensive view of customer… See the full description on the dataset page: https://huggingface.co/datasets/aai510-group1/telco-customer-churn.
Data from: Telco Customer Churn
kaggle.com
Updated Jul 22, 2021
Cite
Sibelius_5 (2021). Telco Customer Churn [Dataset]. https://www.kaggle.com/sibelius5/telco-customer-churn/discussion
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jul 22, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sibelius_5
Description
To increase the accuracy of the telco churn prediction models, I merged the following two datasets (using ID as index):

https://www.kaggle.com/blastchar/telco-customer-churn
https://www.kaggle.com/ylchang/telco-customer-churn-1113

The additional features of the second dataset (like satisfaction score, total revenues and cltv) can increase the accuracy of the models.

I cleaned the merged dataset and also added the clean version here ("Telco_customer_churn_cleaned.csv").
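
A merge of two tables on a shared customer ID, as described above, can be sketched like this. It is not the author's code: the key name customerID, the extra columns SatisfactionScore and CLTV, and all rows are hypothetical stand-ins for the two source datasets.

```python
# Hypothetical rows standing in for the two source datasets.
base = [
    {"customerID": "A1", "tenure": 5, "Churn": "Yes"},
    {"customerID": "A2", "tenure": 40, "Churn": "No"},
    {"customerID": "A3", "tenure": 12, "Churn": "No"},
]
extra = [
    {"customerID": "A1", "SatisfactionScore": 2, "CLTV": 3200},
    {"customerID": "A2", "SatisfactionScore": 5, "CLTV": 5400},
]


def merge_on_id(left, right, key="customerID"):
    """Inner join: keep rows whose key appears in both tables, combining columns."""
    right_by_id = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_id.get(row[key])
        if match is not None:
            merged.append({**row, **match})
    return merged


merged = merge_on_id(base, extra)
for row in merged:
    print(row)
```

An inner join silently drops customers missing from either table (A3 here), so comparing row counts before and after the merge is a useful sanity check.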

Inspiration:

The dataset can be used for EDA, classification, churn prediction, segmentation etc.
The Telco Churn.xls
kaggle.com
Updated Jun 10, 2024
Cite
Vincent Were (2024). The Telco Churn.xls [Dataset]. https://www.kaggle.com/datasets/wereouma/the-telco-churn-xls
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jun 10, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Vincent Were
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
The Dataset

About the Customer churn dataset

To start out, you'll be working with real data from the Kenyan Telecommunication survey. This dataset, "Churn.xls", is related to customer churn analysis for a telecommunications company. Customer churn refers to the phenomenon where customers stop doing business with a company. The dataset includes various attributes of customers and their usage patterns, which are typically used to predict whether a customer is likely to leave the service (churn) or stay. Here is a brief description of the variables provided in the dataset:

1. ID: A unique identifier for each customer.

2. COLLEGE: Indicates whether the customer has a college degree ("one" for yes, "zero" for no).

3. INCOME: The annual income of the customer.

4. OVERAGE: The number of overage minutes the customer used.

5. LEFTOVER: The number of leftover minutes the customer has.

6. HOUSE: The value of the customer's house.

7. HANDSET_PRICE: The price of the customer's handset.

8. OVER_15MINS_CALLS_PER_MONTH: The number of calls per month that exceed 15 minutes.

9. AVERAGE_CALL_DURATION: The average duration of calls made by the customer.

10. REPORTED_SATISFACTION: The customer's reported level of satisfaction with the service (e.g., "unsat", "very_sat").

11. REPORTED_USAGE_LEVEL: The customer's reported usage level of the service (e.g., "little", "very_high").

12. CONSIDERING_CHANGE_OF_PLAN: Indicates whether the customer is considering changing their plan (e.g., "no", "considering").

13. LEAVE: The target variable indicating whether the customer decided to leave ("LEAVE") or stay ("STAY"). Customers who left within the last month are recorded in this column.

Based on these variables, the dataset is to be used for predictive modeling to identify factors that influence customer churn and to develop strategies to retain customers. The variables cover demographic information, usage patterns, customer satisfaction, and the likelihood of changing plans, all of which are crucial in understanding and predicting churn behavior.

Why Analysis? Customer churn refers to the phenomenon where customers discontinue their relationship or subscription with a company or service provider. It represents the rate at which customers stop using a company's products or services within a specific period. Churn is an important metric for businesses as it directly impacts revenue, growth, and customer retention. In the context of the Churn dataset, the churn label indicates whether a customer has churned or not. A churned customer is one who has decided to discontinue their subscription or usage of the company's services. On the other hand, a non-churned customer is one who continues to remain engaged and retains their relationship with the company. Understanding customer churn is crucial for businesses to identify patterns, factors, and indicators that contribute to customer attrition. By analyzing churn behavior and its associated features, companies can develop strategies to retain existing customers, improve customer satisfaction, and reduce customer turnover. Predictive modeling techniques can also be applied to forecast and proactively address potential churn, enabling companies to take proactive measures to retain at-risk customers.
‘JB Link Telco Customer Churn’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘JB Link Telco Customer Churn’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-jb-link-telco-customer-churn-742f/5fbf9511/?iid=042-751&v=presentation
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘JB Link Telco Customer Churn’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/johnflag/jb-link-telco-customer-churn on 28 January 2022.

--- Dataset description provided by original source is as follows ---

This is a customized version of the widely known IBM Telco Customer Churn dataset. I've added a few more columns and modified others in order to make it a little more realistic.

My customizations are based on the following version: Telco customer churn (11.1.3+)

Below you may find a fictional business problem I created. You may use it in order to start developing something around this dataset.

JB Link Customer Churn Problem

JB Link is a small telecom company located in the state of California that provides Phone and Internet services to customers in more than 1,000 cities and 1,600 zip codes.

The company has been in the market for just 6 years and has quickly grown by investing in infrastructure to bring internet and phone networks to regions that had poor or no coverage.

The company also has a very skilled sales team that consistently performs well at attracting new customers. The number of new customers acquired in the past quarter represents 15% of the total.

However, by the end of this same period, only 43% of these customers stayed with the company, and most of them decided not to renew their contracts after a few months, meaning the customer churn rate is very high and the company is now facing a big challenge in retaining its customers.

The total customer churn rate last quarter was around 27%, resulting in a decrease of almost 12% in the total number of customers.
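
The figures in this scenario fit together if both the 15% acquisition share and the 27% churn rate are measured against the customer base at the start of the quarter; that reading is an assumption, as is the starting count of 10,000 used below purely for illustration.

```python
# Hypothetical starting customer base; only the ratios matter.
start_customers = 10_000
acquired_share = 0.15   # new customers last quarter, relative to the base (assumed)
churn_rate = 0.27       # customers lost last quarter, relative to the base (assumed)

acquired = start_customers * acquired_share
churned = start_customers * churn_rate
end_customers = start_customers + acquired - churned

# Net relative change in the customer base over the quarter.
net_change = (end_customers - start_customers) / start_customers
print(f"net change in customers: {net_change:.1%}")
```

Under these assumptions the net change is 15% minus 27%, a drop of 12%, which matches the "almost 12%" decrease quoted in the scenario.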

The executive leadership of JB Link is aware that some competitors are investing in new technologies and in the expansion of their network coverage, and they believe this is one of the main drivers of the high customer churn rate.

Therefore, as an action plan, they have decided to create a task force inside the company that will be responsible for working on a customer retention strategy.

The task force will involve members from different areas of the company, including Sales, Finance, Marketing, Customer Service, Tech Support and a recently formed Data Science team.

The data science team will play a key role in this process and was assigned some very important tasks that will support the decisions and actions the other teams will be taking:
- Gather insights from the data to understand what is driving the high customer churn rate.
- Develop a Machine Learning model that can accurately predict the customers that are more likely to churn.
- Prescribe customized actions that could be taken in order to retain each of those customers.

The Data Science team was given a dataset with a random sample of 7,043 customers that can help in achieving this task.

The executives are aware that the cost of acquiring a new customer can be up to five times higher than the cost of retaining a customer, so they are expecting that the results of this project will save the company a lot of money and make it start growing again.

--- Original source retains full ownership of the source dataset ---
Summary of the datasets used in this study.
figshare.com
plos.figshare.com
xls
Updated Jun 21, 2023
Cite
Joydeb Kumar Sana; Mohammad Zoynul Abedin; M. Sohel Rahman; M. Saifur Rahman (2023). Summary of the datasets used in this study. [Dataset]. http://doi.org/10.1371/journal.pone.0278095.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0278095.t002
Dataset updated
Jun 21, 2023
Dataset provided by
PLOS ONE
Authors
Joydeb Kumar Sana; Mohammad Zoynul Abedin; M. Sohel Rahman; M. Saifur Rahman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Summary of the datasets used in this study.
AI-Powered Customer Churn Prediction Market Research Report 2033
dataintelo.com
csv, pdf, pptx
Updated Jun 28, 2025
Cite
Dataintelo (2025). AI-Powered Customer Churn Prediction Market Research Report 2033 [Dataset]. https://dataintelo.com/report/ai-powered-customer-churn-prediction-market
Explore at:
Available download formats: csv, pdf, pptx
Dataset updated
Jun 28, 2025
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
AI-Powered Customer Churn Prediction Market Outlook

According to our latest research, the AI-powered customer churn prediction market size reached USD 1.58 billion globally in 2024, with a robust CAGR of 19.7% expected from 2025 to 2033. Driven by rapid digital transformation and the increasing need for predictive analytics across sectors, the market is forecasted to attain a value of USD 7.57 billion by 2033. The growth of this market is primarily attributed to the escalating adoption of AI and machine learning technologies by enterprises seeking to reduce customer attrition, optimize retention strategies, and enhance overall customer lifetime value, as per the latest industry research.
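
For readers who want to sanity-check the quoted growth figures, the compound annual growth rate relation is end = start * (1 + rate)^years. The sketch below applies it to the numbers above, assuming nine compounding periods from the 2024 base to 2033; that period count is an assumption, since the report does not spell out its base year convention.

```python
def project(value, rate, years):
    """Compound `value` forward at an annual `rate` for `years` periods."""
    return value * (1 + rate) ** years


def implied_cagr(start, end, years):
    """Annual growth rate that takes `start` to `end` in `years` periods."""
    return (end / start) ** (1 / years) - 1


base_2024 = 1.58      # USD billion, as quoted
quoted_2033 = 7.57    # USD billion, as quoted
quoted_cagr = 0.197   # 19.7%, as quoted
years = 2033 - 2024   # assumed number of compounding periods

projected_2033 = project(base_2024, quoted_cagr, years)
implied_rate = implied_cagr(base_2024, quoted_2033, years)

print(f"projected 2033 value at 19.7%: USD {projected_2033:.2f} billion")
print(f"CAGR implied by 1.58 -> 7.57:  {implied_rate:.1%}")
```

Under this reading, compounding at 19.7% gives about USD 7.97 billion, while the quoted endpoint of USD 7.57 billion implies a rate of about 19.0%. The small gap may come from rounding or from a different base year in the source.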

One of the fundamental growth drivers for the AI-powered customer churn prediction market is the proliferation of customer data and the imperative need for businesses to leverage this data to drive actionable insights. With the advent of digital touchpoints, organizations are now able to collect vast amounts of structured and unstructured data from various customer interactions. This data, when processed using advanced AI and machine learning algorithms, empowers companies to predict potential churn with high accuracy. As a result, businesses across industries such as telecommunications, BFSI, retail, and healthcare are increasingly investing in AI-powered churn prediction solutions to proactively identify at-risk customers and implement targeted retention strategies, thereby reducing revenue loss and improving profitability.

Another significant factor fueling market expansion is the growing emphasis on customer experience and personalization. In today's hyper-competitive landscape, retaining existing customers has become more cost-effective than acquiring new ones. AI-powered churn prediction tools enable organizations to segment their customer base, understand behavior patterns, and tailor interventions for individual customers. This level of personalization not only helps in reducing churn rates but also enhances customer satisfaction and loyalty. The integration of AI-driven insights into CRM systems and marketing automation platforms further streamlines the process, making it easier for businesses to act on predictions in real time. Moreover, the rising adoption of cloud-based solutions has made these technologies more accessible to small and medium enterprises (SMEs), broadening the market’s reach.

The surge in demand for scalable, real-time analytics platforms is also contributing to market growth. Enterprises are increasingly seeking AI-powered solutions that can integrate seamlessly with their existing IT infrastructure, deliver instant insights, and scale as their data grows. The shift towards cloud deployment models has accelerated this trend, offering cost-effective, flexible, and easily deployable churn prediction solutions. Additionally, advancements in natural language processing (NLP), deep learning, and big data analytics are further enhancing the accuracy and reliability of churn prediction models. As organizations strive to stay ahead of the competition by minimizing customer attrition, the demand for sophisticated, AI-driven predictive analytics tools continues to rise.

Regionally, North America holds the largest market share, followed by Europe and Asia Pacific. The dominance of North America can be attributed to the early adoption of AI technologies, presence of major technology vendors, and a strong focus on customer-centric strategies among enterprises in the region. Europe is also witnessing significant growth, driven by stringent regulations around data protection and a growing emphasis on customer retention in industries like BFSI and retail. The Asia Pacific region is expected to exhibit the highest CAGR during the forecast period, fueled by rapid digitalization, increasing investments in AI, and the expansion of e-commerce and telecommunications sectors. Latin America and the Middle East & Africa are also experiencing gradual adoption, primarily in financial services and telecommunications.

Component Analysis

The component segment of the AI-powered customer churn prediction market is categorized into software and services. The software segment dominates the market, accounting for the largest share in 2024, owing to the widespread deployment of advanced AI and machine learning platforms
Customer Churn Software Report
marketresearchforecast.com
doc, pdf, ppt
Updated Mar 25, 2025
Market Research Forecast (2025). Customer Churn Software Report [Dataset]. https://www.marketresearchforecast.com/reports/customer-churn-software-56060
Available download formats: pdf, ppt, doc
Dataset updated
Mar 25, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Customer Churn Software market is experiencing robust growth, driven by the increasing need for businesses across diverse sectors to improve customer retention and enhance profitability. The market's expansion is fueled by several key factors. Firstly, the rising adoption of cloud-based solutions offers scalability and cost-effectiveness, attracting a wider range of businesses. Secondly, advancements in AI and machine learning are enabling more sophisticated churn prediction and proactive customer engagement strategies. The telecommunications, banking and finance, and retail and e-commerce sectors are currently leading the adoption, leveraging the software to identify at-risk customers and implement targeted retention programs. However, factors such as high implementation costs, integration challenges with existing systems, and the need for skilled personnel to manage the software can act as restraints on market growth. We project a substantial market expansion in the coming years, with a steady compound annual growth rate (CAGR) contributing to a significant increase in market value. The competitive landscape is dynamic, with established players like IBM, Salesforce, and Microsoft competing alongside specialized churn management solution providers. This competition fosters innovation and drives the development of more advanced features and functionalities. Looking ahead, the market will witness further consolidation through mergers and acquisitions, as larger companies seek to expand their market share. The increasing emphasis on data privacy and security regulations will also shape market dynamics, with vendors focusing on compliant solutions. The market is expected to witness the rise of niche solutions tailored to specific industry segments, providing customized functionalities. 
The geographic distribution of the market is expected to remain concentrated in North America and Europe initially, with significant growth potential in emerging markets like Asia Pacific and the Middle East & Africa, fueled by increasing digitalization and adoption of sophisticated business analytics. The continued evolution of AI and machine learning algorithms will be crucial in improving the accuracy and efficiency of churn prediction models, further enhancing the value proposition of Customer Churn Software. This convergence of technological advancement, regulatory compliance, and industry-specific needs will shape the future trajectory of the Customer Churn Software market.
S1 Data -
plos.figshare.com
zip
Updated Oct 11, 2023
Yancong Zhou; Wenyue Chen; Xiaochen Sun; Dandan Yang (2023). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0292466.s001
Available download formats: zip
Unique identifier
https://doi.org/10.1371/journal.pone.0292466.s001
Dataset updated
Oct 11, 2023
Dataset provided by
PLOS ONE
Authors
Yancong Zhou; Wenyue Chen; Xiaochen Sun; Dandan Yang
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analyzing customers’ characteristics and giving early warning of customer churn with machine learning algorithms can help enterprises provide targeted marketing strategies and personalized services, and save substantial operating costs. Data cleaning, oversampling, data standardization and other preprocessing operations were performed in Python on a data set of 900,000 telecom customers' personal characteristics and historical behavior. Appropriate model parameters were selected to build a BPNN (Back Propagation Neural Network). Random Forest (RF) and Adaboost, two classic ensemble learning models, were introduced, and an Adaboost dual-ensemble learning model with RF as the base learner was proposed. These four models and four other classical machine learning models (decision tree, naive Bayes, K-Nearest Neighbor (KNN), and Support Vector Machine (SVM)) were each used to analyze the customer churn data. The results show that the four models perform better in terms of recall, precision, F1 score and other indicators, and that the RF-Adaboost dual-ensemble model performs best. The recall rates of BPNN, RF, Adaboost and the RF-Adaboost dual-ensemble model on positive samples are 79%, 90%, 89%, and 93% respectively, the precision rates are 97%, 99%, 98%, and 99%, and the F1 scores are 87%, 95%, 94%, and 96%. For the RF-Adaboost dual-ensemble model, the three indicators are 10%, 1%, and 6% higher than the reference. The churn predictions provide strong data support for telecom companies to adopt appropriate retention strategies for pre-churn customers and reduce customer churn.
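The F1 scores quoted above follow from precision and recall through the standard harmonic-mean formula, F1 = 2PR / (P + R). The sketch below recomputes F1 from the rounded precision and recall figures in the description; the function name and dictionary are hypothetical, and because the inputs are rounded to whole percentages, a recomputed value can differ from the reported one by about a point.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# (precision, recall) pairs as quoted in the description, rounded to whole percent.
reported = {
    "BPNN": (0.97, 0.79),
    "RF": (0.99, 0.90),
    "Adaboost": (0.98, 0.89),
    "RF-Adaboost": (0.99, 0.93),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_score(p, r):.3f}")
```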
‘Client churn rate in Telecom sector’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 18, 2016
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2016). ‘Client churn rate in Telecom sector’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-client-churn-rate-in-telecom-sector-72d0/latest
Dataset updated
Feb 18, 2016
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Client churn rate in Telecom sector’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sagnikpatra/edadata on 13 February 2022.

--- Dataset description provided by original source is as follows ---

Context "Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs."

Content The Orange Telecom's Churn Dataset, which consists of cleaned customer activity data (features), along with a churn label specifying whether a customer canceled the subscription, will be used to develop predictive models. Two datasets are made available here: The churn-80 and churn-20 datasets can be downloaded.

The two sets are from the same batch, but have been split by an 80/20 ratio. As more data is often desirable for developing ML models, let's use the larger set (that is, churn-80) for training and cross-validation purposes, and the smaller set (that is, churn-20) for final testing and model performance evaluation.
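
The source ships the 80/20 split ready-made as two files, so no splitting is required to use it. For readers who want to reproduce a comparable partition on their own data, the following is a minimal standard-library sketch; the function name, the fixed seed, and the integer stand-in rows are all assumptions made for illustration, and this is not how the original split was produced.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_rows(
    rows: Sequence[T], train_fraction: float = 0.8, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `rows` and cut it into (train, test) parts."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))  # stand-in for customer records
train, test = split_rows(rows)
print(len(train), len(test))
```

The larger part would play the role of churn-80 (training and cross-validation) and the smaller part the role of churn-20 (final testing), with the test part left untouched until the end.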

Inspiration To explore this type of models and learn more about the subject.

--- Original source retains full ownership of the source dataset ---
Big Data & Machine Learning in Telecom Report
archivemarketresearch.com
doc, pdf, ppt
Updated Mar 14, 2025
Archive Market Research (2025). Big Data & Machine Learning in Telecom Report [Dataset]. https://www.archivemarketresearch.com/reports/big-data-machine-learning-in-telecom-57186
Available download formats: ppt, doc, pdf
Dataset updated
Mar 14, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Big Data and Machine Learning (BDML) in Telecom market is experiencing robust growth, driven by the explosive increase in mobile data traffic, the rise of 5G networks, and the increasing need for personalized customer experiences. The market, valued at approximately $15 billion in 2025, is projected to witness a Compound Annual Growth Rate (CAGR) of 18% from 2025 to 2033, reaching an estimated $60 billion by 2033. This expansion is fueled by several key factors. Telecom operators are leveraging BDML for network optimization, predictive maintenance, fraud detection, customer churn prediction, and personalized service offerings. The adoption of descriptive, predictive, and prescriptive analytics across various applications, including processing, storage, and analysis of vast datasets, is a significant driver. Furthermore, advancements in machine learning algorithms and feature engineering techniques are empowering telecom companies to extract deeper insights from their data, leading to significant efficiency gains and improved revenue streams. The increasing availability of cloud-based BDML solutions is also fostering wider adoption, particularly among smaller operators. However, challenges remain. Data security and privacy concerns, the need for skilled data scientists and engineers, and the high initial investment costs associated with implementing BDML solutions can hinder market growth. Despite these restraints, the strategic advantages offered by BDML are undeniable, making its adoption crucial for telecom companies aiming to stay competitive in a rapidly evolving landscape. Segments like predictive analytics and machine learning for network optimization are expected to experience the most significant growth during the forecast period, driven by the increasing complexity of telecom networks and the demand for proactive network management. 
Geographic regions such as North America and Asia Pacific, with their advanced technological infrastructure and substantial investments in 5G, are anticipated to lead the market, followed by Europe and other regions.
The number of correct predictions in each class.
plos.figshare.com
xls
Updated Jun 6, 2024
Kaveh Faraji Googerdchi; Shahrokh Asadi; Seyed Mohammadbagher Jafari (2024). The number of correct predictions in each class. [Dataset]. http://doi.org/10.1371/journal.pone.0303881.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0303881.t004
Dataset updated
Jun 6, 2024
Dataset provided by
PLOS ONE
Authors
Kaveh Faraji Googerdchi; Shahrokh Asadi; Seyed Mohammadbagher Jafari
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Customer churn prediction is vital for organizations to mitigate costs and foster growth. Ensemble learning models are commonly used for churn prediction. Diversity and prediction performance are two essential principles for constructing ensemble classifiers. Therefore, developing accurate ensemble learning models consisting of diverse base classifiers is a considerable challenge in this area. In this study, we propose two multi-objective evolutionary ensemble learning models based on clustering (MOEECs), which include a novel diversity measure. Also, to overcome the data imbalance problem, another objective function is presented in the second model to evaluate ensemble performance. The proposed models in this paper are evaluated with a dataset collected from a mobile operator database. Our first model, MOEEC-1, achieves an accuracy of 97.30% and an AUC of 93.76%, outperforming classical classifiers and other ensemble models. Similarly, MOEEC-2 attains an accuracy of 96.35% and an AUC of 94.89%, showcasing its effectiveness in churn prediction. Furthermore, comparison with previous churn models reveals that MOEEC-1 and MOEEC-2 exhibit superior performance in accuracy, precision, and F-score. Overall, our proposed MOEECs demonstrate significant advancements in churn prediction accuracy and outperform existing models in terms of key performance metrics. These findings underscore the efficacy of our approach in addressing the challenges of customer churn prediction and its potential for practical application in organizational decision-making.
Global Telecom Crm Market Size By Deployment Model, By Type of CRM Solution,...
verifiedmarketresearch.com
VERIFIED MARKET RESEARCH, Global Telecom Crm Market Size By Deployment Model, By Type of CRM Solution, By Telecom Operator Size, By Geographic Scope And Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/telecom-crm-market/
Dataset provided by
Verified Market Research, https://www.verifiedmarketresearch.com/
Authors
VERIFIED MARKET RESEARCH
License
https://www.verifiedmarketresearch.com/privacy-policy/
Time period covered
2024 - 2031
Area covered
Global
Description
Telecom Crm Market size was valued at USD 7.4 Billion in 2024 and is projected to reach USD 25.1 Billion by 2031, growing at a CAGR of 10.1% during the forecast period 2024-2031.

Global Telecom Crm Market Drivers

The market drivers for the Telecom Crm Market can be influenced by various factors. These may include:

Customer Experience Focus: Increasing focus on enhancing customer experience in the telecom industry drives the adoption of CRM (Customer Relationship Management) solutions to manage customer interactions, improve service delivery, and personalize customer engagements.
Competitive Differentiation: Telecom companies use CRM systems to differentiate themselves in a competitive market by offering personalized services, targeted marketing campaigns, and efficient customer support.
Data Integration and Insights: CRM systems integrate customer data from multiple channels (e.g., mobile apps, websites, call centers) to provide telecom companies with actionable insights for better decision-making and service optimization.
Subscriber Retention: CRM solutions help telecom operators in subscriber retention efforts by analyzing customer behavior, preferences, and churn prediction models to proactively address customer needs and reduce attrition.
Operational Efficiency: Automation of sales, marketing, and customer service processes through CRM systems improves operational efficiency, reduces manual errors, and streamlines workflows in telecom organizations.
Cross-Selling and Up-Selling: CRM platforms enable telecom companies to identify cross-selling and up-selling opportunities by analyzing customer buying patterns and preferences, thereby increasing revenue streams.
Regulatory Compliance: CRM systems help telecom operators comply with regulatory requirements related to customer data protection, privacy laws, and telecommunications regulations.
Digital Transformation: As telecom companies undergo digital transformation, CRM solutions facilitate seamless integration with digital channels and enable omni-channel customer engagement strategies.
Predictive Analytics: Adoption of predictive analytics capabilities within CRM systems allows telecom operators to forecast customer behavior, anticipate market trends, and optimize marketing campaigns.
Cloud Adoption: Increasing adoption of cloud-based CRM solutions offers scalability, flexibility, and cost-efficiency benefits to telecom companies, facilitating rapid deployment and accessibility across geographies.
Telecom Customer Churn Prediction
kaggle.com
Updated Apr 28, 2024
Shiyamaladevi R S (2024). Telecom Customer Churn Prediction [Dataset]. https://www.kaggle.com/shiyamaladevirs/telecom-customer-churn-prediction/discussion
Dataset updated
Apr 28, 2024
Dataset provided by
Kaggle, http://kaggle.com/
Authors
Shiyamaladevi R S
Description
Dataset

This dataset was created by Shiyamaladevi R S
