60 datasets found

Customer Churn - Decision Tree & Random Forest
kaggle.com
Updated Jul 6, 2023
Cite
vikram amin (2023). Customer Churn - Decision Tree & Random Forest [Dataset]. https://www.kaggle.com/datasets/vikramamin/customer-churn-decision-tree-and-random-forest
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jul 6, 2023
Dataset provided by
Kaggle
Authors
vikram amin
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Main objective: Identify which customers will churn and which will not.

Methodology: This is a classification problem. We use a decision tree and a random forest to predict the outcome.

Steps Involved

Read the data

Check the data types

Convert character vectors to factors, as this is a classification problem

Drop the variable which is not significant for the analysis. We drop "customerID".

Check for missing values. None are found.

Split the data into train and test so we can use the train data for building the model and use test data for prediction. We split this into 80-20 ratio (train/test) using the sample function.

Install and run libraries (rpart, rpart.plot, rattle, RColorBrewer, caret)

Run a decision tree using the rpart function. The dependent variable is Churn, and the other 19 variables are independent variables.

Plot the decision tree

Average customer churn is 27%. Churn can take place if tenure is >= 7.5 and there is no internet service.

Tuning the model

Define the search grid using the expand.grid function

Set up the control parameters using 5-fold cross-validation

When we print the model we get the best CP = 0.01 and an accuracy of 79.00%

Predict with the model

Find out which variables are most and least significant.

The most significant variables are Internet Service and Tenure; the least significant are Streaming Movies and Tech Support.

USE RANDOM FOREST

Run library(randomForest). Here we use the default ntree (500) and the default mtry, which for classification is floor(sqrt(p)), where p is the number of independent variables (4 for p = 19).

From the confusion matrix, accuracy is 79.27%. This is marginally higher than that of the decision tree, i.e. 79.00%. The error rate is low when predicting "No" and much higher when predicting "Yes".

Plot the model showing which variables reduce the Gini impurity the most and least. Total charges and tenure reduce the Gini impurity the most, while phone service has the least impact.

Predict with the model and create a new data frame showing the actual vs. predicted values

Plot the model to find where the OOB (out-of-bag) error stops decreasing and levels off. The error stops decreasing between 100 and 200 trees, so we take ntree = 200 when we tune the model.

Tune the model: mtry = 2 has the lowest OOB error rate

Use random forest with mtry = 2 and ntree = 200

From the confusion matrix, accuracy is 79.71%. This is marginally higher than that of the default model (ntree = 500, mtry = 4), i.e. 79.27%, and of the decision tree, i.e. 79.00%. The error rate is low when predicting "No" and m...
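
As a rough illustration of how an overall accuracy and the per-class error rates are read off a confusion matrix, here is a minimal Python sketch. The walkthrough above uses R; Python is used here only for illustration. The counts are hypothetical placeholders and are not the values from the original analysis.

```python
# Minimal sketch: accuracy and per-class error from a 2x2 confusion matrix.
# All counts are hypothetical, chosen only to illustrate the arithmetic.

def confusion_metrics(tn, fp, fn, tp):
    """Return accuracy and the error rate for each actual class."""
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    error_no = fp / (tn + fp)    # actual "No" customers predicted as "Yes"
    error_yes = fn / (fn + tp)   # actual "Yes" customers predicted as "No"
    return accuracy, error_no, error_yes

acc, err_no, err_yes = confusion_metrics(tn=900, fp=100, fn=200, tp=200)
print(f"accuracy={acc:.2%}  error(No)={err_no:.2%}  error(Yes)={err_yes:.2%}")
```

With an imbalanced target, a model can reach a reasonable overall accuracy while still missing a large share of the minority "Yes" class, which is why the per-class error is worth reporting next to accuracy.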
Telco_Customer_churn_Data
test.researchdata.tuwien.at
bin, csv, png
Updated Apr 28, 2025
Cite
Erum Naz; Erum Naz; Erum Naz; Erum Naz (2025). Telco_Customer_churn_Data [Dataset]. http://doi.org/10.82556/b0ch-cn44
Explore at:
Available download formats: png, csv, bin
Unique identifier
https://doi.org/10.82556/b0ch-cn44
Dataset updated
Apr 28, 2025
Dataset provided by
TU Wien
Authors
Erum Naz; Erum Naz; Erum Naz; Erum Naz
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Apr 28, 2025
Description
Context and Methodology

The dataset originates from the research domain of Customer Churn Prediction in the Telecom Industry. It was created as part of the project "Data-Driven Churn Prediction: ML Solutions for the Telecom Industry," completed within the Data Stewardship course (Master programme Data Science, TU Wien).

The primary purpose of this dataset is to support machine learning model development for predicting customer churn based on customer demographics, service usage, and account information.
The dataset enables the training, testing, and evaluation of classification algorithms, allowing researchers and practitioners to explore techniques for customer retention optimization.

The dataset was originally obtained from the IBM Accelerator Catalog and adapted for academic use. It was uploaded to TU Wien’s DBRepo test system and accessed via SQLAlchemy connections to the MariaDB environment.

Technical Details

The dataset has a tabular structure and was initially stored in CSV format. It contains:

Rows: 7,043 customer records

Columns: 21 features including customer attributes (gender, senior citizen status, partner status), account information (tenure, contract type, payment method), service usage (internet service, streaming TV, tech support), and the target variable (Churn: Yes/No).

Naming Convention:

The table in the database is named telco_customer_churn_data.

Software Requirements:

To open and work with the dataset, any standard database client or programming language that supports MariaDB connections can be used (e.g., Python).

For machine learning applications, libraries such as pandas, scikit-learn, and joblib are typically used.

Additional Resources:

Source code for data loading, preprocessing, model training, and evaluation is available at the associated GitHub repository: https://github.com/nazerum/fair-ml-customer-churn

Further Details

When reusing the dataset, users should be aware:

Licensing: The dataset is shared under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Use Case Suitability: The dataset is best suited for classification tasks, particularly binary classification (churn vs. no churn).

Metadata Standards: Metadata describing the dataset adheres to FAIR principles and is supplemented by CodeMeta and Croissant standards for improved interoperability.
Synthetic Telecom Customer Churn Data
kaggle.com
Updated May 27, 2025
Cite
Abdulrahman Qaten (2025). Synthetic Telecom Customer Churn Data [Dataset]. https://www.kaggle.com/datasets/abdulrahmanqaten/synthetic-customer-churn/suggestions
Explore at:
Croissant
Dataset updated
May 27, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Abdulrahman Qaten
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
If you found the dataset useful, your upvote will help others discover it. Thanks for your support!

This dataset simulates customer behavior for a fictional telecommunications company. It contains demographic information, account details, services subscribed to, and whether the customer ultimately churned (stopped using the service) or not. The data is synthetically generated but designed to reflect realistic patterns often found in telecom churn scenarios.

Purpose:

The primary goal of this dataset is to provide a clean and straightforward resource for beginners learning about:

Exploratory Data Analysis (EDA): Understanding customer characteristics and identifying potential drivers of churn through visualization and statistical summaries.

Data Preprocessing: Handling categorical features (like converting text to numbers) and scaling numerical features.

Classification Modeling: Building and evaluating simple machine learning models (like Logistic Regression or Decision Trees) to predict customer churn.

Features:

The dataset includes the following columns:

CustomerID: Unique identifier for each customer.

Age: Customer's age in years.

Gender: Customer's gender (Male/Female).

Location: General location of the customer (e.g., New York, Los Angeles).

SubscriptionDurationMonths: How many months the customer has been subscribed.

MonthlyCharges: The amount the customer is charged each month.

TotalCharges: The total amount the customer has been charged over their subscription period.

ContractType: The type of contract the customer has (Month-to-month, One year, Two year).

PaymentMethod: How the customer pays their bill (e.g., Electronic check, Credit card).

OnlineSecurity: Whether the customer has online security service (Yes, No, No internet service).

TechSupport: Whether the customer has tech support service (Yes, No, No internet service).

StreamingTV: Whether the customer has TV streaming service (Yes, No, No internet service).

StreamingMovies: Whether the customer has movie streaming service (Yes, No, No internet service).

Churn: (Target Variable) Whether the customer churned (1 = Yes, 0 = No).
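
The two preprocessing steps named under "Purpose" (converting text categories to numbers and scaling numerical features) could look roughly like the sketch below. It uses only the Python standard library; the three sample rows are hypothetical, and only the column names are taken from the feature list above.

```python
# Minimal sketch: one-hot encoding a categorical column and min-max
# scaling a numeric one. The rows are hypothetical sample data.

rows = [
    {"ContractType": "Month-to-month", "MonthlyCharges": 70.0, "Churn": 1},
    {"ContractType": "One year", "MonthlyCharges": 50.0, "Churn": 0},
    {"ContractType": "Two year", "MonthlyCharges": 90.0, "Churn": 0},
]

def one_hot(rows, column):
    """Replace `column` with one 0/1 indicator per observed category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(row[column] == category)
        encoded.append(new_row)
    return encoded

def min_max_scale(rows, column):
    """Rescale `column` linearly to the range [0, 1]."""
    values = [row[column] for row in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid division by zero for a constant column
    return [{**row, column: (row[column] - low) / span} for row in rows]

prepared = min_max_scale(one_hot(rows, "ContractType"), "MonthlyCharges")
for row in prepared:
    print(row)
```

In practice a library such as pandas or scikit-learn would usually handle both steps; the hand-written version is shown only to make the transformations explicit.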

Data Quality:

This dataset is intentionally clean with no missing values, making it easy for beginners to focus on analysis and modeling concepts without complex data cleaning steps.

Inspiration:

Understanding customer churn is crucial for many businesses. This dataset provides a sandbox environment to practice the fundamental techniques used in churn analysis and prediction.
Telecom Customer Churn Prediction
kaggle.com
Updated Apr 28, 2024
Cite
Shiyamaladevi R S (2024). Telecom Customer Churn Prediction [Dataset]. https://www.kaggle.com/shiyamaladevirs/telecom-customer-churn-prediction/discussion
Explore at:
Croissant
Dataset updated
Apr 28, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Shiyamaladevi R S
Description
Dataset

This dataset was created by Shiyamaladevi R S

Contents
Comparison of the proposed algorithms with other ensemble models.
plos.figshare.com
xls
Updated Jun 6, 2024
Cite
Kaveh Faraji Googerdchi; Shahrokh Asadi; Seyed Mohammadbagher Jafari (2024). Comparison of the proposed algorithms with other ensemble models. [Dataset]. http://doi.org/10.1371/journal.pone.0303881.t009
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0303881.t009
Dataset updated
Jun 6, 2024
Dataset provided by
PLOS ONE
Authors
Kaveh Faraji Googerdchi; Shahrokh Asadi; Seyed Mohammadbagher Jafari
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Comparison of the proposed algorithms with other ensemble models.
S1 Data -
plos.figshare.com
zip
Updated Oct 11, 2023
Cite
Yancong Zhou; Wenyue Chen; Xiaochen Sun; Dandan Yang (2023). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0292466.s001
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.1371/journal.pone.0292466.s001
Dataset updated
Oct 11, 2023
Dataset provided by
PLOS ONE
Authors
Yancong Zhou; Wenyue Chen; Xiaochen Sun; Dandan Yang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analyzing customers’ characteristics and giving early warning of customer churn based on machine learning algorithms can help enterprises provide targeted marketing strategies and personalized services, and save a lot of operating costs. Data cleaning, oversampling, data standardization and other preprocessing operations are done in Python on a data set of 900,000 telecom customers' personal characteristics and historical behavior. Appropriate model parameters were selected to build a BPNN (Back Propagation Neural Network). Random Forest (RF) and Adaboost, two classic ensemble learning models, were introduced, and an Adaboost dual-ensemble learning model with RF as the base learner was put forward. These four models and four other classical machine learning models (decision tree, naive Bayes, K-Nearest Neighbor (KNN), and Support Vector Machine (SVM)) were each used to analyze the customer churn data. The results show that the four models have better performance in terms of recall rate, precision rate, F1 score and other indicators, and the RF-Adaboost dual-ensemble model has the best performance. The recall rates of BPNN, RF, Adaboost and the RF-Adaboost dual-ensemble model on positive samples are 79%, 90%, 89%, and 93% respectively, the precision rates are 97%, 99%, 98%, and 99%, and the F1 scores are 87%, 95%, 94%, and 96%. The RF-Adaboost dual-ensemble model has the best performance, and the three indicators are 10%, 1%, and 6% higher than the reference. The prediction results of customer churn provide strong data support for telecom companies to adopt appropriate retention strategies for pre-churn customers and reduce customer churn.
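
For readers who want to relate the three indicators quoted in the abstract, the F1 score is the harmonic mean of precision and recall. The following Python sketch recomputes it from the rounded percentages quoted above; because the inputs are already rounded, the recomputed values may differ by about a point from the reported F1 scores. The dictionary layout is an assumption made for illustration.

```python
# Minimal sketch: F1 as the harmonic mean of precision and recall.
# Inputs are the rounded (precision, recall) pairs quoted in the abstract.

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

reported = {
    "BPNN": (0.97, 0.79),
    "RF": (0.99, 0.90),
    "Adaboost": (0.98, 0.89),
    "RF-Adaboost": (0.99, 0.93),
}

f1_values = {model: f1_score(p, r) for model, (p, r) in reported.items()}
for model, value in f1_values.items():
    print(f"{model}: F1 = {value:.3f}")
```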
Synthetic Customer Churn Prediction Dataset
opendatabay.com
Updated May 6, 2025
Cite
Opendatabay Labs (2025). Synthetic Customer Churn Prediction Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/5d7ef013-5848-4367-bf3b-2ce359587b43
Explore at:
Dataset updated
May 6, 2025
Dataset provided by
Buy & Sell Data | Opendatabay - AI & Synthetic Data Marketplace
Authors
Opendatabay Labs
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Retail & Consumer Behavior
Description
This Synthetic Customer Churn Prediction Dataset has been designed as an educational resource for exploring data science, machine learning, and predictive modelling techniques in a customer retention context. The dataset simulates key attributes relevant to customer churn analysis, such as service usage, contract details, and customer demographics. It allows users to practice data manipulation, visualization, and the development of models to predict churn behaviour in industries like telecommunications, subscription services, or utilities.

Dataset Features:

Customer_Id: Unique identifier for each customer (not included in this dataset for privacy).

Gender: Gender of the customer (e.g., "Male," "Female").

Partner: Whether the customer has a partner (e.g., "Yes," "No").

Dependents: Whether the customer has dependents (e.g., "Yes," "No").

Tenure (Months): The number of months the customer has been with the company.

PhoneService: Whether the customer has a phone service (e.g., "Yes," "No").

MultipleLines: Whether the customer has multiple phone lines (e.g., "Yes," "No phone service").

InternetService: Type of internet service (e.g., "DSL," "Fiber optic," "No").

OnlineSecurity: Whether the customer has online security services (e.g., "Yes," "No," "No internet service").

OnlineBackup: Whether the customer has online backup services (e.g., "Yes," "No," "No internet service").

DeviceProtection: Whether the customer has device protection services (e.g., "Yes," "No," "No internet service").

TechSupport: Whether the customer has tech support services (e.g., "Yes," "No," "No internet service").

StreamingTV: Whether the customer has streaming TV services (e.g., "Yes," "No," "No internet service").

StreamingMovies: Whether the customer has streaming movies services (e.g., "Yes," "No," "No internet service").

Contract: Type of contract the customer has (e.g., "Month-to-month," "One year," "Two year").

PaperlessBilling: Whether the customer uses paperless billing (e.g., "Yes," "No").

PaymentMethod: The payment method used by the customer (e.g., "Electronic check," "Credit card," "Bank transfer").

MonthlyCharges: Monthly charges billed to the customer.

TotalCharges: Total charges incurred by the customer over their tenure.

Churn: Whether the customer has churned (e.g., "Yes," "No").

Distribution:

Figure: Synthetic Customer Churn Prediction Dataset Distribution

Usage:

This dataset is useful for a variety of applications, including:

Customer Behavior Analysis: To understand factors influencing customer retention and churn.

Educational Training: To practice data cleaning, feature engineering, and visualization techniques in customer analytics.

Predictive Modeling: To build machine learning models for predicting customer churn based on service usage patterns and demographic information.

Coverage:

This dataset is synthetic and anonymized, making it a safe tool for experimentation and learning without compromising real customer privacy.

License:

CC0 (Public Domain)

Who can use it:

Data scientists and enthusiasts: For developing customer analytics skills and predictive modelling expertise.

Business analysts: To understand customer churn drivers and improve retention strategies.

Educators and students: For teaching and learning applications in data science and machine learning.
Telecom customer
kaggle.com
Updated Aug 27, 2017
Cite
abhinav (2017). Telecom customer [Dataset]. https://www.kaggle.com/abhinav89/telecom-customer/code
Explore at:
Croissant
Dataset updated
Aug 27, 2017
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
abhinav
Description
Telecom customer churn prediction

This data set consists of 100 variables and approx 100 thousand records. This data set contains different variables explaining the attributes of telecom industry and various factors considered important while dealing with customers of telecom industry. The target variable here is churn which explains whether the customer will churn or not. We can use this data set to predict the customers who would churn or who wouldn't churn depending on various variables available.
Data from: telco-customer-churn
huggingface.co
Updated Feb 18, 2025
Cite
aai510-group1 (2025). telco-customer-churn [Dataset]. https://huggingface.co/datasets/aai510-group1/telco-customer-churn
Explore at:
Croissant
Dataset updated
Feb 18, 2025
Dataset authored and provided by
aai510-group1
Description
Dataset Card for Telco Customer Churn

This dataset contains information about customers of a fictional telecommunications company, including demographic information, services subscribed to, location details, and churn behavior. This merged dataset combines the information from the original Telco Customer Churn dataset with additional details.

Dataset Details Dataset Description

This merged Telco Customer Churn dataset provides a comprehensive view of customer… See the full description on the dataset page: https://huggingface.co/datasets/aai510-group1/telco-customer-churn.
Literature review of papers on churn prediction in telecommunication.
plos.figshare.com
xls
Updated Jun 6, 2024
Cite
Kaveh Faraji Googerdchi; Shahrokh Asadi; Seyed Mohammadbagher Jafari (2024). Literature review of papers on churn prediction in telecommunication. [Dataset]. http://doi.org/10.1371/journal.pone.0303881.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0303881.t001
Dataset updated
Jun 6, 2024
Dataset provided by
PLOS ONE
Authors
Kaveh Faraji Googerdchi; Shahrokh Asadi; Seyed Mohammadbagher Jafari
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Literature review of papers on churn prediction in telecommunication.
Telecom Customer Churn Prediction
kaggle.com
Updated Sep 12, 2020
Cite
R. Joseph Manoj, PhD (2020). Telecom Customer Churn Prediction [Dataset]. https://www.kaggle.com/rjmanoj/telecom-customer-churn-prediction/metadata
Explore at:
Croissant
Dataset updated
Sep 12, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
R. Joseph Manoj, PhD
Description
Dataset

This dataset was created by R. Joseph Manoj, PhD

Contents
‘Client churn rate in Telecom sector’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 18, 2016
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2016). ‘Client churn rate in Telecom sector’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-client-churn-rate-in-telecom-sector-72d0/latest
Dataset updated
Feb 18, 2016
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Client churn rate in Telecom sector’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sagnikpatra/edadata on 13 February 2022.

--- Dataset description provided by original source is as follows ---

Context "Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs."

Content The Orange Telecom's Churn Dataset, which consists of cleaned customer activity data (features), along with a churn label specifying whether a customer canceled the subscription, will be used to develop predictive models. Two datasets are made available here: The churn-80 and churn-20 datasets can be downloaded.

The two sets are from the same batch, but have been split by an 80/20 ratio. As more data is often desirable for developing ML models, let's use the larger set (that is, churn-80) for training and cross-validation purposes, and the smaller set (that is, churn-20) for final testing and model performance evaluation.
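
The cross-validation step mentioned above, applied to the larger training set only, can be sketched as follows. This is a minimal standard-library illustration under stated assumptions: the row count, the number of folds, and the function name `k_fold_indices` are hypothetical and do not come from the source dataset.

```python
# Minimal sketch: k-fold cross-validation indices over a training set.
# n_rows and k are hypothetical; the held-out test set is never touched here.
import random

def k_fold_indices(n_rows, k, seed=0):
    """Yield (train_idx, val_idx) pairs; each row is validated exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, val_idx

splits = list(k_fold_indices(n_rows=10, k=5))
for train_idx, val_idx in splits:
    print("validate on", sorted(val_idx), "train on", len(train_idx), "rows")
```

Keeping the smaller set aside until the end means the final performance estimate is not influenced by choices made during cross-validation.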

Inspiration To explore this type of model and learn more about the subject.

--- Original source retains full ownership of the source dataset ---
Big Data & Machine Learning in Telecom Report
archivemarketresearch.com
doc, pdf, ppt
Updated Mar 14, 2025
Cite
Archive Market Research (2025). Big Data & Machine Learning in Telecom Report [Dataset]. https://www.archivemarketresearch.com/reports/big-data-machine-learning-in-telecom-57186
Explore at:
Available download formats: ppt, doc, pdf
Dataset updated
Mar 14, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Big Data and Machine Learning (BDML) in Telecom market is experiencing robust growth, driven by the explosive increase in mobile data traffic, the rise of 5G networks, and the increasing need for personalized customer experiences. The market, valued at approximately $15 billion in 2025, is projected to witness a Compound Annual Growth Rate (CAGR) of 18% from 2025 to 2033, reaching an estimated $60 billion by 2033. This expansion is fueled by several key factors. Telecom operators are leveraging BDML for network optimization, predictive maintenance, fraud detection, customer churn prediction, and personalized service offerings. The adoption of descriptive, predictive, and prescriptive analytics across various applications, including processing, storage, and analysis of vast datasets, is a significant driver. Furthermore, advancements in machine learning algorithms and feature engineering techniques are empowering telecom companies to extract deeper insights from their data, leading to significant efficiency gains and improved revenue streams. The increasing availability of cloud-based BDML solutions is also fostering wider adoption, particularly among smaller operators. However, challenges remain. Data security and privacy concerns, the need for skilled data scientists and engineers, and the high initial investment costs associated with implementing BDML solutions can hinder market growth. Despite these restraints, the strategic advantages offered by BDML are undeniable, making its adoption crucial for telecom companies aiming to stay competitive in a rapidly evolving landscape. Segments like predictive analytics and machine learning for network optimization are expected to experience the most significant growth during the forecast period, driven by the increasing complexity of telecom networks and the demand for proactive network management. 
Geographic regions such as North America and Asia Pacific, with their advanced technological infrastructure and substantial investments in 5G, are anticipated to lead the market, followed by Europe and other regions.
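
The growth figures quoted in this description can be related through the standard compound-growth formula. The sketch below assumes annual compounding over the eight years from 2025 to 2033; the function names are hypothetical, and the inputs are the rounded figures from the text, so the results are approximate.

```python
# Minimal sketch: compound annual growth, assuming annual compounding.
# Inputs are the rounded figures quoted in the description ($ billion).

def project(value, rate, years):
    """Value after compounding `rate` once per year for `years` years."""
    return value * (1 + rate) ** years

def implied_cagr(start, end, years):
    """Annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1

years = 2033 - 2025
projected_2033 = project(15, 0.18, years)
rate_for_60 = implied_cagr(15, 60, years)
print(f"projected 2033 value: {projected_2033:.1f}")
print(f"CAGR implied by 15 -> 60: {rate_for_60:.3f}")
```

Under these assumptions, 18% annual growth from $15 billion gives roughly $56 billion by 2033, and reaching $60 billion would imply a rate closer to 19%, which is consistent with the quoted figures being rounded estimates.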
AI-Powered Customer Churn Prediction Market Research Report 2033
dataintelo.com
csv, pdf, pptx
Updated Jun 28, 2025
Cite
Dataintelo (2025). AI-Powered Customer Churn Prediction Market Research Report 2033 [Dataset]. https://dataintelo.com/report/ai-powered-customer-churn-prediction-market
Explore at:
Available download formats: csv, pdf, pptx
Dataset updated
Jun 28, 2025
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
AI-Powered Customer Churn Prediction Market Outlook

According to our latest research, the AI-powered customer churn prediction market size reached USD 1.58 billion globally in 2024, with a robust CAGR of 19.7% expected from 2025 to 2033. Driven by rapid digital transformation and the increasing need for predictive analytics across sectors, the market is forecasted to attain a value of USD 7.57 billion by 2033. The growth of this market is primarily attributed to the escalating adoption of AI and machine learning technologies by enterprises seeking to reduce customer attrition, optimize retention strategies, and enhance overall customer lifetime value, as per the latest industry research.

One of the fundamental growth drivers for the AI-powered customer churn prediction market is the proliferation of customer data and the imperative need for businesses to leverage this data to drive actionable insights. With the advent of digital touchpoints, organizations are now able to collect vast amounts of structured and unstructured data from various customer interactions. This data, when processed using advanced AI and machine learning algorithms, empowers companies to predict potential churn with high accuracy. As a result, businesses across industries such as telecommunications, BFSI, retail, and healthcare are increasingly investing in AI-powered churn prediction solutions to proactively identify at-risk customers and implement targeted retention strategies, thereby reducing revenue loss and improving profitability.

Another significant factor fueling market expansion is the growing emphasis on customer experience and personalization. In today's hyper-competitive landscape, retaining existing customers has become more cost-effective than acquiring new ones. AI-powered churn prediction tools enable organizations to segment their customer base, understand behavior patterns, and tailor interventions for individual customers. This level of personalization not only helps in reducing churn rates but also enhances customer satisfaction and loyalty. The integration of AI-driven insights into CRM systems and marketing automation platforms further streamlines the process, making it easier for businesses to act on predictions in real time. Moreover, the rising adoption of cloud-based solutions has made these technologies more accessible to small and medium enterprises (SMEs), broadening the market’s reach.

The surge in demand for scalable, real-time analytics platforms is also contributing to market growth. Enterprises are increasingly seeking AI-powered solutions that can integrate seamlessly with their existing IT infrastructure, deliver instant insights, and scale as their data grows. The shift towards cloud deployment models has accelerated this trend, offering cost-effective, flexible, and easily deployable churn prediction solutions. Additionally, advancements in natural language processing (NLP), deep learning, and big data analytics are further enhancing the accuracy and reliability of churn prediction models. As organizations strive to stay ahead of the competition by minimizing customer attrition, the demand for sophisticated, AI-driven predictive analytics tools continues to rise.

Regionally, North America holds the largest market share, followed by Europe and Asia Pacific. The dominance of North America can be attributed to the early adoption of AI technologies, presence of major technology vendors, and a strong focus on customer-centric strategies among enterprises in the region. Europe is also witnessing significant growth, driven by stringent regulations around data protection and a growing emphasis on customer retention in industries like BFSI and retail. The Asia Pacific region is expected to exhibit the highest CAGR during the forecast period, fueled by rapid digitalization, increasing investments in AI, and the expansion of e-commerce and telecommunications sectors. Latin America and the Middle East & Africa are also experiencing gradual adoption, primarily in financial services and telecommunications.

Component Analysis

The component segment of the AI-powered customer churn prediction market is categorized into software and services. The software segment dominates the market, accounting for the largest share in 2024, owing to the widespread deployment of advanced AI and machine learning platforms
Customer Churn Software Report
marketresearchforecast.com
doc, pdf, ppt
Updated Mar 25, 2025
Market Research Forecast (2025). Customer Churn Software Report [Dataset]. https://www.marketresearchforecast.com/reports/customer-churn-software-56060
Available download formats: pdf, ppt, doc
Dataset updated
Mar 25, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Customer Churn Software market is experiencing robust growth, driven by the increasing need for businesses across diverse sectors to improve customer retention and enhance profitability. The market's expansion is fueled by several key factors. Firstly, the rising adoption of cloud-based solutions offers scalability and cost-effectiveness, attracting a wider range of businesses. Secondly, advancements in AI and machine learning are enabling more sophisticated churn prediction and proactive customer engagement strategies. The telecommunications, banking and finance, and retail and e-commerce sectors are currently leading the adoption, leveraging the software to identify at-risk customers and implement targeted retention programs. However, factors such as high implementation costs, integration challenges with existing systems, and the need for skilled personnel to manage the software can act as restraints on market growth. We project a substantial market expansion in the coming years, with a steady compound annual growth rate (CAGR) contributing to a significant increase in market value. The competitive landscape is dynamic, with established players like IBM, Salesforce, and Microsoft competing alongside specialized churn management solution providers. This competition fosters innovation and drives the development of more advanced features and functionalities. Looking ahead, the market will witness further consolidation through mergers and acquisitions, as larger companies seek to expand their market share. The increasing emphasis on data privacy and security regulations will also shape market dynamics, with vendors focusing on compliant solutions. The market is expected to witness the rise of niche solutions tailored to specific industry segments, providing customized functionalities. 
The geographic distribution of the market is expected to remain concentrated in North America and Europe initially, with significant growth potential in emerging markets like Asia Pacific and the Middle East & Africa, fueled by increasing digitalization and adoption of sophisticated business analytics. The continued evolution of AI and machine learning algorithms will be crucial in improving the accuracy and efficiency of churn prediction models, further enhancing the value proposition of Customer Churn Software. This convergence of technological advancement, regulatory compliance, and industry-specific needs will shape the future trajectory of the Customer Churn Software market.
Telecom Churn Prediction - 2
kaggle.com
Updated Nov 23, 2022
RahulRajML (2022). Telecom Churn Prediction - 2 [Dataset]. https://www.kaggle.com/datasets/rahulrajml/telecom-churncasestudy-clustering
Dataset updated
Nov 23, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
RahulRajML
Description
Dataset

This dataset was created by RahulRajML

Contents
Global Telecom Crm Market Size By Deployment Model, By Type of CRM Solution,...
verifiedmarketresearch.com
VERIFIED MARKET RESEARCH, Global Telecom Crm Market Size By Deployment Model, By Type of CRM Solution, By Telecom Operator Size, By Geographic Scope And Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/telecom-crm-market/
Dataset provided by
Verified Market Research (https://www.verifiedmarketresearch.com/)
Authors
VERIFIED MARKET RESEARCH
License
https://www.verifiedmarketresearch.com/privacy-policy/
Time period covered
2024 - 2031
Area covered
Global
Description
Telecom CRM Market size was valued at USD 7.4 Billion in 2024 and is projected to reach USD 25.1 Billion by 2031, growing at a CAGR of 10.1% during the forecast period 2024-2031.

Global Telecom CRM Market Drivers

The Telecom CRM Market is driven by several factors, including:

Customer Experience Focus: Increasing focus on enhancing customer experience in the telecom industry drives the adoption of CRM (Customer Relationship Management) solutions to manage customer interactions, improve service delivery, and personalize customer engagements.
Competitive Differentiation: Telecom companies use CRM systems to differentiate themselves in a competitive market by offering personalized services, targeted marketing campaigns, and efficient customer support.
Data Integration and Insights: CRM systems integrate customer data from multiple channels (e.g., mobile apps, websites, call centers) to provide telecom companies with actionable insights for better decision-making and service optimization.
Subscriber Retention: CRM solutions help telecom operators in subscriber retention efforts by analyzing customer behavior, preferences, and churn prediction models to proactively address customer needs and reduce attrition.
Operational Efficiency: Automation of sales, marketing, and customer service processes through CRM systems improves operational efficiency, reduces manual errors, and streamlines workflows in telecom organizations.
Cross-Selling and Up-Selling: CRM platforms enable telecom companies to identify cross-selling and up-selling opportunities by analyzing customer buying patterns and preferences, thereby increasing revenue streams.
Regulatory Compliance: CRM systems help telecom operators comply with regulatory requirements related to customer data protection, privacy laws, and telecommunications regulations.
Digital Transformation: As telecom companies undergo digital transformation, CRM solutions facilitate seamless integration with digital channels and enable omni-channel customer engagement strategies.
Predictive Analytics: Adoption of predictive analytics capabilities within CRM systems allows telecom operators to forecast customer behavior, anticipate market trends, and optimize marketing campaigns.
Cloud Adoption: Increasing adoption of cloud-based CRM solutions offers scalability, flexibility, and cost-efficiency benefits to telecom companies, facilitating rapid deployment and accessibility across geographies.
The number of correct predictions in each class.
plos.figshare.com
xls
Updated Jun 6, 2024
Kaveh Faraji Googerdchi; Shahrokh Asadi; Seyed Mohammadbagher Jafari (2024). The number of correct predictions in each class. [Dataset]. http://doi.org/10.1371/journal.pone.0303881.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0303881.t004
Dataset updated
Jun 6, 2024
Dataset provided by
PLOS ONE
Authors
Kaveh Faraji Googerdchi; Shahrokh Asadi; Seyed Mohammadbagher Jafari
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Customer churn prediction is vital for organizations to mitigate costs and foster growth. Ensemble learning models are commonly used for churn prediction. Diversity and prediction performance are two essential principles for constructing ensemble classifiers. Therefore, developing accurate ensemble learning models consisting of diverse base classifiers is a considerable challenge in this area. In this study, we propose two multi-objective evolutionary ensemble learning models based on clustering (MOEECs), which include a novel diversity measure. Also, to overcome the data imbalance problem, another objective function is presented in the second model to evaluate ensemble performance. The proposed models in this paper are evaluated with a dataset collected from a mobile operator database. Our first model, MOEEC-1, achieves an accuracy of 97.30% and an AUC of 93.76%, outperforming classical classifiers and other ensemble models. Similarly, MOEEC-2 attains an accuracy of 96.35% and an AUC of 94.89%, showcasing its effectiveness in churn prediction. Furthermore, comparison with previous churn models reveals that MOEEC-1 and MOEEC-2 exhibit superior performance in accuracy, precision, and F-score. Overall, our proposed MOEECs demonstrate significant advancements in churn prediction accuracy and outperform existing models in terms of key performance metrics. These findings underscore the efficacy of our approach in addressing the challenges of customer churn prediction and its potential for practical application in organizational decision-making.
Summary of the datasets used in this study.
figshare.com
plos.figshare.com
xls
Updated Jun 21, 2023
Joydeb Kumar Sana; Mohammad Zoynul Abedin; M. Sohel Rahman; M. Saifur Rahman (2023). Summary of the datasets used in this study. [Dataset]. http://doi.org/10.1371/journal.pone.0278095.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0278095.t002
Dataset updated
Jun 21, 2023
Dataset provided by
PLOS ONE
Authors
Joydeb Kumar Sana; Mohammad Zoynul Abedin; M. Sohel Rahman; M. Saifur Rahman
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Summary of the datasets used in this study.
Big Data Analytics In Telecom Market Report | Global Forecast From 2025 To...
dataintelo.com
csv, pdf, pptx
Updated Oct 16, 2024
Dataintelo (2024). Big Data Analytics In Telecom Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/big-data-analytics-in-telecom-market
Available download formats: csv, pptx, pdf
Dataset updated
Oct 16, 2024
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Big Data Analytics In Telecom Market Outlook

The global market size for Big Data Analytics in the Telecom sector was valued at approximately USD 10 billion in 2023 and is projected to reach around USD 50 billion by 2032, exhibiting a robust CAGR of 20% during the forecast period. This impressive growth trajectory is fueled by the increasing demand for advanced analytics to optimize operations, enhance customer experience, and improve network management. The telecom sector's continuous expansion and the proliferation of connected devices are also significant contributors to this market's rapid growth.
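The headline figures can be cross-checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The following is a minimal Python sketch of that arithmetic; the function name `implied_cagr` and the choice of nine compounding years (2023 to 2032) are assumptions made for this illustration and are not part of the report.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate implied by two endpoint values."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Approximate figures quoted above: USD 10 billion (2023) to USD 50 billion (2032).
# Treating 2023 -> 2032 as nine compounding years is an assumption.
rate = implied_cagr(10.0, 50.0, 2032 - 2023)
print(f"Implied CAGR: {rate:.1%}")
```

With these rounded endpoints the result comes out a little under 20%, in line with the approximate rate quoted in the report.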

One of the primary growth factors for this market is the exponential increase in data generation. With the advent of 5G technology, the volume of data transmitted over networks has surged, necessitating sophisticated analytics to manage and utilize this data effectively. Telecom companies are increasingly relying on big data analytics to derive actionable insights from vast datasets, which can lead to improved decision-making and strategic planning. Moreover, the integration of IoT devices and services has further amplified data traffic, making analytics indispensable for telecom operators.

Another crucial driver is the need for enhanced customer experience. Telecom operators are leveraging big data analytics to gain deeper insights into customer behavior, preferences, and pain points. This data-driven approach allows for personalized marketing strategies, better customer service, and reduced churn rates. By analyzing customer data, telecom companies can identify trends and patterns that help in developing targeted campaigns and offers, thereby increasing customer loyalty and satisfaction.

Operational efficiency is also a significant factor propelling the growth of big data analytics in the telecom market. Telecom operators are under constant pressure to improve their network performance and reduce operational costs. Big data analytics enables real-time monitoring and predictive maintenance of network infrastructure, leading to fewer outages and improved service quality. Additionally, analytics helps in optimizing resource allocation and enhancing the overall efficiency of telecom operations.

Regionally, North America holds a substantial share of the big data analytics in telecom market, driven by the presence of leading telecom companies and advanced technology infrastructure. Additionally, the Asia Pacific region is expected to witness the fastest growth rate due to the rapid digital transformation and increasing adoption of advanced analytics solutions in emerging economies like China and India. European countries are also making significant investments in big data analytics to enhance their telecom services, contributing to the market's growth.

Component Analysis

In the context of components, the Big Data Analytics in Telecom market is segmented into software, hardware, and services. The software segment is anticipated to dominate the market, as telecom operators increasingly invest in advanced analytics platforms and tools. The software solutions facilitate the processing and analysis of large datasets, enabling telecom companies to gain valuable insights and improve decision-making processes. Moreover, the software segment includes various sub-categories such as data management, data mining, and predictive analytics, each contributing significantly to market growth.

The hardware segment, although smaller compared to software, plays a critical role in the overall ecosystem. This segment includes servers, storage systems, and other hardware components necessary for data processing and storage. As data volumes continue to grow, the demand for robust and scalable hardware solutions is also on the rise. Telecom companies are investing in high-performance hardware to ensure seamless data management and analytics capabilities. The hardware segment is essential for supporting the infrastructure needed for big data analytics.

On the services front, the market is witnessing substantial growth due to the increasing need for consulting, integration, and maintenance services. Telecom operators often require expert guidance and support to implement and manage big data analytics solutions effectively. Service providers offer a range of services, including system integration, data migration, and ongoing support, which are crucial for the success

vikram amin (2023). Customer Churn - Decision Tree & Random Forest [Dataset]. https://www.kaggle.com/datasets/vikramamin/customer-churn-decision-tree-and-random-forest

Customer Churn - Decision Tree & Random Forest

Predicting the Customer Churn for a Telecom Company

Dataset updated

Jul 6, 2023

Dataset provided by

Kaggle

Authors

vikram amin

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Main objective: Identify which customers will churn and which will not.
Methodology: This is a classification problem. We use a decision tree and a random forest to predict the outcome.
Steps Involved
1. Read the data
2. Check the data types
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F1ffb600d8a4b4b36bc25e957524a3524%2FPicture1.png?generation=1688638600831386&alt=media

3. Convert character vectors to factors, as this is a classification problem
4. Drop the variable that is not significant for the analysis. We drop "customerID".
5. Check for missing values. None are found.
6. Split the data into train and test sets, so that the train data is used to build the model and the test data for prediction. We use an 80-20 (train/test) split made with the sample function.
7. Install and load the libraries (rpart, rpart.plot, rattle, RColorBrewer, caret)
8. Run the decision tree using the rpart function. The dependent variable is Churn; the other 19 variables are independent variables.
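The 80-20 split in the steps above is done in R with the sample function. As a language-neutral illustration of the same idea, here is a minimal Python sketch that samples row indices without replacement; the function name `train_test_split_indices`, the seed, and the row count are all hypothetical and chosen only for this example.

```python
import random


def train_test_split_indices(n_rows: int, train_fraction: float = 0.8, seed: int = 42):
    """Randomly partition row indices 0..n_rows-1 into train and test index lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n_train = int(n_rows * train_fraction)
    # Sample the training rows without replacement; the rest become the test set.
    train_idx = sorted(rng.sample(range(n_rows), n_train))
    train_set = set(train_idx)
    test_idx = [i for i in range(n_rows) if i not in train_set]
    return train_idx, test_idx


# Hypothetical row count, used only to show the proportions.
train_idx, test_idx = train_test_split_indices(n_rows=1000)
print(len(train_idx), len(test_idx))
```

Sampling indices rather than rows keeps the two sets disjoint by construction, which is the property the later accuracy figures rely on.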

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F8d3442e6c82d8026c6a448e4780ab38c%2FPicture2.png?generation=1688638685268853&alt=media

9. Plot the decision tree

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F9ab0591e323dc30fe116c79f6d014d06%2FPicture3.png?generation=1688638747644320&alt=media

Average customer churn is 27%. Churn can occur when tenure is >= 7.5 and there is no internet service.

Tuning the model
Define the search grid using the expand.grid function
Set up the control parameters for 5-fold cross-validation
Printing the model gives the best cp = 0.01 and an accuracy of 79.00%
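The tuning step pairs a grid of candidate parameter values with 5-fold cross-validation and keeps the value with the best mean validation accuracy. The sketch below shows that procedure in Python. It deliberately substitutes a toy one-dimensional nearest-neighbour rule for rpart, so the tuned parameter here is the neighbour count `k`, not the tree's cp; the data and all function names are invented for illustration.

```python
import random
from statistics import mean


def k_fold_indices(n_rows, n_folds=5, seed=0):
    """Shuffle row indices and deal them into n_folds validation folds."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def knn_predict(train, x, k):
    """Majority label among the k training points closest to x (toy model)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def grid_search_cv(data, grid, n_folds=5):
    """Return the grid value with the highest mean held-out accuracy."""
    folds = k_fold_indices(len(data), n_folds)
    results = {}
    for k in grid:
        fold_scores = []
        for held_out in folds:
            held = set(held_out)
            # Fit on everything except the held-out fold, score on the fold.
            train = [row for i, row in enumerate(data) if i not in held]
            hits = sum(knn_predict(train, data[i][0], k) == data[i][1] for i in held_out)
            fold_scores.append(hits / len(held_out))
        results[k] = mean(fold_scores)
    best = max(results, key=results.get)
    return best, results


# Toy (feature, label) pairs, invented for illustration only.
toy = [(x, 1 if x >= 10 else 0) for x in range(20)]
best_k, scores = grid_search_cv(toy, grid=[1, 3, 5])
print(best_k, scores)
```

Each candidate is scored only on rows it was not fitted on, which is why the cross-validated accuracy is a fairer basis for choosing a parameter than accuracy on the training data.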

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F16080ac04d3743ec238227e1ef2c8269%2FPicture4.png?generation=1688639197455166&alt=media

Predict with the model
Find out which variables are most and least significant.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F61beb4224e9351cfc772147c43800502%2FPicture5.png?generation=1688639468638950&alt=media

The most significant variables are Internet Service and Tenure; the least significant are Streaming Movies and Tech Support.

USE RANDOM FOREST

Run library(randomForest). Here we use the default ntree (500) and the default mtry, which for classification is floor(sqrt(p)), where p is the number of independent variables (4 when p = 19).
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2Fc27fe7e83f0b53b7e067371b69c7f4a7%2FPicture6.png?generation=1688640478682685&alt=media

The confusion matrix gives an accuracy of 79.27%. This is marginally higher than the decision tree's 79.00%. The error rate is pretty low when predicting "No" and much higher when predicting "Yes".
Plot the model to show which variables reduce the Gini impurity the most and the least. Total charges and tenure reduce the Gini impurity the most, while phone service has the least impact.
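For readers unfamiliar with the measure behind this plot: the Gini impurity of a node is 1 minus the sum of squared class proportions, and a split is credited with the drop from the parent's impurity to the size-weighted impurity of its children. The Python sketch below computes both quantities for a single split on toy labels; the function names and the data are assumptions for illustration. A random forest's importance score aggregates such decreases over every split on a variable across all trees, which this sketch does not attempt.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class proportions in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity reduction obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted


# Toy labels, invented for illustration.
parent = ["Yes"] * 4 + ["No"] * 4
print(gini_impurity(parent))                            # 0.5 for a 50/50 node
print(gini_decrease(parent, ["Yes"] * 4, ["No"] * 4))   # 0.5: a perfect split removes all impurity
```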

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2Fec25fc3ba74ab9cef1a81188209512b1%2FPicture7.png?generation=1688640726235724&alt=media

Predict with the model and create a new data frame showing the actual vs. predicted values

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F50aa40e5dd676c8285020fd2fe627bf1%2FPicture8.png?generation=1688640896763066&alt=media
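Once actual and predicted labels sit side by side, overall accuracy and per-class error rates follow directly. The Python sketch below is a minimal illustration on toy label vectors; the function names are hypothetical, and it assumes per-class error is measured over rows whose actual class is the given label, which is one common reading of a confusion matrix.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count each (actual, predicted) pair, i.e. the cells of a confusion matrix."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Share of rows where the prediction matches the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def class_error(actual, predicted, label):
    """Share of rows with actual class `label` that were predicted as another class."""
    rows = [(a, p) for a, p in zip(actual, predicted) if a == label]
    return sum(a != p for a, p in rows) / len(rows)


# Toy vectors, invented for illustration; they are not the dataset's results.
actual = ["No", "No", "No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes"]
predicted = ["No", "No", "No", "No", "No", "Yes", "Yes", "Yes", "No", "No"]

print(confusion_counts(actual, predicted))
print(accuracy(actual, predicted))
print(class_error(actual, predicted, "No"), class_error(actual, predicted, "Yes"))
```

In the toy vectors the minority class has the higher error rate even though overall accuracy looks reasonable, which is why per-class error is worth reporting alongside accuracy on imbalanced churn data.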

Plot the model to find where the OOB (out-of-bag) error stops decreasing and becomes constant. The error stops decreasing between 100 and 200 trees, so we take ntree = 200 when we tune the model.

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F87211e1b218c595911fbe6ea2806e27a%2FPicture9.png?generation=1688641103367564&alt=media

Tune the model. mtry = 2 has the lowest OOB error rate.

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F6057af5bb0719b16f1a97a58c3d4aa1d%2FPicture10.png?generation=1688641391027971&alt=media

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2Fc7045eba4ee298c58f1bd0230c24c00d%2FPicture11.png?generation=1688641605829830&alt=media

Use random forest with mtry = 2 and ntree = 200

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F01541eff1f9c6303591aa50dd707b5f5%2FPicture12.png?generation=1688641634979403&alt=media

The confusion matrix gives an accuracy of 79.71%. This is marginally higher than the default model (ntree = 500, mtry = 4) at 79.27% and the decision tree at 79.00%. The error rate is pretty low when predicting "No" and m...
