Dataset Card for Telco Customer Churn
This dataset contains information about customers of a fictional telecommunications company, including demographic information, services subscribed to, location details, and churn behavior. This merged dataset combines the information from the original Telco Customer Churn dataset with additional details.
Dataset Details
Dataset Description
This merged Telco Customer Churn dataset provides a comprehensive view of customer… See the full description on the dataset page: https://huggingface.co/datasets/aai510-group1/telco-customer-churn.
https://creativecommons.org/publicdomain/zero/1.0/
9. Plot the decision tree
Average customer churn is 27%. According to the tree, churn can occur when tenure is >= 7.5 and there is no internet service.
The most significant variables are Internet Service and Tenure; the least significant are Streaming Movies and Tech Support.
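The code behind these decision tree results is not included here. As a rough Python analogue, the following minimal sketch fits a shallow tree with scikit-learn and ranks variables by importance. The table is a synthetic stand-in: the column names, the rule that generates the label, and the tree depth are all assumptions made for illustration, so the numbers will not match the figures quoted above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (hypothetical columns; NOT the real Telco dataset).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "tenure": rng.integers(0, 73, n),
    "internet_service": rng.choice(["DSL", "Fiber optic", "No"], n),
    "tech_support": rng.choice(["Yes", "No"], n),
})
# Made-up rule so the label depends on the features.
p_churn = 0.1 + 0.4 * (df["tenure"] < 8) + 0.2 * (df["internet_service"] == "Fiber optic")
df["churn"] = rng.random(n) < p_churn

X = pd.get_dummies(df.drop(columns="churn"))  # one-hot encode the categorical columns
y = df["churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# A shallow tree keeps the plot readable; max_depth=3 is an arbitrary choice.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
tree_accuracy = tree.score(X_test, y_test)
tree_importance = pd.Series(tree.feature_importances_, index=X.columns).sort_values(
    ascending=False
)

print(f"test accuracy: {tree_accuracy:.3f}")
print(tree_importance)
```

The fitted tree can be drawn with sklearn.tree.plot_tree(tree, feature_names=list(X.columns)) if matplotlib is available.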
Run library(randomForest). Here we use the default ntree (500) and the default mtry, which for classification is floor(sqrt(p)), where p is the number of independent variables (p/3 is the default for regression).
According to the confusion matrix, accuracy is 79.27%, marginally higher than the decision tree's 79.00%. The error rate is low when predicting "No" and much higher when predicting "Yes".
Plot the model to show which variables reduce the Gini impurity the most and the least. Total charges and tenure reduce the Gini impurity the most, while phone service has the least impact.
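The random forest step can be sketched in Python as follows. This is an analogue built with scikit-learn on synthetic data, not the R code that produced the screenshots; the sample size, the number of features, and the class balance (about 27% positives) are assumptions chosen only to resemble the setting described above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in (about 27% positives); NOT the real Telco data.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=4, weights=[0.73, 0.27], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# 500 trees; max_features="sqrt" tries about sqrt(p) features at each split.
rf = RandomForestClassifier(n_estimators=500, max_features="sqrt", random_state=0)
rf.fit(X_train, y_train)

cm = confusion_matrix(y_test, rf.predict(X_test))  # rows: actual class, columns: predicted
accuracy = np.trace(cm) / cm.sum()
class_error = 1 - np.diag(cm) / cm.sum(axis=1)     # error rate among actual members of each class
gini_importance = rf.feature_importances_           # mean decrease in impurity, normalized to sum to 1

print(cm)
print(f"accuracy: {accuracy:.3f}")
print("per-class error:", class_error.round(3))
print("features ranked by importance:", np.argsort(gini_importance)[::-1])
```

On imbalanced data like this, the per-class error for the minority class is typically the larger of the two, which matches the "No" versus "Yes" pattern reported above.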
Tune the model: mtry = 2 has the lowest OOB error rate.
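The tuning code is not shown in the source. One way to compare candidate values by out-of-bag (OOB) error, sketched here in Python with scikit-learn, is to refit the forest for each value and read the OOB score. The synthetic data and the candidate range 1 to 6 are assumptions; max_features plays the role of mtry.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; NOT the real Telco dataset.
X, y = make_classification(n_samples=1500, n_features=12, n_informative=4, random_state=1)

oob_error = {}
for m in range(1, 7):  # candidate numbers of features tried at each split
    rf = RandomForestClassifier(
        n_estimators=200, max_features=m, oob_score=True, bootstrap=True, random_state=1
    )
    rf.fit(X, y)
    oob_error[m] = 1 - rf.oob_score_  # oob_score_ is OOB accuracy, so this is OOB error

best_m = min(oob_error, key=oob_error.get)
for m, err in oob_error.items():
    print(f"max_features={m}: OOB error {err:.4f}")
print("lowest OOB error at max_features =", best_m)
```

Because OOB samples are the rows left out of each tree's bootstrap sample, this comparison needs no separate validation split.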
Use random forest with mtry = 2 and ntree = 200
According to the confusion matrix, accuracy is 79.71%, marginally higher than the default model (ntree = 500, mtry = 4) at 79.27% and the decision tree at 79.00%. The error rate is pretty low when predicting "No" and m...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Nowadays
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset originates from the research domain of Customer Churn Prediction in the Telecom Industry. It was created as part of the project "Data-Driven Churn Prediction: ML Solutions for the Telecom Industry," completed within the Data Stewardship course (Master programme Data Science, TU Wien).
The primary purpose of this dataset is to support machine learning model development for predicting customer churn based on customer demographics, service usage, and account information.
The dataset enables the training, testing, and evaluation of classification algorithms, allowing researchers and practitioners to explore techniques for customer retention optimization.
The dataset was originally obtained from the IBM Accelerator Catalog and adapted for academic use. It was uploaded to TU Wien’s DBRepo test system and accessed via SQLAlchemy connections to the MariaDB environment.
The dataset has a tabular structure and was initially stored in CSV format. It contains:
Rows: 7,043 customer records
Columns: 21 features including customer attributes (gender, senior citizen status, partner status), account information (tenure, contract type, payment method), service usage (internet service, streaming TV, tech support), and the target variable (Churn: Yes/No).
Naming Convention:
The table in the database is named telco_customer_churn_data.
Software Requirements:
To open and work with the dataset, any standard database client or programming language supporting MariaDB connections can be used (e.g., Python).
For machine learning applications, libraries such as pandas, scikit-learn, and joblib are typically used.
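A minimal sketch of reading the table through SQLAlchemy into pandas is shown below. To keep it runnable without a database server, an in-memory SQLite engine stands in for the MariaDB instance, and the three columns and two rows are made-up placeholders. Against a real server, the engine URL would instead look something like mariadb+pymysql://USER:PASSWORD@HOST:PORT/DBNAME, where every part is a placeholder and the pymysql driver is an assumption.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Assumption: in-memory SQLite stands in for MariaDB so the sketch runs anywhere.
engine = create_engine("sqlite://")

# Create and fill a tiny demo table (illustrative columns and rows only).
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE telco_customer_churn_data "
        "(customerID TEXT, tenure INTEGER, Churn TEXT)"
    ))
    conn.execute(
        text("INSERT INTO telco_customer_churn_data VALUES (:customerID, :tenure, :Churn)"),
        [
            {"customerID": "demo-1", "tenure": 1, "Churn": "Yes"},
            {"customerID": "demo-2", "tenure": 34, "Churn": "No"},
        ],
    )

# Read the table back and convert the rows to a DataFrame.
with engine.connect() as conn:
    rows = conn.execute(text("SELECT * FROM telco_customer_churn_data")).mappings().all()

df = pd.DataFrame([dict(r) for r in rows])
print(df)
```

Only the engine URL needs to change to point the same query at a different backend; the SELECT and the DataFrame conversion stay the same.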
Additional Resources:
Source code for data loading, preprocessing, model training, and evaluation is available at the associated GitHub repository: https://github.com/nazerum/fair-ml-customer-churn
When reusing the dataset, users should be aware:
Licensing: The dataset is shared under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Use Case Suitability: The dataset is best suited for classification tasks, particularly binary classification (churn vs. no churn).
Metadata Standards: Metadata describing the dataset adheres to FAIR principles and is supplemented by CodeMeta and Croissant standards for improved interoperability.
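As an illustration of the binary classification use case, the sketch below encodes the target as 1 (churn) or 0 (no churn), fits a small scikit-learn pipeline, and saves it with joblib. The eight rows are made up, and logistic regression is an arbitrary choice for the example; neither comes from the associated repository.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny made-up sample (hypothetical values), using two of the described feature types.
df = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 8, 22, 10, 62],
    "Contract": ["Month-to-month", "One year", "Month-to-month", "One year",
                 "Month-to-month", "Two year", "Month-to-month", "Two year"],
    "Churn": ["Yes", "No", "Yes", "No", "Yes", "No", "No", "No"],
})
X = df[["tenure", "Contract"]]
y = (df["Churn"] == "Yes").astype(int)  # binary target: 1 = churn, 0 = no churn

# Scale the numeric column and one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["tenure"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Contract"]),
])
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())]).fit(X, y)

# Persist the fitted pipeline with joblib and load it back.
path = os.path.join(tempfile.mkdtemp(), "churn_model.joblib")
joblib.dump(model, path)
reloaded = joblib.load(path)
predictions = reloaded.predict(X)
print(predictions)
```

Keeping preprocessing inside the pipeline means the saved file carries the encoder and scaler along with the classifier, so new rows can be scored without repeating those steps by hand.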
Although the results were close, the industry in the United States where customers were most likely to leave their current provider due to poor customer service appears to be cable television, with a 25 percent churn rate in 2020.
Churn rate
Churn rate, sometimes also called attrition rate, is the percentage of customers that stop using a service within a given time period. It is often used to measure businesses that have a contractual customer base, especially subscriber-based service models.
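The definition can be written as a short calculation. This sketch assumes the rate is measured against the number of customers at the start of the period, which is one common convention, and the customer counts are hypothetical.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Return the percentage of customers lost during a period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return 100.0 * customers_lost / customers_at_start


# Hypothetical example: 270 of 1,000 customers leave during the period.
rate = churn_rate(customers_at_start=1000, customers_lost=270)
print(f"{rate:.1f}%")
```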
With the rapid development of the telecommunication industry, service providers are increasingly inclined towards expanding their subscriber base. To survive in a competitive environment, retaining existing customers has become a huge challenge. The cost of acquiring a new customer is said to be far higher than the cost of retaining an existing one. Therefore, it is imperative for telecom companies to use advanced analytics to understand consumer behavior and in turn predict whether or not customers will leave the company.
This data set contains customer level information for a telecom company. Various attributes related to the services used are recorded for each customer.
Some possible insights could be:
1. What variables are contributing to customer churn?
2. Who are the customers more likely to churn?
3. What actions can be taken to stop them from leaving?
T-Mobile reported a prepaid customer churn rate of 2.75 percent in the United States in the first quarter of 2024. This was a decrease in comparison to the last two quarters of 2023. The company's prepaid churn rate has fallen over recent years, having peaked at over five percent in the final quarter of 2014.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Gaurang Swarge
Released under CC0: Public Domain
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘JB Link Telco Customer Churn’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/johnflag/jb-link-telco-customer-churn on 28 January 2022.
--- Dataset description provided by original source is as follows ---
This is a customized version of the widely known IBM Telco Customer Churn dataset. I've added a few more columns and modified others in order to make it a little more realistic.
My customizations are based on the following version: Telco customer churn (11.1.3+)
Below you may find a fictional business problem I created. You may use it in order to start developing something around this dataset.
JB Link is a small telecom company located in the state of California that provides phone and internet services to customers in more than 1,000 cities and 1,600 zip codes.
The company has been in the market for just 6 years and has grown quickly by investing in infrastructure to bring internet and phone networks to regions that had poor or no coverage.
The company also has a very skilled sales team that consistently performs well at attracting new customers. The number of new customers acquired in the past quarter represents 15% of the total.
However, by the end of this same period, only 43% of these customers stayed with the company, and most of them decided not to renew their contracts after a few months, meaning the customer churn rate is very high and the company now faces a big challenge in retaining its customers.
The total customer churn rate last quarter was around 27%, resulting in a decrease of almost 12% in the total number of customers.
The executive leadership of JB Link is aware that some competitors are investing in new technologies and in the expansion of their network coverage, and they believe this is one of the main drivers of the high customer churn rate.
Therefore, as an action plan, they have decided to create a task force inside the company that will be responsible for working on a customer retention strategy.
The task force will involve members from different areas of the company, including Sales, Finance, Marketing, Customer Service, Tech Support and a recently formed Data Science team.
The data science team will play a key role in this process and was assigned some very important tasks that will support the decisions and actions the other teams will be taking:
- Gather insights from the data to understand what is driving the high customer churn rate.
- Develop a Machine Learning model that can accurately predict the customers that are more likely to churn.
- Prescribe customized actions that could be taken in order to retain each of those customers.
The Data Science team was given a dataset with a random sample of 7,043 customers that can help in achieving this task.
The executives are aware that the cost of acquiring a new customer can be up to five times higher than the cost of retaining a customer, so they expect the results of this project to save the company a lot of money and make it start growing again.
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analyzing customers’ characteristics and giving early warning of customer churn with machine learning algorithms can help enterprises provide targeted marketing strategies and personalized services, and save substantial operating costs. Data cleaning, oversampling, data standardization and other preprocessing operations were performed in Python on a 900,000-record data set of telecom customer personal characteristics and historical behavior. Appropriate model parameters were selected to build a BPNN (Back Propagation Neural Network). Random Forest (RF) and Adaboost, two classic ensemble learning models, were introduced, and an Adaboost dual-ensemble learning model with RF as the base learner was proposed. These four models and four other classical machine learning models, namely decision tree, naive Bayes, K-Nearest Neighbor (KNN) and Support Vector Machine (SVM), were each used to analyze the customer churn data. The results show that the first four models perform better in terms of recall, precision, F1 score and other indicators, and that the RF-Adaboost dual-ensemble model performs best. The recall rates of BPNN, RF, Adaboost and the RF-Adaboost dual-ensemble model on positive samples are 79%, 90%, 89% and 93% respectively, the precision rates are 97%, 99%, 98% and 99%, and the F1 scores are 87%, 95%, 94% and 96%. For the RF-Adaboost dual-ensemble model, the three indicators are 10%, 1%, and 6% higher than the reference. The prediction results of customer churn provide strong data support for telecom companies to adopt appropriate retention strategies for pre-churn customers and reduce customer churn.
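The idea of boosting with a random forest as the base learner can be sketched with scikit-learn as below. This is only an illustration of the general technique on synthetic data; the authors' preprocessing, parameters and data are not given here, and the forest size, depth and number of boosting rounds are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; NOT the 900,000-record data set described above.
X, y = make_classification(n_samples=1500, n_features=15, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Small, shallow forests act as the base learner. The estimator is passed positionally
# because its keyword name differs between scikit-learn versions.
base_forest = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=0)
dual = AdaBoostClassifier(base_forest, n_estimators=10, random_state=0)
dual.fit(X_train, y_train)

y_pred = dual.predict(X_test)
scores = {
    "recall": recall_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
print(scores)
```

AdaBoost reweights the training rows after each round, so the base learner must accept sample weights; RandomForestClassifier does, which is what makes this combination possible.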
https://dataintelo.com/privacy-and-policy
The global market size for Big Data Analytics in the Telecom sector was valued at approximately USD 10 billion in 2023 and is projected to reach around USD 50 billion by 2032, exhibiting a robust CAGR of 20% during the forecast period. This impressive growth trajectory is fueled by the increasing demand for advanced analytics to optimize operations, enhance customer experience, and improve network management. The telecom sector's continuous expansion and the proliferation of connected devices are also significant contributors to this market's rapid growth.
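The growth rate implied by the two endpoint values can be checked with the standard compound annual growth rate formula. This sketch assumes nine compounding years between 2023 and 2032 and uses the rounded figures from the paragraph above; the result comes out close to the quoted 20%.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.2 means 20% per year)."""
    return (end_value / start_value) ** (1 / years) - 1


# USD 10 billion in 2023 growing to USD 50 billion by 2032 (rounded figures from the text).
implied = cagr(start_value=10.0, end_value=50.0, years=2032 - 2023)
print(f"implied CAGR: {implied:.1%}")
```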
One of the primary growth factors for this market is the exponential increase in data generation. With the advent of 5G technology, the volume of data transmitted over networks has surged, necessitating sophisticated analytics to manage and utilize this data effectively. Telecom companies are increasingly relying on big data analytics to derive actionable insights from vast datasets, which can lead to improved decision-making and strategic planning. Moreover, the integration of IoT devices and services has further amplified data traffic, making analytics indispensable for telecom operators.
Another crucial driver is the need for enhanced customer experience. Telecom operators are leveraging big data analytics to gain deeper insights into customer behavior, preferences, and pain points. This data-driven approach allows for personalized marketing strategies, better customer service, and reduced churn rates. By analyzing customer data, telecom companies can identify trends and patterns that help in developing targeted campaigns and offers, thereby increasing customer loyalty and satisfaction.
Operational efficiency is also a significant factor propelling the growth of big data analytics in the telecom market. Telecom operators are under constant pressure to improve their network performance and reduce operational costs. Big data analytics enables real-time monitoring and predictive maintenance of network infrastructure, leading to fewer outages and improved service quality. Additionally, analytics helps in optimizing resource allocation and enhancing the overall efficiency of telecom operations.
Regionally, North America holds a substantial share of the big data analytics in telecom market, driven by the presence of leading telecom companies and advanced technology infrastructure. Additionally, the Asia Pacific region is expected to witness the fastest growth rate due to the rapid digital transformation and increasing adoption of advanced analytics solutions in emerging economies like China and India. European countries are also making significant investments in big data analytics to enhance their telecom services, contributing to the market's growth.
In the context of components, the Big Data Analytics in Telecom market is segmented into software, hardware, and services. The software segment is anticipated to dominate the market, as telecom operators increasingly invest in advanced analytics platforms and tools. The software solutions facilitate the processing and analysis of large datasets, enabling telecom companies to gain valuable insights and improve decision-making processes. Moreover, the software segment includes various sub-categories such as data management, data mining, and predictive analytics, each contributing significantly to market growth.
The hardware segment, although smaller compared to software, plays a critical role in the overall ecosystem. This segment includes servers, storage systems, and other hardware components necessary for data processing and storage. As data volumes continue to grow, the demand for robust and scalable hardware solutions is also on the rise. Telecom companies are investing in high-performance hardware to ensure seamless data management and analytics capabilities. The hardware segment is essential for supporting the infrastructure needed for big data analytics.
On the services front, the market is witnessing substantial growth due to the increasing need for consulting, integration, and maintenance services. Telecom operators often require expert guidance and support to implement and manage big data analytics solutions effectively. Service providers offer a range of services, including system integration, data migration, and ongoing support, which are crucial for the success
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Client churn rate in Telecom sector’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sagnikpatra/edadata on 13 February 2022.
--- Dataset description provided by original source is as follows ---
Context: "Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs."
Content: The Orange Telecom's Churn Dataset, which consists of cleaned customer activity data (features), along with a churn label specifying whether a customer canceled the subscription, will be used to develop predictive models. Two datasets are made available here: The churn-80 and churn-20 datasets can be downloaded.
The two sets are from the same batch, but have been split by an 80/20 ratio. As more data is often desirable for developing ML models, let's use the larger set (that is, churn-80) for training and cross-validation purposes, and the smaller set (that is, churn-20) for final testing and model performance evaluation.
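The workflow described here, cross-validating on the larger part and keeping the smaller part for a final check, can be sketched as follows. The source provides the two parts as separate files; in this sketch the 80/20 split is recreated on synthetic data, and logistic regression with five folds is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data; NOT the Orange Telecom churn files.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# 80% for training and cross-validation, 20% held out for the final test.
X_80, X_20, y_80, y_20 = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000)
cv_scores = cross_val_score(model, X_80, y_80, cv=5)  # 5-fold CV on the 80% part only

model.fit(X_80, y_80)                     # refit on the full 80% part
test_accuracy = model.score(X_20, y_20)   # single final evaluation on the 20% part

print("cross-validation accuracy per fold:", cv_scores.round(3))
print(f"held-out test accuracy: {test_accuracy:.3f}")
```

Touching the 20% part only once, after model choices are fixed, keeps the final accuracy an honest estimate.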
Inspiration: To explore these types of models and learn more about the subject.
--- Original source retains full ownership of the source dataset ---
In the first quarter of Vodafone's financial year 2024/2025, the firm's total churn rate in Germany was ****, the lowest of its European markets. African countries had the highest churn rate at **** percent, while the United Kingdom reported the highest churn rate within Europe, with **** percent. This figure was driven by exceptionally high prepaid churn in the UK.
This dataset was created by Shiyamaladevi R S
This dataset contains customer demographics, service usage patterns, and the target variable 'Churn' for predicting customer churn in a telecom company.
This graph displays the average monthly churn rate for top wireless carriers in the United States from the first quarter of 2013 to the third quarter of 2018. The average monthly churn rate of Verizon Wireless was at 1.22 percent in the third quarter of 2018.
Churn rates of wireless carriers - additional information
The average monthly churn rate of wireless carriers refers to the average percentage of subscribers that cease to use the company’s services per month. The churn rate is used as an indicator of the health and loyalty of a company’s subscriber base and the lower the churn rate, the better the outlook is for the company. Verizon Wireless was the company with the lowest churn rate in the U.S. from 2013 to 2016. This success can be seen in the company’s revenue, with wireless services earning Verizon almost 90 billion U.S. dollars in 2016 alone.
AT&T’s churn rate in the fourth quarter of 2016 stood at 1.71 percent, the third lowest of all the wireless carriers in the U.S. The Texas-based company’s churn rate has remained relatively stable in recent years, although it has risen slightly since it was at its lowest of 1.31 percent in 2010 and 2015. The number of wireless subscribers of AT&T has nevertheless continued to grow, with the 146.8 million customers in 2016 marking the company’s highest ever total to date. Of these wireless subscribers 77.8 million held a postpaid subscription in comparison to just 13.5 million who were prepaid subscribers.
At 2.8 percent, Sprint Nextel was the wireless carrier with the highest churn rate in the U.S. in 2016. This high churn rate can be attributed to Sprint Nextel’s prepaid customer segment because whilst the postpaid churn rate has stayed mostly below 2.5 since the start of 2008, the prepaid churn rate stood at 5.62 percent in the first quarter of 2016. Although this churn rate has come down more recently after its peak at 9.93 percent at the start of 2008, it still remains higher than the company average and the respective churn rates of its competitors.
This dataset was created by R. Joseph Manoj, PhD
Telecom customer churn prediction
This data set consists of 100 variables and approximately 100,000 records. It contains variables describing attributes of the telecom industry and various factors considered important when dealing with telecom customers. The target variable is churn, which indicates whether the customer will churn or not. The data set can be used to predict which customers will churn based on the available variables.
https://www.verifiedmarketresearch.com/privacy-policy/
Customer Churn Analysis Software Market size was valued at USD 1.9 Billion in 2024 and is projected to reach USD 8.4 Billion by 2032, growing at a CAGR of 19.80% during the forecast period 2026-2032.
Global Customer Churn Analysis Software Market Drivers
The market drivers for the Customer Churn Analysis Software Market can be influenced by various factors. These may include:
Customer Retention Methods: As obtaining new consumers is becoming more expensive, greater emphasis is placed on retaining existing ones. Churn analysis software is used to forecast and reduce turnover, resulting in increased customer lifetime value.
An Increase in the Usage of Predictive Analytics and AI Technologies: To examine big data sets, churn prediction technologies now incorporate artificial intelligence and machine learning. Their application is allowing for more accurate churn forecasting and targeted actions.
https://www.marketresearchforecast.com/privacy-policy
The Telco Customer Experience Management (CEM) market is experiencing robust growth, projected to reach $2,522 million in 2025 and maintain a Compound Annual Growth Rate (CAGR) of 7.7% from 2025 to 2033. This expansion is fueled by several key drivers. The increasing adoption of digital channels by telecom companies necessitates sophisticated CEM solutions to ensure seamless and personalized customer interactions across various touchpoints, from online portals and mobile apps to social media and in-person interactions. Rising customer expectations for immediate issue resolution and proactive support are also driving demand for advanced analytics and AI-powered CEM tools that allow telcos to anticipate and address customer needs before they escalate into complaints. Furthermore, the growing competition within the telecom industry is pushing companies to invest heavily in improving customer loyalty and reducing churn through enhanced CEM strategies. Segmentation reveals strong demand from both large enterprises and small companies across diverse sectors including OTT, banking, and retail, reflecting the broad applicability of effective CEM solutions. The North American market currently holds a significant share, driven by early adoption of advanced technologies and a high concentration of telecom companies. However, rapid technological advancements and increasing digital penetration in regions like Asia Pacific and Europe are expected to fuel significant growth in these markets over the forecast period. While the market faces challenges such as high implementation costs and the need for specialized expertise, the strategic benefits of improved customer satisfaction, reduced operational costs, and increased revenue generation outweigh these constraints. 
Key players like Nuance, mPhasis, Tieto, Wipro, Tech Mahindra, IBM, Huawei, ChatterPlug, ClickFox, and InMoment are actively shaping the market landscape through innovation and strategic partnerships, further accelerating growth within the Telco CEM sector.