Dataset Card for Telco Customer Churn
This dataset contains information about customers of a fictional telecommunications company, including demographic information, services subscribed to, location details, and churn behavior. This merged dataset combines the information from the original Telco Customer Churn dataset with additional details.
Dataset Details
Dataset Description
This merged Telco Customer Churn dataset provides a comprehensive view of customer… See the full description on the dataset page: https://huggingface.co/datasets/aai510-group1/telco-customer-churn.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Nowadays
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Telco Customer Churn Dataset includes carrier customers' service usage, account information, demographics, and churn status, which can be used to predict and analyze customer churn.
2) Data Utilization
(1) Characteristics of the Telco Customer Churn Dataset:
• The dataset covers a variety of customer and service characteristics, including gender, age group, partner and dependents, service subscription status (telephone, Internet, security, backup, device protection, technical support, streaming, etc.), contract type, payment method, monthly fee, total fee, and departure (churn).
(2) The Telco Customer Churn Dataset can be used for:
• Development of a customer churn prediction model: using customer service usage patterns and account information, a machine learning-based churn prediction model can be built to proactively identify customers at risk of churn.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The Dataset
About the Customer churn dataset
To start, you will work with real data from the Kenyan Telecommunication survey. The dataset, "Churn.xls", supports customer churn analysis for a telecommunications company. Customer churn refers to customers ceasing to do business with a company. The dataset includes customer attributes and usage patterns that are typically used to predict whether a customer is likely to leave the service (churn) or stay. The variables are:
1. ID: A unique identifier for each customer.
2. COLLEGE: Indicates whether the customer has a college degree ("one" for yes, "zero" for no).
3. INCOME: The annual income of the customer.
4. OVERAGE: The number of overage minutes the customer used.
5. LEFTOVER: The number of leftover minutes the customer has.
6. HOUSE: The value of the customer's house.
7. HANDSET_PRICE: The price of the customer's handset.
8. OVER_15MINS_CALLS_PER_MONTH: The number of calls per month that exceed 15 minutes.
9. AVERAGE_CALL_DURATION: The average duration of calls made by the customer.
10. REPORTED_SATISFACTION: The customer's reported level of satisfaction with the service (e.g., "unsat", "very_sat").
11. REPORTED_USAGE_LEVEL: The customer's reported usage level of the service (e.g., "little", "very_high").
12. CONSIDERING_CHANGE_OF_PLAN: Indicates whether the customer is considering changing their plan (e.g., "no", "considering").
13. LEAVE: The target variable, indicating whether the customer left within the last month ("LEAVE") or stayed ("STAY").
The dataset is intended for predictive modeling to identify factors that influence customer churn and to develop retention strategies. The variables cover demographic information, usage patterns, customer satisfaction, and the likelihood of changing plans, all of which help in understanding and predicting churn behavior.
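The categorical variables described above usually need to be converted to numbers before modeling. The following is a minimal sketch, assuming the values are exactly those quoted in the description ("one"/"zero" for COLLEGE, "LEAVE"/"STAY" for the target); the sample row and the function name `encode_row` are hypothetical and do not come from Churn.xls.

```python
# Hypothetical sample row; real rows would be read from Churn.xls.
sample_row = {
    "COLLEGE": "one",
    "INCOME": "52000",
    "OVERAGE": "30",
    "LEAVE": "STAY",
}


def encode_row(row):
    """Return a copy of the row with COLLEGE and LEAVE mapped to 0/1
    and the numeric fields parsed as floats."""
    encoded = dict(row)
    encoded["COLLEGE"] = 1 if row["COLLEGE"] == "one" else 0
    # 1 means the customer left (churned), 0 means the customer stayed.
    encoded["LEAVE"] = 1 if row["LEAVE"] == "LEAVE" else 0
    for key in ("INCOME", "OVERAGE"):
        encoded[key] = float(row[key])
    return encoded


encoded = encode_row(sample_row)
print(encoded)
```

Ordinal fields such as REPORTED_SATISFACTION would need their own mapping, which is left out here because the description lists only example values.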
Why Analysis?
Customer churn refers to customers discontinuing their relationship or subscription with a company or service provider. It is measured as the rate at which customers stop using a company's products or services within a specific period. Churn is an important metric for businesses because it directly affects revenue, growth, and customer retention. In the Churn dataset, the churn label indicates whether a customer has churned. A churned customer has decided to discontinue their subscription or use of the company's services; a non-churned customer remains engaged and retains their relationship with the company. Understanding customer churn helps businesses identify the patterns, factors, and indicators that contribute to customer attrition. By analyzing churn behavior and its associated features, companies can develop strategies to retain existing customers, improve customer satisfaction, and reduce customer turnover. Predictive modeling techniques can also be applied to forecast potential churn, enabling companies to take proactive measures to retain at-risk customers.
"Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs." [IBM Sample Data Sets]
Each row represents a customer; each column contains a customer attribute described in the column metadata.
The data set includes information about:
To explore this type of model and learn more about the subject.
New version from IBM: https://community.ibm.com/community/user/businessanalytics/blogs/steven-macko/2019/07/11/telco-customer-churn-1113
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset originates from the research domain of Customer Churn Prediction in the Telecom Industry. It was created as part of the project "Data-Driven Churn Prediction: ML Solutions for the Telecom Industry," completed within the Data Stewardship course (Master programme Data Science, TU Wien).
The primary purpose of this dataset is to support machine learning model development for predicting customer churn based on customer demographics, service usage, and account information.
The dataset enables the training, testing, and evaluation of classification algorithms, allowing researchers and practitioners to explore techniques for customer retention optimization.
The dataset was originally obtained from the IBM Accelerator Catalog and adapted for academic use. It was uploaded to TU Wien’s DBRepo test system and accessed via SQLAlchemy connections to the MariaDB environment.
The dataset has a tabular structure and was initially stored in CSV format. It contains:
Rows: 7,043 customer records
Columns: 21 features including customer attributes (gender, senior citizen status, partner status), account information (tenure, contract type, payment method), service usage (internet service, streaming TV, tech support), and the target variable (Churn: Yes/No).
Naming Convention:
The table in the database is named telco_customer_churn_data.
Software Requirements:
To open and work with the dataset, any standard database client or programming language that supports MariaDB connections can be used (e.g., Python).
For machine learning applications, libraries such as pandas, scikit-learn, and joblib are typically used.
Additional Resources:
Source code for data loading, preprocessing, model training, and evaluation is available at the associated GitHub repository: https://github.com/nazerum/fair-ml-customer-churn
When reusing the dataset, users should be aware:
Licensing: The dataset is shared under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Use Case Suitability: The dataset is best suited for classification tasks, particularly binary classification (churn vs. no churn).
Metadata Standards: Metadata describing the dataset adheres to FAIR principles and is supplemented by CodeMeta and Croissant standards for improved interoperability.
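Because the dataset is best suited to binary classification (churn vs. no churn), model output is typically scored with accuracy, precision, and recall. The following is a minimal sketch of those metrics in plain Python; the label vectors are made up for illustration (1 = churn, 0 = no churn) and the function name `binary_metrics` is hypothetical, not part of the associated repository.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall for 0/1 labels,
    treating 1 (churn) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a class is never predicted or present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical ground truth and predictions for eight customers.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Recall on the churn class is often the figure of interest, since a missed churner is a customer no retention effort reaches.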
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Telco Customer Churn’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/blastchar/telco-customer-churn on 21 November 2021.
--- Dataset description provided by original source is as follows ---
"Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs." [IBM Sample Data Sets]
Each row represents a customer; each column contains a customer attribute described in the column metadata.
The data set includes information about:
To explore this type of model and learn more about the subject.
New version from IBM: https://community.ibm.com/community/user/businessanalytics/blogs/steven-macko/2019/07/11/telco-customer-churn-1113
--- Original source retains full ownership of the source dataset ---
📝 Dataset Description
This dataset contains information about customers of a telecommunications company, including their demographic details, account information, service subscriptions, and churn status. It is a modified version of the popular Telco Churn dataset, curated for exploratory data analysis, machine learning model development, and churn prediction tasks.
The dataset includes simulated missing values in some columns to reflect real-world data issues and support preprocessing and imputation tasks. This makes it especially useful for demonstrating data cleaning techniques and evaluating model robustness.
📂 Files Included
telco_data_modified.csv: The main dataset with 21 columns and 7043 rows (some missing values are intentionally inserted).
📌 Features
Column Name: Description
customerID: Unique identifier for each customer
gender: Customer gender (Male/Female)
SeniorCitizen: Indicates if the customer is a senior citizen (0 = No, 1 = Yes)
Partner: Whether the customer has a partner
Dependents: Whether the customer has dependents
tenure: Number of months the customer has stayed with the company
PhoneService: Whether the customer has phone service
MultipleLines: Whether the customer has multiple lines
InternetService: Customer's internet service provider (DSL, Fiber optic, No)
OnlineSecurity: Whether the customer has online security
OnlineBackup: Whether the customer has online backup
DeviceProtection: Whether the customer has device protection
TechSupport: Whether the customer has tech support
StreamingTV: Whether the customer has streaming TV
StreamingMovies: Whether the customer has streaming movies
Contract: Type of contract (Month-to-month, One year, Two year)
PaperlessBilling: Whether the customer uses paperless billing
PaymentMethod: Payment method (e.g., Electronic check, Mailed check)
MonthlyCharges: Monthly charges
TotalCharges: Total charges to date
Churn: Whether the customer has left the company (Yes/No)
🔍 Use Cases
Binary classification: Predict customer churn
Data preprocessing and imputation exercises
Feature engineering and importance analysis
Customer segmentation and churn modeling
⚠️ Notes
Missing values were intentionally inserted in the dataset to help simulate real-world conditions.
Some preprocessing may be required before modeling (e.g., converting categorical to numerical data, handling TotalCharges as numeric).
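The preprocessing mentioned in these notes (reading TotalCharges as a number, filling missing values, and converting Churn to 0/1) might look roughly like the following. This is a sketch only: the three-row excerpt is invented, blank cells are assumed to be how missing values appear, and mean imputation is just one of several reasonable choices.

```python
import csv
import io
from statistics import mean

# Hypothetical excerpt; a blank TotalCharges cell stands in for a missing value.
raw = io.StringIO(
    "customerID,MonthlyCharges,TotalCharges,Churn\n"
    "A-001,29.85,29.85,No\n"
    "A-002,56.95,,No\n"
    "A-003,53.85,108.15,Yes\n"
)


def to_float_or_none(text):
    """Parse a numeric string, returning None for blank or malformed values."""
    try:
        return float(text)
    except ValueError:
        return None


rows = list(csv.DictReader(raw))

# Step 1: treat TotalCharges as numeric, marking unparseable cells as missing.
totals = [to_float_or_none(r["TotalCharges"].strip()) for r in rows]

# Step 2: impute missing entries with the mean of the observed values.
fill_value = mean(t for t in totals if t is not None)
total_charges = [t if t is not None else fill_value for t in totals]

# Step 3: convert the categorical target to 0/1.
churn = [1 if r["Churn"] == "Yes" else 0 for r in rows]

print(total_charges, churn)
```

With the real file, the same steps would be applied to `telco_data_modified.csv` instead of the in-memory excerpt.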
🏷️ Tags
🙏 Acknowledgements
This dataset is based on the original Telco Customer Churn dataset (initially provided by IBM). The current version has been modified for academic and practical exercises.
Although the results were close, the industry in the United States where customers were most likely to leave their current provider due to poor customer service appears to be cable television, with a 25 percent churn rate in 2020.
Churn rate
Churn rate, sometimes also called attrition rate, is the percentage of customers that stop using a service within a given time period. It is often used to measure businesses that have a contractual customer base, especially subscriber-based service models.
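As a worked illustration of this definition, the sketch below computes churn rate as customers lost divided by customers at the start of the period. The choice of denominator is an assumption (some businesses use an average customer count instead), and the numbers are illustrative only.

```python
def churn_rate(customers_at_start, customers_lost):
    """Percentage of customers who stopped using the service during the period.

    Assumes the customer count at the start of the period is the denominator.
    """
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return 100.0 * customers_lost / customers_at_start


# Illustrative figures: 2,000 subscribers at the start of the month, 50 cancelled.
monthly_rate = churn_rate(2000, 50)
print(f"{monthly_rate:.1f}%")
```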
https://creativecommons.org/publicdomain/zero/1.0/
This dataset covers the telecom industry and records which customers churned. It consists of 3,333 observations with 21 variables. The task is to predict which customers will churn.
Account.Length: How long the account has been active.
VMail.Message: Number of voice mail messages sent by the customer.
Day.Mins: Time spent on day calls.
Eve.Mins: Time spent on evening calls.
Night.Mins: Time spent on night calls.
Intl.Mins: Time spent on international calls.
Day.Calls: Number of day calls by customers.
Eve.Calls: Number of evening calls by customers.
Intl.Calls: Number of international calls.
Night.Calls: Number of night calls by customer.
Day.Charge: Charges of Day Calls.
Night.Charge: Charges of Night Calls.
Eve.Charge: Charges of evening Calls.
Intl.Charge: Charges of international calls.
VMail.Plan: Voice mail plan taken by the customer or not.
State: State in Area of study.
Phone: Phone number of the customer.
Area.Code: Area Code of customer.
Int.l.Plan: Does customer have international plan or not.
CustServ.Calls: Number of customer service calls by customer.
Churn: Whether the customer churned the telecom service or not (0 = "Churner", 1 = "Non-Churner").
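The Churn coding stated here (0 for churners, 1 for non-churners) is the reverse of the common convention in which 1 marks the positive class. A minimal sketch of recoding it before modeling follows, taking the labels exactly as given in the description; the flag values and the function name `to_positive_churn` are hypothetical.

```python
# Labels as stated in the description: 0 = "Churner", 1 = "Non-Churner".
SOURCE_LABELS = {0: "Churner", 1: "Non-Churner"}


def to_positive_churn(flag):
    """Map the source coding to 1 = churned, 0 = retained."""
    if flag not in SOURCE_LABELS:
        raise ValueError(f"unexpected Churn value: {flag!r}")
    return 1 if SOURCE_LABELS[flag] == "Churner" else 0


flags = [0, 1, 1, 0, 1]  # hypothetical values from the Churn column
recoded = [to_positive_churn(f) for f in flags]
print(recoded)
```

It is worth checking the coding against the actual file before relying on it, since a reversed target silently inverts every metric.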
https://choosealicense.com/licenses/ecl-2.0/
Telco Churn 7k
A 7,043-row customer-retention dataset drawn from a U.S. telecom provider. Each record profiles one account with 21 concise attributes and a Churn flag (Yes / No) indicating whether the customer left within the last month. The schema is:
customerID – unique subscriber identifier
gender – {Female, Male}
SeniorCitizen – {0, 1}
Partner, Dependents – {Yes, No}
tenure – months of service (0–72)
PhoneService, MultipleLines – {Yes, No, No phone service}… See the full description on the dataset page: https://huggingface.co/datasets/mnemoraorg/telco-churn-7k.
https://creativecommons.org/publicdomain/zero/1.0/
The "Real World Customer Churn Dataset in Telco Domain" is a comprehensive collection of anonymized data that provides insights into customer behavior and churn prediction within the telecommunications industry.
The dataset contains data on over 60,000 customers across more than 10 distinct usage categories. Some of the key usage categories include:
The dataset consists of the following key files:
The "Real World Customer Churn Dataset in Telco Domain" offers a range of potential use cases, including:
This dataset's real-world aspect is of significant importance. It reflects actual customer interactions with a major telecommunications company in Sri Lanka, offering insights that can be directly applied to real-world scenarios. The dataset is sourced from one of the largest telco companies in the country, adding credibility and relevance to the insights it provides.
Understanding customer churn and usage behavior is pivotal for the telecommunications industry, and this dataset empowers researchers, data scientists, and businesses to gain deeper insights into these aspects.
The dataset is anonymized to protect customer privacy, and all data used is in compliance with privacy regulations and agreements. Users are encouraged to explore and contribute to the "Real World Customer Churn Dataset in Telco Domain."
Thank you for your valuable contributions to this dataset.
This graph displays the average monthly churn rate for top wireless carriers in the United States from the first quarter of 2013 to the third quarter of 2018. The average monthly churn rate of Verizon Wireless was at **** percent in the third quarter of 2018. Churn rates of wireless carriers - additional information The average monthly churn rate of wireless carriers refers to the average percentage of subscribers that cease to use the company’s services per month. The churn rate is used as an indicator of the health and loyalty of a company’s subscriber base and the lower the churn rate, the better the outlook is for the company. Verizon Wireless was the company with the lowest churn rate in the U.S. from 2013 to 2016. This success can be seen in the company’s revenue, with wireless services earning Verizon almost ** billion U.S. dollars in 2016 alone. AT&T’s churn rate in the fourth quarter of 2016 stood at **** percent, the third lowest of all the wireless carriers in the U.S. The Texas-based company’s churn rate has remained relatively stable in recent years, although it has risen slightly since it was at its lowest of **** percent in 2010 and 2015. The number of wireless subscribers of AT&T has nevertheless continued to grow, with the ***** million customers in 2016 marking the company’s highest ever total to date. Of these wireless subscribers **** million held a postpaid subscription in comparison to just **** million who were prepaid subscribers. At *** percent, Sprint Nextel was the wireless carrier with the highest churn rate in the U.S. in 2016. This high churn rate can be attributed to Sprint Nextel’s prepaid customer segment because whilst the postpaid churn rate has stayed mostly below *** since the start of 2008, the prepaid churn rate stood at **** percent in the first quarter of 2016. 
Although this churn rate has come down more recently after its peak at **** percent at the start of 2008, it still remains higher than the company average and the respective churn rates of its competitors.
https://www.marketresearchforecast.com/privacy-policy
The Telco Customer Experience Management (CEM) market is experiencing robust growth, projected to reach $2,522 million in 2025 and maintain a Compound Annual Growth Rate (CAGR) of 7.7% from 2025 to 2033. This expansion is fueled by several key drivers. The increasing adoption of digital channels by telecom companies necessitates sophisticated CEM solutions to ensure seamless and personalized customer interactions across various touchpoints, from online portals and mobile apps to social media and in-person interactions. Rising customer expectations for immediate issue resolution and proactive support are also driving demand for advanced analytics and AI-powered CEM tools that allow telcos to anticipate and address customer needs before they escalate into complaints. Furthermore, the growing competition within the telecom industry is pushing companies to invest heavily in improving customer loyalty and reducing churn through enhanced CEM strategies. Segmentation reveals strong demand from both large enterprises and small companies across diverse sectors including OTT, banking, and retail, reflecting the broad applicability of effective CEM solutions. The North American market currently holds a significant share, driven by early adoption of advanced technologies and a high concentration of telecom companies. However, rapid technological advancements and increasing digital penetration in regions like Asia Pacific and Europe are expected to fuel significant growth in these markets over the forecast period. While the market faces challenges such as high implementation costs and the need for specialized expertise, the strategic benefits of improved customer satisfaction, reduced operational costs, and increased revenue generation outweigh these constraints. 
Key players like Nuance, mPhasis, Tieto, Wipro, Tech Mahindra, IBM, Huawei, ChatterPlug, ClickFox, and InMoment are actively shaping the market landscape through innovation and strategic partnerships, further accelerating growth within the Telco CEM sector.
📌**Dataset Story** Telco churn data includes information about a fictitious telecom company that provided home phone and Internet services to 7,043 customers in California in the third quarter. It shows which customers left, stayed, or signed up for their service.
🆔**CustomerId:** Customer Id
👫**Gender:** Gender
👵**SeniorCitizen:** Whether the customer is elderly (1, 0)
👫**Partner:** Whether the customer has a partner (Yes, No)
👨👨👧👧**Dependents:** Whether the customer has dependents (Yes, No)
📜**Tenure:** Number of months the customer has been with the company
☎️**PhoneService:** Whether the customer has phone service (Yes, No)
📞**MultipleLines:** Whether the customer has more than one line (Yes, No, No phone service)
💻**InternetService:** The customer's internet service provider (DSL, Fiber optic, No)
㊙️**OnlineSecurity:** Whether the customer has online security (Yes, No, No internet service)
◀️**OnlineBackup:** Whether the customer has online backup (Yes, No, No internet service)
🚫**DeviceProtection:** Whether the customer has device protection (Yes, No, No internet service)
🧢**TechSupport:** Whether the customer has technical support (Yes, No, No internet service)
📺**StreamingTV:** Whether the customer has streaming TV (Yes, No, No internet service)
📽️**StreamingMovies:** Whether the customer streams movies (Yes, No, No internet service)
🗞️**Contract:** The customer's contract term (Month-to-month, One year, Two years)
📰**PaperlessBilling:** Whether the customer has paperless billing (Yes, No)
💳**PaymentMethod:** The customer's payment method (Electronic check, Postal check, Wire transfer (automatic), Credit card (automatic))
🤑**MonthlyCharges:** The amount charged to the customer monthly
💰**TotalCharges:** The total amount charged to the customer
❌**Churn:** Whether the customer churned (Yes or No)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Churn in Telecom's dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/becksddf/churn-in-telecoms-dataset on 30 September 2021.
--- No further description of dataset provided by original source ---
--- Original source retains full ownership of the source dataset ---
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Forecast: TELUS Telecom Company Churn Rate in Canada 2022 - 2026
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
mohab-yasser2/telecom-churn-model dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Turkey Churn Rate: Period End: Operators: Turkcell data was reported at 1.900 % in Jun 2018. This records an increase from the previous number of 1.500 % for Mar 2018. Turkey Churn Rate: Period End: Operators: Turkcell data is updated quarterly, averaging 2.660 % from Mar 2010 (Median) to Jun 2018, with 34 observations. The data reached an all-time high of 3.870 % in Mar 2010 and a record low of 1.400 % in Mar 2017. Turkey Churn Rate: Period End: Operators: Turkcell data remains in active status in CEIC and is reported by the Information and Communication Technologies Authority. The data is categorized under Global Database’s Turkey – Table TR.TB003: Telecommunication Statistics.
https://dataintelo.com/privacy-and-policy
The global market size for Big Data Analytics in the Telecom sector was valued at approximately USD 10 billion in 2023 and is projected to reach around USD 50 billion by 2032, exhibiting a robust CAGR of 20% during the forecast period. This impressive growth trajectory is fueled by the increasing demand for advanced analytics to optimize operations, enhance customer experience, and improve network management. The telecom sector's continuous expansion and the proliferation of connected devices are also significant contributors to this market's rapid growth.
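The figures quoted above (approximately USD 10 billion in 2023, around USD 50 billion by 2032, and a CAGR of 20%) can be cross-checked with the standard compound annual growth rate formula. The following is a minimal sketch; the function name is illustrative, and the nine-year span (2023 to 2032) is an assumption about how the forecast period is counted.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.20 means 20% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Values in USD billions, taken from the paragraph above.
implied = cagr(10.0, 50.0, 2032 - 2023)
print(f"Implied CAGR: {implied:.1%}")
```

Under these assumptions the implied rate is a little under 20%, which is consistent with the rounded figure in the text.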
One of the primary growth factors for this market is the exponential increase in data generation. With the advent of 5G technology, the volume of data transmitted over networks has surged, necessitating sophisticated analytics to manage and utilize this data effectively. Telecom companies are increasingly relying on big data analytics to derive actionable insights from vast datasets, which can lead to improved decision-making and strategic planning. Moreover, the integration of IoT devices and services has further amplified data traffic, making analytics indispensable for telecom operators.
Another crucial driver is the need for enhanced customer experience. Telecom operators are leveraging big data analytics to gain deeper insights into customer behavior, preferences, and pain points. This data-driven approach allows for personalized marketing strategies, better customer service, and reduced churn rates. By analyzing customer data, telecom companies can identify trends and patterns that help in developing targeted campaigns and offers, thereby increasing customer loyalty and satisfaction.
Operational efficiency is also a significant factor propelling the growth of big data analytics in the telecom market. Telecom operators are under constant pressure to improve their network performance and reduce operational costs. Big data analytics enables real-time monitoring and predictive maintenance of network infrastructure, leading to fewer outages and improved service quality. Additionally, analytics helps in optimizing resource allocation and enhancing the overall efficiency of telecom operations.
Regionally, North America holds a substantial share of the big data analytics in telecom market, driven by the presence of leading telecom companies and advanced technology infrastructure. Additionally, the Asia Pacific region is expected to witness the fastest growth rate due to the rapid digital transformation and increasing adoption of advanced analytics solutions in emerging economies like China and India. European countries are also making significant investments in big data analytics to enhance their telecom services, contributing to the market's growth.
In the context of components, the Big Data Analytics in Telecom market is segmented into software, hardware, and services. The software segment is anticipated to dominate the market, as telecom operators increasingly invest in advanced analytics platforms and tools. The software solutions facilitate the processing and analysis of large datasets, enabling telecom companies to gain valuable insights and improve decision-making processes. Moreover, the software segment includes various sub-categories such as data management, data mining, and predictive analytics, each contributing significantly to market growth.
The hardware segment, although smaller compared to software, plays a critical role in the overall ecosystem. This segment includes servers, storage systems, and other hardware components necessary for data processing and storage. As data volumes continue to grow, the demand for robust and scalable hardware solutions is also on the rise. Telecom companies are investing in high-performance hardware to ensure seamless data management and analytics capabilities. The hardware segment is essential for supporting the infrastructure needed for big data analytics.
On the services front, the market is witnessing substantial growth due to the increasing need for consulting, integration, and maintenance services. Telecom operators often require expert guidance and support to implement and manage big data analytics solutions effectively. Service providers offer a range of services, including system integration, data migration, and ongoing support, which are crucial for the success