📝 Dataset Description This dataset contains information about customers of a telecommunications company, including their demographic details, account information, service subscriptions, and churn status. It is a modified version of the popular Telco Churn dataset, curated for exploratory data analysis, machine learning model development, and churn prediction tasks.
The dataset includes simulated missing values in some columns to reflect real-world data issues and support preprocessing and imputation tasks. This makes it especially useful for demonstrating data cleaning techniques and evaluating model robustness.
📂 Files Included
telco_data_modified.csv: The main dataset, with 21 columns and 7,043 rows (some missing values were inserted intentionally).
📌 Features
customerID: Unique identifier for each customer
gender: Customer gender (Male/Female)
SeniorCitizen: Whether the customer is a senior citizen (0 = No, 1 = Yes)
Partner: Whether the customer has a partner
Dependents: Whether the customer has dependents
tenure: Number of months the customer has stayed with the company
PhoneService: Whether the customer has phone service
MultipleLines: Whether the customer has multiple lines
InternetService: Customer's internet service provider (DSL, Fiber optic, No)
OnlineSecurity: Whether the customer has online security
OnlineBackup: Whether the customer has online backup
DeviceProtection: Whether the customer has device protection
TechSupport: Whether the customer has tech support
StreamingTV: Whether the customer has streaming TV
StreamingMovies: Whether the customer has streaming movies
Contract: Type of contract (Month-to-month, One year, Two year)
PaperlessBilling: Whether the customer uses paperless billing
PaymentMethod: Payment method (e.g., Electronic check, Mailed check)
MonthlyCharges: Monthly charges
TotalCharges: Total charges to date
Churn: Whether the customer has left the company (Yes/No)
🔍 Use Cases
Binary classification: predict customer churn
Data preprocessing and imputation exercises
Feature engineering and importance analysis
Customer segmentation and churn modeling
⚠️ Notes
Missing values were intentionally inserted into the dataset to simulate real-world conditions.
Some preprocessing may be required before modeling (e.g., encoding categorical variables as numbers and converting TotalCharges to a numeric type).
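As a minimal sketch of the TotalCharges conversion, assuming the column arrives as text in which some entries are blank or missing, the standard-library code below parses each value to a float and fills the gaps with the median of the parsed values. Median imputation is one common choice, not the dataset author's prescribed method; the helper names and sample values are hypothetical.

```python
from statistics import median
from typing import List, Optional


def parse_charge(raw: Optional[str]) -> Optional[float]:
    """Convert a raw TotalCharges entry to float; blank or missing -> None."""
    if raw is None:
        return None
    text = raw.strip()
    if not text:
        return None
    try:
        return float(text)
    except ValueError:
        # Any non-numeric text is treated as missing as well.
        return None


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical raw values, including a blank string and a missing entry.
raw_total_charges = ["29.85", "1889.5", " ", None, "108.15"]
parsed = [parse_charge(v) for v in raw_total_charges]
cleaned = impute_median(parsed)
print(cleaned)
```

Imputing after parsing keeps the two concerns separate, so a different fill strategy (mean, model-based) can be swapped in without touching the parser.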
🙏 Acknowledgements
This dataset is based on the original Telco Customer Churn dataset (initially provided by IBM). The current version has been modified for academic and practical exercises.
"Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs." [IBM Sample Data Sets]
Each row represents a customer; each column contains a customer attribute described in the column metadata.
The data set includes information about:
To explore this type of model and learn more about the subject.
New version from IBM: https://community.ibm.com/community/user/businessanalytics/blogs/steven-macko/2019/07/11/telco-customer-churn-1113
📌**Dataset Story** Telco churn data includes information about a fictitious telecom company that provided home phone and Internet services to 7,043 customers in California in the third quarter. It shows which customers left, stayed, or signed up for their service.
🆔**CustomerId:** Customer ID
👫**Gender:** Gender
👵**SeniorCitizen:** Whether the customer is a senior citizen (1, 0)
👫**Partner:** Whether the customer has a partner (Yes, No)
👨👨👧👧**Dependents:** Whether the customer has dependents (Yes, No)
📜**Tenure:** Number of months the customer has been with the company
☎️**PhoneService:** Whether the customer has phone service (Yes, No)
📞**MultipleLines:** Whether the customer has more than one line (Yes, No, No phone service)
💻**InternetService:** The customer's internet service provider (DSL, Fiber optic, No)
㊙️**OnlineSecurity:** Whether the customer has online security (Yes, No, No internet service)
◀️**OnlineBackup:** Whether the customer has online backup (Yes, No, No internet service)
🚫**DeviceProtection:** Whether the customer has device protection (Yes, No, No internet service)
🧢**TechSupport:** Whether the customer has technical support (Yes, No, No internet service)
📺**StreamingTV:** Whether the customer has streaming TV (Yes, No, No internet service)
📽️**StreamingMovies:** Whether the customer streams movies (Yes, No, No internet service)
🗞️**Contract:** The customer's contract term (Month-to-month, One year, Two year)
📰**PaperlessBilling:** Whether the customer has paperless billing (Yes, No)
💳**PaymentMethod:** The customer's payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))
🤑**MonthlyCharges:** The amount charged to the customer monthly
💰**TotalCharges:** The total amount charged to the customer
❌**Churn:** Whether the customer churned (Yes or No)
This is a customized version of the widely known IBM Telco Customer Churn dataset. I've added a few more columns and modified others in order to make it a little more realistic.
My customizations are based on the following version: Telco customer churn (11.1.3+)
Below is a fictional business problem I created. You can use it as a starting point for developing something around this dataset.
JB Link is a small telecom company located in the state of California that provides phone and Internet services to customers in more than 1,000 cities and 1,600 zip codes.
The company has been in the market for just 6 years and has grown quickly by investing in infrastructure to bring internet and phone networks to regions that had poor or no coverage.
The company also has a very skilled sales team that consistently performs well at attracting new customers. The number of new customers acquired in the past quarter represents 15% of the total.
However, by the end of that same quarter, only 43% of these new customers had stayed with the company, and most of them decided not to renew their contracts after a few months. The customer churn rate is therefore very high, and the company now faces a big challenge in retaining its customers.
The total customer churn rate last quarter was around 27%, resulting in a decrease of almost 12% in the total number of customers.
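The scenario's percentages agree with each other if both rates are taken as shares of the same customer base (an assumption, since the text does not name the base): 15% acquired minus 27% churned gives a net change of about -12%. A minimal check, with a hypothetical helper name:

```python
def net_change_rate(acquired_rate: float, churn_rate: float) -> float:
    """Net change in the customer base, with both rates expressed as
    fractions of the same base (an assumption)."""
    return acquired_rate - churn_rate


net = net_change_rate(acquired_rate=0.15, churn_rate=0.27)
print(f"Net change: {net:.0%}")  # Net change: -12%
```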
The executive leadership of JB Link is aware that some competitors are investing in new technologies and in expanding their network coverage, and they believe this is one of the main drivers of the high customer churn rate.
Therefore, as an action plan, they have decided to create a task force within the company that will be responsible for developing a customer retention strategy.
The task force will involve members from different areas of the company, including Sales, Finance, Marketing, Customer Service, Tech Support and a recently formed Data Science team.
The Data Science team will play a key role in this process and has been assigned some very important tasks that will support the decisions and actions the other teams will take:
- Gather insights from the data to understand what is driving the high customer churn rate.
- Develop a machine learning model that can accurately predict which customers are most likely to churn.
- Prescribe customized actions that could be taken to retain each of those customers.
The Data Science team was given a dataset with a random sample of 7,043 customers to help accomplish these tasks.
The executives are aware that acquiring a new customer can cost up to five times as much as retaining an existing one, so they expect the results of this project to save the company a lot of money and help it start growing again.
This dataset was created by wakibia
This dataset was created by John Wilken Christoper
"Understanding Telco Customer Churn: Strategies and Solutions" is a detailed exploration of the persistent challenge of customer churn within the telecommunications industry. This resource delves into the factors that drive customer attrition in telecom, including pricing, customer service, competition, and technological advancements, emphasizing the significance of customer retention.
This comprehensive guide offers a wide array of strategies and innovative solutions for telecom companies to reduce churn rates and enhance customer satisfaction. Whether you are a telecom professional, business owner, or interested in the industry, this publication provides valuable insights and actionable recommendations to address this critical issue. "Understanding Telco Customer Churn" is an indispensable resource for those seeking to improve customer relationships and overall business success in the telecommunications sector.
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset was created by Ramasubramanian M
Released under MIT
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
The Dataset
About the Customer Churn dataset
To start out, you'll be working with real data from the Kenyan Telecommunication survey. This dataset, "Churn.xls", relates to customer churn analysis for a telecommunications company. Customer churn refers to the phenomenon where customers stop doing business with a company. The dataset includes various customer attributes and usage patterns, which are typically used to predict whether a customer is likely to leave the service (churn) or stay. The variables are:
1. ID: A unique identifier for each customer.
2. COLLEGE: Indicates whether the customer has a college degree ("one" for yes, "zero" for no).
3. INCOME: The annual income of the customer.
4. OVERAGE: The number of overage minutes the customer used.
5. LEFTOVER: The number of leftover minutes the customer has.
6. HOUSE: The value of the customer's house.
7. HANDSET_PRICE: The price of the customer's handset.
8. OVER_15MINS_CALLS_PER_MONTH: The number of calls per month that exceed 15 minutes.
9. AVERAGE_CALL_DURATION: The average duration of calls made by the customer.
10. REPORTED_SATISFACTION: The customer's reported level of satisfaction with the service (e.g., "unsat", "very_sat").
11. REPORTED_USAGE_LEVEL: The customer's reported usage level of the service (e.g., "little", "very_high").
12. CONSIDERING_CHANGE_OF_PLAN: Indicates whether the customer is considering changing their plan (e.g., "no", "considering").
13. LEAVE: The target variable indicating whether the customer decided to leave ("LEAVE") or stay ("STAY"), i.e., whether the customer left within the last month.
Based on these variables, the dataset can be used for predictive modeling to identify factors that influence customer churn and to develop strategies to retain customers. The variables cover demographic information, usage patterns, customer satisfaction, and the likelihood of changing plans, all of which are crucial for understanding and predicting churn behavior.
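To use LEAVE as a binary target and COLLEGE as a numeric flag, the text labels can be mapped to 0/1. The sketch below relies only on the label values quoted in the variable list ("LEAVE"/"STAY", "one"/"zero"); the row layout, function name, and sample rows are hypothetical, and reading Churn.xls itself is left out.

```python
from typing import Dict, List

# Label mappings taken from the variable descriptions.
TARGET_MAP = {"LEAVE": 1, "STAY": 0}
COLLEGE_MAP = {"one": 1, "zero": 0}


def encode_rows(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Return copies of the rows with LEAVE and COLLEGE encoded as 0/1."""
    encoded = []
    for row in rows:
        new_row: Dict[str, object] = dict(row)  # copy, so the input is untouched
        new_row["LEAVE"] = TARGET_MAP[row["LEAVE"]]
        new_row["COLLEGE"] = COLLEGE_MAP[row["COLLEGE"]]
        encoded.append(new_row)
    return encoded


# Hypothetical rows with only the columns needed for the illustration.
sample = [
    {"ID": "1", "COLLEGE": "zero", "LEAVE": "STAY"},
    {"ID": "2", "COLLEGE": "one", "LEAVE": "LEAVE"},
]
print(encode_rows(sample))
```

An unexpected label raises a KeyError instead of being silently encoded, which surfaces data-entry problems early.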
Why Analyze Churn?
Customer churn refers to the phenomenon where customers discontinue their relationship or subscription with a company or service provider. The churn rate is the share of customers who stop using a company's products or services within a specific period. Churn is an important metric for businesses because it directly affects revenue, growth, and customer retention. In the Churn dataset, the churn label indicates whether a customer has churned: a churned customer has decided to discontinue their subscription or usage of the company's services, while a non-churned customer continues to remain engaged with the company. Understanding customer churn helps businesses identify the patterns, factors, and indicators that contribute to customer attrition. By analyzing churn behavior and its associated features, companies can develop strategies to retain existing customers, improve customer satisfaction, and reduce customer turnover. Predictive modeling can also be used to forecast churn, enabling companies to take proactive measures to retain at-risk customers.
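The churn rate for a period is the number of customers who churned divided by the number of customers considered. Given one label per customer, it can be computed as below; the helper is hypothetical, and the churned label is a parameter because the datasets here use different conventions ("Yes", "LEAVE", 1).

```python
from typing import Hashable, Iterable


def churn_rate(labels: Iterable[Hashable], churned_label: Hashable = "Yes") -> float:
    """Fraction of customers whose label marks them as churned."""
    labels = list(labels)
    if not labels:
        raise ValueError("no customers in the period")
    churned = sum(1 for label in labels if label == churned_label)
    return churned / len(labels)


period_labels = ["No", "Yes", "No", "No"]
print(churn_rate(period_labels))  # 0.25
```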
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
The "Real World Customer Churn Dataset in Telco Domain" is a comprehensive collection of anonymized data that provides insights into customer behavior and churn prediction within the telecommunications industry.
The dataset contains data on over 60,000 customers across more than 10 distinct usage categories. Some of the key usage categories include:
The dataset consists of the following key files:
The "Real World Customer Churn Dataset in Telco Domain" offers a range of potential use cases, including:
The real-world origin of this dataset is significant: it reflects actual customer interactions with a major telecommunications company in Sri Lanka, offering insights that can be applied directly to real-world scenarios. The data comes from one of the largest telco companies in the country, which adds credibility and relevance to the insights it provides.
Understanding customer churn and usage behavior is pivotal for the telecommunications industry, and this dataset empowers researchers, data scientists, and businesses to gain deeper insights into these aspects.
The dataset is anonymized to protect customer privacy, and all data used is in compliance with privacy regulations and agreements. Users are encouraged to explore and contribute to the "Real World Customer Churn Dataset in Telco Domain."
Thank you for your valuable contributions to this dataset.
Orange Telecom's Churn Dataset, which consists of cleaned customer activity data (features) along with a churn label specifying whether a customer canceled the subscription, will be used to develop predictive models. Two datasets are available here for download: churn-80 and churn-20.
The two sets are from the same batch, but have been split by an 80/20 ratio. As more data is often desirable for developing ML models, let's use the larger set (that is, churn-80) for training and cross-validation purposes, and the smaller set (that is, churn-20) for final testing and model performance evaluation.
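One way to follow this plan is to run k-fold cross-validation on churn-80 only and leave churn-20 untouched until the final evaluation. The sketch below generates fold indices with the standard library; k = 5, the seed, the function name, and the toy row count are illustrative assumptions, and model fitting is left out.

```python
import random
from typing import List, Tuple


def k_fold_indices(n_rows: int, k: int = 5, seed: int = 42) -> List[Tuple[List[int], List[int]]]:
    """Split row indices 0..n_rows-1 into k (train, validation) pairs."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled rows round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        validation = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        splits.append((train, validation))
    return splits


# Cross-validate on churn-80 rows only; churn-20 stays reserved for the final test.
for train_idx, val_idx in k_fold_indices(n_rows=10, k=5):
    print(len(train_idx), len(val_idx))
```

Each row lands in exactly one validation fold, so every churn-80 row is used for validation once and for training k - 1 times.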
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
This telecom-industry dataset records which customers churned from the service. It consists of 3,333 observations of 21 variables. The task is to predict which customers are going to churn.
Account.Length: How long the account has been active.
VMail.Message: Number of voicemail messages sent by the customer.
Day.Mins: Time spent on day calls.
Eve.Mins: Time spent on evening calls.
Night.Mins: Time spent on night calls.
Intl.Mins: Time spent on international calls.
Day.Calls: Number of day calls by the customer.
Eve.Calls: Number of evening calls by the customer.
Intl.Calls: Number of international calls by the customer.
Night.Calls: Number of night calls by the customer.
Day.Charge: Charges for day calls.
Night.Charge: Charges for night calls.
Eve.Charge: Charges for evening calls.
Intl.Charge: Charges for international calls.
VMail.Plan: Whether the customer has a voicemail plan.
State: The customer's state within the study area.
Phone: Phone number of the customer.
Area.Code: Area code of the customer.
Int.l.Plan: Whether the customer has an international plan.
CustServ.Calls: Number of calls the customer made to customer service.
Churn: Whether the customer churned from the telecom service (0 = "Churner", 1 = "Non-Churner").
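A common feature-engineering step for this kind of data is to sum the per-period usage columns into totals. The sketch below does this for one record stored as a dict; the total_* column names and the sample values are hypothetical, and it assumes the international minutes column is spelled Intl.Mins.

```python
from typing import Dict

# Per-period column names from the variable list (Intl.Mins spelling assumed).
MINUTE_COLS = ["Day.Mins", "Eve.Mins", "Night.Mins", "Intl.Mins"]
CALL_COLS = ["Day.Calls", "Eve.Calls", "Night.Calls", "Intl.Calls"]
CHARGE_COLS = ["Day.Charge", "Eve.Charge", "Night.Charge", "Intl.Charge"]


def add_usage_totals(record: Dict[str, float]) -> Dict[str, float]:
    """Return a copy of the record with summed minutes, calls and charges."""
    enriched = dict(record)
    enriched["total_mins"] = sum(record[c] for c in MINUTE_COLS)
    enriched["total_calls"] = sum(record[c] for c in CALL_COLS)
    enriched["total_charge"] = sum(record[c] for c in CHARGE_COLS)
    return enriched


# Hypothetical record containing only the usage columns.
record = {
    "Day.Mins": 100.0, "Eve.Mins": 50.0, "Night.Mins": 200.0, "Intl.Mins": 10.0,
    "Day.Calls": 40, "Eve.Calls": 30, "Night.Calls": 20, "Intl.Calls": 5,
    "Day.Charge": 10.0, "Eve.Charge": 5.0, "Night.Charge": 8.0, "Intl.Charge": 2.0,
}
enriched = add_usage_totals(record)
print(enriched["total_mins"], enriched["total_calls"], enriched["total_charge"])
```

Because charges are billed on minutes, the charge and minute columns may be highly correlated; checking that before feeding both to a model is worthwhile.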
This dataset was created by Amal Joseph3377
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
If you found the dataset useful, your upvote will help others discover it. Thanks for your support!
This dataset simulates customer behavior for a fictional telecommunications company. It contains demographic information, account details, services subscribed to, and whether the customer ultimately churned (stopped using the service) or not. The data is synthetically generated but designed to reflect realistic patterns often found in telecom churn scenarios.
Purpose:
The primary goal of this dataset is to provide a clean and straightforward resource for beginners learning about:
Features:
The dataset includes the following columns:
CustomerID: Unique identifier for each customer.
Age: Customer's age in years.
Gender: Customer's gender (Male/Female).
Location: General location of the customer (e.g., New York, Los Angeles).
SubscriptionDurationMonths: How many months the customer has been subscribed.
MonthlyCharges: The amount the customer is charged each month.
TotalCharges: The total amount the customer has been charged over their subscription period.
ContractType: The type of contract the customer has (Month-to-month, One year, Two year).
PaymentMethod: How the customer pays their bill (e.g., Electronic check, Credit card).
OnlineSecurity: Whether the customer has online security service (Yes, No, No internet service).
TechSupport: Whether the customer has tech support service (Yes, No, No internet service).
StreamingTV: Whether the customer has TV streaming service (Yes, No, No internet service).
StreamingMovies: Whether the customer has movie streaming service (Yes, No, No internet service).
Churn: (Target variable) Whether the customer churned (1 = Yes, 0 = No).
Data Quality:
This dataset is intentionally clean with no missing values, making it easy for beginners to focus on analysis and modeling concepts without complex data cleaning steps.
Inspiration:
Understanding customer churn is crucial for many businesses. This dataset provides a sandbox environment to practice the fundamental techniques used in churn analysis and prediction.
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
9. Plot the decision tree
Average customer churn is 27%. According to the tree, churn can occur when tenure is >= 7.5 months and there is no internet service.
The most significant variables are InternetService and tenure; the least significant are StreamingMovies and TechSupport.
Run library(randomForest). Here we use the default ntree (500) and the default mtry, which for classification is floor(sqrt(p)), where p is the number of independent variables.
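The randomForest reference manual documents different mtry defaults for the two tasks: sqrt(p) (rounded down) for classification and p/3 (rounded down, at least 1) for regression. A small Python helper mirroring that rule is below; the function name is hypothetical, and p = 19 assumes the 21 Telco columns minus customerID and Churn, which is consistent with the default mtry of 4 reported for this model.

```python
import math


def default_mtry(p: int, classification: bool = True) -> int:
    """Default mtry as documented for R's randomForest:
    floor(sqrt(p)) for classification, max(floor(p / 3), 1) for regression."""
    if p < 1:
        raise ValueError("p must be at least 1")
    if classification:
        return math.isqrt(p)
    return max(p // 3, 1)


# Assumed p = 19 predictors (21 columns minus customerID and Churn).
print(default_mtry(19))                        # 4
print(default_mtry(19, classification=False))  # 6
```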
From the confusion matrix, the accuracy is 79.27%, marginally higher than the decision tree's 79.00%. The error rate is low when predicting "No" and much higher when predicting "Yes".
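Accuracy and the per-class error rates both come from the confusion matrix. The sketch below shows the arithmetic, defining per-class error as the share of each actual class that is misclassified (the convention of randomForest's class.error column); the counts are hypothetical placeholders, not the author's results.

```python
from typing import Dict


def confusion_summary(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Accuracy and per-class error for a binary confusion matrix.

    Positive class = "Yes" (churn), negative class = "No".
    tp: actual Yes predicted Yes, fn: actual Yes predicted No,
    fp: actual No predicted Yes,  tn: actual No predicted No.
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        # Share of actual "Yes" customers the model misclassified.
        "error_yes": fn / (tp + fn),
        # Share of actual "No" customers the model misclassified.
        "error_no": fp / (fp + tn),
    }


# Hypothetical counts chosen only to illustrate the calculation.
print(confusion_summary(tp=50, fn=50, fp=20, tn=380))
```

With imbalanced classes like these, a high accuracy can coexist with a large error on the minority "Yes" class, which is the pattern described above.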
Plotting variable importance shows which variables reduce the Gini impurity the most and the least: TotalCharges and tenure reduce it the most, while PhoneService has the least impact.
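The randomForest manual describes its Gini importance as the total decrease in node impurity from splitting on a variable, averaged over all trees. For a single split, that decrease is the parent node's Gini impurity minus the size-weighted impurity of its two children. A minimal sketch with hypothetical class counts:

```python
from typing import Sequence


def gini(counts: Sequence[int]) -> float:
    """Gini impurity 1 - sum(p_k^2) for a node with the given class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini_decrease(parent: Sequence[int], left: Sequence[int], right: Sequence[int]) -> float:
    """Impurity decrease of one split: parent minus size-weighted child impurity."""
    n = sum(parent)
    weighted = (sum(left) / n) * gini(left) + (sum(right) / n) * gini(right)
    return gini(parent) - weighted


# Hypothetical node: [No, Yes] counts before and after a split.
print(round(gini_decrease(parent=[70, 30], left=[60, 5], right=[10, 25]), 4))  # 0.1848
```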
Tuning the model: mtry = 2 gives the lowest OOB error rate.
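The out-of-bag (OOB) error used for tuning comes from bootstrapping: each tree is trained on n rows drawn with replacement, and the rows never drawn for that tree (on average a fraction of about (1 - 1/n)^n, close to 36.8% for large n) are its OOB rows, on which it is evaluated. A standard-library sketch of one bootstrap draw; the row count, seed, and function name are arbitrary choices for illustration.

```python
import random
from typing import List, Tuple


def bootstrap_split(n_rows: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Draw one bootstrap sample of row indices and return (in_bag, out_of_bag)."""
    rng = random.Random(seed)
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]  # sampling with replacement
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_rows) if i not in drawn]
    return in_bag, out_of_bag


in_bag, oob = bootstrap_split(n_rows=1000)
print(f"Observed OOB share: {len(oob) / 1000:.3f}")
print(f"Expected OOB share: {(1 - 1 / 1000) ** 1000:.3f}")
```

A row's OOB prediction aggregates votes only from the trees for which it was out of bag, so the OOB error acts as a built-in validation estimate when comparing mtry values.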
Use random forest with mtry = 2 and ntree = 200
From the confusion matrix, the accuracy is 79.71%, marginally higher than that of the default forest (ntree = 500, mtry = 4) at 79.27% and of the decision tree at 79.00%. The error rate is pretty low when predicting "No" and m...
This dataset was created by Shiyamaladevi R S
This dataset was created by Anjolaoluwa Ajayi
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset contains detailed information about customers of a telecom company, including demographics, service subscriptions, billing information, and churn status. It captures key aspects such as whether a customer has phone or internet services, their tenure with the company, usage of additional services like online security, and their chosen payment methods. The dataset is particularly useful for analyzing patterns and factors that contribute to customer churn, helping the company understand and potentially mitigate reasons for customer departure.
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
PLEASE UPVOTE THIS DATASET IF IT HELPS YOU... GLAD TO SEE ANY FORKS HERE
BACKGROUND
DQLab Telco is a telecommunications company with numerous locations all over the world. In order to ensure that customers are not left behind, DQLab Telco has consistently paid attention to the customer experience since its establishment in 2019.
Even though DQLab Telco is only a little over a year old, many of its customers have already changed their subscriptions to rival companies. By using machine learning, management hopes to lower the number of customers who leave.
Having cleaned the data yesterday, we now need to build the best model to forecast customer churn.
TASKS & STEPS
Yesterday, we completed "Cleansing Data" as part 1 of the project. As a data scientist, you are now expected to develop the appropriate model.
You will perform "Machine Learning Modeling" in this assignment using data from the previous month, specifically June 2020.
The steps to take are:
1. Perform exploratory data analysis.
2. Preprocess the data.
3. Build machine learning models.
4. Select the best model.
This dataset was created by Yoseph Endale