21 datasets found
  1. Data from: 📊 Telco Customer Churn Dataset

    • kaggle.com
    Updated Jul 18, 2025
    Cite
    Soulz (2025). 📊 Telco Customer Churn Dataset [Dataset]. https://www.kaggle.com/datasets/jethwaaatmik/telco-customer-churn-dataset/discussion
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Soulz
    Description

    📝 Dataset Description

    This dataset contains information about customers of a telecommunications company, including their demographic details, account information, service subscriptions, and churn status. It is a modified version of the popular Telco Churn dataset, curated for exploratory data analysis, machine learning model development, and churn prediction tasks.

    The dataset includes simulated missing values in some columns to reflect real-world data issues and support preprocessing and imputation tasks. This makes it especially useful for demonstrating data cleaning techniques and evaluating model robustness.

    📂 Files Included

    • telco_data_modified.csv: The main dataset, with 21 columns and 7043 rows (some missing values are intentionally inserted).

    📌 Features

    • customerID: Unique identifier for each customer
    • gender: Customer gender (Male/Female)
    • SeniorCitizen: Indicates if the customer is a senior citizen (0 = No, 1 = Yes)
    • Partner: Whether the customer has a partner
    • Dependents: Whether the customer has dependents
    • tenure: Number of months the customer has stayed with the company
    • PhoneService: Whether the customer has phone service
    • MultipleLines: Whether the customer has multiple lines
    • InternetService: Customer's internet service provider (DSL, Fiber optic, No)
    • OnlineSecurity: Whether the customer has online security
    • OnlineBackup: Whether the customer has online backup
    • DeviceProtection: Whether the customer has device protection
    • TechSupport: Whether the customer has tech support
    • StreamingTV: Whether the customer has streaming TV
    • StreamingMovies: Whether the customer has streaming movies
    • Contract: Type of contract (Month-to-month, One year, Two year)
    • PaperlessBilling: Whether the customer uses paperless billing
    • PaymentMethod: Payment method (e.g., Electronic check, Mailed check, etc.)
    • MonthlyCharges: Monthly charges
    • TotalCharges: Total charges to date
    • Churn: Whether the customer has left the company (Yes/No)

    🔍 Use Cases

    • Binary classification: Predict customer churn
    • Data preprocessing and imputation exercises
    • Feature engineering and importance analysis
    • Customer segmentation and churn modeling

    ⚠️ Notes

    • Missing values were intentionally inserted in the dataset to help simulate real-world conditions.
    • Some preprocessing may be required before modeling (e.g., converting categorical to numerical data, handling TotalCharges as numeric).
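The preprocessing mentioned in the notes can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the dataset author's method. The sample rows are made up, the helper names (`to_float`, `load_rows`, `impute_median`) are hypothetical, and median imputation is just one common choice for filling missing numeric values.

```python
import csv
import io
import statistics

# Hypothetical excerpt standing in for telco_data_modified.csv (values are made up).
SAMPLE = (
    "customerID,tenure,MonthlyCharges,TotalCharges,Churn\n"
    "A-001,12,50.0,600.0,No\n"
    "A-002,0,20.0, ,No\n"
    "A-003,3,70.0,210.0,Yes\n"
    "A-004,5,30.0,,Yes\n"
)


def to_float(text):
    """Return a float, or None when the cell is blank or not numeric."""
    try:
        return float(text.strip())
    except ValueError:
        return None


def load_rows(handle):
    """Read CSV rows, parsing TotalCharges as numeric and Churn as 0/1."""
    rows = list(csv.DictReader(handle))
    for row in rows:
        row["TotalCharges"] = to_float(row["TotalCharges"])
        # Unknown or missing labels become None instead of being guessed.
        row["Churn"] = {"Yes": 1, "No": 0}.get(row["Churn"])
    return rows


def impute_median(rows, column):
    """Fill missing values in `column` with the median of the known values."""
    known = [r[column] for r in rows if r[column] is not None]
    fill = statistics.median(known)
    for r in rows:
        if r[column] is None:
            r[column] = fill
    return fill


rows = load_rows(io.StringIO(SAMPLE))
fill_value = impute_median(rows, "TotalCharges")
```

When a model is evaluated on held-out data, the fill value would normally be computed on the training portion only, so that no information leaks from the test rows.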

    🏷️ Tags

    #telecom #churn #classification #customer-analytics #data-cleaning #feature-engineering

    🙏 Acknowledgements

    This dataset is based on the original Telco Customer Churn dataset (initially provided by IBM). The current version has been modified for academic and practical exercises.

  2. Data from: Telco Customer Churn

    • kaggle.com
    Updated Feb 23, 2018
    Cite
    BlastChar (2018). Telco Customer Churn [Dataset]. https://www.kaggle.com/datasets/blastchar/telco-customer-churn/code?datasetId=13996&sortBy=voteCount
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    BlastChar
    Description

    Context

    "Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs." [IBM Sample Data Sets]

    Content

    Each row represents a customer; each column contains a customer attribute described in the column metadata.

    The data set includes information about:

    • Customers who left within the last month – the column is called Churn
    • Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
    • Customer account information – how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges
    • Demographic info about customers – gender, age range, and if they have partners and dependents

    Inspiration

    To explore this type of model and learn more about the subject.

    New version from IBM: https://community.ibm.com/community/user/businessanalytics/blogs/steven-macko/2019/07/11/telco-customer-churn-1113

  3. Data from: Telco Customer Churn

    • kaggle.com
    Updated Mar 11, 2025
    Cite
    Mustafa OZ (2025). Telco Customer Churn [Dataset]. https://www.kaggle.com/datasets/mustafaoz158/telco-customer-churn/discussion?sort=undefined
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mustafa OZ
    Description

    📌 Dataset Story

    Telco churn data includes information about a fictitious telecom company that provided home phone and internet services to 7,043 customers in California in the third quarter. It shows which customers left, stayed, or signed up for their service.

    • 🆔 CustomerId: Customer ID
    • 👫 Gender: Gender
    • 👵 SeniorCitizen: Whether the customer is elderly (1, 0)
    • 👫 Partner: Whether the customer has a partner (Yes, No)
    • 👨‍👨‍👧‍👧 Dependents: Whether the customer has dependents (Yes, No)
    • 📜 Tenure: Number of months the customer has been with the company
    • ☎️ PhoneService: Whether the customer has phone service (Yes, No)
    • 📞 MultipleLines: Whether the customer has more than one line (Yes, No, No phone service)
    • 💻 InternetService: The customer's internet service type (DSL, Fiber optic, No)
    • ㊙️ OnlineSecurity: Whether the customer has online security (Yes, No, No internet service)
    • ◀️ OnlineBackup: Whether the customer has online backup (Yes, No, No internet service)
    • 🚫 DeviceProtection: Whether the customer has device protection (Yes, No, No internet service)
    • 🧢 TechSupport: Whether the customer has technical support (Yes, No, No internet service)
    • 📺 StreamingTV: Whether the customer has streaming TV (Yes, No, No internet service)
    • 📽️ StreamingMovies: Whether the customer streams movies (Yes, No, No internet service)
    • 🗞️ Contract: The customer's contract term (Month-to-month, One year, Two years)
    • 📰 PaperlessBilling: Whether the customer has paperless billing (Yes, No)
    • 💳 PaymentMethod: The customer's payment method (Electronic check, Postal check, Wire transfer (automatic), Credit card (automatic))
    • 🤑 MonthlyCharges: The amount charged to the customer monthly
    • 💰 TotalCharges: The total amount charged to the customer
    • ❌ Churn: Whether the customer churned (Yes or No)

  4. JB Link Telco Customer Churn

    • kaggle.com
    Updated Dec 8, 2021
    Cite
    João Bandeira (2021). JB Link Telco Customer Churn [Dataset]. https://www.kaggle.com/johnflag/jb-link-telco-customer-churn/metadata
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    João Bandeira
    Description

    This is a customized version of the widely known IBM Telco Customer Churn dataset. I've added a few more columns and modified others in order to make it a little more realistic.

    My customizations are based on the following version: Telco customer churn (11.1.3+)

    Below you may find a fictional business problem I created. You may use it in order to start developing something around this dataset.

    JB Link Customer Churn Problem

    JB Link is a small telecom company located in the state of California that provides phone and internet services to customers in more than 1,000 cities and 1,600 zip codes.

    The company has been in the market for just 6 years and has grown quickly by investing in infrastructure to bring internet and phone networks to regions that had poor or no coverage.

    The company also has a very skilled sales team that consistently performs well at attracting new customers. The new customers acquired in the past quarter represent 15% of the total.

    However, by the end of this same period, only 43% of these customers stayed with the company, and most of them decided not to renew their contracts after a few months, meaning the customer churn rate is very high and the company now faces a big challenge in retaining its customers.

    The total customer churn rate last quarter was around 27%, resulting in a decrease of almost 12% in the total number of customers.
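The figures in this scenario fit together arithmetically if the 15% acquisition share and the 27% churn rate are both measured against the same customer base, which is an assumption rather than something the description states. A small sketch of that check, with a hypothetical helper name:

```python
def net_customer_change(acquired_share, churn_rate):
    """Net fractional change in the customer base over one period.

    Assumes both inputs are fractions of the same starting customer base.
    """
    return acquired_share - churn_rate


# 15% of customers newly acquired, 27% of customers lost in the quarter.
change = net_customer_change(0.15, 0.27)
print(f"Net change in customers: {change:.0%}")
```

Under that assumption the net change is about minus 12%, in line with the "decrease of almost 12%" quoted in the description.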

    The executive leadership of JB Link is aware that some competitors are investing in new technologies and in the expansion of their network coverage, and they believe this is one of the main drivers of the high customer churn rate.

    Therefore, as an action plan, they have decided to create a task force inside the company that will be responsible for working on a customer retention strategy.

    The task force will involve members from different areas of the company, including Sales, Finance, Marketing, Customer Service, Tech Support, and a recently formed Data Science team.

    The Data Science team will play a key role in this process and was assigned some very important tasks that will support the decisions and actions the other teams will be taking:

    • Gather insights from the data to understand what is driving the high customer churn rate.
    • Develop a Machine Learning model that can accurately predict the customers that are more likely to churn.
    • Prescribe customized actions that could be taken in order to retain each of those customers.

    The Data Science team was given a dataset with a random sample of 7,043 customers that can help in achieving this task.

    The executives are aware that the cost of acquiring a new customer can be up to five times higher than the cost of retaining a customer, so they expect that the results of this project will save the company a lot of money and make it start growing again.

  5. Data from: telco-customer-churn

    • kaggle.com
    Updated Feb 25, 2022
    Cite
    wakibia (2022). telco-customer-churn [Dataset]. https://www.kaggle.com/datasets/machariawaks/telcocustomerchurn/discussion
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    wakibia
    Description

    Dataset

    This dataset was created by wakibia


  6. Data from: Telco-Customer-Churn

    • kaggle.com
    Updated Mar 1, 2024
    Cite
    John Wilken Christoper (2024). Telco-Customer-Churn [Dataset]. https://www.kaggle.com/datasets/johnwilkenchristoper/telco-customer-churn/suggestions
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    John Wilken Christoper
    Description

    Dataset

    This dataset was created by John Wilken Christoper


  7. Data from: Telco-Customer-Churn

    • kaggle.com
    Updated May 27, 2023
    Cite
    Ranamalla Nithin Reddy (2023). Telco-Customer-Churn [Dataset]. https://www.kaggle.com/datasets/nithinreddy90/wa-fn-usec-telco-customer-churn/suggestions
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ranamalla Nithin Reddy
    Description

    Strategies and Solutions" is a detailed exploration of the persistent challenge of customer churn within the telecommunications industry. This resource delves into the factors that drive customer attrition in telecom, including pricing, customer service, competition, and technological advancements, emphasizing the significance of customer retention.

    This comprehensive guide offers a wide array of strategies and innovative solutions for telecom companies to reduce churn rates and enhance customer satisfaction. Whether you are a telecom professional, business owner, or interested in the industry, this publication provides valuable insights and actionable recommendations to address this critical issue. "Understanding Telco Customer Churn" is an indispensable resource for those seeking to improve customer relationships and overall business success in the telecommunications sector.

  8. Telco Customer Churn Prediction notebook

    • kaggle.com
    Updated Mar 14, 2025
    Cite
    Ramasubramanian M (2025). Telco Customer Churn Prediction notebook [Dataset]. https://www.kaggle.com/datasets/ramasub78/telco-customer-churn-prediction-notebook
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ramasubramanian M
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Ramasubramanian M

    Released under MIT


  9. The Telco Churn.xls

    • kaggle.com
    Updated Jun 10, 2024
    Cite
    Vincent Were (2024). The Telco Churn.xls [Dataset]. https://www.kaggle.com/datasets/wereouma/the-telco-churn-xls
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Vincent Were
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    The Dataset

    About the Customer churn dataset

    To start out, you'll be working with real data from the Kenyan Telecommunication survey. This dataset, "Churn.xls", is related to customer churn analysis for a telecommunications company. Customer churn refers to the phenomenon where customers stop doing business with a company. The dataset includes various attributes of customers and their usage patterns, which are typically used to predict whether a customer is likely to leave the service (churn) or stay. Here is a brief description of the variables provided in the dataset:

    1. ID: A unique identifier for each customer.
    2. COLLEGE: Indicates whether the customer has a college degree ("one" for yes, "zero" for no).
    3. INCOME: The annual income of the customer.
    4. OVERAGE: The number of overage minutes the customer used.
    5. LEFTOVER: The number of leftover minutes the customer has.
    6. HOUSE: The value of the customer's house.
    7. HANDSET_PRICE: The price of the customer's handset.
    8. OVER_15MINS_CALLS_PER_MONTH: The number of calls per month that exceed 15 minutes.
    9. AVERAGE_CALL_DURATION: The average duration of calls made by the customer.
    10. REPORTED_SATISFACTION: The customer's reported level of satisfaction with the service (e.g., "unsat", "very_sat").
    11. REPORTED_USAGE_LEVEL: The customer's reported usage level of the service (e.g., "little", "very_high").
    12. CONSIDERING_CHANGE_OF_PLAN: Indicates whether the customer is considering changing their plan (e.g., "no", "considering").
    13. LEAVE: The target variable indicating whether the customer decided to leave ("LEAVE") or stay ("STAY"). Customers who left within the last month are recorded in this column.

    Based on these variables, the dataset will be used for predictive modeling to identify factors that influence customer churn and to develop strategies to retain customers. The variables cover demographic information, usage patterns, customer satisfaction, and the likelihood of changing plans, all of which are crucial in understanding and predicting churn behavior.
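Two of the variables described above are string-coded binaries (COLLEGE as "one"/"zero", LEAVE as "LEAVE"/"STAY"), so a model would usually need them mapped to numbers first. A minimal sketch of that mapping follows. The record values are made up for illustration, and `encode_record` is a hypothetical helper, not part of the dataset.

```python
# Mappings taken from the variable descriptions: "one"/"zero" and "LEAVE"/"STAY".
COLLEGE_MAP = {"one": 1, "zero": 0}
LEAVE_MAP = {"LEAVE": 1, "STAY": 0}


def encode_record(record):
    """Return a copy of the record with COLLEGE and LEAVE encoded as 0/1.

    A KeyError is raised for any value outside the documented codes,
    which surfaces unexpected categories instead of hiding them.
    """
    encoded = dict(record)
    encoded["COLLEGE"] = COLLEGE_MAP[record["COLLEGE"]]
    encoded["LEAVE"] = LEAVE_MAP[record["LEAVE"]]
    return encoded


# Made-up example row; the real values come from Churn.xls.
sample = {"ID": 1, "COLLEGE": "one", "INCOME": 40000, "LEAVE": "STAY"}
encoded = encode_record(sample)
```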

    Why Analysis?

    Customer churn refers to the phenomenon where customers discontinue their relationship or subscription with a company or service provider. It represents the rate at which customers stop using a company's products or services within a specific period. Churn is an important metric for businesses as it directly impacts revenue, growth, and customer retention. In the context of the Churn dataset, the churn label indicates whether a customer has churned or not. A churned customer is one who has decided to discontinue their subscription or usage of the company's services. On the other hand, a non-churned customer is one who continues to remain engaged and retains their relationship with the company. Understanding customer churn is crucial for businesses to identify patterns, factors, and indicators that contribute to customer attrition. By analyzing churn behavior and its associated features, companies can develop strategies to retain existing customers, improve customer satisfaction, and reduce customer turnover. Predictive modeling techniques can also be applied to forecast and proactively address potential churn, enabling companies to take proactive measures to retain at-risk customers.

  10. Real World Customer Churn Dataset

    • kaggle.com
    Updated Oct 24, 2023
    Cite
    Lasal Jayawardena (2023). Real World Customer Churn Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/6787676
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Lasal Jayawardena
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    World
    Description

    60,000+ Real Anonymized Customer Usage Data for Churn Prediction!

    Dataset Information

    • Dataset Name: Real World Customer Churn Dataset in Telco Domain
    • Snapshot Period: January 1, 2023, to March 31, 2023
    • Source: One of the Largest Telco Companies in Sri Lanka
    • Data Anonymization: The Dataset is Anonymized to Protect Customer Privacy.

    Overview

    The "Real World Customer Churn Dataset in Telco Domain" is a comprehensive collection of anonymized data that provides insights into customer behavior and churn prediction within the telecommunications industry.

    Usage Categories

    The dataset contains data on over 60,000 customers across more than 10 distinct usage categories. Some of the key usage categories include:

    • usage_app_youtube_daily: YouTube Traffic in MBs.
    • usage_app_facebook_daily: Facebook Traffic in MBs.
    • usage_app_tiktok_daily: TikTok Traffic in MBs.
    • usage_app_whatsapp_daily: WhatsApp Traffic in MBs.
    • usage_app_helakuru_daily: Helakuru App traffic in MBs.
    • usage_voice_o2o_outgoing: Outgoing call volume in minutes between the same operator.
    • usage_voice_o2op_outgoing: Outgoing call volume in minutes between operator and other operators.
    • usage_voice_o2o_incoming: Incoming call volume in minutes between the same operator.
    • usage_voice_op2o_incoming: Incoming call volume in minutes from other operators to the operator.
    • usage_pack_data: Spend in LKR for data package purchasing.
    • usage_pack_vas: Spend in LKR for value-added service rentals or usage.

    Dataset Files

    The dataset consists of the following key files:

    1. main.csv: An aggregated dataset that compiles usage data from all usage categories, providing a holistic view of customer behavior.
    2. raw_dump folder: The raw data export, preserving the original source data for detailed exploration.
    3. test and train folders: These folders contain customer IDs and corresponding Churn Labels, facilitating model training and testing.
    4. usage_profiles folder: It comprises broken-down data frames for each customer under specific usage categories, allowing in-depth analysis of individual customer behavior within various usage categories.

    Potential Use Cases

    The "Real World Customer Churn Dataset in Telco Domain" offers a range of potential use cases, including:

    • Customer Churn Prediction: Leveraging customer usage patterns to predict and reduce churn.
    • Targeted Marketing: Designing customized marketing campaigns based on customer preferences.
    • Service Quality Enhancement: Identifying areas for service improvement, such as network quality.
    • Revenue Optimization: Maximizing revenue through the analysis of data package spending and value-added service usage.

    Dataset Importance

    This dataset's real-world aspect is of significant importance. It reflects actual customer interactions with a major telecommunications company in Sri Lanka, offering insights that can be directly applied to real-world scenarios. The dataset is sourced from one of the largest telco companies in the country, adding credibility and relevance to the insights it provides.

    Understanding customer churn and usage behavior is pivotal for the telecommunications industry, and this dataset empowers researchers, data scientists, and businesses to gain deeper insights into these aspects.

    Disclaimer

    The dataset is anonymized to protect customer privacy, and all data used is in compliance with privacy regulations and agreements. Users are encouraged to explore and contribute to the "Real World Customer Churn Dataset in Telco Domain."

    Thank you for your valuable contributions to this dataset.

  11. Telecom Churn Dataset

    • kaggle.com
    Updated Jul 5, 2019
    Cite
    Baligh Mnassri (2019). Telecom Churn Dataset [Dataset]. https://www.kaggle.com/mnassrib/telecom-churn-datasets/discussion
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Baligh Mnassri
    Description

    Context

    "Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs."

    Content

    The Orange Telecom's Churn Dataset, which consists of cleaned customer activity data (features), along with a churn label specifying whether a customer canceled the subscription, will be used to develop predictive models. Two datasets are available for download here: churn-80 and churn-20.

    The two sets are from the same batch, but have been split by an 80/20 ratio. As more data is often desirable for developing ML models, let's use the larger set (that is, churn-80) for training and cross-validation purposes, and the smaller set (that is, churn-20) for final testing and model performance evaluation.
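The files here are already split, so no further splitting is needed to follow the description. For readers who want to see how an 80/20 split of one batch could be produced, the following is a minimal sketch. The `split_80_20` helper and the fixed seed are assumptions for illustration, not how the dataset's own split was made.

```python
import random


def split_80_20(records, seed=0):
    """Shuffle a copy of the records and split it into 80% / 20% parts."""
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = (len(shuffled) * 4) // 5
    return shuffled[:cut], shuffled[cut:]


# Integers stand in for customer rows.
train, test = split_80_20(range(100))
```

The larger part would play the role of churn-80 (training and cross-validation) and the smaller part the role of churn-20 (final testing).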

    Inspiration

    To explore this type of model and learn more about the subject.

  12. Telecom Churn Analysis Dataset

    • kaggle.com
    Updated Sep 28, 2023
    Cite
    Geet Mukherjee (2023). Telecom Churn Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/geetmukherjee/telecom-churn-analysis-dataset
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Geet Mukherjee
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset covers the telecom industry and records which customers churned from the service. It consists of 3,333 observations of 21 variables. The task is to predict which customers are going to churn.

    Account.Length: How long the account has been active.

    VMail.Message: Number of voice mail messages sent by the customer.

    Day.Mins: Time spent on day calls.

    Eve.Mins: Time spent on evening calls.

    Night.Mins: Time spent on night calls.

    Intl.Mins: Time spent on international calls.

    Day.Calls: Number of day calls by customers.

    Eve.Calls: Number of evening calls by customers.

    Intl.Calls: Number of international calls.

    Night.Calls: Number of night calls by customer.

    Day.Charge: Charges of Day Calls.

    Night.Charge: Charges of Night Calls.

    Eve.Charge: Charges of evening Calls.

    Intl.Charge: Charges of international calls.

    VMail.Plan: Whether the customer has a voice mail plan.

    State: State in Area of study.

    Phone: Phone number of the customer.

    Area.Code: Area Code of customer.

    Int.l.Plan: Whether the customer has an international plan.

    CustServ.Calls: Number of customer service calls by customer.

    Churn: Whether the customer churned from the telecom service (0 = “Churner”, 1 = “Non-Churner”)
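Note that the coding described for Churn (0 for churners, 1 for non-churners) is the reverse of the more common convention where 1 marks the positive class. If a modeling workflow expects 1 to mean "churned", the label can be flipped first. The following is a small sketch with made-up labels and a hypothetical helper name:

```python
def flip_label(value):
    """Convert the described coding (0 = churner, 1 = non-churner)
    into the more common one (1 = churned, 0 = stayed)."""
    if value not in (0, 1):
        raise ValueError(f"unexpected Churn value: {value!r}")
    return 1 - value


# Made-up labels in the dataset's described coding.
labels = [0, 1, 1, 0]
churned = [flip_label(v) for v in labels]

# With 1 = churned, the mean of the labels is the churn rate.
churn_rate = sum(churned) / len(churned)
```

It is worth confirming the coding against the actual file before flipping, since the description is the only evidence used here.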

  13. Data from: Telecom Customer Churn Dataset

    • kaggle.com
    Updated May 5, 2024
    Cite
    Amal Joseph3377 (2024). Telecom Customer Churn Dataset [Dataset]. https://www.kaggle.com/datasets/amaljoseph3377/telecom-customer-churn-dataset/code
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Amal Joseph3377
    Description

    Dataset

    This dataset was created by Amal Joseph3377


  14. Synthetic Telecom Customer Churn Data

    • kaggle.com
    Updated May 27, 2025
    Cite
    Abdulrahman Qaten (2025). Synthetic Telecom Customer Churn Data [Dataset]. https://www.kaggle.com/datasets/abdulrahmanqaten/synthetic-customer-churn/suggestions
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Abdulrahman Qaten
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    If you found the dataset useful, your upvote will help others discover it. Thanks for your support!

    This dataset simulates customer behavior for a fictional telecommunications company. It contains demographic information, account details, services subscribed to, and whether the customer ultimately churned (stopped using the service) or not. The data is synthetically generated but designed to reflect realistic patterns often found in telecom churn scenarios.

    Purpose:

    The primary goal of this dataset is to provide a clean and straightforward resource for beginners learning about:

    • Exploratory Data Analysis (EDA): Understanding customer characteristics and identifying potential drivers of churn through visualization and statistical summaries.
    • Data Preprocessing: Handling categorical features (like converting text to numbers) and scaling numerical features.
    • Classification Modeling: Building and evaluating simple machine learning models (like Logistic Regression or Decision Trees) to predict customer churn.

    Features:

    The dataset includes the following columns:

    • CustomerID: Unique identifier for each customer.
    • Age: Customer's age in years.
    • Gender: Customer's gender (Male/Female).
    • Location: General location of the customer (e.g., New York, Los Angeles).
    • SubscriptionDurationMonths: How many months the customer has been subscribed.
    • MonthlyCharges: The amount the customer is charged each month.
    • TotalCharges: The total amount the customer has been charged over their subscription period.
    • ContractType: The type of contract the customer has (Month-to-month, One year, Two year).
    • PaymentMethod: How the customer pays their bill (e.g., Electronic check, Credit card).
    • OnlineSecurity: Whether the customer has online security service (Yes, No, No internet service).
    • TechSupport: Whether the customer has tech support service (Yes, No, No internet service).
    • StreamingTV: Whether the customer has TV streaming service (Yes, No, No internet service).
    • StreamingMovies: Whether the customer has movie streaming service (Yes, No, No internet service).
    • Churn: (Target Variable) Whether the customer churned (1 = Yes, 0 = No).
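The preprocessing steps named under Purpose (converting text categories to numbers and scaling numeric features) can be illustrated with two of the columns listed above. This is a minimal standard-library sketch with made-up values. The helpers `one_hot` and `min_max_scale` are hypothetical, and one-hot encoding with min-max scaling is only one of several reasonable choices.

```python
def one_hot(values):
    """One-hot encode a list of category strings.

    Returns the encoded rows and the sorted category order used for columns.
    """
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


def min_max_scale(values):
    """Rescale numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up values for the ContractType and MonthlyCharges columns.
contract = ["Month-to-month", "Two year", "One year", "Month-to-month"]
monthly = [20.0, 80.0, 50.0, 110.0]

encoded, categories = one_hot(contract)
scaled = min_max_scale(monthly)
```

Tree-based models such as Decision Trees do not need the scaling step, while Logistic Regression often benefits from it.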

    Data Quality:

    This dataset is intentionally clean with no missing values, making it easy for beginners to focus on analysis and modeling concepts without complex data cleaning steps.

    Inspiration:

    Understanding customer churn is crucial for many businesses. This dataset provides a sandbox environment to practice the fundamental techniques used in churn analysis and prediction.

  15. Customer Churn - Decision Tree & Random Forest

    • kaggle.com
    Updated Jul 6, 2023
    Cite
    vikram amin (2023). Customer Churn - Decision Tree & Random Forest [Dataset]. https://www.kaggle.com/datasets/vikramamin/customer-churn-decision-tree-and-random-forest
    Dataset provided by
    Kaggle
    Authors
    vikram amin
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description
    • Main objective: Identify which customers will churn and which will not.
    • Methodology: This is a classification problem. We use a decision tree and a random forest to predict the outcome.
    • Steps Involved
    1. Read the data
    2. Check the data types
    3. Convert character vectors to factor vectors, as this is a classification problem
    4. Drop the variable which is not significant for the analysis. We drop "customerID".
    5. Check for missing values. None are found.
    6. Split the data into train and test sets, so the train data is used for building the model and the test data for prediction. We split in an 80-20 ratio (train/test) using the sample function.
    7. Install and load the libraries (rpart, rpart.plot, rattle, RColorBrewer, caret)
    8. Run the decision tree using the rpart function. The dependent variable is Churn, with the 19 other variables as independent variables.
    9. Plot the decision tree
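
The 80-20 split in the steps above can be sketched as follows. This is a Python stand-in for R's sample function, not the author's code; the function name `train_test_split`, the seed, and the toy rows are assumptions made for illustration.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=42):
    """Randomly split rows into train and test lists (80-20 by default)."""
    rng = random.Random(seed)                      # fixed seed keeps the split reproducible
    indices = list(range(len(rows)))
    n_train = round(len(rows) * train_fraction)
    train_idx = set(rng.sample(indices, n_train))  # sample row indices without replacement
    train = [rows[i] for i in indices if i in train_idx]
    test = [rows[i] for i in indices if i not in train_idx]
    return train, test

# Hypothetical rows standing in for the churn table (customerID already dropped)
rows = [{"tenure": t, "Churn": "Yes" if t % 4 == 0 else "No"} for t in range(100)]
train, test = train_test_split(rows)
print(len(train), len(test))  # 80 20
```

Sampling indices rather than rows keeps the two sets disjoint, so no test row leaks into model building.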

    Average customer churn is 27%. Churn can take place if the tenure is >= 7.5 and there is no internet service.

    1. Tune the model
    2. Define the search grid using the expand.grid function
    3. Set up the control parameters for 5-fold cross-validation
    4. When we print the model, we get the best CP = 0.01 and an accuracy of 79.00%
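
The tuning steps above (a search grid scored by 5-fold cross-validation) can be sketched in plain Python. This illustrates the general technique only, not caret's implementation; `k_fold_indices`, `pick_best`, and the `score_fn` callback are hypothetical names. In practice `score_fn` would fit a tree with the given complexity parameter on the training rows and return its accuracy on the held-out rows.

```python
import random

def k_fold_indices(n_rows, k=5, seed=1):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]          # k folds of roughly equal size
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train_idx, test_idx

def pick_best(grid, score_fn, n_rows, k=5):
    """Return the grid value with the highest mean score over the k folds."""
    mean_scores = {}
    for value in grid:
        scores = [score_fn(value, tr, te) for tr, te in k_fold_indices(n_rows, k)]
        mean_scores[value] = sum(scores) / len(scores)
    best = max(mean_scores, key=mean_scores.get)
    return best, mean_scores
```

Every row is held out exactly once, so each candidate value is scored on data it was not fitted to.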

    1. Predict with the model
    2. Find out which variables are most and least significant.

    The most significant variables are Internet Service and Tenure; the least significant are Streaming Movies and Tech Support.

    USE RANDOM FOREST

    1. Load library(randomForest). Here we use the default ntree (500) and the default mtry, which for classification is floor(sqrt(p)), where p is the number of independent variables (mtry = 4 for p = 19).

      The confusion matrix gives an accuracy of 79.27%, marginally higher than the decision tree's 79.00%. The error rate is pretty low when predicting "No" and much higher when predicting "Yes".

    2. Plot the model to show which variables reduce the Gini impurity the most and the least. Total charges and tenure reduce the Gini impurity the most, while phone service has the least impact.
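
For reference, the Gini impurity behind that importance ranking can be computed by hand. The sketch below shows the impurity of a node and the decrease produced by one split; the labels are hypothetical, and a random forest's importance score aggregates such decreases over all trees rather than using a single split.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted

# Hypothetical churn labels for one candidate split
parent = ["Yes"] * 5 + ["No"] * 5
left = ["Yes"] * 4 + ["No"]
right = ["Yes"] + ["No"] * 4
print(gini(parent), round(gini_decrease(parent, left, right), 2))
```

A variable that often produces large decreases like this one ranks high in the plot; one that rarely separates the classes ranks low.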

    1. Predict with the model and create a new data frame showing the actual vs predicted values

    1. Plot the model to find where the OOB (out-of-bag) error stops decreasing or becomes constant. The error stops decreasing between 100 and 200 trees, so we take ntree = 200 when we tune the model.

    Tune the model: mtry = 2 has the lowest OOB error rate.

    Use random forest with mtry = 2 and ntree = 200

    The confusion matrix gives an accuracy of 79.71%, marginally higher than the default model (ntree = 500, mtry = 4) at 79.27% and the decision tree at 79.00%. The error rate is pretty low when predicting "No" and m...
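
The accuracy and per-class error rates quoted from the confusion matrices can be derived as in the sketch below. The counts are hypothetical placeholders, not the figures from the source's screenshots, and `summarize` is an illustrative helper name.

```python
def summarize(confusion):
    """confusion[actual][predicted] holds the count for each class pair."""
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(confusion[c][c] for c in confusion)
    accuracy = correct / total
    # Class error: share of each actual class that was predicted wrongly
    class_error = {
        c: 1 - confusion[c][c] / sum(confusion[c].values()) for c in confusion
    }
    return accuracy, class_error

# Hypothetical counts, NOT taken from the source
confusion = {
    "No": {"No": 90, "Yes": 10},
    "Yes": {"No": 30, "Yes": 20},
}
accuracy, class_error = summarize(confusion)
print(round(accuracy, 3), {c: round(e, 2) for c, e in class_error.items()})
```

With imbalanced classes such as churn, overall accuracy can look acceptable while the error for the minority "Yes" class stays high, which is the pattern the description reports.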

  16. Telecom Customer Churn Prediction

    • kaggle.com
    Updated Apr 28, 2024
    Shiyamaladevi R S (2024). Telecom Customer Churn Prediction [Dataset]. https://www.kaggle.com/shiyamaladevirs/telecom-customer-churn-prediction/discussion
    Dataset updated
    Apr 28, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Shiyamaladevi R S
    Description

    Dataset

    This dataset was created by Shiyamaladevi R S

    Contents

  17. Customer Churn (Telecom company) Dataset

    • kaggle.com
    Updated Jul 7, 2023
    Anjolaoluwa Ajayi (2023). Customer Churn (Telecom company) Dataset [Dataset]. https://www.kaggle.com/datasets/anjolaoluwaajayi/customer-churn-telecom-company-dataset
    Dataset updated
    Jul 7, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Anjolaoluwa Ajayi
    Description

    Dataset

    This dataset was created by Anjolaoluwa Ajayi

    Contents

  18. Telecom Customers

    • kaggle.com
    Updated Sep 3, 2024
    Tarek Muhammed Abdel-Hamid (2024). Telecom Customers [Dataset]. https://www.kaggle.com/datasets/tarekmuhammed/telecom-customers/code
    Dataset updated
    Sep 3, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Tarek Muhammed Abdel-Hamid
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset contains detailed information about customers of a telecom company, including demographics, service subscriptions, billing information, and churn status. It captures key aspects such as whether a customer has phone or internet services, their tenure with the company, usage of additional services like online security, and their chosen payment methods. The dataset is particularly useful for analyzing patterns and factors that contribute to customer churn, helping the company understand and potentially mitigate reasons for customer departure.

  19. DQLab Telco Final

    • kaggle.com
    Updated Mar 9, 2025
    Samuel Robert Ardi Nugraha (2025). DQLab Telco Final [Dataset]. https://www.kaggle.com/datasets/samran98/customer-churn-telco-final
    Dataset updated
    Mar 9, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Samuel Robert Ardi Nugraha
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    PLEASE UPVOTE THIS DATASET IF IT HELPS YOU. FORKS ARE WELCOME.

    BACKGROUND

    DQLab Telco is a telecommunications company with numerous locations all over the world. In order to ensure that customers are not left behind, DQLab Telco has consistently paid attention to the customer experience since its establishment in 2019.

    Even though DQLab Telco is only a little over a year old, many of its customers have already changed their subscriptions to rival companies. By using machine learning, management hopes to lower the number of customers who leave.

    After cleaning the data yesterday, it is now time for us to build the best model to forecast customer churn.

    TASKS & STEPS

    Yesterday, we completed "Cleansing Data" as part of project part 1. You are now expected to develop the appropriate model as a data scientist.

    You will perform "Machine Learning Modeling" in this assignment using data from the previous month, specifically June 2020.

    The steps to take are:
    1. Perform exploratory data analysis.
    2. Pre-process the data.
    3. Build machine learning models.
    4. Select the best model.

  20. Customer_Churn_for_Telecom_Company

    • kaggle.com
    Updated Jan 13, 2025
    Yoseph Endale (2025). Customer_Churn_for_Telecom_Company [Dataset]. https://www.kaggle.com/datasets/yosephendale/customer-churn-for-telecom-company
    Dataset updated
    Jan 13, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Yoseph Endale
    Description

    Dataset

    This dataset was created by Yoseph Endale

    Contents

Soulz (2025). 📊 Telco Customer Churn Dataset [Dataset]. https://www.kaggle.com/datasets/jethwaaatmik/telco-customer-churn-dataset/discussion
Data from: 📊 Telco Customer Churn Dataset

Useful for both Classification and Regression.

Dataset updated
Jul 18, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Soulz
Description

📝 Dataset Description

This dataset contains information about customers of a telecommunications company, including their demographic details, account information, service subscriptions, and churn status. It is a modified version of the popular Telco Churn dataset, curated for exploratory data analysis, machine learning model development, and churn prediction tasks.

The dataset includes simulated missing values in some columns to reflect real-world data issues and support preprocessing and imputation tasks. This makes it especially useful for demonstrating data cleaning techniques and evaluating model robustness.

📂 Files Included

• telco_data_modified.csv: The main dataset with 21 columns and 7043 rows (some missing values are intentionally inserted).

📌 Features

• customerID: Unique identifier for each customer
• gender: Customer gender (Male/Female)
• SeniorCitizen: Indicates if the customer is a senior citizen (0 = No, 1 = Yes)
• Partner: Whether the customer has a partner
• Dependents: Whether the customer has dependents
• tenure: Number of months the customer has stayed with the company
• PhoneService: Whether the customer has phone service
• MultipleLines: Whether the customer has multiple lines
• InternetService: Customer's internet service provider (DSL, Fiber optic, No)
• OnlineSecurity: Whether the customer has online security
• OnlineBackup: Whether the customer has online backup
• DeviceProtection: Whether the customer has device protection
• TechSupport: Whether the customer has tech support
• StreamingTV: Whether the customer has streaming TV
• StreamingMovies: Whether the customer has streaming movies
• Contract: Type of contract (Month-to-month, One year, Two year)
• PaperlessBilling: Whether the customer uses paperless billing
• PaymentMethod: Payment method (e.g., Electronic check, Mailed check)
• MonthlyCharges: Monthly charges
• TotalCharges: Total charges to date
• Churn: Whether the customer has left the company (Yes/No)

🔍 Use Cases

• Binary classification: Predict customer churn
• Data preprocessing and imputation exercises
• Feature engineering and importance analysis
• Customer segmentation and churn modeling

⚠️ Notes

Missing values were intentionally inserted in the dataset to help simulate real-world conditions.

Some preprocessing may be required before modeling (e.g., converting categorical to numerical data, handling TotalCharges as numeric).
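
A minimal sketch of the preprocessing mentioned above, using only the Python standard library. The three-row excerpt is hypothetical (the real file is telco_data_modified.csv), and median imputation is just one simple option, not a method prescribed by the dataset author.

```python
import csv
import io
from statistics import median

# Hypothetical excerpt with one blank TotalCharges cell
sample = io.StringIO(
    "customerID,TotalCharges,Churn\n"
    "A1,29.85,No\n"
    "A2, ,Yes\n"
    "A3,108.15,Yes\n"
)

def to_float(text):
    """Return a float, or None when the cell is blank or not numeric."""
    try:
        return float(text)
    except ValueError:
        return None

rows = list(csv.DictReader(sample))
for row in rows:
    row["TotalCharges"] = to_float(row["TotalCharges"].strip())
    row["Churn"] = 1 if row["Churn"] == "Yes" else 0   # encode the target as 0/1

# Fill missing TotalCharges with the median of the observed values (an assumption)
fill = median(r["TotalCharges"] for r in rows if r["TotalCharges"] is not None)
for row in rows:
    if row["TotalCharges"] is None:
        row["TotalCharges"] = fill
```

If the target column itself can be missing, those rows would need separate handling, since this encoding maps anything other than "Yes" to 0.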

🏷️ Tags

#telecom #churn #classification #customer-analytics #data-cleaning #feature-engineering

🙏 Acknowledgements

This dataset is based on the original Telco Customer Churn dataset (initially provided by IBM). The current version has been modified for academic and practical exercises.
