100+ datasets found
  1. Bank Customer Churn Dataset

    • kaggle.com
    Updated Jul 11, 2023
    Cite
    Bhuvi Ranga (2023). Bank Customer Churn Dataset [Dataset]. https://www.kaggle.com/datasets/bhuviranga/customer-churn-data
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 11, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Bhuvi Ranga
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    The customer churn dataset is a collection of customer data that focuses on predicting customer churn, which refers to the tendency of customers to stop using a company's products or services. The dataset contains various features that describe each customer, such as their credit score, country, gender, age, tenure, balance, number of products, credit card status, active membership, estimated salary, and churn status. The churn status indicates whether a customer has churned or not. The dataset is used to analyze and understand factors that contribute to customer churn and to build predictive models to identify customers at risk of churning. The goal is to develop strategies and interventions to reduce churn and improve customer retention.

  2. Data from: Customer Churn Dataset

    • ieee-dataport.org
    Updated Jun 4, 2024
    Cite
    Usman JOY (2024). Customer Churn Dataset [Dataset]. https://ieee-dataport.org/documents/customer-churn-dataset
    Dataset updated
    Jun 4, 2024
    Authors
    Usman JOY
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    259

  3. Customer Churn Analysis Software Market Size, Share & Trends Analysis 2033

    • marketresearchintellect.com
    Updated Jul 7, 2025
    Cite
    Market Research Intellect (2025). Customer Churn Analysis Software Market Size, Share & Trends Analysis 2033 [Dataset]. https://www.marketresearchintellect.com/product/customer-churn-analysis-software-market/
    Dataset updated
    Jul 7, 2025
    Dataset authored and provided by
    Market Research Intellect
    License

    https://www.marketresearchintellect.com/privacy-policy

    Area covered
    Global
    Description

    Dive into Market Research Intellect's Customer Churn Analysis Software Market Report, valued at USD 2.1 billion in 2024, and forecast to reach USD 4.8 billion by 2033, growing at a CAGR of 10.2% from 2026 to 2033.
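
    As a rough arithmetic check on the figures above, the compound annual growth rate implied by two values n years apart is (end / start)^(1/n) - 1. The sketch below applies it to the stated 2024 and 2033 values. The 9-year span is an assumption: the report quotes its 10.2% CAGR for 2026 to 2033 and gives no 2026 value here.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the report summary, in USD billions.
value_2024 = 2.1
value_2033 = 4.8

# Assumption: measure growth over the full 2024-2033 span (9 years).
implied = cagr(value_2024, value_2033, 2033 - 2024)
print(f"Implied CAGR 2024-2033: {implied:.1%}")
```

    With these inputs the implied rate comes out somewhat below the quoted 10.2%, which may simply reflect the different base period.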

  4. Data from: Customer Churn Dataset

    • cubig.ai
    Updated May 25, 2025
    Cite
    CUBIG (2025). Customer Churn Dataset [Dataset]. https://cubig.ai/store/products/256/customer-churn-dataset
    Dataset updated
    May 25, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
    Description

    1) Data Introduction
    • The Customer Churn Dataset collects customer characteristics and service usage information to predict whether telecommunications customers will churn.

    2) Data Utilization
    (1) Characteristics of the Customer Churn Dataset:
    • The dataset consists of several categorical and numerical variables, including customer demographics, service types, contract information, charges, usage patterns, and churn.
    (2) The Customer Churn Dataset can be used for:
    • Developing churn prediction models: machine learning and deep learning techniques can be used to build classification models that predict churn from customer characteristics and service usage data.
    • Segmenting customers and developing marketing strategies: the data can be used to analyze customer groups at high risk of churning and to design tailored retention strategies or targeted marketing campaigns.

  5. Customer Churn - Decision Tree & Random Forest

    • kaggle.com
    Updated Jul 6, 2023
    Cite
    vikram amin (2023). Customer Churn - Decision Tree & Random Forest [Dataset]. https://www.kaggle.com/datasets/vikramamin/customer-churn-decision-tree-and-random-forest
    Dataset updated
    Jul 6, 2023
    Dataset provided by
    Kaggle
    Authors
    vikram amin
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description
    • Main objective: Identify which customers will churn and which will not.
    • Methodology: This is a classification problem. We use a decision tree and a random forest to predict the outcome.
    • Steps involved:
    1. Read the data.
    2. Check the data types.
    3. Convert character vectors to factors, as this is a classification problem.
    4. Drop the variable that is not significant for the analysis: "customerID".
    5. Check for missing values. None are found.
    6. Split the data into train and test sets so that the model is built on the train data and predictions are made on the test data. We use an 80-20 (train/test) split via the sample function.
    7. Install and load the libraries (rpart, rpart.plot, rattle, RColorBrewer, caret).
    8. Run the decision tree using the rpart function. The dependent variable is Churn, with 19 other independent variables.
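
    The 80-20 split described in the steps above is done in R with the sample function. As a language-neutral illustration, here is a minimal Python standard-library sketch of the same idea; the function name, seed, and stand-in rows are all hypothetical.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle row indices and split the rows into train and test subsets."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


# Hypothetical stand-in rows; a real run would load the churn table instead.
rows = [{"tenure": t, "Churn": "Yes" if t % 4 == 0 else "No"} for t in range(100)]
train, test = train_test_split(rows)
print(len(train), len(test))
```

    Fixing the seed keeps the split reproducible, so later accuracy figures can be compared across runs.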

    9. Plot the decision tree.

    Average customer churn is 27%. Churn can take place if tenure is >= 7.5 and there is no internet service.

    1. Tune the model.
    2. Define the search grid using the expand.grid function.
    3. Set up the control parameters with 5-fold cross-validation.
    4. Printing the model gives the best CP = 0.01 and an accuracy of 79.00%.
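
    The tuning steps above rely on caret in R. The sketch below shows the underlying idea in plain Python under stated assumptions: rows are split into k folds, and each candidate value in the grid is scored by its mean validation accuracy. The `evaluate` callback is hypothetical and stands in for fitting and scoring a real model.

```python
import random


def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train_idx, valid_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i in range(k):
        valid_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, valid_idx


def grid_search(n_rows, grid, evaluate, k=5):
    """Return the grid value with the best mean validation score.

    `evaluate(param, train_idx, valid_idx)` is a hypothetical callback that
    fits a model with `param` on the training rows and returns its accuracy
    on the validation rows.
    """
    mean_scores = {}
    for param in grid:
        scores = [evaluate(param, tr, va) for tr, va in k_fold_indices(n_rows, k)]
        mean_scores[param] = sum(scores) / len(scores)
    return max(mean_scores, key=mean_scores.get), mean_scores


# Show the fold sizes for 10 rows and 5 folds.
for train_idx, valid_idx in k_fold_indices(10, k=5):
    print(len(train_idx), len(valid_idx))
```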

    1. Predict with the model.
    2. Find out which variables are most and least significant.

    The most significant variables are Internet Service and Tenure; the least significant are Streaming Movies and Tech Support.

    USE RANDOM FOREST

    1. Run library(randomForest). Here we use the default ntree (500) and the default mtry, which for classification is floor(sqrt(p)), where p is the number of independent variables (4 for the 19 predictors here).

      Per the confusion matrix, accuracy is 79.27%, marginally higher than the decision tree's 79.00%. The error rate is fairly low when predicting "No" and much higher when predicting "Yes".

    2. Plot the model to show which variables reduce the Gini impurity the most and least. Total charges and tenure reduce the Gini impurity the most, while phone service has the least impact.
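
    For readers unfamiliar with the measure, Gini impurity for a set of labels is 1 minus the sum of squared class proportions, and a split is scored by how much it lowers the weighted impurity of the child nodes. The following is a minimal sketch, not the randomForest implementation; the split shown is hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Weighted drop in Gini impurity when `parent` is split into two children."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted


# Hypothetical node with 27 churners out of 100, echoing the 27% churn rate.
parent = ["Yes"] * 27 + ["No"] * 73
# Hypothetical split, for illustration only.
left = ["Yes"] * 20 + ["No"] * 10
right = ["Yes"] * 7 + ["No"] * 63
print(round(gini_impurity(parent), 4))
print(round(impurity_decrease(parent, left, right), 4))
```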

    3. Predict with the model and create a new data frame showing the actual vs. predicted values.

    4. Plot the model to find where the OOB (out-of-bag) error stops decreasing or becomes constant. The error stops decreasing between 100 and 200 trees, so we take ntree = 200 when tuning the model.

    Tune the model: mtry = 2 has the lowest OOB error rate.

    Use random forest with mtry = 2 and ntree = 200.

    Per the confusion matrix, accuracy is 79.71%, marginally higher than the default model (ntree = 500, mtry = 4) at 79.27% and the decision tree at 79.00%. The error rate is pretty low when predicting "No" and m...
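
    The accuracy and per-class error rates quoted in this description can all be read off a binary confusion matrix. The sketch below shows the arithmetic, assuming errors are measured per actual class; the counts are hypothetical and are not taken from the notebook.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Accuracy and per-class error rates from binary confusion-matrix counts.

    "Yes" (churned) is treated as the positive class.
    """
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "error_no": fp / (tn + fp),   # actual "No" predicted as "Yes"
        "error_yes": fn / (fn + tp),  # actual "Yes" predicted as "No"
    }


# Hypothetical counts for illustration; they are not taken from the notebook.
metrics = confusion_metrics(tn=900, fp=100, fn=180, tp=220)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

    With imbalanced classes, overall accuracy can look acceptable while the error on the minority "Yes" class stays high, which matches the pattern the description reports.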

  6. Synthetic Customer Churn Prediction Dataset

    • opendatabay.com
    Updated May 6, 2025
    Cite
    Opendatabay Labs (2025). Synthetic Customer Churn Prediction Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/5d7ef013-5848-4367-bf3b-2ce359587b43
    Dataset updated
    May 6, 2025
    Dataset provided by
    Buy & Sell Data | Opendatabay - AI & Synthetic Data Marketplace
    Authors
    Opendatabay Labs
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Retail & Consumer Behavior
    Description

    This Synthetic Customer Churn Prediction Dataset has been designed as an educational resource for exploring data science, machine learning, and predictive modelling techniques in a customer retention context. The dataset simulates key attributes relevant to customer churn analysis, such as service usage, contract details, and customer demographics. It allows users to practice data manipulation, visualization, and the development of models to predict churn behaviour in industries like telecommunications, subscription services, or utilities.

    Dataset Features:

    • Customer_Id: Unique identifier for each customer (not included in this dataset for privacy).
    • Gender: Gender of the customer (e.g., "Male," "Female").
    • Partner: Whether the customer has a partner (e.g., "Yes," "No").
    • Dependents: Whether the customer has dependents (e.g., "Yes," "No").
    • Tenure (Months): The number of months the customer has been with the company.
    • PhoneService: Whether the customer has a phone service (e.g., "Yes," "No").
    • MultipleLines: Whether the customer has multiple phone lines (e.g., "Yes," "No phone service").
    • InternetService: Type of internet service (e.g., "DSL," "Fiber optic," "No").
    • OnlineSecurity: Whether the customer has online security services (e.g., "Yes," "No," "No internet service").
    • OnlineBackup: Whether the customer has online backup services (e.g., "Yes," "No," "No internet service").
    • DeviceProtection: Whether the customer has device protection services (e.g., "Yes," "No," "No internet service").
    • TechSupport: Whether the customer has tech support services (e.g., "Yes," "No," "No internet service").
    • StreamingTV: Whether the customer has streaming TV services (e.g., "Yes," "No," "No internet service").
    • StreamingMovies: Whether the customer has streaming movies services (e.g., "Yes," "No," "No internet service").
    • Contract: Type of contract the customer has (e.g., "Month-to-month," "One year," "Two year").
    • PaperlessBilling: Whether the customer uses paperless billing (e.g., "Yes," "No").
    • PaymentMethod: The payment method used by the customer (e.g., "Electronic check," "Credit card," "Bank transfer").
    • MonthlyCharges: Monthly charges billed to the customer.
    • TotalCharges: Total charges incurred by the customer over their tenure.
    • Churn: Whether the customer has churned (e.g., "Yes," "No").

    Distribution:

    [Figure: Synthetic Customer Churn Prediction Dataset Distribution]

    Usage:

    This dataset is useful for a variety of applications, including:

    • Customer Behavior Analysis: To understand factors influencing customer retention and churn.
    • Educational Training: To practice data cleaning, feature engineering, and visualization techniques in customer analytics.
    • Predictive Modeling: To build machine learning models for predicting customer churn based on service usage patterns and demographic information.

    Coverage:

    This dataset is synthetic and anonymized, making it a safe tool for experimentation and learning without compromising real customer privacy.

    License:

    CC0 (Public Domain)

    Who can use it:

    • Data scientists and enthusiasts: For developing customer analytics skills and predictive modelling expertise.
    • Business analysts: To understand customer churn drivers and improve retention strategies.
    • Educators and students: For teaching and learning applications in data science and machine learning.
  7. Data from: Telco Customer Churn Dataset

    • cubig.ai
    Updated May 28, 2025
    Cite
    CUBIG (2025). Telco Customer Churn Dataset [Dataset]. https://cubig.ai/store/products/312/telco-customer-churn-dataset
    Dataset updated
    May 28, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction
    • The Telco Customer Churn Dataset includes telecom carrier customers' service usage, account information, demographics, and churn status, which can be used to predict and analyze customer churn.

    2) Data Utilization
    (1) Characteristics of the Telco Customer Churn Dataset:
    • The dataset includes a variety of customer and service characteristics, including gender, age group, partner and dependents, service subscription status (telephone, Internet, security, backup, device protection, technical support, streaming, etc.), contract type, payment method, monthly fee, total fee, and churn.
    (2) The Telco Customer Churn Dataset can be used for:
    • Developing churn prediction models: using customer service usage patterns and account information, a machine-learning churn prediction model can be built to proactively identify customers at risk of churning.

  8. Bank customer churn model prediction

    • kaggle.com
    Updated Jul 13, 2023
    Cite
    T S S ABHI RAM KOTIPALLI (2023). Bank customer churn model prediction [Dataset]. https://www.kaggle.com/datasets/tssabhiramkotipalli/bank-customer-churn-model-prediction/code
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    T S S ABHI RAM KOTIPALLI
    Description

    Dataset

    This dataset was created by T S S ABHI RAM KOTIPALLI

    Contents

  9. Comparison of the proposed algorithms with other ensemble models.

    • plos.figshare.com
    xls
    Updated Jun 6, 2024
    Cite
    Kaveh Faraji Googerdchi; Shahrokh Asadi; Seyed Mohammadbagher Jafari (2024). Comparison of the proposed algorithms with other ensemble models. [Dataset]. http://doi.org/10.1371/journal.pone.0303881.t009
    Dataset updated
    Jun 6, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Kaveh Faraji Googerdchi; Shahrokh Asadi; Seyed Mohammadbagher Jafari
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of the proposed algorithms with other ensemble models.

  10. Details of feature variables of the data set.

    • plos.figshare.com
    xls
    Updated Dec 8, 2023
    Cite
    Ke Peng; Yan Peng; Wenguang Li (2023). Details of feature variables of the data set. [Dataset]. http://doi.org/10.1371/journal.pone.0289724.t002
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ke Peng; Yan Peng; Wenguang Li
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In recent years, with the continuous improvement of the financial system and the rapid development of the banking industry, the competition of the banking industry itself has intensified. At the same time, with the rapid development of information technology and Internet technology, customers’ choice of financial products is becoming more and more diversified, and customers’ dependence and loyalty to banking institutions is becoming less and less, and the problem of customer churn in commercial banks is becoming more and more prominent. How to predict customer behavior and retain existing customers has become a major challenge for banks to solve. Therefore, this study takes a bank’s business data on Kaggle platform as the research object, uses multiple sampling methods to compare the data for balancing, constructs a bank customer churn prediction model for churn identification by GA-XGBoost, and conducts interpretability analysis on the GA-XGBoost model to provide decision support and suggestions for the banking industry to prevent customer churn. The results show that: (1) The applied SMOTEENN is more effective than SMOTE and ADASYN in dealing with the imbalance of banking data. (2) The F1 and AUC values of the model improved and optimized by XGBoost using genetic algorithm can reach 90% and 99%, respectively, which are optimal compared to other six machine learning models. The GA-XGBoost classifier was identified as the best solution for the customer churn problem. (3) Using Shapley values, we explain how each feature affects the model results, and analyze the features that have a high impact on the model prediction, such as the total number of transactions in the past year, the amount of transactions in the past year, the number of products owned by customers, and the total sales balance. 
The contribution of this paper is mainly in two aspects: (1) this study can provide useful information from the black box model based on the accurate identification of churned customers, which can provide reference for commercial banks to improve their service quality and retain customers; (2) it can provide reference for customer churn early warning models of other related industries, which can help the banking industry to maintain customer stability, maintain market position and reduce corporate losses.

  11. Bank customer churn model

    • kaggle.com
    Updated Dec 30, 2020
    Cite
    Vamshi (2020). Bank customer churn model [Dataset]. https://www.kaggle.com/datasets/rudravamshi/bank-customer-churn-model/code
    Dataset updated
    Dec 30, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Vamshi
    Description

    Dataset

    This dataset was created by Vamshi

    Contents

  12. The number of correct predictions in each class.

    • plos.figshare.com
    xls
    Updated Jun 6, 2024
    Cite
    Kaveh Faraji Googerdchi; Shahrokh Asadi; Seyed Mohammadbagher Jafari (2024). The number of correct predictions in each class. [Dataset]. http://doi.org/10.1371/journal.pone.0303881.t004
    Dataset updated
    Jun 6, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Kaveh Faraji Googerdchi; Shahrokh Asadi; Seyed Mohammadbagher Jafari
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Customer churn prediction is vital for organizations to mitigate costs and foster growth. Ensemble learning models are commonly used for churn prediction. Diversity and prediction performance are two essential principles for constructing ensemble classifiers. Therefore, developing accurate ensemble learning models consisting of diverse base classifiers is a considerable challenge in this area. In this study, we propose two multi-objective evolutionary ensemble learning models based on clustering (MOEECs), which include a novel diversity measure. Also, to overcome the data imbalance problem, another objective function is presented in the second model to evaluate ensemble performance. The proposed models in this paper are evaluated with a dataset collected from a mobile operator database. Our first model, MOEEC-1, achieves an accuracy of 97.30% and an AUC of 93.76%, outperforming classical classifiers and other ensemble models. Similarly, MOEEC-2 attains an accuracy of 96.35% and an AUC of 94.89%, showcasing its effectiveness in churn prediction. Furthermore, comparison with previous churn models reveals that MOEEC-1 and MOEEC-2 exhibit superior performance in accuracy, precision, and F-score. Overall, our proposed MOEECs demonstrate significant advancements in churn prediction accuracy and outperform existing models in terms of key performance metrics. These findings underscore the efficacy of our approach in addressing the challenges of customer churn prediction and its potential for practical application in organizational decision-making.

  13. Synthetic Telecom Customer Churn Data

    • kaggle.com
    Updated May 27, 2025
    Cite
    Abdulrahman Qaten (2025). Synthetic Telecom Customer Churn Data [Dataset]. https://www.kaggle.com/datasets/abdulrahmanqaten/synthetic-customer-churn/suggestions
    Dataset updated
    May 27, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Abdulrahman Qaten
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    If you found the dataset useful, your upvote will help others discover it. Thanks for your support!

    This dataset simulates customer behavior for a fictional telecommunications company. It contains demographic information, account details, services subscribed to, and whether the customer ultimately churned (stopped using the service) or not. The data is synthetically generated but designed to reflect realistic patterns often found in telecom churn scenarios.

    Purpose:

    The primary goal of this dataset is to provide a clean and straightforward resource for beginners learning about:

    • Exploratory Data Analysis (EDA): Understanding customer characteristics and identifying potential drivers of churn through visualization and statistical summaries.
    • Data Preprocessing: Handling categorical features (like converting text to numbers) and scaling numerical features.
    • Classification Modeling: Building and evaluating simple machine learning models (like Logistic Regression or Decision Trees) to predict customer churn.
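
    The preprocessing bullet above mentions converting text to numbers and scaling numerical features. A minimal standard-library sketch of both follows, assuming one-hot encoding and min-max scaling as the chosen techniques (the description does not name specific ones); the sample values are hypothetical.

```python
def one_hot(values):
    """Map each categorical value to a 0/1 indicator vector."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories


def min_max_scale(values):
    """Rescale numeric values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical values for two of the dataset's columns.
contract_type = ["Month-to-month", "One year", "Two year", "Month-to-month"]
monthly_charges = [20.0, 55.0, 90.0, 70.0]

encoded, categories = one_hot(contract_type)
scaled = min_max_scale(monthly_charges)
print(categories)
print(encoded)
print(scaled)
```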

    Features:

    The dataset includes the following columns:

    • CustomerID: Unique identifier for each customer.
    • Age: Customer's age in years.
    • Gender: Customer's gender (Male/Female).
    • Location: General location of the customer (e.g., New York, Los Angeles).
    • SubscriptionDurationMonths: How many months the customer has been subscribed.
    • MonthlyCharges: The amount the customer is charged each month.
    • TotalCharges: The total amount the customer has been charged over their subscription period.
    • ContractType: The type of contract the customer has (Month-to-month, One year, Two year).
    • PaymentMethod: How the customer pays their bill (e.g., Electronic check, Credit card).
    • OnlineSecurity: Whether the customer has online security service (Yes, No, No internet service).
    • TechSupport: Whether the customer has tech support service (Yes, No, No internet service).
    • StreamingTV: Whether the customer has TV streaming service (Yes, No, No internet service).
    • StreamingMovies: Whether the customer has movie streaming service (Yes, No, No internet service).
    • Churn: (Target Variable) Whether the customer churned (1 = Yes, 0 = No).

    Data Quality:

    This dataset is intentionally clean with no missing values, making it easy for beginners to focus on analysis and modeling concepts without complex data cleaning steps.

    Inspiration:

    Understanding customer churn is crucial for many businesses. This dataset provides a sandbox environment to practice the fundamental techniques used in churn analysis and prediction.

  14. ‘Customer Churn’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Mar 5, 2018
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2018). ‘Customer Churn’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-customer-churn-4f0b/a31eb722/?iid=005-065&v=presentation
    Dataset updated
    Mar 5, 2018
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Customer Churn’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/hassanamin/customer-churn on 14 February 2022.

    --- Dataset description provided by original source is as follows ---

    Binary Customer Churn

    A marketing agency has many customers that use their service to produce ads for the client/customer websites. They've noticed that they have quite a bit of churn in clients. They basically randomly assign account managers right now, but want you to create a machine learning model that will help predict which customers will churn (stop buying their service) so that they can assign an account manager to the customers most at risk of churning. Luckily they have some historical data, can you help them out? Create a classification algorithm that will help classify whether or not a customer churned. Then the company can test this against incoming data for future customers to predict which customers will churn and assign them an account manager.

    Content

    The data is saved as customer_churn.csv. Here are the fields and their definitions:

    Name : Name of the latest contact at Company

    Age: Customer Age

    Total_Purchase: Total Ads Purchased

    Account_Manager: Binary 0=No manager, 1= Account manager assigned

    Years: Total Years as a customer

    Num_sites: Number of websites that use the service.

    Onboard_date: Date that the name of the latest contact was onboarded

    Location: Client HQ Address

    Company: Name of Client Company

    Once you've created the model and evaluated it, test out the model on some new data (you can think of this almost like a hold-out set) that your client has provided, saved under new_customers.csv. The client wants to know which customers are most likely to churn given this data (they don't have the label yet).
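
    As a starting point for working with the fields listed above, the sketch below reads rows in that layout with Python's csv module and converts the numeric columns. The sample rows, the date format, and the choice to parse every numeric field as a float are assumptions for illustration; a real run would open customer_churn.csv instead.

```python
import csv
import io

# Hypothetical sample in the layout described above; a real run would open
# customer_churn.csv instead of this in-memory text.
SAMPLE = """Name,Age,Total_Purchase,Account_Manager,Years,Num_sites,Onboard_date,Location,Company
Contact A,42,11066.80,0,7.22,8,2013-08-30 07:00:40,Address A,Company A
Contact B,41,11916.22,1,6.50,11,2013-08-13 00:38:46,Address B,Company B
"""

NUMERIC_FIELDS = ("Age", "Total_Purchase", "Account_Manager", "Years", "Num_sites")


def load_customers(handle):
    """Read customer rows and convert the numeric fields to floats."""
    rows = []
    for row in csv.DictReader(handle):
        for field in NUMERIC_FIELDS:
            row[field] = float(row[field])
        rows.append(row)
    return rows


customers = load_customers(io.StringIO(SAMPLE))
print(len(customers), customers[0]["Years"])
```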

    Acknowledgements

    We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

    --- Original source retains full ownership of the source dataset ---

  15. Telco_Customer_churn_Data

    • test.researchdata.tuwien.at
    bin, csv, png
    Updated Apr 28, 2025
    Cite
    Erum Naz (2025). Telco_Customer_churn_Data [Dataset]. http://doi.org/10.82556/b0ch-cn44
    Dataset updated
    Apr 28, 2025
    Dataset provided by
    TU Wien
    Authors
    Erum Naz
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 28, 2025
    Description

    Context and Methodology

    The dataset originates from the research domain of Customer Churn Prediction in the Telecom Industry. It was created as part of the project "Data-Driven Churn Prediction: ML Solutions for the Telecom Industry," completed within the Data Stewardship course (Master programme Data Science, TU Wien).

    The primary purpose of this dataset is to support machine learning model development for predicting customer churn based on customer demographics, service usage, and account information.
    The dataset enables the training, testing, and evaluation of classification algorithms, allowing researchers and practitioners to explore techniques for customer retention optimization.

    The dataset was originally obtained from the IBM Accelerator Catalog and adapted for academic use. It was uploaded to TU Wien’s DBRepo test system and accessed via SQLAlchemy connections to the MariaDB environment.

    Technical Details

    The dataset has a tabular structure and was initially stored in CSV format. It contains:

    • Rows: 7,043 customer records

    • Columns: 21 features including customer attributes (gender, senior citizen status, partner status), account information (tenure, contract type, payment method), service usage (internet service, streaming TV, tech support), and the target variable (Churn: Yes/No).

    Naming Convention:

    • The table in the database is named telco_customer_churn_data.

    Software Requirements:

    • To open and work with the dataset, any standard database client or programming language that supports MariaDB connections can be used (e.g., Python).

    • For machine learning applications, libraries such as pandas, scikit-learn, and joblib are typically used.
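
    The access pattern described above (a table named telco_customer_churn_data queried from Python) can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for the MariaDB connection, the column names (customerID, tenure, Contract, Churn) are assumed from the feature list, and the four rows are made up rather than taken from the dataset.

```python
import sqlite3

# Stand-in for the MariaDB connection: an in-memory SQLite database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE telco_customer_churn_data ("
    "customerID TEXT, tenure INTEGER, Contract TEXT, Churn TEXT)"
)

# Made-up rows for illustration only; they are not taken from the dataset.
rows = [
    ("A-001", 1, "Month-to-month", "Yes"),
    ("A-002", 34, "One year", "No"),
    ("A-003", 2, "Month-to-month", "Yes"),
    ("A-004", 45, "Two year", "No"),
]
conn.executemany(
    "INSERT INTO telco_customer_churn_data VALUES (?, ?, ?, ?)", rows
)


def churn_rate(connection):
    """Return the share of rows whose Churn column equals 'Yes'."""
    total, churned = connection.execute(
        "SELECT COUNT(*), SUM(CASE WHEN Churn = 'Yes' THEN 1 ELSE 0 END) "
        "FROM telco_customer_churn_data"
    ).fetchone()
    return churned / total


rate = churn_rate(conn)
print(f"churn rate in the sample rows: {rate:.2f}")
```

    Against the real repository, the connection object would presumably come from a MariaDB driver or SQLAlchemy engine instead, with credentials and host supplied by the DBRepo instance.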

    Additional Resources:

    Further Details

    When reusing the dataset, users should be aware:

    • Licensing: The dataset is shared under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

    • Use Case Suitability: The dataset is best suited for classification tasks, particularly binary classification (churn vs. no churn).

    • Metadata Standards: Metadata describing the dataset adheres to FAIR principles and is supplemented by CodeMeta and Croissant standards for improved interoperability.

  16. Bank Churn (test)

    • kaggle.com
    Updated Jan 21, 2024
    Cite
    Harshit Sharma (2024). Bank Churn (test) [Dataset]. https://www.kaggle.com/datasets/harshitstark/bank-churn-dataset-test
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 21, 2024
    Dataset provided by
    Kaggle
    Authors
    Harshit Sharma
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Explore the 'Bank Churn (test)' dataset, a comprehensive collection designed for evaluating predictive models and analyzing customer attrition in the banking sector. This test dataset, derived from real-world scenarios, offers a robust platform to assess the effectiveness of machine learning algorithms in predicting and understanding bank churn dynamics.

  17. Global Customer Churn Analysis Software Market Size By Component (Software,...

    • verifiedmarketresearch.com
    pdf,excel,csv,ppt
    Updated Jul 2, 2025
    Cite
    Verified Market Research (2025). Global Customer Churn Analysis Software Market Size By Component (Software, Services), By Deployment Mode (On-Premise, Cloud-Based), By Organization Size (Large Enterprises, Small And Medium Enterprises), By Application (Customer Retention, Customer Experience Management), By Industry Vertical (BFSI, Telecom), By Geographic Scope And Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/customer-churn-analysis-software-market/
    Explore at:
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Jul 2, 2025
    Dataset authored and provided by
    Verified Market Research (https://www.verifiedmarketresearch.com/)
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2026 - 2032
    Area covered
    Global
    Description

    Customer Churn Analysis Software Market size was valued at USD 1.9 Billion in 2024 and is projected to reach USD 8.4 Billion by 2032, growing at a CAGR of 19.80% during the forecast period 2026-2032.

    Global Customer Churn Analysis Software Market Drivers

    The market drivers for the Customer Churn Analysis Software Market can be influenced by various factors. These may include:

    • Customer Retention Methods: As acquiring new customers becomes more expensive, greater emphasis is placed on retaining existing ones. Churn analysis software is used to forecast and reduce turnover, resulting in increased customer lifetime value.

    • An Increase in the Usage of Predictive Analytics and AI Technologies: To examine big data sets, churn prediction technologies now incorporate artificial intelligence and machine learning. Their application allows for more accurate churn forecasting and targeted actions.
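
    As a rough cross-check of the headline market-size figures (USD 1.9 Billion in 2024, USD 8.4 Billion by 2032), the compound annual growth rate formula can be sketched as below. The function name cagr is illustrative. The quoted 19.80% applies to the 2026-2032 forecast period, whose 2026 base value is not given in the description, so the rate computed here over 2024-2032 is not expected to match it exactly.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the description: USD 1.9 billion (2024), USD 8.4 billion (2032).
implied = cagr(1.9, 8.4, 2032 - 2024)
print(f"implied CAGR over 2024-2032: {implied:.1%}")
```

    Over the full 2024-2032 span the implied rate comes out at roughly 20% per year, which is in the same range as the quoted forecast-period figure.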

  18. ‘JB Link Telco Customer Churn’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘JB Link Telco Customer Churn’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-jb-link-telco-customer-churn-742f/5fbf9511/?iid=042-751&v=presentation
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘JB Link Telco Customer Churn’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/johnflag/jb-link-telco-customer-churn on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    This is a customized version of the widely known IBM Telco Customer Churn dataset. I've added a few more columns and modified others in order to make it a little more realistic.

    My customizations are based on the following version: Telco customer churn (11.1.3+)

    Below you may find a fictional business problem I created. You may use it in order to start developing something around this dataset.

    JB Link Customer Churn Problem

    JB Link is a small telecom company located in the state of California that provides Phone and Internet services to customers in more than 1,000 cities and 1,600 zip codes.

    The company has been in the market for just 6 years and has grown quickly by investing in infrastructure to bring internet and phone networks to regions that had poor or no coverage.

    The company also has a very skilled sales team that consistently performs well at attracting new customers. The new customers acquired in the past quarter represent 15% of the total.

    However, by the end of this same period, only 43% of these customers stayed with the company, and most of them decided not to renew their contracts after a few months, meaning the customer churn rate is very high and the company now faces a big challenge in retaining its customers.

    The total customer churn rate last quarter was around 27%, resulting in a decrease of almost 12% in the total number of customers.

    The executive leadership of JB Link is aware that some competitors are investing in new technologies and in the expansion of their network coverage, and they believe this is one of the main drivers of the high customer churn rate.

    Therefore, as an action plan, they have decided to create a task force inside the company that will be responsible for working on a customer retention strategy.

    The task force will involve members from different areas of the company, including Sales, Finance, Marketing, Customer Service, Tech Support and a recently formed Data Science team.

    The data science team will play a key role in this process and was assigned some very important tasks that will support the decisions and actions the other teams will be taking:

    • Gather insights from the data to understand what is driving the high customer churn rate.

    • Develop a Machine Learning model that can accurately predict the customers that are more likely to churn.

    • Prescribe customized actions that could be taken in order to retain each of those customers.

    The Data Science team was given a dataset with a random sample of 7,043 customers that can help in achieving this task.

    The executives are aware that the cost of acquiring a new customer can be up to five times higher than the cost of retaining a customer, so they expect the results of this project to save the company a lot of money and make it start growing again.

    --- Original source retains full ownership of the source dataset ---

  19. S1 Data -

    • plos.figshare.com
    zip
    Updated Oct 11, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yancong Zhou; Wenyue Chen; Xiaochen Sun; Dandan Yang (2023). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0292466.s001
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 11, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Yancong Zhou; Wenyue Chen; Xiaochen Sun; Dandan Yang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analyzing customers’ characteristics and giving early warning of customer churn based on machine learning algorithms can help enterprises provide targeted marketing strategies and personalized services and save substantial operating costs. Data cleaning, oversampling, data standardization and other preprocessing operations were performed in Python on the 900,000 telecom customer personal characteristics and historical behavior data set. Appropriate model parameters were selected to build a BPNN (Back Propagation Neural Network). Random Forest (RF) and Adaboost, two classic ensemble learning models, were introduced, and an Adaboost dual-ensemble learning model with RF as the base learner was put forward. These four models and four other classical machine learning models (decision tree, naive Bayes, K-Nearest Neighbor (KNN), and Support Vector Machine (SVM)) were each used to analyze the customer churn data. The results show that the four models have better performance in terms of recall rate, precision rate, F1 score and other indicators, and that the RF-Adaboost dual-ensemble model has the best performance. The recall rates of BPNN, RF, Adaboost and the RF-Adaboost dual-ensemble model on positive samples are 79%, 90%, 89%, and 93%, respectively; the precision rates are 97%, 99%, 98%, and 99%; and the F1 scores are 87%, 95%, 94%, and 96%. The three indicators of the RF-Adaboost dual-ensemble model are 10%, 1%, and 6% higher than the reference. The prediction results of customer churn provide strong data support for telecom companies to adopt appropriate retention strategies for pre-churn customers and reduce customer churn.
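
    The relationship between the precision, recall and F1 figures quoted above can be illustrated with a short sketch. The F1 score is the harmonic mean of precision and recall; the function name f1_score and the dictionary layout are illustrative choices, and the inputs are the rounded percentages from the description, not the study's raw results.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs in percent, as quoted in the description.
reported = {
    "BPNN": (97, 79),
    "RF": (99, 90),
    "Adaboost": (98, 89),
    "RF-Adaboost": (99, 93),
}

recomputed = {
    name: f1_score(p / 100, r / 100) for name, (p, r) in reported.items()
}
for name, f1 in recomputed.items():
    print(f"{name}: F1 = {f1:.1%}")
```

    Because the inputs are already rounded to whole percentages, the recomputed F1 values can differ from the quoted ones by about one percentage point.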

  20. Customer Churn Analysis

    • kaggle.com
    Updated Jul 1, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Md Maaz (2024). Customer Churn Analysis [Dataset]. https://www.kaggle.com/datasets/mohammadmaaz23/customer-churn-analysis/suggestions
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 1, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Md Maaz
    Description

    Dataset

    This dataset was created by Md Maaz

    Released under Other (specified in description)

    Contents

Bhuvi Ranga (2023). Bank Customer Churn Dataset [Dataset]. https://www.kaggle.com/datasets/bhuviranga/customer-churn-data
Bank Customer Churn Dataset

The customer churn dataset for churn prediction. Predictive Analysis

