54 datasets found
  1. Banking Customer Churn Prediction Dataset

    • kaggle.com
    zip
    Updated May 16, 2024
    Cite
    Saurabh Badole (2024). Banking Customer Churn Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/saurabhbadole/bank-customer-churn-prediction-dataset
    Available download formats: zip (267794 bytes)
    Dataset updated
    May 16, 2024
    Authors
    Saurabh Badole
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    This dataset contains information about bank customers and their churn status, which indicates whether they have exited the bank or not. It is suitable for exploring and analyzing factors influencing customer churn in banking institutions and for building predictive models to identify customers at risk of churning.

    Features:

    RowNumber: The sequential number assigned to each row in the dataset.

    CustomerId: A unique identifier for each customer.

    Surname: The surname of the customer.

    CreditScore: The credit score of the customer.

    Geography: The geographical location of the customer (e.g., country or region).

    Gender: The gender of the customer.

    Age: The age of the customer.

    Tenure: The number of years the customer has been with the bank.

    Balance: The account balance of the customer.

    NumOfProducts: The number of bank products the customer has.

    HasCrCard: Indicates whether the customer has a credit card (binary: yes/no).

    IsActiveMember: Indicates whether the customer is an active member (binary: yes/no).

    EstimatedSalary: The estimated salary of the customer.

    Exited: Indicates whether the customer has exited the bank (binary: yes/no).

    Usage:

    • This dataset can be used for exploratory data analysis to understand the factors influencing customer churn in banks.
    • It can also be used to build machine learning models for predicting customer churn based on the given features.
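
    As a minimal sketch of the exploratory use described above, the snippet below computes a churn rate per Geography value with the Python standard library. The sample rows and region names are made up for illustration, and it assumes Exited is stored as 0/1, which the description above does not confirm.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows that reuse the column names listed above; not real data.
SAMPLE_CSV = """RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
1,9001,Alpha,610,North,Female,40,2,0.0,1,1,1,50000.0,1
2,9002,Beta,580,South,Female,35,1,1200.5,1,0,1,42000.0,0
3,9003,Gamma,505,North,Male,52,8,9800.0,3,1,0,61000.0,1
4,9004,Delta,700,North,Male,29,1,0.0,2,0,0,38000.0,0
5,9005,Epsilon,820,South,Male,44,2,7300.25,1,1,1,55000.0,0
"""


def churn_rate_by(rows, column):
    """Return {group value: share of rows in that group with Exited == 1}."""
    totals = defaultdict(int)
    exited = defaultdict(int)
    for row in rows:
        key = row[column]
        totals[key] += 1
        # Assumption: Exited is encoded as "0" or "1" in the file.
        exited[key] += int(row["Exited"])
    return {key: exited[key] / totals[key] for key in totals}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
rates = churn_rate_by(rows, "Geography")
print(rates)
```

    The same function can group by any other categorical column, for example Gender or NumOfProducts.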

    License:

    This dataset is made available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

  2. Bank Customer Churn Data

    • kaggle.com
    zip
    Updated Nov 3, 2023
    Cite
    Penta Krishna Kishore (2023). Bank Customer Churn Data [Dataset]. https://www.kaggle.com/datasets/pentakrishnakishore/bank-customer-churn-data
    Available download formats: zip (3163011 bytes)
    Dataset updated
    Nov 3, 2023
    Authors
    Penta Krishna Kishore
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The churn prediction dataset contains raw data on 28,382 customers. It includes the following columns:

    • customer_id: Unique identifier for each customer.
    • vintage: The duration of the customer's relationship with the company.
    • age: Age of the customer.
    • gender: Gender of the customer.
    • dependents: Number of dependents the customer has.
    • occupation: The occupation of the customer.
    • city: City in which the customer is located.
    • customer_nw_category: Net worth category of the customer.
    • branch_code: Code identifying the branch associated with the customer.
    • current_balance: Current balance in the customer's account.
    • previous_month_end_balance: Account balance at the end of the previous month.
    • average_monthly_balance_prevQ: Average monthly balance in the previous quarter.
    • average_monthly_balance_prevQ2: Average monthly balance in the second previous quarter.
    • current_month_credit: Credit amount in the current month.
    • previous_month_credit: Credit amount in the previous month.
    • current_month_debit: Debit amount in the current month.
    • previous_month_debit: Debit amount in the previous month.
    • current_month_balance: Account balance in the current month.
    • previous_month_balance: Account balance in the previous month.
    • churn: The target variable indicating whether the customer has churned (1 for churned, 0 for not churned).
    • last_transaction: Timestamp of the customer's last transaction.

    This dataset provides a comprehensive view of various attributes related to the customers' banking activities. With these features, it becomes possible to build predictive models to identify potential churners based on historical and current customer behavior. The dataset's size allows for robust analysis and modeling to improve customer retention strategies.
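
    The balance and transaction columns listed above lend themselves to simple derived features. The sketch below is illustrative only: the feature names balance_change and net_flow_current_month are hypothetical and not part of the dataset, and the example record is made up.

```python
def derived_features(row):
    """Compute two illustrative features from one record (a dict of strings)."""
    current = float(row["current_balance"])
    previous = float(row["previous_month_end_balance"])
    credit = float(row["current_month_credit"])
    debit = float(row["current_month_debit"])
    return {
        # Change in balance since the end of the previous month.
        "balance_change": current - previous,
        # Credits minus debits in the current month.
        "net_flow_current_month": credit - debit,
    }


# Made-up record for illustration only.
example = {
    "current_balance": "1200.50",
    "previous_month_end_balance": "1500.00",
    "current_month_credit": "300.25",
    "current_month_debit": "599.75",
}
features = derived_features(example)
print(features)
```
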
  3. Data from: Bank Customer Churn Prediction

    • kaggle.com
    zip
    Updated Mar 21, 2024
    Cite
    Murilo Zangari (2024). Bank Customer Churn Prediction [Dataset]. https://www.kaggle.com/datasets/murilozangari/customer-churn-from-a-bank
    Available download formats: zip (267794 bytes)
    Dataset updated
    Mar 21, 2024
    Authors
    Murilo Zangari
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The data will be used to predict whether a customer of the bank will churn. If a customer churns, it means they left the bank and took their business elsewhere. If you can predict which customers are likely to churn, you can take measures to retain them before they do. These measures could be promotions, discounts, or other incentives to boost customer satisfaction and, therefore, retention.

    The dataset contains:

    10,000 rows – each row is a unique customer of the bank

    14 columns:

    RowNumber: Row numbers from 1 to 10,000

    CustomerId: Customer’s unique ID assigned by bank

    Surname: Customer’s last name

    CreditScore: Customer’s credit score. This number can range from 300 to 850.

    Geography: Customer’s country of residence

    Gender: Categorical indicator

    Age: Customer’s age (years)

    Tenure: Number of years customer has been with bank

    Balance: Customer’s bank balance (Euros)

    NumOfProducts: Number of products the customer has with the bank

    HasCrCard: Indicates whether the customer has a credit card with the bank

    IsActiveMember: Indicates whether the customer is considered active

    EstimatedSalary: Customer’s estimated annual salary (Euros)

    Exited: Indicates whether the customer churned (left the bank)
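
    The schema above (14 columns, RowNumber from 1 to 10,000, CreditScore from 300 to 850) can be turned into a basic sanity check. This is a sketch under assumptions: it treats each record as a dict of strings, and it assumes the three indicator columns are stored as 0/1, which the description does not state.

```python
EXPECTED_COLUMNS = [
    "RowNumber", "CustomerId", "Surname", "CreditScore", "Geography",
    "Gender", "Age", "Tenure", "Balance", "NumOfProducts", "HasCrCard",
    "IsActiveMember", "EstimatedSalary", "Exited",
]


def validate_row(row):
    """Return a list of problems found in one record (a dict of strings)."""
    missing = [name for name in EXPECTED_COLUMNS if name not in row]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    if not 300 <= int(row["CreditScore"]) <= 850:
        problems.append("CreditScore outside 300-850")
    if not 1 <= int(row["RowNumber"]) <= 10_000:
        problems.append("RowNumber outside 1-10,000")
    # Assumption: indicator columns are encoded as "0" or "1".
    for name in ("HasCrCard", "IsActiveMember", "Exited"):
        if row[name] not in ("0", "1"):
            problems.append(f"{name} is not 0/1")
    return problems


# Made-up records for illustration only.
good = dict.fromkeys(EXPECTED_COLUMNS, "1")
good.update(CreditScore="640", Surname="Example")
bad = dict(good, CreditScore="900")

print(validate_row(good))
print(validate_row(bad))
```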

  4. Bank Customer Churn

    • kaggle.com
    zip
    Updated Aug 8, 2024
    Cite
    Sandile Desmond Mfazi (2024). Bank Customer Churn [Dataset]. https://www.kaggle.com/datasets/sandiledesmondmfazi/bank-customer-churn
    Available download formats: zip (12679114 bytes)
    Dataset updated
    Aug 8, 2024
    Authors
    Sandile Desmond Mfazi
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Botswana Bank Customer Churn Dataset

    Dataset Overview

    This synthetic dataset simulates customer data for a fictional bank in Botswana, specifically designed to model customer churn behavior. It includes a comprehensive set of customer demographics, financial data, product usage, and behavioral indicators that could influence whether a customer decides to leave the bank. The dataset is generated using the Python Faker library, ensuring realistic but entirely fictional data points for educational, testing, and modeling purposes.

    Dataset Highlights

    Number of Records: 115,640 customers
    Churn Rate: Determined by a calculated churn risk score based on several customer attributes
    Geographical Focus: Botswana
    Data Structure: The dataset is organized in a tabular format, with each row representing a unique customer

    Use Cases

    This dataset is ideal for the following applications:

    Churn Prediction Modeling: Building and evaluating machine learning models to predict customer churn.
    Customer Segmentation: Analyzing customer profiles and segmenting them based on various demographics and financial attributes.
    Product Analysis: Understanding which products are most associated with customer retention or churn.
    Educational Purposes: Teaching data science and machine learning concepts using a realistic dataset.

  5. Data from: Bank Customer Churn Prediction

    • kaggle.com
    zip
    Updated Jan 17, 2024
    Cite
    Aarushi Kamboj (2024). Bank Customer Churn Prediction [Dataset]. https://www.kaggle.com/aarushikamboj/bank-customer-churn-prediction
    Available download formats: zip (267815 bytes)
    Dataset updated
    Jan 17, 2024
    Authors
    Aarushi Kamboj
    Description

    Dataset

    This dataset was created by Aarushi Kamboj

    Contents

  6. Bank Customer Attrition Insights

    • kaggle.com
    zip
    Updated Jan 9, 2025
    Cite
    Sagar Maru (2025). Bank Customer Attrition Insights [Dataset]. https://www.kaggle.com/datasets/marusagar/bank-customer-attrition-insights
    Available download formats: zip (314647 bytes)
    Dataset updated
    Jan 9, 2025
    Authors
    Sagar Maru
    Description

    Dataset Overview for XYZ Multistate Bank:

    This dataset is for XYZ Multistate Bank and contains various columns that capture key aspects of customer behavior and attributes. Each column provides valuable insights into the factors influencing customer churn, with the goal of predicting which customers are most likely to leave the bank. Below is an explanation of each column and its relevance to customer retention.

    1. RowNumber:
    The "RowNumber" column corresponds to the unique record number for each customer entry. It has no impact on the outcome of customer churn but is used to identify and organize data within the dataset. Since it doesn't contain any meaningful information related to customer behavior, it is not relevant for churn prediction and can be excluded in analysis.

    2. CustomerId:
    The "CustomerId" column consists of randomly generated identifiers for each customer. While this ID helps to uniquely distinguish each customer, it has no impact on the likelihood of a customer leaving the bank. Because it is an identifier rather than a descriptive attribute, it does not contribute to the analysis of churn and can be omitted when building predictive models.

    3. Surname:
    The "Surname" column holds the last names of customers. Although this information is useful for identification purposes, it does not have a direct relationship with customer churn. Since a customer's surname is not an influencing factor in their decision to stay or leave the bank, it is not considered relevant for churn prediction and can be disregarded.

    4. CreditScore:
    "CreditScore" is an important variable that can significantly affect customer churn. Customers with higher credit scores are generally considered more financially stable and less likely to leave the bank, as they are less likely to face issues with financial institutions. Therefore, this feature can provide valuable insights into customer retention and should be included in churn analysis.

    5. Geography:
    "Geography" refers to the geographical location of the customer, which can influence their likelihood of leaving the bank. Customers living in different regions may have varying experiences with the bank’s services, fees, or offerings, making this an important factor to explore. Understanding regional differences helps tailor retention strategies for specific locations and improve overall customer satisfaction.

    6. Gender:
    "Gender" is an interesting demographic factor to consider in churn prediction. While gender itself may not directly affect the likelihood of a customer leaving, it could correlate with other behavioral patterns or preferences that influence retention. Analyzing gender in combination with other features may reveal potential insights, making it worthwhile to examine as part of the churn model.

    7. Age:
    The "Age" column is a key factor in understanding customer behavior. Typically, older customers are less likely to churn because they tend to be more established with their financial institutions and may have a greater sense of loyalty. In contrast, younger customers may be more likely to switch banks, especially if they are seeking better services or offers. This feature is essential for predicting churn and should be analyzed in detail.

    8. Tenure:
    "Tenure" refers to the number of years a customer has been with the bank. Longer-tenured customers are often more loyal and less likely to leave the bank. The correlation between tenure and churn is strong, as established relationships tend to make customers less susceptible to leaving. This is a critical factor for churn prediction and should be given high consideration when modeling customer retention.

    9. Balance:
    The "Balance" column reflects the amount of money a customer holds in their bank account. Customers with higher balances are typically more invested in the bank and are less likely to leave. In contrast, customers with low balances may be more willing to switch to other financial institutions offering better rates or services. This feature plays a significant role in churn prediction, as financial stakes are directly tied to loyalty.

    10. NumOfProducts:
    "NumOfProducts" refers to the number of products (e.g., savings accounts, loans, credit cards) that a customer has with the bank. Customers with multiple products are usually more invested in the bank, making them less likely to leave. The greater the number of products, the higher the customer's commitment to the bank, making this feature highly relevant in understanding churn patterns and developing retention strategies.

    11. HasCrCard:
    "HasCrCard" indicates whether or not a customer holds a credit card with the bank. Having a credit card typically reduces the likelihood of customer churn, as credit cards are a widely used financial product that locks customers into a long-term relatio...

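    The column notes for entry 6 recommend excluding RowNumber, CustomerId, and Surname from churn analysis. A minimal sketch of that step, assuming each record is held as a plain dict; the record shown is made up for illustration.

```python
# Columns the notes for entry 6 describe as not relevant for churn prediction.
IDENTIFIER_COLUMNS = ("RowNumber", "CustomerId", "Surname")


def drop_identifiers(row):
    """Return a copy of the record without the identifier columns."""
    return {name: value for name, value in row.items()
            if name not in IDENTIFIER_COLUMNS}


# Made-up record for illustration only.
record = {
    "RowNumber": 7,
    "CustomerId": 9007,
    "Surname": "Example",
    "CreditScore": 640,
    "Age": 35,
    "Tenure": 4,
    "NumOfProducts": 2,
    "Exited": 0,
}
model_input = drop_identifiers(record)
print(sorted(model_input))
```
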
  7. Banking_Customer_Churn_Prediction_Dataset

    • huggingface.co
    Updated Nov 20, 2025
    Cite
    Ethan Gabis (2025). Banking_Customer_Churn_Prediction_Dataset [Dataset]. https://huggingface.co/datasets/EthanGabis/Banking_Customer_Churn_Prediction_Dataset
    Dataset updated
    Nov 20, 2025
    Authors
    Ethan Gabis
    Description

    Assignment #1: EDA & Dataset - Bank Customer Churn Analysis

    Overview

    This project presents an exploratory data analysis (EDA) of a bank customer churn dataset. The analysis aims to uncover key factors influencing customer churn and provide actionable insights for customer retention strategies.

    Dataset Information

    Source: Bank Customer Churn Dataset
    Size: 10,000 records
    Features: Multiple demographic and behavioral attributes including age, credit score…

    See the full description on the dataset page: https://huggingface.co/datasets/EthanGabis/Banking_Customer_Churn_Prediction_Dataset.

  8. churn-prediction

    • huggingface.co
    Updated Apr 1, 2023
    Cite
    scikit-learn (2023). churn-prediction [Dataset]. https://huggingface.co/datasets/scikit-learn/churn-prediction
    Dataset updated
    Apr 1, 2023
    Dataset authored and provided by
    scikit-learn
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Customer churn prediction dataset of a fictional telecommunication company made by IBM Sample Datasets.

    Context

    Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs.

    Content

    Each row represents a customer, each column contains customer’s attributes described on the column metadata. The data set includes information about:

    • Customers who left within the last month: the column is called Churn
    • Services that each customer…

    See the full description on the dataset page: https://huggingface.co/datasets/scikit-learn/churn-prediction.

  9. Comparison results of different model.

    • plos.figshare.com
    xls
    Updated Dec 8, 2023
    Cite
    Ke Peng; Yan Peng; Wenguang Li (2023). Comparison results of different model. [Dataset]. http://doi.org/10.1371/journal.pone.0289724.t006
    Available download formats: xls
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Ke Peng; Yan Peng; Wenguang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In recent years, with the continuous improvement of the financial system and the rapid development of the banking industry, the competition of the banking industry itself has intensified. At the same time, with the rapid development of information technology and Internet technology, customers’ choice of financial products is becoming more and more diversified, and customers’ dependence and loyalty to banking institutions is becoming less and less, and the problem of customer churn in commercial banks is becoming more and more prominent. How to predict customer behavior and retain existing customers has become a major challenge for banks to solve. Therefore, this study takes a bank’s business data on Kaggle platform as the research object, uses multiple sampling methods to compare the data for balancing, constructs a bank customer churn prediction model for churn identification by GA-XGBoost, and conducts interpretability analysis on the GA-XGBoost model to provide decision support and suggestions for the banking industry to prevent customer churn. The results show that: (1) The applied SMOTEENN is more effective than SMOTE and ADASYN in dealing with the imbalance of banking data. (2) The F1 and AUC values of the model improved and optimized by XGBoost using genetic algorithm can reach 90% and 99%, respectively, which are optimal compared to other six machine learning models. The GA-XGBoost classifier was identified as the best solution for the customer churn problem. (3) Using Shapley values, we explain how each feature affects the model results, and analyze the features that have a high impact on the model prediction, such as the total number of transactions in the past year, the amount of transactions in the past year, the number of products owned by customers, and the total sales balance. 
The contribution of this paper is mainly in two aspects: (1) this study can provide useful information from the black box model based on the accurate identification of churned customers, which can provide reference for commercial banks to improve their service quality and retain customers; (2) it can provide reference for customer churn early warning models of other related industries, which can help the banking industry to maintain customer stability, maintain market position and reduce corporate losses.
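
    The abstract above reports F1 values. As a reminder of how F1 follows from confusion-matrix counts, here is a short sketch; the counts are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for illustration only.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=10)
print(precision, recall, f1)
```

    Because F1 ignores true negatives, it is often preferred over plain accuracy when churners are a small minority of customers.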

  10. Results of genetic algorithm tuning parameters.

    • plos.figshare.com
    xls
    Updated Dec 8, 2023
    Cite
    Ke Peng; Yan Peng; Wenguang Li (2023). Results of genetic algorithm tuning parameters. [Dataset]. http://doi.org/10.1371/journal.pone.0289724.t007
    Available download formats: xls
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Ke Peng; Yan Peng; Wenguang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically


  11. Bank Customer Segmentation

    • gomask.ai
    csv, json
    Updated Nov 3, 2025
    Cite
    GoMask.ai (2025). Bank Customer Segmentation [Dataset]. https://gomask.ai/marketplace/datasets/bank-customer-segmentation
    Available download formats: json, csv (10 MB)
    Dataset updated
    Nov 3, 2025
    Dataset provided by
    GoMask.ai
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    2024 - 2025
    Area covered
    Global
    Variables measured
    email, gender, is_active, last_name, first_name, customer_id, account_type, address_city, num_accounts, phone_number, and 15 more
    Description

    This dataset provides comprehensive behavioral and transaction profiles for bank customers, including demographics, account activity, channel preferences, and churn risk scores. It is designed for advanced customer segmentation, targeted marketing, and predictive analytics to drive retention and personalized banking strategies.

  12. Chi-square test for selected features.

    • figshare.com
    xls
    Updated Dec 8, 2023
    Cite
    Ke Peng; Yan Peng; Wenguang Li (2023). Chi-square test for selected features. [Dataset]. http://doi.org/10.1371/journal.pone.0289724.t003
    Available download formats: xls
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Ke Peng; Yan Peng; Wenguang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically


  13. Bank Turnover Dataset

    • kaggle.com
    zip
    Updated Mar 20, 2018
    Cite
    Tarun Sunkaraneni (2018). Bank Turnover Dataset [Dataset]. https://www.kaggle.com/datasets/barelydedicated/bank-customer-churn-modeling
    Available download formats: zip (267794 bytes)
    Dataset updated
    Mar 20, 2018
    Authors
    Tarun Sunkaraneni
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Tarun Sunkaraneni

    Released under CC0: Public Domain

    Contents

  14. Confusion matrix.

    • plos.figshare.com
    xls
    Updated Dec 8, 2023
    Cite
    Ke Peng; Yan Peng; Wenguang Li (2023). Confusion matrix. [Dataset]. http://doi.org/10.1371/journal.pone.0289724.t004
    Available download formats: xls
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Ke Peng; Yan Peng; Wenguang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically


  15. The summary of the literature review.

    • plos.figshare.com
    xls
    Updated Dec 8, 2023
    Cite
    Ke Peng; Yan Peng; Wenguang Li (2023). The summary of the literature review. [Dataset]. http://doi.org/10.1371/journal.pone.0289724.t001
    Available download formats: xls
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Ke Peng; Yan Peng; Wenguang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In recent years, with the continuous improvement of the financial system and the rapid development of the banking industry, the competition of the banking industry itself has intensified. At the same time, with the rapid development of information technology and Internet technology, customers’ choice of financial products is becoming more and more diversified, and customers’ dependence and loyalty to banking institutions is becoming less and less, and the problem of customer churn in commercial banks is becoming more and more prominent. How to predict customer behavior and retain existing customers has become a major challenge for banks to solve. Therefore, this study takes a bank’s business data on Kaggle platform as the research object, uses multiple sampling methods to compare the data for balancing, constructs a bank customer churn prediction model for churn identification by GA-XGBoost, and conducts interpretability analysis on the GA-XGBoost model to provide decision support and suggestions for the banking industry to prevent customer churn. The results show that: (1) The applied SMOTEENN is more effective than SMOTE and ADASYN in dealing with the imbalance of banking data. (2) The F1 and AUC values of the model improved and optimized by XGBoost using genetic algorithm can reach 90% and 99%, respectively, which are optimal compared to other six machine learning models. The GA-XGBoost classifier was identified as the best solution for the customer churn problem. (3) Using Shapley values, we explain how each feature affects the model results, and analyze the features that have a high impact on the model prediction, such as the total number of transactions in the past year, the amount of transactions in the past year, the number of products owned by customers, and the total sales balance. 
The contribution of this paper is mainly in two aspects: (1) this study can provide useful information from the black box model based on the accurate identification of churned customers, which can provide reference for commercial banks to improve their service quality and retain customers; (2) it can provide reference for customer churn early warning models of other related industries, which can help the banking industry to maintain customer stability, maintain market position and reduce corporate losses.
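
The abstract above reports F1 alongside AUC, which is a common choice when churned customers are a small minority and plain accuracy would look good for a model that never predicts churn. As a minimal sketch of how F1 is derived from confusion counts (the function name and the example counts are illustrative assumptions, not taken from the paper):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall.

    tp: churners correctly flagged, fp: non-churners wrongly flagged,
    fn: churners the model missed. All counts here are hypothetical.
    """
    if tp == 0:
        # No true positives means precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative counts only: 8 hits, 2 false alarms, 2 misses.
    print(round(f1_score(tp=8, fp=2, fn=2), 3))
```

True negatives do not appear in the formula, which is why F1 is less flattered by a large non-churn majority than accuracy is.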

  16. Bank Customer Churn Out Prediction

    • kaggle.com
    zip
    Updated Oct 28, 2023
    Cite
    Kishor Pansare (2023). Bank Customer Churn Out Prediction [Dataset]. https://www.kaggle.com/datasets/kishorbpansare/bank-customer-churn-out-prediction
    Explore at:
    Available download formats: zip (142183 bytes)
    Dataset updated
    Oct 28, 2023
    Authors
    Kishor Pansare
    Description

    Dataset

    This dataset was created by Kishor Pansare

    Contents

  17. Comparison of GA-XGBoost with XGBoost and LightGBM test results.

    • figshare.com
    xls
    Updated Dec 8, 2023
    Cite
    Ke Peng; Yan Peng; Wenguang Li (2023). Comparison of GA-XGBoost with XGBoost and LightGBM test results. [Dataset]. http://doi.org/10.1371/journal.pone.0289724.t008
    Explore at:
    Available download formats: xls
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Ke Peng; Yan Peng; Wenguang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of GA-XGBoost with XGBoost and LightGBM test results.

  18. Bank Customer Experience Platform Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Cite
    Dataintelo (2025). Bank Customer Experience Platform Market Research Report 2033 [Dataset]. https://dataintelo.com/report/bank-customer-experience-platform-market
    Explore at:
    Available download formats: csv, pdf, pptx
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Bank Customer Experience Platform Market Outlook



    As per our latest research, the global bank customer experience platform market size in 2024 stands at USD 10.7 billion, reflecting robust adoption across banking institutions worldwide. The market is forecasted to grow at a compound annual growth rate (CAGR) of 14.2% from 2025 to 2033, reaching a projected value of USD 32.5 billion by 2033. This growth is primarily driven by the rising demand for personalized banking, digital transformation initiatives, and the integration of advanced technologies such as artificial intelligence and machine learning within banking platforms.
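
The growth figures quoted above can be sanity-checked with the standard compound growth formula. The sketch below is an illustration only: the function names are assumptions, and the number of compounding periods depends on which year is treated as the base, which the report text does not state explicitly.

```python
def project(start_value: float, rate: float, years: int) -> float:
    """Compound start_value at a constant annual rate for a number of years."""
    return start_value * (1 + rate) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the constant annual growth rate linking two values."""
    return (end_value / start_value) ** (1 / years) - 1


if __name__ == "__main__":
    # Figures from the report text; the period count is an assumption,
    # so both plausible readings are printed side by side.
    for years in (8, 9):
        print(years, round(project(10.7, 0.142, years), 1),
              round(implied_cagr(10.7, 32.5, years), 4))
```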




    One of the primary growth factors for the bank customer experience platform market is the increasing emphasis on customer-centric banking. Banks are increasingly recognizing the importance of delivering seamless, omnichannel experiences to retain and attract customers in an era where digital convenience is paramount. The proliferation of smartphones and digital channels has raised customer expectations for instant, consistent, and personalized interactions. As a result, financial institutions are investing heavily in platforms that unify customer data, streamline workflows, and enable real-time engagement. These platforms not only enhance satisfaction but also drive cross-selling and up-selling opportunities, directly contributing to revenue growth.




    Another significant driver is the rapid advancement in technology, particularly in artificial intelligence (AI), machine learning (ML), and data analytics. These technologies empower banks to analyze vast volumes of customer data, extract actionable insights, and deliver hyper-personalized experiences. AI-driven chatbots and virtual assistants are revolutionizing customer support, reducing response times, and improving query resolution rates. Furthermore, predictive analytics enable banks to anticipate customer needs, proactively offer relevant products, and mitigate churn. The integration of these intelligent solutions within bank customer experience platforms is accelerating digital transformation and setting new standards for customer engagement in the banking sector.




    Regulatory compliance and data security concerns are also fueling the adoption of advanced customer experience platforms. With stringent regulations such as GDPR, PSD2, and various local data protection laws, banks must ensure secure handling of sensitive customer information. Modern experience platforms offer robust security features, audit trails, and compliance management capabilities, enabling banks to meet regulatory requirements while delivering superior customer service. Additionally, these platforms facilitate smoother onboarding processes, faster loan approvals, and more transparent account management, all of which are critical for enhancing customer trust and loyalty in a competitive market.




    From a regional perspective, North America continues to lead the market, accounting for the largest share in 2024, driven by early digital adoption, a mature banking sector, and significant investments in fintech innovation. Europe follows closely, propelled by regulatory mandates and a strong focus on customer privacy. Meanwhile, Asia Pacific is emerging as the fastest-growing region, owing to rapid urbanization, increasing smartphone penetration, and a burgeoning middle class demanding modern banking experiences. Latin America and the Middle East & Africa are also witnessing steady growth, supported by digital infrastructure improvements and rising financial inclusion initiatives. This regional diversity highlights the global momentum behind the bank customer experience platform market.



    Component Analysis



    The bank customer experience platform market by component is segmented into software and services. The software segment dominates the market, accounting for the majority of revenue in 2024, as banks prioritize scalable, modular, and customizable solutions to enhance customer engagement. These software platforms typically encompass customer relationship management (CRM), analytics, workflow automation, and omnichannel communication tools. The growing need for integrated systems that unify customer data, streamline operations, and enable real-time insights is driving the adoption of software solutions. Banks are seeking platforms that not only support day-to-day operations but also facilitate digital transformation and innovation in customer experience.

  19. Performance comparison of different adoption algorithms in XGBoost model.

    • plos.figshare.com
    xls
    Updated Dec 8, 2023
    Cite
    Ke Peng; Yan Peng; Wenguang Li (2023). Performance comparison of different adoption algorithms in XGBoost model. [Dataset]. http://doi.org/10.1371/journal.pone.0289724.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Ke Peng; Yan Peng; Wenguang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance comparison of different adoption algorithms in XGBoost model.

  20. Telco Customer Churn

    • kaggle.com
    zip
    Updated Feb 23, 2018
    + more versions
    Cite
    BlastChar (2018). Telco Customer Churn [Dataset]. https://www.kaggle.com/datasets/blastchar/telco-customer-churn
    Explore at:
    Available download formats: zip (175758 bytes)
    Dataset updated
    Feb 23, 2018
    Authors
    BlastChar
    Description

    Context

    "Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs." [IBM Sample Data Sets]

    Content

    Each row represents a customer; each column contains one of the customer’s attributes, as described in the column metadata.

    The data set includes information about:

    • Customers who left within the last month – the column is called Churn
    • Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
    • Customer account information – how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges
    • Demographic info about customers – gender, age range, and if they have partners and dependents

    Inspiration

    To explore this type of model and learn more about the subject.

    New version from IBM: https://community.ibm.com/community/user/businessanalytics/blogs/steven-macko/2019/07/11/telco-customer-churn-1113

Cite
Saurabh Badole (2024). Banking Customer Churn Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/saurabhbadole/bank-customer-churn-prediction-dataset

Banking Customer Churn Prediction Dataset

Understanding Customer Behavior and Predicting Churn in Banking Institutions

Explore at:
26 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: zip (267794 bytes)
Dataset updated
May 16, 2024
Authors
Saurabh Badole
License

Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically

Description

Description:

This dataset contains information about bank customers and their churn status, which indicates whether they have exited the bank or not. It is suitable for exploring and analyzing factors influencing customer churn in banking institutions and for building predictive models to identify customers at risk of churning.

Features:

RowNumber: The sequential number assigned to each row in the dataset.

CustomerId: A unique identifier for each customer.

Surname: The surname of the customer.

CreditScore: The credit score of the customer.

Geography: The geographical location of the customer (e.g., country or region).

Gender: The gender of the customer.

Age: The age of the customer.

Tenure: The number of years the customer has been with the bank.

Balance: The account balance of the customer.

NumOfProducts: The number of bank products the customer has.

HasCrCard: Indicates whether the customer has a credit card (binary: yes/no).

IsActiveMember: Indicates whether the customer is an active member (binary: yes/no).

EstimatedSalary: The estimated salary of the customer.

Exited: Indicates whether the customer has exited the bank (binary: yes/no).

Usage:

  • This dataset can be used for exploratory data analysis to understand the factors influencing customer churn in banks.
  • It can also be used to build machine learning models for predicting customer churn based on the given features.
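
As a starting point for either use, the feature list above suggests dropping the identifier columns (RowNumber, CustomerId, Surname) before any analysis and treating Exited as the target. The following is a minimal sketch using only the Python standard library. The sample rows are invented for illustration, and the 0/1 encoding of Exited is an assumption; the description only says the field is binary.

```python
import csv
import io

# Identifier columns from the feature list; they carry no predictive signal.
ID_COLUMNS = {"RowNumber", "CustomerId", "Surname"}

# Hypothetical rows that follow the documented column names (not real data).
SAMPLE_CSV = """RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
1,1001,Doe,600,France,Female,40,3,0.0,1,1,1,50000.0,1
2,1002,Roe,700,Spain,Male,35,5,80000.0,2,0,1,60000.0,0
3,1003,Poe,650,Germany,Female,50,7,120000.0,1,1,0,70000.0,0
4,1004,Lee,720,France,Male,29,2,0.0,2,1,1,40000.0,0
"""


def load_rows(handle):
    """Read CSV records and drop the identifier columns."""
    return [
        {key: value for key, value in record.items() if key not in ID_COLUMNS}
        for record in csv.DictReader(handle)
    ]


def churn_rate(rows):
    """Share of rows whose Exited flag is set (accepts '1' or 'yes')."""
    exited = sum(1 for row in rows if row["Exited"].strip().lower() in {"1", "yes"})
    return exited / len(rows)


rows = load_rows(io.StringIO(SAMPLE_CSV))

if __name__ == "__main__":
    print(sorted(rows[0]))   # remaining feature names plus the target
    print(churn_rate(rows))  # churn share in the illustrative sample
```

To run this against the downloaded archive, replace the in-memory sample with an opened file handle for the extracted CSV.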

License:

This dataset is made available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
