Facebook
TwitterAttribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset contains information about bank customers and their churn status, which indicates whether they have exited the bank or not. It is suitable for exploring and analyzing factors influencing customer churn in banking institutions and for building predictive models to identify customers at risk of churning.
RowNumber: The sequential number assigned to each row in the dataset.
CustomerId: A unique identifier for each customer.
Surname: The surname of the customer.
CreditScore: The credit score of the customer.
Geography: The geographical location of the customer (e.g., country or region).
Gender: The gender of the customer.
Age: The age of the customer.
Tenure: The number of years the customer has been with the bank.
Balance: The account balance of the customer.
NumOfProducts: The number of bank products the customer has.
HasCrCard: Indicates whether the customer has a credit card (binary: yes/no).
IsActiveMember: Indicates whether the customer is an active member (binary: yes/no).
EstimatedSalary: The estimated salary of the customer.
Exited: Indicates whether the customer has exited the bank (binary: yes/no).
This dataset is made available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
the churn prediction dataset, which contains raw data of 28,382 customers. The dataset includes the following columns:
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The data will be used to predict whether a customer of the bank will churn. If a customer churns, it means they left the bank and took their business elsewhere. If you can predict which customers are likely to churn, you can take measures to retain them before they do. These measures could be promotions, discounts, or other incentives to boost customer satisfaction and, therefore, retention.
The dataset contains:
10,000 rows – each row is a unique customer of the bank
14 columns:
RowNumber: Row numbers from 1 to 10,000
CustomerId: Customer’s unique ID assigned by bank
Surname: Customer’s last name
CreditScore: Customer’s credit score. This number can range from 300 to 850.
Geography: Customer’s country of residence
Gender: Categorical indicator
Age: Customer’s age (years)
Tenure: Number of years customer has been with bank
Balance: Customer’s bank balance (Euros)
NumOfProducts: Number of products the customer has with the bank
HasCrCard: Indicates whether the customer has a credit card with the bank
IsActiveMember: Indicates whether the customer is considered active
EstimatedSalary: Customer’s estimated annual salary (Euros)
Exited: Indicates whether the customer churned (left the bank)
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This synthetic dataset simulates customer data for a fictional bank in Botswana, specifically designed to model customer churn behavior. It includes a comprehensive set of customer demographics, financial data, product usage, and behavioral indicators that could influence whether a customer decides to leave the bank. The dataset is generated using the Python Faker library, ensuring realistic but entirely fictional data points for educational, testing, and modeling purposes.
Number of Records: 115,640 customers Churn Rate: Determined by a calculated churn risk score based on several customer attributes Geographical Focus: Botswana Data Structure: The dataset is organized in a tabular format, with each row representing a unique customer
This dataset is ideal for the following applications:
Churn Prediction Modeling: Building and evaluating machine learning models to predict customer churn. Customer Segmentation: Analyzing customer profiles and segmenting them based on various demographics and financial attributes. Product Analysis: Understanding which products are most associated with customer retention or churn. Educational Purposes: Teaching data science and machine learning concepts using a realistic dataset.
Facebook
TwitterThis dataset was created by Aarushi Kamboj
Facebook
TwitterDataset Overview for XYZ Multistate Bank:
This dataset is for XYZ Multistate Bank and contains various columns that capture key aspects of customer behavior and attributes. Each column provides valuable insights into the factors influencing customer churn, with the goal of predicting which customers are most likely to leave the bank. Below is an explanation of each column and its relevance to customer retention.
1. RowNumber:
The "RowNumber" column corresponds to the unique record number for each customer entry. It has no impact on the outcome of customer churn but is used to identify and organize data within the dataset. Since it doesn't contain any meaningful information related to customer behavior, it is not relevant for churn prediction and can be excluded in analysis.
2. CustomerId:
The "CustomerId" column consists of randomly generated identifiers for each customer. While this ID helps to uniquely distinguish each customer, it has no impact on the likelihood of a customer leaving the bank. As a categorical feature, it does not contribute to the analysis of churn and can be omitted when building predictive models.
3. Surname:
The "Surname" column holds the last names of customers. Although this information is useful for identification purposes, it does not have a direct relationship with customer churn. Since a customer's surname is not an influencing factor in their decision to stay or leave the bank, it is not considered relevant for churn prediction and can be disregarded.
4. CreditScore:
"CreditScore" is an important variable that can significantly affect customer churn. Customers with higher credit scores are generally considered more financially stable and less likely to leave the bank, as they are less likely to face issues with financial institutions. Therefore, this feature can provide valuable insights into customer retention and should be included in churn analysis.
5. Geography:
"Geography" refers to the geographical location of the customer, which can influence their likelihood of leaving the bank. Customers living in different regions may have varying experiences with the bank’s services, fees, or offerings, making this an important factor to explore. Understanding regional differences helps tailor retention strategies for specific locations and improve overall customer satisfaction.
6. Gender:
"Gender" is an interesting demographic factor to consider in churn prediction. While gender itself may not directly affect the likelihood of a customer leaving, it could correlate with other behavioral patterns or preferences that influence retention. Analyzing gender in combination with other features may reveal potential insights, making it worthwhile to examine as part of the churn model.
7. Age:
The "Age" column is a key factor in understanding customer behavior. Typically, older customers are less likely to churn because they tend to be more established with their financial institutions and may have a greater sense of loyalty. In contrast, younger customers may be more likely to switch banks, especially if they are seeking better services or offers. This feature is essential for predicting churn and should be analyzed in detail.
8. Tenure:
"Tenure" refers to the number of years a customer has been with the bank. Longer-tenured customers are often more loyal and less likely to leave the bank. The correlation between tenure and churn is strong, as established relationships tend to make customers less susceptible to leaving. This is a critical factor for churn prediction and should be given high consideration when modeling customer retention.
9. Balance:
The "Balance" column reflects the amount of money a customer holds in their bank account. Customers with higher balances are typically more invested in the bank and are less likely to leave. In contrast, customers with low balances may be more willing to switch to other financial institutions offering better rates or services. This feature plays a significant role in churn prediction, as financial stakes are directly tied to loyalty.
10. NumOfProducts:
"NumOfProducts" refers to the number of products (e.g., savings accounts, loans, credit cards) that a customer has with the bank. Customers with multiple products are usually more invested in the bank, making them less likely to leave. The greater the number of products, the higher the customer's commitment to the bank, making this feature highly relevant in understanding churn patterns and developing retention strategies.
11. HasCrCard:
"HasCrCard" indicates whether or not a customer holds a credit card with the bank. Having a credit card typically reduces the likelihood of customer churn, as credit cards are a widely used financial product that locks customers into a long-term relatio...
Facebook
TwitterAssignment #1: EDA & Dataset - Bank Customer Churn Analysis
Overview
This project presents an exploratory data analysis (EDA) of a bank customer churn dataset. The analysis aims to uncover key factors influencing customer churn and provide actionable insights for customer retention strategies.
Dataset Information
Source: Bank Customer Churn Dataset Size: 10,000 records Features: Multiple demographic and behavioral attributes including age, credit score… See the full description on the dataset page: https://huggingface.co/datasets/EthanGabis/Banking_Customer_Churn_Prediction_Dataset.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Customer churn prediction dataset of a fictional telecommunication company made by IBM Sample Datasets. Context Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs. Content Each row represents a customer, each column contains customer’s attributes described on the column metadata. The data set includes information about:
Customers who left within the last month: the column is called Churn Services that each customer… See the full description on the dataset page: https://huggingface.co/datasets/scikit-learn/churn-prediction.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In recent years, with the continuous improvement of the financial system and the rapid development of the banking industry, the competition of the banking industry itself has intensified. At the same time, with the rapid development of information technology and Internet technology, customers’ choice of financial products is becoming more and more diversified, and customers’ dependence and loyalty to banking institutions is becoming less and less, and the problem of customer churn in commercial banks is becoming more and more prominent. How to predict customer behavior and retain existing customers has become a major challenge for banks to solve. Therefore, this study takes a bank’s business data on Kaggle platform as the research object, uses multiple sampling methods to compare the data for balancing, constructs a bank customer churn prediction model for churn identification by GA-XGBoost, and conducts interpretability analysis on the GA-XGBoost model to provide decision support and suggestions for the banking industry to prevent customer churn. The results show that: (1) The applied SMOTEENN is more effective than SMOTE and ADASYN in dealing with the imbalance of banking data. (2) The F1 and AUC values of the model improved and optimized by XGBoost using genetic algorithm can reach 90% and 99%, respectively, which are optimal compared to other six machine learning models. The GA-XGBoost classifier was identified as the best solution for the customer churn problem. (3) Using Shapley values, we explain how each feature affects the model results, and analyze the features that have a high impact on the model prediction, such as the total number of transactions in the past year, the amount of transactions in the past year, the number of products owned by customers, and the total sales balance. The contribution of this paper is mainly in two aspects: (1) this study can provide useful information from the black box model based on the accurate identification of churned customers, which can provide reference for commercial banks to improve their service quality and retain customers; (2) it can provide reference for customer churn early warning models of other related industries, which can help the banking industry to maintain customer stability, maintain market position and reduce corporate losses.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In recent years, with the continuous improvement of the financial system and the rapid development of the banking industry, the competition of the banking industry itself has intensified. At the same time, with the rapid development of information technology and Internet technology, customers’ choice of financial products is becoming more and more diversified, and customers’ dependence and loyalty to banking institutions is becoming less and less, and the problem of customer churn in commercial banks is becoming more and more prominent. How to predict customer behavior and retain existing customers has become a major challenge for banks to solve. Therefore, this study takes a bank’s business data on Kaggle platform as the research object, uses multiple sampling methods to compare the data for balancing, constructs a bank customer churn prediction model for churn identification by GA-XGBoost, and conducts interpretability analysis on the GA-XGBoost model to provide decision support and suggestions for the banking industry to prevent customer churn. The results show that: (1) The applied SMOTEENN is more effective than SMOTE and ADASYN in dealing with the imbalance of banking data. (2) The F1 and AUC values of the model improved and optimized by XGBoost using genetic algorithm can reach 90% and 99%, respectively, which are optimal compared to other six machine learning models. The GA-XGBoost classifier was identified as the best solution for the customer churn problem. (3) Using Shapley values, we explain how each feature affects the model results, and analyze the features that have a high impact on the model prediction, such as the total number of transactions in the past year, the amount of transactions in the past year, the number of products owned by customers, and the total sales balance. The contribution of this paper is mainly in two aspects: (1) this study can provide useful information from the black box model based on the accurate identification of churned customers, which can provide reference for commercial banks to improve their service quality and retain customers; (2) it can provide reference for customer churn early warning models of other related industries, which can help the banking industry to maintain customer stability, maintain market position and reduce corporate losses.
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides comprehensive behavioral and transaction profiles for bank customers, including demographics, account activity, channel preferences, and churn risk scores. It is designed for advanced customer segmentation, targeted marketing, and predictive analytics to drive retention and personalized banking strategies.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In recent years, with the continuous improvement of the financial system and the rapid development of the banking industry, the competition of the banking industry itself has intensified. At the same time, with the rapid development of information technology and Internet technology, customers’ choice of financial products is becoming more and more diversified, and customers’ dependence and loyalty to banking institutions is becoming less and less, and the problem of customer churn in commercial banks is becoming more and more prominent. How to predict customer behavior and retain existing customers has become a major challenge for banks to solve. Therefore, this study takes a bank’s business data on Kaggle platform as the research object, uses multiple sampling methods to compare the data for balancing, constructs a bank customer churn prediction model for churn identification by GA-XGBoost, and conducts interpretability analysis on the GA-XGBoost model to provide decision support and suggestions for the banking industry to prevent customer churn. The results show that: (1) The applied SMOTEENN is more effective than SMOTE and ADASYN in dealing with the imbalance of banking data. (2) The F1 and AUC values of the model improved and optimized by XGBoost using genetic algorithm can reach 90% and 99%, respectively, which are optimal compared to other six machine learning models. The GA-XGBoost classifier was identified as the best solution for the customer churn problem. (3) Using Shapley values, we explain how each feature affects the model results, and analyze the features that have a high impact on the model prediction, such as the total number of transactions in the past year, the amount of transactions in the past year, the number of products owned by customers, and the total sales balance. The contribution of this paper is mainly in two aspects: (1) this study can provide useful information from the black box model based on the accurate identification of churned customers, which can provide reference for commercial banks to improve their service quality and retain customers; (2) it can provide reference for customer churn early warning models of other related industries, which can help the banking industry to maintain customer stability, maintain market position and reduce corporate losses.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Tarun Sunkaraneni
Released under CC0: Public Domain
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In recent years, with the continuous improvement of the financial system and the rapid development of the banking industry, the competition of the banking industry itself has intensified. At the same time, with the rapid development of information technology and Internet technology, customers’ choice of financial products is becoming more and more diversified, and customers’ dependence and loyalty to banking institutions is becoming less and less, and the problem of customer churn in commercial banks is becoming more and more prominent. How to predict customer behavior and retain existing customers has become a major challenge for banks to solve. Therefore, this study takes a bank’s business data on Kaggle platform as the research object, uses multiple sampling methods to compare the data for balancing, constructs a bank customer churn prediction model for churn identification by GA-XGBoost, and conducts interpretability analysis on the GA-XGBoost model to provide decision support and suggestions for the banking industry to prevent customer churn. The results show that: (1) The applied SMOTEENN is more effective than SMOTE and ADASYN in dealing with the imbalance of banking data. (2) The F1 and AUC values of the model improved and optimized by XGBoost using genetic algorithm can reach 90% and 99%, respectively, which are optimal compared to other six machine learning models. The GA-XGBoost classifier was identified as the best solution for the customer churn problem. (3) Using Shapley values, we explain how each feature affects the model results, and analyze the features that have a high impact on the model prediction, such as the total number of transactions in the past year, the amount of transactions in the past year, the number of products owned by customers, and the total sales balance. The contribution of this paper is mainly in two aspects: (1) this study can provide useful information from the black box model based on the accurate identification of churned customers, which can provide reference for commercial banks to improve their service quality and retain customers; (2) it can provide reference for customer churn early warning models of other related industries, which can help the banking industry to maintain customer stability, maintain market position and reduce corporate losses.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In recent years, with the continuous improvement of the financial system and the rapid development of the banking industry, the competition of the banking industry itself has intensified. At the same time, with the rapid development of information technology and Internet technology, customers’ choice of financial products is becoming more and more diversified, and customers’ dependence and loyalty to banking institutions is becoming less and less, and the problem of customer churn in commercial banks is becoming more and more prominent. How to predict customer behavior and retain existing customers has become a major challenge for banks to solve. Therefore, this study takes a bank’s business data on Kaggle platform as the research object, uses multiple sampling methods to compare the data for balancing, constructs a bank customer churn prediction model for churn identification by GA-XGBoost, and conducts interpretability analysis on the GA-XGBoost model to provide decision support and suggestions for the banking industry to prevent customer churn. The results show that: (1) The applied SMOTEENN is more effective than SMOTE and ADASYN in dealing with the imbalance of banking data. (2) The F1 and AUC values of the model improved and optimized by XGBoost using genetic algorithm can reach 90% and 99%, respectively, which are optimal compared to other six machine learning models. The GA-XGBoost classifier was identified as the best solution for the customer churn problem. (3) Using Shapley values, we explain how each feature affects the model results, and analyze the features that have a high impact on the model prediction, such as the total number of transactions in the past year, the amount of transactions in the past year, the number of products owned by customers, and the total sales balance. The contribution of this paper is mainly in two aspects: (1) this study can provide useful information from the black box model based on the accurate identification of churned customers, which can provide reference for commercial banks to improve their service quality and retain customers; (2) it can provide reference for customer churn early warning models of other related industries, which can help the banking industry to maintain customer stability, maintain market position and reduce corporate losses.
Facebook
TwitterThis dataset was created by Kishor Pansare
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of GA-XGBoost with XGBoost and LightGBM test results.
Facebook
Twitterhttps://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy
As per our latest research, the global bank customer experience platform market size in 2024 stands at USD 10.7 billion, reflecting robust adoption across banking institutions worldwide. The market is forecasted to grow at a compound annual growth rate (CAGR) of 14.2% from 2025 to 2033, reaching a projected value of USD 32.5 billion by 2033. This growth is primarily driven by the rising demand for personalized banking, digital transformation initiatives, and the integration of advanced technologies such as artificial intelligence and machine learning within banking platforms.
One of the primary growth factors for the bank customer experience platform market is the increasing emphasis on customer-centric banking. Banks are increasingly recognizing the importance of delivering seamless, omnichannel experiences to retain and attract customers in an era where digital convenience is paramount. The proliferation of smartphones and digital channels has raised customer expectations for instant, consistent, and personalized interactions. As a result, financial institutions are investing heavily in platforms that unify customer data, streamline workflows, and enable real-time engagement. These platforms not only enhance satisfaction but also drive cross-selling and up-selling opportunities, directly contributing to revenue growth.
Another significant driver is the rapid advancement in technology, particularly in artificial intelligence (AI), machine learning (ML), and data analytics. These technologies empower banks to analyze vast volumes of customer data, extract actionable insights, and deliver hyper-personalized experiences. AI-driven chatbots and virtual assistants are revolutionizing customer support, reducing response times, and improving query resolution rates. Furthermore, predictive analytics enable banks to anticipate customer needs, proactively offer relevant products, and mitigate churn. The integration of these intelligent solutions within bank customer experience platforms is accelerating digital transformation and setting new standards for customer engagement in the banking sector.
Regulatory compliance and data security concerns are also fueling the adoption of advanced customer experience platforms. With stringent regulations such as GDPR, PSD2, and various local data protection laws, banks must ensure secure handling of sensitive customer information. Modern experience platforms offer robust security features, audit trails, and compliance management capabilities, enabling banks to meet regulatory requirements while delivering superior customer service. Additionally, these platforms facilitate smoother onboarding processes, faster loan approvals, and more transparent account management, all of which are critical for enhancing customer trust and loyalty in a competitive market.
From a regional perspective, North America continues to lead the market, accounting for the largest share in 2024, driven by early digital adoption, a mature banking sector, and significant investments in fintech innovation. Europe follows closely, propelled by regulatory mandates and a strong focus on customer privacy. Meanwhile, Asia Pacific is emerging as the fastest-growing region, owing to rapid urbanization, increasing smartphone penetration, and a burgeoning middle class demanding modern banking experiences. Latin America and the Middle East & Africa are also witnessing steady growth, supported by digital infrastructure improvements and rising financial inclusion initiatives. This regional diversity highlights the global momentum behind the bank customer experience platform market.
The bank customer experience platform market by component is segmented into software and services. The software segment dominates the market, accounting for the majority of revenue in 2024, as banks prioritize scalable, modular, and customizable solutions to enhance customer engagement. These software platforms typically encompass customer relationship management (CRM), analytics, workflow automation, and omnichannel communication tools. The growing need for integrated systems that unify customer data, streamline operations, and enable real-time insights is driving the adoption of software solutions. Banks are seeking platforms that not only support day-to-day operations but also facilitate digital transformation and innovation in customer experience.
</p&g
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance comparison of different adoption algorithms in XGBoost model.
Facebook
Twitter"Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs." [IBM Sample Data Sets]
Each row represents a customer, each column contains customer’s attributes described on the column Metadata.
The data set includes information about:
To explore this type of models and learn more about the subject.
New version from IBM: https://community.ibm.com/community/user/businessanalytics/blogs/steven-macko/2019/07/11/telco-customer-churn-1113
Facebook
TwitterAttribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset contains information about bank customers and their churn status, which indicates whether they have exited the bank or not. It is suitable for exploring and analyzing factors influencing customer churn in banking institutions and for building predictive models to identify customers at risk of churning.
RowNumber: The sequential number assigned to each row in the dataset.
CustomerId: A unique identifier for each customer.
Surname: The surname of the customer.
CreditScore: The credit score of the customer.
Geography: The geographical location of the customer (e.g., country or region).
Gender: The gender of the customer.
Age: The age of the customer.
Tenure: The number of years the customer has been with the bank.
Balance: The account balance of the customer.
NumOfProducts: The number of bank products the customer has.
HasCrCard: Indicates whether the customer has a credit card (binary: yes/no).
IsActiveMember: Indicates whether the customer is an active member (binary: yes/no).
EstimatedSalary: The estimated salary of the customer.
Exited: Indicates whether the customer has exited the bank (binary: yes/no).
This dataset is made available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.