Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The churn prediction dataset contains raw data for 28,382 customers. The dataset includes the following columns:
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset contains information about bank customers and their churn status, which indicates whether they have exited the bank or not. It is suitable for exploring and analyzing factors influencing customer churn in banking institutions and for building predictive models to identify customers at risk of churning.
RowNumber: The sequential number assigned to each row in the dataset.
CustomerId: A unique identifier for each customer.
Surname: The surname of the customer.
CreditScore: The credit score of the customer.
Geography: The geographical location of the customer (e.g., country or region).
Gender: The gender of the customer.
Age: The age of the customer.
Tenure: The number of years the customer has been with the bank.
Balance: The account balance of the customer.
NumOfProducts: The number of bank products the customer has.
HasCrCard: Indicates whether the customer has a credit card (binary: yes/no).
IsActiveMember: Indicates whether the customer is an active member (binary: yes/no).
EstimatedSalary: The estimated salary of the customer.
Exited: Indicates whether the customer has exited the bank (binary: yes/no).
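As a minimal sketch of how the columns above might be explored, the following standard-library Python snippet reads records with this layout and computes the share of exited customers per Geography. The sample rows are invented for illustration, and the 0/1 encoding of the binary columns is an assumption (the description only says "yes/no").

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows that follow the column list above; all values are invented.
SAMPLE_CSV = """RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
1,1001,Alpha,620,France,Female,42,2,0.0,1,1,1,101000.0,1
2,1002,Beta,610,Spain,Female,41,1,83000.0,1,0,1,112000.0,0
3,1003,Gamma,500,France,Male,39,8,159000.0,3,1,0,113000.0,1
4,1004,Delta,700,France,Male,35,1,0.0,2,0,0,93000.0,0
"""


def churn_rate_by(rows, key):
    """Return {group: share of rows with Exited == 1} for the given column."""
    totals = defaultdict(int)
    exited = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        exited[row[key]] += int(row["Exited"])  # assumes Exited is stored as 0/1
    return {group: exited[group] / totals[group] for group in totals}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
rates = churn_rate_by(rows, "Geography")
print(rates)
```

The same helper can be pointed at Gender or NumOfProducts to compare groups before any model is built.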
This dataset is made available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Business Problem: A business manager of a consumer credit card bank is facing the problem of customer attrition. They want to analyze the data to identify the reasons behind it and use those findings to predict which customers are likely to drop off.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The data will be used to predict whether a customer of the bank will churn. If a customer churns, it means they left the bank and took their business elsewhere. If you can predict which customers are likely to churn, you can take measures to retain them before they do. These measures could be promotions, discounts, or other incentives to boost customer satisfaction and, therefore, retention.
The dataset contains:
10,000 rows – each row is a unique customer of the bank
14 columns:
RowNumber: Row numbers from 1 to 10,000
CustomerId: Customer’s unique ID assigned by bank
Surname: Customer’s last name
CreditScore: Customer’s credit score. This number can range from 300 to 850.
Geography: Customer’s country of residence
Gender: Customer’s gender (categorical)
Age: Customer’s age (years)
Tenure: Number of years customer has been with bank
Balance: Customer’s bank balance (Euros)
NumOfProducts: Number of products the customer has with the bank
HasCrCard: Indicates whether the customer has a credit card with the bank
IsActiveMember: Indicates whether the customer is considered active
EstimatedSalary: Customer’s estimated annual salary (Euros)
Exited: Indicates whether the customer churned (left the bank)
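The schema above lends itself to a quick sanity check before modeling. The sketch below is one possible way to do that in plain Python: it checks that all 14 columns are present, that CreditScore falls in the stated 300 to 850 range, and that the indicator columns are binary. The function name `validate_row`, the sample record, and the assumption that indicators are stored as 0/1 are illustrative and not part of the dataset description.

```python
EXPECTED_COLUMNS = [
    "RowNumber", "CustomerId", "Surname", "CreditScore", "Geography",
    "Gender", "Age", "Tenure", "Balance", "NumOfProducts",
    "HasCrCard", "IsActiveMember", "EstimatedSalary", "Exited",
]


def validate_row(row):
    """Return a list of problems found in one record (an empty list means it passed)."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in row]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems
    # Range stated in the column description.
    if not 300 <= int(row["CreditScore"]) <= 850:
        problems.append("CreditScore outside 300-850")
    # Assumption: indicator columns are encoded as 0/1.
    for flag in ("HasCrCard", "IsActiveMember", "Exited"):
        if str(row[flag]) not in ("0", "1"):
            problems.append(f"{flag} is not 0/1")
    return problems


# Hypothetical record; values are invented for illustration.
good = {
    "RowNumber": "1", "CustomerId": "1001", "Surname": "Alpha",
    "CreditScore": "620", "Geography": "France", "Gender": "Female",
    "Age": "42", "Tenure": "2", "Balance": "0.0", "NumOfProducts": "1",
    "HasCrCard": "1", "IsActiveMember": "1", "EstimatedSalary": "101000.0",
    "Exited": "1",
}
bad = dict(good, CreditScore="900", Exited="yes")

print(validate_row(good))
print(validate_row(bad))
```

Returning a list of problems rather than raising on the first one makes it easier to summarize data-quality issues across all 10,000 rows.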
SudeendraMG/Bank-Customer-Churn dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This synthetic dataset simulates customer data for a fictional bank in Botswana, specifically designed to model customer churn behavior. It includes a comprehensive set of customer demographics, financial data, product usage, and behavioral indicators that could influence whether a customer decides to leave the bank. The dataset is generated using the Python Faker library, ensuring realistic but entirely fictional data points for educational, testing, and modeling purposes.
Number of Records: 115,640 customers
Churn Rate: Determined by a calculated churn risk score based on several customer attributes
Geographical Focus: Botswana
Data Structure: The dataset is organized in a tabular format, with each row representing a unique customer
This dataset is ideal for the following applications:
Churn Prediction Modeling: Building and evaluating machine learning models to predict customer churn.
Customer Segmentation: Analyzing customer profiles and segmenting them based on various demographics and financial attributes.
Product Analysis: Understanding which products are most associated with customer retention or churn.
Educational Purposes: Teaching data science and machine learning concepts using a realistic dataset.
https://choosealicense.com/licenses/odbl/
kusha7/bank-churn dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset was created by Aarushi Kamboj
https://dataintelo.com/privacy-and-policy
According to our latest research, the global customer churn prediction for banking market reached USD 2.17 billion in 2024 and is growing at a compound annual growth rate (CAGR) of 18.3%. The market is forecast to reach USD 9.94 billion by 2033, driven by increasing digital transformation initiatives, the proliferation of advanced analytics, and the growing importance of customer retention in the highly competitive banking sector. The surge in adoption of artificial intelligence (AI) and machine learning (ML) technologies, coupled with mounting regulatory requirements, is propelling demand for sophisticated churn prediction solutions globally.
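The relationship between the 2024 figure, the CAGR, and the 2033 forecast can be checked with the standard compound-growth formula. The sketch below assumes nine annual compounding periods (2024 to 2033); the report does not state its base year or forecast window, so that period count is an assumption, as are the helper names.

```python
def project(value, rate, years):
    """Compound `value` at annual growth `rate` for `years` periods."""
    return value * (1 + rate) ** years


def implied_cagr(start, end, years):
    """Annual growth rate that takes `start` to `end` in `years` periods."""
    return (end / start) ** (1 / years) - 1


YEARS = 2033 - 2024  # assumption: nine compounding periods

projected_2033 = project(2.17, 0.183, YEARS)  # USD billion
implied = implied_cagr(2.17, 9.94, YEARS)

print(f"Projected 2033 size at 18.3% CAGR: USD {projected_2033:.2f} billion")
print(f"CAGR implied by 2.17 -> 9.94 over {YEARS} years: {implied:.1%}")
```

Under these assumptions the projection lands slightly below the quoted USD 9.94 billion, which suggests the report uses a slightly different base year or rounding convention.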
One of the primary growth factors fueling the customer churn prediction for banking market is the intensifying competition in the global banking landscape. Financial institutions are under constant pressure to retain their existing customer base, as acquiring new customers is significantly more costly than retaining current ones. With the rise of neobanks and fintech disruptors, traditional banks are increasingly leveraging predictive analytics to identify at-risk customers and proactively implement retention strategies. Furthermore, the shift toward personalized banking experiences has necessitated the use of churn prediction tools that analyze vast datasets to uncover behavioral patterns, transaction anomalies, and sentiment trends. This, in turn, enables banks to tailor their offerings and communication, thereby reducing churn rates and improving overall customer loyalty.
Another key driver for the market is the rapid advancement and integration of AI and ML technologies in banking operations. These technologies empower banks to process and analyze massive volumes of structured and unstructured data from multiple sources such as transaction records, social media, and customer service interactions. By deploying sophisticated algorithms, banks can detect early warning signs of customer dissatisfaction and predict potential churn with remarkable accuracy. The increased availability of cloud-based analytics platforms further accelerates adoption, as banks of all sizes can now access scalable, cost-effective churn prediction solutions without the need for heavy upfront investments in infrastructure. This democratization of technology is particularly beneficial for small and medium-sized enterprises (SMEs) in the banking sector.
Regulatory compliance and risk management are also significant contributors to market growth. As regulatory bodies worldwide impose stricter requirements on customer data management and transparency, banks are compelled to invest in advanced analytics to monitor customer behavior and mitigate risks associated with churn. Predictive models help institutions not only to comply with regulations but also to anticipate and address potential issues before they escalate. The integration of churn prediction tools into risk management frameworks enhances banks' ability to maintain stable customer portfolios, minimize revenue losses, and uphold reputational integrity in an increasingly scrutinized environment.
Regionally, North America continues to dominate the customer churn prediction for banking market, accounting for the largest share in 2024 due to the presence of major banking institutions, early technology adoption, and a mature digital infrastructure. However, the Asia Pacific region is exhibiting the fastest growth, driven by rapid urbanization, expanding digital banking ecosystems, and increasing investments in AI-driven analytics. Europe also remains a significant market, bolstered by regulatory mandates such as GDPR and the growing focus on customer-centric banking models. The Middle East & Africa and Latin America are emerging markets, with rising awareness and gradual adoption of churn prediction technologies as banks seek to modernize their operations and enhance customer engagement.
The customer churn prediction for banking market by component is segmented into software and services, each playing a pivotal role in the deployment and effectiveness of churn prediction systems. The software segment encompasses purpose-built analytics platforms, AI-driven modeling tools, and integrated customer relationship management (CRM) systems specifically designed for churn analysis. These solutions enable banks to collect, process,
haneuris1/bank-customer-churn dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Omar Belfeki
Released under Apache 2.0
Parthipan00410/Bank-Customer-Churn-Data dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In recent years, as the financial system has matured and the banking industry has developed rapidly, competition within the industry has intensified. At the same time, rapid advances in information and Internet technology have diversified customers’ choice of financial products and weakened their dependence on and loyalty to banking institutions, making customer churn an increasingly prominent problem for commercial banks. Predicting customer behavior and retaining existing customers has become a major challenge for banks. This study therefore takes a bank’s business data from the Kaggle platform as its research object, compares multiple sampling methods for balancing the data, constructs a GA-XGBoost model for bank customer churn prediction, and conducts an interpretability analysis of that model to provide decision support and suggestions for preventing customer churn. The results show that: (1) SMOTEENN is more effective than SMOTE and ADASYN in dealing with the imbalance of banking data. (2) The F1 and AUC values of the XGBoost model optimized with a genetic algorithm reach 90% and 99%, respectively, the best results when compared with six other machine learning models. The GA-XGBoost classifier was identified as the best solution for the customer churn problem. (3) Shapley values are used to explain how each feature affects the model results and to analyze the features with a high impact on the prediction, such as the total number of transactions in the past year, the amount of transactions in the past year, the number of products owned by customers, and the total sales balance.
The contribution of this paper is mainly in two aspects: (1) building on accurate identification of churned customers, the study extracts useful information from the black-box model, which can serve as a reference for commercial banks seeking to improve service quality and retain customers; (2) it can serve as a reference for customer churn early-warning models in other related industries and can help the banking industry maintain customer stability, maintain market position, and reduce corporate losses.
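The abstract reports results in terms of F1 and AUC. For readers unfamiliar with these metrics, the following is a minimal standard-library sketch of how each is defined; it is not the paper's evaluation code, and the labels and scores are invented toy values.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = churned, 0 = retained; scores are hypothetical model outputs.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"F1 = {f1:.3f}, AUC = {auc:.3f}")
```

F1 depends on the chosen decision threshold, while AUC is computed from the raw scores, which is one reason the two values can differ as much as the 90% and 99% reported above.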
sasipriyank/bank-customer-churn dataset hosted on Hugging Face and contributed by the HF Datasets community
dpanchali/bank-customer-churn dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Overview for XYZ Multistate Bank:
This dataset is for XYZ Multistate Bank and contains various columns that capture key aspects of customer behavior and attributes. Each column provides valuable insights into the factors influencing customer churn, with the goal of predicting which customers are most likely to leave the bank. Below is an explanation of each column and its relevance to customer retention.
1. RowNumber:
The "RowNumber" column corresponds to the unique record number for each customer entry. It has no impact on the outcome of customer churn but is used to identify and organize data within the dataset. Since it doesn't contain any meaningful information related to customer behavior, it is not relevant for churn prediction and can be excluded in analysis.
2. CustomerId:
The "CustomerId" column consists of randomly generated identifiers for each customer. While this ID helps to uniquely distinguish each customer, it has no impact on the likelihood of a customer leaving the bank. As a categorical feature, it does not contribute to the analysis of churn and can be omitted when building predictive models.
3. Surname:
The "Surname" column holds the last names of customers. Although this information is useful for identification purposes, it does not have a direct relationship with customer churn. Since a customer's surname is not an influencing factor in their decision to stay or leave the bank, it is not considered relevant for churn prediction and can be disregarded.
4. CreditScore:
"CreditScore" is an important variable that can significantly affect customer churn. Customers with higher credit scores are generally considered more financially stable and less likely to leave the bank, as they are less likely to face issues with financial institutions. Therefore, this feature can provide valuable insights into customer retention and should be included in churn analysis.
5. Geography:
"Geography" refers to the geographical location of the customer, which can influence their likelihood of leaving the bank. Customers living in different regions may have varying experiences with the bank’s services, fees, or offerings, making this an important factor to explore. Understanding regional differences helps tailor retention strategies for specific locations and improve overall customer satisfaction.
6. Gender:
"Gender" is an interesting demographic factor to consider in churn prediction. While gender itself may not directly affect the likelihood of a customer leaving, it could correlate with other behavioral patterns or preferences that influence retention. Analyzing gender in combination with other features may reveal potential insights, making it worthwhile to examine as part of the churn model.
7. Age:
The "Age" column is a key factor in understanding customer behavior. Typically, older customers are less likely to churn because they tend to be more established with their financial institutions and may have a greater sense of loyalty. In contrast, younger customers may be more likely to switch banks, especially if they are seeking better services or offers. This feature is essential for predicting churn and should be analyzed in detail.
8. Tenure:
"Tenure" refers to the number of years a customer has been with the bank. Longer-tenured customers are often more loyal and less likely to leave the bank. The correlation between tenure and churn is strong, as established relationships tend to make customers less susceptible to leaving. This is a critical factor for churn prediction and should be given high consideration when modeling customer retention.
9. Balance:
The "Balance" column reflects the amount of money a customer holds in their bank account. Customers with higher balances are typically more invested in the bank and are less likely to leave. In contrast, customers with low balances may be more willing to switch to other financial institutions offering better rates or services. This feature plays a significant role in churn prediction, as financial stakes are directly tied to loyalty.
10. NumOfProducts:
"NumOfProducts" refers to the number of products (e.g., savings accounts, loans, credit cards) that a customer has with the bank. Customers with multiple products are usually more invested in the bank, making them less likely to leave. The greater the number of products, the higher the customer's commitment to the bank, making this feature highly relevant in understanding churn patterns and developing retention strategies.
11. HasCrCard:
"HasCrCard" indicates whether or not a customer holds a credit card with the bank. Having a credit card typically reduces the likelihood of customer churn, as credit cards are a widely used financial product that locks customers into a long-term relatio...
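The column notes above suggest excluding RowNumber, CustomerId, and Surname and keeping the remaining attributes as model inputs. One way this preparation step could look in plain Python is sketched below; the one-hot encoding of Geography and Gender, the function name `prepare`, and the sample records are illustrative assumptions rather than anything prescribed by the dataset.

```python
ID_COLUMNS = ("RowNumber", "CustomerId", "Surname")  # described above as irrelevant to churn
CATEGORICAL = ("Geography", "Gender")                # assumption: one-hot encode these
TARGET = "Exited"


def prepare(records):
    """Split records into numeric feature dicts and a list of 0/1 labels."""
    # Collect the distinct values of each categorical column.
    levels = {c: sorted({r[c] for r in records}) for c in CATEGORICAL}
    features, labels = [], []
    for r in records:
        row = {}
        for name, value in r.items():
            if name in ID_COLUMNS or name == TARGET:
                continue  # identifiers and the label are not inputs
            if name in CATEGORICAL:
                for level in levels[name]:
                    row[f"{name}_{level}"] = 1.0 if value == level else 0.0
            else:
                row[name] = float(value)
        features.append(row)
        labels.append(int(r[TARGET]))
    return features, labels


# Hypothetical records; values are invented for illustration.
records = [
    {"RowNumber": "1", "CustomerId": "1001", "Surname": "Alpha", "CreditScore": "620",
     "Geography": "France", "Gender": "Female", "Age": "42", "Tenure": "2",
     "Balance": "0.0", "NumOfProducts": "1", "HasCrCard": "1",
     "IsActiveMember": "1", "EstimatedSalary": "101000.0", "Exited": "1"},
    {"RowNumber": "2", "CustomerId": "1002", "Surname": "Beta", "CreditScore": "610",
     "Geography": "Spain", "Gender": "Male", "Age": "41", "Tenure": "1",
     "Balance": "83000.0", "NumOfProducts": "1", "HasCrCard": "0",
     "IsActiveMember": "1", "EstimatedSalary": "112000.0", "Exited": "0"},
]

features, labels = prepare(records)
print(sorted(features[0]))
print(labels)
```

Whether features such as CreditScore or Tenure actually carry the predictive weight described above is something to verify on the data rather than assume.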
The rise of digital disruptors, challenger banks, and sustainability-focused financial institutions has reshaped the banking landscape, drawing billions in investment. To compete with established players, these newcomers have had to balance rapid customer acquisition with long-term retention. While digital banks once displayed wide swings in retention rates - some enjoying strong loyalty while others faced steep churn - recent trends suggest that retention has begun to stabilize. In the first quarter of 2025, for example, Monzo reported a positive retention ratio, while Starling Bank experienced a modest decline.
Biggest winners
In the first quarter of 2025, Nationwide and Monzo emerged as the leaders in customer retention, achieving an impressive ratio of *** and **** new customers for every one lost, respectively. Danske Bank, HSBC, The Co-operative Bank, and Triodos Bank also achieved good results, with *** customers switching to their services for every departing customer. In stark contrast, AIB Group faced significant challenges, with a concerning ratio of **** customers leaving for each new customer acquired.
Customer growth of digital banks
Digital-only banks have achieved remarkable growth in the European financial sector, with London-based Revolut leading the charge. In November 2024, Revolut reported a significant milestone of over ** million global customers, building on its strong momentum from 2024 when monthly app downloads surpassed *** million.
Although the results were close, the industry in the United States where customers were most likely to leave their current provider due to poor customer service appears to be cable television, with a 25 percent churn rate in 2020.
Churn rate
Churn rate, sometimes also called attrition rate, is the percentage of customers who stop using a service within a given time period. It is often used to assess businesses that have a contractual customer base, especially subscription-based service models.
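The definition above can be written as a one-line calculation. The sketch below assumes the denominator is the number of customers at the start of the period, which is a common convention but is not specified in the text; the function name and the example counts are illustrative.

```python
def churn_rate(customers_at_start, customers_lost):
    """Percentage of customers who stopped using the service during the period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return 100.0 * customers_lost / customers_at_start


# Hypothetical example: 500 of 2,000 customers leave during the period.
rate = churn_rate(2000, 500)
print(f"{rate:.1f}% churn")
```

Some analysts instead divide by the average customer count over the period, so it is worth confirming which convention a reported figure uses before comparing rates.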
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of GA-XGBoost with XGBoost and LightGBM test results.