Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset contains information about bank customers and their churn status, which indicates whether they have exited the bank or not. It is suitable for exploring and analyzing factors influencing customer churn in banking institutions and for building predictive models to identify customers at risk of churning.
RowNumber: The sequential number assigned to each row in the dataset.
CustomerId: A unique identifier for each customer.
Surname: The surname of the customer.
CreditScore: The credit score of the customer.
Geography: The geographical location of the customer (e.g., country or region).
Gender: The gender of the customer.
Age: The age of the customer.
Tenure: The number of years the customer has been with the bank.
Balance: The account balance of the customer.
NumOfProducts: The number of bank products the customer has.
HasCrCard: Indicates whether the customer has a credit card (binary: yes/no).
IsActiveMember: Indicates whether the customer is an active member (binary: yes/no).
EstimatedSalary: The estimated salary of the customer.
Exited: Indicates whether the customer has exited the bank (binary: yes/no).
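The column list above maps naturally onto a typed record. The following is a minimal sketch, assuming the file is a CSV with a header row and that the binary columns are stored as 0/1 (the listing only says "binary: yes/no"). The sample rows, the `Customer` class, and the helper names are made up for illustration and are not part of the dataset.

```python
import csv
import io
from dataclasses import dataclass

# Two invented rows in the column layout listed above (illustrative values only).
SAMPLE_CSV = """RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
1,10000001,Doe,650,France,Female,42,2,0.0,1,1,1,50000.0,1
2,10000002,Roe,700,Spain,Male,35,5,82000.5,2,0,1,61000.0,0
"""


@dataclass
class Customer:
    row_number: int
    customer_id: int
    surname: str
    credit_score: int
    geography: str
    gender: str
    age: int
    tenure: int
    balance: float
    num_of_products: int
    has_cr_card: bool
    is_active_member: bool
    estimated_salary: float
    exited: bool


def parse_customer(row):
    """Convert one csv.DictReader row (all strings) into a typed record."""
    return Customer(
        row_number=int(row["RowNumber"]),
        customer_id=int(row["CustomerId"]),
        surname=row["Surname"],
        credit_score=int(row["CreditScore"]),
        geography=row["Geography"],
        gender=row["Gender"],
        age=int(row["Age"]),
        tenure=int(row["Tenure"]),
        balance=float(row["Balance"]),
        num_of_products=int(row["NumOfProducts"]),
        # Assumption: binary flags are encoded as "1" / "0".
        has_cr_card=row["HasCrCard"] == "1",
        is_active_member=row["IsActiveMember"] == "1",
        estimated_salary=float(row["EstimatedSalary"]),
        exited=row["Exited"] == "1",
    )


def load_customers(text):
    """Parse CSV text into a list of Customer records."""
    return [parse_customer(r) for r in csv.DictReader(io.StringIO(text))]


def churn_rate(customers):
    """Fraction of customers whose Exited flag is set."""
    return sum(c.exited for c in customers) / len(customers)


customers = load_customers(SAMPLE_CSV)
print(f"{len(customers)} customers, churn rate {churn_rate(customers):.2f}")
```

For a real file, the same `parse_customer` function could be applied to rows read from an opened file handle instead of the inline string.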
This dataset is made available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The churn prediction dataset contains raw data on 28,382 customers. The dataset includes the following columns:
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This synthetic dataset simulates customer data for a fictional bank in Botswana, specifically designed to model customer churn behavior. It includes a comprehensive set of customer demographics, financial data, product usage, and behavioral indicators that could influence whether a customer decides to leave the bank. The dataset is generated using the Python Faker library, ensuring realistic but entirely fictional data points for educational, testing, and modeling purposes.
Number of Records: 115,640 customers
Churn Rate: Determined by a calculated churn risk score based on several customer attributes
Geographical Focus: Botswana
Data Structure: The dataset is organized in a tabular format, with each row representing a unique customer
This dataset is ideal for the following applications:
Churn Prediction Modeling: Building and evaluating machine learning models to predict customer churn.
Customer Segmentation: Analyzing customer profiles and segmenting them based on various demographics and financial attributes.
Product Analysis: Understanding which products are most associated with customer retention or churn.
Educational Purposes: Teaching data science and machine learning concepts using a realistic dataset.
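The description above says churn is determined by a calculated risk score but does not give the formula. The sketch below shows one way such a label could be produced. It is not the dataset's actual generator: the attributes, weights, and function names are invented for illustration, and the standard-library `random` module stands in for the Faker library mentioned in the description.

```python
import math
import random


def churn_risk_score(age, balance, num_products, is_active):
    """Hypothetical risk score in (0, 1); every weight here is made up."""
    z = -1.0
    z += 0.03 * (age - 40)            # assumption: risk rises with age
    z -= 0.4 * (num_products - 1)     # assumption: more products, lower risk
    z -= 0.8 if is_active else 0.0    # assumption: active members churn less
    z -= 0.000005 * balance           # assumption: higher balance, lower risk
    return 1.0 / (1.0 + math.exp(-z))  # logistic squashing to a probability


def generate_customers(n, seed=0):
    """Generate n synthetic customers with a churn label drawn from the score."""
    rng = random.Random(seed)
    customers = []
    for i in range(n):
        age = rng.randint(18, 80)
        balance = round(rng.uniform(0, 200_000), 2)
        num_products = rng.randint(1, 4)
        is_active = rng.random() < 0.5
        risk = churn_risk_score(age, balance, num_products, is_active)
        customers.append({
            "customer_id": i + 1,
            "age": age,
            "balance": balance,
            "num_products": num_products,
            "is_active": is_active,
            "risk": risk,
            # The label is a Bernoulli draw, so churn is noisy rather than
            # a hard threshold on the score.
            "churned": rng.random() < risk,
        })
    return customers


sample = generate_customers(1000, seed=42)
rate = sum(c["churned"] for c in sample) / len(sample)
print(f"synthetic churn rate: {rate:.3f}")
```

Drawing the label at random from the score, rather than thresholding it, keeps the relationship between attributes and churn probabilistic, which is usually what a modeling exercise needs.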
https://choosealicense.com/licenses/odbl/
kusha7/bank-churn dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The data will be used to predict whether a customer of the bank will churn. If a customer churns, it means they left the bank and took their business elsewhere. If you can predict which customers are likely to churn, you can take measures to retain them before they do. These measures could be promotions, discounts, or other incentives to boost customer satisfaction and, therefore, retention.
The dataset contains:
10,000 rows – each row is a unique customer of the bank
14 columns:
RowNumber: Row numbers from 1 to 10,000
CustomerId: Customer’s unique ID assigned by bank
Surname: Customer’s last name
CreditScore: Customer’s credit score. This number can range from 300 to 850.
Geography: Customer’s country of residence
Gender: Categorical indicator
Age: Customer’s age (years)
Tenure: Number of years customer has been with bank
Balance: Customer’s bank balance (Euros)
NumOfProducts: Number of products the customer has with the bank
HasCrCard: Indicates whether the customer has a credit card with the bank
IsActiveMember: Indicates whether the customer is considered active
EstimatedSalary: Customer’s estimated annual salary (Euros)
Exited: Indicates whether the customer churned (left the bank)
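Before a model can use these 14 columns, the identifier columns are typically set aside and the categorical ones turned into numbers. The following is a rough sketch of that step, assuming rows arrive as dictionaries of strings (for example from `csv.DictReader`) and that the binary columns are stored as 0/1. The sample rows and helper names are hypothetical.

```python
# Column roles, following the list above.
IDENTIFIER_COLUMNS = ("RowNumber", "CustomerId", "Surname")  # not used as features
NUMERIC_COLUMNS = (
    "CreditScore", "Age", "Tenure", "Balance",
    "NumOfProducts", "HasCrCard", "IsActiveMember", "EstimatedSalary",
)
CATEGORICAL_COLUMNS = ("Geography", "Gender")
TARGET_COLUMN = "Exited"


def fit_categories(rows):
    """Collect the sorted distinct values of each categorical column."""
    return {col: sorted({r[col] for r in rows}) for col in CATEGORICAL_COLUMNS}


def encode(row, categories):
    """Return (feature_vector, label) with one-hot encoded categoricals."""
    features = [float(row[col]) for col in NUMERIC_COLUMNS]
    for col in CATEGORICAL_COLUMNS:
        features.extend(1.0 if row[col] == value else 0.0
                        for value in categories[col])
    return features, int(row[TARGET_COLUMN])


def credit_score_in_range(row, low=300, high=850):
    """Check the 300 to 850 range stated in the column description."""
    return low <= int(row["CreditScore"]) <= high


# Two invented rows for illustration only.
rows = [
    {"RowNumber": "1", "CustomerId": "10000001", "Surname": "Doe",
     "CreditScore": "650", "Geography": "France", "Gender": "Female",
     "Age": "42", "Tenure": "2", "Balance": "0.0", "NumOfProducts": "1",
     "HasCrCard": "1", "IsActiveMember": "1", "EstimatedSalary": "50000.0",
     "Exited": "1"},
    {"RowNumber": "2", "CustomerId": "10000002", "Surname": "Roe",
     "CreditScore": "700", "Geography": "Spain", "Gender": "Male",
     "Age": "35", "Tenure": "5", "Balance": "82000.5", "NumOfProducts": "2",
     "HasCrCard": "0", "IsActiveMember": "1", "EstimatedSalary": "61000.0",
     "Exited": "0"},
]

categories = fit_categories(rows)
for row in rows:
    x, y = encode(row, categories)
    print(len(x), y, credit_score_in_range(row))
```

In practice the category lists would be fitted on the training split only, so that unseen values in new data are handled deliberately rather than silently.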
SudeendraMG/Bank-Customer-Churn dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Overview for XYZ Multistate Bank:
This dataset is for XYZ Multistate Bank and contains various columns that capture key aspects of customer behavior and attributes. Each column provides valuable insights into the factors influencing customer churn, with the goal of predicting which customers are most likely to leave the bank. Below is an explanation of each column and its relevance to customer retention.
1. RowNumber:
The "RowNumber" column corresponds to the unique record number for each customer entry. It has no impact on the outcome of customer churn but is used to identify and organize data within the dataset. Since it doesn't contain any meaningful information related to customer behavior, it is not relevant for churn prediction and can be excluded in analysis.
2. CustomerId:
The "CustomerId" column consists of randomly generated identifiers for each customer. While this ID helps to uniquely distinguish each customer, it has no impact on the likelihood of a customer leaving the bank. As a categorical feature, it does not contribute to the analysis of churn and can be omitted when building predictive models.
3. Surname:
The "Surname" column holds the last names of customers. Although this information is useful for identification purposes, it does not have a direct relationship with customer churn. Since a customer's surname is not an influencing factor in their decision to stay or leave the bank, it is not considered relevant for churn prediction and can be disregarded.
4. CreditScore:
"CreditScore" is an important variable that can significantly affect customer churn. Customers with higher credit scores are generally considered more financially stable and less likely to leave the bank, as they are less likely to face issues with financial institutions. Therefore, this feature can provide valuable insights into customer retention and should be included in churn analysis.
5. Geography:
"Geography" refers to the geographical location of the customer, which can influence their likelihood of leaving the bank. Customers living in different regions may have varying experiences with the bank’s services, fees, or offerings, making this an important factor to explore. Understanding regional differences helps tailor retention strategies for specific locations and improve overall customer satisfaction.
6. Gender:
"Gender" is an interesting demographic factor to consider in churn prediction. While gender itself may not directly affect the likelihood of a customer leaving, it could correlate with other behavioral patterns or preferences that influence retention. Analyzing gender in combination with other features may reveal potential insights, making it worthwhile to examine as part of the churn model.
7. Age:
The "Age" column is a key factor in understanding customer behavior. Typically, older customers are less likely to churn because they tend to be more established with their financial institutions and may have a greater sense of loyalty. In contrast, younger customers may be more likely to switch banks, especially if they are seeking better services or offers. This feature is essential for predicting churn and should be analyzed in detail.
8. Tenure:
"Tenure" refers to the number of years a customer has been with the bank. Longer-tenured customers are often more loyal and less likely to leave the bank. The correlation between tenure and churn is strong, as established relationships tend to make customers less susceptible to leaving. This is a critical factor for churn prediction and should be given high consideration when modeling customer retention.
9. Balance:
The "Balance" column reflects the amount of money a customer holds in their bank account. Customers with higher balances are typically more invested in the bank and are less likely to leave. In contrast, customers with low balances may be more willing to switch to other financial institutions offering better rates or services. This feature plays a significant role in churn prediction, as financial stakes are directly tied to loyalty.
10. NumOfProducts:
"NumOfProducts" refers to the number of products (e.g., savings accounts, loans, credit cards) that a customer has with the bank. Customers with multiple products are usually more invested in the bank, making them less likely to leave. The greater the number of products, the higher the customer's commitment to the bank, making this feature highly relevant in understanding churn patterns and developing retention strategies.
11. HasCrCard:
"HasCrCard" indicates whether or not a customer holds a credit card with the bank. Having a credit card typically reduces the likelihood of customer churn, as credit cards are a widely used financial product that locks customers into a long-term relatio...
haneuris1/bank-customer-churn dataset hosted on Hugging Face and contributed by the HF Datasets community
Business Problem: A business manager of a consumer credit card bank is facing the problem of customer attrition. They want to analyze the data to find the reasons behind it and use those findings to predict which customers are likely to drop off.
This dataset was created by Aarushi Kamboj
subratm62/bank-customer-churn dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In recent years, as the financial system has matured and the banking industry has developed rapidly, competition within the industry has intensified. At the same time, rapid advances in information and Internet technology have diversified customers’ choice of financial products and weakened their dependence on and loyalty to banking institutions, making customer churn in commercial banks an increasingly prominent problem. Predicting customer behavior and retaining existing customers has become a major challenge for banks. This study therefore takes a bank’s business data from the Kaggle platform as its research object, compares multiple sampling methods for balancing the data, constructs a bank customer churn prediction model using GA-XGBoost, and conducts an interpretability analysis of the GA-XGBoost model to provide decision support and suggestions for preventing customer churn in the banking industry. The results show that: (1) SMOTEENN is more effective than SMOTE and ADASYN in dealing with the imbalance of the banking data. (2) The F1 and AUC values of the XGBoost model optimized with a genetic algorithm reach 90% and 99%, respectively, the best results when compared with six other machine learning models. The GA-XGBoost classifier was identified as the best solution for the customer churn problem. (3) Using Shapley values, we explain how each feature affects the model results and analyze the features with a high impact on the model’s predictions, such as the total number of transactions in the past year, the amount of transactions in the past year, the number of products owned by customers, and the total sales balance.
The contribution of this paper is mainly in two aspects: (1) this study can provide useful information from the black box model based on the accurate identification of churned customers, which can provide reference for commercial banks to improve their service quality and retain customers; (2) it can provide reference for customer churn early warning models of other related industries, which can help the banking industry to maintain customer stability, maintain market position and reduce corporate losses.
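The abstract above reports results in terms of F1 and AUC. For readers unfamiliar with these metrics, here is a minimal pure-Python sketch of how each is computed. It is not the study's evaluation code; the labels, scores, and 0.5 threshold below are invented toy values.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative; tied scores count as half a win."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = churned, 0 = stayed.
y_true = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.6, 0.8, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"F1  = {f1_score(y_true, y_pred):.3f}")
print(f"AUC = {roc_auc(y_true, scores):.3f}")
```

F1 depends on the chosen decision threshold, while AUC summarizes ranking quality across all thresholds, which is why the two are often reported together on imbalanced churn data.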
The rise of digital disruptors, challenger banks, and sustainability-focused financial institutions has reshaped the banking landscape, drawing billions in investment. To compete with established players, these newcomers have had to balance rapid customer acquisition with long-term retention. While digital banks once displayed wide swings in retention rates - some enjoying strong loyalty while others faced steep churn - recent trends suggest that retention has begun to stabilize. In the first quarter of 2025, for example, Monzo reported a positive retention ratio, while Starling Bank experienced a modest decline.
Biggest winners
In the first quarter of 2025, Nationwide and Monzo emerged as the leaders in customer retention, achieving impressive ratios of *** and **** new customers for every one lost, respectively. Danske Bank, HSBC, The Co-operative Bank, and Triodos Bank also achieved good results, with *** customers switching to their services for every departing customer. In stark contrast, AIB Group faced significant challenges, with a concerning ratio of **** customers leaving for each new customer acquired.
Customer growth of digital banks
Digital-only banks have achieved remarkable growth in the European financial sector, with London-based Revolut leading the charge. In November 2024, Revolut reported a significant milestone of over ** million global customers, building on its strong momentum from 2024 when monthly app downloads surpassed *** million.
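The retention ratio quoted in this entry can be read as customers gained per customer lost. Since the published figures are masked here, the short sketch below uses made-up counts purely to show the arithmetic; the function name is hypothetical.

```python
def retention_ratio(gained, lost):
    """Customers gained for every one customer lost.

    A value above 1 means net growth from switching; below 1 means net loss.
    """
    if lost <= 0:
        raise ValueError("lost must be positive")
    return gained / lost


# Invented counts, not taken from the statistic above.
growing_bank = retention_ratio(gained=300, lost=100)   # 3 gained per 1 lost
shrinking_bank = retention_ratio(gained=50, lost=200)  # 0.25 gained per 1 lost

print(growing_bank)
# For a net loser, the inverse reads as customers lost per customer gained.
print(1 / shrinking_bank)
```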
sasipriyank/bank-customer-churn dataset hosted on Hugging Face and contributed by the HF Datasets community
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Tarun Sunkaraneni
Released under CC0: Public Domain
This dataset was created by Rubel Mia
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of GA-XGBoost with XGBoost and LightGBM test results.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
About Dataset: This dataset is for ABC Multistate Bank, with the following columns:
The statistic presents the share of respondents who would consider joining a bank with no branch locations in the United States in 2014, by age group. It was found that 39 percent of the respondents from the 18-34 years old age group would consider switching to a bank without branch locations.