Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Bank Customer Churn Dataset is a collection of data related to customers of a bank who have either left (churned) or stayed with the bank. This dataset is typically used for predictive modeling to identify patterns and factors that lead to customer churn, enabling banks to take proactive measures to retain customers.
id: Unique identifier for each customer.
CustomerId: Unique identifier for the customer account.
Surname: Last name of the customer.
CreditScore: Numeric representation of the customer's creditworthiness.
Geography: Country or region where the customer resides (string).
Gender: Gender of the customer (string; e.g., Male, Female).
Age: Age of the customer.
Tenure: Number of years the customer has been with the bank.
Balance: Current balance in the customer's account.
NumOfProducts: Number of bank products the customer uses.
HasCrCard: Binary indicator (0 or 1) for whether the customer has a credit card.
IsActiveMember: Binary indicator (0 or 1) for whether the customer is an active member.
EstimatedSalary: Estimated salary of the customer.
Exited: Binary indicator (0 or 1) for whether the customer has churned (the target).
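As a minimal sketch of how the fields above might be read and type-checked, the following parses a CSV with those column names and computes the share of churned customers. The sample rows are made-up illustrative values, not records from the dataset, and the helper names (`to_int`, `load_customers`, `churn_rate`) are hypothetical. It also assumes that integer-like fields may be written as `1` or `1.0`.

```python
import csv
import io

# Illustrative rows only (made-up values, not taken from the dataset);
# the header matches the field list above.
SAMPLE_CSV = """\
id,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,10000001,Smith,650,France,Male,35,5,0.0,2,1,1,50000.0,0
1,10000002,Jones,720,Germany,Female,48,2,120000.0,1,1,0,80000.0,1
2,10000003,Garcia,580,Spain,Female,29,8,60000.0,1,0,1,40000.0,0
3,10000004,Brown,700,France,Male,41,3,0.0,2,1.0,1.0,65000.0,0
"""

INT_COLUMNS = ["id", "CustomerId", "CreditScore", "Tenure", "NumOfProducts"]
FLOAT_COLUMNS = ["Age", "Balance", "EstimatedSalary"]
BINARY_COLUMNS = ["HasCrCard", "IsActiveMember", "Exited"]


def to_int(text):
    """Parse an integer that may be written as '1' or '1.0' (assumption)."""
    return int(float(text))


def load_customers(csv_text):
    """Return one dict per customer with numeric fields converted."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = dict(raw)  # Surname, Geography, Gender stay as strings
        for name in INT_COLUMNS:
            row[name] = to_int(raw[name])
        for name in FLOAT_COLUMNS:
            row[name] = float(raw[name])
        for name in BINARY_COLUMNS:
            value = to_int(raw[name])
            if value not in (0, 1):
                raise ValueError(f"{name} must be 0 or 1, got {raw[name]!r}")
            row[name] = value
        rows.append(row)
    return rows


def churn_rate(rows):
    """Fraction of customers whose Exited flag is 1."""
    return sum(row["Exited"] for row in rows) / len(rows)


customers = load_customers(SAMPLE_CSV)
print(f"{len(customers)} rows, churn rate {churn_rate(customers):.2f}")
```

Checking the binary columns at load time surfaces malformed target values before any modeling step; whether the real files need the `1.0` tolerance is an assumption to verify against the actual data.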
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Ayush Singh Verma
Released under Apache 2.0
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Bank Turnover Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/barelydedicated/bank-customer-churn-modeling on 28 January 2022.
--- No further description of dataset provided by original source ---
--- Original source retains full ownership of the source dataset ---
This dataset was created by Vikas Satheesh
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In recent years, as the financial system has matured and the banking industry has grown rapidly, competition among banks has intensified. At the same time, advances in information and Internet technology have diversified customers’ choice of financial products and weakened their dependence on and loyalty to banking institutions, making customer churn an increasingly prominent problem for commercial banks. Predicting customer behavior and retaining existing customers has become a major challenge for banks. This study therefore takes a bank’s business data from the Kaggle platform as its research object, compares several sampling methods for balancing the data, builds a bank customer churn prediction model with GA-XGBoost, and conducts an interpretability analysis of the GA-XGBoost model to provide decision support and suggestions for preventing customer churn in the banking industry. The results show that: (1) SMOTEENN is more effective than SMOTE and ADASYN at handling the class imbalance in the banking data. (2) XGBoost optimized with a genetic algorithm reaches F1 and AUC values of 90% and 99%, respectively, the best results when compared with six other machine learning models; the GA-XGBoost classifier was identified as the best solution for the customer churn problem. (3) Shapley values are used to explain how each feature affects the model results and to analyze the features with a high impact on the prediction, such as the total number of transactions in the past year, the transaction amount in the past year, the number of products owned by the customer, and the total sales balance.
This paper makes two main contributions: (1) beyond accurately identifying churned customers, it extracts useful information from the black-box model, which commercial banks can use as a reference for improving service quality and retaining customers; (2) it offers a reference for customer churn early-warning models in other related industries, and can help the banking industry maintain customer stability and market position and reduce corporate losses.
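To illustrate the oversampling idea named in the abstract, here is a minimal sketch of the core SMOTE step: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is not the authors' implementation. The function name `smote_samples`, its parameters, and the sample points are hypothetical, and the Edited Nearest Neighbours cleaning that SMOTEENN applies after oversampling is not shown.

```python
import math
import random


def smote_samples(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by SMOTE-style interpolation.

    minority: list of equal-length numeric tuples (minority-class samples).
    k: number of nearest minority neighbours to choose from.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # k nearest minority neighbours of the base point (excluding itself).
        others = [p for i, p in enumerate(minority) if i != base_idx]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Interpolate: base + gap * (neighbour - base), with gap in [0, 1).
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Made-up, already-scaled (Age, Balance) pairs standing in for churned customers.
churned = [(0.2, 0.1), (0.3, 0.4), (0.5, 0.2), (0.4, 0.5)]
new_points = smote_samples(churned, n_new=6)
print(len(new_points), "synthetic samples")
```

Because every synthetic point lies between two existing minority points, the new samples stay inside the region the minority class already occupies. For real work, a maintained library implementation would normally be preferable to this sketch.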
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Omar Belfeki
Released under Apache 2.0
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by NITANT TYAGI
Released under Apache 2.0
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of GA-XGBoost with XGBoost and LightGBM test results.
This dataset was created by Shashank Moon
This dataset was created by Vanshika Narang
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance comparison of different adoption algorithms in XGBoost model.
This dataset was created by Amrut Nikam
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Churn Modelling’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/shrutimechlearn/churn-modelling on 12 November 2021.
--- Dataset description provided by original source is as follows ---
This data set contains details of a bank's customers. The target variable is binary and indicates whether the customer left the bank (closed their account) or remains a customer.
Big thanks to https://www.superdatascience.com/pages/deep-learning
Banner photo by Sharon McCutcheon on Unsplash
--- Original source retains full ownership of the source dataset ---
This dataset was created by avazjon isoboev
This dataset was created by Venugopal Adep
CC0 1.0 https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides insights into customer churn patterns and behaviors for Kiwibank, a leading New Zealand-owned financial institution. It includes demographic information (such as age, gender, geography), banking metrics (credit score, balance, products), and customer activity indicators. The dataset is suitable for predictive modeling tasks (e.g., predicting customer churn using machine learning algorithms like Naive Bayes, Random Forest, and Decision Tree) and clustering analysis (e.g., K-Means clustering to identify customer segments). Analyzing this dataset can help financial analysts, data scientists, and business strategists understand factors influencing customer retention and optimize strategies to improve customer satisfaction and loyalty.
Key Features:
Customer demographics: Age, gender, geography.
Banking metrics: Credit score, balance, number of products.
Customer activity: Tenure, usage of credit cards, activity level.
Target variable: Churn (1 if the customer has churned, 0 otherwise).
Potential Use Cases:
Predictive modeling for customer churn prevention.
Segmentation analysis to target marketing campaigns.
Insights for enhancing customer retention strategies.
CC0 1.0 https://creativecommons.org/publicdomain/zero/1.0/
In the synthetic dataset for the Playground Series S4 E1 Binary Classification with a Bank Churn Dataset, various features have been engineered to capture relevant information about customers. The dataset includes label-encoded surnames and features derived from them using the TF-IDF vectorizer. The credit score serves as a numerical representation of a customer's creditworthiness, while the geography feature indicates the country of residence, with one-hot encoding for France, Spain, and Germany.
Gender is one-hot encoded into male and female categories. Age, tenure, balance, and the number of products used by the customer offer insights into their banking behavior. Credit card ownership and active membership status are included as binary features, along with the customer's estimated salary.
Notable engineered features provide additional insights. Mem_no_Products is the product of the number of products and active membership status, offering a combined metric. Cred_Bal_Sal represents the ratio of the product of credit score and balance to estimated salary, providing a relative measure of financial health. The balance-to-salary ratio (Bal_sal) and the tenure-to-age ratio (Tenure_Age) offer further dimensions for analysis. Finally, Age_Tenure_product is a feature capturing the interaction between age and tenure.
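The engineered features described above can be sketched as follows. The formulas follow the paragraph's wording; the input column names are assumed to match the original churn schema (`NumOfProducts`, `IsActiveMember`, `CreditScore`, `Balance`, `EstimatedSalary`, `Tenure`, `Age`), the function name `add_engineered_features` is hypothetical, and the example row uses made-up values. The sketch also assumes `EstimatedSalary` and `Age` are nonzero, since the description does not say how zero denominators are handled.

```python
def add_engineered_features(row):
    """Return a copy of a customer record with the five engineered features."""
    out = dict(row)
    # Number of products multiplied by the active-membership flag (0 or 1).
    out["Mem_no_Products"] = row["NumOfProducts"] * row["IsActiveMember"]
    # (credit score * balance) / estimated salary.
    out["Cred_Bal_Sal"] = (
        row["CreditScore"] * row["Balance"] / row["EstimatedSalary"]
    )
    # Balance-to-salary ratio.
    out["Bal_sal"] = row["Balance"] / row["EstimatedSalary"]
    # Tenure-to-age ratio.
    out["Tenure_Age"] = row["Tenure"] / row["Age"]
    # Interaction between age and tenure.
    out["Age_Tenure_product"] = row["Age"] * row["Tenure"]
    return out


# Made-up example record, not taken from the dataset.
example = {
    "CreditScore": 600,
    "Balance": 50000.0,
    "EstimatedSalary": 100000.0,
    "Age": 40,
    "Tenure": 4,
    "NumOfProducts": 2,
    "IsActiveMember": 1,
}
features = add_engineered_features(example)
print(features["Bal_sal"], features["Age_Tenure_product"])
```

For this record the balance-to-salary ratio is 0.5 and the age-tenure product is 160. `Mem_no_Products` collapses to 0 for any inactive member, regardless of how many products they hold.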
The target variable, 'Exited,' indicates whether a customer has churned, with a value of 1 for churned customers and 0 for those who have not. This dataset, with its diverse set of features and engineered metrics, provides a comprehensive foundation for binary classification tasks, enabling the exploration of factors influencing customer churn in the banking domain. Analysts and data scientists can leverage these features to build predictive models and gain insights into the dynamics of customer retention.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The churn prediction dataset contains raw data for 28,382 customers. The dataset includes the following columns:
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by ASMIT BANDYOPADHYAY
Released under Apache 2.0