6 datasets found
  1. Details of feature variables of the data set.

    • plos.figshare.com
    xls
    Updated Dec 8, 2023
    Cite
    Ke Peng; Yan Peng; Wenguang Li (2023). Details of feature variables of the data set. [Dataset]. http://doi.org/10.1371/journal.pone.0289724.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ke Peng; Yan Peng; Wenguang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In recent years, as the financial system has matured and the banking industry has grown rapidly, competition among banks has intensified. At the same time, advances in information and Internet technology have diversified customers’ choice of financial products, weakened their dependence on and loyalty to individual banks, and made customer churn an increasingly prominent problem for commercial banks. Predicting customer behavior and retaining existing customers has become a major challenge for banks. This study therefore takes a bank’s business data from the Kaggle platform as its research object, compares multiple sampling methods for balancing the data, builds a bank customer churn prediction model with GA-XGBoost, and conducts an interpretability analysis of the GA-XGBoost model to provide decision support and suggestions that help the banking industry prevent customer churn. The results show that: (1) SMOTEENN is more effective than SMOTE and ADASYN at handling the imbalance in the banking data. (2) The F1 and AUC values of the XGBoost model optimized with a genetic algorithm reach 90% and 99%, respectively, the best results compared with six other machine learning models. The GA-XGBoost classifier was identified as the best solution for the customer churn problem. (3) Using Shapley values, we explain how each feature affects the model results and analyze the features with a high impact on the model’s predictions, such as the total number of transactions in the past year, the transaction amount in the past year, the number of products owned by customers, and the total sales balance.
    The contribution of this paper lies mainly in two aspects: (1) building on the accurate identification of churned customers, the study extracts useful information from the black-box model, which commercial banks can use as a reference for improving service quality and retaining customers; (2) it can serve as a reference for customer churn early-warning models in other related industries and can help the banking industry maintain customer stability, hold its market position, and reduce corporate losses.
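
The description above reports F1 and AUC values for the churn classifier. As a minimal sketch of what those two metrics measure (not the authors' evaluation code), the following pure-Python functions compute F1 for the positive (churn) class and ROC AUC from predicted scores. The function names `f1_score` and `roc_auc` and the toy labels are assumptions chosen for illustration only.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (1 = churned): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical labels and scores, not data from the study).
labels = [1, 1, 0, 0, 1]
hard_predictions = [1, 0, 0, 1, 1]
churn_scores = [0.9, 0.4, 0.3, 0.6, 0.8]
toy_f1 = f1_score(labels, hard_predictions)
toy_auc = roc_auc(labels, churn_scores)
```

The pairwise AUC shown here is quadratic in the number of samples; it is meant to make the definition concrete, not to replace a library implementation on real data.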

  2. Bank customer churn predictions model

    • kaggle.com
    Updated Sep 2, 2022
    Cite
    avazjon isoboev (2022). Bank customer churn predictions model [Dataset]. https://www.kaggle.com/datasets/avazisoboev/bank-customer-churn-predictions-model
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 2, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    avazjon isoboev
    Description

    Dataset

    This dataset was created by avazjon isoboev

  3. Bank Churn Prediction

    • kaggle.com
    Updated Jan 23, 2024
    Cite
    willian oliveira gibin (2024). Bank Churn Prediction [Dataset]. http://doi.org/10.34740/kaggle/dsv/7466166
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 23, 2024
    Dataset provided by
    Kaggle
    Authors
    willian oliveira gibin
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    In the synthetic dataset for the Playground Series S4 E1 Binary Classification with a Bank Churn Dataset, various features have been engineered to capture relevant information about customers. The dataset includes label-encoded surnames and features derived from them using the TFIDF vectorizer. The credit score serves as a numerical representation of a customer's creditworthiness, while the geography feature indicates the country of residence, with one-hot encoding for France, Spain, and Germany.

    Gender is represented with one-hot encoding for male and female categories. Age, tenure, balance, and the number of products used by the customer offer insights into their banking behavior. The presence of a credit card and active membership status are included as binary features, along with the customer's estimated salary.

    Notable engineered features provide additional insights. Mem_no_Products is the product of the number of products and active membership status, offering a combined metric. Cred_Bal_Sal represents the ratio of the product of credit score and balance to estimated salary, providing a relative measure of financial health. The balance-to-salary ratio (Bal_sal) and the tenure-to-age ratio (Tenure_Age) offer further dimensions for analysis. Finally, Age_Tenure_product is a feature capturing the interaction between age and tenure.
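
The engineered features described above can be sketched in a few lines. This is an illustrative reconstruction from the prose, not the dataset author's code: the input column names (`CreditScore`, `Balance`, `EstimatedSalary`, `NumOfProducts`, `IsActiveMember`, `Tenure`, `Age`) and the helper name `add_engineered_features` are assumptions, and the sketch assumes a nonzero salary and age.

```python
def add_engineered_features(row: dict) -> dict:
    """Return a copy of one customer record with the five engineered features added.

    Input key names are assumed for illustration; the source only names the
    output features. Assumes EstimatedSalary and Age are nonzero.
    """
    out = dict(row)
    salary = row["EstimatedSalary"]
    # Number of products multiplied by active membership status (0 or 1).
    out["Mem_no_Products"] = row["NumOfProducts"] * row["IsActiveMember"]
    # (credit score * balance) relative to estimated salary.
    out["Cred_Bal_Sal"] = row["CreditScore"] * row["Balance"] / salary
    # Balance-to-salary ratio.
    out["Bal_sal"] = row["Balance"] / salary
    # Tenure-to-age ratio.
    out["Tenure_Age"] = row["Tenure"] / row["Age"]
    # Interaction between age and tenure.
    out["Age_Tenure_product"] = row["Age"] * row["Tenure"]
    return out


# Hypothetical customer record, made up for this example.
customer = {
    "CreditScore": 600,
    "Balance": 50000.0,
    "EstimatedSalary": 100000.0,
    "NumOfProducts": 2,
    "IsActiveMember": 1,
    "Tenure": 5,
    "Age": 40,
}
features = add_engineered_features(customer)
```

Because `Mem_no_Products` multiplies by a 0/1 flag, it is zero for every inactive member regardless of how many products the customer holds.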

    The target variable, 'Exited,' indicates whether a customer has churned, with a value of 1 for churned customers and 0 for those who have not. This dataset, with its diverse set of features and engineered metrics, provides a comprehensive foundation for binary classification tasks, enabling the exploration of factors influencing customer churn in the banking domain. Analysts and data scientists can leverage these features to build predictive models and gain insights into the dynamics of customer retention.

  4. Comparison of GA-XGBoost with XGBoost and LightGBM test results.

    • figshare.com
    xls
    Updated Dec 8, 2023
    Cite
    Ke Peng; Yan Peng; Wenguang Li (2023). Comparison of GA-XGBoost with XGBoost and LightGBM test results. [Dataset]. http://doi.org/10.1371/journal.pone.0289724.t008
    Explore at:
    Available download formats: xls
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ke Peng; Yan Peng; Wenguang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of GA-XGBoost with XGBoost and LightGBM test results.

  5. Bank_ Customer_ Churn _Prediction_ Model_1

    • kaggle.com
    zip
    Updated Apr 14, 2021
    + more versions
    Cite
    Abraz Laskar (2021). Bank_ Customer_ Churn _Prediction_ Model_1 [Dataset]. https://www.kaggle.com/abrazlaskar/bank-customer-churn-prediction-model-1
    Explore at:
    Available download formats: zip (220458 bytes)
    Dataset updated
    Apr 14, 2021
    Authors
    Abraz Laskar
    Description

    Dataset

    This dataset was created by Abraz Laskar

  6. Bank Database

    • kaggle.com
    Updated Sep 24, 2023
    Cite
    CeloCruz (2023). Bank Database [Dataset]. https://www.kaggle.com/datasets/celocruz/bank-database
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 24, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    CeloCruz
    Description

    Context: We are a bank with a database containing a large amount of information about our customers. Our goal is to help analysts predict the churn rate of these customers in order to reduce it. The database includes demographic information such as age, gender, marital status, and income category. It also contains information on card type, number of months in the portfolio, and inactive periods. In addition, it holds key data on the spending behavior of customers approaching their cancellation decision, including the total renewable balance, the credit limit, the average open-to-buy rate, and analyzable metrics such as the total amount of change from the fourth quarter to the first quarter and the average utilization rate.

    From this data set we can capture up-to-date information that indicates the long-term stability of an account or its imminent exit.

    Objective: Help analysts predict the churn rate of these customers in order to reduce it.

    Create a predictive classification model in order to classify the data in the test file.
