  1. Bank Churn Prediction

    • kaggle.com
    Updated Jan 23, 2024
    willian oliveira gibin (2024). Bank Churn Prediction [Dataset]. http://doi.org/10.34740/kaggle/dsv/7466166
    Dataset updated
    Jan 23, 2024
    Dataset provided by
    Kaggle
    Authors
    willian oliveira gibin
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This is the synthetic dataset for the Playground Series S4 E1 competition, Binary Classification with a Bank Churn Dataset, with engineered features that capture relevant information about customers. The dataset includes label-encoded surnames and features derived from them using a TF-IDF vectorizer. The credit score is a numerical representation of a customer's creditworthiness, while the geography feature indicates the country of residence, one-hot encoded for France, Spain, and Germany.
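As a rough illustration of the two encodings mentioned above, the following sketch label-encodes a surname column and one-hot encodes a geography column in plain Python. The sample values, the `Geography_` column prefix, and the helper names `label_encode` and `one_hot` are assumptions made for this example; the dataset author's actual preprocessing code is not shown in the description, and the TF-IDF step is not reproduced here.

```python
def label_encode(values):
    """Map each distinct value to an integer code, assigned in sorted order."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot(values, categories, prefix):
    """Build one 0/1 column per category, keyed as '<prefix>_<category>'."""
    return {f"{prefix}_{c}": [int(v == c) for v in values] for c in categories}


# Hypothetical sample rows; the real dataset's values will differ.
surnames = ["Okafor", "Smith", "Okafor", "Chen"]
geography = ["France", "Spain", "Germany", "France"]

surname_codes, surname_map = label_encode(surnames)
geo_columns = one_hot(geography, ["France", "Spain", "Germany"], prefix="Geography")

print(surname_codes)
print(geo_columns)
```

Label encoding keeps a high-cardinality column such as surnames in a single integer column, while one-hot encoding suits a feature with only three categories because it avoids implying an order between countries.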

    Gender is one-hot encoded into male and female categories. Age, tenure, balance, and the number of products used by the customer describe their banking behavior. Credit card ownership and active membership status are included as binary features, alongside the estimated salary.

    Notable engineered features provide additional insights. Mem_no_Products is the product of the number of products and active membership status, offering a combined metric. Cred_Bal_Sal represents the ratio of the product of credit score and balance to estimated salary, providing a relative measure of financial health. The balance-to-salary ratio (Bal_sal) and the tenure-to-age ratio (Tenure_Age) offer further dimensions for analysis. Finally, Age_Tenure_product is a feature capturing the interaction between age and tenure.
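The engineered features described above can be sketched as simple arithmetic on one customer record. The output names (`Mem_no_Products`, `Cred_Bal_Sal`, `Bal_sal`, `Tenure_Age`, `Age_Tenure_product`) come from the description; the input keys such as `NumOfProducts` and `EstimatedSalary`, the helper name `engineer_features`, and the sample values are assumptions for illustration, and the sketch does not guard against a zero salary or age.

```python
def engineer_features(row):
    """Compute the five engineered features from one customer record.

    The input keys are assumed names for the raw columns.
    """
    salary = row["EstimatedSalary"]
    return {
        # Number of products times the 0/1 active-membership flag.
        "Mem_no_Products": row["NumOfProducts"] * row["IsActiveMember"],
        # (credit score * balance) / estimated salary.
        "Cred_Bal_Sal": row["CreditScore"] * row["Balance"] / salary,
        # Balance-to-salary ratio.
        "Bal_sal": row["Balance"] / salary,
        # Tenure-to-age ratio.
        "Tenure_Age": row["Tenure"] / row["Age"],
        # Interaction between age and tenure.
        "Age_Tenure_product": row["Age"] * row["Tenure"],
    }


# Hypothetical customer record.
customer = {
    "CreditScore": 600,
    "Age": 40,
    "Tenure": 4,
    "Balance": 50000.0,
    "NumOfProducts": 2,
    "IsActiveMember": 1,
    "EstimatedSalary": 100000.0,
}

features = engineer_features(customer)
print(features)
```

Note that `Mem_no_Products` is zero for every inactive member, so it separates active customers by product count while collapsing inactive ones into a single value.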

    The target variable, 'Exited', indicates whether a customer has churned: 1 for churned customers and 0 for those who have not. Together with the raw and engineered features, it supports binary classification tasks and the exploration of factors that influence customer churn in banking. Analysts and data scientists can use these features to build predictive models and study customer retention.
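Because the target is coded as 0/1, the churn rate is simply the mean of the label. A minimal sketch, using made-up labels and a made-up active-membership flag (the helper names `churn_rate` and `churn_rate_by` are hypothetical), of checking the overall rate and the rate per group before any modelling:

```python
from collections import defaultdict


def churn_rate(labels):
    """Fraction of customers with Exited == 1."""
    return sum(labels) / len(labels)


def churn_rate_by(groups, labels):
    """Churn rate within each distinct group value."""
    totals = defaultdict(lambda: [0, 0])  # group -> [churned, count]
    for g, y in zip(groups, labels):
        totals[g][0] += y
        totals[g][1] += 1
    return {g: churned / n for g, (churned, n) in totals.items()}


# Hypothetical labels and active-membership flags for eight customers.
exited = [1, 0, 0, 1, 0, 0, 0, 1]
active = [0, 1, 1, 0, 1, 0, 1, 0]

overall = churn_rate(exited)
by_activity = churn_rate_by(active, exited)
print(overall, by_activity)
```

A check like this shows how imbalanced the classes are, which in turn affects the choice of evaluation metric for a churn classifier.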


57 scholarly articles cite this dataset (View in Google Scholar)