13 datasets found
  1. Bank Customer Churn Dataset

    • kaggle.com
    Updated Jul 11, 2023
    Cite
    Bhuvi Ranga (2023). Bank Customer Churn Dataset [Dataset]. https://www.kaggle.com/datasets/bhuviranga/customer-churn-data
    Dataset updated
    Jul 11, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Bhuvi Ranga
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    The customer churn dataset is a collection of customer data that focuses on predicting customer churn, which refers to the tendency of customers to stop using a company's products or services. The dataset contains various features that describe each customer, such as their credit score, country, gender, age, tenure, balance, number of products, credit card status, active membership, estimated salary, and churn status. The churn status indicates whether a customer has churned or not. The dataset is used to analyze and understand factors that contribute to customer churn and to build predictive models to identify customers at risk of churning. The goal is to develop strategies and interventions to reduce churn and improve customer retention.

  2. Predicting Credit Card Customer Segmentation

    • kaggle.com
    Updated Mar 10, 2024
    Cite
    The Devastator (2024). Predicting Credit Card Customer Segmentation [Dataset]. https://www.kaggle.com/datasets/thedevastator/predicting-credit-card-customer-attrition-with-m
    Dataset updated
    Mar 10, 2024
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Predicting Credit Card Customer Segmentation

    Exploring Key Customer Characteristics

    About this dataset

    This dataset contains customer information collected from a consumer credit card portfolio, with the aim of helping analysts predict customer attrition. It includes demographic details such as age, gender, marital status and income category, as well as information about each customer’s relationship with the credit card provider, such as the card type, number of months on book and inactive periods. It also holds data about customers’ spending behavior leading up to their churn decision, such as total revolving balance, credit limit and average open-to-buy rate, along with derived metrics such as the total change in amount from quarter 4 to quarter 1, the average utilization ratio and a Naive Bayes classifier attrition flag (which combines card category with the contacts count over a 12-month period, dependent count, education level and months inactive). Together, these variables describe the current state of each account and can indicate long-term stability or an impending departure, which is useful both for managing a portfolio and for serving individual customers.

    How to use the dataset

    This dataset can be used to analyze the key factors that influence customer attrition. Analysts can use this dataset to understand customer demographics, spending patterns, and relationship with the credit card provider to better predict customer attrition.

    Research Ideas

    • Using the customer demographics, such as gender, marital status, education level and income category to determine which customer demographic is more likely to churn.
    • Analyzing customers’ spending behavior leading up to churn and using this data to better predict the likelihood that a customer will churn in the future.
    • Creating a classifier that can predict which customers are more susceptible to attrition based on their credit score, credit limit, utilization ratio and other spending behavior metrics over time; this could be used as an early warning system for predicting potential attrition before it happens.
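The first research idea above (comparing churn across demographic groups) can be sketched in a few lines of plain Python. This is only an illustration: the rows below are invented, the helper name `churn_rate_by` is hypothetical, and `Attrition_Flag` is assumed to be a Boolean as the dataset's column listing describes it.

```python
from collections import defaultdict

# Invented sample rows; the column names follow the BankChurners.csv listing,
# the values are made up for illustration only.
rows = [
    {"Gender": "F", "Attrition_Flag": True},
    {"Gender": "F", "Attrition_Flag": False},
    {"Gender": "M", "Attrition_Flag": False},
    {"Gender": "M", "Attrition_Flag": False},
    {"Gender": "F", "Attrition_Flag": True},
    {"Gender": "M", "Attrition_Flag": True},
]


def churn_rate_by(rows, column, target="Attrition_Flag"):
    """Return {group value: share of rows in that group where target is true}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        totals[row[column]] += 1
        churned[row[column]] += int(bool(row[target]))
    return {group: churned[group] / totals[group] for group in totals}


rates = churn_rate_by(rows, "Gender")
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.2f}")
```

The same helper works for any categorical column (marital status, education level, income category) by changing the `column` argument.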

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0), Public Domain Dedication. No Copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: BankChurners.csv

    | Column name     | Description                                                            |
    |:----------------|:-----------------------------------------------------------------------|
    | CLIENTNUM       | Unique identifier for each customer. (Integer)                         |
    | Attrition_Flag  | Flag indicating whether or not the customer has churned out. (Boolean) |
    | Customer_Age    | Age of customer. (Integer)                                             |
    | Gender          | Gender of customer. (String)                                           |
    | Dependent_count | Number of dependents that customer has. (Integer)                      |
    | Education_Level ...

  3. Credit Card Customer Churn Prediction

    • kaggle.com
    Updated Feb 6, 2024
    Cite
    Avinash Bhardwaz (2024). Credit Card Customer Churn Prediction [Dataset]. https://www.kaggle.com/datasets/avinashbhardwaz/credit-card-customer-churn-prediction/suggestions
    Dataset updated
    Feb 6, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Avinash Bhardwaz
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Avinash Bhardwaz

    Released under CC0: Public Domain

    Contents

  4. s4e1 Original Data Extended

    • kaggle.com
    Updated Jan 23, 2024
    Cite
    Samvel Kocharyan (2024). s4e1 Original Data Extended [Dataset]. https://www.kaggle.com/datasets/samvelkoch/s4e1-original-data-extended
    Dataset updated
    Jan 23, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Samvel Kocharyan
    Description

    Bank Customer Churn Prediction dataset Source: https://huggingface.co/datasets/krisnadwipaj/customer-churn

    In comparison to the original dataset mentioned in S4E1 Playground Series Data description this dataset has 4 additional columns: Complain, Satisfaction Score, Card Type, Points Earned

    • Customer ID: A unique identifier for each customer
    • Surname: The customer's surname or last name
    • Credit Score: A numerical value representing the customer's credit score
    • Geography: The country where the customer resides
    • Gender: The customer's gender
    • Age: The customer's age
    • Tenure: The number of years the customer has been with the bank
    • Balance: The customer's account balance
    • NumOfProducts: The number of bank products the customer uses (e.g., savings account, credit card)
    • HasCrCard: Whether the customer has a credit card
    • IsActiveMember: Whether the customer is an active member
    • EstimatedSalary: The estimated salary of the customer
    • Exited: Whether the customer has churned (Target Variable)
    • Complain: Whether the customer has complained
    • Satisfaction Score: Customer Satisfaction Score
    • Card Type: Type of card
    • Points Earned: Customer points

    Screenshot: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10356799%2Fcd443895ee4018e0b563a722695cb2d6%2FScreenshot%202024-01-23%20at%2022.12.01.png?generation=1706044758886339&alt=media

  5. ‘Churn for Bank Customers’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Nov 12, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Churn for Bank Customers’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-churn-for-bank-customers-7c12/09da12a2/?iid=026-795&v=presentation
    Dataset updated
    Nov 12, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Churn for Bank Customers’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mathchi/churn-for-bank-customers on 12 November 2021.

    --- Dataset description provided by original source is as follows ---

    Content

    • RowNumber: corresponds to the record (row) number and has no effect on the output.
    • CustomerId: contains random values and has no effect on customer leaving the bank.
    • Surname: the surname of a customer has no impact on their decision to leave the bank.
    • CreditScore: can have an effect on customer churn, since a customer with a higher credit score is less likely to leave the bank.
    • Geography: a customer’s location can affect their decision to leave the bank.
    • Gender: it’s interesting to explore whether gender plays a role in a customer leaving the bank.
    • Age: this is certainly relevant, since older customers are less likely to leave their bank than younger ones.
    • Tenure: refers to the number of years that the customer has been a client of the bank. Normally, older clients are more loyal and less likely to leave a bank.
    • Balance: also a very good indicator of customer churn, as people with a higher balance in their accounts are less likely to leave the bank compared to those with lower balances.
    • NumOfProducts: refers to the number of products that a customer has purchased through the bank.
    • HasCrCard: denotes whether or not a customer has a credit card. This column is also relevant, since people with a credit card are less likely to leave the bank.
    • IsActiveMember: active customers are less likely to leave the bank.
    • EstimatedSalary: as with balance, people with lower salaries are more likely to leave the bank compared to those with higher salaries.
    • Exited: whether or not the customer left the bank.

    Acknowledgements

    As we know, it is much more expensive to sign up a new client than to keep an existing one.

    It is advantageous for banks to know what leads a client towards the decision to leave the company.

    Churn prevention allows companies to develop loyalty programs and retention campaigns to keep as many customers as possible.

    --- Original source retains full ownership of the source dataset ---

  6. Churn Modelling - Classification Training

    • kaggle.com
    Updated Jan 28, 2025
    Cite
    Aly El-badry (2025). Churn Modelling - Classification Training [Dataset]. https://www.kaggle.com/datasets/alyelbadry/churn-modelling-cluster-training
    Dataset updated
    Jan 28, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Aly El-badry
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Customer Churn Modelling

    This dataset provides comprehensive information about a bank's customers, focusing on their demographic, financial, and account activity details. It is designed to help analyze factors influencing customer churn and develop predictive models for customer retention strategies.

    Dataset Highlights:

    • Customer Demographics: Information such as Gender, Age, and Geographic Location (e.g., country) helps identify trends and patterns in churn across different customer segments.
    • Financial Data:
      • Credit Score: A measure of creditworthiness.
      • Balance: Account balance details for each customer.
      • Estimated Salary: Insights into customers' earning potential.
    • Account Features:
      • Number of Products: Count of products the customer is subscribed to (e.g., savings accounts, loans).
      • IsActiveMember: Indicates if the customer is actively using the bank’s services.
      • HasCrCard: Identifies customers with a credit card.
    • Churn Label: A binary indicator specifying whether the customer exited (1) or stayed (0).
    • Tenure: Duration (in years) the customer has been associated with the bank.

    Unique Features:

    • The dataset is highly structured, making it ideal for clustering tasks.
    • Balanced mix of numerical and categorical features, enabling both exploratory data analysis (EDA) and advanced machine learning models.
    • Offers insights into customer behavior and retention strategies.

    Suggested Use Cases:

    • Customer Retention Analysis: Explore demographic and financial factors influencing churn rates.
    • Predictive Modeling: Build machine learning models to predict churn and identify at-risk customers.
    • Business Insights: Develop strategies for targeted marketing or improving customer loyalty.
    • Feature Engineering: Generate new features to enhance prediction accuracy (e.g., balance-to-salary ratio).
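The feature-engineering suggestion above names a balance-to-salary ratio. A minimal sketch of that derived feature follows; the dictionary keys `Balance` and `EstimatedSalary`, the output key `BalanceSalaryRatio`, and the sample values are assumptions made for illustration, not names confirmed by the dataset file.

```python
def balance_to_salary(balance, estimated_salary):
    """Ratio of account balance to estimated salary; None when salary is not positive."""
    if estimated_salary <= 0:
        return None
    return balance / estimated_salary


# Invented customers, used only to show the calculation.
customers = [
    {"Balance": 50000.0, "EstimatedSalary": 100000.0},
    {"Balance": 0.0, "EstimatedSalary": 40000.0},
    {"Balance": 1200.0, "EstimatedSalary": 0.0},
]

for customer in customers:
    customer["BalanceSalaryRatio"] = balance_to_salary(
        customer["Balance"], customer["EstimatedSalary"]
    )
```

Returning `None` for a non-positive salary avoids a division by zero; how such rows should be treated downstream (dropped, imputed, flagged) is a modeling choice this sketch leaves open.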

    This dataset is perfect for beginners and professionals alike to explore customer churn prediction, develop insights, and create impactful business solutions.

  7. Data from: Bank Customer Churn Prediction

    • kaggle.com
    Updated Mar 21, 2024
    Cite
    Murilo Zangari (2024). Bank Customer Churn Prediction [Dataset]. https://www.kaggle.com/datasets/murilozangari/customer-churn-from-a-bank/code
    Dataset updated
    Mar 21, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Murilo Zangari
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The data will be used to predict whether a customer of the bank will churn. If a customer churns, it means they left the bank and took their business elsewhere. If you can predict which customers are likely to churn, you can take measures to retain them before they do. These measures could be promotions, discounts, or other incentives to boost customer satisfaction and, therefore, retention.

    The dataset contains:

    10,000 rows – each row is a unique customer of the bank

    14 columns:

    RowNumber: Row numbers from 1 to 10,000

    CustomerId: Customer’s unique ID assigned by bank

    Surname: Customer’s last name

    CreditScore: Customer’s credit score. This number can range from 300 to 850.

    Geography: Customer’s country of residence

    Gender: Categorical indicator

    Age: Customer’s age (years)

    Tenure: Number of years customer has been with bank

    Balance: Customer’s bank balance (Euros)

    NumOfProducts: Number of products the customer has with the bank

    HasCrCard: Indicates whether the customer has a credit card with the bank

    IsActiveMember: Indicates whether the customer is considered active

    EstimatedSalary: Customer’s estimated annual salary (Euros)

    Exited: Indicates whether the customer churned (left the bank)
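As a rough sketch of how a file with these 14 columns might be prepared for modeling, the snippet below reads CSV text with the standard library and separates the target (`Exited`) from the candidate features. The two sample rows are invented, and treating `RowNumber`, `CustomerId` and `Surname` as non-predictive identifiers is an assumption of this sketch rather than something the description states.

```python
import csv
import io

# Assumed to be identifiers rather than predictive features.
ID_COLUMNS = {"RowNumber", "CustomerId", "Surname"}
TARGET = "Exited"

# Two invented rows using the 14 column names listed above.
SAMPLE = """RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
1,1001,Doe,650,France,Female,42,2,0.0,1,1,1,50000.0,1
2,1002,Roe,700,Spain,Male,35,5,1200.5,2,0,1,62000.0,0
"""


def split_features_target(csv_text):
    """Return (features, targets); features exclude identifier columns and the target."""
    features, targets = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        targets.append(int(row[TARGET]))
        features.append(
            {k: v for k, v in row.items() if k not in ID_COLUMNS and k != TARGET}
        )
    return features, targets


features, targets = split_features_target(SAMPLE)
```

The feature values are left as strings here; converting numeric columns and encoding `Geography` and `Gender` would be the next step before fitting a model.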

  8. Synthetic Telecom Customer Churn Data

    • kaggle.com
    Updated May 27, 2025
    Cite
    Abdulrahman Qaten (2025). Synthetic Telecom Customer Churn Data [Dataset]. https://www.kaggle.com/datasets/abdulrahmanqaten/synthetic-customer-churn/suggestions
    Dataset updated
    May 27, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Abdulrahman Qaten
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    If you found the dataset useful, your upvote will help others discover it. Thanks for your support!

    This dataset simulates customer behavior for a fictional telecommunications company. It contains demographic information, account details, services subscribed to, and whether the customer ultimately churned (stopped using the service) or not. The data is synthetically generated but designed to reflect realistic patterns often found in telecom churn scenarios.

    Purpose:

    The primary goal of this dataset is to provide a clean and straightforward resource for beginners learning about:

    • Exploratory Data Analysis (EDA): Understanding customer characteristics and identifying potential drivers of churn through visualization and statistical summaries.
    • Data Preprocessing: Handling categorical features (like converting text to numbers) and scaling numerical features.
    • Classification Modeling: Building and evaluating simple machine learning models (like Logistic Regression or Decision Trees) to predict customer churn.
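The preprocessing step mentioned above (converting text to numbers and scaling numerical features) can be illustrated without any third-party library. The sketch below one-hot encodes a categorical column and standardizes a numeric one; the helper names and sample values are hypothetical, and the contract-type labels are taken from the dataset's feature description.

```python
from statistics import mean, pstdev


def one_hot(values):
    """Map each category to a 0/1 indicator vector, ordered by sorted category name."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Invented sample values for two columns.
contract_types = ["Month-to-month", "Two year", "One year", "Month-to-month"]
monthly_charges = [20.0, 40.0, 60.0, 80.0]

categories, encoded = one_hot(contract_types)
scaled = standardize(monthly_charges)
```

In practice a library such as scikit-learn or pandas would usually handle both steps; the plain-Python version is only meant to show what the transformations do.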

    Features:

    The dataset includes the following columns:

    • CustomerID: Unique identifier for each customer.
    • Age: Customer's age in years.
    • Gender: Customer's gender (Male/Female).
    • Location: General location of the customer (e.g., New York, Los Angeles).
    • SubscriptionDurationMonths: How many months the customer has been subscribed.
    • MonthlyCharges: The amount the customer is charged each month.
    • TotalCharges: The total amount the customer has been charged over their subscription period.
    • ContractType: The type of contract the customer has (Month-to-month, One year, Two year).
    • PaymentMethod: How the customer pays their bill (e.g., Electronic check, Credit card).
    • OnlineSecurity: Whether the customer has online security service (Yes, No, No internet service).
    • TechSupport: Whether the customer has tech support service (Yes, No, No internet service).
    • StreamingTV: Whether the customer has TV streaming service (Yes, No, No internet service).
    • StreamingMovies: Whether the customer has movie streaming service (Yes, No, No internet service).
    • Churn: (Target Variable) Whether the customer churned (1 = Yes, 0 = No).

    Data Quality:

    This dataset is intentionally clean with no missing values, making it easy for beginners to focus on analysis and modeling concepts without complex data cleaning steps.

    Inspiration:

    Understanding customer churn is crucial for many businesses. This dataset provides a sandbox environment to practice the fundamental techniques used in churn analysis and prediction.

  9. Credit card dataset for visualization

    • kaggle.com
    Updated Sep 30, 2023
    Cite
    Peachji (2023). Credit card dataset for visualization [Dataset]. https://www.kaggle.com/datasets/peachji/credit-card-dataset
    Dataset updated
    Sep 30, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Peachji
    License

    https://cdla.io/permissive-1-0/

    Description

    This dataset was adapted from 'Credit Card Churn Prediction' (https://www.kaggle.com/datasets/anwarsan/credit-card-bank-churn) for visualization in our university project. We modified customer information and spending behavior, and added revenue targets.

    Scenario 🕶️ In 2019, the marketing team launched a campaign to attract millennial customers (born 1980-1996) with the goal of increasing revenue and enhancing the brand's appeal to a younger audience.
    As the BI team, your task is to create a dashboard for users:
    1. The Vice President of Sales wants to view the performance of the credit business.
    2. The marketing team is interested in understanding customer segments and customer spending to measure Customer Lifetime Value (CLV) and Marketing Cost per Acquired Customer (MCAC).

    ⚠️Note: This is just a suggestion to guide the creation of the dashboard

    Example in Tableau

    Executive summary: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10099382%2F508a2d2d89dabdfd368743f86c2a71e1%2Fexecutive%20overview.JPG?generation=1696110593484137&alt=media
    Customer behavior: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10099382%2F1e4a1f62a25eab3c6707d002243894c7%2Fcustomer_behaviour.JPG?generation=1696110689732332&alt=media

  10. Mobile Customer Churn Dataset

    • kaggle.com
    Updated May 22, 2025
    Cite
    Dyuti Dasmahaptra (2025). Mobile Customer Churn Dataset [Dataset]. https://www.kaggle.com/datasets/dyutidasmahaptra/mobile-customer-churn-dataset/discussion
    Dataset updated
    May 22, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Dyuti Dasmahaptra
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Description

    This dataset contains information about 8,500+ mobile service customers, including demographic details, device usage, billing patterns, and call behavior. The primary goal of this dataset is to enable analysis and modeling to predict customer churn, i.e., customers who decide to drop their mobile service provider.

    The data includes 33 features and one binary target column (customer_dropped). This dataset is ideal for exploring churn prediction models, customer segmentation, lifetime value analysis, and marketing strategy development.

    Features

    • customer_id: Unique identifier for each customer
    • age: Age of the customer
    • job: Occupation or profession of the customer
    • urban_rural: Indicates whether the customer resides in an urban or rural area
    • marital_status: Marital status of the customer
    • kids: Number of children the customer has
    • disposable_income: Disposable income of the customer
    • mobiles_changed: Number of times the customer has changed their mobile device
    • mobile_age: Age of the current mobile device
    • own_smartphone: Indicates whether the customer owns a smartphone
    • current_mobile_price: Price of the customer's current mobile device
    • credit_card_type: Type of credit card held
    • own_house: Indicates whether the customer owns a house
    • own_cr_card: Indicates whether the customer owns a credit card
    • monthly_bill: Monthly bill for mobile service
    • call_mins: Total call minutes used
    • basic_plan_amount: Basic mobile plan amount
    • extra_mins: Extra minutes used beyond the plan
    • roam_call_mins: Roaming call minutes
    • call_mins_delta: Change in call minutes compared to the previous billing period
    • bill_amount_delta: Change in bill amount compared to the previous billing period
    • incoming_call_mins: Total incoming call minutes
    • outgoing_calls: Number of outgoing calls
    • incoming_calls: Number of incoming calls
    • day_night_call_ratio: Ratio of call minutes during the day versus night
    • day_night_call_delta: Change in day vs night call minutes compared to the previous period
    • calls_dropped: Number of calls dropped
    • loyalty_months: Customer tenure in months
    • complaint_calls: Number of complaint calls made
    • promo_calls_made: Number of promotional calls made
    • promo_offers_accepted: Number of promotional offers accepted
    • new_numbers_called: Number of new contacts called
    • customer_dropped: Target column indicating churn (1 = churned, 0 = retained)

    Use Cases

    • Develop machine learning models for churn prediction
    • Perform customer segmentation and behavioral profiling
    • Analyze call usage trends and billing sensitivity
    • Identify key drivers of customer loyalty or attrition
    • Design data-driven retention strategies

  11. 🛒 E-commerce Customer Data For Behavior Analysis

    • kaggle.com
    Updated Sep 15, 2023
    Cite
    Shriyash Jagtap (2023). 🛒 E-commerce Customer Data For Behavior Analysis [Dataset]. https://www.kaggle.com/datasets/shriyashjagtap/e-commerce-customer-for-behavior-analysis/code
    Dataset updated
    Sep 15, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Shriyash Jagtap
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Data Description:

    The "E-commerce Customer Behavior and Purchase Dataset" is a synthetic dataset generated using the Faker Python library. It simulates a comprehensive e-commerce environment, capturing various aspects of customer behavior and purchase history within a digital marketplace. This dataset has been designed for data analysis and predictive modeling in the field of e-commerce. It is suitable for tasks such as customer churn prediction, market basket analysis, recommendation systems, and trend analysis.

    Column Information:

    The dataset contains the following columns:

    • Customer ID: A unique identifier for each customer.
    • Customer Name: The name of the customer (generated by Faker).
    • Customer Age: The age of the customer (generated by Faker).
    • Gender: The gender of the customer (generated by Faker).
    • Purchase Date: The date of each purchase made by the customer.
    • Product Category: The category or type of the purchased product.
    • Product Price: The price of the purchased product.
    • Quantity: The quantity of the product purchased.
    • Total Purchase Amount: The total amount spent by the customer in each transaction.
    • Payment Method: The method of payment used by the customer (e.g., credit card, PayPal).
    • Returns: Whether the customer returned any products from the order (binary: 0 for no return, 1 for return).
    • Churn: A binary column indicating whether the customer has churned (0 for retained, 1 for churned).

    Note:

  12. Credit Card Fraud Dataset

    • kaggle.com
    Updated Jan 28, 2025
    Cite
    Vishal Painjane (2025). Credit Card Fraud Dataset [Dataset]. https://www.kaggle.com/datasets/vishalpainjane/dataset101
    Dataset updated
    Jan 28, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Vishal Painjane
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Credit risk assessment remains a critical function within financial services, influencing lending decisions, portfolio risk management, and regulatory compliance. This dataset integrates multiple categories of financial, transactional, and behavioral data to enable advanced machine learning applications in the domain of financial risk modeling.

    Data Composition and Structure

    The dataset comprises a total of 1,212 distinct features, systematically grouped into four principal categories, alongside a binary target variable. Each feature category represents a specific dimension of credit risk assessment, reflecting both internal transactional data and externally sourced credit bureau information.

    Target Variable

    The dependent variable, denoted as bad_flag, represents a binary risk classification outcome associated with each customer account. The variable takes the following values:

    • 0: Denotes a low-risk, creditworthy customer
    • 1: Denotes a high-risk, default-prone customer

    This variable serves as the target for binary classification models aimed at predicting credit risk propensity.

    Feature Groups

    | Category               | Number of Features | Description                                                                        |
    |:-----------------------|:-------------------|:-----------------------------------------------------------------------------------|
    | Transaction Attributes | 664                | Customer-level transaction behavior, repayment patterns, financial habits          |
    | Bureau Credit Data     | 452                | Credit scores, external bureau records, delinquency flags, historical credit data  |
    | Bureau Enquiries       | 50                 | Credit inquiry history, frequency and type of external credit applications         |
    | ONUS Attributes        | 48                 | Internal bank relationship metrics, account engagement indicators                  |

    Each feature within a category follows a systematic sequential naming convention (e.g., transaction_attribute_1, bureau_1), facilitating programmatic identification and group-level analysis.
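The sequential naming convention makes it straightforward to group columns by category in code. The following is a sketch under assumptions: the function name `group_by_prefix` is hypothetical, and only the two prefixes quoted above (`transaction_attribute_` and `bureau_`) are used, since the prefixes of the other groups are not given in the description.

```python
import re
from collections import defaultdict


def group_by_prefix(column_names):
    """Group names shaped like '<prefix>_<number>' by prefix; other names go under None."""
    pattern = re.compile(r"^(?P<prefix>.+)_(?P<index>\d+)$")
    groups = defaultdict(list)
    for name in column_names:
        match = pattern.match(name)
        groups[match.group("prefix") if match else None].append(name)
    return dict(groups)


# A short, invented column list that follows the documented naming pattern.
columns = [
    "bad_flag",
    "transaction_attribute_1",
    "transaction_attribute_2",
    "bureau_1",
    "bureau_2",
    "bureau_3",
]

groups = group_by_prefix(columns)
for prefix, names in groups.items():
    print(prefix, len(names))
```

Columns that do not end in an underscore followed by digits, such as the target `bad_flag`, fall into the `None` group, so they can be handled separately.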

    Data Characteristics

    The dataset exhibits several characteristics that mirror operational credit risk data environments:

    • High Dimensionality: The feature space exceeds 1,200 variables
    • Mixed Data Types: Numerical values (continuous and discrete), binary indicators
    • High Sparsity: A substantial proportion of features contain zero values or missing entries
    • Value Range Disparity: Feature values exhibit significant variance, with magnitudes ranging from small ratios (0.001) to large transaction amounts (288,500)

    Methodological Rationale

    The dataset was constructed by simulating data generation processes typical within financial services institutions. Transactional behaviors, bureau records, and inquiry histories were aggregated and engineered into derivative features.

  13. scikit-survival

    • kaggle.com
    Updated Feb 8, 2025
    Cite
    AnthonyTherrien (2025). scikit-survival [Dataset]. https://www.kaggle.com/datasets/anthonytherrien/scikit-survival/data
    Dataset updated
    Feb 8, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    AnthonyTherrien
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    📝 Overview

    This dataset provides the scikit-survival 0.23.1 Python package in .whl format, enabling users to perform survival analysis using machine learning techniques. scikit-survival is a powerful library that extends scikit-learn to handle censored data, commonly encountered in medical research, reliability engineering, and event-time prediction tasks.

    📥 Installation

    To install the package, first, download the .whl file from this Kaggle dataset. Then, install it using pip:

    pip install scikit_survival-0.23.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    

    Ensure that you have Python 3.10 installed, as the cp310 tag in the wheel's filename indicates that it is built specifically for that version.
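The Python tag in a wheel's filename records which interpreter the wheel targets (for example, `cp310` corresponds to CPython 3.10). A small sketch for reading that tag and comparing it with the running interpreter is shown below; the helper names are hypothetical, and the check only handles simple `cpXY` tags.

```python
import sys


def wheel_python_tag(filename):
    """Return the Python tag (third field from the end) of a wheel filename."""
    stem = filename[: -len(".whl")] if filename.endswith(".whl") else filename
    # Wheel names end in {python tag}-{abi tag}-{platform tag}.
    return stem.split("-")[-3]


def matches_running_interpreter(filename):
    """True when a 'cpXY' tag equals the running interpreter's major/minor version."""
    expected = f"cp{sys.version_info.major}{sys.version_info.minor}"
    return wheel_python_tag(filename) == expected


WHEEL = "scikit_survival-0.23.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
tag = wheel_python_tag(WHEEL)
print(tag, matches_running_interpreter(WHEEL))
```

If the comparison is false, pip will normally refuse to install the wheel on that interpreter, so checking the tag first can save a failed install.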

    🔬 Features

    • Kaplan-Meier and Cox Proportional Hazards models
    • Random survival forests for non-linear survival relationships
    • Concordance index for model evaluation
    • Integration with scikit-learn for easy model training and validation
    • Handling of right-censored data for accurate event-time predictions

    🏥 Use Cases

    • Medical research: Predict patient survival times based on clinical features.
    • Reliability engineering: Estimate the lifespan of mechanical components.
    • Churn prediction: Analyze customer retention and attrition timelines.
    • Credit risk modeling: Assess time until loan default.
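To make the churn use case concrete, here is a plain-Python sketch of the Kaplan-Meier estimator applied to invented customer tenures. It does not use the scikit-survival API; the function name and the data are assumptions for illustration. A customer who is still active at the end of observation counts as right-censored.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each time with an observed event.

    durations: how long each customer was followed (for example, in months).
    observed: True if the customer churned at that time, False if censored.
    """
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        # Customers still being followed just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Customers who churned exactly at time t.
        events = sum(1 for d, o in zip(durations, observed) if d == t and o)
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
    return curve


# Invented tenures (months) and churn indicators.
durations = [2, 3, 3, 5, 8]
observed = [True, True, False, True, False]

curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"month {t}: estimated retention {s:.2f}")
```

Censored customers lower the at-risk count at later times without counting as events, which is what distinguishes this estimate from a simple churned-over-total ratio.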