http://opendatacommons.org/licenses/dbcl/1.0/
The customer churn dataset is a collection of customer data that focuses on predicting customer churn, which refers to the tendency of customers to stop using a company's products or services. The dataset contains various features that describe each customer, such as their credit score, country, gender, age, tenure, balance, number of products, credit card status, active membership, estimated salary, and churn status. The churn status indicates whether a customer has churned or not. The dataset is used to analyze and understand factors that contribute to customer churn and to build predictive models to identify customers at risk of churning. The goal is to develop strategies and interventions to reduce churn and improve customer retention.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This Synthetic Customer Churn Prediction Dataset has been designed as an educational resource for exploring data science, machine learning, and predictive modelling techniques in a customer retention context. The dataset simulates key attributes relevant to customer churn analysis, such as service usage, contract details, and customer demographics. It allows users to practice data manipulation, visualization, and the development of models to predict churn behaviour in industries like telecommunications, subscription services, or utilities.
Figure: Synthetic Customer Churn Prediction Dataset Distribution (https://storage.googleapis.com/opendatabay_public/images/churn_c4aae9d4-3939-4866-a249-35d81c5965dc.png)
This dataset is useful for a variety of applications, including:
This dataset is synthetic and anonymized, making it a safe tool for experimentation and learning without compromising real customer privacy.
CC0 (Public Domain)
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The data will be used to predict whether a customer of the bank will churn. If a customer churns, it means they left the bank and took their business elsewhere. If you can predict which customers are likely to churn, you can take measures to retain them before they do. These measures could be promotions, discounts, or other incentives to boost customer satisfaction and, therefore, retention.
The dataset contains:
10,000 rows: each row is a unique customer of the bank
14 columns:
RowNumber: Row numbers from 1 to 10,000
CustomerId: Customer's unique ID assigned by bank
Surname: Customer's last name
CreditScore: Customer's credit score. This number can range from 300 to 850.
Geography: Customer's country of residence
Gender: Categorical indicator
Age: Customer's age (years)
Tenure: Number of years customer has been with bank
Balance: Customer's bank balance (Euros)
NumOfProducts: Number of products the customer has with the bank
HasCrCard: Indicates whether the customer has a credit card with the bank
IsActiveMember: Indicates whether the customer is considered active
EstimatedSalary: Customer's estimated annual salary (Euros)
Exited: Indicates whether the customer churned (left the bank)
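As a rough illustration of how the `Exited` column can be used, the sketch below computes the churn rate per `Geography` value. The column names come from the list above; the sample rows and their values are made up for the example and are not taken from the real file, and `churn_rate_by` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows using a subset of the documented column names.
# The values are illustrative only and do not come from the real dataset.
SAMPLE_CSV = """CustomerId,Geography,Age,Balance,Exited
1,France,42,0.0,1
2,Spain,41,83807.86,0
3,France,39,0.0,0
4,Germany,50,125510.82,1
"""


def churn_rate_by(rows, key):
    """Return {group: share of rows with Exited == 1} for the given column."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        churned[row[key]] += int(row["Exited"])
    return {group: churned[group] / totals[group] for group in totals}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
rates = churn_rate_by(rows, "Geography")
print(rates)
```

With the real file, the same function could be pointed at a `csv.DictReader` opened on the downloaded CSV instead of the inline sample.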
https://creativecommons.org/publicdomain/zero/1.0/
This dataset simulates customer behavior for a fictional telecommunications company. It contains demographic information, account details, services subscribed to, and whether the customer ultimately churned (stopped using the service) or not. The data is synthetically generated but designed to reflect realistic patterns often found in telecom churn scenarios.
Purpose:
The primary goal of this dataset is to provide a clean and straightforward resource for beginners learning about:
Features:
The dataset includes the following columns:
CustomerID: Unique identifier for each customer.
Age: Customer's age in years.
Gender: Customer's gender (Male/Female).
Location: General location of the customer (e.g., New York, Los Angeles).
SubscriptionDurationMonths: How many months the customer has been subscribed.
MonthlyCharges: The amount the customer is charged each month.
TotalCharges: The total amount the customer has been charged over their subscription period.
ContractType: The type of contract the customer has (Month-to-month, One year, Two year).
PaymentMethod: How the customer pays their bill (e.g., Electronic check, Credit card).
OnlineSecurity: Whether the customer has online security service (Yes, No, No internet service).
TechSupport: Whether the customer has tech support service (Yes, No, No internet service).
StreamingTV: Whether the customer has TV streaming service (Yes, No, No internet service).
StreamingMovies: Whether the customer has movie streaming service (Yes, No, No internet service).
Churn: (Target Variable) Whether the customer churned (1 = Yes, 0 = No).
Data Quality:
This dataset is intentionally clean with no missing values, making it easy for beginners to focus on analysis and modeling concepts without complex data cleaning steps.
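Even with no missing values, the four service columns hold three text levels (Yes, No, No internet service) that most models cannot consume directly. The sketch below shows one possible encoding: each service becomes 1 or 0, and a separate flag records whether the customer has internet at all. The `HasInternet` column and the helper names are assumptions introduced for this example, not part of the dataset.

```python
SERVICE_COLUMNS = ["OnlineSecurity", "TechSupport", "StreamingTV", "StreamingMovies"]
ALLOWED_LEVELS = {"Yes", "No", "No internet service"}


def encode_service(value):
    """Map a three-level service value to 1 (subscribed) or 0 (not subscribed)."""
    if value not in ALLOWED_LEVELS:
        raise ValueError(f"unexpected service level: {value!r}")
    return 1 if value == "Yes" else 0


def encode_row(row):
    """Return a copy of the row with numeric service columns.

    HasInternet is a hypothetical derived column: it is 0 when any service
    column reads "No internet service", otherwise 1.
    """
    encoded = dict(row)
    for column in SERVICE_COLUMNS:
        encoded[column] = encode_service(row[column])
    no_internet = any(row[c] == "No internet service" for c in SERVICE_COLUMNS)
    encoded["HasInternet"] = 0 if no_internet else 1
    return encoded


# Illustrative rows only; the values are not taken from the real file.
with_internet = encode_row({
    "OnlineSecurity": "Yes", "TechSupport": "No",
    "StreamingTV": "Yes", "StreamingMovies": "No",
})
without_internet = encode_row({c: "No internet service" for c in SERVICE_COLUMNS})
print(with_internet)
print(without_internet)
```

Keeping the internet flag separate avoids losing the distinction between "declined the service" and "could not have it", which a plain Yes/No mapping would erase.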
Inspiration:
Understanding customer churn is crucial for many businesses. This dataset provides a sandbox environment to practice the fundamental techniques used in churn analysis and prediction.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of "Churn for Bank Customers" provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mathchi/churn-for-bank-customers on 28 January 2022.
--- Dataset description provided by original source is as follows ---
As we know, it is much more expensive to sign up a new client than to keep an existing one.
It is advantageous for banks to know what leads a client towards the decision to leave the company.
Churn prevention allows companies to develop loyalty programs and retention campaigns to keep as many customers as possible.
--- Original source retains full ownership of the source dataset ---
https://cdla.io/permissive-1-0/
This dataset was adapted from 'Credit Card Churn Prediction' (https://www.kaggle.com/datasets/anwarsan/credit-card-bank-churn) for visualization in our university project. We have modified customer information and spending behavior, and added revenue targets.
Scenario
In 2019, the marketing team launched a campaign to attract millennial customers (born 1980-1996) with the goal of increasing revenue and enhancing the brand's appeal to a younger audience.
As the BI team, your task is to create a dashboard for users.
1. The Vice President of Sales wants to view the performance of the credit business.
2. The marketing team is interested in understanding customer segments and customer spending to measure Customer Lifetime Value (CLV) and Marketing Cost per Acquired Customer (MCAC).
⚠️ Note: This is just a suggestion to guide the creation of the dashboard.
Example in Tableau
Executive summary
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10099382%2F508a2d2d89dabdfd368743f86c2a71e1%2Fexecutive%20overview.JPG?generation=1696110593484137&alt=media
Customer behavior
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10099382%2F1e4a1f62a25eab3c6707d002243894c7%2Fcustomer_behaviour.JPG?generation=1696110689732332&alt=media
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Bank Customer Churn Dataset is a collection of data related to customers of a bank who have either left (churned) or stayed with the bank. This dataset is typically used for predictive modeling to identify patterns and factors that lead to customer churn, enabling banks to take proactive measures to retain customers.
id: Unique identifier for each customer.
CustomerId: Unique identifier for the customer account.
Surname: Last name of the customer.
CreditScore: Numeric representation of the customer's creditworthiness.
Geography: Country or region where the customer resides.
Gender: Gender of the customer (e.g., Male, Female).
Age: Age of the customer.
Tenure: Number of years the customer has been with the bank.
Balance: Current balance in the customer's account.
NumOfProducts: Number of bank products the customer uses.
HasCrCard: Binary indicator (0 or 1) for whether the customer has a credit card.
IsActiveMember: Binary indicator (0 or 1) for whether the customer is an active member.
EstimatedSalary: Estimated salary of the customer.
Exited: Binary indicator (0 or 1) for whether the customer has churned (the target).
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Credit risk assessment remains a critical function within financial services, influencing lending decisions, portfolio risk management, and regulatory compliance. This dataset integrates multiple categories of financial, transactional, and behavioral data to enable advanced machine learning applications in the domain of financial risk modeling.
The dataset comprises a total of 1,212 distinct features, systematically grouped into four principal categories, alongside a binary target variable. Each feature category represents a specific dimension of credit risk assessment, reflecting both internal transactional data and externally sourced credit bureau information.
The dependent variable, denoted as bad_flag, represents a binary risk classification outcome associated with each customer account. The variable takes the following values:
This variable serves as the target for binary classification models aimed at predicting credit risk propensity.
Category | Number of Features | Description |
---|---|---|
Transaction Attributes | 664 | Customer-level transaction behavior, repayment patterns, financial habits |
Bureau Credit Data | 452 | Credit scores, external bureau records, delinquency flags, historical credit data |
Bureau Enquiries | 50 | Credit inquiry history, frequency and type of external credit applications |
ONUS Attributes | 48 | Internal bank relationship metrics, account engagement indicators |
Each feature within a category follows a systematic sequential naming convention (e.g., transaction_attribute_1, bureau_1), facilitating programmatic identification and group-level analysis.
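The naming convention makes it possible to recover the feature groups from the column names alone. The sketch below splits each name into a prefix and a trailing index and groups columns by prefix. Only `transaction_attribute_1`, `bureau_1`, and `bad_flag` appear in the description; the `bureau_enquiry_` and `onus_attribute_` prefixes are assumptions used for illustration, and `group_columns` is a hypothetical helper.

```python
import re
from collections import defaultdict

# A column belongs to a group when it ends in "_<number>"; the text before
# that suffix is treated as the group prefix.
NAME_PATTERN = re.compile(r"^(?P<prefix>.+)_(?P<index>\d+)$")


def group_columns(columns):
    """Return {prefix: [column names]} for columns that follow the convention."""
    groups = defaultdict(list)
    for name in columns:
        match = NAME_PATTERN.match(name)
        if match:
            groups[match.group("prefix")].append(name)
    return dict(groups)


# Illustrative header; the last two prefixes are assumed, not documented.
columns = [
    "bad_flag",
    "transaction_attribute_1",
    "transaction_attribute_2",
    "bureau_1",
    "bureau_enquiry_1",
    "onus_attribute_1",
]
groups = group_columns(columns)
print(groups)
```

Matching on the full prefix rather than with `str.startswith("bureau_")` keeps `bureau_1` and `bureau_enquiry_1` in separate groups, and the target `bad_flag` is left out because it has no numeric suffix.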
The dataset exhibits several characteristics that mirror operational credit risk data environments:
The dataset was constructed by simulating data generation processes typical within financial services institutions. Transactional behaviors, bureau records, and inquiry histories were aggregated and engineered into derivative features.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides simulated retail transaction data, offering valuable insights into customer purchasing behaviour and store operations. It is designed to facilitate market basket analysis, customer segmentation, and a variety of other retail analytics tasks. Each row captures detailed transaction information, including a unique identifier, the date and time of purchase, customer details, a list of purchased products, total items, total cost, payment method, and location details such as city and store type. Furthermore, it includes indicators for discounts and promotions applied, along with a customer category based on background or age group, and the season of purchase. This dataset is entirely synthetic, generated using the Python Faker library, making it a safe and versatile resource for researchers, data scientists, and analysts to develop and test algorithms, models, and analytical tools without using real customer data.
This dataset is typically provided in a CSV file format. It contains approximately 1 million individual transaction records. The data spans a time range from 2020-01-01 to 2024-05-19. There are 329,738 unique customer names and 571,947 unique product entries. Payment methods are distributed with 25% Cash, 25% Debit Card, and 50% Other. Transaction locations include Boston (10%), Dallas (10%), and other cities (80%). Store types are categorised as Supermarket (17%), Pharmacy (17%), and other types (67%). Discounts were applied to approximately 50% of the transactions.
This dataset is ideally suited for:
* Market Basket Analysis: Uncovering associations between products and identifying common buying patterns.
* Customer Segmentation: Grouping customers based on their purchasing behaviour to target specific offers.
* Pricing Optimisation: Developing strategies to optimise pricing and identify opportunities for discounts and promotions.
* Retail Analytics: Analysing overall store performance and emerging customer trends.
* Algorithmic Development: Testing and refining machine learning models for retail forecasting or recommendation systems.
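As a minimal sketch of the market basket use case, the code below counts how often each pair of products appears in the same transaction and reports it as a share of all transactions (the pair's support). The baskets are made-up examples standing in for the per-row product lists, and `pair_support` is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations

# Made-up baskets standing in for the product list of each transaction row.
transactions = [
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["milk", "eggs"],
    ["bread", "butter"],
]


def pair_support(baskets):
    """Return {(item_a, item_b): share of baskets containing both items}."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair is counted once per basket
        # and always appears in the same order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items()}


support = pair_support(transactions)
for pair, share in sorted(support.items(), key=lambda item: -item[1]):
    print(pair, share)
```

Pair counting of this kind is only a first step; for roughly 1 million rows, a dedicated frequent-itemset algorithm would likely be a better fit.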
The dataset's geographic coverage includes transactions from various cities, such as Boston and Dallas, representing a broad, though simulated, global scope. The time range of the transactions extends from 1st January 2020 to 19th May 2024. Demographic insights are provided through the Customer_Category column, which classifies customers based on background or age group, allowing for demographic-based analyses. As a synthetic dataset, specific real-world demographic notes are not applicable.
CC0
This dataset is beneficial for a wide range of users, including:
* Researchers: For academic studies on consumer behaviour and retail economics.
* Data Scientists: To develop and validate predictive models, such as recommender systems or churn prediction models.
* Analysts: For performing in-depth retail analytics, market basket analysis, and customer segmentation to inform business decisions.
* Students: As a practical, realistic dataset for learning and applying data analysis techniques in a retail context.
Original Dat
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset provides the scikit-survival 0.23.1 Python package in .whl format, enabling users to perform survival analysis using machine learning techniques. scikit-survival is a powerful library that extends scikit-learn to handle censored data, commonly encountered in medical research, reliability engineering, and event-time prediction tasks.
To install the package, first download the .whl file from this Kaggle dataset. Then install it using pip:
pip install scikit_survival-0.23.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Ensure that you have Python 3.10 installed: the cp310 tag in the filename means this wheel is built specifically for CPython 3.10 on 64-bit Linux (manylinux).
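Before running pip, it can help to check that the wheel's tags match the interpreter in use. The sketch below splits the wheel filename into its standard fields and compares the Python tag with the running interpreter; in this filename the tag is `cp310`, which denotes CPython 3.10. The helper names are hypothetical, and the comparison assumes a CPython interpreter.

```python
import sys

WHEEL = "scikit_survival-0.23.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"


def parse_wheel_filename(filename):
    """Split a wheel filename into name, version, and its three tag fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename")
    parts = filename[: -len(".whl")].split("-")
    # Layout: name-version[-build]-python_tag-abi_tag-platform_tag
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def matches_running_interpreter(python_tag):
    """Check a CPython tag such as 'cp310' against the current interpreter."""
    expected = f"cp{sys.version_info.major}{sys.version_info.minor}"
    return python_tag == expected


info = parse_wheel_filename(WHEEL)
print(info)
print("compatible with this interpreter:", matches_running_interpreter(info["python_tag"]))
```

If the check prints `False`, pip would be expected to reject the wheel as unsupported on that interpreter.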
scikit-learn for easy model training and validation