Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
259
Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/
The customer churn dataset is a collection of customer data that focuses on predicting customer churn, which refers to the tendency of customers to stop using a company's products or services. The dataset contains various features that describe each customer, such as their credit score, country, gender, age, tenure, balance, number of products, credit card status, active membership, estimated salary, and churn status. The churn status indicates whether or not a customer has churned. The dataset is used to analyze and understand the factors that contribute to customer churn and to build predictive models that identify customers at risk of churning. The goal is to develop strategies and interventions that reduce churn and improve customer retention.
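A common first step in understanding which factors contribute to churn is to compare the churn rate across the categories of a single feature, such as country. The sketch below assumes the records have already been loaded as dictionaries with hypothetical column names `country` and `churn` (1 = churned); the toy rows are made up and are not taken from the dataset.

```python
from collections import defaultdict


def churn_rate_by(rows, feature, target="churn"):
    """Return {category: share of rows in that category with target == 1}.

    `rows` is a list of dicts; the column names are assumptions, not the
    dataset's documented schema.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [churned, total]
    for row in rows:
        bucket = counts[row[feature]]
        bucket[0] += int(row[target])
        bucket[1] += 1
    return {cat: churned / total for cat, (churned, total) in counts.items()}


# Made-up rows with hypothetical column names.
customers = [
    {"country": "France", "churn": 0},
    {"country": "France", "churn": 1},
    {"country": "Germany", "churn": 1},
    {"country": "Germany", "churn": 1},
    {"country": "Spain", "churn": 0},
]
rates = churn_rate_by(customers, "country")
print(rates)  # {'France': 0.5, 'Germany': 1.0, 'Spain': 0.0}
```

Large gaps between categories only hint at a relationship; a predictive model is still needed to account for features that interact or overlap.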
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset belongs to a leading online E-commerce company. The company wants to identify customers who are likely to churn, so they can proactively approach these customers with promotional offers.
The dataset contains various features related to customer behavior and characteristics, which can be used to predict customer churn.
The main task is to predict customer churn based on the given features. This is a binary classification problem where the target variable is 'Churn'.
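Because churners are often a minority of customers, accuracy alone can look good even for a model that never predicts churn. A minimal sketch of precision, recall, and F1 for the positive class, assuming 'Churn' has been encoded as 1 (churned) and 0 (retained); the label lists are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive (churn) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = churned, 0 = stayed.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]

model_scores = binary_metrics(y_true, y_pred)
baseline_scores = binary_metrics(y_true, [0] * len(y_true))  # "never churn" baseline
print(model_scores)
print(baseline_scores)
```

The "never churn" baseline reaches 62.5% accuracy on these labels yet has zero recall, which is why churn models are usually compared on recall, precision, or F1 as well.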
This dataset is provided for educational purposes. While it represents a real-world scenario, the data itself may be simulated or anonymized.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Nowadays
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Customer Churn’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/hassanamin/customer-churn on 14 February 2022.
--- Dataset description provided by original source is as follows ---
A marketing agency has many customers that use its service to produce ads for the clients' websites. The agency has noticed quite a bit of churn among its clients. It currently assigns account managers more or less at random, but wants you to create a machine learning model that will help predict which customers will churn (stop buying its service), so that it can assign an account manager to the customers most at risk of churning. Luckily, it has some historical data. Can you help? Create a classification algorithm that will help classify whether or not a customer churned. The company can then test this against incoming data for future customers to predict which customers will churn and assign them an account manager.
The data is saved as customer_churn.csv. Here are the fields and their definitions:
Name: Name of the latest contact at the company
Age: Customer age
Total_Purchase: Total ads purchased
Account_Manager: Binary; 0 = no manager, 1 = account manager assigned
Years: Total years as a customer
Num_sites: Number of websites that use the service
Onboard_date: Date that the latest contact was onboarded
Location: Client HQ address
Company: Name of the client company
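A dependency-free sketch of the requested classifier: logistic regression trained by batch gradient descent on the numeric fields listed above. The label column name `Churn` is an assumption (it is not in the field list), the toy rows are made up, and in practice a library such as scikit-learn would typically be used instead, with the model evaluated on held-out data rather than on its training rows.

```python
import math

# Numeric fields from the list above. Name, Location, Company and Onboard_date
# are left out because they are identifiers or free text.
FEATURES = ["Age", "Total_Purchase", "Account_Manager", "Years", "Num_sites"]
TARGET = "Churn"  # assumption: the label column is not listed above


def to_xy(rows):
    """Turn csv.DictReader-style rows (string values) into features and labels."""
    X = [[float(row[f]) for f in FEATURES] for row in rows]
    y = [int(row[TARGET]) for row in rows]
    return X, y


def standardize(X):
    """Center each column and scale it to unit variance (a constant column becomes zeros)."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]
    return scaled, means, stds


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Made-up rows for illustration only.
toy_rows = [
    {"Age": "40", "Total_Purchase": "10000", "Account_Manager": "0", "Years": "5", "Num_sites": "12", "Churn": "1"},
    {"Age": "35", "Total_Purchase": "9000", "Account_Manager": "1", "Years": "6", "Num_sites": "11", "Churn": "1"},
    {"Age": "45", "Total_Purchase": "11000", "Account_Manager": "0", "Years": "4", "Num_sites": "7", "Churn": "0"},
    {"Age": "38", "Total_Purchase": "9500", "Account_Manager": "1", "Years": "5", "Num_sites": "6", "Churn": "0"},
]
X, y = to_xy(toy_rows)
X_scaled, means, stds = standardize(X)
w, b = fit_logistic(X_scaled, y)
train_preds = [int(predict_proba(w, b, x) >= 0.5) for x in X_scaled]
print(train_preds)
```

New rows must be scaled with the same `means` and `stds` returned here before they are scored.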
Once you've created and evaluated the model, test it on some new data (you can think of this almost like a hold-out set) that your client has provided, saved as new_customers.csv. The client wants to know which customers are most likely to churn given this data (they don't have the label yet).
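Since new_customers.csv has no label, the deliverable is a ranking of customers by predicted churn risk. The sketch below reads the file with the standard `csv` module and sorts rows by whatever scoring function is passed in; `io.StringIO` stands in for the real file, the rows are made up, and `toy_score` is a hypothetical placeholder for a fitted model's predicted probability.

```python
import csv
import io


def rank_new_customers(csv_file, score, key="Company"):
    """Score every row of an unlabeled CSV and return (key, score) pairs, highest risk first.

    `score` is any callable mapping a row dict to a churn probability,
    for example a fitted model wrapped in a function.
    """
    reader = csv.DictReader(csv_file)
    scored = [(row[key], score(row)) for row in reader]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Stand-in for new_customers.csv (made-up rows, no Churn column).
new_customers = io.StringIO(
    "Name,Age,Total_Purchase,Account_Manager,Years,Num_sites,Company\n"
    "A. Smith,41,9500,0,4.5,13,Acme Ltd\n"
    "B. Jones,37,11000,1,6.0,7,Globex\n"
    "C. Lee,45,8700,0,5.2,10,Initech\n"
)


def toy_score(row):
    """Hypothetical placeholder: more sites means higher risk. Replace with a real model."""
    return min(float(row["Num_sites"]) / 15, 1.0)


ranking = rank_new_customers(new_customers, toy_score)
print(ranking)
```

With a real file, `open("new_customers.csv", newline="")` would replace the `StringIO` object.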
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset originates from the research domain of Customer Churn Prediction in the Telecom Industry. It was created as part of the project "Data-Driven Churn Prediction: ML Solutions for the Telecom Industry," completed within the Data Stewardship course (Master programme Data Science, TU Wien).
The primary purpose of this dataset is to support machine learning model development for predicting customer churn based on customer demographics, service usage, and account information.
The dataset enables the training, testing, and evaluation of classification algorithms, allowing researchers and practitioners to explore techniques for customer retention optimization.
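For training and testing a churn classifier, a stratified split keeps the share of churners similar in the training and test sets, which matters when churn is the minority class. A dependency-free sketch; the `customerID` and `Churn` keys and the toy rows are assumptions, and scikit-learn's `train_test_split(..., stratify=y)` implements the same idea.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key, test_fraction=0.2, seed=42):
    """Split rows into (train, test) so each label keeps roughly the same share."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for group in by_label.values():
        group = group[:]  # copy so the caller's data is not reordered
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Made-up rows: 25 churners and 75 non-churners.
rows = [{"customerID": i, "Churn": "Yes" if i % 4 == 0 else "No"} for i in range(100)]
train, test = stratified_split(rows, "Churn")
print(len(train), len(test))
```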
The dataset was originally obtained from the IBM Accelerator Catalog and adapted for academic use. It was uploaded to TU Wien’s DBRepo test system and accessed via SQLAlchemy connections to the MariaDB environment.
The dataset has a tabular structure and was initially stored in CSV format. It contains:
Rows: 7,043 customer records
Columns: 21 features including customer attributes (gender, senior citizen status, partner status), account information (tenure, contract type, payment method), service usage (internet service, streaming TV, tech support), and the target variable (Churn: Yes/No).
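Most classifiers need numeric inputs, so the categorical account fields and the Yes/No target are usually encoded first. A minimal sketch for one categorical field (contract type), one numeric field (tenure), and the target; the exact column spellings `Contract`, `tenure`, and `Churn` and the contract category labels are assumptions about the file, not confirmed by the description.

```python
CONTRACT_LEVELS = ("Month-to-month", "One year", "Two year")  # assumed category labels
CHURN_MAP = {"Yes": 1, "No": 0}


def encode_row(row):
    """One-hot encode the contract type, keep tenure numeric, map Churn Yes/No to 1/0.

    An unexpected Churn value raises KeyError instead of being silently mislabeled.
    """
    features = {f"Contract_{level}": int(row["Contract"] == level) for level in CONTRACT_LEVELS}
    features["tenure"] = int(row["tenure"])
    return features, CHURN_MAP[row["Churn"]]


features, label = encode_row({"tenure": "12", "Contract": "One year", "Churn": "Yes"})
print(features, label)
```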
Naming Convention:
The table in the database is named telco_customer_churn_data.
Software Requirements:
To open and work with the dataset, any standard database client or any programming language that supports MariaDB connections (e.g., Python) can be used.
For machine learning applications, libraries such as pandas, scikit-learn, and joblib are typically used.
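The project reads the table through SQLAlchemy from MariaDB, but no connection details are given. The sketch below runs plain SQL against the table name from the description using the standard library's in-memory `sqlite3` purely as a runnable stand-in for the MariaDB server; the `customerID` column and the three rows are assumptions, and a real setup would obtain the connection from SQLAlchemy or a MariaDB driver instead.

```python
import sqlite3

TABLE = "telco_customer_churn_data"  # table name given in the description


def churn_counts(conn):
    """Return {Churn value: row count} from the churn table."""
    cur = conn.execute(f"SELECT Churn, COUNT(*) FROM {TABLE} GROUP BY Churn ORDER BY Churn")
    return dict(cur.fetchall())


# In-memory SQLite stand-in; the original project connects to MariaDB instead.
conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE {TABLE} (customerID TEXT, tenure INTEGER, Churn TEXT)")
conn.executemany(
    f"INSERT INTO {TABLE} VALUES (?, ?, ?)",
    [("A1", 1, "Yes"), ("A2", 30, "No"), ("A3", 60, "No")],
)
print(churn_counts(conn))  # {'No': 2, 'Yes': 1}
```

The `GROUP BY` query itself is standard SQL, so it should carry over unchanged to MariaDB.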
Additional Resources:
Source code for data loading, preprocessing, model training, and evaluation is available at the associated GitHub repository: https://github.com/nazerum/fair-ml-customer-churn
When reusing the dataset, users should be aware:
Licensing: The dataset is shared under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Use Case Suitability: The dataset is best suited for classification tasks, particularly binary classification (churn vs. no churn).
Metadata Standards: Metadata describing the dataset adheres to FAIR principles and is supplemented by CodeMeta and Croissant standards for improved interoperability.
This data was imported from the Zindi platform, where it was used in a competition. The objective of the competition is to develop a predictive model that determines the likelihood that a customer will churn, that is, stop purchasing airtime and data from Expresso.
The data describes 2.5 million Expresso clients.
* Train.csv contains information about 2 million customers. A column called CHURN indicates whether or not a client churned. This is the target: you must estimate the likelihood that these clients churned. You will use this file to train your model.
* Test.csv is similar to Train.csv, but without the CHURN column. You will use this file to test your model.
* SampleSubmission.csv is an example of what your submission should look like. The order of the rows does not matter, but each user_id must be correct.
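Because the task asks for a likelihood, a submission holds one churn probability per client rather than a hard 0/1 label. A minimal sketch with the standard `csv` module; the header `user_id,CHURN` is an assumption based on the column names mentioned above and should be checked against SampleSubmission.csv, and the ids are hypothetical.

```python
import csv
import io


def write_submission(user_ids, probabilities, out):
    """Write a user_id,CHURN CSV with one predicted churn probability per client."""
    if len(user_ids) != len(probabilities):
        raise ValueError("need exactly one probability per user_id")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["user_id", "CHURN"])
    for uid, p in zip(user_ids, probabilities):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range for {uid}: {p}")
        writer.writerow([uid, f"{p:.6f}"])


buffer = io.StringIO()
write_submission(["u_001", "u_002"], [0.1, 0.9], buffer)
print(buffer.getvalue())
```

For a real submission, `open("submission.csv", "w", newline="")` would replace the buffer; row order does not matter, so no sorting is needed.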
A certain premium club boasts a large membership. Members pay an annual membership fee in return for using the exclusive facilities offered by the club, and the fee is customized to each member's personal package. In the last few years, however, the club has been facing an issue with many members cancelling their memberships. The club management plans to tackle this by proactively addressing customer grievances. However, they do not have enough bandwidth to reach out to the entire customer base individually and want to see whether a statistical approach can help them identify customers at risk. Can you help them? Relevant historical data is provided in “club_churn_train.csv”.
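With limited outreach bandwidth, the question is whether contacting the k highest-risk members reaches more churners than contacting k members at random. Lift at k measures exactly that: the churn rate among the top k divided by the overall churn rate. A minimal sketch; the scores and labels are made up, and the scores would come from whatever model is trained on club_churn_train.csv.

```python
def lift_at_k(scores, labels, k):
    """Churn rate among the k highest-scored members divided by the overall churn rate."""
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("no churners in labels; lift is undefined")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:k]) / k
    return top_rate / base_rate


# Made-up risk scores and outcomes (1 = cancelled membership).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(lift_at_k(scores, labels, k=3))
```

A lift above 1 means targeted outreach finds churners more often than random contact would; here 2 of the top 3 churned against a 30% base rate.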
Club Data Set
Open Data Commons Attribution License (ODC-By): https://choosealicense.com/licenses/odc-by/
COFINFAD: Colombian Fintech Financial Analytics Dataset
COFINFAD (Colombian Fintech Financial Analytics Dataset) is a dataset containing almost 12 months of transactional and demographic data from an anonymous Colombian fintech company. This dataset is designed to facilitate research in customer behavior analysis, churn prediction, and financial pattern recognition in the Latin American fintech sector.
Files
customer_data.csv: Contains demographic, behavioral… See the full description on the dataset page: https://huggingface.co/datasets/luisdavidtrejosrojas/cofinfad.
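Transactional data is often turned into per-customer churn features such as recency (days since the last transaction), frequency (number of transactions), and monetary value (total amount), commonly called RFM. A minimal sketch, assuming transactions are available as hypothetical `(customer_id, date, amount)` tuples; COFINFAD's actual column names are on the dataset page and are not reproduced here, and the values below are made up.

```python
from collections import defaultdict
from datetime import date


def rfm_features(transactions, snapshot):
    """Per-customer recency (days before `snapshot`), frequency (count) and monetary (sum)."""
    last_seen = {}
    freq = defaultdict(int)
    total = defaultdict(float)
    for customer_id, tx_date, amount in transactions:
        if customer_id not in last_seen or tx_date > last_seen[customer_id]:
            last_seen[customer_id] = tx_date
        freq[customer_id] += 1
        total[customer_id] += amount
    return {
        cid: {
            "recency_days": (snapshot - last_seen[cid]).days,
            "frequency": freq[cid],
            "monetary": total[cid],
        }
        for cid in last_seen
    }


txs = [
    ("c1", date(2024, 1, 5), 120.0),
    ("c1", date(2024, 3, 1), 80.0),
    ("c2", date(2023, 11, 20), 300.0),
]
features = rfm_features(txs, snapshot=date(2024, 3, 31))
print(features)
```

Customers with a long recency are natural churn candidates, but the cut-off that defines "churned" has to be chosen for the business at hand.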
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
🚀 BCG Data Science Job Simulation | Forage
This notebook focuses on feature engineering techniques to enhance a dataset for churn prediction modeling. As part of the BCG Data Science Job Simulation, I transformed raw customer data into valuable features to improve predictive performance.
📊 What’s Inside?
✅ Data Cleaning: Removing irrelevant columns to reduce noise
✅ Date-Based Feature Extraction: Converting raw dates into useful features such as activation year, contract length, and renewal month
✅ New Predictive Features:
consumption_trend → Measures whether a customer’s last-month usage is increasing or decreasing
total_gas_and_elec → Aggregates total energy consumption
✅ Final Processed Dataset: Ready for churn prediction modeling
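The feature steps listed above can be sketched for a single record as follows. The column names (`cons_12m`, `cons_gas_12m`, `cons_last_month`, `date_activ`, `date_end`, `date_renewal`) are assumptions about the input file, and the `consumption_trend` formula (last month versus the 12-month monthly average) is one hypothetical reading of "increasing or decreasing", not necessarily the notebook's definition.

```python
from datetime import date


def engineer_features(row):
    """Derive date-based and consumption features from one customer record.

    Assumed columns: cons_12m (12-month electricity use), cons_gas_12m
    (12-month gas use), cons_last_month, and ISO dates date_activ,
    date_end and date_renewal.
    """
    activ = date.fromisoformat(row["date_activ"])
    end = date.fromisoformat(row["date_end"])
    renewal = date.fromisoformat(row["date_renewal"])
    monthly_avg = row["cons_12m"] / 12
    return {
        "activation_year": activ.year,
        # Approximate contract length in years (365.25-day years).
        "contract_length": (end - activ).days / 365.25,
        "renewal_month": renewal.month,
        # Hypothetical definition: positive if last month's use is above the
        # 12-month monthly average (usage rising), negative if below.
        "consumption_trend": row["cons_last_month"] - monthly_avg,
        "total_gas_and_elec": row["cons_12m"] + row["cons_gas_12m"],
    }


example = {
    "date_activ": "2013-01-15", "date_end": "2017-01-15", "date_renewal": "2016-01-10",
    "cons_12m": 12000, "cons_gas_12m": 3000, "cons_last_month": 1500,
}
feats = engineer_features(example)
print(feats)
```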
📂 Dataset Used:
📌 clean_data_after_eda.csv → Original dataset after Exploratory Data Analysis (EDA)
📌 clean_data_with_new_features.csv → Final dataset after feature engineering
🛠 Technologies Used:
🔹 Python (Pandas, NumPy)
🔹 Data Preprocessing & Feature Engineering
🌟 Why Feature Engineering? Feature engineering is one of the most critical steps in machine learning: well-engineered features can improve model accuracy and reveal deeper insights into customer behavior.
🚀 This notebook is a great reference for anyone learning data preprocessing, feature selection, and predictive modeling in Data Science!
📩 Connect with Me:
🔗 GitHub Repo: https://github.com/Pavitr-Swain/BCG-Data-Science-Job-Simulation
💼 LinkedIn: https://www.linkedin.com/in/pavitr-kumar-swain-ab708b227/
🔍 Let’s explore churn prediction insights together! 🎯
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains information about customer activity and demographics related to an airline's loyalty program, including a promotional campaign aimed at enhancing program enrollment.
Field | Description |
---|---|
Loyalty Number | Customer's unique loyalty number |
Year | Year of the period |
Month | Month of the period |
Flights Booked | Number of flights booked for the member alone in the period |
Flights with Companions | Number of flights booked with additional passengers in the period |
Total Flights | Sum of Flights Booked and Flights with Companions |
Distance | Flight distance traveled in the period (km) |
Points Accumulated | Loyalty points accumulated in the period |
Points Redeemed | Loyalty points redeemed in the period |
Dollar Cost Points Redeemed | Dollar equivalent of points redeemed in the period, in Canadian dollars (CDN) |
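The activity table is recorded per member per period, while churn is usually modeled per member, so the monthly rows are typically aggregated first. A minimal sketch that sums activity per (Loyalty Number, Year) and checks the identity stated in the table, Total Flights = Flights Booked + Flights with Companions; the records are made-up dicts keyed by the table's field names.

```python
from collections import defaultdict


def yearly_flight_totals(activity):
    """Sum Total Flights and Points Accumulated per (Loyalty Number, Year).

    Raises ValueError if a record breaks the documented identity
    Total Flights = Flights Booked + Flights with Companions.
    """
    totals = defaultdict(lambda: {"Total Flights": 0, "Points Accumulated": 0.0})
    for rec in activity:
        if rec["Total Flights"] != rec["Flights Booked"] + rec["Flights with Companions"]:
            raise ValueError(f"inconsistent Total Flights for {rec['Loyalty Number']}")
        key = (rec["Loyalty Number"], rec["Year"])
        totals[key]["Total Flights"] += rec["Total Flights"]
        totals[key]["Points Accumulated"] += rec["Points Accumulated"]
    return dict(totals)


# Made-up monthly records for one hypothetical member.
monthly = [
    {"Loyalty Number": 111111, "Year": 2018, "Month": 1, "Flights Booked": 2,
     "Flights with Companions": 1, "Total Flights": 3, "Points Accumulated": 450.0},
    {"Loyalty Number": 111111, "Year": 2018, "Month": 2, "Flights Booked": 0,
     "Flights with Companions": 0, "Total Flights": 0, "Points Accumulated": 0.0},
]
yearly = yearly_flight_totals(monthly)
print(yearly)
```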
Field | Description |
---|---|
Loyalty Number | Customer's unique loyalty number |
Country | Country of residence |
Province | Province of residence |
City | City of residence |
Postal Code | Postal code of residence |
Gender | Gender |
Education | Highest education level (High school or lower > College > Bachelor > Master > Doctor) |
Salary | Annual income |
Marital Status | Marital status (Single, Married, Divorced) |
Loyalty Card | Loyalty card status (Star > Nova > Aurora) |
CLV | Customer lifetime value: total invoice value for all flights ever booked by the member |
Enrollment Type | Enrollment type (Standard / 2018 Promotion) |
Enrollment Year | Year the member enrolled in the membership program |
Enrollment Month | Month the member enrolled in the membership program |
Cancellation Year | Year the member cancelled their membership |
Cancellation Month | Month the member cancelled their membership |
The airline implemented a promotional campaign (2018 Promotion) aimed at enhancing program enrollment. The dataset encompasses information regarding:
- Customer flight activity and loyalty points
- Program signups and enrollment details
- Cancellations within the loyalty program
- Comprehensive customer demographics
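The loyalty history table has no explicit churn column, but a cancellation year can serve as the churn label, which also allows comparing cancellation rates between the Standard and 2018 Promotion enrollment types. A minimal sketch, assuming a missing or empty Cancellation Year means the member is still active; the member records are made up.

```python
from collections import defaultdict


def cancellation_rate_by_enrollment(members):
    """Share of members with a Cancellation Year, grouped by Enrollment Type.

    Assumption: None or an empty string in Cancellation Year means "still active".
    """
    stats = defaultdict(lambda: [0, 0])  # enrollment type -> [cancelled, total]
    for m in members:
        cancelled = m.get("Cancellation Year") not in (None, "")
        stats[m["Enrollment Type"]][0] += int(cancelled)
        stats[m["Enrollment Type"]][1] += 1
    return {etype: c / t for etype, (c, t) in stats.items()}


# Made-up member records.
members = [
    {"Loyalty Number": 1, "Enrollment Type": "Standard", "Cancellation Year": 2017},
    {"Loyalty Number": 2, "Enrollment Type": "Standard", "Cancellation Year": None},
    {"Loyalty Number": 3, "Enrollment Type": "Standard", "Cancellation Year": ""},
    {"Loyalty Number": 4, "Enrollment Type": "2018 Promotion", "Cancellation Year": 2018},
    {"Loyalty Number": 5, "Enrollment Type": "2018 Promotion", "Cancellation Year": None},
]
rates = cancellation_rate_by_enrollment(members)
print(rates)
```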