10 datasets found

Telco Customer Churn
kaggle.com
zip
Updated Feb 23, 2018
Cite
BlastChar (2018). Telco Customer Churn [Dataset]. https://www.kaggle.com/datasets/blastchar/telco-customer-churn
Available download formats: zip (175758 bytes)
Dataset updated
Feb 23, 2018
Authors
BlastChar
Description
Context

"Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs." [IBM Sample Data Sets]

Content

Each row represents a customer; each column contains one of the customer’s attributes, as described in the column metadata.

The data set includes information about:

Customers who left within the last month – the column is called Churn

Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies

Customer account information – how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges

Demographic info about customers – gender, age range, and if they have partners and dependents

Inspiration

To explore this type of model and learn more about the subject.

New version from IBM: https://community.ibm.com/community/user/businessanalytics/blogs/steven-macko/2019/07/11/telco-customer-churn-1113
Bank Customer Churn
kaggle.com
zip
Updated Aug 8, 2024
Cite
Sandile Desmond Mfazi (2024). Bank Customer Churn [Dataset]. https://www.kaggle.com/datasets/sandiledesmondmfazi/bank-customer-churn
Available download formats: zip (12679114 bytes)
Dataset updated
Aug 8, 2024
Authors
Sandile Desmond Mfazi
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Botswana Bank Customer Churn Dataset

Dataset Overview

This synthetic dataset simulates customer data for a fictional bank in Botswana, specifically designed to model customer churn behavior. It includes a comprehensive set of customer demographics, financial data, product usage, and behavioral indicators that could influence whether a customer decides to leave the bank. The dataset is generated using the Python Faker library, ensuring realistic but entirely fictional data points for educational, testing, and modeling purposes.

Dataset Highlights

Number of Records: 115,640 customers

Churn Rate: Determined by a calculated churn risk score based on several customer attributes

Geographical Focus: Botswana

Data Structure: The dataset is organized in a tabular format, with each row representing a unique customer

Use Cases

This dataset is ideal for the following applications:

Churn Prediction Modeling: Building and evaluating machine learning models to predict customer churn.

Customer Segmentation: Analyzing customer profiles and segmenting them based on various demographics and financial attributes.

Product Analysis: Understanding which products are most associated with customer retention or churn.

Educational Purposes: Teaching data science and machine learning concepts using a realistic dataset.
S1 Data -
plos.figshare.com
zip
Updated Oct 11, 2023
Cite
Yancong Zhou; Wenyue Chen; Xiaochen Sun; Dandan Yang (2023). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0292466.s001
Available download formats: zip
Unique identifier
https://doi.org/10.1371/journal.pone.0292466.s001
Dataset updated
Oct 11, 2023
Dataset provided by
PLOS: http://plos.org/
Authors
Yancong Zhou; Wenyue Chen; Xiaochen Sun; Dandan Yang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analyzing customers’ characteristics and giving early warning of customer churn with machine learning algorithms can help enterprises provide targeted marketing strategies and personalized services, and can save substantial operating costs. Data cleaning, oversampling, data standardization, and other preprocessing operations were performed in Python on a 900,000-record data set of telecom customers’ personal characteristics and historical behavior. Appropriate model parameters were selected to build a BPNN (Back Propagation Neural Network). Random Forest (RF) and Adaboost, two classic ensemble learning models, were introduced, and an Adaboost dual-ensemble learning model with RF as the base learner was proposed. These four models and four other classical machine learning models (decision tree, naive Bayes, K-Nearest Neighbor (KNN), and Support Vector Machine (SVM)) were each used to analyze the customer churn data. The results show that the first four models perform better in terms of recall, precision, F1 score, and other indicators, and that the RF-Adaboost dual-ensemble model performs best. The recall rates of BPNN, RF, Adaboost, and the RF-Adaboost dual-ensemble model on positive samples are 79%, 90%, 89%, and 93%, respectively; the precision rates are 97%, 99%, 98%, and 99%; and the F1 scores are 87%, 95%, 94%, and 96%. For the RF-Adaboost dual-ensemble model, the three indicators are 10%, 1%, and 6% higher than the reference. The churn predictions give telecom companies strong data support for adopting appropriate retention strategies for pre-churn customers and reducing customer churn.
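As a quick sanity check on the figures quoted above, the F1 score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below uses only the percentages from the abstract; the dictionary layout and function name are illustrative, not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (recall, precision, reported F1) on positive samples, as quoted in the abstract.
reported = {
    "BPNN": (0.79, 0.97, 0.87),
    "RF": (0.90, 0.99, 0.95),
    "Adaboost": (0.89, 0.98, 0.94),
    "RF-Adaboost": (0.93, 0.99, 0.96),
}

# Recompute F1 for each model from its rounded precision and recall.
recomputed = {
    name: f1_score(precision, recall)
    for name, (recall, precision, _) in reported.items()
}

for name, (_, _, reported_f1) in reported.items():
    print(f"{name}: recomputed F1 = {recomputed[name]:.3f}, reported = {reported_f1:.2f}")
```

The recomputed values land within one percentage point of the reported ones. The small gaps for RF and Adaboost are presumably due to the precision and recall being rounded before publication, which is an assumption rather than something the abstract states.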
Telecom Data Analysis(Customer Churn Analysis)
kaggle.com
zip
Updated Jun 25, 2022
Cite
Rajdeep Kaur Bajwa (2022). Telecom Data Analysis(Customer Churn Analysis) [Dataset]. https://www.kaggle.com/datasets/rajdeepkaurbajwa/telecom-data-analysis
Available download formats: zip (245420 bytes)
Dataset updated
Jun 25, 2022
Authors
Rajdeep Kaur Bajwa
Description
Dataset

This dataset was created by Rajdeep Kaur Bajwa

Contents
Customer Churn Prediction
kaggle.com
zip
Updated May 29, 2024
Cite
Rashad Mammadov (2024). Customer Churn Prediction [Dataset]. https://www.kaggle.com/datasets/rashadrmammadov/customer-churn-dataset/code
Available download formats: zip (121952 bytes)
Dataset updated
May 29, 2024
Authors
Rashad Mammadov
License
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
Description
File Description:

The dataset contains information about customers and their churn status. Each row represents a customer, and each column contains customer attributes and information.

Column Descriptions:

customerID: Unique identifier for each customer.

gender: Gender of the customer (Male, Female).

SeniorCitizen: Whether the customer is a senior citizen or not (1: Yes, 0: No).

Partner: Whether the customer has a partner or not (Yes, No).

Dependents: Whether the customer has dependents or not (Yes, No).

tenure: Number of months the customer has stayed with the company.

PhoneService: Whether the customer has a phone service or not (Yes, No).

MultipleLines: Whether the customer has multiple lines or not (Yes, No, No phone service).

InternetService: Type of internet service the customer has (DSL, Fiber optic, No).

OnlineSecurity: Whether the customer has online security or not (Yes, No, No internet service).

OnlineBackup: Whether the customer has online backup or not (Yes, No, No internet service).

DeviceProtection: Whether the customer has device protection or not (Yes, No, No internet service).

TechSupport: Whether the customer has tech support or not (Yes, No, No internet service).

StreamingTV: Whether the customer has streaming TV or not (Yes, No, No internet service).

StreamingMovies: Whether the customer has streaming movies or not (Yes, No, No internet service).

Contract: The contract term of the customer (Month-to-month, One year, Two year).

PaperlessBilling: Whether the customer has paperless billing or not (Yes, No).

PaymentMethod: The payment method of the customer (Electronic check, Mailed check, Bank transfer, Credit card).

MonthlyCharges: The amount charged to the customer monthly.

TotalCharges: The total amount charged to the customer.

Churn: Whether the customer churned or not (Yes, No).
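The column descriptions above map naturally onto a small parsing step. The following is a minimal sketch using only the Python standard library and a handful of made-up rows; the sample values, the helper names, and the handling of blank TotalCharges cells are assumptions for illustration, not properties documented for the actual file.

```python
import csv
import io

# Hypothetical sample rows using a subset of the columns described above.
SAMPLE_CSV = """customerID,SeniorCitizen,tenure,Contract,MonthlyCharges,TotalCharges,Churn
0001-AAAAA,0,1,Month-to-month,29.85,29.85,No
0002-BBBBB,1,34,One year,56.95,1889.5,No
0003-CCCCC,0,2,Month-to-month,53.85,108.15,Yes
0004-DDDDD,0,0,Two year,52.55, ,No
"""


def to_float(text):
    """Parse a numeric field, returning None for blank values (assumed possible)."""
    text = text.strip()
    return float(text) if text else None


def parse_row(row):
    """Convert the string fields of one CSV row into typed Python values."""
    return {
        "customerID": row["customerID"],
        "SeniorCitizen": row["SeniorCitizen"] == "1",  # 1: Yes, 0: No
        "tenure": int(row["tenure"]),                   # months with the company
        "Contract": row["Contract"],
        "MonthlyCharges": to_float(row["MonthlyCharges"]),
        "TotalCharges": to_float(row["TotalCharges"]),
        "Churn": row["Churn"] == "Yes",                 # target variable
    }


customers = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
churn_rate = sum(c["Churn"] for c in customers) / len(customers)
print(f"{len(customers)} customers, churn rate {churn_rate:.0%}")
```

With the real download, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle; the file name inside the zip is not given in the description.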
Confusion matrix.
plos.figshare.com
xls
Updated Oct 11, 2023
Cite
Yancong Zhou; Wenyue Chen; Xiaochen Sun; Dandan Yang (2023). Confusion matrix. [Dataset]. http://doi.org/10.1371/journal.pone.0292466.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0292466.t005
Dataset updated
Oct 11, 2023
Dataset provided by
PLOS: http://plos.org/
Authors
Yancong Zhou; Wenyue Chen; Xiaochen Sun; Dandan Yang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
churn_modelling
kaggle.com
zip
Updated Jun 27, 2025
Cite
Manasvi Kirti (2025). churn_modelling [Dataset]. https://www.kaggle.com/datasets/manasvikirti/churn-modelling
Available download formats: zip (267787 bytes)
Dataset updated
Jun 27, 2025
Authors
Manasvi Kirti
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This dataset contains information on customer demographics, account details, and service usage patterns to analyze and predict customer churn. It is commonly used in churn modeling projects to develop machine learning models that classify whether a customer is likely to leave (churn) or stay. The dataset is suitable for tasks such as Exploratory Data Analysis (EDA), feature engineering, model training, and evaluation.

Key Features May Include:

CustomerID: Unique identifier for each customer

Gender, Age: Demographic details

Tenure: Number of months the customer has stayed

Balance, EstimatedSalary: Financial features

IsActiveMember, HasCrCard: Behavioral indicators

Exited: Target variable indicating churn (1 = churned, 0 = retained)
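A common first step in EDA with a binary target such as Exited is to compare churn rates across a segment. The sketch below does this for IsActiveMember using hypothetical records; the field names come from the feature list above, while the values and the helper function are invented for illustration.

```python
from collections import defaultdict

# Hypothetical records using field names from the feature list above.
customers = [
    {"CustomerID": 1, "IsActiveMember": 1, "Exited": 0},
    {"CustomerID": 2, "IsActiveMember": 0, "Exited": 1},
    {"CustomerID": 3, "IsActiveMember": 1, "Exited": 0},
    {"CustomerID": 4, "IsActiveMember": 0, "Exited": 0},
    {"CustomerID": 5, "IsActiveMember": 1, "Exited": 1},
    {"CustomerID": 6, "IsActiveMember": 0, "Exited": 1},
]


def churn_rate_by(records, key):
    """Return {group value: share of records in that group with Exited == 1}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for record in records:
        totals[record[key]] += 1
        churned[record[key]] += record["Exited"]
    return {group: churned[group] / totals[group] for group in totals}


rates = churn_rate_by(customers, "IsActiveMember")
for group, rate in sorted(rates.items()):
    print(f"IsActiveMember={group}: churn rate {rate:.0%}")
```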
Data from: scikit-survival
kaggle.com
zip
Updated Feb 8, 2025
Cite
AnthonyTherrien (2025). scikit-survival [Dataset]. https://www.kaggle.com/anthonytherrien/scikit-survival
Available download formats: zip (3684823 bytes)
Dataset updated
Feb 8, 2025
Authors
AnthonyTherrien
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
📝 Overview

This dataset provides the scikit-survival 0.23.1 Python package in .whl format, enabling users to perform survival analysis using machine learning techniques. scikit-survival is a powerful library that extends scikit-learn to handle censored data, commonly encountered in medical research, reliability engineering, and event-time prediction tasks.

📥 Installation

To install the package, first, download the .whl file from this Kaggle dataset. Then, install it using pip:

pip install scikit_survival-0.23.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Ensure that your Python version matches the wheel. The original description states Python 3.13, but the cp310 tag in the filename above indicates a build for CPython 3.10.

🔬 Features

Kaplan-Meier and Cox Proportional Hazards models

Random survival forests for non-linear survival relationships

Concordance index for model evaluation

Integration with scikit-learn for easy model training and validation

Handling of right-censored data for accurate event-time predictions

🏥 Use Cases

Medical research: Predict patient survival times based on clinical features.

Reliability engineering: Estimate the lifespan of mechanical components.

Churn prediction: Analyze customer retention and attrition timelines.

Credit risk modeling: Assess time until loan default.
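To make the churn use case concrete, here is a minimal pure-Python sketch of the Kaplan-Meier estimator on right-censored customer tenures. It deliberately does not use the scikit-survival API; the function name and the sample data are hypothetical, and the code is meant only to show what the estimator computes.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each time with an observed event.

    durations[i] is how long customer i was observed (e.g. months);
    events[i] is True if churn was observed, False if right-censored.
    """
    event_counts = Counter(t for t, e in zip(durations, events) if e)
    exit_counts = Counter(durations)  # everyone leaving the risk set at time t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        churned = event_counts.get(t, 0)
        if churned:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - churned / at_risk
            curve.append((t, survival))
        at_risk -= exit_counts[t]
    return curve


# Hypothetical tenures in months; False marks customers still active (censored).
durations = [2, 3, 3, 5, 8, 8, 12]
events = [True, True, False, True, False, True, False]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"month {t}: estimated retention {s:.3f}")
```

Censored customers reduce the number at risk without lowering the survival estimate, which is what distinguishes this from a naive "share of customers still active" calculation.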

E-commerce_dataset

kaggle.com

zip

Updated Nov 16, 2025

Cite

Abhay Ayare (2025). E-commerce_dataset [Dataset]. https://www.kaggle.com/datasets/abhayayare/e-commerce-dataset

Available download formats: zip (644123 bytes)

Dataset updated

Nov 16, 2025

Authors

Abhay Ayare

License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically

Description

E-commerce_dataset

This dataset is a synthetic yet realistic E-commerce retail dataset generated programmatically using Python (Faker + NumPy + Pandas).
It is designed to closely mimic real-world online shopping behavior, user patterns, product interactions, seasonal trends, and marketplace events.

You can use this dataset for:

Machine Learning & Deep Learning
Recommender Systems
Customer Segmentation
Sales Forecasting
A/B Testing
E-commerce Behaviour Analysis
Data Cleaning / Feature Engineering Practice
SQL practice

📁Dataset Contents

The dataset contains 6 CSV files:

File             Rows      Description
users.csv        ~10,000   User profiles, demographics & signup info
products.csv     ~2,000    Product catalog with rating and pricing
orders.csv       ~20,000   Order-level transactions
order_items.csv  ~60,000   Items purchased per order
reviews.csv      ~15,000   Customer-written product reviews
events.csv       ~80,000   User event logs: view, cart, wishlist, purchase

🧬 Data Dictionary

1. Users (users.csv)
Column Description
user_id Unique user identifier
name  Full customer name
email  Email (synthetic, no real emails)
gender Male / Female / Other
city  City of residence
signup_date Account creation date

2. Products (products.csv)
Column Description
product_id Unique product identifier
product_name  Product title
category  Electronics, Clothing, Beauty, Home, Sports, etc.
price  Actual selling price
rating Average product rating

3. Orders (orders.csv)
Column Description
order_id  Unique order identifier
user_id User who placed the order
order_date Timestamp of the order
order_status  Completed / Cancelled / Returned
total_amount  Total order value

4. Order Items (order_items.csv)
Column Description
order_item_id  Unique identifier
order_id  Associated order
product_id Purchased product
quantity  Quantity purchased
item_price Price per unit

5. Reviews (reviews.csv)
Column Description
review_id  Unique review identifier
user_id User who submitted review
product_id Reviewed product
rating 1–5 star rating
review_text Short synthetic review
review_date Submission date

6. Events (events.csv)
Column Description
event_id  Unique event identifier
user_id User performing event
product_id Viewed/added/purchased product
event_type view/cart/wishlist/purchase
event_timestamp Timestamp of event

🧠 Possible Use Cases (Ideas & Projects)

🔍 Machine Learning

Customer churn prediction
Review sentiment analysis (NLP)
Recommendation engines
Price optimization models
Demand forecasting (Time-series)
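The dataset has no churn column, so a churn prediction project would first need to derive a label. One possible approach, sketched below, marks a user as churned when their most recent order is older than an inactivity window. The 90-day window, the reference date, and the sample rows are all assumptions; only the user_id and order_date column names come from the data dictionary.

```python
from datetime import date, timedelta

# Hypothetical rows with the user_id and order_date columns from orders.csv.
orders = [
    {"user_id": "U1", "order_date": date(2025, 1, 5)},
    {"user_id": "U1", "order_date": date(2025, 10, 20)},
    {"user_id": "U2", "order_date": date(2025, 3, 14)},
    {"user_id": "U3", "order_date": date(2025, 9, 1)},
]


def label_churn(orders, as_of, inactivity_days=90):
    """Return {user_id: True if the user's last order predates the cutoff}."""
    last_order = {}
    for order in orders:
        uid = order["user_id"]
        if uid not in last_order or order["order_date"] > last_order[uid]:
            last_order[uid] = order["order_date"]
    cutoff = as_of - timedelta(days=inactivity_days)
    return {uid: last < cutoff for uid, last in last_order.items()}


labels = label_churn(orders, as_of=date(2025, 11, 16))
print(labels)
```

Users who never placed an order do not appear in the result; joining against users.csv would be needed to decide how to treat them.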

📦 Business Analytics

Market basket analysis
RFM segmentation
Cohort analysis
Funnel conversion tracking
A/B testing simulations
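Funnel conversion tracking can be sketched directly from the event types listed in the data dictionary. The example below counts distinct users who reached each step of a view, cart, purchase funnel. The sample events are hypothetical, and the sketch ignores event_timestamp ordering, which a fuller analysis would likely take into account.

```python
# Hypothetical rows with the user_id and event_type columns from events.csv.
events = [
    {"user_id": "U1", "event_type": "view"},
    {"user_id": "U1", "event_type": "cart"},
    {"user_id": "U1", "event_type": "purchase"},
    {"user_id": "U2", "event_type": "view"},
    {"user_id": "U2", "event_type": "cart"},
    {"user_id": "U3", "event_type": "view"},
    {"user_id": "U4", "event_type": "view"},
    {"user_id": "U4", "event_type": "wishlist"},
]

FUNNEL_STEPS = ["view", "cart", "purchase"]


def funnel_counts(events, steps=FUNNEL_STEPS):
    """Return [(step, number of distinct users who reached it and all prior steps)]."""
    users_by_step = {step: set() for step in steps}
    for event in events:
        if event["event_type"] in users_by_step:
            users_by_step[event["event_type"]].add(event["user_id"])
    counts = []
    reached = None
    for step in steps:
        # A user counts for a step only if they also reached every earlier step.
        reached = users_by_step[step] if reached is None else reached & users_by_step[step]
        counts.append((step, len(reached)))
    return counts


for step, count in funnel_counts(events):
    print(f"{step}: {count} users")
```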

🧮 SQL Practice

Joins
Window functions
Aggregations
CTE-based funnels
Complex queries

🛠 How the Dataset Was Generated

The dataset was generated entirely in Python using:

Faker for realistic user and review generation
NumPy for probability-based event modeling
Pandas for data processing

Custom logic for:

demand variation
user behavior simulation
return/cancel probabilities
seasonal order timestamp distribution
The dataset does not include any real personal data.
Everything is generated synthetically.

⚠️ License

This dataset is released under CC BY 4.0 — free to use for:
Research
Education
Commercial projects
Kaggle competitions
Machine learning pipelines
Just provide attribution.

⭐ If you found this dataset helpful, please:

Upvote the dataset
Leave a comment
Share your notebooks using it

SaaS Subscription & Churn Analytics Dataset
kaggle.com
zip
Updated Jul 21, 2025
Cite
River | Datasets for SQL Practice (2025). SaaS Subscription & Churn Analytics Dataset [Dataset]. https://www.kaggle.com/datasets/rivalytics/saas-subscription-and-churn-analytics-dataset/discussion
Available download formats: zip (600149 bytes)
Dataset updated
Jul 21, 2025
Authors
River | Datasets for SQL Practice
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
RavenStack is a fictional AI-powered collaboration platform used to simulate a real-world SaaS business. This simulated dataset was created using Python and ChatGPT specifically for people learning data analysis, business intelligence, or data science. It offers a realistic environment to practice SQL joins, cohort analysis, churn modeling, revenue tracking, and support analytics using a multi-table relational structure.

The dataset spans 5 CSV files:

accounts.csv – customer metadata

subscriptions.csv – subscription lifecycles and revenue

feature_usage.csv – daily product interaction logs

support_tickets.csv – support activity and satisfaction scores

churn_events.csv – churn dates, reasons, and refund behaviors

Users can explore trial-to-paid conversion, MRR trends, upgrade funnels, feature adoption, support patterns, churn drivers, and reactivation cycles. The dataset supports temporal and cohort analyses, and has built-in edge cases for testing real-world logic.
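As one example of the churn analyses mentioned above, a monthly churn rate can be computed as the share of accounts active at the start of a month that churn during it. The description names the CSV files but not their columns, so every field name in this sketch (account_id, signup_date, churn_date) is an assumption, as are the sample records.

```python
from datetime import date

# Hypothetical account records; field names are assumed, not taken from the dataset.
accounts = [
    {"account_id": "A1", "signup_date": date(2024, 1, 10), "churn_date": None},
    {"account_id": "A2", "signup_date": date(2024, 2, 3), "churn_date": date(2024, 6, 15)},
    {"account_id": "A3", "signup_date": date(2024, 3, 22), "churn_date": date(2024, 6, 2)},
    {"account_id": "A4", "signup_date": date(2024, 5, 30), "churn_date": None},
    {"account_id": "A5", "signup_date": date(2024, 6, 5), "churn_date": None},
]


def monthly_churn_rate(accounts, year, month):
    """Share of accounts active on the first of the month that churn within it."""
    month_start = date(year, month, 1)
    next_month = date(year + (month == 12), month % 12 + 1, 1)
    active_at_start = [
        a for a in accounts
        if a["signup_date"] < month_start
        and (a["churn_date"] is None or a["churn_date"] >= month_start)
    ]
    churned = [
        a for a in active_at_start
        if a["churn_date"] is not None and a["churn_date"] < next_month
    ]
    return len(churned) / len(active_at_start) if active_at_start else 0.0


rate = monthly_churn_rate(accounts, 2024, 6)
print(f"June 2024 churn rate: {rate:.0%}")
```

Accounts that sign up during the month are excluded from the denominator here; other definitions of churn rate treat them differently, so the choice should match whatever the analysis is meant to report.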

84 scholarly articles cite the Telco Customer Churn dataset (per Google Scholar).
