Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Customer churn prediction dataset of a fictional telecommunication company made by IBM Sample Datasets.
Context: Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs.
Content: Each row represents a customer; each column contains the customer’s attributes, as described in the column metadata. The data set includes information about:
Customers who left within the last month: the column is called Churn
Services that each customer… See the full description on the dataset page: https://huggingface.co/datasets/scikit-learn/churn-prediction.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset originates from the research domain of Customer Churn Prediction in the Telecom Industry. It was created as part of the project "Data-Driven Churn Prediction: ML Solutions for the Telecom Industry," completed within the Data Stewardship course (Master programme Data Science, TU Wien).
The primary purpose of this dataset is to support machine learning model development for predicting customer churn based on customer demographics, service usage, and account information.
The dataset enables the training, testing, and evaluation of classification algorithms, allowing researchers and practitioners to explore techniques for customer retention optimization.
The dataset was originally obtained from the IBM Accelerator Catalog and adapted for academic use. It was uploaded to TU Wien’s DBRepo test system and accessed via SQLAlchemy connections to the MariaDB environment.
The dataset has a tabular structure and was initially stored in CSV format. It contains:
Rows: 7,043 customer records
Columns: 21 features including customer attributes (gender, senior citizen status, partner status), account information (tenure, contract type, payment method), service usage (internet service, streaming TV, tech support), and the target variable (Churn: Yes/No).
Naming Convention:
The table in the database is named telco_customer_churn_data.
Software Requirements:
To open and work with the dataset, any standard database client or programming language that supports MariaDB connections (e.g., Python) can be used.
For machine learning applications, libraries such as pandas, scikit-learn, and joblib are typically used.
Additional Resources:
Source code for data loading, preprocessing, model training, and evaluation is available at the associated GitHub repository: https://github.com/nazerum/fair-ml-customer-churn
When reusing the dataset, users should be aware:
Licensing: The dataset is shared under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Use Case Suitability: The dataset is best suited for classification tasks, particularly binary classification (churn vs. no churn).
Metadata Standards: Metadata describing the dataset adheres to FAIR principles and is supplemented by CodeMeta and Croissant standards for improved interoperability.
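The description mentions SQLAlchemy connections to a MariaDB environment and the table name telco_customer_churn_data. A minimal sketch of that loading step might look like the following. The PyMySQL driver, the helper names, and all connection parameters are assumptions for illustration; the description does not specify them.

```python
from urllib.parse import quote_plus

# Table name as stated in the dataset description.
TABLE_NAME = "telco_customer_churn_data"


def build_mariadb_url(user, password, host, database, port=3306):
    """Build a SQLAlchemy connection URL (hypothetical helper).

    The "mysql+pymysql" driver prefix is an assumption; the description
    only says that SQLAlchemy was used against MariaDB.
    """
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


def load_churn_table(url):
    """Read the whole churn table into a pandas DataFrame."""
    # Imported lazily so the URL helper works without the database stack.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine(url)
    return pd.read_sql_table(TABLE_NAME, engine)
```

With real credentials, `load_churn_table(build_mariadb_url(...))` would be expected to return a DataFrame with the 7,043 rows and 21 columns described above.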
https://creativecommons.org/publicdomain/zero/1.0/
This dataset simulates customer behavior for a fictional telecommunications company. It contains demographic information, account details, services subscribed to, and whether the customer ultimately churned (stopped using the service) or not. The data is synthetically generated but designed to reflect realistic patterns often found in telecom churn scenarios.
Purpose:
The primary goal of this dataset is to provide a clean and straightforward resource for beginners learning about:
Features:
The dataset includes the following columns:
CustomerID: Unique identifier for each customer.
Age: Customer's age in years.
Gender: Customer's gender (Male/Female).
Location: General location of the customer (e.g., New York, Los Angeles).
SubscriptionDurationMonths: How many months the customer has been subscribed.
MonthlyCharges: The amount the customer is charged each month.
TotalCharges: The total amount the customer has been charged over their subscription period.
ContractType: The type of contract the customer has (Month-to-month, One year, Two year).
PaymentMethod: How the customer pays their bill (e.g., Electronic check, Credit card).
OnlineSecurity: Whether the customer has online security service (Yes, No, No internet service).
TechSupport: Whether the customer has tech support service (Yes, No, No internet service).
StreamingTV: Whether the customer has TV streaming service (Yes, No, No internet service).
StreamingMovies: Whether the customer has movie streaming service (Yes, No, No internet service).
Churn: (Target Variable) Whether the customer churned (1 = Yes, 0 = No).
Data Quality:
This dataset is intentionally clean with no missing values, making it easy for beginners to focus on analysis and modeling concepts without complex data cleaning steps.
Inspiration:
Understanding customer churn is crucial for many businesses. This dataset provides a sandbox environment to practice the fundamental techniques used in churn analysis and prediction.
https://cubig.ai/store/terms-of-service
1) Data Introduction • The Telco Customer Churn Dataset covers a telecom carrier's customers: their service usage, account information, demographics, and churn status. It can be used to predict and analyze customer churn.
2) Data Utilization (1) Characteristics of the Telco Customer Churn Dataset: • The dataset includes a variety of customer and service characteristics: gender, age group, partner and dependents, service subscription status (telephone, Internet, security, backup, device protection, technical support, streaming, etc.), contract type, payment method, monthly fee, total fee, and churn. (2) Uses of the Telco Customer Churn Dataset: • Developing a customer churn prediction model: customer service usage patterns and account information can be used to build a machine learning-based churn prediction model that proactively identifies customers at risk of churn.
https://cubig.ai/store/terms-of-service
1) Data Introduction • The Customer Churn Dataset collects various customer characteristics and service usage information to predict whether or not communication service customers will churn.
2) Data Utilization (1) Characteristics of the Customer Churn Dataset: • The dataset consists of several categorical and numerical variables, including customer demographics, service types, contract information, charges, usage patterns, and churn. (2) Uses of the Customer Churn Dataset: • Developing a customer churn prediction model: machine learning and deep learning techniques can be used to develop classification models that predict churn from customer characteristics and service usage data. • Segmenting customers and developing marketing strategies: the data can be used to analyze customer groups at high risk of churn and to design custom retention strategies or targeted marketing campaigns.
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This is a sample dataset of Telco Customer Churn, inspired by the original "Telco customer churn (11.1.3+)" dataset from the IBM Business Analytics Community. It has been cleaned and aggregated from the original dataset. It is suitable for telco customer churn analysis or prediction with classification or regression models, for experimentation and learning purposes.
Column Description:
* customerID: A unique ID that identifies each customer.
* gender: The customer’s gender: Male (1), Female (0).
* SeniorCitizen: Indicates if the customer is 65 or older: No (0), Yes (1).
* Partner: Service contract is resold by the partner: No (0), Yes (1).
* Dependents: Indicates if the customer lives with any dependents: No (0), Yes (1).
* Tenure: Indicates the total number of months that the customer has been with the company.
* PhoneService: Indicates if the customer subscribes to home phone service with the company: No (0), Yes (1).
* MultipleLines: Indicates if the customer subscribes to multiple telephone lines with the company: No (0), Yes (1).
* InternetService: Indicates if the customer subscribes to Internet service with the company: No (0), DSL (1), Fiber optic (2).
* OnlineSecurity: Indicates if the customer subscribes to an additional online security service provided by the company: No (0), Yes (1), NA (2).
* OnlineBackup: Indicates if the customer subscribes to an additional online backup service provided by the company: No (0), Yes (1), NA (2).
* DeviceProtection: Indicates if the customer subscribes to an additional device protection plan for their Internet equipment provided by the company: No (0), Yes (1), NA (2).
* TechSupport: Indicates if the customer subscribes to an additional technical support plan from the company with reduced wait times: No (0), Yes (1), NA (2).
* StreamingTV: Indicates if the customer uses their Internet service to stream television programming from a third-party provider: No (0), Yes (1), NA (2). The company does not charge an additional fee for this service.
* StreamingMovies: Indicates if the customer uses their Internet service to stream movies from a third-party provider: No (0), Yes (1), NA (2). The company does not charge an additional fee for this service.
* Contract: Indicates the customer’s current contract type: Month-to-Month (0), One Year (1), Two Year (2).
* PaperlessBilling: Indicates if the customer has chosen paperless billing: No (0), Yes (1).
* PaymentMethod: Indicates how the customer pays their bill: Bank transfer - automatic (0), Credit card - automatic (1), Electronic cheque (2), Mailed cheque (3).
* MonthlyCharges: Indicates the customer’s current total monthly charge for all their services from the company.
* TotalCharges: Indicates the customer’s total charges.
* Churn: Indicates if the customer churned or not: No (0), Yes (1).
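Because every categorical column in this sample is stored as an integer code, it can help to map the codes back to their labels before inspecting the data. The following is a minimal sketch of such a decoder. The code-to-label maps are copied from the column description above, while the `decode_row` helper and the example row are hypothetical and only illustrate the idea.

```python
# Code-to-label maps copied from the column description above.
YES_NO = {0: "No", 1: "Yes"}
YES_NO_NA = {0: "No", 1: "Yes", 2: "NA"}

CODEBOOK = {
    "gender": {0: "Female", 1: "Male"},
    "SeniorCitizen": YES_NO,
    "Partner": YES_NO,
    "Dependents": YES_NO,
    "PhoneService": YES_NO,
    "MultipleLines": YES_NO,
    "InternetService": {0: "No", 1: "DSL", 2: "Fiber optic"},
    "OnlineSecurity": YES_NO_NA,
    "OnlineBackup": YES_NO_NA,
    "DeviceProtection": YES_NO_NA,
    "TechSupport": YES_NO_NA,
    "StreamingTV": YES_NO_NA,
    "StreamingMovies": YES_NO_NA,
    "Contract": {0: "Month-to-Month", 1: "One Year", 2: "Two Year"},
    "PaperlessBilling": YES_NO,
    "PaymentMethod": {
        0: "Bank transfer - automatic",
        1: "Credit card - automatic",
        2: "Electronic cheque",
        3: "Mailed cheque",
    },
    "Churn": YES_NO,
}


def decode_row(row):
    """Return a copy of `row` with coded columns replaced by their labels."""
    decoded = dict(row)
    for column, labels in CODEBOOK.items():
        if column in decoded:
            decoded[column] = labels[int(decoded[column])]
    return decoded


# Hypothetical row, invented only to show the call; not taken from the dataset.
example = {
    "customerID": "0000-EXAMPLE",
    "gender": 1,
    "InternetService": 2,
    "OnlineSecurity": 2,
    "Contract": 0,
    "Churn": 1,
}
print(decode_row(example))
```

Numeric columns such as Tenure, MonthlyCharges, and TotalCharges are left untouched because they are not in the codebook.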
https://www.marketresearchintellect.com/privacy-policy
Find detailed analysis in Market Research Intellect's Customer Churn Analysis Software Market Report, estimated at USD 2.1 billion in 2024 and forecasted to climb to USD 4.8 billion by 2033, reflecting a CAGR of 10.2%. Stay informed about adoption trends, evolving technologies, and key market participants.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
"Telecom Customer Churn Prediction Dataset" is a synthetic dataset designed to simulate customer data for a telecommunications company. This dataset is created for the purpose of predicting customer churn, which refers to the phenomenon of customers discontinuing their services with the company. The dataset contains a variety of features that capture different aspects of customer behavior and characteristics.
The dataset includes information such as customer age, gender, contract type, monthly charges, total amount spent, number of devices connected, and the number of customer support calls made. The key focus of this dataset is the binary target variable "Churn," which indicates whether a customer has churned (1) or not (0). This variable is essential for training and evaluating predictive models aimed at identifying customers who are likely to leave the service.
Dataset Card for Telco Customer Churn
This dataset contains information about customers of a fictional telecommunications company, including demographic information, services subscribed to, location details, and churn behavior. This merged dataset combines the information from the original Telco Customer Churn dataset with additional details.
Dataset Details
Dataset Description
This merged Telco Customer Churn dataset provides a comprehensive view of customer… See the full description on the dataset page: https://huggingface.co/datasets/aai510-group1/telco-customer-churn.
According to our latest research, the AI-powered customer churn prediction market size reached USD 1.96 billion globally in 2024, with a robust CAGR of 18.3% projected through the forecast period. By 2033, the market is expected to hit USD 8.87 billion, driven by the increasing adoption of AI and machine learning solutions across multiple industries to proactively manage and reduce customer attrition. The rapid digital transformation and the growing emphasis on customer experience optimization have emerged as primary growth factors fueling the expansion of this dynamic market.
One of the core growth factors propelling the AI-powered customer churn prediction market is the exponential increase in customer data generation across industries. As businesses increasingly digitize their operations, vast amounts of customer interactions, behavioral data, and transactional records are being accumulated every day. AI-powered churn prediction tools leverage advanced analytics and machine learning algorithms to extract actionable insights from this data, allowing companies to identify at-risk customers with high accuracy. This enables organizations to implement timely retention strategies, reduce churn rates, and ultimately boost long-term profitability. The continuous evolution of AI algorithms, including deep learning and natural language processing, further enhances the predictive capabilities of these solutions, making them indispensable in highly competitive sectors such as telecommunications, BFSI, and retail.
Another significant driver is the escalating demand for personalized customer experiences. Modern consumers expect brands to anticipate their needs and deliver tailored interactions across all touchpoints. AI-powered customer churn prediction systems empower businesses to segment their customer base, understand individual preferences, and proactively address potential pain points. This targeted approach not only improves customer satisfaction but also increases the effectiveness of marketing campaigns and retention efforts. Moreover, the integration of AI with CRM platforms and omnichannel engagement tools has streamlined the deployment of churn prediction models, making them accessible even to small and medium-sized enterprises. The ability to automate and scale these insights across large customer populations is a critical factor stimulating market growth.
The rising cost of customer acquisition compared to retention is also amplifying the importance of AI-powered churn prediction solutions. As competition intensifies and customer loyalty becomes harder to secure, organizations are prioritizing strategies that maximize the lifetime value of existing clients. AI-driven churn analytics provide a cost-effective means to identify early warning signals and intervene before customers decide to leave. This not only reduces the financial impact of churn but also enhances brand reputation and customer advocacy. The scalability, real-time processing, and predictive accuracy offered by AI solutions are attracting investments from both established enterprises and emerging startups, further accelerating market expansion.
Regionally, North America continues to dominate the AI-powered customer churn prediction market, accounting for the largest revenue share in 2024. The region's advanced technological infrastructure, high digital adoption rates, and concentration of leading AI vendors are key contributors to its leadership position. However, the Asia Pacific region is poised for the fastest growth, fueled by the rapid digitization of economies, increasing mobile and internet penetration, and rising investments in AI and analytics by enterprises. Europe also presents significant opportunities, particularly in sectors like BFSI and retail, where regulatory pressures and customer-centricity are driving early adoption of churn prediction tools. The market landscape in Latin America and the Middle East & Africa is evolving, with organizations gradually recognizing the value of proactive churn management in enhancing competitiveness and customer loyalty.
The telecommunications industry, in particular, has been at the forefront of adopting AI-powered churn prediction tools due to its high customer turnover rates and competitive market dynamics. https://growthmarketreports.com
This is a collection of the data used for analysis (master dataset, training data, test data), and the code and processes that have been used to conduct the analysis for this research project.
This dataset was created by Al Amin
It contains the following files:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analyzing customers’ characteristics and giving early warning of customer churn with machine learning algorithms can help enterprises provide targeted marketing strategies and personalized services, and save substantial operating costs. Data cleaning, oversampling, data standardization, and other preprocessing operations were carried out in Python on a data set of 900,000 telecom customers' personal characteristics and historical behavior. Appropriate model parameters were selected to build a BPNN (Back Propagation Neural Network). Two classic ensemble learning models, Random Forest (RF) and Adaboost, were introduced, and an Adaboost dual-ensemble learning model with RF as the base learner was proposed. These four models and four other classical machine learning models (decision tree, naive Bayes, K-Nearest Neighbor (KNN), and Support Vector Machine (SVM)) were each used to analyze the customer churn data. The results show that the first four models perform better in terms of recall, precision, F1 score, and other indicators. On positive samples, the recall rates of the BPNN, RF, Adaboost, and RF-Adaboost dual-ensemble models are 79%, 90%, 89%, and 93% respectively, the precision rates are 97%, 99%, 98%, and 99%, and the F1 scores are 87%, 95%, 94%, and 96%. The RF-Adaboost dual-ensemble model performs best, with the three indicators 10%, 1%, and 6% higher than the reference. The churn prediction results provide strong data support for telecom companies to adopt appropriate retention strategies for pre-churn customers and reduce customer churn.
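The reported F1 scores can be cross-checked against the reported precision and recall using the standard definition F1 = 2PR / (P + R). The sketch below does this with the rounded percentages quoted in the abstract. Because the inputs are already rounded, small deviations from the reported F1 values are to be expected; the function and variable names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) on positive samples, as quoted in the abstract.
REPORTED = {
    "BPNN": (0.97, 0.79, 0.87),
    "RF": (0.99, 0.90, 0.95),
    "Adaboost": (0.98, 0.89, 0.94),
    "RF-Adaboost": (0.99, 0.93, 0.96),
}

for name, (precision, recall, f1_reported) in REPORTED.items():
    f1 = f1_score(precision, recall)
    print(f"{name}: recomputed F1 = {f1:.3f}, reported F1 = {f1_reported:.2f}")
```

The recomputed values stay within one percentage point of the reported ones, which is consistent with rounding in the published figures.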
https://www.datainsightsmarket.com/privacy-policy
The Customer Churn Software market is experiencing robust growth, driven by the increasing need for businesses to retain customers and improve profitability. The market, estimated at $15 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching approximately $45 billion by 2033. This expansion is fueled by several key factors: the rising adoption of cloud-based solutions offering scalability and cost-effectiveness, the increasing availability of sophisticated analytics and AI-powered prediction models enabling proactive churn management, and the growing focus on delivering personalized customer experiences to enhance loyalty. Major players like IBM, Adobe, Salesforce, and Microsoft are actively shaping the market through continuous innovation and strategic acquisitions, contributing to a competitive landscape that fosters further growth. However, the market also faces certain restraints. The high initial investment costs associated with implementing sophisticated churn prediction software can be a barrier for smaller businesses. Furthermore, the complexity of integrating these solutions with existing CRM and data management systems can pose challenges, requiring significant expertise and resources. Despite these challenges, the long-term benefits of reduced customer churn significantly outweigh the initial investment, driving market expansion. The segmentation within the market is diverse, encompassing solutions catering to specific industry verticals and customer sizes, allowing for targeted solutions addressing unique churn drivers within each sector. The increasing prevalence of subscription-based business models further fuels the demand for effective churn management tools.
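The projection in this summary follows from the compound annual growth formula, final = initial × (1 + CAGR)^years. A small sketch, assuming eight annual compounding periods between 2025 and 2033 (the summary does not state the convention it uses), shows how the figures relate:

```python
def project(value, cagr, years):
    """Compound `value` at an annual growth rate `cagr` for `years` periods."""
    return value * (1 + cagr) ** years


start_year, end_year = 2025, 2033
# USD 15 billion in 2025 growing at 15% per year, as stated in the summary.
projected = project(15.0, 0.15, end_year - start_year)
print(f"Projected {end_year} market size: ${projected:.1f} billion")
```

Under that assumption the result is about $45.9 billion, in line with the "approximately $45 billion" quoted above.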
https://dataintelo.com/privacy-and-policy
According to our latest research, the global customer churn prediction for banking market size reached USD 2.17 billion in 2024, with a robust compound annual growth rate (CAGR) of 18.3%. This dynamic market is forecasted to reach USD 9.94 billion by 2033, driven by increasing digital transformation initiatives, the proliferation of advanced analytics, and the growing importance of customer retention in the highly competitive banking sector. The surge in adoption of artificial intelligence (AI) and machine learning (ML) technologies, coupled with mounting regulatory requirements, is propelling the demand for sophisticated churn prediction solutions globally.
One of the primary growth factors fueling the customer churn prediction for banking market is the intensifying competition in the global banking landscape. Financial institutions are under constant pressure to retain their existing customer base, as acquiring new customers is significantly more costly than retaining current ones. With the rise of neobanks and fintech disruptors, traditional banks are increasingly leveraging predictive analytics to identify at-risk customers and proactively implement retention strategies. Furthermore, the shift toward personalized banking experiences has necessitated the use of churn prediction tools that analyze vast datasets to uncover behavioral patterns, transaction anomalies, and sentiment trends. This, in turn, enables banks to tailor their offerings and communication, thereby reducing churn rates and improving overall customer loyalty.
Another key driver for the market is the rapid advancement and integration of AI and ML technologies in banking operations. These technologies empower banks to process and analyze massive volumes of structured and unstructured data from multiple sources such as transaction records, social media, and customer service interactions. By deploying sophisticated algorithms, banks can detect early warning signs of customer dissatisfaction and predict potential churn with remarkable accuracy. The increased availability of cloud-based analytics platforms further accelerates adoption, as banks of all sizes can now access scalable, cost-effective churn prediction solutions without the need for heavy upfront investments in infrastructure. This democratization of technology is particularly beneficial for small and medium-sized enterprises (SMEs) in the banking sector.
Regulatory compliance and risk management are also significant contributors to market growth. As regulatory bodies worldwide impose stricter requirements on customer data management and transparency, banks are compelled to invest in advanced analytics to monitor customer behavior and mitigate risks associated with churn. Predictive models help institutions not only to comply with regulations but also to anticipate and address potential issues before they escalate. The integration of churn prediction tools into risk management frameworks enhances banks' ability to maintain stable customer portfolios, minimize revenue losses, and uphold reputational integrity in an increasingly scrutinized environment.
Regionally, North America continues to dominate the customer churn prediction for banking market, accounting for the largest share in 2024 due to the presence of major banking institutions, early technology adoption, and a mature digital infrastructure. However, the Asia Pacific region is exhibiting the fastest growth, driven by rapid urbanization, expanding digital banking ecosystems, and increasing investments in AI-driven analytics. Europe also remains a significant market, bolstered by regulatory mandates such as GDPR and the growing focus on customer-centric banking models. The Middle East & Africa and Latin America are emerging markets, with rising awareness and gradual adoption of churn prediction technologies as banks seek to modernize their operations and enhance customer engagement.
The customer churn prediction for banking market by component is segmented into software and services, each playing a pivotal role in the deployment and effectiveness of churn prediction systems. The software segment encompasses purpose-built analytics platforms, AI-driven modeling tools, and integrated customer relationship management (CRM) systems specifically designed for churn analysis. These solutions enable banks to collect, process,
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Churn prediction aims to detect customers who intend to leave a service provider. Gaining a new customer costs an organization 5 to 10 times more than retaining an existing one. Predictive models can identify likely churners in the near future so that a retention solution can be offered. This paper presents a new prediction model based on Data Mining (DM) techniques. The proposed model is composed of six steps: identify problem domain, data selection, investigate data set, classification, clustering, and knowledge usage. A data set with 23 attributes and 5000 instances is used: 4000 instances for training the model and 1000 instances as a testing set. The predicted churners are clustered into 3 categories for use in a retention strategy. The data mining techniques used in this paper are Decision Tree, Support Vector Machine, and Neural Network, run through an open source software package named WEKA.
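The 4000/1000 partition described above can be illustrated with a simple index split. The abstract does not say how instances were assigned to each set, so the shuffling and the fixed seed below are assumptions, and the function name is hypothetical.

```python
import random


def split_indices(n_instances, n_train, seed=0):
    """Shuffle instance indices and split them into train and test lists.

    Random shuffling with a fixed seed is an assumption; the paper does not
    describe its splitting procedure.
    """
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


# 5000 instances in total, 4000 for training and the remaining 1000 for testing.
train_idx, test_idx = split_indices(5000, 4000)
print(len(train_idx), len(test_idx))
```

Keeping the seed fixed makes the split reproducible, so the same 1000 held-out instances are used every time a model is evaluated.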
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides detailed call log records linked to customer churn events, including call metadata, customer demographics, churn reasons, and resolution outcomes. It enables comprehensive analysis of why customers leave, how call center interactions influence churn, and supports the development of targeted retention strategies. The dataset is ideal for churn prediction modeling, root cause analysis, and customer experience optimization.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Customer Churn’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/hassanamin/customer-churn on 13 November 2021.
--- Dataset description provided by original source is as follows ---
A marketing agency has many customers that use its service to produce ads for the client/customer websites. The agency has noticed quite a bit of churn among its clients. Account managers are currently assigned more or less at random, but the agency wants you to create a machine learning model that predicts which customers will churn (stop buying the service), so that it can assign an account manager to the customers most at risk of churning. Luckily, some historical data is available. Create a classification algorithm that classifies whether or not a customer churned. The company can then apply it to incoming data for future customers to predict which of them will churn and assign them an account manager.
The data is saved as customer_churn.csv. Here are the fields and their definitions:
Name: Name of the latest contact at Company
Age: Customer Age
Total_Purchase: Total Ads Purchased
Account_Manager: Binary 0=No manager, 1= Account manager assigned
Years: Total Years as a customer
Num_sites: Number of websites that use the service.
Onboard_date: Date that the name of the latest contact was onboarded
Location: Client HQ Address
Company: Name of Client Company
Once you've created the model and evaluated it, test out the model on some new data (you can think of this almost like a hold-out set) that your client has provided, saved under new_customers.csv. The client wants to know which customers are most likely to churn given this data (they don't have the label yet).
--- Original source retains full ownership of the source dataset ---
https://www.verifiedmarketresearch.com/privacy-policy/
Customer Churn Analysis Software Market size was valued at USD 1.9 Billion in 2024 and is projected to reach USD 8.4 Billion by 2032, growing at a CAGR of 19.80% during the forecast period 2026-2032.
Global Customer Churn Analysis Software Market Drivers
The market drivers for the Customer Churn Analysis Software Market can be influenced by various factors. These may include:
Customer Retention Methods: As obtaining new consumers is becoming more expensive, greater emphasis is placed on retaining existing ones. Churn analysis software is used to forecast and reduce turnover, resulting in increased customer lifetime value.
An Increase in the Usage of Predictive Analytics and AI Technologies: To examine big data sets, churn prediction technologies now incorporate artificial intelligence and machine learning. Their application is allowing for more accurate churn forecasting and targeted actions.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Multimedia Data-Driven Customer Churn Prediction Using an Enhanced Extreme Learning Machine