http://opendatacommons.org/licenses/dbcl/1.0/
The customer churn dataset is a collection of customer data that focuses on predicting customer churn, which refers to the tendency of customers to stop using a company's products or services. The dataset contains various features that describe each customer, such as their credit score, country, gender, age, tenure, balance, number of products, credit card status, active membership, estimated salary, and churn status. The churn status indicates whether a customer has churned or not. The dataset is used to analyze and understand factors that contribute to customer churn and to build predictive models to identify customers at risk of churning. The goal is to develop strategies and interventions to reduce churn and improve customer retention.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset originates from the research domain of Customer Churn Prediction in the Telecom Industry. It was created as part of the project "Data-Driven Churn Prediction: ML Solutions for the Telecom Industry," completed within the Data Stewardship course (Master programme Data Science, TU Wien).
The primary purpose of this dataset is to support machine learning model development for predicting customer churn based on customer demographics, service usage, and account information.
The dataset enables the training, testing, and evaluation of classification algorithms, allowing researchers and practitioners to explore techniques for customer retention optimization.
The dataset was originally obtained from the IBM Accelerator Catalog and adapted for academic use. It was uploaded to TU Wien’s DBRepo test system and accessed via SQLAlchemy connections to the MariaDB environment.
The dataset has a tabular structure and was initially stored in CSV format. It contains:
Rows: 7,043 customer records
Columns: 21 features including customer attributes (gender, senior citizen status, partner status), account information (tenure, contract type, payment method), service usage (internet service, streaming TV, tech support), and the target variable (Churn: Yes/No).
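As a minimal sketch of how the Yes/No target described above might be turned into a binary label, the snippet below parses a few made-up rows with Python's standard `csv` module. Only the `Churn` column name and its Yes/No values come from the description; the other column names, the helper names, and all sample values are illustrative assumptions.

```python
import csv
import io

# Made-up sample rows. Only the "Churn" column and its Yes/No values
# come from the dataset description; everything else is an assumption.
SAMPLE_CSV = """customer_id,tenure,Churn
A001,1,Yes
A002,34,No
A003,2,Yes
A004,45,No
A005,8,No
"""


def encode_churn(value: str) -> int:
    """Map the Yes/No target to 1/0 and reject unexpected labels."""
    mapping = {"yes": 1, "no": 0}
    key = value.strip().lower()
    if key not in mapping:
        raise ValueError(f"unexpected Churn label: {value!r}")
    return mapping[key]


def load_labels(csv_text: str) -> list[int]:
    """Read CSV text and return the encoded Churn column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [encode_churn(row["Churn"]) for row in reader]


labels = load_labels(SAMPLE_CSV)
churn_rate = sum(labels) / len(labels)
print(f"{len(labels)} rows, churn rate {churn_rate:.0%}")
```

Rejecting unknown labels instead of silently mapping them to 0 keeps a mislabeled or empty target from distorting the class balance.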
Naming Convention:
The table in the database is named telco_customer_churn_data.
Software Requirements:
To open and work with the dataset, any standard database client or programming language that supports MariaDB connections can be used (e.g., Python).
For machine learning applications, libraries such as pandas, scikit-learn, and joblib are typically used.
Additional Resources:
Source code for data loading, preprocessing, model training, and evaluation is available at the associated GitHub repository: https://github.com/nazerum/fair-ml-customer-churn
When reusing the dataset, users should be aware:
Licensing: The dataset is shared under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Use Case Suitability: The dataset is best suited for classification tasks, particularly binary classification (churn vs. no churn).
Metadata Standards: Metadata describing the dataset adheres to FAIR principles and is supplemented by CodeMeta and Croissant standards for improved interoperability.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
https://www.datainsightsmarket.com/privacy-policy
The Churn Prediction Software market is experiencing robust growth, driven by the increasing need for businesses across diverse sectors to proactively manage customer retention. The market's expansion is fueled by the rising adoption of cloud-based solutions, offering scalability and cost-effectiveness. Key applications include telecommunications, banking and finance, retail, e-commerce, and healthcare, where minimizing customer churn is crucial for profitability. The market is witnessing a shift towards sophisticated predictive analytics and machine learning algorithms that provide more accurate churn predictions, allowing businesses to implement targeted retention strategies. This includes personalized offers, proactive customer support, and improved product/service offerings. Furthermore, the integration of churn prediction software with CRM systems enhances data analysis and facilitates more effective customer relationship management. Competition is intensifying with established players like SAP, Salesforce, and Oracle competing alongside agile startups offering specialized solutions. The market's growth, while positive, also faces certain restraints, such as the high initial investment costs for implementing these sophisticated solutions and the need for skilled data scientists to interpret and leverage the insights derived from the analyses. Despite these challenges, the market's future remains promising. The increasing availability of large datasets, coupled with advancements in artificial intelligence and machine learning, is expected to drive innovation and further enhance the accuracy and effectiveness of churn prediction software. Regional growth will vary, with North America and Europe likely leading the market initially, driven by higher technology adoption rates and established business practices. 
However, growth in Asia-Pacific is anticipated to accelerate significantly in the coming years as businesses in developing economies prioritize customer retention strategies. The continued development of user-friendly interfaces and the increasing integration of these tools into existing business workflows will further contribute to the overall market expansion and wider adoption across various industries.
https://cubig.ai/store/terms-of-service
1) Data Introduction • The Customer Churn Dataset collects various customer characteristics and service usage information to predict whether communication service customers will churn.
2) Data Utilization (1) Characteristics of the Customer Churn Dataset: • The dataset consists of several categorical and numerical variables, including customer demographics, service types, contract information, charges, usage patterns, and churn status. (2) The Customer Churn Dataset can be used for: • Developing customer churn prediction models: Machine learning and deep learning techniques can be used to develop classification models that predict churn based on customer characteristics and service usage data. • Segmenting customers and developing marketing strategies: It can be used to analyze customer groups at high risk of churn and to design custom retention strategies or targeted marketing campaigns.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
"It's necessary to implement an effective customer retention strategy through data analysis. The main goal is to predict the probability of customer churn for the next month, identify key customer profiles, and develop specific recommendations to improve customer retention and satisfaction. This will enable optimizing the customer experience and strengthening their loyalty."
https://cubig.ai/store/terms-of-service
1) Data Introduction • The Telco Customer Churn Dataset includes carrier customers' service usage, account information, demographics, and churn status, which can be used to predict and analyze customer churn.
2) Data Utilization (1) Characteristics of the Telco Customer Churn Dataset: • This dataset includes a variety of customer and service characteristics, including gender, age group, partner and dependents, service subscription status (telephone, Internet, security, backup, device protection, technical support, streaming, etc.), contract type, payment method, monthly fee, total fee, and churn status. (2) The Telco Customer Churn Dataset can be used for: • Developing customer churn prediction models: Using customer service usage patterns and account information, we can build a machine learning-based churn prediction model to proactively identify customers at risk of churn.
https://creativecommons.org/publicdomain/zero/1.0/
The "Real World Customer Churn Dataset in Telco Domain" is a comprehensive collection of anonymized data that provides insights into customer behavior and churn prediction within the telecommunications industry.
The dataset contains data on over 60,000 customers across more than 10 distinct usage categories. Some of the key usage categories include:
The dataset consists of the following key files:
The "Real World Customer Churn Dataset in Telco Domain" offers a range of potential use cases, including:
This dataset's real-world aspect is of significant importance. It reflects actual customer interactions with a major telecommunications company in Sri Lanka, offering insights that can be directly applied to real-world scenarios. The dataset is sourced from one of the largest telco companies in the country, adding credibility and relevance to the insights it provides.
Understanding customer churn and usage behavior is pivotal for the telecommunications industry, and this dataset empowers researchers, data scientists, and businesses to gain deeper insights into these aspects.
The dataset is anonymized to protect customer privacy, and all data used is in compliance with privacy regulations and agreements. Users are encouraged to explore and contribute to the "Real World Customer Churn Dataset in Telco Domain."
Thank you for your valuable contributions to this dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To study the issue of network customer churn, the iQiyi customer dataset was collected. Behavioral sequence features were extracted from it to build a deep learning model, and experiments were conducted. Here, we provide the corresponding raw dataset, including all the data we used.
https://www.datainsightsmarket.com/privacy-policy
The Customer Churn Analysis Software market is experiencing robust growth, driven by the increasing need for businesses to understand and mitigate customer attrition. The market's expansion is fueled by several factors, including the rising adoption of cloud-based solutions, the proliferation of big data analytics, and the growing demand for predictive analytics capabilities to proactively identify at-risk customers. Businesses across diverse sectors, including SaaS, e-commerce, and telecommunications, are increasingly leveraging these sophisticated tools to gain actionable insights into customer behavior, personalize their offerings, and improve customer retention strategies. This market is characterized by a competitive landscape with both established players like Adobe and Google, and specialized niche providers such as Infer and Churnly Technologies Limited. The integration of AI and machine learning capabilities within these platforms is a prominent trend, enabling more accurate prediction models and automated interventions to reduce churn. While the initial investment in such software can be a restraint for some smaller businesses, the long-term return on investment, in terms of improved customer retention and reduced acquisition costs, is a compelling driver for market growth. The forecast period (2025-2033) is expected to witness significant expansion, building upon the historical growth from 2019-2024. Assuming a conservative CAGR (let's estimate it at 15% based on industry trends), and a 2025 market size of $5 billion (a reasonable estimate given the presence of major players and the importance of the sector), the market is projected to reach approximately $17 billion by 2033. This expansion will be propelled by continuous technological advancements, the growing adoption of subscription-based business models, and a heightened focus on customer experience management across industries. 
Regional variations will likely exist, with North America and Europe leading the market initially due to higher adoption rates and technological infrastructure, but emerging markets in Asia-Pacific are expected to show significant growth in the later years of the forecast period. The competitive landscape will remain dynamic, with mergers, acquisitions, and the emergence of innovative solutions shaping the future of customer churn analysis software.
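The projection above is compound growth: end size = start size × (1 + CAGR)^n. A minimal sketch of that arithmetic follows, using the text's own estimates of USD 5 billion in 2025 and a 15% CAGR. The function name is illustrative, and because the text does not say how many compounding years it counts between 2025 and 2033, both 8 and 9 are shown.

```python
def project_market_size(base: float, cagr: float, years: int) -> float:
    """Compound a base value forward: base * (1 + cagr) ** years."""
    return base * (1.0 + cagr) ** years


base_2025 = 5.0  # USD billion, the estimate stated in the text
growth = 0.15    # 15% CAGR, the rate assumed in the text

# The number of compounding periods is an assumption; try both readings.
for years in (8, 9):
    size = project_market_size(base_2025, growth, years)
    print(f"{years} compounding years: {size:.1f} USD billion")
```

Eight periods give about USD 15.3 billion and nine give about USD 17.6 billion, so the quoted figure of roughly USD 17 billion sits closer to nine compounding periods.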
https://www.marketresearchintellect.com/privacy-policy
Find detailed analysis in Market Research Intellect's Customer Churn Analysis Software Market Report, estimated at USD 2.1 billion in 2024 and forecasted to climb to USD 4.8 billion by 2033, reflecting a CAGR of 10.2%. Stay informed about adoption trends, evolving technologies, and key market participants.
This dataset was created by Al Amin
It contains the following files:
The Churn dataset contains tweets about telecommunication brands identified for churn prediction, which involves predicting if a post indicates a user's intention to leave a brand.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The dataset used for this project originated from raw telecom customer activity logs provided by a private client. It included a wide range of customer behavior and service usage metrics such as recharge patterns, call activity, SMS and data usage, rental revenues, and various time-based features.
Prior to modeling, the dataset underwent an extensive preprocessing phase which was carried out in a separate project. This involved cleaning the data, encoding categorical variables, handling missing values, and scaling the features to ensure consistent model performance. The resulting dataset consisted of 790,624 entries with 71 numeric features, including both user demographics and behavioral indicators, as well as a binary target variable (churned).
All features were fully numeric and scaled, which enabled direct application of PCA for dimensionality reduction without additional transformations. This preprocessed dataset served as the foundation for the experiments and evaluations presented in this report.
Dataset Card for Telco Customer Churn
This dataset contains information about customers of a fictional telecommunications company, including demographic information, services subscribed to, location details, and churn behavior. This merged dataset combines the information from the original Telco Customer Churn dataset with additional details.
Dataset Details
Dataset Description
This merged Telco Customer Churn dataset provides a comprehensive view of customer… See the full description on the dataset page: https://huggingface.co/datasets/aai510-group1/telco-customer-churn.
https://dataintelo.com/privacy-and-policy
According to our latest research, the AI-powered customer churn prediction market size reached USD 1.58 billion globally in 2024, with a robust CAGR of 19.7% expected from 2025 to 2033. Driven by rapid digital transformation and the increasing need for predictive analytics across sectors, the market is forecasted to attain a value of USD 7.57 billion by 2033. The growth of this market is primarily attributed to the escalating adoption of AI and machine learning technologies by enterprises seeking to reduce customer attrition, optimize retention strategies, and enhance overall customer lifetime value, as per the latest industry research.
One of the fundamental growth drivers for the AI-powered customer churn prediction market is the proliferation of customer data and the imperative need for businesses to leverage this data to drive actionable insights. With the advent of digital touchpoints, organizations are now able to collect vast amounts of structured and unstructured data from various customer interactions. This data, when processed using advanced AI and machine learning algorithms, empowers companies to predict potential churn with high accuracy. As a result, businesses across industries such as telecommunications, BFSI, retail, and healthcare are increasingly investing in AI-powered churn prediction solutions to proactively identify at-risk customers and implement targeted retention strategies, thereby reducing revenue loss and improving profitability.
Another significant factor fueling market expansion is the growing emphasis on customer experience and personalization. In today's hyper-competitive landscape, retaining existing customers has become more cost-effective than acquiring new ones. AI-powered churn prediction tools enable organizations to segment their customer base, understand behavior patterns, and tailor interventions for individual customers. This level of personalization not only helps in reducing churn rates but also enhances customer satisfaction and loyalty. The integration of AI-driven insights into CRM systems and marketing automation platforms further streamlines the process, making it easier for businesses to act on predictions in real time. Moreover, the rising adoption of cloud-based solutions has made these technologies more accessible to small and medium enterprises (SMEs), broadening the market’s reach.
The surge in demand for scalable, real-time analytics platforms is also contributing to market growth. Enterprises are increasingly seeking AI-powered solutions that can integrate seamlessly with their existing IT infrastructure, deliver instant insights, and scale as their data grows. The shift towards cloud deployment models has accelerated this trend, offering cost-effective, flexible, and easily deployable churn prediction solutions. Additionally, advancements in natural language processing (NLP), deep learning, and big data analytics are further enhancing the accuracy and reliability of churn prediction models. As organizations strive to stay ahead of the competition by minimizing customer attrition, the demand for sophisticated, AI-driven predictive analytics tools continues to rise.
Regionally, North America holds the largest market share, followed by Europe and Asia Pacific. The dominance of North America can be attributed to the early adoption of AI technologies, presence of major technology vendors, and a strong focus on customer-centric strategies among enterprises in the region. Europe is also witnessing significant growth, driven by stringent regulations around data protection and a growing emphasis on customer retention in industries like BFSI and retail. The Asia Pacific region is expected to exhibit the highest CAGR during the forecast period, fueled by rapid digitalization, increasing investments in AI, and the expansion of e-commerce and telecommunications sectors. Latin America and the Middle East & Africa are also experiencing gradual adoption, primarily in financial services and telecommunications.
The component segment of the AI-powered customer churn prediction market is categorized into software and services. The software segment dominates the market, accounting for the largest share in 2024, owing to the widespread deployment of advanced AI and machine learning platforms
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
result_data
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In recent years, with the continuous improvement of the financial system and the rapid development of the banking industry, the competition of the banking industry itself has intensified. At the same time, with the rapid development of information technology and Internet technology, customers’ choice of financial products is becoming more and more diversified, and customers’ dependence and loyalty to banking institutions is becoming less and less, and the problem of customer churn in commercial banks is becoming more and more prominent. How to predict customer behavior and retain existing customers has become a major challenge for banks to solve. Therefore, this study takes a bank’s business data on Kaggle platform as the research object, uses multiple sampling methods to compare the data for balancing, constructs a bank customer churn prediction model for churn identification by GA-XGBoost, and conducts interpretability analysis on the GA-XGBoost model to provide decision support and suggestions for the banking industry to prevent customer churn. The results show that: (1) The applied SMOTEENN is more effective than SMOTE and ADASYN in dealing with the imbalance of banking data. (2) The F1 and AUC values of the model improved and optimized by XGBoost using genetic algorithm can reach 90% and 99%, respectively, which are optimal compared to other six machine learning models. The GA-XGBoost classifier was identified as the best solution for the customer churn problem. (3) Using Shapley values, we explain how each feature affects the model results, and analyze the features that have a high impact on the model prediction, such as the total number of transactions in the past year, the amount of transactions in the past year, the number of products owned by customers, and the total sales balance. 
The contribution of this paper is mainly in two aspects: (1) this study can provide useful information from the black box model based on the accurate identification of churned customers, which can provide reference for commercial banks to improve their service quality and retain customers; (2) it can provide reference for customer churn early warning models of other related industries, which can help the banking industry to maintain customer stability, maintain market position and reduce corporate losses.
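For readers unfamiliar with the two metrics quoted in the results, the following sketch computes F1 from confusion counts and AUC as the probability that a randomly chosen churned customer is scored above a randomly chosen retained one. All counts and scores are hypothetical and are not taken from the study; the function names are illustrative.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical confusion counts and model scores, for illustration only.
print(f"F1  = {f1_score(tp=90, fp=10, fn=10):.2f}")
print(f"AUC = {auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1]):.3f}")
```

The pairwise AUC is quadratic in the number of samples, which is fine for a small illustration but not for a full dataset.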
https://www.wiseguyreports.com/pages/privacy-policy
BASE YEAR | 2024 |
HISTORICAL DATA | 2019 - 2024 |
REPORT COVERAGE | Revenue Forecast, Competitive Landscape, Growth Factors, and Trends |
MARKET SIZE 2023 | 3.24 (USD Billion) |
MARKET SIZE 2024 | 3.75 (USD Billion) |
MARKET SIZE 2032 | 12.1 (USD Billion) |
SEGMENTS COVERED | Deployment Mode, Organization Size, Industry Vertical, Functionality, Data Integration, Regional |
COUNTRIES COVERED | North America, Europe, APAC, South America, MEA |
KEY MARKET DYNAMICS | AI-powered churn prediction, Real-time customer insights, Predictive analytics, Cloud-based deployment, Integration with CRM systems |
MARKET FORECAST UNITS | USD Billion |
KEY COMPANIES PROFILED | HubSpot, Oracle, Zoho, Freshworks, Pegasystems, Mixpanel, Zendesk, Medallia, Adobe, IBM, Salesforce, Amplitude, SAP, Qualtrics, Microsoft |
MARKET FORECAST PERIOD | 2024 - 2032 |
KEY MARKET OPPORTUNITIES | AI-powered churn prediction, Personalized churn prevention strategies, Predictive analytics for proactive customer retention, Self-service churn management tools, Integration with CRM and other business systems |
COMPOUND ANNUAL GROWTH RATE (CAGR) | 15.79% (2024 - 2032) |
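The CAGR in the last row can be cross-checked against the two market-size rows. This is a minimal sketch that assumes eight compounding years between 2024 and 2032; the function name is illustrative.

```python
def implied_cagr(start: float, end: float, years: int) -> float:
    """Solve end = start * (1 + r) ** years for r."""
    return (end / start) ** (1.0 / years) - 1.0


# Market sizes in USD billion, taken from the rows above.
size_2024 = 3.75
size_2032 = 12.1

rate = implied_cagr(size_2024, size_2032, 2032 - 2024)
print(f"Implied CAGR 2024-2032: {rate:.2%}")
```

The result comes out at about 15.8%, in line with the reported 15.79% once the rounding of the market-size figures is taken into account.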
https://creativecommons.org/publicdomain/zero/1.0/
9. Plot the decision tree
Average customer churn is 27%. Churn can occur when tenure is >= 7.5 and there is no internet service.
The most significant variables are Internet Service and Tenure; the least significant are Streaming Movies and Tech Support.
Run library(randomForest). Here we use the default ntree (500) and the default mtry, which for classification is floor(sqrt(p)), where p is the number of independent variables.
From the confusion matrix, the accuracy is 79.27%, marginally higher than the decision tree's 79.00%. The error rate is pretty low when predicting "No" and much higher when predicting "Yes".
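To make the relationship between overall accuracy and per-class error concrete, here is a small sketch that derives both from a confusion matrix. The counts are hypothetical, chosen only to show a low error on "No" alongside a high error on "Yes"; they are not the notebook's actual matrix, and the function names are illustrative.

```python
# Hypothetical counts keyed by (actual, predicted); not from the source.
confusion = {
    ("No", "No"): 1400, ("No", "Yes"): 150,
    ("Yes", "No"): 280, ("Yes", "Yes"): 280,
}


def accuracy(cm: dict) -> float:
    """Share of all cases where the prediction matches the actual class."""
    correct = sum(n for (actual, pred), n in cm.items() if actual == pred)
    return correct / sum(cm.values())


def class_error(cm: dict, label: str) -> float:
    """Share of cases of one actual class that were misclassified."""
    total = sum(n for (actual, _), n in cm.items() if actual == label)
    wrong = sum(n for (actual, pred), n in cm.items()
                if actual == label and pred != label)
    return wrong / total


print(f"accuracy    = {accuracy(confusion):.2%}")
print(f"error (No)  = {class_error(confusion, 'No'):.2%}")
print(f"error (Yes) = {class_error(confusion, 'Yes'):.2%}")
```

With an imbalanced target, overall accuracy is dominated by the majority class, which is why it can look reasonable even when the churners are frequently missed.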
Plot the model showing which variables reduce the Gini impurity the most and the least. Total charges and tenure reduce the Gini impurity the most, while phone service has the least impact.
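As background for the variable-importance plot, the sketch below shows how Gini impurity and the impurity decrease of a single split are computed. The labels and the split are made up for illustration. A random forest's importance score aggregates such decreases over every split on a variable across all trees, which this sketch does not do.

```python
from collections import Counter


def gini(labels: list[str]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(parent: list[str], left: list[str], right: list[str]) -> float:
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


# Made-up node: 4 churners and 6 non-churners, split into two children.
parent = ["Yes"] * 4 + ["No"] * 6
left = ["Yes"] * 3 + ["No"] * 1
right = ["Yes"] * 1 + ["No"] * 5

print(f"parent impurity = {gini(parent):.3f}")
print(f"decrease        = {gini_decrease(parent, left, right):.3f}")
```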
Tune the model: mtry = 2 has the lowest OOB error rate.
Use random forest with mtry = 2 and ntree = 200
From the confusion matrix, the accuracy is 79.71%, marginally higher than the default model (ntree = 500, mtry = 4) at 79.27% and the decision tree at 79.00%. The error rate is pretty low when predicting "No" and m...