https://creativecommons.org/publicdomain/zero/1.0/
9. Plot the decision tree
The average customer churn rate is 27%. According to the tree, churn can occur when tenure is at least 7.5 (tenure >= 7.5) and the customer has no internet service.
The most significant variables are Internet Service and Tenure; the least significant are Streaming Movies and Tech Support.
Run library(randomForest). Here we use the default ntree (500) and the default mtry, which for classification is floor(sqrt(p)), where p is the number of independent variables (p/3 is the default for regression).
From the confusion matrix, the accuracy is 79.27%, marginally higher than the decision tree's 79.00%. The error rate is low when predicting "No" and much higher when predicting "Yes".
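To make the accuracy and per-class error rates concrete, here is a minimal Python sketch (the workflow described here uses R; Python is used only for illustration). The function name and the counts are hypothetical placeholders chosen to show the pattern, not the values behind the reported 79.27%.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Accuracy and per-class error rates from a 2x2 confusion matrix.

    tn: actual "No" predicted "No";   fp: actual "No" predicted "Yes"
    fn: actual "Yes" predicted "No";  tp: actual "Yes" predicted "Yes"
    """
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "error_no": fp / (tn + fp),    # share of actual "No" rows misclassified
        "error_yes": fn / (fn + tp),   # share of actual "Yes" rows misclassified
    }

# Hypothetical counts: a low error on "No" and a much higher error on "Yes".
metrics = confusion_metrics(tn=900, fp=100, fn=200, tp=200)
print(metrics)
```

With an imbalanced target, overall accuracy can look acceptable while the minority class ("Yes") is still misclassified often, which is why the per-class error is worth reading alongside it.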
Plot the model to show which variables reduce the Gini impurity the most and the least. Total charges and tenure reduce the Gini impurity the most, while phone service has the least impact.
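For readers unfamiliar with the measure, the following is a small Python sketch of Gini impurity and the decrease achieved by one split. It illustrates the general definition only; the function names and the toy labels are assumptions, and it does not reproduce how the R package aggregates decreases across trees.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted

# Hypothetical node: 3 churners and 5 non-churners, separated perfectly by a split.
parent = ["Yes"] * 3 + ["No"] * 5
decrease = gini_decrease(parent, ["Yes"] * 3, ["No"] * 5)
print(gini_impurity(parent), decrease)
```

A variable that produces large decreases across many splits ranks high in the importance plot; one that rarely separates the classes ranks low.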
Tune the model: mtry = 2 has the lowest OOB (out-of-bag) error rate.
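The OOB error is estimated on the rows that a given tree never saw, because each tree is grown on a bootstrap sample drawn with replacement. As a rough illustration, the Python sketch below simulates one bootstrap draw and measures the share of rows left out. The function name, row count, and seed are assumptions made for the example.

```python
import random

def oob_fraction(n_rows, seed=0):
    """Share of rows never drawn in one bootstrap sample of size n_rows."""
    rng = random.Random(seed)
    # Draw n_rows indices with replacement; the set keeps the distinct ones.
    drawn = {rng.randrange(n_rows) for _ in range(n_rows)}
    return 1 - len(drawn) / n_rows

fraction = oob_fraction(10_000)
print(round(fraction, 3))
```

Roughly a third of the rows are out-of-bag for each tree, so the forest can be compared across mtry values without setting aside a separate validation set.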
Use random forest with mtry = 2 and ntree = 200
From the confusion matrix, the accuracy is 79.71%, marginally higher than the default model (ntree = 500, mtry = 4) at 79.27% and the decision tree at 79.00%. The error rate is pretty low when predicting "No" and m...
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Customer Churn Dataset collects customer characteristics and service usage information to predict whether communication service customers will churn.
2) Data Utilization
(1) Customer Churn Dataset has characteristics that:
• The dataset consists of several categorical and numerical variables, including customer demographics, service types, contract information, charges, usage patterns, and churn.
(2) Customer Churn Dataset can be used to:
• Development of customer churn prediction model: Machine learning and deep learning techniques can be used to develop classification models that predict churn based on customer characteristics and service use data.
• Segmenting customers and developing marketing strategies: It can be used to analyze customer groups at high risk of churn and to design custom retention strategies or targeted marketing campaigns.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This Synthetic Customer Churn Prediction Dataset has been designed as an educational resource for exploring data science, machine learning, and predictive modelling techniques in a customer retention context. The dataset simulates key attributes relevant to customer churn analysis, such as service usage, contract details, and customer demographics. It allows users to practice data manipulation, visualization, and the development of models to predict churn behaviour in industries like telecommunications, subscription services, or utilities.
[Figure: Synthetic Customer Churn Prediction Dataset Distribution]
This dataset is useful for a variety of applications, including:
This dataset is synthetic and anonymized, making it a safe tool for experimentation and learning without compromising real customer privacy.
CC0 (Public Domain)
http://opendatacommons.org/licenses/dbcl/1.0/
The customer churn dataset is a collection of customer data for predicting customer churn, that is, the tendency of customers to stop using a company's products or services. The dataset contains features that describe each customer: credit score, country, gender, age, tenure, balance, number of products, credit card status, active membership, estimated salary, and churn status. The churn status indicates whether a customer has churned or not. The dataset is used to analyze the factors that contribute to customer churn and to build predictive models that identify customers at risk of churning. The goal is to develop strategies and interventions to reduce churn and improve customer retention.
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Telco Customer Churn Dataset covers telecom carrier customers' service usage, account information, demographics, and churn status, which can be used to predict and analyze customer churn.
2) Data Utilization
(1) Telco Customer Churn Dataset has characteristics that:
• This dataset includes a variety of customer and service characteristics, including gender, age group, partner and dependents, service subscription status (telephone, Internet, security, backup, device protection, technical support, streaming, etc.), contract type, payment method, monthly fee, total fee, and churn.
(2) Telco Customer Churn Dataset can be used to:
• Development of customer churn prediction model: Using customer service usage patterns and account information, we can build a machine learning-based churn prediction model to proactively identify customers at risk of churn.
https://dataintelo.com/privacy-and-policy
The global customer churn software market size was valued at approximately USD 1.5 billion in 2023 and is projected to reach USD 4.8 billion by 2032, growing at a CAGR of 13.7% during the forecast period. This robust growth is driven by several factors, including the increasing importance of customer retention in competitive markets, advancements in AI and machine learning technologies, and the growing adoption of digital transformation initiatives across industries.
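The growth rate quoted here can be sanity-checked with the standard compound annual growth rate formula. The Python sketch below assumes nine compounding years between 2023 and 2032; the report does not state its exact convention, so treat that period as an assumption.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1

# USD 1.5 billion (2023) to USD 4.8 billion (2032), nine years assumed.
rate = cagr(1.5, 4.8, 2032 - 2023)
print(f"{rate:.1%}")
```

Under that assumption the result is about 13.8%, close to the 13.7% stated in the report.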
One of the primary growth factors propelling the customer churn software market is the increasing emphasis on customer satisfaction and retention. In today's highly competitive business environment, retaining existing customers is more cost-effective than acquiring new ones. Companies are realizing the value of customer loyalty, and as a result, they are investing heavily in tools that can help predict and mitigate churn. Customer churn software offers advanced analytics and predictive capabilities, enabling organizations to identify at-risk customers and take proactive measures to retain them.
Another significant driver is the advancement in artificial intelligence (AI) and machine learning technologies. These technologies have revolutionized the way customer data is analyzed and interpreted. AI-powered customer churn software can process vast amounts of data from multiple sources, identify patterns, and generate actionable insights. This ability to leverage big data and predictive analytics is crucial for businesses aiming to stay ahead of the competition. As AI and machine learning continue to evolve, the effectiveness and efficiency of customer churn software are expected to improve further.
The increasing adoption of digital transformation initiatives across various industries is also contributing to the market growth. As businesses undergo digital transformation, they generate enormous amounts of data related to customer behavior, preferences, and interactions. Customer churn software helps organizations make sense of this data, enabling them to develop personalized strategies to enhance customer experience and loyalty. The shift towards data-driven decision-making is compelling companies to invest in advanced analytics solutions, thereby driving the demand for customer churn software.
From a regional perspective, North America holds a significant share of the customer churn software market, driven by the presence of major technology companies and the early adoption of advanced analytics solutions. However, the Asia Pacific region is expected to witness the highest growth rate during the forecast period. Factors such as the rapid digitalization of economies, increasing investments in AI and machine learning, and the growing focus on customer-centric strategies in emerging markets are fueling the demand for customer churn software in this region.
The customer churn software market is segmented into two primary components: software and services. The software segment includes the actual customer churn solutions, while the services segment encompasses implementation, training, support, and consulting services. The software segment is expected to dominate the market due to the high demand for advanced analytics and predictive tools. Companies across various industries are increasingly adopting software solutions to gain insights into customer behavior and predict churn. The software segment's growth is further supported by continuous advancements in AI and machine learning technologies, which enhance the capabilities of customer churn solutions.
The services segment, although smaller in comparison to the software segment, plays a crucial role in the market. Services such as implementation and training ensure that organizations can effectively deploy and utilize customer churn software. Support and consulting services are equally important, as they help companies optimize their software usage and develop customized strategies to address specific churn-related challenges. The demand for these services is expected to grow in tandem with the adoption of customer churn software, as businesses seek to maximize their return on investment and achieve better customer retention outcomes.
Moreover, the integration of customer churn software with existing CRM systems and other business applications is becoming increasingly important. This integration enables a seamless flow of data and enhances the overall efficiency of customer retention efforts. As a result, solutions that offer robust integration capa
https://creativecommons.org/publicdomain/zero/1.0/
A tour and travel company wants to predict whether a customer will churn based on the indicators given below. Help build predictive models and save the company's money. Perform fascinating EDAs. The data was used for practice purposes and also during a mini hackathon; it's completely free to use.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
In recent years, with the continuous improvement of the financial system and the rapid development of the banking industry, the competition of the banking industry itself has intensified. At the same time, with the rapid development of information technology and Internet technology, customers’ choice of financial products is becoming more and more diversified, and customers’ dependence and loyalty to banking institutions is becoming less and less, and the problem of customer churn in commercial banks is becoming more and more prominent. How to predict customer behavior and retain existing customers has become a major challenge for banks to solve. Therefore, this study takes a bank’s business data on Kaggle platform as the research object, uses multiple sampling methods to compare the data for balancing, constructs a bank customer churn prediction model for churn identification by GA-XGBoost, and conducts interpretability analysis on the GA-XGBoost model to provide decision support and suggestions for the banking industry to prevent customer churn. The results show that: (1) The applied SMOTEENN is more effective than SMOTE and ADASYN in dealing with the imbalance of banking data. (2) The F1 and AUC values of the model improved and optimized by XGBoost using genetic algorithm can reach 90% and 99%, respectively, which are optimal compared to other six machine learning models. The GA-XGBoost classifier was identified as the best solution for the customer churn problem. (3) Using Shapley values, we explain how each feature affects the model results, and analyze the features that have a high impact on the model prediction, such as the total number of transactions in the past year, the amount of transactions in the past year, the number of products owned by customers, and the total sales balance. 
The contribution of this paper is mainly in two aspects: (1) this study can provide useful information from the black box model based on the accurate identification of churned customers, which can provide reference for commercial banks to improve their service quality and retain customers; (2) it can provide reference for customer churn early warning models of other related industries, which can help the banking industry to maintain customer stability, maintain market position and reduce corporate losses.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The dataset originates from the research domain of Customer Churn Prediction in the Telecom Industry. It was created as part of the project "Data-Driven Churn Prediction: ML Solutions for the Telecom Industry," completed within the Data Stewardship course (Master programme Data Science, TU Wien).
The primary purpose of this dataset is to support machine learning model development for predicting customer churn based on customer demographics, service usage, and account information.
The dataset enables the training, testing, and evaluation of classification algorithms, allowing researchers and practitioners to explore techniques for customer retention optimization.
The dataset was originally obtained from the IBM Accelerator Catalog and adapted for academic use. It was uploaded to TU Wien’s DBRepo test system and accessed via SQLAlchemy connections to the MariaDB environment.
The dataset has a tabular structure and was initially stored in CSV format. It contains:
Rows: 7,043 customer records
Columns: 21 features including customer attributes (gender, senior citizen status, partner status), account information (tenure, contract type, payment method), service usage (internet service, streaming TV, tech support), and the target variable (Churn: Yes/No).
Naming Convention:
The table in the database is named telco_customer_churn_data.
Software Requirements:
To open and work with the dataset, any standard database client or programming language that supports MariaDB connections can be used (e.g., Python).
For machine learning applications, libraries such as pandas, scikit-learn, and joblib are typically used.
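As a rough illustration of querying the table, the sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the MariaDB environment, so it runs without any server. Only the table name comes from the description; the column subset, column names, and rows are hypothetical and not taken from the real data.

```python
import sqlite3

# Hypothetical rows (tenure, Contract, Churn); not taken from the real dataset.
ROWS = [
    (1, "Month-to-month", "Yes"),
    (34, "One year", "No"),
    (2, "Month-to-month", "Yes"),
    (45, "One year", "No"),
    (8, "Month-to-month", "No"),
]

def churn_rate_by_contract(conn):
    """Return {contract type: share of rows with Churn = 'Yes'}."""
    query = """
        SELECT Contract, AVG(CASE WHEN Churn = 'Yes' THEN 1.0 ELSE 0.0 END)
        FROM telco_customer_churn_data
        GROUP BY Contract
    """
    return dict(conn.execute(query).fetchall())

# In-memory SQLite stands in for the MariaDB connection described above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE telco_customer_churn_data "
    "(tenure INTEGER, Contract TEXT, Churn TEXT)"
)
conn.executemany("INSERT INTO telco_customer_churn_data VALUES (?, ?, ?)", ROWS)
rates = churn_rate_by_contract(conn)
print(rates)
```

Against the real repository, the same query would be issued over a MariaDB connection instead, with credentials and host supplied by the DBRepo deployment.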
Additional Resources:
Source code for data loading, preprocessing, model training, and evaluation is available at the associated GitHub repository: https://github.com/nazerum/fair-ml-customer-churn
When reusing the dataset, users should be aware:
Licensing: The dataset is shared under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Use Case Suitability: The dataset is best suited for classification tasks, particularly binary classification (churn vs. no churn).
Metadata Standards: Metadata describing the dataset adheres to FAIR principles and is supplemented by CodeMeta and Croissant standards for improved interoperability.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Churn prediction aims to detect customers who intend to leave a service provider. Gaining a new customer costs an organization 5 to 10 times more than retaining an existing one. Predictive models can identify likely churners in the near future so that a retention solution can be offered. This paper presents a new prediction model based on Data Mining (DM) techniques. The proposed model is composed of six steps: identify the problem domain, data selection, investigate the data set, classification, clustering, and knowledge usage. A data set with 23 attributes and 5000 instances is used; 4000 instances are used for training the model and 1000 instances as a testing set. The predicted churners are clustered into 3 categories for use in a retention strategy. The data mining techniques used in this paper are Decision Tree, Support Vector Machine, and Neural Network, applied through the open-source software WEKA.
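The 4000/1000 partition described above can be sketched in a few lines of Python. This assumes a random shuffle with a fixed seed; the abstract does not say how the instances were assigned to each set, and the function name is hypothetical.

```python
import random

def train_test_split_indices(n_instances, n_train, seed=0):
    """Shuffle row indices and cut them into a training and a testing part."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]

# 5000 instances: 4000 for training, the remaining 1000 for testing.
train_idx, test_idx = train_test_split_indices(5000, 4000)
print(len(train_idx), len(test_idx))
```

Keeping the two index sets disjoint ensures that the reported test performance is measured on instances the model never saw during training.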
https://dataintelo.com/privacy-and-policy
According to our latest research, the AI-powered customer churn prediction market size reached USD 1.58 billion globally in 2024, with a robust CAGR of 19.7% expected from 2025 to 2033. Driven by rapid digital transformation and the increasing need for predictive analytics across sectors, the market is forecasted to attain a value of USD 7.57 billion by 2033. The growth of this market is primarily attributed to the escalating adoption of AI and machine learning technologies by enterprises seeking to reduce customer attrition, optimize retention strategies, and enhance overall customer lifetime value, as per the latest industry research.
One of the fundamental growth drivers for the AI-powered customer churn prediction market is the proliferation of customer data and the imperative need for businesses to leverage this data to drive actionable insights. With the advent of digital touchpoints, organizations are now able to collect vast amounts of structured and unstructured data from various customer interactions. This data, when processed using advanced AI and machine learning algorithms, empowers companies to predict potential churn with high accuracy. As a result, businesses across industries such as telecommunications, BFSI, retail, and healthcare are increasingly investing in AI-powered churn prediction solutions to proactively identify at-risk customers and implement targeted retention strategies, thereby reducing revenue loss and improving profitability.
Another significant factor fueling market expansion is the growing emphasis on customer experience and personalization. In today's hyper-competitive landscape, retaining existing customers has become more cost-effective than acquiring new ones. AI-powered churn prediction tools enable organizations to segment their customer base, understand behavior patterns, and tailor interventions for individual customers. This level of personalization not only helps in reducing churn rates but also enhances customer satisfaction and loyalty. The integration of AI-driven insights into CRM systems and marketing automation platforms further streamlines the process, making it easier for businesses to act on predictions in real time. Moreover, the rising adoption of cloud-based solutions has made these technologies more accessible to small and medium enterprises (SMEs), broadening the market’s reach.
The surge in demand for scalable, real-time analytics platforms is also contributing to market growth. Enterprises are increasingly seeking AI-powered solutions that can integrate seamlessly with their existing IT infrastructure, deliver instant insights, and scale as their data grows. The shift towards cloud deployment models has accelerated this trend, offering cost-effective, flexible, and easily deployable churn prediction solutions. Additionally, advancements in natural language processing (NLP), deep learning, and big data analytics are further enhancing the accuracy and reliability of churn prediction models. As organizations strive to stay ahead of the competition by minimizing customer attrition, the demand for sophisticated, AI-driven predictive analytics tools continues to rise.
Regionally, North America holds the largest market share, followed by Europe and Asia Pacific. The dominance of North America can be attributed to the early adoption of AI technologies, presence of major technology vendors, and a strong focus on customer-centric strategies among enterprises in the region. Europe is also witnessing significant growth, driven by stringent regulations around data protection and a growing emphasis on customer retention in industries like BFSI and retail. The Asia Pacific region is expected to exhibit the highest CAGR during the forecast period, fueled by rapid digitalization, increasing investments in AI, and the expansion of e-commerce and telecommunications sectors. Latin America and the Middle East & Africa are also experiencing gradual adoption, primarily in financial services and telecommunications.
The component segment of the AI-powered customer churn prediction market is categorized into software and services. The software segment dominates the market, accounting for the largest share in 2024, owing to the widespread deployment of advanced AI and machine learning platforms
According to our latest research, the AI-powered customer churn prediction market size reached USD 1.96 billion globally in 2024, with a robust CAGR of 18.3% projected through the forecast period. By 2033, the market is expected to hit USD 8.87 billion, driven by the increasing adoption of AI and machine learning solutions across multiple industries to proactively manage and reduce customer attrition. The rapid digital transformation and the growing emphasis on customer experience optimization have emerged as primary growth factors fueling the expansion of this dynamic market.
One of the core growth factors propelling the AI-powered customer churn prediction market is the exponential increase in customer data generation across industries. As businesses increasingly digitize their operations, vast amounts of customer interactions, behavioral data, and transactional records are being accumulated every day. AI-powered churn prediction tools leverage advanced analytics and machine learning algorithms to extract actionable insights from this data, allowing companies to identify at-risk customers with high accuracy. This enables organizations to implement timely retention strategies, reduce churn rates, and ultimately boost long-term profitability. The continuous evolution of AI algorithms, including deep learning and natural language processing, further enhances the predictive capabilities of these solutions, making them indispensable in highly competitive sectors such as telecommunications, BFSI, and retail.
Another significant driver is the escalating demand for personalized customer experiences. Modern consumers expect brands to anticipate their needs and deliver tailored interactions across all touchpoints. AI-powered customer churn prediction systems empower businesses to segment their customer base, understand individual preferences, and proactively address potential pain points. This targeted approach not only improves customer satisfaction but also increases the effectiveness of marketing campaigns and retention efforts. Moreover, the integration of AI with CRM platforms and omnichannel engagement tools has streamlined the deployment of churn prediction models, making them accessible even to small and medium-sized enterprises. The ability to automate and scale these insights across large customer populations is a critical factor stimulating market growth.
The rising cost of customer acquisition compared to retention is also amplifying the importance of AI-powered churn prediction solutions. As competition intensifies and customer loyalty becomes harder to secure, organizations are prioritizing strategies that maximize the lifetime value of existing clients. AI-driven churn analytics provide a cost-effective means to identify early warning signals and intervene before customers decide to leave. This not only reduces the financial impact of churn but also enhances brand reputation and customer advocacy. The scalability, real-time processing, and predictive accuracy offered by AI solutions are attracting investments from both established enterprises and emerging startups, further accelerating market expansion.
Regionally, North America continues to dominate the AI-powered customer churn prediction market, accounting for the largest revenue share in 2024. The region’s advanced technological infrastructure, high digital adoption rates, and concentration of leading AI vendors are key contributors to its leadership position. However, the Asia Pacific region is poised for the fastest growth, fueled by the rapid digitization of economies, increasing mobile and internet penetration, and rising investments in AI and analytics by enterprises. Europe also presents significant opportunities, particularly in sectors like BFSI and retail, where regulatory pressures and customer-centricity are driving early adoption of churn prediction tools. The market landscape in Latin America and the Middle East & Africa is evolving, with organizations gradually recognizing the value of proactive churn management in enhancing competitiveness and customer loyalty.
MIT License: https://opensource.org/licenses/MIT
This dataset belongs to a leading online E-commerce company. The company wants to identify customers who are likely to churn, so they can proactively approach these customers with promotional offers.
The dataset contains various features related to customer behavior and characteristics, which can be used to predict customer churn.
The main task is to predict customer churn based on the given features. This is a binary classification problem where the target variable is 'Churn'.
This dataset is provided for educational purposes. While it represents a real-world scenario, the data itself may be simulated or anonymized.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
Data | Variable | Description
E Comm | CustomerID | Unique customer ID
E Comm | Churn | Churn flag
E Comm | Tenure | Tenure of customer in organization
E Comm | PreferredLoginDevice | Preferred login device of customer
E Comm | CityTier | City tier
E Comm | WarehouseToHome | Distance between warehouse and home of customer
E Comm | PreferredPaymentMode | Preferred payment method of customer
E Comm | Gender | Gender of customer
E Comm | HourSpendOnApp | Number of hours spent on mobile application or website
E Comm | NumberOfDeviceRegistered | Total number of devices registered for the customer
E Comm | PreferedOrderCat | Preferred order category of customer in last month
E Comm | SatisfactionScore | Satisfaction score of customer on service
E Comm | MaritalStatus | Marital status of customer
E Comm | NumberOfAddress | Total number of addresses added for the customer
E Comm | Complain | Whether any complaint has been raised in last month
E Comm | OrderAmountHikeFromlastYear | Percentage increase in orders from last year
E Comm | CouponUsed | Total number of coupons used in last month
E Comm | OrderCount | Total number of orders placed in last month
E Comm | DaySinceLastOrder | Days since last order by customer
E Comm | CashbackAmount | Average cashback in last month
https://www.marketresearchintellect.com/privacy-policy
Dive into Market Research Intellect's Customer Churn Analysis Software Market Report, valued at USD 2.1 billion in 2024, and forecast to reach USD 4.8 billion by 2033, growing at a CAGR of 10.2% from 2026 to 2033.
Churn prediction (rawdata)
https://www.marketresearchforecast.com/privacy-policy
The Customer Churn Software market is experiencing robust growth, driven by the increasing need for businesses across diverse sectors to improve customer retention and enhance profitability. The market's expansion is fueled by several key factors. Firstly, the rising adoption of cloud-based solutions offers scalability and cost-effectiveness, attracting a wider range of businesses. Secondly, advancements in AI and machine learning are enabling more sophisticated churn prediction and proactive customer engagement strategies. The telecommunications, banking and finance, and retail and e-commerce sectors are currently leading the adoption, leveraging the software to identify at-risk customers and implement targeted retention programs. However, factors such as high implementation costs, integration challenges with existing systems, and the need for skilled personnel to manage the software can act as restraints on market growth. We project a substantial market expansion in the coming years, with a steady compound annual growth rate (CAGR) contributing to a significant increase in market value. The competitive landscape is dynamic, with established players like IBM, Salesforce, and Microsoft competing alongside specialized churn management solution providers. This competition fosters innovation and drives the development of more advanced features and functionalities. Looking ahead, the market will witness further consolidation through mergers and acquisitions, as larger companies seek to expand their market share. The increasing emphasis on data privacy and security regulations will also shape market dynamics, with vendors focusing on compliant solutions. The market is expected to witness the rise of niche solutions tailored to specific industry segments, providing customized functionalities. 
The geographic distribution of the market is expected to remain concentrated in North America and Europe initially, with significant growth potential in emerging markets like Asia Pacific and the Middle East & Africa, fueled by increasing digitalization and adoption of sophisticated business analytics. The continued evolution of AI and machine learning algorithms will be crucial in improving the accuracy and efficiency of churn prediction models, further enhancing the value proposition of Customer Churn Software. This convergence of technological advancement, regulatory compliance, and industry-specific needs will shape the future trajectory of the Customer Churn Software market.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Comparison of GA-XGBoost with XGBoost and LightGBM test results.