100+ datasets found
  1. Customer Churn - Decision Tree & Random Forest

    • kaggle.com
    Updated Jul 6, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    vikram amin (2023). Customer Churn - Decision Tree & Random Forest [Dataset]. https://www.kaggle.com/datasets/vikramamin/customer-churn-decision-tree-and-random-forest
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 6, 2023
    Dataset provided by
    Kaggle
    Authors
    vikram amin
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description
    • Main objective: Find out customers who will churn and who will not.
    • Methodology: It is a classification problem. We will use decision tree and random forest to predict the outcome.
    • Steps Involved
    • Read the data
    • Check for data types https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F1ffb600d8a4b4b36bc25e957524a3524%2FPicture1.png?generation=1688638600831386&alt=media" alt="">
    1. Change character vector to factor vector as this is as classification problem
    2. Drop the variable which is not significant for the analysis. We drop "customerID".
    3. Check for missing values. None are found.
    4. Split the data into train and test so we can use the train data for building the model and use test data for prediction. We split this into 80-20 ratio (train/test) using the sample function.
    5. Install and run libraries (rpart, rpart.plot, rattle, RColorBrewer, caret)
    6. Run decision tree using rpart function. The dependent variable is Churn and 19 other independent variables

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F8d3442e6c82d8026c6a448e4780ab38c%2FPicture2.png?generation=1688638685268853&alt=media" alt=""> 9. Plot the decision tree

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F9ab0591e323dc30fe116c79f6d014d06%2FPicture3.png?generation=1688638747644320&alt=media" alt="">

    Average customer churn is 27%. The churn can take place if the tenure is more than >=7.5 and there is no internet service

    1. Tuning the model
    2. Define the search grid using the expand.grid function
    3. Set up the control parameters through 5 fold cross validation
    4. When we print the model we get the best CP = 0.01 and an accuracy of 79.00%

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F16080ac04d3743ec238227e1ef2c8269%2FPicture4.png?generation=1688639197455166&alt=media" alt="">

    1. Predict the model
    2. Find out the variables which are most and least significant. https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F61beb4224e9351cfc772147c43800502%2FPicture5.png?generation=1688639468638950&alt=media" alt="">

    Significant variables are Internet Service, Tenure and the least significant are Streaming Movies, Tech Support.

    USE RANDOM FOREST

    1. Run library(randomForest). Here we are using the default ntree (500) and mtry (p/3) where p is the number of independent variables. https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2Fc27fe7e83f0b53b7e067371b69c7f4a7%2FPicture6.png?generation=1688640478682685&alt=media" alt="">

      Through confusion matrix, accuracy is coming 79.27%. The accuracy is marginally higher than that of decision tree i.e 79.00%. The error rate is pretty low when predicting "No" and much higher when predicting "Yes".

    2. Plot the model showing which variables reduce the gini impunity the most and least. Total charges and tenure reduce the gini impunity the most while phone service has the least impact.

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2Fec25fc3ba74ab9cef1a81188209512b1%2FPicture7.png?generation=1688640726235724&alt=media" alt="">

    1. Predict the model and create a new data frame showing the actuals vs predicted values

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F50aa40e5dd676c8285020fd2fe627bf1%2FPicture8.png?generation=1688640896763066&alt=media" alt="">

    1. Plot the model so as to find out where the OOB (out of bag ) error stops decreasing or becoming constant. As we can see that the error stops decreasing between 100 to 200 trees. So we decide to take ntree = 200 when we tune the model.

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F87211e1b218c595911fbe6ea2806e27a%2FPicture9.png?generation=1688641103367564&alt=media" alt="">

    Tune the model mtry=2 has the lowest OOB error rate

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F6057af5bb0719b16f1a97a58c3d4aa1d%2FPicture10.png?generation=1688641391027971&alt=media" alt="">

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2Fc7045eba4ee298c58f1bd0230c24c00d%2FPicture11.png?generation=1688641605829830&alt=media" alt="">

    Use random forest with mtry = 2 and ntree = 200

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F01541eff1f9c6303591aa50dd707b5f5%2FPicture12.png?generation=1688641634979403&alt=media" alt="">

    Through confusion matrix, accuracy is coming 79.71%. The accuracy is marginally higher than that of default (when ntree was 500 and mtry was 4) i.e 79.27% and of decision tree i.e 79.00%. The error rate is pretty low when predicting "No" and m...

  2. c

    Data from: Customer Churn Dataset

    • cubig.ai
    Updated May 25, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    CUBIG (2025). Customer Churn Dataset [Dataset]. https://cubig.ai/store/products/256/customer-churn-dataset
    Explore at:
    Dataset updated
    May 25, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-servicehttps://cubig.ai/store/terms-of-service

    Measurement technique
    Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
    Description

    1) Data Introduction • The Customer Churn Dataset is a dataset that collects various customer characteristics and service usage information to predict whether or not communication service customers will turn.

    2) Data Utilization (1) Customer Churn Dataset has characteristics that: • The dataset consists of several categorical and numerical variables, including customer demographics, service types, contract information, charges, usage patterns, and Turn. (2) Customer Churn Dataset can be used to: • Development of customer churn prediction model : Machine learning and deep learning techniques can be used to develop classification models that predict churn based on customer characteristics and service use data. • Segmenting customers and developing marketing strategies : It can be used to analyze customer groups at high risk of departure and to design custom retention strategies or targeted marketing campaigns.

  3. Bank Customer Churn Dataset

    • kaggle.com
    Updated Jul 11, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bhuvi Ranga (2023). Bank Customer Churn Dataset [Dataset]. https://www.kaggle.com/datasets/bhuviranga/customer-churn-data
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 11, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Bhuvi Ranga
    License

    http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    The customer churn dataset is a collection of customer data that focuses on predicting customer churn, which refers to the tendency of customers to stop using a company's products or services. The dataset contains various features that describe each customer, such as their credit score, country, gender, age, tenure, balance, number of products, credit card status, active membership, estimated salary, and churn status. The churn status indicates whether a customer has churned or not. The dataset is used to analyze and understand factors that contribute to customer churn and to build predictive models to identify customers at risk of churning. The goal is to develop strategies and interventions to reduce churn and improve customer retention

  4. Synthetic Customer Churn Prediction Dataset

    • opendatabay.com
    .undefined
    Updated May 6, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Opendatabay Labs (2025). Synthetic Customer Churn Prediction Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/5d7ef013-5848-4367-bf3b-2ce359587b43
    Explore at:
    .undefinedAvailable download formats
    Dataset updated
    May 6, 2025
    Dataset provided by
    Buy & Sell Data | Opendatabay - AI & Synthetic Data Marketplace
    Authors
    Opendatabay Labs
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Retail & Consumer Behavior
    Description

    This Synthetic Customer Churn Prediction Dataset has been designed as an educational resource for exploring data science, machine learning, and predictive modelling techniques in a customer retention context. The dataset simulates key attributes relevant to customer churn analysis, such as service usage, contract details, and customer demographics. It allows users to practice data manipulation, visualization, and the development of models to predict churn behaviour in industries like telecommunications, subscription services, or utilities.

    Dataset Features:

    • Customer_Id: Unique identifier for each customer (not included in this dataset for privacy).
    • Gender: Gender of the customer (e.g., "Male," "Female").
    • Partner: Whether the customer has a partner (e.g., "Yes," "No").
    • Dependents: Whether the customer has dependents (e.g., "Yes," "No").
    • Tenure (Months): The number of months the customer has been with the company.
    • PhoneService: Whether the customer has a phone service (e.g., "Yes," "No").
    • MultipleLines: Whether the customer has multiple phone lines (e.g., "Yes," "No phone service").
    • InternetService: Type of internet service (e.g., "DSL," "Fiber optic," "No").
    • OnlineSecurity: Whether the customer has online security services (e.g., "Yes," "No," "No internet service").
    • OnlineBackup: Whether the customer has online backup services (e.g., "Yes," "No," "No internet service").
    • DeviceProtection: Whether the customer has device protection services (e.g., "Yes," "No," "No internet service").
    • TechSupport: Whether the customer has tech support services (e.g., "Yes," "No," "No internet service").
    • StreamingTV: Whether the customer has streaming TV services (e.g., "Yes," "No," "No internet service").
    • StreamingMovies: Whether the customer has streaming movies services (e.g., "Yes," "No," "No internet service").
    • Contract: Type of contract the customer has (e.g., "Month-to-month," "One year," "Two year").
    • PaperlessBilling: Whether the customer uses paperless billing (e.g., "Yes," "No").
    • PaymentMethod: The payment method used by the customer (e.g., "Electronic check," "Credit card," "Bank transfer").
    • MonthlyCharges: Monthly charges billed to the customer.
    • TotalCharges: Total charges incurred by the customer over their tenure.
    • Churn: Whether the customer has churned (e.g., "Yes," "No").

    Distribution:

    https://storage.googleapis.com/opendatabay_public/images/churn_c4aae9d4-3939-4866-a249-35d81c5965dc.png" alt="Synthetic Customer Churn Prediction Dataset Distribution">

    Usage:

    This dataset is useful for a variety of applications, including:

    • Customer Behavior Analysis: To understand factors influencing customer retention and churn.
    • Educational Training: To practice data cleaning, feature engineering, and visualization techniques in customer analytics.
    • Predictive Modeling: To build machine learning models for predicting customer churn based on service usage patterns and demographic information.

    Coverage:

    This dataset is synthetic and anonymized, making it a safe tool for experimentation and learning without compromising real patient privacy.

    License:

    CCO (Public Domain)

    Who can use it:

    • Data scientists and enthusiasts: For developing customer analytics skills and predictive modelling expertise.
    • Business analysts: To understand customer churn drivers and improve retention strategies.
    • Educators and students: For teaching and learning applications in data science and machine learning.
  5. f

    Data from: A Proposed Churn Prediction Model

    • figshare.com
    pdf
    Updated Feb 24, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mona Nasr; Essam Shaaban; Yehia Helmy; Dr. Ayman Khedr (2019). A Proposed Churn Prediction Model [Dataset]. http://doi.org/10.6084/m9.figshare.7763183.v2
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Feb 24, 2019
    Dataset provided by
    figshare
    Authors
    Mona Nasr; Essam Shaaban; Yehia Helmy; Dr. Ayman Khedr
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Churn prediction aims to detect customers intended to leave a service provider. Retaining one customer costs an organization from 5 to 10 times than gaining a new one. Predictive models can provide correct identification of possible churners in the near future in order to provide a retention solution. This paper presents a new prediction model based on Data Mining (DM) techniques. The proposed model is composed of six steps which are; identify problem domain, data selection, investigate data set, classification, clustering and knowledge usage. A data set with 23 attributes and 5000 instances is used. 4000 instances used for training the model and 1000 instances used as a testing set. The predicted churners are clustered into 3 categories in case of using in a retention strategy. The data mining techniques used in this paper are Decision Tree, Support Vector Machine and Neural Network throughout an open source software name WEKA.

  6. f

    S1 Data -

    • plos.figshare.com
    zip
    Updated Oct 11, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yancong Zhou; Wenyue Chen; Xiaochen Sun; Dandan Yang (2023). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0292466.s001
    Explore at:
    zipAvailable download formats
    Dataset updated
    Oct 11, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Yancong Zhou; Wenyue Chen; Xiaochen Sun; Dandan Yang
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analyzing customers’ characteristics and giving the early warning of customer churn based on machine learning algorithms, can help enterprises provide targeted marketing strategies and personalized services, and save a lot of operating costs. Data cleaning, oversampling, data standardization and other preprocessing operations are done on 900,000 telecom customer personal characteristics and historical behavior data set based on Python language. Appropriate model parameters were selected to build BPNN (Back Propagation Neural Network). Random Forest (RF) and Adaboost, the two classic ensemble learning models were introduced, and the Adaboost dual-ensemble learning model with RF as the base learner was put forward. The four models and the other four classical machine learning models-decision tree, naive Bayes, K-Nearest Neighbor (KNN), Support Vector Machine (SVM) were utilized respectively to analyze the customer churn data. The results show that the four models have better performance in terms of recall rate, precision rate, F1 score and other indicators, and the RF-Adaboost dual-ensemble model has the best performance. Among them, the recall rates of BPNN, RF, Adaboost and RF-Adaboost dual-ensemble model on positive samples are respectively 79%, 90%, 89%,93%, the precision rates are 97%, 99%, 98%, 99%, and the F1 scores are 87%, 95%, 94%, 96%. The RF-Adaboost dual-ensemble model has the best performance, and the three indicators are 10%, 1%, and 6% higher than the reference. The prediction results of customer churn provide strong data support for telecom companies to adopt appropriate retention strategies for pre-churn customers and reduce customer churn.

  7. c

    Data from: Telco Customer Churn Dataset

    • cubig.ai
    Updated May 28, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    CUBIG (2025). Telco Customer Churn Dataset [Dataset]. https://cubig.ai/store/products/312/telco-customer-churn-dataset
    Explore at:
    Dataset updated
    May 28, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-servicehttps://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction • The Telco Customer Churn Dataset includes carrier customer service usage, account information, demographics and churn, which can be used to predict and analyze customer churn.

    2) Data Utilization (1) Telco Customer Churn Dataset has characteristics that: • This dataset includes a variety of customer and service characteristics, including gender, age group, partner and dependents, service subscription status (telephone, Internet, security, backup, device protection, technical support, streaming, etc.), contract type, payment method, monthly fee, total fee, and departure. (2) Telco Customer Churn Dataset can be used to: • Development of customer churn prediction model: Using customer service usage patterns and account information, we can build a machine learning-based churn prediction model to proactively identify customers at risk of churn.

  8. Synthetic Telecom Customer Churn Data

    • kaggle.com
    Updated May 27, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Abdulrahman Qaten (2025). Synthetic Telecom Customer Churn Data [Dataset]. https://www.kaggle.com/datasets/abdulrahmanqaten/synthetic-customer-churn/suggestions
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 27, 2025
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Abdulrahman Qaten
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    If you found the dataset useful, your upvote will help others discover it. Thanks for your support!

    This dataset simulates customer behavior for a fictional telecommunications company. It contains demographic information, account details, services subscribed to, and whether the customer ultimately churned (stopped using the service) or not. The data is synthetically generated but designed to reflect realistic patterns often found in telecom churn scenarios.

    Purpose:

    The primary goal of this dataset is to provide a clean and straightforward resource for beginners learning about:

    • Exploratory Data Analysis (EDA): Understanding customer characteristics and identifying potential drivers of churn through visualization and statistical summaries.
    • Data Preprocessing: Handling categorical features (like converting text to numbers) and scaling numerical features.
    • Classification Modeling: Building and evaluating simple machine learning models (like Logistic Regression or Decision Trees) to predict customer churn.

    Features:

    The dataset includes the following columns:

    • CustomerID: Unique identifier for each customer.
    • Age: Customer's age in years.
    • Gender: Customer's gender (Male/Female).
    • Location: General location of the customer (e.g., New York, Los Angeles).
    • SubscriptionDurationMonths: How many months the customer has been subscribed.
    • MonthlyCharges: The amount the customer is charged each month.
    • TotalCharges: The total amount the customer has been charged over their subscription period.
    • ContractType: The type of contract the customer has (Month-to-month, One year, Two year).
    • PaymentMethod: How the customer pays their bill (e.g., Electronic check, Credit card).
    • OnlineSecurity: Whether the customer has online security service (Yes, No, No internet service).
    • TechSupport: Whether the customer has tech support service (Yes, No, No internet service).
    • StreamingTV: Whether the customer has TV streaming service (Yes, No, No internet service).
    • StreamingMovies: Whether the customer has movie streaming service (Yes, No, No internet service).
    • Churn: (Target Variable) Whether the customer churned (1 = Yes, 0 = No).

    Data Quality:

    This dataset is intentionally clean with no missing values, making it easy for beginners to focus on analysis and modeling concepts without complex data cleaning steps.

    Inspiration:

    Understanding customer churn is crucial for many businesses. This dataset provides a sandbox environment to practice the fundamental techniques used in churn analysis and prediction.

  9. Bank customer churn model prediction

    • kaggle.com
    Updated Jul 13, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    T S S ABHI RAM KOTIPALLI (2023). Bank customer churn model prediction [Dataset]. https://www.kaggle.com/datasets/tssabhiramkotipalli/bank-customer-churn-model-prediction/code
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    T S S ABHI RAM KOTIPALLI
    Description

    Dataset

    This dataset was created by T S S ABHI RAM KOTIPALLI

    Contents

  10. Data from: Telco Customer Churn

    • kaggle.com
    Updated Jul 22, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sibelius_5 (2021). Telco Customer Churn [Dataset]. https://www.kaggle.com/sibelius5/telco-customer-churn/discussion
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 22, 2021
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Sibelius_5
    Description

    To increase the accuracy of the telco churn prediction models, I merged the following two datasets (using ID as index):

    https://www.kaggle.com/blastchar/telco-customer-churn https://www.kaggle.com/ylchang/telco-customer-churn-1113

    The additional features of the second dataset (like satisfaction score, total revenues and cltv) can increase the accuracy of the models.

    I cleaned the merged dataset and also added the clean version here ("Telco_customer_churn_cleaned.csv").

    Inspiration:

    The dataset can be used for EDA, classification, churn prediction, segmentation etc.

  11. AI-Powered Customer Churn Prediction Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Jun 28, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dataintelo (2025). AI-Powered Customer Churn Prediction Market Research Report 2033 [Dataset]. https://dataintelo.com/report/ai-powered-customer-churn-prediction-market
    Explore at:
    csv, pdf, pptxAvailable download formats
    Dataset updated
    Jun 28, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    AI-Powered Customer Churn Prediction Market Outlook




    According to our latest research, the AI-powered customer churn prediction market size reached USD 1.58 billion globally in 2024, with a robust CAGR of 19.7% expected from 2025 to 2033. Driven by rapid digital transformation and the increasing need for predictive analytics across sectors, the market is forecasted to attain a value of USD 7.57 billion by 2033. The growth of this market is primarily attributed to the escalating adoption of AI and machine learning technologies by enterprises seeking to reduce customer attrition, optimize retention strategies, and enhance overall customer lifetime value, as per the latest industry research.




    One of the fundamental growth drivers for the AI-powered customer churn prediction market is the proliferation of customer data and the imperative need for businesses to leverage this data to drive actionable insights. With the advent of digital touchpoints, organizations are now able to collect vast amounts of structured and unstructured data from various customer interactions. This data, when processed using advanced AI and machine learning algorithms, empowers companies to predict potential churn with high accuracy. As a result, businesses across industries such as telecommunications, BFSI, retail, and healthcare are increasingly investing in AI-powered churn prediction solutions to proactively identify at-risk customers and implement targeted retention strategies, thereby reducing revenue loss and improving profitability.




    Another significant factor fueling market expansion is the growing emphasis on customer experience and personalization. In today's hyper-competitive landscape, retaining existing customers has become more cost-effective than acquiring new ones. AI-powered churn prediction tools enable organizations to segment their customer base, understand behavior patterns, and tailor interventions for individual customers. This level of personalization not only helps in reducing churn rates but also enhances customer satisfaction and loyalty. The integration of AI-driven insights into CRM systems and marketing automation platforms further streamlines the process, making it easier for businesses to act on predictions in real time. Moreover, the rising adoption of cloud-based solutions has made these technologies more accessible to small and medium enterprises (SMEs), broadening the market’s reach.




    The surge in demand for scalable, real-time analytics platforms is also contributing to market growth. Enterprises are increasingly seeking AI-powered solutions that can integrate seamlessly with their existing IT infrastructure, deliver instant insights, and scale as their data grows. The shift towards cloud deployment models has accelerated this trend, offering cost-effective, flexible, and easily deployable churn prediction solutions. Additionally, advancements in natural language processing (NLP), deep learning, and big data analytics are further enhancing the accuracy and reliability of churn prediction models. As organizations strive to stay ahead of the competition by minimizing customer attrition, the demand for sophisticated, AI-driven predictive analytics tools continues to rise.




    Regionally, North America holds the largest market share, followed by Europe and Asia Pacific. The dominance of North America can be attributed to the early adoption of AI technologies, presence of major technology vendors, and a strong focus on customer-centric strategies among enterprises in the region. Europe is also witnessing significant growth, driven by stringent regulations around data protection and a growing emphasis on customer retention in industries like BFSI and retail. The Asia Pacific region is expected to exhibit the highest CAGR during the forecast period, fueled by rapid digitalization, increasing investments in AI, and the expansion of e-commerce and telecommunications sectors. Latin America and the Middle East & Africa are also experiencing gradual adoption, primarily in financial services and telecommunications.



    Component Analysis




    The component segment of the AI-powered customer churn prediction market is categorized into software and services. The software segment dominates the market, accounting for the largest share in 2024, owing to the widespread deployment of advanced AI and machine learning platforms

  12. f

    Comparison of the proposed algorithms with other ensemble models.

    • plos.figshare.com
    xls
    Updated Jun 6, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Kaveh Faraji Googerdchi; Shahrokh Asadi; Seyed Mohammadbagher Jafari (2024). Comparison of the proposed algorithms with other ensemble models. [Dataset]. http://doi.org/10.1371/journal.pone.0303881.t009
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 6, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Kaveh Faraji Googerdchi; Shahrokh Asadi; Seyed Mohammadbagher Jafari
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of the proposed algorithms with other ensemble models.

  13. t

    Telco_Customer_churn_Data

    • test.researchdata.tuwien.at
    bin, csv, png
    Updated Apr 28, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Erum Naz; Erum Naz; Erum Naz; Erum Naz (2025). Telco_Customer_churn_Data [Dataset]. http://doi.org/10.82556/b0ch-cn44
    Explore at:
    png, csv, binAvailable download formats
    Dataset updated
    Apr 28, 2025
    Dataset provided by
    TU Wien
    Authors
    Erum Naz; Erum Naz; Erum Naz; Erum Naz
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 28, 2025
    Description

    Context and Methodology

    The dataset originates from the research domain of Customer Churn Prediction in the Telecom Industry. It was created as part of the project "Data-Driven Churn Prediction: ML Solutions for the Telecom Industry," completed within the Data Stewardship course (Master programme Data Science, TU Wien).

    The primary purpose of this dataset is to support machine learning model development for predicting customer churn based on customer demographics, service usage, and account information.
    The dataset enables the training, testing, and evaluation of classification algorithms, allowing researchers and practitioners to explore techniques for customer retention optimization.

    The dataset was originally obtained from the IBM Accelerator Catalog and adapted for academic use. It was uploaded to TU Wien’s DBRepo test system and accessed via SQLAlchemy connections to the MariaDB environment.

    Technical Details

    The dataset has a tabular structure and was initially stored in CSV format. It contains:

    • Rows: 7,043 customer records

    • Columns: 21 features including customer attributes (gender, senior citizen status, partner status), account information (tenure, contract type, payment method), service usage (internet service, streaming TV, tech support), and the target variable (Churn: Yes/No).

    Naming Convention:

    • The table in the database is named telco_customer_churn_data.

    Software Requirements:

    • To open and work with the dataset, any standard database client or programming language supporting MariaDB connections can be used (e.g., Python etc).

    • For machine learning applications, libraries such as pandas, scikit-learn, and joblib are typically used.

    Additional Resources:

    Further Details

    When reusing the dataset, users should be aware:

    • Licensing: The dataset is shared under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

    • Use Case Suitability: The dataset is best suited for classification tasks, particularly binary classification (churn vs. no churn).

    • Metadata Standards: Metadata describing the dataset adheres to FAIR principles and is supplemented by CodeMeta and Croissant standards for improved interoperability.

  14. AI-Powered Customer Churn Prediction Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Jun 28, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Growth Market Reports (2025). AI-Powered Customer Churn Prediction Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/ai-powered-customer-churn-prediction-market
    Explore at:
    csv, pdf, pptxAvailable download formats
    Dataset updated
    Jun 28, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    AI-Powered Customer Churn Prediction Market Outlook




    According to our latest research, the AI-powered customer churn prediction market size reached USD 1.96 billion globally in 2024, with a robust CAGR of 18.3% projected through the forecast period. By 2033, the market is expected to hit USD 8.87 billion, driven by the increasing adoption of AI and machine learning solutions across multiple industries to proactively manage and reduce customer attrition. The rapid digital transformation and the growing emphasis on customer experience optimization have emerged as primary growth factors fueling the expansion of this dynamic market.




    One of the core growth factors propelling the AI-powered customer churn prediction market is the exponential increase in customer data generation across industries. As businesses increasingly digitize their operations, vast amounts of customer interactions, behavioral data, and transactional records are being accumulated every day. AI-powered churn prediction tools leverage advanced analytics and machine learning algorithms to extract actionable insights from this data, allowing companies to identify at-risk customers with high accuracy. This enables organizations to implement timely retention strategies, reduce churn rates, and ultimately boost long-term profitability. The continuous evolution of AI algorithms, including deep learning and natural language processing, further enhances the predictive capabilities of these solutions, making them indispensable in highly competitive sectors such as telecommunications, BFSI, and retail.




    Another significant driver is the escalating demand for personalized customer experiences. Modern consumers expect brands to anticipate their needs and deliver tailored interactions across all touchpoints. AI-powered customer churn prediction systems empower businesses to segment their customer base, understand individual preferences, and proactively address potential pain points. This targeted approach not only improves customer satisfaction but also increases the effectiveness of marketing campaigns and retention efforts. Moreover, the integration of AI with CRM platforms and omnichannel engagement tools has streamlined the deployment of churn prediction models, making them accessible even to small and medium-sized enterprises. The ability to automate and scale these insights across large customer populations is a critical factor stimulating market growth.




    The rising cost of customer acquisition compared to retention is also amplifying the importance of AI-powered churn prediction solutions. As competition intensifies and customer loyalty becomes harder to secure, organizations are prioritizing strategies that maximize the lifetime value of existing clients. AI-driven churn analytics provide a cost-effective means to identify early warning signals and intervene before customers decide to leave. This not only reduces the financial impact of churn but also enhances brand reputation and customer advocacy. The scalability, real-time processing, and predictive accuracy offered by AI solutions are attracting investments from both established enterprises and emerging startups, further accelerating market expansion.




    Regionally, North America continues to dominate the AI-powered customer churn prediction market, accounting for the largest revenue share in 2024. The region’s advanced technological infrastructure, high digital adoption rates, and concentration of leading AI vendors are key contributors to its leadership position. However, the Asia Pacific region is poised for the fastest growth, fueled by the rapid digitization of economies, increasing mobile and internet penetration, and rising investments in AI and analytics by enterprises. Europe also presents significant opportunities, particularly in sectors like BFSI and retail, where regulatory pressures and customer-centricity are driving early adoption of churn prediction tools. The market landscape in Latin America and the Middle East & Africa is evolving, with organizations gradually recognizing the value of proactive churn management in enhancing competitiveness and customer loyalty.



    "https://growthmarketreports.com/request-sample/12113">
    <button class="btn btn-lg text-center"

  15. P

    Travel Dataset

    • paperswithcode.com
    Updated Dec 3, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). Travel Dataset [Dataset]. https://paperswithcode.com/dataset/travel
    Explore at:
    Dataset updated
    Dec 3, 2024
    Description

    A Tour & Travels Company Wants To Predict Whether A Customer Will Churn Or Not Based On Indicators Given Below. Help Build Predictive Models And Save The Company's Money. Perform Fascinating EDAs. The Data Was Used For Practice Purposes And Also During A Mini Hackathon, Its Completely Free To Use

  16. C

    Customer Churn Software Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Mar 25, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Market Research Forecast (2025). Customer Churn Software Report [Dataset]. https://www.marketresearchforecast.com/reports/customer-churn-software-56060
    Explore at:
    pdf, ppt, docAvailable download formats
    Dataset updated
    Mar 25, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policyhttps://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Customer Churn Software market is experiencing robust growth, driven by the increasing need for businesses across diverse sectors to improve customer retention and enhance profitability. The market's expansion is fueled by several key factors. Firstly, the rising adoption of cloud-based solutions offers scalability and cost-effectiveness, attracting a wider range of businesses. Secondly, advancements in AI and machine learning are enabling more sophisticated churn prediction and proactive customer engagement strategies. The telecommunications, banking and finance, and retail and e-commerce sectors are currently leading the adoption, leveraging the software to identify at-risk customers and implement targeted retention programs. However, factors such as high implementation costs, integration challenges with existing systems, and the need for skilled personnel to manage the software can act as restraints on market growth. We project a substantial market expansion in the coming years, with a steady compound annual growth rate (CAGR) contributing to a significant increase in market value. The competitive landscape is dynamic, with established players like IBM, Salesforce, and Microsoft competing alongside specialized churn management solution providers. This competition fosters innovation and drives the development of more advanced features and functionalities. Looking ahead, the market will witness further consolidation through mergers and acquisitions, as larger companies seek to expand their market share. The increasing emphasis on data privacy and security regulations will also shape market dynamics, with vendors focusing on compliant solutions. The market is expected to witness the rise of niche solutions tailored to specific industry segments, providing customized functionalities. The geographic distribution of the market is expected to remain concentrated in North America and Europe initially, with significant growth potential in emerging markets like Asia Pacific and the Middle East & Africa, fueled by increasing digitalization and adoption of sophisticated business analytics. The continued evolution of AI and machine learning algorithms will be crucial in improving the accuracy and efficiency of churn prediction models, further enhancing the value proposition of Customer Churn Software. This convergence of technological advancement, regulatory compliance, and industry-specific needs will shape the future trajectory of the Customer Churn Software market.

  17. A

    ‘Customer Churn’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Mar 5, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2018). ‘Customer Churn’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-customer-churn-4f0b/a31eb722/?iid=005-065&v=presentation
    Explore at:
    Dataset updated
    Mar 5, 2018
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Customer Churn’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/hassanamin/customer-churn on 14 February 2022.

    --- Dataset description provided by original source is as follows ---

    Binary Customer Churn

    A marketing agency has many customers that use their service to produce ads for the client/customer websites. They've noticed that they have quite a bit of churn in clients. They basically randomly assign account managers right now, but want you to create a machine learning model that will help predict which customers will churn (stop buying their service) so that they can correctly assign the customers most at risk to churn an account manager. Luckily they have some historical data, can you help them out? Create a classification algorithm that will help classify whether or not a customer churned. Then the company can test this against incoming data for future customers to predict which customers will churn and assign them an account manager.

    Content

    The data is saved as customer_churn.csv. Here are the fields and their definitions:

    Name : Name of the latest contact at Company

    Age: Customer Age

    Total_Purchase: Total Ads Purchased

    Account_Manager: Binary 0=No manager, 1= Account manager assigned

    Years: Totaly Years as a customer

    Num_sites: Number of websites that use the service.

    Onboard_date: Date that the name of the latest contact was onboarded

    Location: Client HQ Address

    Company: Name of Client Company

    Once you've created the model and evaluated it, test out the model on some new data (you can think of this almost like a hold-out set) that your client has provided, saved under new_customers.csv. The client wants to know which customers are most likely to churn given this data (they don't have the label yet).

    Acknowledgements

    We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

    --- Original source retains full ownership of the source dataset ---

  18. f

    Details of feature variables of the data set.

    • plos.figshare.com
    xls
    Updated Dec 8, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ke Peng; Yan Peng; Wenguang Li (2023). Details of feature variables of the data set. [Dataset]. http://doi.org/10.1371/journal.pone.0289724.t002
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ke Peng; Yan Peng; Wenguang Li
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In recent years, with the continuous improvement of the financial system and the rapid development of the banking industry, the competition of the banking industry itself has intensified. At the same time, with the rapid development of information technology and Internet technology, customers’ choice of financial products is becoming more and more diversified, and customers’ dependence and loyalty to banking institutions is becoming less and less, and the problem of customer churn in commercial banks is becoming more and more prominent. How to predict customer behavior and retain existing customers has become a major challenge for banks to solve. Therefore, this study takes a bank’s business data on Kaggle platform as the research object, uses multiple sampling methods to compare the data for balancing, constructs a bank customer churn prediction model for churn identification by GA-XGBoost, and conducts interpretability analysis on the GA-XGBoost model to provide decision support and suggestions for the banking industry to prevent customer churn. The results show that: (1) The applied SMOTEENN is more effective than SMOTE and ADASYN in dealing with the imbalance of banking data. (2) The F1 and AUC values of the model improved and optimized by XGBoost using genetic algorithm can reach 90% and 99%, respectively, which are optimal compared to other six machine learning models. The GA-XGBoost classifier was identified as the best solution for the customer churn problem. (3) Using Shapley values, we explain how each feature affects the model results, and analyze the features that have a high impact on the model prediction, such as the total number of transactions in the past year, the amount of transactions in the past year, the number of products owned by customers, and the total sales balance. The contribution of this paper is mainly in two aspects: (1) this study can provide useful information from the black box model based on the accurate identification of churned customers, which can provide reference for commercial banks to improve their service quality and retain customers; (2) it can provide reference for customer churn early warning models of other related industries, which can help the banking industry to maintain customer stability, maintain market position and reduce corporate losses.

  19. Analyzing Customer Churn and Its Impact on Revenue

    • kaggle.com
    Updated Dec 25, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Devastator (2022). Analyzing Customer Churn and Its Impact on Revenue [Dataset]. https://www.kaggle.com/datasets/thedevastator/analyzing-customer-churn-and-its-impact-on-reven/versions/2
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 25, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Analyzing Customer Churn and Its Impact on Revenue

    Exploring Patterns and Trends

    By [source]

    About this dataset

    This dataset contains customer data from multiple sources that can be used to predict customer churn and analyze its effect on revenue. We'll use this data to gain insights into customer behavior, such as when customers are likely to churn, how their behavior affects revenue and what patterns of behavior can help us better understand customers. This dataset features several different attributes for each customer: their unique identifier, total charges paid over time, contract information and more. Additionally, we can use the predictive analytical models based on this data to identify at-risk customers that may be more likely to churn in the near future. By gaining deep insight into which customers are most likely to leave and why they are leaving, businesses will be better equipped with tools necessary for taking proactive measures against potential revenue losses due to customer churn

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    This dataset is an excellent tool for businesses to understand what factors are associated with customer churn and its impact on revenue. It can provide insights into which customers are most likely to leave, and how companies can prevent them from leaving.

    To use this dataset, here are the steps businesses can follow: 1. Understand each of the data points available in the dataset and what they represent - For example, CustomerID is a unique identifier for each customer, Churn indicates if a customer has left the company or not, gender denotes what gender the customer is etc. 2. Analyze any trends or patterns in your data – Look out for correlations between different variables like OnlineSecurity usage and Churn rate or MonthlyCharges and tenure to determine how these variables affect customers’ decisions to stay with a company or leave it etc. 3. Use machine learning models on your dataset – Utilize supervised learning algorithms such as logistic regression on this dataset to determine which variable most closely correlates with loyalty of customers i.e., which variable will decide whether a particular customer will stay with your company or not?
    4. Explore various ways of increasing retention rates – Think about ways you could incentivize customers who might be considering leaving their current provider (for example, offer discounts, free trials etc.). You could try different strategies like A/B testing too see which incentive works best for churn prevention/retention rate increase etc. 5.. Test out strategies before implementing them - Once you have decided on incentives that might work well, run small scale tests to check if they generate desired results before investing resources into full rollout programs .The systems based on machine learning algorithms allows you to quickly test assumptions efficiently without large investments in time & money prior committing these changes fully operational processes

    Research Ideas

    • Using customer data to identify and target customers who are at a high risk of churning to counter this effect with relevant customer service initiatives.
    • Analyzing the effects of promotional campaigns and loyalty programs on customer retention rates and overall revenue.
    • Machine learning models that predict future chances of customer churn which can be used by businesses to improve strategies for better retention & profitability

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: dataset1.csv | Column name | Description | |:---------------------|:-----------------------------------------------------------------| | CustomerID | Unique identifier for each customer. (Integer) | | Churn | Whether or not the customer has churned. (Boolean) | | gender | Gender of the customer. (String) | | SeniorCitizen | Whether or not the customer is a senior citizen. (Boolean) ...

  20. i

    Data from: Customer Churn Dataset

    • ieee-dataport.org
    Updated Jun 4, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Usman JOY (2024). Customer Churn Dataset [Dataset]. https://ieee-dataport.org/documents/customer-churn-dataset
    Explore at:
    Dataset updated
    Jun 4, 2024
    Authors
    Usman JOY
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    259

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
vikram amin (2023). Customer Churn - Decision Tree & Random Forest [Dataset]. https://www.kaggle.com/datasets/vikramamin/customer-churn-decision-tree-and-random-forest
Organization logo

Customer Churn - Decision Tree & Random Forest

Predicting the Customer Churn for a Telecom Company

Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 6, 2023
Dataset provided by
Kaggle
Authors
vikram amin
License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Description
  • Main objective: Find out customers who will churn and who will not.
  • Methodology: It is a classification problem. We will use decision tree and random forest to predict the outcome.
  • Steps Involved
  • Read the data
  • Check for data types https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F1ffb600d8a4b4b36bc25e957524a3524%2FPicture1.png?generation=1688638600831386&alt=media" alt="">
  1. Change character vector to factor vector as this is as classification problem
  2. Drop the variable which is not significant for the analysis. We drop "customerID".
  3. Check for missing values. None are found.
  4. Split the data into train and test so we can use the train data for building the model and use test data for prediction. We split this into 80-20 ratio (train/test) using the sample function.
  5. Install and run libraries (rpart, rpart.plot, rattle, RColorBrewer, caret)
  6. Run decision tree using rpart function. The dependent variable is Churn and 19 other independent variables

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F8d3442e6c82d8026c6a448e4780ab38c%2FPicture2.png?generation=1688638685268853&alt=media" alt=""> 9. Plot the decision tree

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F9ab0591e323dc30fe116c79f6d014d06%2FPicture3.png?generation=1688638747644320&alt=media" alt="">

Average customer churn is 27%. The churn can take place if the tenure is more than >=7.5 and there is no internet service

  1. Tuning the model
  2. Define the search grid using the expand.grid function
  3. Set up the control parameters through 5 fold cross validation
  4. When we print the model we get the best CP = 0.01 and an accuracy of 79.00%

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F16080ac04d3743ec238227e1ef2c8269%2FPicture4.png?generation=1688639197455166&alt=media" alt="">

  1. Predict the model
  2. Find out the variables which are most and least significant. https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F61beb4224e9351cfc772147c43800502%2FPicture5.png?generation=1688639468638950&alt=media" alt="">

Significant variables are Internet Service, Tenure and the least significant are Streaming Movies, Tech Support.

USE RANDOM FOREST

  1. Run library(randomForest). Here we are using the default ntree (500) and mtry (p/3) where p is the number of independent variables. https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2Fc27fe7e83f0b53b7e067371b69c7f4a7%2FPicture6.png?generation=1688640478682685&alt=media" alt="">

    Through confusion matrix, accuracy is coming 79.27%. The accuracy is marginally higher than that of decision tree i.e 79.00%. The error rate is pretty low when predicting "No" and much higher when predicting "Yes".

  2. Plot the model showing which variables reduce the gini impunity the most and least. Total charges and tenure reduce the gini impunity the most while phone service has the least impact.

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2Fec25fc3ba74ab9cef1a81188209512b1%2FPicture7.png?generation=1688640726235724&alt=media" alt="">

  1. Predict the model and create a new data frame showing the actuals vs predicted values

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F50aa40e5dd676c8285020fd2fe627bf1%2FPicture8.png?generation=1688640896763066&alt=media" alt="">

  1. Plot the model so as to find out where the OOB (out of bag ) error stops decreasing or becoming constant. As we can see that the error stops decreasing between 100 to 200 trees. So we decide to take ntree = 200 when we tune the model.

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F87211e1b218c595911fbe6ea2806e27a%2FPicture9.png?generation=1688641103367564&alt=media" alt="">

Tune the model mtry=2 has the lowest OOB error rate

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F6057af5bb0719b16f1a97a58c3d4aa1d%2FPicture10.png?generation=1688641391027971&alt=media" alt="">

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2Fc7045eba4ee298c58f1bd0230c24c00d%2FPicture11.png?generation=1688641605829830&alt=media" alt="">

Use random forest with mtry = 2 and ntree = 200

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10868729%2F01541eff1f9c6303591aa50dd707b5f5%2FPicture12.png?generation=1688641634979403&alt=media" alt="">

Through confusion matrix, accuracy is coming 79.71%. The accuracy is marginally higher than that of default (when ntree was 500 and mtry was 4) i.e 79.27% and of decision tree i.e 79.00%. The error rate is pretty low when predicting "No" and m...

Search
Clear search
Close search
Google apps
Main menu