14 datasets found
  1. Telco Customer Churn

    • kaggle.com
    zip
    Updated Feb 23, 2018
    Cite
    BlastChar (2018). Telco Customer Churn [Dataset]. https://www.kaggle.com/datasets/blastchar/telco-customer-churn
    Explore at:
    zip(175758 bytes)Available download formats
    Dataset updated
    Feb 23, 2018
    Authors
    BlastChar
    Description

    Context

    "Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs." [IBM Sample Data Sets]

    Content

    Each row represents a customer, and each column contains one of the customer’s attributes, as described in the column metadata.

    The data set includes information about:

    • Customers who left within the last month – the column is called Churn
    • Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
    • Customer account information – how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges
    • Demographic info about customers – gender, age range, and if they have partners and dependents

    Inspiration

    To explore this type of model and learn more about the subject.

    New version from IBM: https://community.ibm.com/community/user/businessanalytics/blogs/steven-macko/2019/07/11/telco-customer-churn-1113

  2. Telecom Data Analysis(Customer Churn Analysis)

    • kaggle.com
    zip
    Updated Jun 25, 2022
    Cite
    Rajdeep Kaur Bajwa (2022). Telecom Data Analysis(Customer Churn Analysis) [Dataset]. https://www.kaggle.com/datasets/rajdeepkaurbajwa/telecom-data-analysis
    Explore at:
    zip(245420 bytes)Available download formats
    Dataset updated
    Jun 25, 2022
    Authors
    Rajdeep Kaur Bajwa
    Description

    Dataset

    This dataset was created by Rajdeep Kaur Bajwa

    Contents

  3. churn_modelling

    • kaggle.com
    zip
    Updated Jun 27, 2025
    Cite
    Manasvi Kirti (2025). churn_modelling [Dataset]. https://www.kaggle.com/datasets/manasvikirti/churn-modelling
    Explore at:
    zip(267787 bytes)Available download formats
    Dataset updated
    Jun 27, 2025
    Authors
    Manasvi Kirti
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset contains information on customer demographics, account details, and service usage patterns to analyze and predict customer churn. It is commonly used in churn modeling projects to develop machine learning models that classify whether a customer is likely to leave (churn) or stay. The dataset is suitable for tasks such as Exploratory Data Analysis (EDA), feature engineering, model training, and evaluation.

    Key Features May Include:

    CustomerID: Unique identifier for each customer

    Gender, Age: Demographic details

    Tenure: Number of months the customer has stayed

    Balance, EstimatedSalary: Financial features

    IsActiveMember, HasCrCard: Behavioral indicators

    Exited: Target variable indicating churn (1 = churned, 0 = retained)

  4. S1 Data -

    • plos.figshare.com
    zip
    Updated Oct 11, 2023
    Cite
    Yancong Zhou; Wenyue Chen; Xiaochen Sun; Dandan Yang (2023). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0292466.s001
    Explore at:
    zipAvailable download formats
    Dataset updated
    Oct 11, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Yancong Zhou; Wenyue Chen; Xiaochen Sun; Dandan Yang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analyzing customers’ characteristics and giving early warning of customer churn with machine learning algorithms can help enterprises provide targeted marketing strategies and personalized services, and save substantial operating costs. Data cleaning, oversampling, data standardization, and other preprocessing operations were performed in Python on a data set of 900,000 telecom customer personal characteristics and historical behavior records. Appropriate model parameters were selected to build a BPNN (Back Propagation Neural Network). Random Forest (RF) and Adaboost, two classic ensemble learning models, were introduced, and an Adaboost dual-ensemble learning model with RF as the base learner was proposed. These four models and four other classical machine learning models (decision tree, naive Bayes, K-Nearest Neighbor (KNN), and Support Vector Machine (SVM)) were each used to analyze the customer churn data. The results show that the four models perform better in terms of recall, precision, F1 score, and other indicators, and that the RF-Adaboost dual-ensemble model performs best. The recall rates of BPNN, RF, Adaboost, and the RF-Adaboost dual-ensemble model on positive samples are 79%, 90%, 89%, and 93%, respectively; the precision rates are 97%, 99%, 98%, and 99%; and the F1 scores are 87%, 95%, 94%, and 96%. For the RF-Adaboost dual-ensemble model, the three indicators are 10%, 1%, and 6% higher than the reference. The prediction results provide strong data support for telecom companies to adopt appropriate retention strategies for pre-churn customers and reduce customer churn.
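The reported F1 scores can be sanity-checked against the reported precision and recall, since F1 is their harmonic mean. The sketch below is an illustration, not the authors' code: the function name `f1_score` and the dictionary layout are assumptions, and the inputs are the rounded percentages quoted in the description.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) on positive samples, as rounded in the description.
reported = {
    "BPNN": (0.97, 0.79),
    "RF": (0.99, 0.90),
    "Adaboost": (0.98, 0.89),
    "RF-Adaboost": (0.99, 0.93),
}

recomputed = {name: f1_score(p, r) for name, (p, r) in reported.items()}
for name, f1 in recomputed.items():
    print(f"{name}: F1 = {f1:.3f}")
```

Because the inputs are already rounded, the recomputed values can differ from the published F1 figures by up to roughly one percentage point.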

  5. Customer Churn Prediction

    • kaggle.com
    zip
    Updated May 29, 2024
    Cite
    Rashad Mammadov (2024). Customer Churn Prediction [Dataset]. https://www.kaggle.com/datasets/rashadrmammadov/customer-churn-dataset/code
    Explore at:
    zip(121952 bytes)Available download formats
    Dataset updated
    May 29, 2024
    Authors
    Rashad Mammadov
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    File Description:

    The dataset contains information about customers and their churn status. Each row represents a customer, and each column contains customer attributes and information.

    Column Descriptions:

    • customerID: Unique identifier for each customer.
    • gender: Gender of the customer (Male, Female).
    • SeniorCitizen: Whether the customer is a senior citizen or not (1: Yes, 0: No).
    • Partner: Whether the customer has a partner or not (Yes, No).
    • Dependents: Whether the customer has dependents or not (Yes, No).
    • tenure: Number of months the customer has stayed with the company.
    • PhoneService: Whether the customer has a phone service or not (Yes, No).
    • MultipleLines: Whether the customer has multiple lines or not (Yes, No, No phone service).
    • InternetService: Type of internet service the customer has (DSL, Fiber optic, No).
    • OnlineSecurity: Whether the customer has online security or not (Yes, No, No internet service).
    • OnlineBackup: Whether the customer has online backup or not (Yes, No, No internet service).
    • DeviceProtection: Whether the customer has device protection or not (Yes, No, No internet service).
    • TechSupport: Whether the customer has tech support or not (Yes, No, No internet service).
    • StreamingTV: Whether the customer has streaming TV or not (Yes, No, No internet service).
    • StreamingMovies: Whether the customer has streaming movies or not (Yes, No, No internet service).
    • Contract: The contract term of the customer (Month-to-month, One year, Two year).
    • PaperlessBilling: Whether the customer has paperless billing or not (Yes, No).
    • PaymentMethod: The payment method of the customer (Electronic check, Mailed check, Bank transfer, Credit card).
    • MonthlyCharges: The amount charged to the customer monthly.
    • TotalCharges: The total amount charged to the customer.
    • Churn: Whether the customer churned or not (Yes, No).
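A minimal sketch of how columns like these might be prepared for modeling, using only the Python standard library. The two sample rows are invented for illustration, and the helper name `encode_row` is an assumption, not part of the dataset.

```python
import csv
import io

# Two invented rows using a subset of the columns described above.
SAMPLE = """customerID,SeniorCitizen,tenure,Contract,MonthlyCharges,Churn
0001-AAAA,0,12,Month-to-month,70.35,Yes
0002-BBBB,1,48,Two year,99.10,No
"""

CONTRACTS = ["Month-to-month", "One year", "Two year"]

def encode_row(row: dict) -> dict:
    """Turn one CSV row into numeric features plus a 0/1 churn label."""
    features = {
        "senior": int(row["SeniorCitizen"]),
        "tenure": int(row["tenure"]),
        "monthly_charges": float(row["MonthlyCharges"]),
    }
    # One-hot encode the contract term.
    for name in CONTRACTS:
        features[f"contract_{name}"] = int(row["Contract"] == name)
    features["churn"] = 1 if row["Churn"] == "Yes" else 0
    return features

rows = [encode_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
churn_rate = sum(r["churn"] for r in rows) / len(rows)
print(rows[0])
print(f"churn rate: {churn_rate:.2f}")
```

The same pattern extends to the other Yes/No columns; three-valued columns such as MultipleLines would need one indicator per value.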
  6. Bank Customer Churn

    • kaggle.com
    zip
    Updated Aug 8, 2024
    Cite
    Sandile Desmond Mfazi (2024). Bank Customer Churn [Dataset]. https://www.kaggle.com/datasets/sandiledesmondmfazi/bank-customer-churn
    Explore at:
    zip(12679114 bytes)Available download formats
    Dataset updated
    Aug 8, 2024
    Authors
    Sandile Desmond Mfazi
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Botswana Bank Customer Churn Dataset

    Dataset Overview

    This synthetic dataset simulates customer data for a fictional bank in Botswana, specifically designed to model customer churn behavior. It includes a comprehensive set of customer demographics, financial data, product usage, and behavioral indicators that could influence whether a customer decides to leave the bank. The dataset is generated using the Python Faker library, ensuring realistic but entirely fictional data points for educational, testing, and modeling purposes.

    Dataset Highlights

    • Number of Records: 115,640 customers
    • Churn Rate: Determined by a calculated churn risk score based on several customer attributes
    • Geographical Focus: Botswana
    • Data Structure: The dataset is organized in a tabular format, with each row representing a unique customer

    Use Cases

    This dataset is ideal for the following applications:

    • Churn Prediction Modeling: Building and evaluating machine learning models to predict customer churn.
    • Customer Segmentation: Analyzing customer profiles and segmenting them based on various demographics and financial attributes.
    • Product Analysis: Understanding which products are most associated with customer retention or churn.
    • Educational Purposes: Teaching data science and machine learning concepts using a realistic dataset.

  7. F1 Drivers Churn Prediction

    • kaggle.com
    zip
    Updated Apr 28, 2025
    Cite
    Teo Calvo (2025). F1 Drivers Churn Prediction [Dataset]. https://www.kaggle.com/datasets/teocalvo/f1-drivers-churn-prediction/discussion
    Explore at:
    zip(674315 bytes)Available download formats
    Dataset updated
    Apr 28, 2025
    Authors
    Teo Calvo
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Data collected with the FastF1 Python library, covering 1990 to the present.

    All data was processed into a new view, similar to a feature store, built from a range of statistics on each driver's performance on the grid.

    On top of this, we added a flag indicating whether the driver remains in the F1 category in the next season (year).

    All development was done live, and you can look at the project on GitHub: https://github.com/TeoMeWhy/tmw-lake
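The target flag described above can be sketched as a lookup: a driver-season row is labeled as staying if the same driver appears in the following season. The driver codes and seasons below are invented, and `label_stays` is a hypothetical helper, not code from the linked repository.

```python
# Invented (driver, season) appearances; real rows would also carry performance features.
appearances = {
    ("AAA", 2021), ("AAA", 2022), ("AAA", 2023),
    ("BBB", 2021), ("BBB", 2022),
    ("CCC", 2022),
}

def label_stays(appearances: set, last_season: int) -> dict:
    """Map each (driver, season) to True if the driver also raced the next season.

    Rows from the most recent season are skipped because their outcome
    is not yet known.
    """
    labels = {}
    for driver, season in sorted(appearances):
        if season >= last_season:
            continue
        labels[(driver, season)] = (driver, season + 1) in appearances
    return labels

labels = label_stays(appearances, last_season=2023)
for key, stays in labels.items():
    print(key, "stays" if stays else "churns")
```

Dropping the latest season avoids labeling every current driver as churned simply because next year's data does not exist yet.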

  8. SaaS Subscription & Churn Analytics Dataset

    • kaggle.com
    zip
    Updated Jul 21, 2025
    Cite
    River | Datasets for SQL Practice (2025). SaaS Subscription & Churn Analytics Dataset [Dataset]. https://www.kaggle.com/datasets/rivalytics/saas-subscription-and-churn-analytics-dataset/discussion
    Explore at:
    zip(600149 bytes)Available download formats
    Dataset updated
    Jul 21, 2025
    Authors
    River | Datasets for SQL Practice
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    RavenStack is a fictional AI-powered collaboration platform used to simulate a real-world SaaS business. This simulated dataset was created using Python and ChatGPT specifically for people learning data analysis, business intelligence, or data science. It offers a realistic environment to practice SQL joins, cohort analysis, churn modeling, revenue tracking, and support analytics using a multi-table relational structure.

    The dataset spans 5 CSV files:

    • accounts.csv – customer metadata

    • subscriptions.csv – subscription lifecycles and revenue

    • feature_usage.csv – daily product interaction logs

    • support_tickets.csv – support activity and satisfaction scores

    • churn_events.csv – churn dates, reasons, and refund behaviors

    Users can explore trial-to-paid conversion, MRR trends, upgrade funnels, feature adoption, support patterns, churn drivers, and reactivation cycles. The dataset supports temporal and cohort analyses, and has built-in edge cases for testing real-world logic.
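A minimal sketch of the kind of join the multi-table layout is meant for, written with the standard library so it runs without the files. Only the file names come from the description; the column names (`account_id`, `plan`, `churn_date`) and the rows are assumptions for illustration.

```python
import csv
import io
from collections import Counter

# Invented stand-ins for accounts.csv and churn_events.csv.
ACCOUNTS = """account_id,plan
A1,Basic
A2,Basic
A3,Pro
A4,Pro
"""
CHURN_EVENTS = """account_id,churn_date
A2,2024-03-01
A2,2024-09-30
"""

accounts = list(csv.DictReader(io.StringIO(ACCOUNTS)))
# An account can churn more than once (reactivation), so collapse to a set.
churned_ids = {row["account_id"] for row in csv.DictReader(io.StringIO(CHURN_EVENTS))}

totals = Counter(a["plan"] for a in accounts)
churned = Counter(a["plan"] for a in accounts if a["account_id"] in churned_ids)

# Share of accounts per plan with at least one churn event
# (the equivalent of a LEFT JOIN plus GROUP BY in SQL).
churn_rate = {plan: churned[plan] / totals[plan] for plan in totals}
print(churn_rate)
```

Collapsing churn events to a set of account IDs keeps reactivated accounts from being counted twice.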

  9. Churn Prediction and Transaction Forecasting

    • kaggle.com
    zip
    Updated Aug 19, 2025
    Cite
    Richa Patel (2025). Churn Prediction and Transaction Forecasting [Dataset]. https://www.kaggle.com/datasets/richapatel912/churn-prediction-and-transaction-forecasting/discussion
    Explore at:
    zip(3258802 bytes)Available download formats
    Dataset updated
    Aug 19, 2025
    Authors
    Richa Patel
    Description

    “AI-Powered Banking Analytics: Automated Power BI Documentation, Churn Prediction, and Transaction Forecasting”

    Project Workflow

    1. Data Acquisition (Kaggle)
       • Dataset sourced from Kaggle (credit card / banking dataset).
       • Contains customer demographics, credit card transactions, and account details.
       • Cleaned and transformed data in Power BI for dashboard building.
    2. Interactive Power BI Dashboard
       • Built two key analytics pages:
         o Customer Churn Insights → shows churn risk, drivers, segmentation.
         o Transaction Forecasting → predicts future monthly transactions with confidence bands.
       • Added KPI cards, slicers, and professional formatting.
       • Ensured design follows: customer risk, forecasting, and governance.
    3. Automated Documentation (Python + VPAX)
       • Exported the Power BI data model (VPAX) using DAX Studio.
       • Created a Python script to automatically generate:
         o Word doc with model documentation.
         o Excel file with tables, relationships, and fields.
         o ER diagram image.
       • This automation saves analysts hours of manual work and enforces governance.
    4. Churn Prediction Model (Python + Power BI)
       • Built a Random Forest model for churn prediction.
       • Output:
         o Customer-level churn probability.
         o Risk categories (Low, Medium, High).
         o Feature importance (drivers of churn).
       • Exported predictions to Excel → Imported into Power BI.
       • Added Churn Risk Dashboard:
         o Distribution of churn risk.
         o Top churn drivers (feature importance bar chart).
    5. Transaction Forecasting Model (Python + Prophet)
       • Used Prophet (Facebook’s forecasting library) to model monthly transaction volumes.
       • Forecasted next 12 months with confidence intervals (yhat_lower, yhat_upper).
       • Exported results to Excel → Integrated into Power BI.
       • Added Transaction Forecasting Dashboard:
         o Actual vs Forecast line chart (with confidence band).
         o KPI cards (Next Month Forecast, YoY Growth).
         o Clustered column chart for recent 12 months.
    6. End-to-End Data & AI Pipeline
       • Data Source (Kaggle) → Power BI Dashboard → Automated Documentation → AI/ML Models → Power BI Insights.
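The Low, Medium, and High risk categories listed in the workflow above imply a mapping from churn probability to a label. A minimal sketch of such a mapping follows; the thresholds 0.33 and 0.66 and the function name `risk_category` are assumptions, since the description does not state the cut-offs used.

```python
def risk_category(probability: float, low: float = 0.33, high: float = 0.66) -> str:
    """Bucket a churn probability into Low, Medium, or High risk.

    The cut-offs are illustrative defaults, not values from the project.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability < low:
        return "Low"
    if probability < high:
        return "Medium"
    return "High"

# Invented customer-level probabilities, standing in for a classifier's output.
scores = {"C001": 0.08, "C002": 0.41, "C003": 0.92}
categories = {cid: risk_category(p) for cid, p in scores.items()}
print(categories)
```

In practice the cut-offs would be tuned to the retention budget, for example so that the High bucket holds only as many customers as a team can contact.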

    File Details (file / folder name – description):

    • .idea/ – PyCharm IDE configuration folder (auto-generated).
    • Churn Prediction + Forecasting.py – Main Python script for churn prediction (Random Forest) and transaction forecasting (Prophet).
    • churn_model.pkl – Saved machine learning model (Random Forest) for churn prediction.
    • Churn_Predictions.xlsx – Excel output of churn probabilities and risk categories per customer.
    • Credit Card Financial Dashboard.pbix – Power BI dashboard file (interactive BI report).
    • Credit Card Financial Dashboard.pdf – Exported PDF version of the Power BI dashboard.
    • credit_card.xlsx – Kaggle dataset (credit card transactions / account features).
    • customer.xlsx – Kaggle dataset (customer demographic and account info).
    • DocumentationGenerator.py – Python script that parses VPAX model and generates automated Power BI documentation.
    • Feature_Importance.xlsx – Feature importance scores from churn model (top churn drivers).
    • forecast_model.pkl – Saved Prophet model for forecasting monthly transactions.
    • LICENSE – License file for open-source/public sharing.
    • model.vpax – Exported Power BI data model (via DAX Studio) for documentation.
    • PowerBI_Documentation.docx – Word output of auto-generated Power BI documentation.
    • PowerBI_Documentation.xlsx – Excel output of auto-generated Power BI documentation.
    • PowerBI_ER_Diagram.png – Entity-Relationship diagram image generated from Power BI model.
    • README.md – Markdown summary file for GitHub/Kaggle.
    • Transaction_Forecast.xlsx – Excel output containing actuals + forecast (Prophet) with confidence bounds.

  10. E-Commerce Customer Behavior & Sales Analysis -TR

    • kaggle.com
    zip
    Updated Oct 29, 2025
    Cite
    UmutUygurr (2025). E-Commerce Customer Behavior & Sales Analysis -TR [Dataset]. https://www.kaggle.com/datasets/umuttuygurr/e-commerce-customer-behavior-and-sales-analysis-tr
    Explore at:
    zip(138245 bytes)Available download formats
    Dataset updated
    Oct 29, 2025
    Authors
    UmutUygurr
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    🛒 E-Commerce Customer Behavior and Sales Dataset

    📊 Dataset Overview

    This comprehensive dataset contains 5,000 e-commerce transactions from a Turkish online retail platform, spanning from January 2023 to March 2024. The dataset provides detailed insights into customer demographics, purchasing behavior, product preferences, and engagement metrics.

    🎯 Use Cases

    This dataset is perfect for:

    • Customer Segmentation Analysis: Identify distinct customer groups based on behavior
    • Sales Forecasting: Predict future sales trends and patterns
    • Recommendation Systems: Build product recommendation engines
    • Customer Lifetime Value (CLV) Prediction: Estimate customer value
    • Churn Analysis: Identify customers at risk of leaving
    • Marketing Campaign Optimization: Target customers effectively
    • Price Optimization: Analyze price sensitivity across categories
    • Delivery Performance Analysis: Optimize logistics and shipping

    📁 Dataset Structure

    The dataset contains 18 columns with the following features:

    Order Information
    • Order_ID: Unique identifier for each order (ORD_XXXXXX format)
    • Date: Transaction date (2023-01-01 to 2024-03-26)

    Customer Demographics
    • Customer_ID: Unique customer identifier (CUST_XXXXX format)
    • Age: Customer age (18-75 years)
    • Gender: Customer gender (Male, Female, Other)
    • City: Customer city (10 major Turkish cities)

    Product Information
    • Product_Category: 8 categories (Electronics, Fashion, Home & Garden, Sports, Books, Beauty, Toys, Food)
    • Unit_Price: Price per unit (in TRY/Turkish Lira)
    • Quantity: Number of units purchased (1-5)

    Transaction Details
    • Discount_Amount: Discount applied (if any)
    • Total_Amount: Final transaction amount after discount
    • Payment_Method: Payment method used (5 types)

    Customer Behavior Metrics
    • Device_Type: Device used for purchase (Mobile, Desktop, Tablet)
    • Session_Duration_Minutes: Time spent on website (1-120 minutes)
    • Pages_Viewed: Number of pages viewed during session (1-50)
    • Is_Returning_Customer: Whether customer has purchased before (True/False)

    Post-Purchase Metrics
    • Delivery_Time_Days: Delivery duration (1-30 days)
    • Customer_Rating: Customer satisfaction rating (1-5 stars)

    📈 Key Statistics

    • Total Records: 5,000 transactions
    • Date Range: January 2023 - March 2024 (15 months)
    • Average Transaction Value: ~450 TRY
    • Customer Satisfaction: 3.9/5.0 average rating
    • Returning Customer Rate: 60%
    • Mobile Usage: 55% of transactions

    🔍 Data Quality

    ✅ No missing values
    ✅ Consistent formatting across all fields
    ✅ Realistic data distributions
    ✅ Proper data types for all columns
    ✅ Logical relationships between features

    💡 Sample Analysis Ideas

    • Customer Segmentation with K-Means Clustering: Segment customers based on spending, frequency, and recency
    • Sales Trend Analysis: Identify seasonal patterns and peak shopping periods
    • Product Category Performance: Compare revenue, ratings, and return rates across categories
    • Device-Based Behavior Analysis: Understand how device choice affects purchasing patterns
    • Predictive Modeling: Build models to predict customer ratings or purchase amounts
    • City-Level Market Analysis: Compare market performance across different cities

    🛠️ Technical Details

    • File Format: CSV (Comma-Separated Values)
    • Encoding: UTF-8
    • File Size: ~500 KB
    • Delimiter: Comma (,)

    📚 Column Descriptions

    Column Name               Data Type  Description                 Example
    Order_ID                  String     Unique order identifier     ORD_001337
    Customer_ID               String     Unique customer identifier  CUST_01337
    Date                      DateTime   Transaction date            2023-06-15
    Age                       Integer    Customer age                35
    Gender                    String     Customer gender             Female
    City                      String     Customer city               Istanbul
    Product_Category          String     Product category            Electronics
    Unit_Price                Float      Price per unit              1299.99
    Quantity                  Integer    Units purchased             2
    Discount_Amount           Float      Discount applied            129.99
    Total_Amount              Float      Final amount paid           2469.99
    Payment_Method            String     Payment method              Credit Card
    Device_Type               String     Device used                 Mobile
    Session_Duration_Minutes  Integer    Session time                15
    Pages_Viewed              Integer    Pages viewed                8
    Is_Returning_Customer     Boolean    Returning customer          True
    Delivery_Time_Days        Integer    Delivery duration           3
    Customer_Rating           Integer    Satisfaction rating         5

    🎓 Learning Outcomes

    By working with this dataset, you can learn:

    • Data cleaning and preprocessing techniques
    • Exploratory Data Analysis (EDA) with Python/R
    • Statistical analysis and hypothesis testing
    • Machine learning model development
    • Data visualization best practices
    • Business intelligence and reporting

    📝 Citation

    If you use this dataset in your research or project, please cite:

    E-Commerce Customer Behavior and Sales Dataset (2024)
    Turkish Online Retail Platform Data (2023-2024)
    Available on Kaggle

    ⚖️ License

    This dataset is released under the CC0: Public Domain license. You are free to use it for any purpose.

    🤝 Contribution

    Found any issues or have suggestions? Feel free to provide feedback!

    📞 Contact

    For questions or collaborations, please reach out through Kaggle.

    Happy Analyzing! 🚀

    Keywords: e-c...

  11. E-commerce_dataset

    • kaggle.com
    zip
    Updated Nov 16, 2025
    Cite
    Abhay Ayare (2025). E-commerce_dataset [Dataset]. https://www.kaggle.com/datasets/abhayayare/e-commerce-dataset
    Explore at:
    zip(644123 bytes)Available download formats
    Dataset updated
    Nov 16, 2025
    Authors
    Abhay Ayare
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    E-commerce_dataset

    This dataset is a synthetic yet realistic E-commerce retail dataset generated programmatically using Python (Faker + NumPy + Pandas).
    It is designed to closely mimic real-world online shopping behavior, user patterns, product interactions, seasonal trends, and marketplace events.
    
    

    You can use this dataset for:

    Machine Learning & Deep Learning
    Recommender Systems
    Customer Segmentation
    Sales Forecasting
    A/B Testing
    E-commerce Behaviour Analysis
    Data Cleaning / Feature Engineering Practice
    SQL practice
    

    📁 Dataset Contents

    The dataset contains 6 CSV files:

    File             Rows      Description
    users.csv        ~10,000   User profiles, demographics & signup info
    products.csv     ~2,000    Product catalog with rating and pricing
    orders.csv       ~20,000   Order-level transactions
    order_items.csv  ~60,000   Items purchased per order
    reviews.csv      ~15,000   Customer-written product reviews
    events.csv       ~80,000   User event logs: view, cart, wishlist, purchase

    🧬 Data Dictionary

    1. Users (users.csv)
    Column Description
    user_id Unique user identifier
    name  Full customer name
    email  Email (synthetic, no real emails)
    gender Male / Female / Other
    city  City of residence
    signup_date Account creation date
    
    2. Products (products.csv)
    Column Description
    product_id Unique product identifier
    product_name  Product title
    category  Electronics, Clothing, Beauty, Home, Sports, etc.
    price  Actual selling price
    rating Average product rating
    
    3. Orders (orders.csv)
    Column Description
    order_id  Unique order identifier
    user_id User who placed the order
    order_date Timestamp of the order
    order_status  Completed / Cancelled / Returned
    total_amount  Total order value
    
    4. Order Items (order_items.csv)
    Column Description
    order_item_id  Unique identifier
    order_id  Associated order
    product_id Purchased product
    quantity  Quantity purchased
    item_price Price per unit
    
    5. Reviews (reviews.csv)
    Column Description
    review_id  Unique review identifier
    user_id User who submitted review
    product_id Reviewed product
    rating 1–5 star rating
    review_text Short synthetic review
    review_date Submission date
    
    6. Events (events.csv)
    Column Description
    event_id  Unique event identifier
    user_id User performing event
    product_id Viewed/added/purchased product
    event_type view/cart/wishlist/purchase
    event_timestamp Timestamp of event
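The event_type values in events.csv lend themselves to a simple funnel count. The sketch below uses invented rows and treats view → cart → purchase as the funnel, which is an assumption; the dataset description does not define funnel stages.

```python
from collections import defaultdict

# Invented rows shaped like events.csv: (user_id, event_type).
events = [
    (1, "view"), (1, "cart"), (1, "purchase"),
    (2, "view"), (2, "wishlist"),
    (3, "view"), (3, "cart"),
    (4, "view"),
]

FUNNEL = ["view", "cart", "purchase"]  # assumed stage order

# Collect the distinct users seen at each stage.
users_by_stage = defaultdict(set)
for user_id, event_type in events:
    users_by_stage[event_type].add(user_id)

# Conversion of each stage relative to the first one.
base = len(users_by_stage[FUNNEL[0]])
funnel = {stage: len(users_by_stage[stage]) for stage in FUNNEL}
conversion = {stage: count / base for stage, count in funnel.items()}
print(funnel)
print(conversion)
```

This counts users per stage independently; a stricter funnel would also require that each user passed through the earlier stages, using event_timestamp to check the order.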
    

    🧠 Possible Use Cases (Ideas & Projects)

    🔍 Machine Learning

    Customer churn prediction
    Review sentiment analysis (NLP)
    Recommendation engines
    Price optimization models
    Demand forecasting (Time-series)
    

    📦 Business Analytics

    Market basket analysis
    RFM segmentation
    Cohort analysis
    Funnel conversion tracking
    A/B testing simulations
    

    🧮 SQL Practice

    Joins
    Window functions
    Aggregations
    CTE-based funnels
    Complex queries
    

    🛠 How the Dataset Was Generated

    The dataset was generated entirely in Python using:

    Faker for realistic user and review generation
    NumPy for probability-based event modeling
    Pandas for data processing
    

    Custom logic for:

    demand variation
    user behavior simulation
    return/cancel probabilities
    seasonal order timestamp distribution
    The dataset does not include any real personal data.
    Everything is generated synthetically.
    

    ⚠️ License

    This dataset is released under CC BY 4.0 — free to use for:
    Research
    Education
    Commercial projects
    Kaggle competitions
    Machine learning pipelines
    Just provide attribution.
    

    ⭐ If you found this dataset helpful, please:

    Upvote the dataset
    Leave a comment
    Share your notebooks using it
    
  12. Data from: scikit-survival

    • kaggle.com
    zip
    Updated Feb 8, 2025
    Cite
    AnthonyTherrien (2025). scikit-survival [Dataset]. https://www.kaggle.com/anthonytherrien/scikit-survival
    Explore at:
    zip(3684823 bytes)Available download formats
    Dataset updated
    Feb 8, 2025
    Authors
    AnthonyTherrien
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    📝 Overview

    This dataset provides the scikit-survival 0.23.1 Python package in .whl format, enabling users to perform survival analysis using machine learning techniques. scikit-survival is a powerful library that extends scikit-learn to handle censored data, commonly encountered in medical research, reliability engineering, and event-time prediction tasks.

    📥 Installation

    To install the package, first, download the .whl file from this Kaggle dataset. Then, install it using pip:

    pip install scikit_survival-0.23.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    

    Ensure that you have Python 3.10 installed, as this wheel is built specifically for that version (the cp310 tag in the filename).
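The Python version a wheel targets can be read from the tags in its filename, which follow the pattern name-version-python tag-abi tag-platform tag. A small sketch that splits the filename above; `parse_wheel_tags` is a hypothetical helper, and it ignores the optional build tag that some wheels carry.

```python
def parse_wheel_tags(filename: str) -> dict:
    """Split a wheel filename into its name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError("unexpected number of filename fields")
    name, version, python_tag, abi_tag, platform_tag = parts
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }

tags = parse_wheel_tags(
    "scikit_survival-0.23.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(tags["python"])  # cp310, i.e. CPython 3.10
```

pip performs this compatibility check itself and refuses to install a wheel whose tags do not match the running interpreter.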

    🔬 Features

    • Kaplan-Meier and Cox Proportional Hazards models
    • Random survival forests for non-linear survival relationships
    • Concordance index for model evaluation
    • Integration with scikit-learn for easy model training and validation
    • Handling of right-censored data for accurate event-time predictions

    🏥 Use Cases

    • Medical research: Predict patient survival times based on clinical features.
    • Reliability engineering: Estimate the lifespan of mechanical components.
    • Churn prediction: Analyze customer retention and attrition timelines.
    • Credit risk modeling: Assess time until loan default.
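To make the churn use case concrete, here is a hand-rolled Kaplan-Meier estimate on invented customer tenures. It is a plain-Python illustration of the estimator, not the scikit-survival API, and the function name `kaplan_meier` and the data are assumptions.

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each time with a churn event.

    durations: time each customer was followed (e.g. months of tenure)
    observed:  True if the customer churned at that time, False if censored
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # everyone leaves the risk set at their duration
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the share of at-risk customers who did not churn at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Invented tenures in months; False marks customers still active (right-censored).
durations = [2, 3, 3, 5, 8, 8]
observed = [True, True, False, True, False, False]
curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"month {t}: S(t) = {s:.3f}")
```

Customers who are still active contribute to the at-risk count until their last observed month, which is how right-censoring enters the estimate.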
  13. Vietnam Bank Churn Dataset (2025)

    • kaggle.com
    zip
    Updated Oct 23, 2025
    Cite
    Tran Huu Nhan (2025). Vietnam Bank Churn Dataset (2025) [Dataset]. https://www.kaggle.com/datasets/tranhuunhan/vietnam-bank-churn-dataset-2025
    Explore at:
    zip(3387069 bytes)Available download formats
    Dataset updated
    Oct 23, 2025
    Authors
    Tran Huu Nhan
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    Việt Nam
    Description

    Synthetic Banking Customer Dataset (80,000 Records)

    Overview

    This dataset contains 80,000 synthetic retail banking customer records. The data includes demographic, financial, credit, and behavioral information along with churn labels and risk insights. It is designed for:

    • Churn prediction
    • Customer segmentation
    • Risk scoring
    • Behavioral analytics
    • Data science education and experimentation

    All records are fully synthetic and generated programmatically using realistic financial behavior rules.

    Dataset Columns

    Column                  Description
    id                      Unique customer ID
    full_name               Full Vietnamese name
    gender                  Gender (male, female)
    age                     Age in years
    occupation              Customer occupation
    origin_province         Home province
    address                 Residential district (Ho Chi Minh City sampling)
    monthly_ir              Monthly income (VND)
    balance                 Total current balance (VND)
    credit_sco              Credit score (range: 300–800)
    tenure_ye               Customer relationship tenure in years
    married                 Marital status (0=Single, 1=Married, 2=Divorced, 3=Widowed)
    nums_card               Number of bank cards owned
    nums_service            Number of banking products/services used
    last_transaction_month  Transaction amount in the last month (VND)
    active_member           Was active in the last 90 days (True/False)
    created_date            Account opening date
    last_active_date        Most recent usage date
    exit                    Churn label (target variable: True=churned)
    customer_segment        Segment class: Mass, Emerging, Affluent, Priority
    engagement_score        Behavior score based on products and usage (0–100)
    loyalty_level           Loyalty level based on engagement (Bronze, Silver, Gold, Platinum)
    digital_behavior        Usage type: offline, hybrid, mobile
    risk_score              Composite customer risk score (0–1 scale)
    risk_segment            Risk class derived from risk_score: Low, Medium, High
    cluster_group           Cluster label generated by KMeans (n_clusters=4)

    Target Feature

    exit is the binary churn indicator:

    Value    Meaning
    False    Customer retained
    True     Customer churned

    Business Logic Summary

    Income Model

    Income is generated based on occupation, location, and age progression:

    income = occupation_range * region_factor * age_factor
    

    Balance Model

    Customer balances depend on income, saving behavior, and tenure:

    balance = income * saving_rate(age, tenure) * random_factor
    

    Engagement Score

    engagement_score =
      0.40 * activity +
      0.30 * services +
      0.15 * cards +
      0.15 * transaction_intensity
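
    The weighted sum above can be sketched in a few lines of Python. This is an illustration only, not the dataset author's generator: it assumes each component is already normalized to [0, 1] and that the result is rescaled to the documented 0–100 range. The names ENGAGEMENT_WEIGHTS and engagement_score are hypothetical.

```python
# Weights copied from the formula above; they sum to 1.0.
ENGAGEMENT_WEIGHTS = {
    "activity": 0.40,
    "services": 0.30,
    "cards": 0.15,
    "transaction_intensity": 0.15,
}


def engagement_score(components):
    """Return the weighted sum of the components, rescaled to 0-100.

    Assumption (not stated in the source): every component is in [0, 1].
    """
    for name in ENGAGEMENT_WEIGHTS:
        value = components[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    raw = sum(w * components[name] for name, w in ENGAGEMENT_WEIGHTS.items())
    return 100.0 * raw


# Made-up customer: fully active, half of the services and cards, no transactions.
example = {"activity": 1.0, "services": 0.5, "cards": 0.5, "transaction_intensity": 0.0}
score = engagement_score(example)
print(round(score, 2))
```

    Because the weights sum to 1, a customer with every component at its maximum scores exactly 100 under these assumptions.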
    

    Risk Score

    risk_score =
      0.40 * credit_risk +
      0.25 * balance_risk +
      0.15 * activity_risk +
      0.10 * income_risk +
      0.10 * service_risk
    

    Churn Labeling

    Customers are ranked by risk_score, and the riskiest 18% are labeled as churned:

    exit = risk_score >= 82nd percentile
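
    A minimal sketch of this labeling rule, assuming a rank-based cut: the customers are sorted by risk and the top 18% (rounded to a whole number of customers) are flagged. The function name label_churn and the tie handling (earlier records win among equal scores) are assumptions, not details given in the description.

```python
def label_churn(risk_scores, churn_share=0.18):
    """Flag the riskiest `churn_share` fraction of customers as churned.

    Returns a list of booleans aligned with `risk_scores`.
    """
    n = len(risk_scores)
    n_churn = round(n * churn_share)
    # Indices ordered from highest to lowest risk; sorted() is stable,
    # so tied scores keep their original relative order.
    ranked = sorted(range(n), key=lambda i: risk_scores[i], reverse=True)
    churned = set(ranked[:n_churn])
    return [i in churned for i in range(n)]


# Made-up scores 0.00, 0.01, ..., 0.99 for 100 customers.
scores = [i / 100 for i in range(100)]
labels = label_churn(scores)
print(sum(labels))  # number of customers flagged as churned
```

    Since the label is a deterministic function of risk_score, a model that is given risk_score (or all of its inputs) as a feature can reproduce the label almost perfectly, which is worth keeping in mind when evaluating classifiers on this data.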
    

    Suggested Use Cases

    • Customer churn prediction (binary classification)
    • Risk profiling and survival analysis
    • Customer segmentation (behavior and product usage)
    • Banking business analytics
    • Customer lifetime value modeling (CLV)
    • Power BI and Tableau dashboard projects

    Tools Compatibility

    This dataset is compatible with:

    • Python (Pandas, NumPy, Scikit-learn)
    • R
    • SQL-based data warehouses
    • Power BI
    • Tableau
    • Excel
    • Kaggle Notebooks
    • Google Colab
  14. Predictive Employee Attrition Analysis (IBM HR)

    • kaggle.com
    zip
    Updated Nov 3, 2025
    Cite
    Nicolas Zalazar (2025). Predictive Employee Attrition Analysis (IBM HR) [Dataset]. https://www.kaggle.com/datasets/nicolaszalazar73/ibm-hr-analytics-predictive-employee-attrition
    Explore at:
    Available download formats: zip (101327 bytes)
    Dataset updated
    Nov 3, 2025
    Authors
    Nicolas Zalazar
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Predictive Employee Attrition Analysis (IBM HR Churn)

    Context and Dataset Overview

    This project focuses on the IBM HR Analytics Employee Attrition & Performance dataset, a classic resource for exploring predictive modeling in Human Resources. The goal is to move beyond simple descriptive statistics to build a functional predictive model that identifies which employees are at risk of leaving and why.

    The dataset includes demographic information, job roles, satisfaction levels, salary metrics, and key tenure data. The core challenge is addressing the high class imbalance and extracting actionable business intelligence from the model coefficients.

    Methodology and Core Contribution

    This analysis follows a robust Data Science pipeline, providing a complete solution from raw data to executive visualization:

    Data Preprocessing & Feature Engineering: We utilized Python (Pandas/Scikit-learn) to clean the data, impute missing values, and engineer crucial categorical variables (e.g., Seniority_Category, Monthly_Income_Level) to enhance segmentation analysis.

    Predictive Modeling (Logistic Regression): A Logistic Regression model was trained to predict the binary target (Attrition). Crucially, we use the model coefficients as "Churn Drivers" to quantify the influence of each variable on the probability of attrition.

    Executive Visualization: The findings are presented in a comprehensive two-page Looker Studio Dashboard. The dashboard is designed to be actionable, clearly separating the "Why" (Predictive Drivers) from the "Who and Where" (Descriptive Risk Segments).
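
    To make the "Churn Drivers" idea from the modeling step concrete, the sketch below turns logistic regression coefficients into odds ratios and ranks them by magnitude. The coefficient values and feature names are hypothetical placeholders, not results from this project, and rank_drivers is an illustrative helper rather than part of any library.

```python
import math

# Hypothetical coefficients for illustration only; NOT values from the project.
coefficients = {
    "OverTime_Yes": 0.9,
    "YearsAtCompany": -0.3,
    "MonthlyIncome": -0.5,
}


def rank_drivers(coefs):
    """Return (feature, coefficient, odds ratio) tuples, largest |coefficient| first."""
    rows = [(name, beta, math.exp(beta)) for name, beta in coefs.items()]
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)


for name, beta, odds_ratio in rank_drivers(coefficients):
    direction = "raises" if beta > 0 else "lowers"
    print(f"{name}: coef={beta:+.2f}, odds ratio={odds_ratio:.2f} ({direction} attrition odds)")
```

    In a logistic regression, exp(coefficient) is the multiplicative change in the odds of attrition for a one-unit increase in that feature. Comparing magnitudes across features is only meaningful when the features are on comparable scales, for example after standardization.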

    Key Project Outcomes

    • Identification of Overtime as the single highest predictor of employee attrition.
    • Confirmation of high early-career churn (within the first 5 years of tenure).
    • A clear, validated framework for HR teams to prioritize intervention based on quantitative risk factors.

    Technologies Used: Python (Pandas, Scikit-learn), SQL, Looker Studio.


BlastChar (2018). Telco Customer Churn [Dataset]. https://www.kaggle.com/datasets/blastchar/telco-customer-churn

Telco Customer Churn

Focused customer retention programs

Explore at:
84 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: zip (175758 bytes)
Dataset updated
Feb 23, 2018
Authors
BlastChar
