2 datasets found

Telco_Customer_churn_Data
test.researchdata.tuwien.at
bin, csv, png
Updated Apr 28, 2025
Cite
Erum Naz; Erum Naz; Erum Naz; Erum Naz (2025). Telco_Customer_churn_Data [Dataset]. http://doi.org/10.82556/b0ch-cn44
Available download formats: png, csv, bin
Unique identifier
https://doi.org/10.82556/b0ch-cn44
Dataset updated
Apr 28, 2025
Dataset provided by
TU Wien
Authors
Erum Naz; Erum Naz; Erum Naz; Erum Naz
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Apr 28, 2025
Description
Context and Methodology

The dataset originates from the research domain of Customer Churn Prediction in the Telecom Industry. It was created as part of the project "Data-Driven Churn Prediction: ML Solutions for the Telecom Industry," completed within the Data Stewardship course (Master programme Data Science, TU Wien).

The primary purpose of this dataset is to support machine learning model development for predicting customer churn based on customer demographics, service usage, and account information.
The dataset enables the training, testing, and evaluation of classification algorithms, allowing researchers and practitioners to explore techniques for customer retention optimization.

The dataset was originally obtained from the IBM Accelerator Catalog and adapted for academic use. It was uploaded to TU Wien’s DBRepo test system and accessed via SQLAlchemy connections to the MariaDB environment.

Technical Details

The dataset has a tabular structure and was initially stored in CSV format. It contains:

Rows: 7,043 customer records

Columns: 21 features including customer attributes (gender, senior citizen status, partner status), account information (tenure, contract type, payment method), service usage (internet service, streaming TV, tech support), and the target variable (Churn: Yes/No).
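As a quick sanity check on the structure described above, the following is a minimal sketch using only the Python standard library. It reads a CSV export, maps the Churn target from Yes/No to 1/0, and compares the shape against the stated 7,043 rows and 21 columns. The header names in the sample (`gender`, `tenure`, `Churn`) and the helper names are assumptions for illustration; the actual export may spell them differently.

```python
import csv
import io

EXPECTED_ROWS = 7043      # row count stated in the dataset description
EXPECTED_COLUMNS = 21     # column count stated in the dataset description


def load_churn_rows(handle, target="Churn"):
    """Read CSV rows and add an integer label: 1 for 'Yes', 0 for 'No'."""
    reader = csv.DictReader(handle)
    rows = []
    for row in reader:
        value = row[target].strip()
        if value not in ("Yes", "No"):
            raise ValueError(f"unexpected {target} value: {value!r}")
        row["churn_label"] = 1 if value == "Yes" else 0
        rows.append(row)
    return reader.fieldnames, rows


def matches_described_shape(fieldnames, rows):
    """True only if the file has the row and column counts given above."""
    return len(rows) == EXPECTED_ROWS and len(fieldnames) == EXPECTED_COLUMNS


# Tiny inline sample (hypothetical values) standing in for the real file.
sample = io.StringIO("gender,tenure,Churn\nFemale,1,No\nMale,34,Yes\n")
fields, rows = load_churn_rows(sample)
```

With the real export, `matches_described_shape(fields, rows)` would be expected to return True if the file matches the description; for the two-row sample it returns False.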

Naming Convention:

The table in the database is named telco_customer_churn_data.
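Since the description mentions SQLAlchemy access to a MariaDB environment, a hedged sketch of reading this table might look like the following. The PyMySQL driver, the host, the credentials, and the database name are all assumptions, not details from the dataset record; only the table name comes from the description.

```python
from urllib.parse import quote_plus

TABLE_NAME = "telco_customer_churn_data"  # table name given in the description


def build_mariadb_url(user, password, host, port, database):
    """Assemble a SQLAlchemy URL; the PyMySQL driver is an assumed choice."""
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


def load_churn_table(url, table=TABLE_NAME):
    """Read the whole table into a pandas DataFrame."""
    # Imported lazily so the URL helper works without third-party packages.
    import pandas as pd
    from sqlalchemy import create_engine, text

    engine = create_engine(url)
    with engine.connect() as conn:
        return pd.read_sql(text(f"SELECT * FROM {table}"), conn)


# Hypothetical connection details, shown only to illustrate the URL shape.
url = build_mariadb_url("reader", "p@ss/word", "db.example.org", 3306, "churn_db")
```

Calling `load_churn_table(url)` requires pandas, SQLAlchemy, PyMySQL, and a reachable database; the credentials are percent-encoded so that characters such as `@` or `/` do not break the URL.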

Software Requirements:

To open and work with the dataset, any standard database client or programming language that supports MariaDB connections can be used (e.g., Python).

For machine learning applications, libraries such as pandas, scikit-learn, and joblib are typically used.
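To illustrate how the three libraries named above could fit together, here is a minimal sketch rather than the project's actual pipeline (which lives in the linked repository). It assumes a DataFrame with a `Churn` column holding Yes/No; the feature column names passed in, the choice of logistic regression, and the helper names are assumptions for illustration.

```python
import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def train_churn_model(df, numeric_cols, categorical_cols, target="Churn"):
    """Fit a simple classifier and return it with its held-out accuracy."""
    X = df[numeric_cols + categorical_cols]
    y = (df[target] == "Yes").astype(int)  # Yes/No -> 1/0
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    model = Pipeline([
        ("preprocess", preprocess),
        ("classifier", LogisticRegression(max_iter=1000)),
    ])
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


def save_model(model, path):
    """Persist the fitted pipeline with joblib."""
    joblib.dump(model, path)


def load_model(path):
    """Load a pipeline previously written by save_model."""
    return joblib.load(path)
```

A call such as `train_churn_model(df, ["tenure"], ["Contract"])` would train on one numeric and one categorical feature; both column names here are assumed, so check them against the table before use.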

Additional Resources:

Source code for data loading, preprocessing, model training, and evaluation is available at the associated GitHub repository: https://github.com/nazerum/fair-ml-customer-churn

Further Details

When reusing the dataset, users should be aware:

Licensing: The dataset is shared under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Use Case Suitability: The dataset is best suited for classification tasks, particularly binary classification (churn vs. no churn).

Metadata Standards: Metadata describing the dataset adheres to FAIR principles and is supplemented by CodeMeta and Croissant standards for improved interoperability.
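For the binary churn vs. no churn use case noted above, evaluation usually looks beyond plain accuracy. The following standard-library sketch computes accuracy, precision, recall, and F1 from 1/0 labels; the function name and the sample labels are made up for illustration and are not results from this dataset.

```python
def churn_metrics(y_true, y_pred):
    """Compute basic binary-classification metrics, treating 1 as 'churn'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = churned, 0 = stayed.
metrics = churn_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
```

Recall on the churn class is often the figure of interest for retention work, since a missed churner cannot be targeted with an offer.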
JIO_Telecom_Churn_Prediction
kaggle.com
Updated Dec 30, 2021
Cite
Arzoo Parihar (2021). JIO_Telecom_Churn_Prediction [Dataset]. https://www.kaggle.com/datasets/arzooparihar/jio-telecom-churn-prediction
Dataset updated
Dec 30, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Arzoo Parihar
Description
Business Problem Overview

Let us say that Reliance Jio Infocomm Limited approached us with a problem. In the telecom industry, customers actively switch from one operator to another. Because the market is highly competitive, the telecommunications industry experiences an average annual churn rate of 18-27%. Since it costs 7-12 times more to acquire a new customer than to retain an existing one, customer retention matters more than customer acquisition. This is why our client, Jio, wants to retain its highly profitable customers and therefore wishes to predict which customers are at high risk of churning.

Also, since a postpaid customer usually informs the operator before moving to a competitor's platform, our client is more concerned about its prepaid customers, who usually churn or shift to a different operator without informing them. This results in loss of business because Jio cannot offer a promotional scheme in time to prevent the churn.

As per Jio, there are two kinds of churn: revenue-based and usage-based.

Revenue-based churn covers customers who have not utilized any revenue-generating facilities such as mobile data usage, outgoing calls, caller tunes, SMS etc. over a given period of time. To identify such customers, Jio usually uses an aggregate metric such as ‘customers who have generated less than ₹ 7 per month in total revenue’. The disadvantage of such a metric is that many Jio customers who use the service only for incoming calls will also be counted as churned, since they do not generate direct revenue. In such scenarios, revenue is generated by their relatives, who also use the Jio network to call them. For example, many users in rural areas only receive calls from their wage-earning siblings in urban areas.

Usage-based churn, the other type as per our client, consists of customers who do not use any of the services, i.e., no calls (either incoming or outgoing), no internet usage, no SMS, etc. The problem with this segment is that by the time one realizes that a customer is not utilizing any of the services, it may be too late to take any corrective measure, since the customer might have already switched to another operator.

Currently, our client, Reliance Jio Infocomm Limited, has approached us to help predict which customers will churn based on the usage-based definition. Another aspect to bear in mind is that, as per Jio, 80% of their revenue is generated from their top 20% of customers. They call this group high-value customers. Thus, if we can help reduce churn among high-value customers, we will be able to reduce significant revenue leakage. For this, they want us to define high-value customers based on a certain metric tied to usage-based churn and to predict churn only for high-value customers in the prepaid segment.

Understanding the Data-set

The data-set contains customer-level information for a span of four consecutive months: June, July, August and September. The months are encoded as 6, 7, 8 and 9, respectively. The business objective is to predict churn in the last (i.e. the ninth) month using the data (features) from the first three months. To do this task well, understanding typical customer behavior during churn is helpful.

Understanding Customer Behavior During Churn

Customers usually do not decide to switch to a competitor instantly, but rather over a period of time (this is especially applicable to high-value customers). In churn prediction, we assume that there are three phases of the customer lifecycle:

1) The ‘good’ phase: In this phase, the customer is happy with the service and behaves as usual.

2) The ‘action’ phase: The customer experience starts to sour in this phase, e.g. he/she gets a compelling offer from a competitor, faces unjust charges, becomes unhappy with service quality etc. In this phase, the customer usually shows different behavior than in the ‘good’ months. It is crucial to identify high-churn-risk customers in this phase, since some corrective actions can be taken at this point (such as matching the competitor’s offer or improving the service quality).

3) The ‘churn’ phase: In this phase, the customer is said to have churned. You define churn based on this phase. It is important to note that at the time of prediction (i.e. the action months), this data is not available to you. Thus, after tagging churn as 1/0 based on this phase, you discard all data corresponding to this phase.

In this case, since you are working over a four-month window, the first two months are the ‘good’ phase, the third month is the ‘action’ phase, while the fourth month is the ‘churn’ phase.

Data Dictionary

The data-set is available in a csv file named as “Company Data.csv” and the da...
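The tagging step described above (label churn from the month-9 data, then discard that data) can be sketched as follows. This is an illustration under stated assumptions: the data dictionary is truncated in this listing, so the column stems (`incoming_minutes`, `outgoing_minutes`, `data_mb`, `sms_count`) and the `_<month>` suffix convention are hypothetical stand-ins for whatever the real file uses.

```python
MONTHS_GOOD = (6, 7)   # June and July: the 'good' phase
MONTH_ACTION = 8       # August: the 'action' phase
MONTH_CHURN = 9        # September: the 'churn' phase

# Hypothetical usage column stems; the real names are in the data dictionary.
USAGE_STEMS = ("incoming_minutes", "outgoing_minutes", "data_mb", "sms_count")


def tag_usage_churn(record):
    """Return 1 if the customer used no service at all in the churn month."""
    usage = [float(record[f"{stem}_{MONTH_CHURN}"]) for stem in USAGE_STEMS]
    return int(all(value == 0 for value in usage))


def drop_churn_phase(record):
    """Remove every column that belongs to the churn month."""
    suffix = f"_{MONTH_CHURN}"
    return {key: value for key, value in record.items()
            if not key.endswith(suffix)}


def prepare(records):
    """Tag churn from month 9, then keep only features from months 6 to 8."""
    prepared = []
    for record in records:
        features = drop_churn_phase(record)
        features["churn"] = tag_usage_churn(record)
        prepared.append(features)
    return prepared


# Two made-up customers: one still active in month 9, one with zero usage.
customers = [
    {"outgoing_minutes_8": 40, "incoming_minutes_9": 12, "outgoing_minutes_9": 5,
     "data_mb_9": 300, "sms_count_9": 2},
    {"outgoing_minutes_8": 3, "incoming_minutes_9": 0, "outgoing_minutes_9": 0,
     "data_mb_9": 0, "sms_count_9": 0},
]
prepared = prepare(customers)
```

Dropping the month-9 columns after tagging prevents the label from leaking into the features, which matches the note that churn-phase data is not available at prediction time.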