By Yashwanth Sharaff [source]
This dataset contains essential characteristics of a variety of movies: descriptive information such as title, MPAA rating, release date, genre, runtime and summary, along with financial and popularity measures such as budget, gross revenue and rating count. With this dataset we can better understand the film industry and explore how these features relate to one another and to a movie's commercial performance. It can also help identify which features are the strongest indicators of a high-grossing feature film.
To get the most out of this dataset, you need to understand what each column represents. The 'Title' column gives the title of the movie, which you can use to look the film up on streaming services or movie information websites. The 'MPAA Rating' column lists the movie's Motion Picture Association of America (MPAA) rating, such as G (General Audiences), PG (Parental Guidance Suggested), PG-13 (Parents Strongly Cautioned) or R (Restricted: Under 17 Requires Accompanying Parent or Adult Guardian). The 'Budget' column gives the production budget in US dollars, the 'Gross' column gives the movie's gross revenue in US dollars, and the 'Release Date' column records when the movie was released. The 'Genre' column describes the type of movie, 'Runtime' gives its length in minutes, and 'Rating Count' gives the number of ratings the movie has received. Finally, the 'Summary' column provides a brief overview of the movie, which can help you decide whether a film suits your audience, for example when watching with children.
So go ahead - start exploring this interesting dataset today!
- Creating a box office prediction model using budget, genre, release date and MPAA rating
- Using the summary data to create a sentiment analysis tool for movie reviews
- Building a recommendation engine for users based on their prior ratings and what other users with similar tastes have rated as highly
If you use this dataset in your research, please credit the original authors. Data Source
See the dataset description for more information.
File: movies.csv

| Column name | Description |
|:---|:---|
| Title | The title of the movie. (String) |
| MPAA Rating | The Motion Picture Association of America (MPAA) rating of the movie. (String) |
| Budget | The budget of the movie in US dollars. (Integer) |
| Gross | The gross revenue of the movie in US dollars. (Integer) |
| Release Date | The date the movie was released. (Date) |
| Genre | The genre of the movie. (String) |
| Runtime | The length of the movie in minutes. (Integer) |
| Rating Count | The number of ratings the movie has received. (Integer) |
| Summary | A brief summary of the movie. (String) |
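As a rough illustration of how the Budget and Gross columns might be combined, the sketch below computes a gross-to-budget ratio per title. It assumes movies.csv uses the column names listed above as its header row; the two sample rows, their titles and their figures are made up for illustration and do not come from the dataset.

```python
import csv
import io

# Hypothetical two-row sample using the documented column names.
# Titles, dates and figures are invented for illustration only.
SAMPLE_CSV = """Title,MPAA Rating,Budget,Gross,Release Date,Genre,Runtime,Rating Count,Summary
Example Film A,PG-13,100000000,250000000,2010-05-01,Action,120,50000,A made-up summary.
Example Film B,R,20000000,10000000,2012-09-14,Drama,95,8000,Another made-up summary.
"""


def gross_to_budget(rows):
    """Return {title: gross / budget} for rows with a positive budget."""
    ratios = {}
    for row in rows:
        budget = int(row["Budget"])
        gross = int(row["Gross"])
        if budget > 0:  # skip rows where the ratio is undefined
            ratios[row["Title"]] = gross / budget
    return ratios


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
ratios = gross_to_budget(rows)
for title, ratio in ratios.items():
    print(f"{title}: {ratio:.2f}")
```

To run this against the real file, replace the in-memory sample with `open("movies.csv", newline="")`; rows with missing or non-numeric budgets would need extra handling that is not shown here.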
If you use this dataset in your research, please credit Yashwanth Sharaff.
License: https://creativecommons.org/publicdomain/zero/1.0/
Methodology
Every Tuesday, we publish four global Top 10 lists for films and TV: Film (English), TV (English), Film (Non-English), and TV (Non-English). These lists rank titles based on ‘views’ for each title from Monday to Sunday of the previous week. We define views for a title as the total hours viewed divided by the total runtime. Values are rounded to 100,000.
We consider each season of a series and each film on their own, so you might see both Stranger Things seasons 2 and 3 in the Top 10. Because titles sometimes move in and out of the Top 10, we also show the total number of weeks that a season of a series or film has spent on the list.
To give you a sense of what people are watching around the world, we also publish Top 10 lists for nearly 100 countries and territories (the same locations where there are Top 10 rows on Netflix). Country lists are also ranked by views.
Finally, we provide a list of the Top 10 most popular Netflix films and TV overall (branded Netflix in any country) in each of the four categories based on the views of each title in its first 91 days.
Some TV shows have multiple premiere dates, whether weekly or in parts, and therefore the runtime increases over time. For the weekly lists, we show the views based on the total hours viewed during the week divided by the total runtime available at the end of the week. On the Most Popular List, we wait until all episodes have premiered, so you see the views of the entire season. For titles that are Netflix branded in some countries but not others, we still include all of the hours viewed.
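The views definition above (hours viewed divided by runtime, rounded to 100,000) can be sketched as follows. The function name, the sample figures and the round-half-up tie-breaking are assumptions made for illustration; the methodology does not say how ties are rounded.

```python
import math


def views(hours_viewed, runtime_hours, step=100_000):
    """Views = hours viewed / runtime, rounded to the nearest `step`.

    For a weekly list, `runtime_hours` would be the total runtime
    available at the end of the week.
    """
    if runtime_hours <= 0:
        raise ValueError("runtime_hours must be positive")
    raw = hours_viewed / runtime_hours
    # Round half up; the exact tie-breaking rule is an assumption.
    return int(math.floor(raw / step + 0.5)) * step


# Hypothetical season: 52.3M hours viewed, 8.5 hours of runtime available.
print(views(52_300_000, 8.5))
```

Dividing by runtime is what makes a short film and a long season comparable: the same hours viewed yield fewer views for the title with the longer runtime.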
Information on the site starts from June 28, 2021 and any lists published before June 20, 2023 are ranked by hours viewed.
License: https://creativecommons.org/publicdomain/zero/1.0/
Since launching our weekly Top 10 and Most Popular lists in 2021, Netflix has provided more information about what people are watching than any other streamer except YouTube. And now we believe it’s time to go further.
Starting today we will publish What We Watched: A Netflix Engagement Report twice a year. This is a comprehensive report of what people watched on Netflix over a six-month period, including:
- Hours viewed for every title (original and licensed) watched for over 50,000 hours;
- The premiere date for any Netflix TV series or film; and
- Whether a title was available globally.
In total, this report covers more than 18,000 titles — representing 99% of all viewing on Netflix — and nearly 100 billion hours viewed.
Over 60% of Netflix titles released between January and June 2023 appeared on our weekly Top 10 lists. So while this report is broader in scope, the trends reflected in it are very similar to those in the Top 10 lists, including:
- The strength of returning favorites like Ginny & Georgia, Alice in Borderland, The Marked Heart, Outer Banks, You, Queen Charlotte: A Bridgerton Story, XO Kitty and film sequels Murder Mystery 2 and Extraction 2;
- The popularity of new series like The Night Agent, The Diplomat, Beef, The Glory, Alpha Males, FUBAR and Fake Profile, which generate huge audiences and fandoms;
- The size of the audience of our films across every genre including The Mother, Luther: The Fallen Sun, You People, AKA, ¡Que viva México! and Hunger;
- The enthusiasm for non-English stories, which generated 30% of all viewing;
- The staying power of titles on Netflix, which extends well beyond their premieres. All Quiet on the Western Front, for example, debuted in October 2022 and generated 80M hours viewed between January and June; and
- The demand for older, licensed titles, which generates tremendous value for our members and for rights holders.
When reading the report it’s important to remember:
- Success on Netflix comes in all shapes and sizes, and is not determined by hours viewed alone. We have enormously successful movies and TV shows with both lower and higher hours viewed. It’s all about whether a movie or TV show thrilled its audience, and the size of that audience relative to the economics of the title; and
- To compare between titles it’s best to use our weekly Top 10 and Most Popular lists, which take into account run times and premiere dates.
This is a big step forward for Netflix and our industry. We believe the viewing information in this report — combined with our weekly Top 10 and Most Popular lists — will give creators and our industry deeper insights into our audiences, and what resonates with them.
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Context
This dataset describes a fictional telco company that provided home phone and Internet services to 7043 customers in California in Q3.
Data Description
7043 observations with 33 variables
CustomerID: A unique ID that identifies each customer.
Count: A value used in reporting/dashboarding to sum up the number of customers in a filtered set.
Country: The country of the customer’s primary residence.
State: The state of the customer’s primary residence.
City: The city of the customer’s primary residence.
Zip Code: The zip code of the customer’s primary residence.
Lat Long: The combined latitude and longitude of the customer’s primary residence.
Latitude: The latitude of the customer’s primary residence.
Longitude: The longitude of the customer’s primary residence.
Gender: The customer’s gender: Male, Female
Senior Citizen: Indicates if the customer is 65 or older: Yes, No
Partner: Indicates if the customer has a partner: Yes, No
Dependents: Indicates if the customer lives with any dependents: Yes, No. Dependents could be children, parents, grandparents, etc.
Tenure Months: Indicates the total number of months that the customer has been with the company by the end of the quarter specified above.
Phone Service: Indicates if the customer subscribes to home phone service with the company: Yes, No
Multiple Lines: Indicates if the customer subscribes to multiple telephone lines with the company: Yes, No
Internet Service: Indicates if the customer subscribes to Internet service with the company: No, DSL, Fiber Optic, Cable.
Online Security: Indicates if the customer subscribes to an additional online security service provided by the company: Yes, No
Online Backup: Indicates if the customer subscribes to an additional online backup service provided by the company: Yes, No
Device Protection: Indicates if the customer subscribes to an additional device protection plan for their Internet equipment provided by the company: Yes, No
Tech Support: Indicates if the customer subscribes to an additional technical support plan from the company with reduced wait times: Yes, No
Streaming TV: Indicates if the customer uses their Internet service to stream television programming from a third party provider: Yes, No. The company does not charge an additional fee for this service.
Streaming Movies: Indicates if the customer uses their Internet service to stream movies from a third party provider: Yes, No. The company does not charge an additional fee for this service.
Contract: Indicates the customer’s current contract type: Month-to-Month, One Year, Two Year.
Paperless Billing: Indicates if the customer has chosen paperless billing: Yes, No
Payment Method: Indicates how the customer pays their bill: Bank Withdrawal, Credit Card, Mailed Check
Monthly Charge: Indicates the customer’s current total monthly charge for all their services from the company.
Total Charges: Indicates the customer’s total charges, calculated to the end of the quarter specified above.
Churn Label: Yes = the customer left the company this quarter. No = the customer remained with the company. Directly related to Churn Value.
Churn Value: 1 = the customer left the company this quarter. 0 = the customer remained with the company. Directly related to Churn Label.
Churn Score: A value from 0-100 that is calculated using the predictive tool IBM SPSS Modeler. The model incorporates multiple factors known to cause churn. The higher the score, the more likely the customer will churn.
CLTV: Customer Lifetime Value. A predicted CLTV is calculated using corporate formulas and existing data. The higher the value, the more valuable the customer. High value customers should be monitored for churn.
Churn Reason: A customer’s specific reason for leaving the company. Directly related to Churn Category.
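Because Churn Label and Churn Value are described as directly related, a quick consistency check is a reasonable first step before modeling. The sketch below assumes the rows have already been loaded as dictionaries keyed by the column names above; the customer IDs and values in the sample are hypothetical.

```python
def churn_mismatches(rows):
    """Return CustomerIDs whose Churn Label and Churn Value disagree."""
    expected = {"Yes": 1, "No": 0}
    bad = []
    for row in rows:
        value = int(row["Churn Value"])
        # An unknown label maps to None, which never equals 0 or 1.
        if expected.get(row["Churn Label"]) != value:
            bad.append(row["CustomerID"])
    return bad


def churn_rate(rows):
    """Share of customers with Churn Value == 1."""
    return sum(int(r["Churn Value"]) for r in rows) / len(rows)


# Hypothetical sample rows; the third row is deliberately inconsistent.
sample = [
    {"CustomerID": "C-001", "Churn Label": "Yes", "Churn Value": "1"},
    {"CustomerID": "C-002", "Churn Label": "No", "Churn Value": "0"},
    {"CustomerID": "C-003", "Churn Label": "No", "Churn Value": "1"},
    {"CustomerID": "C-004", "Churn Label": "No", "Churn Value": "0"},
]

mismatches = churn_mismatches(sample)
rate = churn_rate(sample)
print(mismatches, rate)
```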
Source
This dataset is detailed in: https://community.ibm.com/community/user/businessanalytics/blogs/steven-macko/2019/07/11/telco-customer-churn-1113
Downloaded from: https://community.ibm.com/accelerators/?context=analytics&query=telco%20churn&type=Data&product=Cognos%20Analytics
There are several related datasets as documented in: https://community.ibm.com/community/user/businessanalytics/blogs/steven-macko/2018/09/12/base-samples-for-ibm-cognos-analytics
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
The Telco customer churn data contains information about a fictional telco company that provided home phone and Internet services to 7043 customers in California in Q3. It indicates which customers have left, stayed, or signed up for their service. Multiple important demographics are included for each customer, as well as a Satisfaction Score, Churn Score, and Customer Lifetime Value (CLTV) index.
CustomerID: A unique ID that identifies each customer.
Gender: The customer’s gender: Male, Female
Age: The customer’s current age, in years, at the time the fiscal quarter ended.
Senior Citizen: Indicates if the customer is 65 or older: Yes, No
Married: Indicates if the customer is married: Yes, No
Dependents: Indicates if the customer lives with any dependents: Yes, No. Dependents could be children, parents, grandparents, etc.
Number of Dependents: Indicates the number of dependents that live with the customer.
CustomerID: A unique ID that identifies each customer.
Count: A value used in reporting/dashboarding to sum up the number of customers in a filtered set.
Country: The country of the customer’s primary residence.
State: The state of the customer’s primary residence.
City: The city of the customer’s primary residence.
Zip Code: The zip code of the customer’s primary residence.
Latitude: The latitude of the customer’s primary residence.
Longitude: The longitude of the customer’s primary residence.
Zip Code: The zip code of the customer’s primary residence.
Population: A current population estimate for the entire Zip Code area.
CustomerID: A unique ID that identifies each customer.
Count: A value used in reporting/dashboarding to sum up the number of customers in a filtered set.
Quarter: The fiscal quarter that the data has been derived from (e.g. Q3).
Referred a Friend: Indicates if the customer has ever referred a friend or family member to this company: Yes, No
Number of Referrals: Indicates the number of referrals to date that the customer has made.
Tenure in Months: Indicates the total number of months that the customer has been with the company by the end of the quarter specified above.
Offer: Identifies the last marketing offer that the customer accepted, if applicable. Values include None, Offer A, Offer B, Offer C, Offer D, and Offer E.
Phone Service: Indicates if the customer subscribes to home phone service with the company: Yes, No
Avg Monthly Long Distance Charges: Indicates the customer’s average long distance charges, calculated to the end of the quarter specified above.
Multiple Lines: Indicates if the customer subscribes to multiple telephone lines with the company: Yes, No
Internet Service: Indicates if the customer subscribes to Internet service with the company: No, DSL, Fiber Optic, Cable.
Avg Monthly GB Download: Indicates the customer’s average download volume in gigabytes, calculated to the end of the quarter specified above.
Online Security: Indicates if the customer subscribes to an additional online security service provided by the company: Yes, No
Online Backup: Indicates if the customer subscribes to an additional online backup service provided by the company: Yes, No
Device Protection Plan: Indicates if the customer subscribes to an additional device protection plan for their Internet equipment provided by the company: Yes, No
Premium Tech Support: Indicates if the customer subscribes to an additional technical support plan from the company with reduced wait times: Yes, No
Streaming TV: Indicates if the customer uses their Internet service to stream television programming from a third party provider: Yes, No. The company does not charge an additional fee for this service.
Streaming Movies: Indicates if the customer uses their Internet service to stream movies from a third party provider: Yes, No. The company does not charge an additional fee for this service.
Streaming Music: Indicates if the customer uses their Internet service to stream music from a third party provider: Yes, No. The company does not charge an additional fee for this service.
Unlimited Data: Indicates if the customer has paid an additional monthly fee to have unlimited data downloads/uploads: Yes, No
Contract: Indicates the customer’s current contract type: Month-to-Month, One Year, Two Year.
Paperless Billing: Indicates if the customer has chosen paperless billing: Yes, No
Payment Method: Indicates how the customer pays their bill: Bank Withdrawal, Credit Card, Mailed Check
Monthly Charge: Indicates the customer’s current total monthly charge for all their services from the company.
Total Charges: Indicates the customer’s total charges, calculated to the end of the quarter specified above.
Total Refunds: Indicates the customer’s t...
"Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs." [IBM Sample Data Sets]
Each row represents a customer; each column contains a customer attribute described in the column metadata.
The data set includes information about:
Explore this type of model and learn more about the subject.
New version from IBM: https://community.ibm.com/community/user/businessanalytics/blogs/steven-macko/2019/07/11/telco-customer-churn-1113
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset contains various features that describe customer information for a telecommunications company. The objective is to analyze these features to predict whether a customer will churn (i.e., discontinue service). Below is a breakdown of each feature in the dataset:
| Feature | Description |
|---|---|
| gender | Indicates the gender of the customer (1 = Female, 0 = Male). |
| seniorcitizen | Indicates if the customer is a senior citizen (1 = Yes, 0 = No). |
| partner | Indicates if the customer has a partner (1 = Yes, 0 = No). |
| dependents | Indicates if the customer has dependents (1 = Yes, 0 = No). |
| tenure | The number of months the customer has been with the company. |
| phoneservice | Indicates if the customer has a phone service (1 = Yes, 0 = No). |
| multiplelines | Indicates if the customer has multiple lines (1 = Yes, 0 = No). |
| onlinesecurity | Indicates if the customer has online security (1 = Yes, 0 = No). |
| onlinebackup | Indicates if the customer has online backup (1 = Yes, 0 = No). |
| deviceprotection | Indicates if the customer has device protection (1 = Yes, 0 = No). |
| techsupport | Indicates if the customer has tech support (1 = Yes, 0 = No). |
| streamingtv | Indicates if the customer has streaming TV services (1 = Yes, 0 = No). |
| streamingmovies | Indicates if the customer has streaming movies services (1 = Yes, 0 = No). |
| paperlessbilling | Indicates if the customer uses paperless billing (1 = Yes, 0 = No). |
| monthlycharges | The amount charged to the customer each month. |
| totalcharges | The total amount charged to the customer up to the current month. |
| label | Indicates whether the customer has churned (1 = Churned, 0 = Not Churned). |
| contract_Month-to-month | Indicates if the customer is on a month-to-month contract (1 = Yes, 0 = No). |
| contract_One year | Indicates if the customer is on a one-year contract (1 = Yes, 0 = No). |
| contract_Two year | Indicates if the customer is on a two-year contract (1 = Yes, 0 = No). |
| paymentmethod_Electronic check | Indicates if the payment method is Electronic Check (1 = Yes, 0 = No). |
| paymentmethod_Credit card (automatic) | Indicates if the payment method is Credit Card (Automatic) (1 = Yes, 0 = No). |
| paymentmethod_Mailed check | Indicates if the payment method is Mailed Check (1 = Yes, 0 = No). |
| paymentmethod_Bank transfer (automatic) | Indicates if the payment method is Bank Transfer (Automatic) (1 = Yes, 0 = No). |
| internetservice_Fiber optic | Indicates if the customer has Fiber Optic internet service (1 = Yes, 0 = No). |
| internetservice_DSL | Indicates if the customer has DSL internet service (1 = Yes, 0 = No). |
| internetservice_No | Indicates if the customer has no internet service (1 = Yes, 0 = No). |
The dataset captures a wide range of features that are likely to influence customer retention. By analyzing these features, we can build models to predict which customers are at risk of churning and develop strategies to retain them.
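The contract, payment method and internet service features above are one-hot encoded, so exactly one column in each group should be 1 for a given customer. A minimal sketch of decoding such a group back into a single category is shown below; the helper name and the sample row are hypothetical, and the exactly-one rule is an assumption about how the encoding was produced.

```python
CONTRACT_COLUMNS = [
    "contract_Month-to-month",
    "contract_One year",
    "contract_Two year",
]


def decode_one_hot(row, columns, prefix):
    """Return the category whose indicator column equals 1.

    Raises ValueError when zero or several indicators are set, which
    would suggest a corrupted or differently encoded row.
    """
    active = [c for c in columns if int(row[c]) == 1]
    if len(active) != 1:
        raise ValueError(f"expected exactly one active column, got {active}")
    return active[0][len(prefix):]


# Hypothetical row with only the contract indicators filled in.
row = {
    "contract_Month-to-month": 0,
    "contract_One year": 1,
    "contract_Two year": 0,
}
contract = decode_one_hot(row, CONTRACT_COLUMNS, "contract_")
print(contract)
```

The same helper could be applied to the `paymentmethod_` and `internetservice_` groups by passing their column lists and prefixes.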
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Context
This sample data tracks a fictional telco company's customer churn based on a variety of possible factors. The churn column indicates whether or not the customer left within the last month. Other columns include gender, dependents, monthly charges, and many with information about the types of services each customer has. Source: IBM.
Inventory of Telco Assets
A variety of objects have been updated/created that work together to tell a comprehensive story:
Telco churn: This sample dashboard tracks a fictional telco company's customer churn based on a variety of factors. The Churn Label column indicates whether or not the customer left within the last month. Other columns include location, monthly charges, services, and customer lifetime value. Location: Team content > Samples > Dashboards.
Quarterly churn update: This sample story shows quarterly changes of customer churn in a fictional telco company, and which contract and location has the highest churn in order to decide the goals for the next quarter. The churn label column indicates whether or not the customer left within the last quarter. Location: Team content > Samples > Stories.
Telco customer churn: This sample data module tracks a fictional telco company's customer churn based on a variety of possible factors. The churn column indicates whether or not the customer left within the last month. Other columns include gender, dependents, monthly charges, and many with information about the types of services each customer has. The Telco customer churn data module is composed of 3 uploaded files:
- Telco_customer_churn_demographics.xlsx
- Telco_customer_churn_services.xlsx
- Telco_customer_churn_status.xlsx
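Since the data module is split across several files, analysis usually starts by combining them per customer. The sketch below assumes each file has already been read into a list of dictionaries (the spreadsheet loading step is not shown) and that every table carries a CustomerID key as written in this description; the sample records are hypothetical.

```python
def merge_on_customer_id(*tables):
    """Outer-join several tables (lists of dicts) on CustomerID.

    Later tables overwrite earlier ones for shared column names such
    as Count, which is assumed to hold the same value in every table.
    """
    merged = {}
    for table in tables:
        for record in table:
            merged.setdefault(record["CustomerID"], {}).update(record)
    return merged


# Hypothetical records standing in for the three uploaded files.
demographics = [{"CustomerID": "0001-AAAA", "Count": 1, "Age": 42}]
services = [{"CustomerID": "0001-AAAA", "Count": 1, "Contract": "One Year"}]
status = [{"CustomerID": "0001-AAAA", "Count": 1, "Churn Label": "No"}]

customers = merge_on_customer_id(demographics, services, status)
print(customers["0001-AAAA"])
```

An outer join keeps customers that are missing from one of the files; an inner join would be the stricter choice if every customer is expected in all three.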
Content
Data
Each table is described below.
Demographics
CustomerID: A unique ID that identifies each customer.
Count: A value used in reporting/dashboarding to sum up the number of customers in a filtered set.
Gender: The customer’s gender: Male, Female
Age: The customer’s current age, in years, at the time the fiscal quarter ended.
Senior Citizen: Indicates if the customer is 65 or older: Yes, No
Married: Indicates if the customer is married: Yes, No
Dependents: Indicates if the customer lives with any dependents: Yes, No. Dependents could be children, parents, grandparents, etc.
Number of Dependents: Indicates the number of dependents that live with the customer.
Services
CustomerID: A unique ID that identifies each customer.
Count: A value used in reporting/dashboarding to sum up the number of customers in a filtered set.
Quarter: The fiscal quarter that the data has been derived from (e.g. Q3).
Referred a Friend: Indicates if the customer has ever referred a friend or family member to this company: Yes, No
Number of Referrals: Indicates the number of referrals to date that the customer has made.
Tenure in Months: Indicates the total number of months that the customer has been with the company by the end of the quarter specified above.
Offer: Identifies the last marketing offer that the customer accepted, if applicable. Values include None, Offer A, Offer B, Offer C, Offer D, and Offer E.
Phone Service: Indicates if the customer subscribes to home phone service with the company: Yes, No
Avg Monthly Long Distance Charges: Indicates the customer’s average long distance charges, calculated to the end of the quarter specified above.
Multiple Lines: Indicates if the customer subscribes to multiple telephone lines with the company: Yes, No
Internet Service: Indicates if the customer subscribes to Internet service with the company: No, DSL, Fiber Optic, Cable.
Avg Monthly GB Download: Indicates the customer’s average download volume in gigabytes, calculated to the end of the quarter specified above.
Online Security: Indicates if the customer subscribes to an additional online security service provided by the company: Yes, No
Online Backup: Indicates if the customer subscribes to an additional online backup service provided by the company: Yes, No
Device Protection Plan: Indicates if the customer subscribes to an additional device protection plan for their Internet equipment provided by the company: Yes, No
Premium Tech Support: Indicates if the customer subscribes to an additional technical support plan from the company with reduced wait times: Yes, No
Streaming TV: Indicates if the customer uses their Internet service to stream television programming from a third party provider: Yes, No. The company does not charge an additional fee for this service.
Streaming Movies: Indicates if the customer uses their Internet service to stream movies from a third party provider: Yes, No. The company does not charge an additional fee for this service.
Streaming Music: Indicates if the customer uses their Internet service to stream music from a third party provider: Yes, No. The company does not charge an additional fee for ...
📝 Dataset Description
This dataset contains information about customers of a telecommunications company, including their demographic details, account information, service subscriptions, and churn status. It is a modified version of the popular Telco Churn dataset, curated for exploratory data analysis, machine learning model development, and churn prediction tasks.
The dataset includes simulated missing values in some columns to reflect real-world data issues and support preprocessing and imputation tasks. This makes it especially useful for demonstrating data cleaning techniques and evaluating model robustness.
📂 Files Included
telco_data_modified.csv: The main dataset with 21 columns and 7043 rows (some missing values are intentionally inserted).
📌 Features

| Column Name | Description |
|---|---|
| customerID | Unique identifier for each customer |
| gender | Customer gender: Male/Female |
| SeniorCitizen | Indicates if the customer is a senior citizen (0 = No, 1 = Yes) |
| Partner | Whether the customer has a partner |
| Dependents | Whether the customer has dependents |
| tenure | Number of months the customer has stayed with the company |
| PhoneService | Whether the customer has phone service |
| MultipleLines | Whether the customer has multiple lines |
| InternetService | Customer's internet service provider (DSL, Fiber optic, No) |
| OnlineSecurity | Whether the customer has online security |
| OnlineBackup | Whether the customer has online backup |
| DeviceProtection | Whether the customer has device protection |
| TechSupport | Whether the customer has tech support |
| StreamingTV | Whether the customer has streaming TV |
| StreamingMovies | Whether the customer has streaming movies |
| Contract | Type of contract: Month-to-month, One year, Two year |
| PaperlessBilling | Whether the customer uses paperless billing |
| PaymentMethod | Payment method (e.g., Electronic check, Mailed check, etc.) |
| MonthlyCharges | Monthly charges |
| TotalCharges | Total charges to date |
| Churn | Whether the customer has left the company (Yes/No) |
🔍 Use Cases
- Binary classification: Predict customer churn
- Data preprocessing and imputation exercises
- Feature engineering and importance analysis
- Customer segmentation and churn modeling
⚠️ Notes
- Missing values were intentionally inserted in the dataset to help simulate real-world conditions.
- Some preprocessing may be required before modeling (e.g., converting categorical to numerical data, handling TotalCharges as numeric).
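One possible way to handle TotalCharges as numeric and fill the inserted gaps is sketched below. It assumes missing entries arrive as blank strings or None and uses median imputation, which is only one of several reasonable strategies; the sample values are hypothetical.

```python
import statistics


def to_float_or_none(text):
    """Parse a TotalCharges cell; blank or unparseable cells become None."""
    text = (text or "").strip()
    if not text:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    return [median if v is None else v for v in values]


# Hypothetical raw column with a blank cell and a missing cell.
raw_total_charges = ["29.85", " ", "1889.5", None, "108.15"]
parsed = [to_float_or_none(v) for v in raw_total_charges]
filled = impute_median(parsed)
print(filled)
```

The median is less sensitive than the mean to a few very large totals, which is why it is used in this sketch; dropping incomplete rows is an alternative when few values are missing.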
🙏 Acknowledgements
This dataset is based on the original Telco Customer Churn dataset (initially provided by IBM). The current version has been modified for academic and practical exercises.
This dataset contains information about customers of a fictional telecommunications company, including demographic information, services subscribed to, location details, and churn behavior. This merged dataset combines the information from the original Telco Customer Churn dataset with additional details.
Dataset Description
This merged Telco Customer Churn dataset provides a comprehensive view of customer attributes, service usage, location data, and churn behavior. This expanded dataset is a valuable resource for understanding churn patterns, customer segmentation, and developing targeted marketing strategies.
Direct Use
This dataset can be used for various purposes, including:
- Customer churn prediction: Develop machine learning models to predict which customers are at risk of churning, leveraging the expanded features.
- Customer segmentation: Identify different customer segments based on demographics, service usage, location, and churn behavior.
- Targeted marketing campaigns: Develop targeted marketing campaigns to retain at-risk customers or attract new customers, tailoring campaigns based on the insights derived from the merged dataset.
- Location-based analysis: Analyze customer churn trends based on specific locations, cities, or zip codes, and identify potential regional differences.
The dataset is not suitable for:
- Real-time churn prediction: The dataset lacks real-time data, making it inappropriate for immediate churn prediction.
- Personal identification: While the dataset contains customer information, it is anonymized and should not be used to identify individuals.
The dataset is structured as a CSV file with 49 columns, each representing a customer attribute. The columns include:

| Column Name | Description |
|---|---|
| Age | The customer's age in years. |
| Avg Monthly GB Download | The customer's average monthly gigabyte download volume. |
| Avg Monthly Long Distance Charges | The customer's average monthly long distance charges. |
| Churn Category | A high-level category for the customer's reason for churning. |
| Churn Label | Indicates whether the customer churned. |
| Churn Reason | The customer's specific reason for leaving the company. |
| Churn Score | A score from 0-100 indicating the likelihood of the customer churning. |
| Churn Value | A numerical value representing whether the customer churned (1 for churned, 0 for not churned). |
| City | The city of the customer's residence. |
| CLTV | Customer Lifetime Value. |
| Contract | The customer's contract type. |
| Country | The country of the customer's residence. |
| Customer ID | A unique identifier for each customer. |
| Customer Status | The customer's status at the end of the quarter (Churned, Stayed, or Joined). |
| Dependents | Whether the customer has dependents. |
| Device Protection Plan | Whether the customer has a device protection plan. |
| Gender | The customer's gender. |
| Internet Service | Indicates whether the customer subscribes to internet service. |
| Internet Type | The type of internet service provider. |
| Lat Long | The combined latitude and longitude of the customer's residence. |
| Latitude | The latitude of the customer's residence. |
| Longitude | The longitude of the customer's residence. |
| Married | Indicates if the customer is married. |
| Monthly Charge | The customer's total monthly charge for all their services. |
| Multiple Lines | Whether the customer has multiple phone lines. |
| Number of Dependents | The number of dependents the customer has. |
| Number of Referrals | The number of referrals made by the customer. |
| Offer | The last marketing offer the customer accepted. |
| Online Backup | Whether the customer has online backup service. |
| Online Security | Whether the customer has online security service. |
| Paperless Billing | Whether the customer has paperless billing. |
| Partner | Whether the customer has a partner. |
| Payment Method | The customer's payment method. |
| Phone Service | Whether the customer has phone service. |
| Population | The estimated population of the customer's zip code. |
| Premium Tech Support | Whether the customer has premium tech support. |
| Quarter | The fiscal quarter for the data. |
| Referred a Friend | Indicates if the customer has referred a friend. |
| Satisfaction Score | The customer's satisfaction rating. |
| Senior Citizen | Whether the customer is a senior citizen. |
| State | The state of the customer's residence. |
| Streaming Movies | Whether the customer has streaming movies service. |
| Streaming Music | Whether the customer has streaming music service. |
| Streaming TV | Whether the customer has streaming TV service. |
| Tenure in Months | The number of months the customer has been with the company. |
| Total Charges | The customer's total charges. |
| Total Extra Data Charges | The total charges for extra data downloads. |
| Total Long Distance Charges | The total charges for long distance calls. |
| Total Refunds | The total refunds received by the customer. |
| Total Revenue | The total revenue generated by... |
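
The location-based analysis described for this dataset could start from something like the sketch below, which groups customers by the City column and averages the Churn Value column (1 for churned, 0 for not churned). The sample rows are invented, and the helper name `churn_rate_by` is hypothetical; only the column names come from the list above.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows; only the column names come from the dataset card.
SAMPLE = """Customer ID,City,Churn Value
A1,Los Angeles,1
A2,Los Angeles,0
A3,San Diego,0
A4,San Diego,0
A5,Los Angeles,1
"""


def churn_rate_by(rows, key):
    """Return the share of churned customers for each distinct value of `key`."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        churned[row[key]] += int(row["Churn Value"])
    return {group: churned[group] / totals[group] for group in totals}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
rates = churn_rate_by(rows, "City")
for city, rate in sorted(rates.items()):
    print(f"{city}: {rate:.2f}")
```

The same helper works for any other categorical column, such as State or Contract, by changing the `key` argument.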
https://creativecommons.org/publicdomain/zero/1.0/
The Telco customer churn dataset contains information about a fictional telco company that provided home phone and Internet services to 7043 customers in California in Q3. It indicates which customers have left, stayed, or signed up for their service. Multiple important demographics are included for each customer, as well as a Satisfaction Score, Churn Score, and Customer Lifetime Value (CLTV) index. It is designed to facilitate analysis of customer behavior and retention strategies.
| Column Name | Description |
|---|---|
| CustomerID | A unique ID that identifies each customer. |
| Gender | The customer’s gender: Male, Female. |
| Age | The customer’s current age, in years, at the time the fiscal quarter ended. |
| Senior Citizen | Indicates if the customer is 65 or older: Yes, No. |
| Married | Indicates if the customer is married: Yes, No. |
| Dependents | Indicates if the customer lives with any dependents: Yes, No. |
| Number of Dependents | Indicates the number of dependents that live with the customer. |

| Column Name | Description |
|---|---|
| CustomerID | A unique ID that identifies each customer. |
| Country | The country of the customer’s primary residence. |
| State | The state of the customer’s primary residence. |
| City | The city of the customer’s primary residence. |
| Zip Code | The zip code of the customer’s primary residence. |
| Total Population | A current population estimate for the entire Zip Code area. |
| Latitude | The latitude of the customer’s primary residence. |
| Longitude | The longitude of the customer’s primary residence. |

| Column Name | Description |
|---|---|
| CustomerID | A unique ID that identifies each customer. |
| Phone Service | Indicates if the customer subscribes to home phone service with the company: Yes, No |
| Internet Service | Indicates if the customer subscribes to Internet service with the company: No, DSL, Fiber Optic, Cable. |
| Online Security | Indicates if the customer subscribes to an additional online security service provided by the company: Yes, No |
| Online Backup | Indicates if the customer subscribes to an additional online backup service provided by the company: Yes, No |
| Device Protection Plan | Indicates if the customer subscribes to an additional device protection plan: Yes, No |
| Premium Tech Support | Indicates if the customer subscribes to an additional technical support plan: Yes, No |
| Streaming TV | Indicates if the customer uses their Internet service to stream television programming: Yes, No |
| Streaming Movies | Indicates if the customer uses their Internet service to stream movies: Yes, No |
| Streaming Music | Indicates if the customer uses their Internet service to stream music: Yes, No |

| Column Name | Description |
|---|---|
| CustomerID | A unique ID that identifies each customer. ... |
By Yashwanth Sharaff [source]
This dataset contains essential characteristics of a variety of movies, including basic information such as the movie's title and budget, as well as attributes and performance indicators such as MPAA rating, gross revenue, release date, genre, runtime, rating count, and summary. With this dataset we can better understand the film industry and explore how different features and performance metrics relate to one another and to a movie's success. The dataset can also help you decide which features are key indicators of a high-grossing feature film.
To get the most out of this dataset, you need to understand what each column represents. The 'Title' column gives the title of the movie, which can be used for further searches on streaming services and websites that provide detailed information about movies. The 'MPAA Rating' column lists the Motion Picture Association of America (MPAA) rating for a movie, such as G (General Audiences), PG (Parental Guidance Suggested), PG-13 (Parents Strongly Cautioned), or R (Under 17 Requires Accompanying Parent or Guardian). The 'Budget' column gives an approximate idea of how much a production cost, the 'Gross' column shows its earnings if it was released in theaters, and the 'Release Date' column shows when the film was released or is scheduled for release. The 'Genre' column describes the type of movie, 'Runtime' gives its length in minutes, and 'Rating Count' is the number of people who have rated or reviewed the film, whether on IMDb, other streaming services, or print media such as newspapers. Finally, the 'Summary' field gives an overview of what to expect from the film, which is worth reading before watching, especially if there are children in your family.
So go ahead - start exploring this interesting dataset today!
- Creating a box office prediction model using budget, genre, release date and MPAA rating
- Using the summary data to create a sentiment analysis tool for movie reviews
- Building a recommendation engine for users based on their prior ratings and what other users with similar tastes have rated as highly
If you use this dataset in your research, please credit the original authors.
See the dataset description for more information.
File: movies.csv

| Column name | Description |
|:---|:---|
| Title | The title of the movie. (String) |
| MPAA Rating | The Motion Picture Association of America (MPAA) rating of the movie. (String) |
| Budget | The budget of the movie in US dollars. (Integer) |
| Gross | The gross revenue of the movie in US dollars. (Integer) |
| Release Date | The date the movie was released. (Date) |
| Genre | The genre of the movie. (String) |
| Runtime | The length of the movie in minutes. (Integer) |
| Rating Count | The number of ratings the movie has received. (Integer) |
| Summary | A brief summary of the movie. (String) |
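
As a rough illustration of how the movies.csv schema might be loaded, the sketch below converts the columns documented as Integer and derives a gross-to-budget ratio. The two sample rows are invented, the date format shown is an assumption (the description does not specify one, so Release Date is left as a string), and the helper names `parse_movie` and `gross_to_budget` are hypothetical.

```python
import csv
import io

# Invented rows that follow the documented column names; not real data.
SAMPLE = """Title,MPAA Rating,Budget,Gross,Release Date,Genre,Runtime,Rating Count,Summary
Example Film A,PG,1000000,3500000,2001-05-04,Comedy,95,1200,A made-up summary.
Example Film B,R,2000000,1000000,2003-10-17,Drama,120,800,Another made-up summary.
"""

# Columns the schema documents as Integer.
INT_COLUMNS = ("Budget", "Gross", "Runtime", "Rating Count")


def parse_movie(row):
    """Copy a CSV row and convert the documented integer columns to int."""
    movie = dict(row)
    for column in INT_COLUMNS:
        movie[column] = int(row[column])
    return movie


def gross_to_budget(movie):
    """Return gross revenue divided by budget, or None when the budget is zero."""
    if movie["Budget"] == 0:
        return None
    return movie["Gross"] / movie["Budget"]


movies = [parse_movie(row) for row in csv.DictReader(io.StringIO(SAMPLE))]
ratios = {movie["Title"]: gross_to_budget(movie) for movie in movies}
for title, ratio in ratios.items():
    print(title, ratio)
```

A ratio like this is only one possible target for the box office ideas listed earlier; the real file may also contain missing or non-numeric values that this sketch does not handle.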
If you use this dataset in your research, please credit Yashwanth Sharaff.