CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
IMDb hosts a wealth of information about movies, TV shows, and mini-series, including air dates, ratings, descriptions, directors, and more. For TV shows, it also provides details on every episode. Because the site is so consistently structured, it lends itself well to data collection, making it straightforward to build a data set of the TV shows on IMDb. People often hype up their favorite shows week after week, and it would be great to examine how those shows actually perform in comparison to one another. The objective of this project is therefore to collect data on the episodes of every show in IMDb's Top 250 TV shows, and then analyze it to extract insights on how they perform.
The three data sets focus mostly on the Top 250 shows on IMDb. Information about the Top 250 shows and their rating-based ranking was collected first, followed by episode-level data for each show to build a large database for in-depth analysis. Another data set covering the top 1000 episodes on IMDb was then collected to show how much the Top 250 shows contribute towards producing prolific content. The two sub-folders contain a CSV file of episode data for each show.
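As a quick illustration of the kind of analysis this structure supports, the sketch below loads the per-show episode CSVs from a sub-folder and ranks shows by their average episode rating. The folder path and the `rating` column name are assumptions made for illustration, not the repository's actual layout.

```python
# Minimal sketch: rank shows by average episode rating.
# Assumes a sub-folder of per-show episode CSVs, each with a "rating" column;
# the path and column names are hypothetical, not the actual file layout.
from pathlib import Path

import pandas as pd

EPISODE_DIR = Path("episodes")  # hypothetical sub-folder of per-show CSVs

frames = []
for csv_path in EPISODE_DIR.glob("*.csv"):
    df = pd.read_csv(csv_path)
    df["show"] = csv_path.stem  # use the file name as the show identifier
    frames.append(df)

episodes = pd.concat(frames, ignore_index=True)

# Average episode rating per show, highest first.
avg_ratings = (
    episodes.groupby("show")["rating"]
    .mean()
    .sort_values(ascending=False)
)
print(avg_ratings.head(10))
```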
Here is a link to the GitHub Repository that contains the code that was used to collect data.
Here is a link to the dashboard that was built on Tableau using this data.
After watching too many TV shows and keeping track of their IMDb ratings, I was inspired to do research into how the best TV shows perform.
The data was collected on February 13, 2022.
The City of Tempe Building Safety Division dashboard provides a visual roadmap and detailed data on permits for various projects issued throughout Tempe by the Building Safety divisions. Data can be filtered and searched in several ways, including by area, location, type, category, and project valuation. This data helps city officials and stakeholders make decisions on how best to support the city's residents and needs.

This dashboard contains data extracted from multiple tables that are part of the City of Tempe's instance of the Accela Civic Platform. Data is extracted from the Accela Civic Platform, transformed to meet the minimum requirements of the Building and Land Data Specification (BLDS), and loaded into the final form shown here.

The permits issued data can be found here: Permits Issued by Building Safety
View this dashboard in a new window: Permits Issued by Building Safety Dashboard

Additional Information:
Source: Accela
Contact: Jacob Payne
Contact E-Mail: Jacob_Payne@tempe.gov
Data Source Type: Table
Preparation Method: Manual
Publish Frequency: Weekly
Publish Method: Manual
Data Dictionary
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
View the data
For best results:
* View the dashboard in full screen.
* Use Chrome or Firefox as your browser.

Read the data
Data views
There are two views in this dashboard. You can toggle between them by clicking the button on the top right of the dashboard. The views are:
* Crime summary view
* Crime details view

Viewing modes
There are two modes for viewing this dashboard. You can toggle between them by clicking the button. The modes are:
* Dark
* Light

Search the data
Crime summary view
The search options allow you to select:
* Location: Options are citywide, each of the precincts, each of the wards, or each of the neighborhoods.
* Select Crime: Select a type of crime to display.
* Select Chart: Select a way to display the crime data.

Crime detail view
The search options allow you to select:
* Date range: Select a custom date range.
* Location: Options are citywide, each of the precincts, each of the wards, or each of the neighborhoods.
* Select Type: Select a type of crime.
* Select Categories: Select one or more categories of crime to display.
* Select Details: Select one or more details to filter the data displayed.
* Select Chart: Select a way to display the crime data.

View dashboard data definitions and detailed directions
View the open data set
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a cleaned version of the Chicago Crime Dataset, which can be found here. All rights to the dataset belong to the original owners. The purpose of this dataset is to showcase my skills in building visualizations and dashboards; specifically, I will create a dashboard that lets users apply filters to see metrics for a specific crime within a given year. Because of this, the focus is not on analyzing the data itself, but there are sections discussing the validity of the dataset, the steps I took to clean the data, and how I organized it. The cleaned datasets can be found below, the query (which uses BigQuery) can be found here, and the Tableau dashboard can be found here.
The dataset comes directly from the City of Chicago's website, under the "City Data Catalog" page. The data is gathered directly from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system and is updated daily to keep the information accurate, which means a record for a given crime may be revised as the case develops. The dataset covers crimes from 2001 up to seven days prior to the current date.
Using the ROCCC method, we can see that:
* The data has high reliability: The data covers the entirety of Chicago over a little more than two decades. It covers all the wards within Chicago and even gives the street names. While we may not know exactly how large the sample size is, I believe the dataset has high reliability since it geographically covers the entirety of Chicago.
* The data has high originality: The dataset comes directly from the Chicago Police Department's own database, so we can say this dataset is original.
* The data is somewhat comprehensive: While we do have important information such as the types of crimes committed and their geographic location, I do not think this gives us proper insight into why these crimes take place. We can pinpoint the location of a crime, but we are limited by the information we have. How hot was the day of the crime? Did the crime take place in a low-income neighborhood? These missing factors prevent us from getting proper insight into why crimes occur, so I would say the dataset is only subpar in how comprehensive it is.
* The data is current: The dataset is updated frequently to include crimes that took place up to seven days prior to the current date, and past records may be updated as more information comes to light. Due to the frequent updates, I believe the data is current.
* The data is cited: As mentioned above, the data is collected directly from the police's CLEAR system, so we can say that the data is cited.
The purpose of this step is to clean the dataset so that there are no outliers in the dashboard. To do this, we are going to do the following:
* Check for any null values and determine whether we should remove them.
* Update any values where there may be typos.
* Check for outliers and determine if we should remove them.
The following steps are explained in the code segments below. (I used BigQuery for this, so the code follows BigQuery's SQL syntax.)

```
-- Preview the first 1,000 rows to get a feel for the table.
SELECT
  *
FROM
  `portfolioproject-350601.ChicagoCrime.Crime`
LIMIT 1000;

-- Find rows where any of the key columns are NULL.
SELECT
  *
FROM
  `portfolioproject-350601.ChicagoCrime.Crime`
WHERE
  unique_key IS NULL OR
  case_number IS NULL OR
  date IS NULL OR
  primary_type IS NULL OR
  location_description IS NULL OR
  arrest IS NULL OR
  longitude IS NULL OR
  latitude IS NULL;

-- Remove those rows, since records missing key fields cannot be used in the dashboard.
DELETE FROM
  `portfolioproject-350601.ChicagoCrime.Crime`
WHERE
  unique_key IS NULL OR
  case_number IS NULL OR
  date IS NULL OR
  primary_type IS NULL OR
  location_description IS NULL OR
  arrest IS NULL OR
  longitude IS NULL OR
  latitude IS NULL;

-- Check for duplicate records by counting occurrences of each unique_key
-- (the original query was truncated here; the GROUP BY / HAVING clauses are assumed).
SELECT
  unique_key,
  COUNT(unique_key) AS occurrences
FROM
  `portfolioproject-350601.ChicagoCrime.Crime`
GROUP BY
  unique_key
HAVING
  COUNT(unique_key) > 1;
```
Update, Autumn 2024: We have now published an interactive dashboard which is designed to provide typical average daily flows by month or by site for the purposes of long-term trend monitoring. This approach to data provision will enable users to access data in a more timely fashion, as the dashboard refreshes on a daily basis. The data in this dashboard has also been cleaned to remove 'non-neutral' and erroneous days of data from average flow calculations. Please see the front page of the dashboard for clarity on what this means. This dashboard is available at the following link: Cambridgeshire & Peterborough Insight – Roads, Transport and Active Travel – Traffic Flows – Traffic Flows Dashboard (cambridgeshireinsight.org.uk)

The background: In spring and summer 2019, a series of smart traffic sensors were installed in Cambridge to monitor the impact of the Mill Road bridge closure. These sensors were installed for approximately 18 months in order to gather data before the closure, during the period when no vehicle traffic was crossing Mill Road Bridge, and after the bridge re-opened. Due to the success of the sensors and the level of insight they make possible, additional sensors have since been installed in more locations across the county. A traffic count sites map showing the locations of the permanent and annually monitored sites across the county, including the Vivacity sensor locations, is available on Cambridgeshire Insight.

The data: Data from the longer-term Vivacity sensors from 2019-2022 is available to download from the bottom of this page. The Vivacity sensor network grew considerably during 2022 and, as a result, manual uploading of the data is no longer feasible. Consideration is currently being given to methods to streamline and/or automate Vivacity data sharing. The data below provides traffic counts at one-hour intervals, broken down into 8 vehicle categories. Data is provided (with caveats – see bottom of page) from the installation of each sensor up to 31/12/2022. The 8 vehicle categories are: 'Car', 'Pedestrian', 'Cyclist', 'Motorbike', 'Bus', 'OGV1', 'OGV2' and 'LGV'. The counts are broken down into inbound (In) and outbound (Out) journeys. Please see the 'Location List' below to establish which compass directions 'In' and 'Out' refer to for each sensor, as this differs by location. Some sensors record counts across multiple 'countlines', which enables the sensor to provide more accurate counts at different points across the road, for example footways, cycle ways and the carriageway. This is particularly useful for picking up pedestrians. Sensors with multiple countlines often present data for the road, the left-hand side footway (LHS) and the right-hand side footway (RHS) respectively. To determine the total flow, simply aggregate the centre, LHS and RHS countlines (a short sketch of this calculation follows the caveats below). Please note that new countlines have been introduced over time for some sensors, so care should be taken to make sure all necessary countlines are included when calculating a total flow. In some locations the sensor hardware has been replaced and the sensor number has therefore changed (e.g. the Perne Road sensor was originally named "16" but was subsequently replaced and renamed "44"). Please refer to the 'Location List' file, which details the current and previous sensor numbers at each location.

Caveats: 1. Data quality: A Vivacity sensor performance monitoring exercise was undertaken in 2022 to determine the level of accuracy of the Vivacity sensors.
The findings of this exercise are documented in a technical note, which helps to highlight data limitations and provides guidance on how best to work with the Vivacity data. A key finding within the note is that the v1 hardware Vivacity sensors (a small group of older hardware sensors) struggle to accurately count pedestrians and cyclists. As of December 2022, the only sensors that continue to use v1 hardware are on Milton Road (s13), Coleridge Road (s3), Vinery Road (s4), Coldham's Lane (s7), Devonshire Road cycle bridge (s12) and Hills Road (s14). Full details are provided within the technical note.

2. Data gaps: The sensors are designed to capture data 24 hours per day, 7 days per week; however, there are occasions when sensors go down and are unable to capture data, or capture only partial data that is therefore not representative. The Research Group make every effort to remove data believed to be misleading, but this cannot be guaranteed, and the user is responsible for sense-checking the data and excluding anything considered erroneous prior to use. The Research Group exclude days where very low or zero flows have been recorded for the day. Within the spreadsheets, these rows will simply appear blank when downloaded – indicating that the sensor was live and active during this time, but the output is not deemed reliable enough for publication.

3. British Summer Time / clocks changing: The data is provided in hourly intervals in the local time zone. When the clocks go forward at the end of March and back at the end of October, there are therefore missing or duplicate hours within the data. On 27 October 2019, 25 October 2020, and 31 October 2021, all countlines show two separate values for 1am, because the clocks went back at 1am on those dates; as these days were 25 hours long, both instances have been kept in the data for full transparency. Similarly, all countlines on 29 March 2020, 28 March 2021, and 27 March 2022 show no values at all for 1-2am, because the clocks went forward by one hour on those dates, making them 23-hour days.
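As a rough illustration of the total-flow calculation described above, the sketch below sums the counts across all countlines and directions for one sensor to produce a daily total. The file name and column names (`date`, `count`) are assumptions for illustration; the actual spreadsheets may use a different layout.

```python
# Minimal sketch: total daily flow for one sensor by aggregating all countlines.
# The file name and column names below are hypothetical; adjust to the actual spreadsheet layout.
import pandas as pd

df = pd.read_csv("vivacity_sensor_44.csv", parse_dates=["date"])  # hypothetical export

# Sum inbound and outbound counts across every countline (road, LHS and RHS footways)
# to get a single total flow per day; rows with missing counts (unreliable days) are ignored.
total_daily_flow = (
    df.groupby(df["date"].dt.date)["count"]
    .sum()
    .rename("total_flow")
)
print(total_daily_flow.head())
```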
The project from which these data come was designed to take advantage of the recent introduction of commercially available wearable devices designed specifically for the school environment (Moki Technology Ltd. ©, 2021). The device used is a wrist-mounted tri-axial accelerometer without a screen, so pupils' access to their data is controlled by the teacher. The accelerometers return data using proprietary algorithms, including estimated step count and moderate-to-vigorous physical activity (MVPA) averaged over 30-minute blocks. Devices are tapped against a contactless (near-field communication) reader to instantly download data, which are displayed on a teacher-facing dashboard. Moki devices have good external validity and represent a good option for school-based research (Sun et al., 2021). This type of system also makes it possible to conduct research remotely (e.g., during COVID-19).
The current study sought to share physical activity (PA) data from school classes with the class teacher so that they could develop bespoke strategies aimed at increasing PA amongst their pupils. In-school PA was measured within subjects before and after the data-sharing discussions. Pupils aged 8-11 (Year 5-6) in each school were invited to participate and were provided with participant information sheets for themselves and their parents (N = 489).
The dataset includes physical activity data in the form of step counts (Steps) in 30-minute blocks for each anonymised pupil and teacher (Unique ID), along with year and month of birth (Year), gender (Gender), and the date (Date) and time (Time) the step counts were recorded. The data also records which school the pupils come from (School), whether the record belongs to a teacher (Teacher), and whether it was collected during the baseline or intervention period (Baseline or Intervention).
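As an illustration of how these columns fit together, the sketch below compares each pupil's average daily step count between the two study periods. The file name and the exact column labels ("Unique ID", "Steps", "Date", "Teacher", "Period") are assumptions standing in for the fields described above; the published data may name them differently.

```python
# Minimal sketch: average daily steps per pupil, baseline vs. intervention.
# File name and column labels are hypothetical stand-ins for the fields described above.
import pandas as pd

df = pd.read_csv("moki_step_counts.csv", parse_dates=["Date"])

# Keep pupil records only (assuming a boolean Teacher flag).
pupils = df[df["Teacher"] == False]

# Sum the 30-minute blocks into daily totals per pupil and study period.
daily = (
    pupils.groupby(["Unique ID", "Period", pupils["Date"].dt.date])["Steps"]
    .sum()
    .reset_index(name="daily_steps")
)

# Average daily steps per pupil, with the two periods side by side.
avg_by_period = (
    daily.groupby(["Unique ID", "Period"])["daily_steps"]
    .mean()
    .unstack("Period")
)
print(avg_by_period.head())
```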
This dataset provides valuable insights into the usage of the Solana blockchain. The data includes information on blockchain activity, new users, and programs. This data can be used to create dashboards and visualizations to better understand the Solana ecosystem.
The data shows that the Solana ecosystem is growing steadily, with new users and programs being created every week. The data also shows that the majority of users are active on the blockchain, with only a small number of users who have not used the blockchain in recent days.
When exploring the data, you may want to focus on specific columns such as 'Blockchain', 'Creator', 'Address', or 'Label Type'. You can use filters to look at only certain rows of data that are relevant to your research question.
Once you have filtered the data down to a manageable set, you can begin creating visualizations. Visualizations can help you spot trends and patterns that might not be immediately apparent from looking at raw data. For example, if you create a line graph of weekly program creation over time, you might notice an overall upward trend indicating increasing adoption of Solana by developers.
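As a concrete example, a minimal sketch of the line graph described above might look like the following. It assumes the weekly_new_program.csv file uses the WEEK and "New Programs" columns listed in the schema below and that the WEEK values parse as dates.

```python
# Minimal sketch: plot weekly program creation over time from weekly_new_program.csv.
# Assumes the WEEK column parses as a date and the count column is named "New Programs",
# as listed in the file schema below.
import pandas as pd
import matplotlib.pyplot as plt

weekly = pd.read_csv("weekly_new_program.csv", parse_dates=["WEEK"])
weekly = weekly.sort_values("WEEK")

plt.plot(weekly["WEEK"], weekly["New Programs"])
plt.xlabel("Week")
plt.ylabel("New programs created")
plt.title("Weekly new Solana programs")
plt.tight_layout()
plt.show()
```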
File: program_flipside_labels.csv

| Column name | Description |
|:--------------|:------------------------------------------------|
| BLOCKCHAIN | The blockchain the transaction is on. (String) |
| CREATOR | The creator of the transaction. (String) |
| ADDRESS | The address the transaction is from. (String) |
| LABEL_TYPE | The type of label. (String) |
| LABEL_SUBTYPE | The subtype of label. (String) |
| ADDRESS_NAME | The name of the address. (String) |

File: program_solana_fm_labels.csv

| Column name | Description |
|:--------------|:------------------------------------------------|
| ADDRESS | The address the transaction is from. (String) |
| FriendlyName | The name of the blockchain. (String) |
| Abbreviation | The abbreviation for the blockchain. (String) |
| Category | The category the blockchain falls into. (String) |
| Flag | The flag for the blockchain. (String) |
| LogoURI | The logo for the blockchain. (String) |

File: weekly_days_active.csv

| Column name | Description |
|:--------------|:------------------------------------------------|
| CREATION_DATE | The date the account was created. (Date) |
| Days Active | The number of days the account has been active. (Numeric) |

File: weekly_days_since_last_use.csv

| Column name | Description |
|:--------------------|:------------------------------------------------|
| CREATION_DATE | The date the account was created. (Date) |
| Days since last use | The number of days since the account was last used. (Integer) |
| Days since creation | The number of days since the account was created. (Integer) |

File: weekly_new_program.csv

| Column name | Description |
|:--------------|:------------------------------------------------|
| WEEK | The week the data is from. (String) |
| New Programs | The number of new programs on the blockchain. (Integer) |

File: weekly_new_users.csv

| Column name | Description |
|:--------------|:------------------------------------------------|
| WEEK | The week the data is from. (String) |
| NEW_USERS | The number of new users for the week. (Integer) |
By Humanitarian Data Exchange [source]
This dataset contains a range of indicators related to health, health systems, and sustainable development from the World Health Organization's data portal. It covers topics ranging from mortality and global health estimates to essential health technologies, youth engagement, mental health initiatives, and infectious diseases. With data points including publish-state codes and display values, this dataset provides detailed insight into how healthcare is managed around the globe. From tracking malaria outbreaks to exploring international agreements on public healthcare initiatives, it offers a wide array of information for machine learning projects designed to improve our understanding of global healthcare trends. Explore the correlations between different countries' universal healthcare coverage measures or investigate discrepancies between developed and developing nations - unlock deeper insights with the WHO's extensive data!
For more datasets, click here.
Getting Started: First, you need to download the dataset from Kaggle. Once you have it saved on your computer, open it with spreadsheet software such as Excel or Google Sheets.
Exploring the Data: The dataset contains columns that offer information about health indicators in Malaysia, including mortality rates, prevention programs and providers, financing information, human resource information, and more. To explore particular aspects of this data, filter the rows using any of these column values. For example, if you want results for a specific year or region, you can filter by 'year' or 'region' accordingly. It's important to note that some columns are related to each other (e.g., the country code corresponds with the country display name).
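If you prefer to work outside a spreadsheet, the sketch below shows one way such a filter might look in pandas. The file name is the long CSV name listed further down, and the "YEAR (CODE)" and "GHO (DISPLAY)" column labels are taken from the schema excerpt below; the specific filter values are hypothetical, so adjust them to your question.

```python
# Minimal sketch: filter the Malaysia indicators CSV by year and indicator name.
# Column labels follow the schema excerpt below; filter values are hypothetical.
import pandas as pd

df = pd.read_csv(
    "rsud-service-organization-and-delivery-prevention-programs-and-providers-"
    "indicators-for-malaysia-38.csv"
)

# Keep only rows for a given year (YEAR (CODE) may be stored as text, so compare as strings).
year_2019 = df[df["YEAR (CODE)"].astype(str) == "2019"]

# Narrow further to indicators whose display name mentions prevention.
subset = year_2019[
    year_2019["GHO (DISPLAY)"].str.contains("prevention", case=False, na=False)
]
print(subset.head())
```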
Data Outputs:
Using this dataset, users can generate visual representations such as graphs that display trends over time in variables covered by the WHO dashboard summaries, such as human-resources funding rates or pregnancy outcomes. Because the data reflects what member countries like Malaysia report to the WHO at the global level, these outputs can highlight where risk-mitigation efforts and core elements of good governance are still lacking in the pursuit of better quality of life and more efficient healthcare, drawing on the expertise of healthcare professionals.

Conclusion:
- Analysis of health coverage and services in Malaysia, allowing comparison between different public health organizations and assessment of the effect of specific prevention programs.
- Identification of gaps in existing healthcare access, providing a standardized, data-driven reference point to help ensure equitable access across different regions of the country.
- Creation of interactive geographical dashboards that display comparisons among relevant indicators, providing a visual representation of how best to target the distribution of resources for optimal impact.
If you use this dataset in your research, please credit the original authors.
Data Source
See the dataset description for more information.
File: rsud-service-organization-and-delivery-prevention-programs-and-providers-indicators-for-malaysia-38.csv

| Column name | Description |
|:-----------------------|:----------------------------------------------------------------|
| GHO (CODE) | The Global Health Observatory code for the indicator. (String) |
| GHO (DISPLAY) | The name of the indicator. (String) |
| GHO (URL) | The URL for the indicator. (URL) |
| PUBLISHSTATE (CODE) | The code for the publishing state of the indicator. (String) |
| PUBLISHSTATE (DISPLAY) | The name of the publishing state of the indicator. (String) |
| PUBLISHSTATE (URL) | The URL for the publishing state of the indicator. (URL) |
| YEAR (CODE) | The code for... |