https://crawlfeeds.com/privacy_policy
https://creativecommons.org/publicdomain/zero/1.0/
Column Name | Description |
---|---|
Rank | The ranking of the movie based on popularity or ratings. |
Title | The title of the movie. |
Genre | The genre(s) of the movie (e.g., Action, Adventure, Sci-Fi). |
Description | A brief description or synopsis of the movie. |
Director | The director of the movie. |
Actors | The main cast or leading actors in the movie. |
Year | The release year of the movie. |
Runtime (Minutes) | The runtime of the movie in minutes. |
Rating | The IMDb user rating of the movie on a scale from 1 to 10. |
Votes | The number of user votes for the movie on IMDb. |
Revenue (Millions) | The box office revenue of the movie in millions of dollars. |
Metascore | The Metascore of the movie, representing the aggregated critic reviews score on a scale of 1 to 100. |
https://crawlfeeds.com/privacy_policy
Unlock one of the most comprehensive movie datasets available—4.5 million structured IMDb movie records, extracted and enriched for data science, machine learning, and entertainment research.
This dataset includes a vast collection of global movie metadata, including details on title, release year, genre, country, language, runtime, cast, directors, IMDb ratings, reviews, and synopsis. Whether you're building a recommendation engine, benchmarking trends, or training AI models, this dataset is designed to give you deep and wide access to cinematic data across decades and continents.
Perfect for use in film analytics, OTT platforms, review sentiment analysis, knowledge graphs, and LLM fine-tuning, the dataset is cleaned, normalized, and exportable in multiple formats.
Genres: Drama, Comedy, Horror, Action, Sci-Fi, Documentary, and more
Train LLMs or chatbots on cinematic language and metadata
Build or enrich movie recommendation engines
Run cross-lingual or multi-region film analytics
Benchmark genre popularity across time periods
Power academic studies or entertainment dashboards
Feed into knowledge graphs, search engines, or NLP pipelines
Traffic analytics, rankings, and competitive metrics for imdb.com as of June 2025
This dataset was created by Igor Costa da Silva Estevao de Azevedo
It contains the following files:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘IMDB Movies Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/harshitshankhdhar/imdb-dataset-of-top-1000-movies-and-tv-shows on 13 November 2021.
--- Dataset description provided by original source is as follows ---
IMDB Dataset of top 1000 movies and tv shows. You can find the EDA Process on - https://www.kaggle.com/harshitshankhdhar/eda-on-imdb-movies-dataset
Please consider upvoting if you find it useful.
Data:
- Poster_Link: link to the poster that IMDb uses
- Series_Title: name of the movie
- Released_Year: year in which the movie was released
- Certificate: certificate earned by the movie
- Runtime: total runtime of the movie
- Genre: genre of the movie
- IMDB_Rating: rating of the movie on IMDb
- Overview: mini story or summary
- Meta_score: score earned by the movie
- Director: name of the director
- Star1, Star2, Star3, Star4: names of the stars
- No_of_votes: total number of votes
- Gross: money earned by the movie
--- Original source retains full ownership of the source dataset ---
https://www.marketreportanalytics.com/privacy-policy
The global movie rating sites market is a dynamic and rapidly evolving sector, driven by the increasing consumption of online streaming services and the growing reliance on user reviews and professional critiques to inform viewing choices. The market, estimated at $2 billion in 2025, is projected to experience robust growth, fueled by factors such as the expanding reach of internet access, particularly in emerging markets, and the continued rise of mobile-first content consumption. Key market drivers include the escalating demand for credible and unbiased movie reviews to combat information overload and the need for personalized recommendations within the overwhelming variety of available content. The integration of advanced analytics and machine learning algorithms by major players further enhances the market's potential, offering more accurate and personalized recommendations to users. Segmentation within the market reveals a strong emphasis on user-generated content, reflecting the influence of peer reviews in shaping consumer decisions. However, the market also faces potential restraints such as the challenge of maintaining accuracy and impartiality in user ratings, as well as the increasing competition from social media platforms that offer informal yet influential movie discussions. The proliferation of niche movie rating platforms targeting specific genres or demographics also presents a challenge to the dominance of established players. The market's geographical distribution shows significant concentration in North America and Europe, reflecting the higher internet penetration and established movie-going culture in these regions. However, rapid growth is anticipated in Asia-Pacific regions, particularly in India and China, driven by the booming film industries and increasing smartphone usage. 
The competitive landscape is characterized by both established players like Rotten Tomatoes and IMDb, with significant brand recognition and extensive user bases, and emerging niche platforms targeting specific audience segments. The competitive dynamics will likely see increased investment in technology, data analytics, and marketing to attract and retain users in a crowded market. Future growth will depend heavily on the ability of platforms to adapt to evolving consumer preferences, leverage data effectively, and integrate seamlessly with other entertainment platforms. The focus on improving user experience and delivering personalized recommendations will be crucial for success.
https://creativecommons.org/publicdomain/zero/1.0/
To visualize numerical data episode by episode and to compare it with other famous TV shows.
# of season, # of episode, title, year, and other numerical data such as IMDb ratings, IMDb votes, US views
Data collected from https://www.ratingraph.com/tv-shows/breaking-bad-ratings-26165/ and https://www.wikiwand.com/en/List_of_Breaking_Bad_episodes
I saw some cool visualizations on Reddit a few days back but couldn't find them anymore. :(
https://choosealicense.com/licenses/other/
Dataset Card for "imdb"
Dataset Summary
Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.
Supported Tasks and Leaderboards
More Information Needed
Languages
More Information Needed
Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/stanfordnlp/imdb.
https://choosealicense.com/licenses/other/
This is the sentiment analysis dataset based on IMDB reviews initially released by Stanford University. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well. Raw text and already processed bag of words formats are provided. See the README file contained in the release for more… See the full description on the dataset page: https://huggingface.co/datasets/scikit-learn/imdb.
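As a rough illustration of what a bag-of-words representation of a review looks like, the sketch below counts lowercase word tokens in a single made-up sentence. The tokenization rule and the example review are assumptions for illustration; the vocabulary and indexing used in the released files are described in the dataset's README, not here.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, extract word tokens, and count each token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Made-up review text, not taken from the dataset.
review = "A great film. Great cast, great score."
bow = bag_of_words(review)
```

Word order is discarded in this representation, which is why it pairs naturally with simple linear classifiers for the binary positive/negative task.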
https://www.marketreportanalytics.com/privacy-policy
The global movie rating sites market is experiencing robust growth, driven by the increasing consumption of online streaming services and the rising demand for credible film reviews before purchasing tickets or subscribing. The market's expansion is fueled by several factors, including the proliferation of smartphones and internet access, making it easier for users to access rating platforms. Furthermore, the integration of social media features on many platforms fosters engagement and user-generated content, creating a dynamic and interactive ecosystem. The market is segmented by application (movie promotion, movie research, audience choice, and others) and by rating type (user-based, professional-based, and others). While precise market sizing data is unavailable, given the significant presence of established players like Rotten Tomatoes and IMDb, and considering the considerable global viewership of movies, we can estimate the 2025 market size to be approximately $2 billion. This estimation accounts for advertising revenue, premium subscriptions (where applicable), and potential data licensing to film studios and distributors. The projected CAGR suggests continued substantial growth throughout the forecast period (2025-2033), likely driven by technological advancements and the ever-growing global movie-watching audience. However, potential restraints include the risk of biased reviews and the increasing competition from new platforms and emerging technologies like AI-powered recommendation systems. The North American market currently holds a significant share due to the established presence of major players and a large movie-going audience. However, rapid growth is anticipated in the Asia-Pacific region, particularly in countries like India and China, fueled by the expansion of streaming platforms and increasing internet penetration. Europe, with its diverse film culture and established digital infrastructure, also represents a substantial market segment. 
Competitive pressures are intensifying, with existing players continually innovating to enhance user experiences, introduce new features, and attract and retain users in a crowded market. The market's future trajectory will be shaped by the strategic moves of key players, technological disruptions, and evolving consumer preferences regarding how they discover and choose movies to watch. Strategic partnerships and acquisitions could also play a significant role in shaping the market landscape in the coming years.
https://creativecommons.org/publicdomain/zero/1.0/
Context
Movies are a powerful lens into culture, emotion, and storytelling. This dataset brings together the top 260 highest-rated movies with enriched metadata from two authoritative sources: TMDb (The Movie Database) and OMDb (Open Movie Database).
It is ideal for researchers, data scientists, and developers working on:
- Movie recommendation systems
- NLP with plot summaries
- Data visualization of film trends
- Sentiment and genre analysis
Overview
Category | Detail |
---|---|
Records | 260 top-rated movies based on TMDb user ratings |
Timeframe | Includes titles from classic to contemporary cinema |
Metadata | Title, Release Year, IMDb Rating, Genre(s), Runtime, Director, Plot |
Sources | TMDb API, OMDb API (retrieved via custom Python scripts) |
Format | Single CSV file: tmdb_top260_with_imdb.csv |
Column Name | Description | Data Type |
---|---|---|
Title | Official title of the movie | String |
Year | Year the movie was released | Integer |
IMDb Rating | IMDb user rating (scale of 1–10) | Float |
Runtime | Duration of the movie (e.g., "142 min") | String |
Genre | Comma-separated list of genres | String |
Director | Name(s) of the movie’s director(s) | String |
Actors | Leading cast members listed on IMDb | String |
Plot | Short summary or synopsis of the storyline | String |
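Because Runtime and Genre are stored as strings, most analyses need a small parsing step first. The sketch below shows one way to do that with the standard library. The two sample rows are invented for illustration and use only a subset of the documented columns; the helper names `parse_runtime` and `split_genres` are hypothetical.

```python
import csv
import io
from collections import Counter

# Two made-up rows in the documented column layout; values are illustrative only.
SAMPLE = '''Title,Year,IMDb Rating,Runtime,Genre
Example Film A,1994,9.3,142 min,"Drama, Crime"
Example Film B,2008,9.0,152 min,"Action, Crime, Drama"
'''


def parse_runtime(value: str) -> int:
    """Turn a string such as '142 min' into the integer 142."""
    return int(value.split()[0])


def split_genres(value: str) -> list[str]:
    """Split a comma-separated genre string into a clean list."""
    return [g.strip() for g in value.split(",") if g.strip()]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
runtimes = [parse_runtime(r["Runtime"]) for r in rows]
genre_counts = Counter(g for r in rows for g in split_genres(r["Genre"]))
```

To run this against the real file, replace `io.StringIO(SAMPLE)` with an opened handle to tmdb_top260_with_imdb.csv; rows with a missing or differently formatted Runtime would need extra handling that this sketch omits.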
Files
- tmdb_top260_with_imdb.csv (each row represents one film)
Key Features
- Multi-source Integration: Combines crowd-sourced user ratings (TMDb) with metadata-rich records (OMDb).
- Diverse Genre Coverage: Drama, thriller, animation, sci-fi, and more.
- Chronological Range: Spans decades, from vintage masterpieces to modern blockbusters.
- Plot Summaries Included: Excellent for NLP projects like topic modeling, keyword extraction, or classification.
- Standardized Format: Clean, ready-to-use data for ML, visualization, or statistical analysis.
Use Cases
This dataset is well-suited for:
- Recommendation Systems: Build hybrid or content-based models using genre, director, and plot.
- Natural Language Processing: Use plot summaries for sentiment analysis or thematic clustering.
- Trend Analysis: Explore how movie length, genres, or ratings evolved over time.
- Director Impact: Analyze how specific filmmakers influence ratings or genre styles.
Licensing
This dataset is released under the Creative Commons Zero (CC0) license. It is free to use for personal, academic, or commercial purposes with no attribution required.
This dataset provides valuable insights derived from an analysis of IMDB movie data, specifically tailored to inform strategic decision-making for film production companies. It offers a comprehensive overview of trends in movie genres, release timing, ratings, top-performing directors and actors, and potential production partners.
The analysis includes:
Monthly Production Trends: Identifies peak production months and average annual output.
Genre Popularity: Analyzes genre popularity based on quantity and average duration.
Rating Distribution: Reveals common rating ranges and target ratings for success.
High-Rated Production Houses: Highlights production houses associated with top-rated films.
Top Directors: Lists directors with a track record of successful films.
Popular Actors: Identifies popular actors with high average ratings and vote counts.
Potential Global Partners: Suggests potential global partners based on audience reach.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is a curated collection of IMDb's top 250 movies, capturing the unique qualities that make each film a standout. For each movie, you’ll find details like the title, IMDb rating, genre, release date, director, writers, and actors. This gives a snapshot of what defines each film. There’s also a link to the IMDb page for each movie to make it easy to dive deeper into any title that catches your interest. This dataset is perfect for anyone looking to analyze film trends, explore popular genres, or just get a better understanding of what makes these films so iconic.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains detailed metadata and user reviews for movies. It includes information such as movie titles, genres, user scores, certifications, metascores, directors, top cast members, plot summaries, and user reviews. The data was scraped from IMDb and may contain some inconsistencies and missing values, making it a great resource for practicing data cleaning and preprocessing.
The dataset may include the following issues:
This dataset is shared under the MIT License. If you use this data, please attribute IMDb as the source.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Complete dataset of “Film Circulation on the International Film Festival Network and the Impact on Global Film Culture”
A peer-reviewed data paper for this dataset is in review to be published in NECSUS_European Journal of Media Studies - an open access journal aiming at enhancing data transparency and reusability, and will be available from https://necsus-ejms.org/ and https://mediarep.org
Please cite this when using the dataset.
Detailed description of the dataset:
1 Film Dataset: Festival Programs
The Film Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.
The codebook (csv file “1_codebook_film-dataset_festival-program”) offers a detailed description of all variables within the Film Dataset. Along with the definition of variables it lists explanations for the units of measurement, data sources, coding and information on missing data.
The csv file “1_film-dataset_festival-program_long” comprises a dataset of all films and the festivals, festival sections, and the year of the festival edition that they were sampled from. The dataset is structured in the long format, i.e. the same film can appear in several rows when it appeared in more than one sample festival. However, films are identifiable via their unique ID.
The csv file “1_film-dataset_festival-program_wide” consists of the dataset listing only unique films (n=9,348). The dataset is in the wide format, i.e. each row corresponds to a unique film, identifiable via its unique ID. For easy analysis, and since the overlap is only six percent, in this dataset the variable sample festival (fest) corresponds to the first sample festival where the film appeared. For instance, if a film was first shown at Berlinale (in February) and then at Frameline (in June of the same year), the sample festival will list “Berlinale”. This file includes information on unique and IMDb IDs, the film title, production year, length, categorization in length, production countries, regional attribution, director names, genre attribution, the festival, festival section and festival edition the film was sampled from, and information whether there is festival run information available through the IMDb data.
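The relationship between the long and wide tables can be sketched as follows: the wide table keeps one row per unique film ID and records the first sample festival at which the film appeared. Only the variable name `fest` comes from the description above; the other field names (`film_id`, `year`, `month`) and the rows themselves are assumptions made for illustration, so check the codebook for the real variable names.

```python
# Long-format rows: one row per film-festival appearance (illustrative values).
long_rows = [
    {"film_id": 1, "fest": "Frameline", "year": 2018, "month": 6},
    {"film_id": 1, "fest": "Berlinale", "year": 2018, "month": 2},
    {"film_id": 2, "fest": "Frameline", "year": 2018, "month": 6},
]


def to_wide(rows):
    """Keep one row per film: the chronologically first sample festival."""
    wide = {}
    for row in sorted(rows, key=lambda r: (r["year"], r["month"])):
        # setdefault only stores the first row seen for each film ID.
        wide.setdefault(row["film_id"], row)
    return wide


wide = to_wide(long_rows)
```

Film 1 appears twice in the long rows but once in `wide`, attributed to Berlinale because February precedes June of the same year.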
2 Survey Dataset
The Survey Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.
The codebook “2_codebook_survey-dataset” includes coding information for both survey datasets. It lists the definition of the variables or survey questions (corresponding to Samoilova/Loist 2019), units of measurement, data source, variable type, range and coding, and information on missing data.
The csv file “2_survey-dataset_long-festivals_shared-consent” consists of a subset (n=161) of the original survey dataset (n=454), where respondents provided festival run data for films (n=206) and gave consent to share their data for research purposes. This dataset consists of the festival data in a long format, so that each row corresponds to the festival appearance of a film.
The csv file “2_survey-dataset_wide-no-festivals_shared-consent” consists of a subset (n=372) of the original dataset (n=454) of survey responses corresponding to sample films. It includes data only for those films for which respondents provided consent to share their data for research purposes. This dataset is shown in wide format of the survey data, i.e. information for each response corresponding to a film is listed in one row. This includes data on film IDs, film title, survey questions regarding completeness and availability of provided information, information on number of festival screenings, screening fees, budgets, marketing costs, market screenings, and distribution. As the file name suggests, no data on festival screenings is included in the wide format dataset.
3 IMDb & Scripts
The IMDb dataset consists of a data scheme image file, one codebook and eight datasets, all in csv format. It also includes the R scripts that we used for scraping and matching.
The codebook “3_codebook_imdb-dataset” includes information for all IMDb datasets. This includes ID information and their data source, coding and value ranges, and information on missing data.
The csv file “3_imdb-dataset_aka-titles_long” contains film title data in different languages scraped from IMDb in a long format, i.e. each row corresponds to a title in a given language.
The csv file “3_imdb-dataset_awards_long” contains film award data in a long format, i.e. each row corresponds to an award of a given film.
The csv file “3_imdb-dataset_companies_long” contains data on production and distribution companies of films. The dataset is in a long format, so that each row corresponds to a particular company of a particular film.
The csv file “3_imdb-dataset_crew_long” contains data on names and roles of crew members in a long format, i.e. each row corresponds to one crew member. The file also contains a binary gender variable assigned to directors based on their first names using the GenderizeR application.
The csv file “3_imdb-dataset_festival-runs_long” contains festival run data scraped from IMDb in a long format, i.e. each row corresponds to the festival appearance of a given film. The dataset does not include each film screening, but the first screening of a film at a festival within a given year. The data includes festival runs up to 2019.
The csv file “3_imdb-dataset_general-info_wide” contains general information about films such as genre as defined by IMDb, languages in which a film was shown, ratings, and budget. The dataset is in wide format, so that each row corresponds to a unique film.
The csv file “3_imdb-dataset_release-info_long” contains data about non-festival releases (e.g., theatrical, digital, TV, DVD/Blu-ray). The dataset is in a long format, so that each row corresponds to a particular release of a particular film.
The csv file “3_imdb-dataset_websites_long” contains data on available websites (official websites, miscellaneous, photos, video clips). The dataset is in a long format, so that each row corresponds to a website of a particular film.
The dataset includes 8 text files containing the scripts for web scraping. They were written using R version 3.6.3 for Windows.
The R script “r_1_unite_data” demonstrates the structure of the dataset, which we use in the following steps to identify, scrape, and match the film data.
The R script “r_2_scrape_matches” reads in the dataset with the film characteristics described in “r_1_unite_data” and uses various R packages to create a search URL on the IMDb website for each film from the core dataset. The script attempts to match each film from the core dataset to IMDb records by first conducting an advanced search based on the movie title and year, and then, if the advanced search finds no matches, falling back to an alternative title and a basic search. The script scrapes the title, release year, directors, running time, genre, and IMDb film URL from the first page of suggested records on the IMDb website. The script then defines a loop that matches each film in the core dataset with the suggested films on the IMDb search page and records matching scores. Matching used data on directors, production year (+/- one year), and title, with a fuzzy matching approach based on two methods, “cosine” and “osa”: cosine similarity is used to match titles with a high degree of similarity, and the OSA (optimal string alignment) algorithm is used to match titles that may have typos or minor variations.
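The two title-matching measures can be sketched in a few lines. This is not the authors' R code: it is a Python approximation that assumes cosine similarity is computed over character bigram counts and that OSA means the restricted edit distance in which adjacent transpositions count as one edit. The function names and the bigram size are assumptions.

```python
from collections import Counter
from math import sqrt


def qgrams(s: str, q: int = 2) -> Counter:
    """Count the overlapping character q-grams of a string."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def cosine_similarity(a: str, b: str, q: int = 2) -> float:
    """Cosine similarity between the q-gram count vectors of two strings."""
    va, vb = qgrams(a, q), qgrams(b, q)
    dot = sum(va[g] * vb[g] for g in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (edits plus adjacent transpositions)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def year_matches(core_year: int, imdb_year: int) -> bool:
    """Accept a candidate whose year is within one year of the core record."""
    return abs(core_year - imdb_year) <= 1
```

A swapped pair of letters, as in "amelie" versus "ameile", costs a single OSA edit, which is why this measure suits titles with typos, while the bigram cosine score stays high for titles that share most of their character sequences.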
The script “r_3_matching” creates a dataset with the matches for a manual check. Each pair of films (the original film from the core dataset and the suggested match from the IMDb website) was assigned to one of the following five categories: a) 100% match: perfect match on title, year, and director; b) likely good match; c) maybe match; d) unlikely match; and e) no match. The script also checks for possible duplicates in the dataset and flags them for a manual check.
The script “r_4_scraping_functions” creates a function for scraping the data from the identified matches (based on the scripts described above and manually checked). These functions are used for scraping the data in the next script.
The script “r_5a_extracting_info_sample” uses the functions defined in “r_4_scraping_functions” to scrape the IMDb data for the identified matches. This script does so for the first 100 films only, to check that everything works. Scraping the entire dataset took a few hours, so a test with a subsample of 100 films is advisable.
The script “r_5b_extracting_info_all” extracts the data for the entire dataset of the identified matches.
The script “r_5c_extracting_info_skipped” checks the films with missing data (where data was not scraped) and tries to extract the data one more time, to make sure that the errors were not caused by disruptions in the internet connection or other technical issues.
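The sample-then-retry workflow described in the three extraction scripts can be sketched as below. This is a Python illustration, not the authors' R code: `scrape_all`, `flaky_fetch`, and the IMDb-style IDs are hypothetical, and the fetch function simulates a single connection dropout instead of contacting any website.

```python
def scrape_all(ids, fetch):
    """Try to fetch every ID; return the results and the IDs that failed."""
    results, skipped = {}, []
    for film_id in ids:
        try:
            results[film_id] = fetch(film_id)
        except Exception:
            skipped.append(film_id)
    return results, skipped


# Stand-in for a real scraper: fails once for one ID, then succeeds.
_failed_once = set()


def flaky_fetch(film_id):
    if film_id == "tt0000002" and film_id not in _failed_once:
        _failed_once.add(film_id)
        raise ConnectionError("simulated dropout")
    return {"id": film_id}


# Hypothetical IDs in an IMDb-like format.
ids = [f"tt{n:07d}" for n in range(1, 201)]

# Trial run on the first 100 films, then one more pass over the skipped ones.
sample_results, sample_skipped = scrape_all(ids[:100], flaky_fetch)
retry_results, still_missing = scrape_all(sample_skipped, flaky_fetch)
```

Anything left in `still_missing` after the second pass is more likely a genuinely absent record than a transient network problem.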
The script “r_check_logs” is used for troubleshooting and tracking the progress of all of the R scripts used. It gives information on the amount of missing values and errors.
4 Festival Library Dataset
The Festival Library Dataset consists of a data scheme image file, one codebook and one dataset, all in csv format.
The codebook (csv file “4_codebook_festival-library_dataset”) offers a detailed description of all variables within the Library Dataset. It lists the definition of variables, such as location and festival name, and festival categories,
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains several CSV files covering the top 50 movies in different genres. Fields such as the movie's duration, director, rating, number of votes, worldwide gross, and description can be used to analyze why certain highly rated movies attracted large audiences.
This dataset brings together the Top 5000 highest-rated TV shows according to IMDb users. It was curated to enable analysis of rating patterns, popularity trends, genres, and other relevant attributes in the TV show landscape.
Data Source: https://developer.imdb.com/non-commercial-datasets/
Processing and Code Repository: https://github.com/TiagoAdriaNunes/imdb_top_5000_tv_shows/blob/main/imdb_tv_shows_analysis.R
Purpose: Inspired by the structure of the "IMDB Top 5000 Movies" dataset, this version focuses exclusively on TV series, offering a solid base for data analysis and visualization projects in the entertainment domain.
Shiny App for Data Visualization: https://tiagoadrianunes.shinyapps.io/IMDB_TOP_5000_TV_SHOWS/
Kaggle Notebook using this dataset: https://www.kaggle.com/code/tiagoadrianunes/imdb-top-5000-tv-shows-notebook
Information courtesy of IMDb (https://www.imdb.com). Used with permission.
See also the Movies version: https://www.kaggle.com/datasets/tiagoadrianunes/imdb-top-5000-movies
https://creativecommons.org/publicdomain/zero/1.0/
Thank you for viewing my dataset. I look forward to seeing some code.
Attribution-NonCommercial-ShareAlike 1.0 (CC BY-NC-SA 1.0) https://creativecommons.org/licenses/by-nc-sa/1.0/
License information was derived automatically
The dataset for this project will contain the following descriptive fields about IMDb movies and series, which will make it possible to analyze different trends in the industry: Title, Year, Genres, Directors, Actors, Rating, Reviews, Duration, Type, Episode, Season, Budget, Revenue. I believe these fields are descriptive enough to allow an in-depth analysis of movies, series, actors, directors, etc. over time.
· Title: The title of the movie or series.
· Year: The year in which the movie or series was released.
· Genres: The genre of the movie or series (for example, drama, comedy, action, etc.).
· Directors: The director of the movie or series.
· Actors: The main actors in the movie or series.
· Rating: The rating of the movie or series on IMDb.
· Reviews: The number of user reviews for the movie or series.
· Duration: The duration of the movie or series in minutes.
· Type: Whether it is a movie or a series.
· Episode: The number of episodes, if it is a series.
· Season: The number of seasons, if it is a series.
· Budget: The budget of the movie or series.
· Revenue: The box office revenue of the movie or series.
The data in the set covers a period that runs from the launch of IMDb in October 1990 to the present month, April 2024.
https://crawlfeeds.com/privacy_policy