22 datasets found

IMDB movie details dataset
crawlfeeds.com
csv, zip
Updated Jul 5, 2025
Cite
Crawl Feeds (2025). IMDB movie details dataset [Dataset]. https://crawlfeeds.com/datasets/imdb-movie-details-dataset
Explore at:
Available download formats: zip, csv
Dataset updated
Jul 5, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description

The IMDB Movie Details Dataset is a collection of information about movies, TV shows, and streaming content listed on IMDB. It includes titles, release years, genres, cast, crew, ratings, and more. Applications include data analysis, machine learning projects, predictive modeling, and studies of industry trends.

Researchers can explore patterns in movie ratings and genre popularity, while developers can use the dataset to build recommendation systems or applications. Movie buffs can dive deep into historical and contemporary trends in the world of cinema. This dataset not only supports academic and professional pursuits but also opens doors for creative projects in storytelling, content creation, and audience engagement. Whether you’re a developer, researcher, or film enthusiast, the IMDB movie dataset is a powerful tool for uncovering trends and gaining deeper insights into the evolving entertainment landscape.

IMDB Movie Dataset

kaggle.com

Updated Oct 30, 2024

Cite

Yusuf Delikkaya (2024). IMDB Movie Dataset [Dataset]. https://www.kaggle.com/datasets/yusufdelikkaya/imdb-movie-dataset

Explore at:

Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Oct 30, 2024

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Yusuf Delikkaya

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

The dataset comprises anonymized data on movies available on IMDb, capturing attributes such as genre, rating, and revenue.
It can be used to analyze movie trends, audience preferences, and the effect of attributes such as genre and director on a movie's success.
It can help identify the factors that contribute to high ratings and box office revenue, and show how genre popularity changes over time.
It also supports examining the popularity of actors and directors and understanding critical reception through Metascore.

Features:

Column Name	Description
Rank	The ranking of the movie based on popularity or ratings.
Title	The title of the movie.
Genre	The genre(s) of the movie (e.g., Action, Adventure, Sci-Fi).
Description	A brief description or synopsis of the movie.
Director	The director of the movie.
Actors	The main cast or leading actors in the movie.
Year	The release year of the movie.
Runtime (Minutes)	The runtime of the movie in minutes.
Rating	The IMDb user rating of the movie on a scale from 1 to 10.
Votes	The number of user votes for the movie on IMDb.
Revenue (Millions)	The box office revenue of the movie in millions of dollars.
Metascore	The Metascore of the movie, representing the aggregated critic reviews score on a scale of 1 to 100.
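
As a rough illustration of the genre analysis this description mentions, the sketch below averages the Rating column per genre using only the Python standard library. It assumes the column names from the feature table above and a comma-separated Genre field; the three sample rows are invented placeholders, not records from the dataset.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Invented placeholder rows; only the column names come from the feature table.
SAMPLE_CSV = """Rank,Title,Genre,Year,Rating
1,Example A,"Action,Sci-Fi",2014,8.0
2,Example B,"Action,Drama",2016,6.0
3,Example C,Drama,2015,7.0
"""


def mean_rating_by_genre(csv_text):
    """Return {genre: mean Rating}, counting a movie once per listed genre."""
    ratings = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: multiple genres are comma-separated within one field.
        for genre in row["Genre"].split(","):
            ratings[genre.strip()].append(float(row["Rating"]))
    return {genre: mean(values) for genre, values in ratings.items()}


genre_means = mean_rating_by_genre(SAMPLE_CSV)
print(genre_means)
```

To run this against the real file, the `io.StringIO` wrapper would be replaced with an opened file handle; the file name depends on the download and is not given in the listing.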

IMDb Movies Metadata Dataset – 4.5M Records (Global Coverage)
crawlfeeds.com
csv, zip
Updated Aug 10, 2025
Cite
Crawl Feeds (2025). IMDb Movies Metadata Dataset – 4.5M Records (Global Coverage) [Dataset]. https://crawlfeeds.com/datasets/imdb-movies-metadata-dataset-4-5m-records-global-coverage
Explore at:
Available download formats: csv, zip
Dataset updated
Aug 10, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
Unlock one of the most comprehensive movie datasets available—4.5 million structured IMDb movie records, extracted and enriched for data science, machine learning, and entertainment research.

This dataset includes a vast collection of global movie metadata, including details on title, release year, genre, country, language, runtime, cast, directors, IMDb ratings, reviews, and synopsis. Whether you're building a recommendation engine, benchmarking trends, or training AI models, this dataset is designed to give you deep and wide access to cinematic data across decades and continents.

Perfect for use in film analytics, OTT platforms, review sentiment analysis, knowledge graphs, and LLM fine-tuning, the dataset is cleaned, normalized, and exportable in multiple formats.

What’s Included:

Genres: Drama, Comedy, Horror, Action, Sci-Fi, Documentary, and more

Delivery: Direct download

Use Cases:

Train LLMs or chatbots on cinematic language and metadata

Build or enrich movie recommendation engines

Run cross-lingual or multi-region film analytics

Benchmark genre popularity across time periods

Power academic studies or entertainment dashboards

Feed into knowledge graphs, search engines, or NLP pipelines
imdb.com Traffic Analytics Data
analytics.explodingtopics.com
Updated Jun 1, 2025
Cite
(2025). imdb.com Traffic Analytics Data [Dataset]. https://analytics.explodingtopics.com/website/imdb.com
Explore at:
Dataset updated
Jun 1, 2025
Variables measured
Global Rank, Monthly Visits, Authority Score, US Country Rank, Online Services Category Rank
Description
Traffic analytics, rankings, and competitive metrics for imdb.com as of June 2025
titles and ratings from IMDB
kaggle.com
zip
Updated Jul 23, 2021
Cite
Igor Costa da Silva Estevao de Azevedo (2021). titles and ratings from IMDB [Dataset]. https://www.kaggle.com/igoraazevedo/datasets-from-imdb
Explore at:
Available download formats: zip (4382756 bytes)
Dataset updated
Jul 23, 2021
Authors
Igor Costa da Silva Estevao de Azevedo
Description
Dataset

This dataset was created by Igor Costa da Silva Estevao de Azevedo

Contents

It contains the following files:
‘IMDB Movies Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Nov 13, 2021
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘IMDB Movies Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-imdb-movies-dataset-f301/9b433bd2/?iid=018-445&v=presentation
Explore at:
Dataset updated
Nov 13, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘IMDB Movies Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/harshitshankhdhar/imdb-dataset-of-top-1000-movies-and-tv-shows on 13 November 2021.

--- Dataset description provided by original source is as follows ---

Context

IMDB Dataset of top 1000 movies and tv shows. You can find the EDA Process on - https://www.kaggle.com/harshitshankhdhar/eda-on-imdb-movies-dataset

Please consider upvoting if you found it useful.

Content

Data:
Poster_Link - Link to the poster that IMDb uses
Series_Title - Name of the movie
Released_Year - Year in which the movie was released
Certificate - Certificate earned by the movie
Runtime - Total runtime of the movie
Genre - Genre of the movie
IMDB_Rating - Rating of the movie on the IMDb site
Overview - Mini story/summary
Meta_score - Score earned by the movie
Director - Name of the director
Star1, Star2, Star3, Star4 - Names of the stars
No_of_votes - Total number of votes
Gross - Money earned by the movie

Inspiration

Analysis of a movie's gross vs. directors.

Analysis of a movie's gross vs. different stars.

Analysis of a movie's No_of_votes vs. directors.

Analysis of a movie's No_of_votes vs. different stars.

Which actors prefer which genres?

Which combinations of actors most often get a good IMDB_Rating?

Which combinations of actors get a good gross?

--- Original source retains full ownership of the source dataset ---
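
The first question under "Inspiration" (gross vs. directors) could be approached along the following lines. This is a minimal sketch: the field names Director and Gross come from the description above, but the records are invented placeholders, and the assumption that Gross is stored as text with comma thousands separators (and blank when missing) is not confirmed by the listing.

```python
from collections import defaultdict

# Invented placeholder records; only the field names come from the description.
movies = [
    {"Series_Title": "Example A", "Director": "Director X", "Gross": "1,000,000"},
    {"Series_Title": "Example B", "Director": "Director X", "Gross": "3,000,000"},
    {"Series_Title": "Example C", "Director": "Director Y", "Gross": ""},
]


def parse_gross(text):
    """Parse a gross figure. Assumes comma separators; blank means missing."""
    cleaned = text.replace(",", "").strip()
    return int(cleaned) if cleaned else None


def average_gross_by_director(records):
    """Return {director: mean gross}, skipping records with no gross value."""
    grosses = defaultdict(list)
    for record in records:
        gross = parse_gross(record["Gross"])
        if gross is not None:
            grosses[record["Director"]].append(gross)
    return {name: sum(vals) / len(vals) for name, vals in grosses.items()}


director_gross = average_gross_by_director(movies)
print(director_gross)
```

Directors whose films all lack a gross figure are left out of the result rather than reported as zero, which avoids understating their averages.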
Movie Rating Sites Report
marketreportanalytics.com
doc, pdf, ppt
Updated Apr 10, 2025
+ more versions
Cite
Market Report Analytics (2025). Movie Rating Sites Report [Dataset]. https://www.marketreportanalytics.com/reports/movie-rating-sites-75773
Explore at:
Available download formats: doc, ppt, pdf
Dataset updated
Apr 10, 2025
Dataset authored and provided by
Market Report Analytics
License
https://www.marketreportanalytics.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global movie rating sites market is a dynamic and rapidly evolving sector, driven by the increasing consumption of online streaming services and the growing reliance on user reviews and professional critiques to inform viewing choices. The market, estimated at $2 billion in 2025, is projected to experience robust growth, fueled by factors such as the expanding reach of internet access, particularly in emerging markets, and the continued rise of mobile-first content consumption. Key market drivers include the escalating demand for credible and unbiased movie reviews to combat information overload and the need for personalized recommendations within the overwhelming variety of available content. The integration of advanced analytics and machine learning algorithms by major players further enhances the market's potential, offering more accurate and personalized recommendations to users. Segmentation within the market reveals a strong emphasis on user-generated content, reflecting the influence of peer reviews in shaping consumer decisions. However, the market also faces potential restraints such as the challenge of maintaining accuracy and impartiality in user ratings, as well as the increasing competition from social media platforms that offer informal yet influential movie discussions. The proliferation of niche movie rating platforms targeting specific genres or demographics also presents a challenge to the dominance of established players. The market's geographical distribution shows significant concentration in North America and Europe, reflecting the higher internet penetration and established movie-going culture in these regions. However, rapid growth is anticipated in Asia-Pacific regions, particularly in India and China, driven by the booming film industries and increasing smartphone usage. 
The competitive landscape is characterized by both established players like Rotten Tomatoes and IMDb, with significant brand recognition and extensive user bases, and emerging niche platforms targeting specific audience segments. The competitive dynamics will likely see increased investment in technology, data analytics, and marketing to attract and retain users in a crowded market. Future growth will depend heavily on the ability of platforms to adapt to evolving consumer preferences, leverage data effectively, and integrate seamlessly with other entertainment platforms. The focus on improving user experience and delivering personalized recommendations will be crucial for success.
Breaking Bad IMDb ratings, votes and US views
kaggle.com
zip
Updated Aug 26, 2020
Cite
t2 (2020). Breaking Bad IMDb ratings, votes and US views [Dataset]. https://www.kaggle.com/twintyone/breaking-bad-ratings
Explore at:
Available download formats: zip (1362 bytes)
Dataset updated
Aug 26, 2020
Authors
t2
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

To visualize numerical data episode by episode and to enable comparative analysis with other famous TV shows.

Content

Season number, episode number, title, year, and other numerical data such as IMDb ratings, IMDb votes, and US views.

Acknowledgements

Data collected from https://www.ratingraph.com/tv-shows/breaking-bad-ratings-26165/ and https://www.wikiwand.com/en/List_of_Breaking_Bad_episodes

Inspiration

Saw some cool visualizations on Reddit a few days back but couldn't find them anymore. :(
Data from: imdb
huggingface.co
Updated Aug 3, 2003
+ more versions
Cite
Stanford NLP (2003). imdb [Dataset]. https://huggingface.co/datasets/stanfordnlp/imdb
Explore at:
Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 3, 2003
Dataset authored and provided by
Stanford NLP
License
https://choosealicense.com/licenses/other/
Description
Dataset Card for "imdb"

Dataset Summary

Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

Supported Tasks and Leaderboards

More Information Needed

Languages

More Information Needed

Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/stanfordnlp/imdb.
Data from: imdb
huggingface.co
Updated May 10, 2025
+ more versions
Cite
scikit-learn (2025). imdb [Dataset]. https://huggingface.co/datasets/scikit-learn/imdb
Explore at:
Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 10, 2025
Dataset authored and provided by
scikit-learn
License
https://choosealicense.com/licenses/other/
Description
This is the sentiment analysis dataset based on IMDB reviews initially released by Stanford University. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well. Raw text and already processed bag of words formats are provided. See the README file contained in the release for more… See the full description on the dataset page: https://huggingface.co/datasets/scikit-learn/imdb.
Movie Rating Sites Report
marketreportanalytics.com
doc, pdf, ppt
Updated Apr 10, 2025
+ more versions
Cite
Market Report Analytics (2025). Movie Rating Sites Report [Dataset]. https://www.marketreportanalytics.com/reports/movie-rating-sites-75765
Explore at:
Available download formats: doc, ppt, pdf
Dataset updated
Apr 10, 2025
Dataset authored and provided by
Market Report Analytics
License
https://www.marketreportanalytics.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global movie rating sites market is experiencing robust growth, driven by the increasing consumption of online streaming services and the rising demand for credible film reviews before purchasing tickets or subscribing. The market's expansion is fueled by several factors, including the proliferation of smartphones and internet access, making it easier for users to access rating platforms. Furthermore, the integration of social media features on many platforms fosters engagement and user-generated content, creating a dynamic and interactive ecosystem. The market is segmented by application (movie promotion, movie research, audience choice, and others) and by rating type (user-based, professional-based, and others). While precise market sizing data is unavailable, given the significant presence of established players like Rotten Tomatoes and IMDb, and considering the considerable global viewership of movies, we can estimate the 2025 market size to be approximately $2 billion. This estimation accounts for advertising revenue, premium subscriptions (where applicable), and potential data licensing to film studios and distributors. The projected CAGR suggests continued substantial growth throughout the forecast period (2025-2033), likely driven by technological advancements and the ever-growing global movie-watching audience. However, potential restraints include the risk of biased reviews and the increasing competition from new platforms and emerging technologies like AI-powered recommendation systems. The North American market currently holds a significant share due to the established presence of major players and a large movie-going audience. However, rapid growth is anticipated in the Asia-Pacific region, particularly in countries like India and China, fueled by the expansion of streaming platforms and increasing internet penetration. Europe, with its diverse film culture and established digital infrastructure, also represents a substantial market segment. 
Competitive pressures are intensifying, with existing players continually innovating to enhance user experiences, introduce new features, and attract and retain users in a crowded market. The market's future trajectory will be shaped by the strategic moves of key players, technological disruptions, and evolving consumer preferences regarding how they discover and choose movies to watch. Strategic partnerships and acquisitions could also play a significant role in shaping the market landscape in the coming years.

TMDB Top 260 Movies with IMDb Ratings

kaggle.com

Updated Jun 14, 2025

Cite

102203218_Digvijay_Singh (2025). TMDB Top 260 Movies with IMDb Ratings [Dataset]. https://www.kaggle.com/datasets/diggusingh/top-260-movies-on-tmdb-with-imdb/code

Explore at:

Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Jun 14, 2025

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

102203218_Digvijay_Singh

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

Movies are a powerful lens into culture, emotion, and storytelling. This dataset brings together the top 260 highest-rated movies with enriched metadata from two authoritative sources: TMDb (The Movie Database) and OMDb (Open Movie Database).

It is ideal for researchers, data scientists, and developers working on:
- Movie recommendation systems
- NLP with plot summaries
- Data visualization of film trends
- Sentiment and genre analysis

Overview

Category	Detail
Records	260 top-rated movies based on TMDb user ratings
Timeframe	Includes titles from classic to contemporary cinema
Metadata	Title, Release Year, IMDb Rating, Genre(s), Runtime, Director, Plot
Sources	TMDb API, OMDb API (retrieved via custom Python scripts)
Format	Single CSV file: tmdb_top260_with_imdb.csv

🧾 Column Descriptions

Column Name	Description	Data Type
Title	Official title of the movie	String
Year	Year the movie was released	Integer
IMDb Rating	IMDb user rating (scale of 1–10)	Float
Runtime	Duration of the movie (e.g., "142 min")	String
Genre	Comma-separated list of genres	String
Director	Name(s) of the movie’s director(s)	String
Actors	Leading cast members listed on IMDb	String
Plot	Short summary or synopsis of the storyline	String

Files

- tmdb_top260_with_imdb.csv: Each row represents one film

Key Features

- Multi-source Integration: Combines crowd-sourced user ratings (TMDb) with metadata-rich records (OMDb).
- Diverse Genre Coverage: Drama, thriller, animation, sci-fi, and more.
- Chronological Range: Spans across decades from vintage masterpieces to modern blockbusters.
- Plot Summaries Included: Excellent for NLP projects like topic modeling, keyword extraction, or classification.
- Standardized Format: Clean, ready-to-use data for ML, visualization, or statistical analysis.

Use Cases

This dataset is well-suited for:
- Recommendation Systems: Build hybrid or content-based models using genre, director, and plot.
- Natural Language Processing: Use plot summaries for sentiment analysis or thematic clustering.
- Trend Analysis: Explore how movie length, genres, or ratings evolved over time.
- Director Impact: Analyze how specific filmmakers influence ratings or genre styles.

Licensing

This dataset is released under the Creative Commons Zero (CC0) license. It is free to use for personal, academic, or commercial purposes with no attribution required.

IMDB SQL dataset project
kaggle.com
Updated Jun 11, 2024
Cite
Mithilesh Kale (2024). IMDB SQL dataset project [Dataset]. https://www.kaggle.com/datasets/mithilesh9/sql-dataset-project
Explore at:
Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 11, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Mithilesh Kale
Description
This dataset provides valuable insights derived from an analysis of IMDB movie data, specifically tailored to inform strategic decision-making for film production companies. It offers a comprehensive overview of trends in movie genres, release timing, ratings, top-performing directors and actors, and potential production partners.

The analysis includes:

Monthly Production Trends: Identifies peak production months and average annual output.

Genre Popularity: Analyzes genre popularity based on quantity and average duration.

Rating Distribution: Reveals common rating ranges and target ratings for success.

High-Rated Production Houses: Highlights production houses associated with top-rated films.

Top Directors: Lists directors with a track record of successful films.

Popular Actors: Identifies popular actors with high average ratings and vote counts.

Potential Global Partners: Suggests potential global partners based on audience reach.
250 best ever films analysis
zenodo.org
csv
Updated Nov 10, 2024
Cite
Jorge Martín Rota; Jorge Moreno Fuentes (2024). 250 best ever films analysis [Dataset]. http://doi.org/10.5281/zenodo.14062156
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.14062156
Dataset updated
Nov 10, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jorge Martín Rota; Jorge Moreno Fuentes
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset is a curated collection of IMDb's top 250 movies, capturing the unique qualities that make each film a standout. For each movie, you’ll find details like the title, IMDb rating, genre, release date, director, writers, and actors. This gives a snapshot of what defines each film. There’s also a link to the IMDb page for each movie to make it easy to dive deeper into any title that catches your interest. This dataset is perfect for anyone looking to analyze film trends, explore popular genres, or just get a better understanding of what makes these films so iconic.
Movie Metadata and Reviews
kaggle.com
Updated Jul 6, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Valentina Acevedo Lopez (2024). Movie Metadata and Reviews [Dataset]. https://www.kaggle.com/datasets/valentinaacevedo/movie-metadata-and-reviews
Explore at:
Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 6, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Valentina Acevedo Lopez
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Overview

This dataset contains detailed metadata and user reviews for movies. It includes information such as movie titles, genres, user scores, certifications, metascores, directors, top cast members, plot summaries, and user reviews. The data was scraped from IMDb and may contain some inconsistencies and missing values, making it a great resource for practicing data cleaning and preprocessing.

Columns Description

Name: The title of the movie.

Year: The release year of the movie.

Genres: The genres associated with the movie (e.g., Action, Adventure, Sci-Fi).

Users-Score: Average user score.

Certification: Movie certification rating (e.g., PG-13, R).

Metascore: Metacritic score.

Director: The director of the movie.

Top-Cast: Main cast members.

Plot-Summary: A brief summary of the movie's plot.

Users-Reviews: User-submitted reviews.

Data Cleaning and Preprocessing

The dataset may include the following issues:

Missing Values: Some columns have missing values.

Inconsistent Delimiters: Certain rows may have inconsistent delimiters.

Duplicate Entries: There might be duplicate records.

Formatting Issues: Some columns may contain improperly formatted data.

Steps for Data Cleaning:

Identify and handle missing values.

Correct delimiter issues using text processing techniques.

Remove duplicate records to ensure data integrity.

Standardize formats for categorical variables.
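
The cleaning steps listed above might look roughly like this in plain Python. It is only a sketch under stated assumptions: the column names Name, Year, and Certification come from the column list, while the sample records, the rule that a row without a Name is dropped, the upper-casing of Certification, and the (Name, Year) duplicate key are illustrative choices rather than anything the dataset author prescribes. Delimiter repair is not covered here because it depends on how the raw file is broken.

```python
def clean_records(records):
    """Trim text, mark blanks as missing, standardize labels, drop duplicates."""
    cleaned, seen = [], set()
    for record in records:
        # Missing values: strip whitespace and turn empty strings into None.
        row = {
            key: (value.strip() or None) if isinstance(value, str) else value
            for key, value in record.items()
        }
        # Standardize a categorical column (illustrative choice: upper case).
        if row.get("Certification"):
            row["Certification"] = row["Certification"].upper()
        # Illustrative rule: a row without a title is unusable, so skip it.
        if row.get("Name") is None:
            continue
        # Duplicates: treat the same title and year as the same movie.
        key = (row["Name"].lower(), row.get("Year"))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Invented placeholder records for demonstration only.
raw = [
    {"Name": " Example A ", "Year": "2010", "Certification": "pg-13"},
    {"Name": "Example A", "Year": "2010", "Certification": "PG-13"},
    {"Name": "", "Year": "2011", "Certification": "R"},
    {"Name": "Example B", "Year": "", "Certification": " r "},
]
cleaned_rows = clean_records(raw)
print(cleaned_rows)
```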

Potential Use Cases

Movie Recommendation Systems: Use the metadata to build recommendation algorithms.

Sentiment Analysis: Analyze user reviews to gauge audience sentiment.

Trend Analysis: Explore trends in movie genres, ratings, and user reviews.

License

This dataset is shared under the MIT License. If you use this data, please attribute IMDb as the source.
Film Circulation dataset
zenodo.org
data.niaid.nih.gov
bin, csv, png
Updated Jul 12, 2024
Cite
Skadi Loist; Evgenia (Zhenya) Samoilova (2024). Film Circulation dataset [Dataset]. http://doi.org/10.5281/zenodo.7887672
Explore at:
Available download formats: csv, png, bin
Unique identifier
https://doi.org/10.5281/zenodo.7887672
Dataset updated
Jul 12, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Skadi Loist; Evgenia (Zhenya) Samoilova
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Complete dataset of “Film Circulation on the International Film Festival Network and the Impact on Global Film Culture”

A peer-reviewed data paper for this dataset is in review to be published in NECSUS_European Journal of Media Studies - an open access journal aiming at enhancing data transparency and reusability, and will be available from https://necsus-ejms.org/ and https://mediarep.org

Please cite this when using the dataset.

Detailed description of the dataset:

1 Film Dataset: Festival Programs

The Film Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.

The codebook (csv file “1_codebook_film-dataset_festival-program”) offers a detailed description of all variables within the Film Dataset. Along with the definition of variables it lists explanations for the units of measurement, data sources, coding and information on missing data.

The csv file “1_film-dataset_festival-program_long” comprises a dataset of all films and the festivals, festival sections, and the year of the festival edition that they were sampled from. The dataset is structured in the long format, i.e. the same film can appear in several rows when it appeared in more than one sample festival. However, films are identifiable via their unique ID.

The csv file “1_film-dataset_festival-program_wide” consists of the dataset listing only unique films (n=9,348). The dataset is in the wide format, i.e. each row corresponds to a unique film, identifiable via its unique ID. For easy analysis, and since the overlap is only six percent, in this dataset the variable sample festival (fest) corresponds to the first sample festival where the film appeared. For instance, if a film was first shown at Berlinale (in February) and then at Frameline (in June of the same year), the sample festival will list “Berlinale”. This file includes information on unique and IMDb IDs, the film title, production year, length, categorization in length, production countries, regional attribution, director names, genre attribution, the festival, festival section and festival edition the film was sampled from, and information whether there is festival run information available through the IMDb data.
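
The relationship between the long and wide tables described above can be sketched as a reduction that keeps, for each film, the earliest sample festival. The field names (`film_id`, `fest`, `year`, `month`) and the rows are hypothetical stand-ins chosen for illustration; the actual variable names are defined in the codebook, and the authors' own processing was done in R.

```python
# Hypothetical long-format rows: one row per film per sample festival.
long_rows = [
    {"film_id": 1, "fest": "Frameline", "year": 2013, "month": 6},
    {"film_id": 1, "fest": "Berlinale", "year": 2013, "month": 2},
    {"film_id": 2, "fest": "Frameline", "year": 2013, "month": 6},
]


def long_to_wide(rows):
    """Keep one row per film: the earliest festival appearance by (year, month)."""
    first = {}
    for row in rows:
        current = first.get(row["film_id"])
        if current is None or (row["year"], row["month"]) < (
            current["year"],
            current["month"],
        ):
            first[row["film_id"]] = row
    # Return the rows ordered by film ID so the output is deterministic.
    return [first[film_id] for film_id in sorted(first)]


wide_rows = long_to_wide(long_rows)
print([(row["film_id"], row["fest"]) for row in wide_rows])
```

With these placeholder rows, film 1 is attributed to Berlinale because February precedes June, which mirrors the example given in the description.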

2 Survey Dataset

The Survey Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.

The codebook “2_codebook_survey-dataset” includes coding information for both survey datasets. It lists the definition of the variables or survey questions (corresponding to Samoilova/Loist 2019), units of measurement, data source, variable type, range and coding, and information on missing data.

The csv file “2_survey-dataset_long-festivals_shared-consent” consists of a subset (n=161) of the original survey dataset (n=454), where respondents provided festival run data for films (n=206) and gave consent to share their data for research purposes. This dataset consists of the festival data in a long format, so that each row corresponds to the festival appearance of a film.

The csv file “2_survey-dataset_wide-no-festivals_shared-consent” consists of a subset (n=372) of the original dataset (n=454) of survey responses corresponding to sample films. It includes data only for those films for which respondents provided consent to share their data for research purposes. This dataset is shown in wide format of the survey data, i.e. information for each response corresponding to a film is listed in one row. This includes data on film IDs, film title, survey questions regarding completeness and availability of provided information, information on number of festival screenings, screening fees, budgets, marketing costs, market screenings, and distribution. As the file name suggests, no data on festival screenings is included in the wide format dataset.

3 IMDb & Scripts

The IMDb dataset consists of a data scheme image file, one codebook and eight datasets, all in csv format. It also includes the R scripts that we used for scraping and matching.

The codebook “3_codebook_imdb-dataset” includes information for all IMDb datasets. This includes ID information and their data source, coding and value ranges, and information on missing data.

The csv file “3_imdb-dataset_aka-titles_long” contains film title data in different languages scraped from IMDb in a long format, i.e. each row corresponds to a title in a given language.

The csv file “3_imdb-dataset_awards_long” contains film award data in a long format, i.e. each row corresponds to an award of a given film.

The csv file “3_imdb-dataset_companies_long” contains data on production and distribution companies of films. The dataset is in a long format, so that each row corresponds to a particular company of a particular film.

The csv file “3_imdb-dataset_crew_long” contains data on names and roles of crew members in a long format, i.e. each row corresponds to each crew member. The file also contains binary gender assigned to directors based on their first names using the GenderizeR application.

The csv file “3_imdb-dataset_festival-runs_long” contains festival run data scraped from IMDb in a long format, i.e. each row corresponds to the festival appearance of a given film. The dataset does not include each film screening, but the first screening of a film at a festival within a given year. The data includes festival runs up to 2019.
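
To illustrate the row granularity described above (one row per film, festival, and year, keeping only the first screening), here is a minimal Python sketch. It is not the authors' code; the record layout, film IDs, and dates are hypothetical and serve only to show the idea.

```python
from datetime import date

# Hypothetical screening records: (film_id, festival, screening_date).
screenings = [
    ("f1", "Berlinale", date(2018, 2, 17)),
    ("f1", "Berlinale", date(2018, 2, 15)),
    ("f1", "Berlinale", date(2019, 2, 9)),
    ("f2", "Locarno", date(2017, 8, 3)),
]


def first_screenings(rows):
    """Keep the earliest screening per (film, festival, year)."""
    earliest = {}
    for film_id, festival, shown_on in rows:
        key = (film_id, festival, shown_on.year)
        if key not in earliest or shown_on < earliest[key]:
            earliest[key] = shown_on
    # Return one (film_id, festival, first_date) tuple per key, sorted.
    return sorted((f, fest, d) for (f, fest, _), d in earliest.items())


festival_runs = first_screenings(screenings)
```

In this toy input, the two 2018 Berlinale screenings of film "f1" collapse into a single festival run, while the 2019 appearance stays a separate row.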

The csv file “3_imdb-dataset_general-info_wide” contains general information about films such as genre as defined by IMDb, languages in which a film was shown, ratings, and budget. The dataset is in wide format, so that each row corresponds to a unique film.

The csv file “3_imdb-dataset_release-info_long” contains data about non-festival releases (e.g., theatrical, digital, TV, DVD/Blu-ray). The dataset is in a long format, so that each row corresponds to a particular release of a particular film.

The csv file “3_imdb-dataset_websites_long” contains data on available websites (official websites, miscellaneous, photos, video clips). The dataset is in a long format, so that each row corresponds to a website of a particular film.
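
Because the long files can hold several rows per film while the wide file holds exactly one, combining them usually means aggregating the long data per film first. The following Python sketch shows one way this could be done with the standard library. The column names (`film_id`, `genre`, `award`) and the inline sample data are assumptions for illustration; the actual variable names are documented in the codebook.

```python
import csv
import io
from collections import Counter

# Hypothetical miniature stand-ins for a wide and a long file;
# the real column names are defined in the codebook, not here.
general_info_csv = "film_id,genre\n1,Drama\n2,Comedy\n"
awards_csv = "film_id,award\n1,Best Film\n1,Audience Award\n"

wide_rows = list(csv.DictReader(io.StringIO(general_info_csv)))
long_rows = list(csv.DictReader(io.StringIO(awards_csv)))

# Long format: several rows may share a film_id, so aggregate first.
awards_per_film = Counter(row["film_id"] for row in long_rows)

# Wide format: one row per film, so a per-film count fits as a new column.
for row in wide_rows:
    row["n_awards"] = awards_per_film.get(row["film_id"], 0)
```

Films without any row in the long file simply receive a count of zero, which keeps the wide table at one row per film.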

The dataset includes eight text files containing the scripts for web scraping. They were written using R version 3.6.3 for Windows.

The R script “r_1_unite_data” demonstrates the structure of the dataset that we use in the following steps to identify, scrape, and match the film data.

The R script “r_2_scrape_matches” reads in the dataset with the film characteristics described in “r_1_unite_data” and uses various R packages to create a search URL for each film from the core dataset on the IMDb website. The script attempts to match each film from the core dataset to IMDb records by first conducting an advanced search based on the movie title and year, and then potentially using an alternative title and a basic search if no matches are found in the advanced search. The script scrapes the title, release year, directors, running time, genre, and IMDb film URL from the first page of the suggested records from the IMDb website. The script then defines a loop that matches (including matching scores) each film in the core dataset with suggested films on the IMDb search page. Matching was done on directors, production year (+/- one year), and title, using a fuzzy matching approach with two methods, “cosine” and “osa”: cosine similarity is used to match titles with a high degree of similarity, and the OSA (optimal string alignment) algorithm is used to match titles that may have typos or minor variations.
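
The two title-matching measures named above can be sketched in plain Python. This is an independent illustration, not a port of the R script: the choice of character bigrams (`q=2`) for the cosine measure, the lower-casing, and the function names are assumptions, and the source does not state which thresholds were applied to the scores.

```python
from collections import Counter
from math import sqrt


def qgrams(text, q=2):
    """Count overlapping character q-grams in a lower-cased string."""
    text = text.lower()
    return Counter(text[i:i + q] for i in range(len(text) - q + 1))


def cosine_similarity(a, b, q=2):
    """Cosine similarity between the q-gram count vectors of two strings."""
    va, vb = qgrams(a, q), qgrams(b, q)
    dot = sum(va[g] * vb[g] for g in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def osa_distance(a, b):
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and transpositions of adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def year_matches(core_year, imdb_year, tolerance=1):
    """True if the two years differ by at most `tolerance` (+/- one year by default)."""
    return abs(core_year - imdb_year) <= tolerance
```

A swapped pair of adjacent letters counts as a single edit under OSA, which is why it suits titles with typos, whereas the q-gram cosine score is less sensitive to word order and therefore suits near-identical titles.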

The script “r_3_matching” creates a dataset with the matches for a manual check. Each pair of films (the original film from the core dataset and the suggested match from the IMDb website) was categorized into one of the following five categories: a) 100% match: perfect match on title, year, and director; b) likely good match; c) maybe match; d) unlikely match; and e) no match. The script also checks for possible doubles in the dataset and identifies them for a manual check.

The script “r_4_scraping_functions” creates a function for scraping the data from the identified matches (based on the scripts described above and manually checked). These functions are used for scraping the data in the next script.

The script “r_5a_extracting_info_sample” uses the functions defined in “r_4_scraping_functions” to scrape the IMDb data for the identified matches. This script does so for the first 100 films to check whether everything works. Scraping the entire dataset took a few hours, so a test with a subsample of 100 films is advisable.

The script “r_5b_extracting_info_all” extracts the data for the entire dataset of the identified matches.

The script “r_5c_extracting_info_skipped” checks the films with missing data (where data was not scraped) and tries to extract the data one more time to make sure that the errors were not caused by disruptions in the internet connection or other technical issues.
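
The "scrape everything, then retry what was skipped" pattern described for the r_5b and r_5c scripts can be sketched as follows. This is a generic Python illustration under stated assumptions, not the authors' R code: `scrape_all`, `retry_skipped`, the simulated `flaky_scrape` function, and the film IDs are all hypothetical.

```python
def scrape_all(film_ids, scrape_one):
    """Try every film once; return (results, skipped_ids)."""
    results, skipped = {}, []
    for film_id in film_ids:
        try:
            results[film_id] = scrape_one(film_id)
        except Exception:
            # Record the failure instead of aborting the whole run.
            skipped.append(film_id)
    return results, skipped


def retry_skipped(results, skipped, scrape_one):
    """Run one more pass over the skipped films and merge the new results."""
    more, still_skipped = scrape_all(skipped, scrape_one)
    results.update(more)
    return results, still_skipped


# Simulated scraper: the hypothetical ID "tt002" fails once, then succeeds.
failures_left = {"tt002": 1}


def flaky_scrape(film_id):
    if failures_left.get(film_id, 0) > 0:
        failures_left[film_id] -= 1
        raise ConnectionError("simulated dropped connection")
    return {"id": film_id}


results, skipped = scrape_all(["tt001", "tt002"], flaky_scrape)
results, still_skipped = retry_skipped(results, skipped, flaky_scrape)
```

Films that still fail after the second pass remain in `still_skipped`, which separates transient connection problems from records that are genuinely unavailable.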

The script “r_check_logs” is used for troubleshooting and tracking the progress of all of the R scripts used. It reports the number of missing values and errors.

4 Festival Library Dataset

The Festival Library Dataset consists of a data scheme image file, one codebook and one dataset, all in csv format.

The codebook (csv file “4_codebook_festival-library_dataset”) offers a detailed description of all variables within the Library Dataset. It lists the definition of variables, such as location and festival name, and festival categories,
Imdb genre wise Top 50 movies
kaggle.com
Updated Jul 5, 2022
Abhimanyu Kundu (2022). Imdb genre wise Top 50 movies [Dataset]. http://doi.org/10.34740/kaggle/dsv/3904815
Unique identifier
https://doi.org/10.34740/kaggle/dsv/3904815
Dataset updated
Jul 5, 2022
Dataset provided by
Kaggle
Authors
Abhimanyu Kundu
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset contains several csv files, each covering the top 50 movies of a given genre. Fields include the movie's duration, director, rating, number of votes, worldwide gross, and description; they can be used to analyze why certain highly rated movies attracted large audiences.
IMDb Top 5000 TV Shows
kaggle.com
Updated Jun 9, 2025
Tiago Adria Nunes (2025). IMDb Top 5000 TV Shows [Dataset]. https://www.kaggle.com/datasets/tiagoadrianunes/imdb-top-5000-tv-shows/versions/61
Dataset updated
Jun 9, 2025
Dataset provided by
Kaggle
Authors
Tiago Adria Nunes
Description
This dataset brings together the Top 5000 highest-rated TV shows according to IMDb users. It was curated to enable analysis of rating patterns, popularity trends, genres, and other relevant attributes in the TV show landscape.

Data Source: https://developer.imdb.com/non-commercial-datasets/

Processing and Code Repository: https://github.com/TiagoAdriaNunes/imdb_top_5000_tv_shows/blob/main/imdb_tv_shows_analysis.R

Purpose: Inspired by the structure of the "IMDB Top 5000 Movies" dataset, this version focuses exclusively on TV series, offering a solid base for data analysis and visualization projects in the entertainment domain.

Pipeline: https://github.com/TiagoAdriaNunes/imdb_top_5000_tv_shows/blob/main/.github/workflows/imdb-tv-shows-pipeline.yml

Shiny App for Data Visualization: https://tiagoadrianunes.shinyapps.io/IMDB_TOP_5000_TV_SHOWS/

Kaggle Notebook using this dataset: https://www.kaggle.com/code/tiagoadrianunes/imdb-top-5000-tv-shows-notebook

Information courtesy of IMDb (https://www.imdb.com). Used with permission.

See also the Movies version: https://www.kaggle.com/datasets/tiagoadrianunes/imdb-top-5000-movies
IMDb India Movies
kaggle.com
Updated Jun 18, 2021
Adrian McMahon (2021). IMDb India Movies [Dataset]. https://www.kaggle.com/adrianmcmahon/imdb-india-movies/discussion
Dataset updated
Jun 18, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Adrian McMahon
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
India
Description
Task Details

Every dataset has a story, and this set is pulled from IMDb.com and covers all the Indian movies on the platform. Clean the data by removing missing values or filling them with average values; this will make the data easier to work with in your EDA.

Analyze the data and provide some trends.

Year with the best rating

Does the length of a movie have any impact on its rating?

Top 10 movies according to rating, per year and overall.

Number of popular movies released each year.

By number of votes, which movies performed better in rating, per year and overall?

Any other trends or future predictions you may have

Which director directed the most movies

Which actor starred in the movie

Any other trends you can find
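
As a starting point for two of the tasks above (the year with the best rating and the director with the most movies), here is a minimal Python sketch. The rows and the field names (`name`, `year`, `rating`, `director`) are hypothetical; the real file's columns and value formats may differ and would need cleaning first.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical rows; the real file's column names and formats may differ.
movies = [
    {"name": "A", "year": 2001, "rating": 7.5, "director": "X"},
    {"name": "B", "year": 2001, "rating": None, "director": "Y"},
    {"name": "C", "year": 2002, "rating": 8.0, "director": "X"},
    {"name": "D", "year": 2002, "rating": 6.0, "director": "Z"},
]

# Cleaning step: drop rows with a missing rating before averaging.
rated = [m for m in movies if m["rating"] is not None]

# Year with the best mean rating.
ratings_by_year = defaultdict(list)
for m in rated:
    ratings_by_year[m["year"]].append(m["rating"])
mean_by_year = {year: mean(values) for year, values in ratings_by_year.items()}
best_year = max(mean_by_year, key=mean_by_year.get)

# Director with the most movies (counted over all rows, rated or not).
top_director, n_films = Counter(m["director"] for m in movies).most_common(1)[0]
```

Dropping unrated rows is only one of the two cleaning options the task mentions; filling missing ratings with the average instead would leave per-year means that are pulled toward the overall mean.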

Thank you for viewing my dataset; I look forward to seeing some code.
IMDb Film & Series Data Analysis
zenodo.org
csv
Updated Apr 16, 2024
Fernando José Cofiño Gavito (2024). IMDb Film & Series Data Analysis [Dataset]. http://doi.org/10.5281/zenodo.10982158
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.10982158
Dataset updated
Apr 16, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Fernando José Cofiño Gavito
License
Attribution-NonCommercial-ShareAlike 1.0 (CC BY-NC-SA 1.0): https://creativecommons.org/licenses/by-nc-sa/1.0/
License information was derived automatically
Time period covered
Apr 2024
Description
The dataset for this project will contain the following descriptive fields about IMDb films and series, which will make it possible to analyze different trends in the industry: Title, Year, Genres, Directors, Actors, Rating, Reviews, Duration, Type, Episode, Season, Budget, Revenue. I believe these fields are descriptive enough to allow an in-depth analysis of films, series, actors, directors, etc. over time.

· Title: The title of the film or series.

· Year: The year in which the film or series was released.

· Genres: The genre of the film or series (for example, drama, comedy, action, etc.).

· Directors: The director of the film or series.

· Actors: The main actors of the film or series.

· Rating: The IMDb rating of the film or series.

· Reviews: The number of user reviews for the film or series.

· Duration: The length of the film or series in minutes.

· Type: Whether it is a film or a series.

· Episode: The number of episodes, if it is a series.

· Season: The number of seasons, if it is a series.

· Budget: The budget of the film or series.

· Revenue: The revenue of the film or series.

The data in the set cover a period extending from the launch of IMDb in October 1990 to the present month of April 2024.

