MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
What can we say about the success of a movie before it is released? Are there certain companies (Pixar?) that have found a consistent formula? Given that major films costing over $100 million to produce can still flop, this question is more important than ever to the industry. Film aficionados might have different interests. Can we predict which films will be highly rated, whether or not they are a commercial success?
This is a great place to start digging into those questions, with data on the plot, cast, crew, budget, and revenues of several thousand films.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Lĩnh Trần476
Released under Apache 2.0
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset was created by Yueming
Released under Database: Open Database, Contents: Database Contents
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Muhammad_Nauman_k
Released under CC0: Public Domain
http://opendatacommons.org/licenses/dbcl/1.0/
This file contains detailed credit information on cast and crew members for more than 5,000 movies available on The Movie Database (TMDb). The data covers the names and roles of actors, directors, writers and other key crew members in each movie. It provides a comprehensive resource for film industry analysis and cinema history studies.
https://creativecommons.org/publicdomain/zero/1.0/
Welcome to the TMDB 5000 Movie Dataset with Ratings, a comprehensive collection that merges the original TMDb 5000 Movie Dataset with additional user ratings. This dataset offers an extensive exploration of the cinematic world, providing valuable insights for data enthusiasts, researchers, and machine learning practitioners.
tmdb_movie_dataset:
tmdb_movie_credits:
tmdb_movie_ratings:
This dataset is a curated compilation, merging the original TMDb 5000 Movie Dataset with additional user ratings to provide a comprehensive resource for the data science and machine learning community. We express our gratitude to the TMDb community for their valuable contributions.
We welcome feedback and contributions to enhance the dataset. Connect, collaborate, and contribute to make this resource even more valuable for the community.
Explore the cinematic universe through data with the TMDB 5000 Movie Dataset with Ratings!
This dataset was created by sumit kr
This dataset was created by Nitin Kharade
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by MD RAIHAN ALI
Released under CC0: Public Domain
This is a simplified version of the TMDB 5000 Movie Dataset (https://www.kaggle.com/tmdb/tmdb-movie-metadata). See that dataset for more info.
id, status, popularity
genres, keywords, production_companies, production_countries, spoken_languages (replaced JSON-like structures with comma-separated lists of name attributes)
Photo by Felix Mooneeram on Unsplash
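The replacement of JSON-like structures with comma-separated lists of name attributes can be sketched as follows. This is a minimal illustration, not the author's actual code: the helper `names_only`, the sample cell value, and the choice of `", "` as separator are all assumptions.

```python
import json


def names_only(json_text: str) -> str:
    """Turn a JSON array of objects into a comma-separated list of their "name" values.

    Hypothetical helper; empty cells are treated as empty lists.
    """
    items = json.loads(json_text) if json_text else []
    return ", ".join(item["name"] for item in items if "name" in item)


# Made-up cell value in the style of a genres column from the original dataset.
raw_genres = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'
flat_genres = names_only(raw_genres)
print(flat_genres)
```

The same helper could be applied to each of the listed columns in turn, since they share the "list of objects with a name attribute" shape.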
https://creativecommons.org/publicdomain/zero/1.0/
"Introducing the Ultimate Movie Database: Delve into the magic of cinema with our meticulously curated dataset, sourced directly from The Movie Database (TMDb) website. This comprehensive collection is a testament to the art of storytelling, featuring a vast array of films with ratings, original language, popularity, and more. Our inspiration behind this dataset was to create a valuable resource for film enthusiasts, researchers, and data scientists, fostering a deeper understanding of movie trends, audience preferences, and industry evolution. Whether you're analyzing box office hits, exploring directorial styles, or uncovering hidden gems, this dataset opens the door to a world of cinematic exploration. Lights, camera, data – let the analysis begin!"
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset features a comprehensive compilation of over 5,000 top-rated movies scraped from The Movie Database (TMDb) using their official API. TMDb is one of the largest open movie databases on the internet, trusted by millions of users and developers worldwide. This dataset focuses specifically on their top-rated films, ranked by a global audience based on average votes and vote counts.
The dataset was generated programmatically using Python and includes 500 full pages of TMDb’s /movie/top_rated endpoint. Each page returns 20 movies, resulting in 10,000 entries, from which duplicates or low-vote entries can be filtered based on project needs.
🔍 What’s Inside?
For each movie entry, the dataset includes:
🎬 Title — The original title of the film
🗓️ Release Date — The official release date
🌐 Original Language — The ISO language code (e.g., en, fr, ja)
📄 Overview — A short synopsis of the film's plot
⭐ Vote Average — The average rating out of 10
🗳️ Vote Count — Number of votes cast by TMDb users
📈 Popularity — TMDb’s internal popularity score
🆔 Movie ID — Unique TMDb identifier for the movie
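The filtering of duplicates and low-vote entries mentioned earlier can be sketched like this. It is only an illustration under assumptions: the record keys (`id`, `vote_count`, `vote_average`), the helper name `clean_entries`, and the threshold of 100 votes are hypothetical and should be adapted to the actual column names and project needs.

```python
def clean_entries(movies, min_votes=100):
    """Keep the first record per movie id and drop entries with fewer than min_votes votes."""
    seen = set()
    kept = []
    for movie in movies:
        if movie["id"] in seen or movie["vote_count"] < min_votes:
            continue
        seen.add(movie["id"])
        kept.append(movie)
    return kept


# Made-up records standing in for rows of the dataset.
sample = [
    {"id": 1, "title": "A", "vote_count": 2500, "vote_average": 8.7},
    {"id": 1, "title": "A", "vote_count": 2500, "vote_average": 8.7},  # repeated entry
    {"id": 2, "title": "B", "vote_count": 12, "vote_average": 9.1},    # too few votes
    {"id": 3, "title": "C", "vote_count": 840, "vote_average": 8.2},
]
cleaned = clean_entries(sample)
print([m["id"] for m in cleaned])
```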
💡 Why Use This Dataset?
This dataset is ideal for a wide range of projects in:
📊 Data analysis (e.g., trends in top-rated movies over time)
🧠 Machine learning & deep learning (e.g., recommendation systems)
💬 Natural Language Processing (e.g., sentiment analysis on movie overviews)
📈 Visualization (e.g., top genres, ratings by year/language)
🎞️ Film industry insights (e.g., how vote count influences average rating)
With a blend of metadata and user interaction data, it's a perfect dataset for anyone looking to combine storytelling with statistics.
✅ Highlights:
Extracted using the TMDb API with robust error handling and pagination
Clean format with no missing columns
Ready for immediate use in Jupyter Notebooks, Kaggle kernels, or data pipelines
Can be joined with external genre, actor, or production data via id
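Joining on id, as the last highlight suggests, might look like the sketch below. The external `genres_by_id` table, the helper `join_on_id`, and all sample values are hypothetical; only the idea of using the TMDb id as the join key comes from the description.

```python
# Made-up rows standing in for this dataset.
movies = [
    {"id": 101, "title": "Example Film", "vote_average": 8.4},
    {"id": 102, "title": "Another Film", "vote_average": 8.1},
]

# Hypothetical external table keyed by the same TMDb id.
genres_by_id = {101: ["Drama", "Crime"]}


def join_on_id(rows, extra_by_id, field, default=None):
    """Left join: copy each row and attach extra_by_id[row["id"]] under `field`."""
    return [{**row, field: extra_by_id.get(row["id"], default)} for row in rows]


joined = join_on_id(movies, genres_by_id, "genres", default=[])
print(joined)
```

A left join keeps every movie even when the external source has no match, which seems the safer default when the external data may be incomplete.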
Whether you're a film buff, a data scientist looking to build a movie recommender, or a developer training an NLP model — this dataset is your launchpad into the world of data-driven storytelling with cinema.
This dataset was created by SOURAV SAHOO
The TMDB 5000 Movie Dataset is a database of information on over 5000 films including various features such as budget, revenue, cast, directors, production companies, and genre. It is a popular dataset for data analysis and machine learning projects, particularly for natural language processing and recommendation systems. The data is collected from The Movie Database (TMDb), a user-edited database of information on films, TV shows, and other media. The dataset includes information on both popular and lesser-known films and provides a comprehensive overview of the film industry. The data is available for public use, making it a great resource for both researchers and students to practice their data analysis and machine learning skills.
This dataset was created by yiyiyi
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Şükrü Yusuf Kaya
Released under MIT
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Chaitanya Sood
Released under Apache 2.0
This dataset was created by Marcelo Barbosa de Morais
Background
What can we say about the success of a movie before it is released? Are there certain companies (Pixar?) that have found a consistent formula? Given that major films costing over $100 million to produce can still flop, this question is more important than ever to the industry. Film aficionados might have different interests. Can we predict which films will be highly rated, whether or not they are a commercial success?
This is a great place to start digging into those questions, with data on the plot, cast, crew, budget, and revenues of several thousand films.
Data Source Transfer Details
Several of the new columns contain JSON. You can save a bit of time by porting the load data functions from this kernel.
Even simple fields like runtime may not be consistent across versions. For example, the previous dataset shows the duration of Avatar's extended cut, while TMDB shows the runtime of the original version.
There's now a separate file containing the full credits for both the cast and crew.
All fields are filled out by users so don't expect them to agree on keywords, genres, ratings, or the like. Your existing kernels will continue to render normally until they are re-run. If you are curious about how this dataset was prepared, the code to access TMDb's API is posted here.
New columns: homepage, id, original_title, overview, popularity, production_companies, production_countries, release_date, spoken_languages, status, tagline, vote_average
Lost columns: actor_1_facebook_likes, actor_2_facebook_likes, actor_3_facebook_likes, aspect_ratio, cast_total_facebook_likes, color, content_rating, director_facebook_likes, facenumber_in_poster, movie_facebook_likes, movie_imdb_link, num_critic_for_reviews, num_user_for_reviews
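Because several of the new columns contain JSON, a loader has to decode them after reading the CSV. The sketch below is not the load function from the kernel mentioned above; it is a minimal stand-in using only the standard library, and the function name `load_movies`, the chosen `JSON_COLUMNS`, and the in-memory sample file are assumptions for illustration.

```python
import csv
import io
import json

# Assumed subset of the JSON-encoded columns; extend as needed.
JSON_COLUMNS = ("genres", "production_companies")


def load_movies(csv_file, json_columns=JSON_COLUMNS):
    """Read CSV rows and decode the JSON-encoded columns into Python lists."""
    rows = []
    for row in csv.DictReader(csv_file):
        for col in json_columns:
            row[col] = json.loads(row[col]) if row.get(col) else []
        rows.append(row)
    return rows


# Made-up in-memory file; in practice this would be an opened CSV file.
sample_csv = io.StringIO(
    'id,original_title,genres,production_companies\n'
    '1,Example,"[{""id"": 18, ""name"": ""Drama""}]","[]"\n'
)
loaded = load_movies(sample_csv)
print(loaded[0]["genres"])
```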
This dataset was created by Nazima