22 datasets found
  1. IMDB movie details dataset

    • crawlfeeds.com
    csv, zip
    Updated Jul 5, 2025
    Cite
    Crawl Feeds (2025). IMDB movie details dataset [Dataset]. https://crawlfeeds.com/datasets/imdb-movie-details-dataset
    Explore at:
    zip, csv (available download formats)
    Dataset updated
    Jul 5, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description
    The IMDB Movie Details Dataset is a collection of information about movies, TV shows, and streaming content listed on IMDB. It includes detailed data such as titles, release years, genres, cast, crew, ratings, and more. It is suited to data analysis, machine learning projects, predictive modeling, and the study of industry trends.
    Researchers can explore patterns in movie ratings and genre popularity, while developers can use the dataset to build recommendation systems or applications. Movie buffs can dive deep into historical and contemporary trends in the world of cinema. This dataset not only supports academic and professional pursuits but also opens doors for creative projects in storytelling, content creation, and audience engagement. Whether you’re a developer, researcher, or film enthusiast, the IMDB movie dataset is a powerful tool for uncovering trends and gaining deeper insights into the evolving entertainment landscape.
  2. IMDB Movie Dataset

    • kaggle.com
    Updated Oct 30, 2024
    Cite
    Yusuf Delikkaya (2024). IMDB Movie Dataset [Dataset]. https://www.kaggle.com/datasets/yusufdelikkaya/imdb-movie-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 30, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Yusuf Delikkaya
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description


    • The dataset comprises anonymized data on movies available on IMDb, capturing various aspects such as genre, rating, and revenue.
    • It can be used to analyze movie trends, audience preferences, and the effect of attributes such as genre and director on a movie's success.
    • It can help identify the factors behind high ratings and box office revenue, and show how the popularity of genres changes over time.
    • It also supports examining the popularity of actors and directors and understanding critical reception through Metascore.

    Features:

    • Rank: The ranking of the movie based on popularity or ratings.
    • Title: The title of the movie.
    • Genre: The genre(s) of the movie (e.g., Action, Adventure, Sci-Fi).
    • Description: A brief description or synopsis of the movie.
    • Director: The director of the movie.
    • Actors: The main cast or leading actors in the movie.
    • Year: The release year of the movie.
    • Runtime (Minutes): The runtime of the movie in minutes.
    • Rating: The IMDb user rating of the movie on a scale from 1 to 10.
    • Votes: The number of user votes for the movie on IMDb.
    • Revenue (Millions): The box office revenue of the movie in millions of dollars.
    • Metascore: The Metascore of the movie, representing the aggregated critic reviews score on a scale of 1 to 100.
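
As a rough illustration of working with the columns listed above, the following sketch averages Rating per genre and sums Revenue (Millions). It assumes the file is a CSV whose header uses exactly those column names and that Genre is a comma-separated string; the sample rows are invented placeholders, not records from the dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows: titles and values are invented for illustration only.
SAMPLE_CSV = """Rank,Title,Genre,Year,Rating,Votes,Revenue (Millions),Metascore
1,Film A,"Action,Adventure,Sci-Fi",2014,8.0,1000,100.0,70
2,Film B,"Adventure,Mystery,Sci-Fi",2012,7.0,500,50.0,60
3,Film C,"Horror,Thriller",2016,6.0,200,,55
"""


def mean_rating_by_genre(csv_text):
    """Return {genre: mean Rating}, counting a film once per listed genre."""
    ratings = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        for genre in row["Genre"].split(","):
            ratings[genre.strip()].append(float(row["Rating"]))
    return {genre: sum(vals) / len(vals) for genre, vals in ratings.items()}


def total_revenue(csv_text):
    """Sum 'Revenue (Millions)', skipping rows where the field is blank."""
    total = 0.0
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Revenue (Millions)"]:
            total += float(row["Revenue (Millions)"])
    return total
```

Blank revenue fields are skipped rather than treated as zero, so a missing value does not drag the total down; whether the real file contains such gaps is an assumption.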
  3. IMDb Movies Metadata Dataset – 4.5M Records (Global Coverage)

    • crawlfeeds.com
    csv, zip
    Updated Aug 10, 2025
    Cite
    Crawl Feeds (2025). IMDb Movies Metadata Dataset – 4.5M Records (Global Coverage) [Dataset]. https://crawlfeeds.com/datasets/imdb-movies-metadata-dataset-4-5m-records-global-coverage
    Explore at:
    csv, zip (available download formats)
    Dataset updated
    Aug 10, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    Unlock one of the most comprehensive movie datasets available—4.5 million structured IMDb movie records, extracted and enriched for data science, machine learning, and entertainment research.

    This dataset includes a vast collection of global movie metadata, including details on title, release year, genre, country, language, runtime, cast, directors, IMDb ratings, reviews, and synopsis. Whether you're building a recommendation engine, benchmarking trends, or training AI models, this dataset is designed to give you deep and wide access to cinematic data across decades and continents.

    Perfect for use in film analytics, OTT platforms, review sentiment analysis, knowledge graphs, and LLM fine-tuning, the dataset is cleaned, normalized, and exportable in multiple formats.

    What’s Included:

    • Genres: Drama, Comedy, Horror, Action, Sci-Fi, Documentary, and more

    • Delivery: Direct download

    Use Cases:

    • Train LLMs or chatbots on cinematic language and metadata

    • Build or enrich movie recommendation engines

    • Run cross-lingual or multi-region film analytics

    • Benchmark genre popularity across time periods

    • Power academic studies or entertainment dashboards

    • Feed into knowledge graphs, search engines, or NLP pipelines

  4. imdb.com Traffic Analytics Data

    • analytics.explodingtopics.com
    Updated Jun 1, 2025
    Cite
    (2025). imdb.com Traffic Analytics Data [Dataset]. https://analytics.explodingtopics.com/website/imdb.com
    Explore at:
    Dataset updated
    Jun 1, 2025
    Variables measured
    Global Rank, Monthly Visits, Authority Score, US Country Rank, Online Services Category Rank
    Description

    Traffic analytics, rankings, and competitive metrics for imdb.com as of June 2025

  5. titles and ratings from IMDB

    • kaggle.com
    zip
    Updated Jul 23, 2021
    Cite
    Igor Costa da Silva Estevao de Azevedo (2021). titles and ratings from IMDB [Dataset]. https://www.kaggle.com/igoraazevedo/datasets-from-imdb
    Explore at:
    zip, 4382756 bytes (available download formats)
    Dataset updated
    Jul 23, 2021
    Authors
    Igor Costa da Silva Estevao de Azevedo
    Description

    Dataset

    This dataset was created by Igor Costa da Silva Estevao de Azevedo

    Contents

    It contains the following files:

  6. ‘IMDB Movies Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Nov 13, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘IMDB Movies Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-imdb-movies-dataset-f301/9b433bd2/?iid=018-445&v=presentation
    Explore at:
    Dataset updated
    Nov 13, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘IMDB Movies Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/harshitshankhdhar/imdb-dataset-of-top-1000-movies-and-tv-shows on 13 November 2021.

    --- Dataset description provided by original source is as follows ---

    Context

    IMDB Dataset of top 1000 movies and tv shows. You can find the EDA Process on - https://www.kaggle.com/harshitshankhdhar/eda-on-imdb-movies-dataset

    Please consider UPVOTE if you found it useful.

    Content

    Data:

    • Poster_Link: Link to the poster that IMDb uses
    • Series_Title: Name of the movie
    • Released_Year: Year in which the movie was released
    • Certificate: Certificate earned by the movie
    • Runtime: Total runtime of the movie
    • Genre: Genre of the movie
    • IMDB_Rating: Rating of the movie on the IMDb site
    • Overview: Mini story / summary
    • Meta_score: Score earned by the movie
    • Director: Name of the director
    • Star1, Star2, Star3, Star4: Names of the stars
    • No_of_votes: Total number of votes
    • Gross: Money earned by the movie

    Inspiration

    • Analysis of the gross of a movie vs. directors.
    • Analysis of the gross of a movie vs. different stars.
    • Analysis of the No_of_votes of a movie vs. directors.
    • Analysis of the No_of_votes of a movie vs. different stars.
    • Which actors prefer which genres?
    • Which combinations of actors most often receive a good IMDB_Rating?
    • Which combinations of actors achieve a good gross?
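
The first and third questions above (gross or votes vs. directors) could be approached along these lines. This is a minimal sketch: it reuses the column names from the content list, but the sample records are invented, and storing Gross as a string with thousands separators is an assumption about the file, not something the description states.

```python
from collections import defaultdict

# Hypothetical records using the listed column names; all values are invented.
ROWS = [
    {"Series_Title": "Film A", "Director": "Director X", "Gross": "1,000,000", "No_of_votes": "500"},
    {"Series_Title": "Film B", "Director": "Director X", "Gross": "3,000,000", "No_of_votes": "1500"},
    {"Series_Title": "Film C", "Director": "Director Y", "Gross": "", "No_of_votes": "800"},
]


def parse_int(text):
    """Parse '1,000,000' into 1000000; return None for blank fields."""
    cleaned = text.replace(",", "").strip()
    return int(cleaned) if cleaned else None


def mean_by_director(rows, column):
    """Average a numeric column per director, ignoring blank values."""
    values = defaultdict(list)
    for row in rows:
        number = parse_int(row[column])
        if number is not None:
            values[row["Director"]].append(number)
    return {director: sum(v) / len(v) for director, v in values.items()}
```

Calling `mean_by_director(ROWS, "Gross")` leaves out Director Y entirely, because that director's only film has no gross figure in the sample.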

    --- Original source retains full ownership of the source dataset ---

  7. Movie Rating Sites Report

    • marketreportanalytics.com
    doc, pdf, ppt
    Updated Apr 10, 2025
    + more versions
    Cite
    Market Report Analytics (2025). Movie Rating Sites Report [Dataset]. https://www.marketreportanalytics.com/reports/movie-rating-sites-75773
    Explore at:
    doc, ppt, pdf (available download formats)
    Dataset updated
    Apr 10, 2025
    Dataset authored and provided by
    Market Report Analytics
    License

    https://www.marketreportanalytics.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global movie rating sites market is a dynamic and rapidly evolving sector, driven by the increasing consumption of online streaming services and the growing reliance on user reviews and professional critiques to inform viewing choices. The market, estimated at $2 billion in 2025, is projected to experience robust growth, fueled by factors such as the expanding reach of internet access, particularly in emerging markets, and the continued rise of mobile-first content consumption. Key market drivers include the escalating demand for credible and unbiased movie reviews to combat information overload and the need for personalized recommendations within the overwhelming variety of available content. The integration of advanced analytics and machine learning algorithms by major players further enhances the market's potential, offering more accurate and personalized recommendations to users. Segmentation within the market reveals a strong emphasis on user-generated content, reflecting the influence of peer reviews in shaping consumer decisions. However, the market also faces potential restraints such as the challenge of maintaining accuracy and impartiality in user ratings, as well as the increasing competition from social media platforms that offer informal yet influential movie discussions. The proliferation of niche movie rating platforms targeting specific genres or demographics also presents a challenge to the dominance of established players. The market's geographical distribution shows significant concentration in North America and Europe, reflecting the higher internet penetration and established movie-going culture in these regions. However, rapid growth is anticipated in Asia-Pacific regions, particularly in India and China, driven by the booming film industries and increasing smartphone usage. 
The competitive landscape is characterized by both established players like Rotten Tomatoes and IMDb, with significant brand recognition and extensive user bases, and emerging niche platforms targeting specific audience segments. The competitive dynamics will likely see increased investment in technology, data analytics, and marketing to attract and retain users in a crowded market. Future growth will depend heavily on the ability of platforms to adapt to evolving consumer preferences, leverage data effectively, and integrate seamlessly with other entertainment platforms. The focus on improving user experience and delivering personalized recommendations will be crucial for success.

  8. Breaking Bad IMDb ratings, votes and US views

    • kaggle.com
    zip
    Updated Aug 26, 2020
    Cite
    t2 (2020). Breaking Bad IMDb ratings, votes and US views [Dataset]. https://www.kaggle.com/twintyone/breaking-bad-ratings
    Explore at:
    zip, 1362 bytes (available download formats)
    Dataset updated
    Aug 26, 2020
    Authors
    t2
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    To visualize numerical data episode by episode and to support comparative analysis with other famous TV shows.

    Content

    Season number, episode number, title, year, and other numerical data such as IMDb ratings, IMDb votes, and US views

    Acknowledgements

    Data collected from here https://www.ratingraph.com/tv-shows/breaking-bad-ratings-26165/ https://www.wikiwand.com/en/List_of_Breaking_Bad_episodes

    Inspiration

    Saw some cool visualizations in reddit few days back but couldn't find anymore. :(

  9. Data from: imdb

    • huggingface.co
    Updated Aug 3, 2003
    + more versions
    Cite
    Stanford NLP (2003). imdb [Dataset]. https://huggingface.co/datasets/stanfordnlp/imdb
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 3, 2003
    Dataset authored and provided by
    Stanford NLP
    License

    https://choosealicense.com/licenses/other/

    Description

    Dataset Card for "imdb"

    Dataset Summary

    Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

    Supported Tasks and Leaderboards

    More Information Needed

    Languages

    More Information Needed

    Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/stanfordnlp/imdb.
    
  10. Data from: imdb

    • huggingface.co
    Updated May 10, 2025
    + more versions
    Cite
    scikit-learn (2025). imdb [Dataset]. https://huggingface.co/datasets/scikit-learn/imdb
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 10, 2025
    Dataset authored and provided by
    scikit-learn
    License

    https://choosealicense.com/licenses/other/

    Description

    This is the sentiment analysis dataset based on IMDB reviews initially released by Stanford University. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well. Raw text and already processed bag of words formats are provided. See the README file contained in the release for more… See the full description on the dataset page: https://huggingface.co/datasets/scikit-learn/imdb.
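
To make the "bag of words" idea concrete, here is a minimal sketch of a count-based representation and a deliberately naive binary classifier built on it. The toy reviews are invented, the label convention (1 = positive, 0 = negative) is an assumption, and the scoring rule is only an illustration; it is not the preprocessing shipped with the dataset.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split on non-letters, and count the tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


# Invented toy reviews, not taken from the dataset.
TRAIN = [
    ("a wonderful, moving film", 1),
    ("wonderful cast and a great story", 1),
    ("a dull, boring mess", 0),
    ("boring plot and terrible acting", 0),
]


def train_counts(examples):
    """Accumulate per-class word counts (assumed labels: 1 positive, 0 negative)."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in examples:
        counts[label].update(bag_of_words(text))
    return counts


def predict(counts, text):
    """Score each class by how often it saw the review's words; ties go to 0."""
    bag = bag_of_words(text)
    score = {
        label: sum(seen[word] * n for word, n in bag.items())
        for label, seen in counts.items()
    }
    return 1 if score[1] > score[0] else 0
```

A real experiment would train on the 25,000 labeled training reviews and evaluate on the 25,000 test reviews, most likely with a proper model instead of raw count overlap.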

  11. Movie Rating Sites Report

    • marketreportanalytics.com
    doc, pdf, ppt
    Updated Apr 10, 2025
    + more versions
    Cite
    Market Report Analytics (2025). Movie Rating Sites Report [Dataset]. https://www.marketreportanalytics.com/reports/movie-rating-sites-75765
    Explore at:
    doc, ppt, pdf (available download formats)
    Dataset updated
    Apr 10, 2025
    Dataset authored and provided by
    Market Report Analytics
    License

    https://www.marketreportanalytics.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global movie rating sites market is experiencing robust growth, driven by the increasing consumption of online streaming services and the rising demand for credible film reviews before purchasing tickets or subscribing. The market's expansion is fueled by several factors, including the proliferation of smartphones and internet access, making it easier for users to access rating platforms. Furthermore, the integration of social media features on many platforms fosters engagement and user-generated content, creating a dynamic and interactive ecosystem. The market is segmented by application (movie promotion, movie research, audience choice, and others) and by rating type (user-based, professional-based, and others). While precise market sizing data is unavailable, given the significant presence of established players like Rotten Tomatoes and IMDb, and considering the considerable global viewership of movies, we can estimate the 2025 market size to be approximately $2 billion. This estimation accounts for advertising revenue, premium subscriptions (where applicable), and potential data licensing to film studios and distributors. The projected CAGR suggests continued substantial growth throughout the forecast period (2025-2033), likely driven by technological advancements and the ever-growing global movie-watching audience. However, potential restraints include the risk of biased reviews and the increasing competition from new platforms and emerging technologies like AI-powered recommendation systems. The North American market currently holds a significant share due to the established presence of major players and a large movie-going audience. However, rapid growth is anticipated in the Asia-Pacific region, particularly in countries like India and China, fueled by the expansion of streaming platforms and increasing internet penetration. Europe, with its diverse film culture and established digital infrastructure, also represents a substantial market segment. 
Competitive pressures are intensifying, with existing players continually innovating to enhance user experiences, introduce new features, and attract and retain users in a crowded market. The market's future trajectory will be shaped by the strategic moves of key players, technological disruptions, and evolving consumer preferences regarding how they discover and choose movies to watch. Strategic partnerships and acquisitions could also play a significant role in shaping the market landscape in the coming years.

  12. TMDB Top 260 Movies with IMDb Ratings

    • kaggle.com
    Updated Jun 14, 2025
    Cite
    102203218_Digvijay_Singh (2025). TMDB Top 260 Movies with IMDb Ratings [Dataset]. https://www.kaggle.com/datasets/diggusingh/top-260-movies-on-tmdb-with-imdb/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 14, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    102203218_Digvijay_Singh
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Movies are a powerful lens into culture, emotion, and storytelling. This dataset brings together the top 260 highest-rated movies with enriched metadata from two authoritative sources: TMDb (The Movie Database) and OMDb (Open Movie Database).

    It is ideal for researchers, data scientists, and developers working on:

    • Movie recommendation systems
    • NLP with plot summaries
    • Data visualization of film trends
    • Sentiment and genre analysis

    Overview

    • Records: 260 top-rated movies based on TMDb user ratings
    • Timeframe: Includes titles from classic to contemporary cinema
    • Metadata: Title, Release Year, IMDb Rating, Genre(s), Runtime, Director, Plot
    • Sources: TMDb API, OMDb API (retrieved via custom Python scripts)
    • Format: Single CSV file: tmdb_top260_with_imdb.csv

    🧾 Column Descriptions

    • Title (String): Official title of the movie
    • Year (Integer): Year the movie was released
    • IMDb Rating (Float): IMDb user rating (scale of 1–10)
    • Runtime (String): Duration of the movie (e.g., "142 min")
    • Genre (String): Comma-separated list of genres
    • Director (String): Name(s) of the movie’s director(s)
    • Actors (String): Leading cast members listed on IMDb
    • Plot (String): Short summary or synopsis of the storyline
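
Because Runtime is stored as a string such as "142 min" and Genre as a comma-separated string, numeric or per-genre analysis needs a small parsing step first. The helpers below are a sketch under that assumption; the function names are hypothetical and do not come from the dataset's own scripts.

```python
import re


def runtime_minutes(value):
    """Convert a Runtime string such as "142 min" to an int, or None if it has no digits."""
    match = re.search(r"\d+", value or "")
    return int(match.group()) if match else None


def split_list(value):
    """Split a comma-separated field (Genre, Actors) into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]
```

For example, `runtime_minutes("142 min")` yields the integer 142, and `split_list("Drama, Crime")` yields a two-item list that can be counted per genre.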

    Files

    • tmdb_top260_with_imdb.csv: Each row represents one film

    Key Features

    • Multi-source Integration: Combines crowd-sourced user ratings (TMDb) with metadata-rich records (OMDb).
    • Diverse Genre Coverage: Drama, thriller, animation, sci-fi, and more.
    • Chronological Range: Spans across decades from vintage masterpieces to modern blockbusters.
    • Plot Summaries Included: Excellent for NLP projects like topic modeling, keyword extraction, or classification.
    • Standardized Format: Clean, ready-to-use data for ML, visualization, or statistical analysis.

    Use Cases

    This dataset is well-suited for:

    • Recommendation Systems: Build hybrid or content-based models using genre, director, and plot.
    • Natural Language Processing: Use plot summaries for sentiment analysis or thematic clustering.
    • Trend Analysis: Explore how movie length, genres, or ratings evolved over time.
    • Director Impact: Analyze how specific filmmakers influence ratings or genre styles.

    Licensing

    This dataset is released under the Creative Commons Zero (CC0) license. It is free to use for personal, academic, or commercial purposes with no attribution required.

  13. IMDB SQL dataset project

    • kaggle.com
    Updated Jun 11, 2024
    Cite
    Mithilesh Kale (2024). IMDB SQL dataset project [Dataset]. https://www.kaggle.com/datasets/mithilesh9/sql-dataset-project
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 11, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mithilesh Kale
    Description

    This dataset provides valuable insights derived from an analysis of IMDB movie data, specifically tailored to inform strategic decision-making for film production companies. It offers a comprehensive overview of trends in movie genres, release timing, ratings, top-performing directors and actors, and potential production partners.

    The analysis includes:

    Monthly Production Trends: Identifies peak production months and average annual output.

    Genre Popularity: Analyzes genre popularity based on quantity and average duration.

    Rating Distribution: Reveals common rating ranges and target ratings for success.

    High-Rated Production Houses: Highlights production houses associated with top-rated films.

    Top Directors: Lists directors with a track record of successful films.

    Popular Actors: Identifies popular actors with high average ratings and vote counts.

    Potential Global Partners: Suggests potential global partners based on audience reach.
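
Two of the analyses listed above (monthly production trends, and genre popularity by quantity and average duration) might be expressed in SQL roughly as follows. The sketch runs the queries through Python's built-in sqlite3 module against an in-memory table; the table name, column names, and rows are hypothetical, since the project's actual schema is not described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and invented rows; the real project's tables may differ.
conn.execute(
    "CREATE TABLE movie (title TEXT, date_published TEXT, genre TEXT, duration INTEGER)"
)
conn.executemany(
    "INSERT INTO movie VALUES (?, ?, ?, ?)",
    [
        ("Film A", "2019-03-14", "Drama", 100),
        ("Film B", "2019-03-28", "Drama", 120),
        ("Film C", "2019-07-02", "Comedy", 90),
    ],
)

# Monthly production trend: number of titles released in each calendar month.
per_month = conn.execute(
    "SELECT strftime('%m', date_published) AS month, COUNT(*) "
    "FROM movie GROUP BY month ORDER BY month"
).fetchall()

# Genre popularity: title count and average duration per genre, most titles first.
per_genre = conn.execute(
    "SELECT genre, COUNT(*), AVG(duration) "
    "FROM movie GROUP BY genre ORDER BY COUNT(*) DESC"
).fetchall()
```

The same GROUP BY pattern extends to the other items, for instance grouping by a rating bucket for the rating distribution.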

  14. 250 best ever films analysis

    • zenodo.org
    csv
    Updated Nov 10, 2024
    Cite
    Jorge Martín Rota; Jorge Moreno Fuentes (2024). 250 best ever films analysis [Dataset]. http://doi.org/10.5281/zenodo.14062156
    Explore at:
    csv (available download formats)
    Dataset updated
    Nov 10, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jorge Martín Rota; Jorge Moreno Fuentes
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset is a curated collection of IMDb's top 250 movies, capturing the unique qualities that make each film a standout. For each movie, you’ll find details like the title, IMDb rating, genre, release date, director, writers, and actors. This gives a snapshot of what defines each film. There’s also a link to the IMDb page for each movie to make it easy to dive deeper into any title that catches your interest. This dataset is perfect for anyone looking to analyze film trends, explore popular genres, or just get a better understanding of what makes these films so iconic.

  15. Movie Metadata and Reviews

    • kaggle.com
    Updated Jul 6, 2024
    Cite
    Valentina Acevedo Lopez (2024). Movie Metadata and Reviews [Dataset]. https://www.kaggle.com/datasets/valentinaacevedo/movie-metadata-and-reviews
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Valentina Acevedo Lopez
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Overview

    This dataset contains detailed metadata and user reviews for movies. It includes information such as movie titles, genres, user scores, certifications, metascores, directors, top cast members, plot summaries, and user reviews. The data was scraped from IMDb and may contain some inconsistencies and missing values, making it a great resource for practicing data cleaning and preprocessing.

    Columns Description

    • Name: The title of the movie.
    • Year: The release year of the movie.
    • Genres: The genres associated with the movie (e.g., Action, Adventure, Sci-Fi).
    • Users-Score: Average user score.
    • Certification: Movie certification rating (e.g., PG-13, R).
    • Metascore: Metacritic score.
    • Director: The director of the movie.
    • Top-Cast: Main cast members.
    • Plot-Summary: A brief summary of the movie's plot.
    • Users-Reviews: User-submitted reviews.

    Data Cleaning and Preprocessing

    The dataset may include the following issues:

    • Missing Values: Some columns have missing values.
    • Inconsistent Delimiters: Certain rows may have inconsistent delimiters.
    • Duplicate Entries: There might be duplicate records.
    • Formatting Issues: Some columns may contain improperly formatted data.

    Steps for Data Cleaning:

    • Identify and handle missing values.
    • Correct delimiter issues using text processing techniques.
    • Remove duplicate records to ensure data integrity.
    • Standardize formats for categorical variables.
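
The cleaning steps above can be sketched in plain Python as follows. This is one possible reading, not the dataset author's procedure: it assumes rows are already parsed into dicts keyed by the listed column names, drops rows without a Name, treats (Name, Year) as the duplicate key, and upper-cases Certification as the example of format standardization. Delimiter repair is left out because it depends on how the raw file is broken. The sample rows are invented.

```python
def clean_records(records):
    """Trim fields, drop untitled rows, remove duplicates, normalize Certification."""
    cleaned, seen = [], set()
    for rec in records:
        # Standardize: trim whitespace and turn empty strings into None (missing).
        row = {
            key: ((val.strip() or None) if isinstance(val, str) else val)
            for key, val in rec.items()
        }
        # Standardize a categorical column: upper-case the certification label.
        if row.get("Certification"):
            row["Certification"] = row["Certification"].upper()
        # Handle missing values: a row without a title is unusable, other gaps stay None.
        if not row.get("Name"):
            continue
        # Remove duplicates, keyed on case-insensitive Name plus Year.
        key = (row["Name"].lower(), row.get("Year"))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Invented sample rows for illustration.
RAW = [
    {"Name": " Film A ", "Year": "2020", "Certification": "pg-13", "Metascore": "70"},
    {"Name": "film a", "Year": "2020", "Certification": "PG-13", "Metascore": ""},
    {"Name": "", "Year": "2021", "Certification": "R", "Metascore": "55"},
]
```

Keeping the first occurrence of a duplicate is a design choice; a variant could instead keep whichever copy has the fewest missing fields.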

    Potential Use Cases

    • Movie Recommendation Systems: Use the metadata to build recommendation algorithms.
    • Sentiment Analysis: Analyze user reviews to gauge audience sentiment.
    • Trend Analysis: Explore trends in movie genres, ratings, and user reviews.

    License

    This dataset is shared under the MIT License. If you use this data, please attribute IMDb as the source.

  16. Film Circulation dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, png
    Updated Jul 12, 2024
    Cite
    Skadi Loist; Evgenia (Zhenya) Samoilova (2024). Film Circulation dataset [Dataset]. http://doi.org/10.5281/zenodo.7887672
    Explore at:
    csv, png, bin (available download formats)
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Skadi Loist; Evgenia (Zhenya) Samoilova
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Complete dataset of “Film Circulation on the International Film Festival Network and the Impact on Global Film Culture”

    A peer-reviewed data paper for this dataset is in review to be published in NECSUS_European Journal of Media Studies - an open access journal aiming at enhancing data transparency and reusability, and will be available from https://necsus-ejms.org/ and https://mediarep.org

    Please cite this when using the dataset.


    Detailed description of the dataset:

    1 Film Dataset: Festival Programs

    The Film Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.

    The codebook (csv file “1_codebook_film-dataset_festival-program”) offers a detailed description of all variables within the Film Dataset. Along with the definition of variables it lists explanations for the units of measurement, data sources, coding and information on missing data.

    The csv file “1_film-dataset_festival-program_long” comprises a dataset of all films and the festivals, festival sections, and the year of the festival edition that they were sampled from. The dataset is structured in the long format, i.e. the same film can appear in several rows when it appeared in more than one sample festival. However, films are identifiable via their unique ID.

    The csv file “1_film-dataset_festival-program_wide” consists of the dataset listing only unique films (n=9,348). The dataset is in the wide format, i.e. each row corresponds to a unique film, identifiable via its unique ID. For easy analysis, and since the overlap is only six percent, in this dataset the variable sample festival (fest) corresponds to the first sample festival where the film appeared. For instance, if a film was first shown at Berlinale (in February) and then at Frameline (in June of the same year), the sample festival will list “Berlinale”. This file includes information on unique and IMDb IDs, the film title, production year, length, categorization in length, production countries, regional attribution, director names, genre attribution, the festival, festival section and festival edition the film was sampled from, and information whether there is festival run information available through the IMDb data.
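
The relationship between the long and wide tables described above can be illustrated with a small sketch: each film keeps only the first sample festival at which it appeared. The field names `film_id`, `year`, and `month` are hypothetical stand-ins (only `fest` is named in the description; the codebook defines the real variables), and the rows are invented to mirror the Berlinale/Frameline example.

```python
# Hypothetical long-format rows: one row per film per sample festival.
LONG_ROWS = [
    {"film_id": 1, "fest": "Frameline", "year": 2013, "month": 6},
    {"film_id": 1, "fest": "Berlinale", "year": 2013, "month": 2},
    {"film_id": 2, "fest": "Frameline", "year": 2013, "month": 6},
]


def long_to_wide(rows):
    """Keep one row per film: the earliest sample festival it appeared at."""
    first = {}
    # Sort chronologically so the first row seen for each film is its earliest festival.
    for row in sorted(rows, key=lambda r: (r["year"], r["month"])):
        first.setdefault(row["film_id"], row)
    return [first[film_id] for film_id in sorted(first)]
```

With the sample rows, film 1 ends up attributed to Berlinale because February precedes June, matching the rule described for the wide file.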


    2 Survey Dataset

    The Survey Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.

    The codebook “2_codebook_survey-dataset” includes coding information for both survey datasets. It lists the definition of the variables or survey questions (corresponding to Samoilova/Loist 2019), units of measurement, data source, variable type, range and coding, and information on missing data.

    The csv file “2_survey-dataset_long-festivals_shared-consent” consists of a subset (n=161) of the original survey dataset (n=454), where respondents provided festival run data for films (n=206) and gave consent to share their data for research purposes. This dataset consists of the festival data in a long format, so that each row corresponds to the festival appearance of a film.

    The csv file “2_survey-dataset_wide-no-festivals_shared-consent” consists of a subset (n=372) of the original dataset (n=454) of survey responses corresponding to sample films. It includes data only for those films for which respondents provided consent to share their data for research purposes. This dataset presents the survey data in wide format, i.e. the information for each response corresponding to a film is listed in one row. This includes data on film IDs, film title, survey questions regarding completeness and availability of provided information, information on number of festival screenings, screening fees, budgets, marketing costs, market screenings, and distribution. As the file name suggests, no data on festival screenings is included in the wide format dataset.


    3 IMDb & Scripts

    The IMDb dataset consists of a data scheme image file, one codebook and eight datasets, all in csv format. It also includes the R scripts that we used for scraping and matching.

    The codebook “3_codebook_imdb-dataset” includes information for all IMDb datasets. This includes ID information and their data source, coding and value ranges, and information on missing data.

    The csv file “3_imdb-dataset_aka-titles_long” contains film title data in different languages scraped from IMDb in a long format, i.e. each row corresponds to a title in a given language.

    The csv file “3_imdb-dataset_awards_long” contains film award data in a long format, i.e. each row corresponds to an award of a given film.

    The csv file “3_imdb-dataset_companies_long” contains data on production and distribution companies of films. The dataset is in a long format, so that each row corresponds to a particular company of a particular film.

    The csv file “3_imdb-dataset_crew_long” contains data on names and roles of crew members in a long format, i.e. each row corresponds to one crew member. The file also contains a binary gender variable assigned to directors based on their first names using the GenderizeR application.

    The csv file “3_imdb-dataset_festival-runs_long” contains festival run data scraped from IMDb in a long format, i.e. each row corresponds to the festival appearance of a given film. The dataset does not include each film screening, but the first screening of a film at a festival within a given year. The data includes festival runs up to 2019.

    The csv file “3_imdb-dataset_general-info_wide” contains general information about films such as genre as defined by IMDb, languages in which a film was shown, ratings, and budget. The dataset is in wide format, so that each row corresponds to a unique film.

    The csv file “3_imdb-dataset_release-info_long” contains data about non-festival releases (e.g., theatrical, digital, TV, DVD/Blu-ray). The dataset is in a long format, so that each row corresponds to a particular release of a particular film.

    The csv file “3_imdb-dataset_websites_long” contains data on available websites (official websites, miscellaneous, photos, video clips). The dataset is in a long format, so that each row corresponds to a website of a particular film.

    The dataset includes 8 text files containing the scripts for web scraping. They were written using R version 3.6.3 for Windows.

    The R script “r_1_unite_data” demonstrates the structure of the dataset that we use in the following steps to identify, scrape, and match the film data.

    The R script “r_2_scrape_matches” reads in the dataset with the film characteristics described in “r_1_unite_data” and uses various R packages to create a search URL on the IMDb website for each film from the core dataset. The script attempts to match each film from the core dataset to IMDb records by first conducting an advanced search based on the movie title and year, and then, if no matches are found in the advanced search, potentially using an alternative title and a basic search. The script scrapes the title, release year, directors, running time, genre, and IMDb film URL from the first page of the suggested records from the IMDb website. The script then defines a loop that matches each film in the core dataset with suggested films on the IMDb search page and records matching scores. Matching was done using data on directors, production year (+/- one year), and title. Titles were compared with a fuzzy matching approach using two methods, “cosine” and “osa”: cosine similarity is used to match titles with a high degree of similarity, and the OSA (optimal string alignment) algorithm is used to match titles that may have typos or minor variations.
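    To make the two title-matching measures concrete, here is a minimal Python sketch of an OSA distance and a cosine similarity over character q-grams, combined with the +/- one year rule. It is an illustration only, not the authors' R script: the description does not state which R package, q-gram size, or cut-off values were used, so the function names, q=2, and the thresholds max_osa and min_cos are all assumptions.

```python
from collections import Counter
from math import sqrt


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and transpositions of adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Adjacent transposition, e.g. "ab" <-> "ba".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def cosine_similarity(a: str, b: str, q: int = 2) -> float:
    """Cosine similarity between character q-gram count vectors."""
    def grams(s):
        return Counter(s[i:i + q] for i in range(len(s) - q + 1))

    ga, gb = grams(a), grams(b)
    dot = sum(ga[g] * gb[g] for g in ga)
    norm = sqrt(sum(v * v for v in ga.values())) * sqrt(sum(v * v for v in gb.values()))
    return dot / norm if norm else 0.0


def is_candidate(title, year, cand_title, cand_year, max_osa=2, min_cos=0.8):
    """Hypothetical rule: year within +/- 1 and titles close by either measure."""
    if abs(year - cand_year) > 1:
        return False
    t1, t2 = title.lower(), cand_title.lower()
    return osa_distance(t1, t2) <= max_osa or cosine_similarity(t1, t2) >= min_cos
```

    Under these assumed thresholds, is_candidate("The Favourite", 2018, "The Favorite", 2018) accepts the pair because the titles differ by a single character, while the same titles with years three apart are rejected by the year rule.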

    The script “r_3_matching” creates a dataset with the matches for a manual check. Each pair of films (the original film from the core dataset and the suggested match from the IMDb website) was categorized into one of the following five categories: a) 100% match: perfect match on title, year, and director; b) likely good match; c) maybe match; d) unlikely match; and e) no match. The script also checks for possible duplicates in the dataset and flags them for a manual check.

    The script “r_4_scraping_functions” creates functions for scraping the data from the identified matches (based on the scripts described above and manually checked). These functions are used for scraping the data in the next script.

    The script “r_5a_extracting_info_sample” uses the functions defined in “r_4_scraping_functions” to scrape the IMDb data for the identified matches. This script does so for the first 100 films only, to check whether everything works. Scraping the entire dataset took a few hours, so a test with a subsample of 100 films is advisable.

    The script “r_5b_extracting_info_all” extracts the data for the entire dataset of the identified matches.

    The script “r_5c_extracting_info_skipped” checks the films with missing data (where data was not scraped) and tries to extract the data one more time to make sure that the errors were not caused by disruptions in the internet connection or other technical issues.
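    The second pass over skipped films can be pictured with the following minimal Python sketch. It is not the authors' R script: the function retry_skipped, the scrape callable, the use of None to mark missing data, and the choice of OSError as the transient error type are all assumptions for illustration.

```python
def retry_skipped(results, scrape, max_attempts=2):
    """Re-run scrape() for entries whose value is None.

    `results` maps a film ID to its scraped record (None if the first
    pass failed). Returns the list of IDs that are still missing.
    """
    still_missing = []
    for film_id, value in results.items():
        if value is not None:
            continue
        for _ in range(max_attempts):
            try:
                results[film_id] = scrape(film_id)
                break
            except OSError:
                # Treat network-style errors as transient and try again.
                continue
        if results[film_id] is None:
            still_missing.append(film_id)
    return still_missing
```

    Films that remain in the returned list after the retry are more likely to be missing on the source side than to have failed for connection reasons.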

    The script “r_check_logs” is used for troubleshooting and tracking the progress of all of the R scripts used. It reports the number of missing values and errors.


    4 Festival Library Dataset

    The Festival Library Dataset consists of a data scheme image file, one codebook and one dataset, all in csv format.

    The codebook (csv file “4_codebook_festival-library_dataset”) offers a detailed description of all variables within the Library Dataset. It lists the definition of variables, such as location and festival name, and festival categories,

  17. Imdb genre wise Top 50 movies

    • kaggle.com
    Updated Jul 5, 2022
    Cite
    Abhimanyu Kundu (2022). Imdb genre wise Top 50 movies [Dataset]. http://doi.org/10.34740/kaggle/dsv/3904815
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 5, 2022
    Dataset provided by
    Kaggle
    Authors
    Abhimanyu Kundu
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains several csv files listing the top 50 movies in each of several genres. Fields such as the duration of the movie, the director, the rating, the number of people who voted for the rating, the amount the movie grossed worldwide, and the description of the movie can be used to analyze why certain highly rated movies attracted large audiences.

  18. IMDb Top 5000 TV Shows

    • kaggle.com
    Updated Jun 9, 2025
    Cite
    Tiago Adria Nunes (2025). IMDb Top 5000 TV Shows [Dataset]. https://www.kaggle.com/datasets/tiagoadrianunes/imdb-top-5000-tv-shows/versions/61
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 9, 2025
    Dataset provided by
    Kaggle
    Authors
    Tiago Adria Nunes
    Description

    This dataset brings together the Top 5000 highest-rated TV shows according to IMDb users. It was curated to enable analysis of rating patterns, popularity trends, genres, and other relevant attributes in the TV show landscape.

    Data Source: https://developer.imdb.com/non-commercial-datasets/

    Processing and Code Repository: https://github.com/TiagoAdriaNunes/imdb_top_5000_tv_shows/blob/main/imdb_tv_shows_analysis.R

    Purpose: Inspired by the structure of the "IMDB Top 5000 Movies" dataset, this version focuses exclusively on TV series, offering a solid base for data analysis and visualization projects in the entertainment domain.

    Pipeline: https://github.com/TiagoAdriaNunes/imdb_top_5000_tv_shows/blob/main/.github/workflows/imdb-tv-shows-pipeline.yml

    Shiny App for Data Visualization: https://tiagoadrianunes.shinyapps.io/IMDB_TOP_5000_TV_SHOWS/

    Kaggle Notebook using this dataset: https://www.kaggle.com/code/tiagoadrianunes/imdb-top-5000-tv-shows-notebook

    Information courtesy of IMDb (https://www.imdb.com). Used with permission.

    See also the Movies version: https://www.kaggle.com/datasets/tiagoadrianunes/imdb-top-5000-movies

  19. IMDb India Movies

    • kaggle.com
    Updated Jun 18, 2021
    Cite
    Adrian McMahon (2021). IMDb India Movies [Dataset]. https://www.kaggle.com/adrianmcmahon/imdb-india-movies/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 18, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Adrian McMahon
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    India
    Description

    Task Details

    Every dataset has a story, and this one was pulled from IMDb.com and covers all the Indian movies on the platform. Clean the data by removing missing values or filling them with average values; this will make the data easier to manipulate for your EDA.

    Analyze data and provide some trends.

    • Year with best rating
    • Does the length of a movie have any impact on the rating?
    • Top 10 movies according to rating per year and overall.
    • Number of popular movies released each year.
    • Based on the number of votes, which movies performed better in rating per year and overall?
    • Any other trends or future prediction you may have
    • Which director directed the most movies
    • Which actor starred in the movie
    • Any other trends you can find

    Thank you for viewing my dataset, looking forward to seeing some code.

  20. IMDb Film & Series Data Analysis

    • zenodo.org
    csv
    Updated Apr 16, 2024
    Cite
    Fernando José Cofiño Gavito; Fernando José Cofiño Gavito (2024). IMDb Film & Series Data Analysis [Dataset]. http://doi.org/10.5281/zenodo.10982158
    Explore at:
    Available download formats: csv
    Dataset updated
    Apr 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Fernando José Cofiño Gavito; Fernando José Cofiño Gavito
    License

    Attribution-NonCommercial-ShareAlike 1.0 (CC BY-NC-SA 1.0): https://creativecommons.org/licenses/by-nc-sa/1.0/
    License information was derived automatically

    Time period covered
    Apr 2024
    Description

    The dataset for this project will contain the following descriptors for IMDb films and series, which will make it possible to analyze different trends in the industry: Title, Year, Genres, Directors, Actors, Rating, Reviews, Duration, Type, Episode, Season, Budget, Revenue. I believe these fields are descriptive enough to allow an in-depth analysis of films, series, actors, directors, etc. over time.

    · Title: The title of the film or series.

    · Year: The year in which the film or series was released.

    · Genres: The genre of the film or series (for example, drama, comedy, action, etc.).

    · Directors: The director of the film or series.

    · Actors: The main actors of the film or series.

    · Rating: The rating of the film or series on IMDb.

    · Reviews: The number of user reviews for the film or series.

    · Duration: The duration of the film or series in minutes.

    · Type: Whether it is a film or a series.

    · Episode: The number of episodes, if it is a series.

    · Season: The number of seasons, if it is a series.

    · Budget: The budget of the film or series.

    · Revenue: The revenue of the film or series.

    The data in the set covers a period extending from the launch of IMDb in October 1990 to the present month of April 2024.


IMDB movie details dataset

IMDB movie details dataset from imdb.com

Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: zip, csv
Dataset updated
Jul 5, 2025
Dataset authored and provided by
Crawl Feeds
License

https://crawlfeeds.com/privacy_policy
