100+ datasets found
  1. Movies Performance and Feature Statistics

    • kaggle.com
    Updated Jan 16, 2023
    Cite
    The Devastator (2023). Movies Performance and Feature Statistics [Dataset]. https://www.kaggle.com/datasets/thedevastator/movies-performance-and-feature-statistics
    Dataset updated
    Jan 16, 2023
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    Description

    Movies Performance and Feature Statistics

    Analyzing Box Office Performance, Rating and Audience Reactions

    By Yashwanth Sharaff [source]

    About this dataset

    This dataset contains essential characteristics of a variety of movies: basic information such as each movie's title and budget, along with its MPAA rating, gross revenue, release date, genre, runtime, rating count and summary. With this dataset we can better understand the film industry and explore how these features and performance metrics relate to one another and to a movie's success. It can also help identify which features are the strongest indicators of a high-grossing film.

    How to use the dataset

    To get the most out of this dataset, you need to understand what each column represents. The 'Title' column gives the title of the movie, which can be used to look the film up on streaming services and movie information websites. The 'MPAA Rating' column lists the movie's Motion Picture Association (MPAA) rating, such as G (General Audiences), PG (Parental Guidance Suggested), PG-13 (Parents Strongly Cautioned) or R (Under 17 Requires Accompanying Parent or Guardian). The 'Budget' column gives an approximate idea of how much the production cost, the 'Gross' column gives its earnings from its theatrical release, and 'Release Date' shows when the film was released or is scheduled to be released. 'Genre' describes the type of movie, 'Runtime' gives its length in minutes, and 'Rating Count' is the number of people who have rated or reviewed it, whether on IMDb, on other streaming services or in print media such as newspapers. Finally, the 'Summary' field gives an overview of what to expect from the film, which is worth checking before watching, especially with children.

    So go ahead - start exploring this interesting dataset today!

    Research Ideas

    • Creating a box office prediction model using budget, genre, release date and MPAA rating
    • Using the summary data to create a sentiment analysis tool for movie reviews
    • Building a recommendation engine for users based on their prior ratings and what other users with similar tastes have rated as highly

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    See the dataset description for more information.

    Columns

    File: movies.csv

    | Column name  | Description                                                                    |
    |:-------------|:-------------------------------------------------------------------------------|
    | Title        | The title of the movie. (String)                                               |
    | MPAA Rating  | The Motion Picture Association of America (MPAA) rating of the movie. (String) |
    | Budget       | The budget of the movie in US dollars. (Integer)                               |
    | Gross        | The gross revenue of the movie in US dollars. (Integer)                        |
    | Release Date | The date the movie was released. (Date)                                        |
    | Genre        | The genre of the movie. (String)                                               |
    | Runtime      | The length of the movie in minutes. (Integer)                                  |
    | Rating Count | The number of ratings the movie has received. (Integer)                        |
    | Summary      | A brief summary of the movie. (String)                                         |
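    As a rough sketch of how the documented columns might be loaded, the example below parses a small in-memory sample with Python's standard csv module and converts the integer and date columns. The sample rows, the ISO date format and the helper name load_movies are assumptions made for illustration; the real movies.csv may format its values differently.

```python
import csv
import io
from datetime import date

# Hypothetical two-row sample in the documented column layout; the real
# movies.csv may format dates and numbers differently.
SAMPLE = """Title,MPAA Rating,Budget,Gross,Release Date,Genre,Runtime,Rating Count,Summary
Example Film A,PG-13,1000000,3500000,2001-05-04,Drama,110,1200,A sample summary.
Example Film B,R,2000000,1000000,2003-10-17,Horror,95,800,Another sample summary.
"""

def load_movies(handle):
    """Parse rows and convert the integer and date columns to Python types."""
    movies = []
    for row in csv.DictReader(handle):
        for col in ("Budget", "Gross", "Runtime", "Rating Count"):
            row[col] = int(row[col])
        # Assumes ISO dates (YYYY-MM-DD); adjust if the file uses another format.
        row["Release Date"] = date.fromisoformat(row["Release Date"])
        movies.append(row)
    return movies

movies = load_movies(io.StringIO(SAMPLE))
for m in movies:
    ratio = m["Gross"] / m["Budget"]
    print(f'{m["Title"]}: gross/budget = {ratio:.2f}')
```

    A gross-to-budget ratio is only one possible performance measure; it is used here because both columns are documented as integers in US dollars.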

    Acknowledgements

    If you use this dataset in your research, please credit the original author, Yashwanth Sharaff.

  2. Full TMDB Movies Dataset 2024 (1M Movies)

    • kaggle.com
    zip
    Updated Nov 11, 2025
    Cite
    asaniczka (2025). Full TMDB Movies Dataset 2024 (1M Movies) [Dataset]. https://www.kaggle.com/datasets/asaniczka/tmdb-movies-dataset-2023-930k-movies
    Available download formats: zip (239404730 bytes)
    Dataset updated
    Nov 11, 2025
    Authors
    asaniczka
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    The TMDb (The Movie Database) is a comprehensive movie database that provides information about movies, including details like titles, ratings, release dates, revenue, genres, and much more.

    This dataset contains a collection of 1,000,000 movies from the TMDB database.

    Dataset is updated daily. If you find this dataset valuable, don't forget to hit the upvote button! 😊💝

    Interesting Task Ideas:

    1. Predict movie ratings based on features such as revenue, popularity, genre, and runtime.
    2. Identify trends in movie release dates and analyze their impact on revenue.
    3. Analyze the relationship between budget, revenue, and popularity to determine factors that contribute to a movie's success.
    4. Build a recommendation system that suggests similar movies based on genres, production companies, and language.
    5. Perform sentiment analysis on movie reviews to understand audience reactions.
    6. Explore the impact of movie genres on popularity and revenue.
    7. Investigate the correlation between runtime and audience engagement.
    8. Identify successful production companies and analyze their strategies.
    9. Utilize natural language processing techniques to extract meaningful insights from movie overviews.
    10. Visualize movie popularity over time and identify popular genres in different periods.

    Check out my other datasets

    Clash of Clans Clans Dataset 2023 (3.5M Clans)

    Black-White Wage Gap in the USA Dataset

    130K Kindle Books

    USA Unemployment Rates by Demographics & Race

    150K TMDb TV Shows

    Photo by Onur Binay on Unsplash

  3. IMDB movie details dataset

    • crawlfeeds.com
    csv, zip
    Updated Nov 9, 2025
    Cite
    Crawl Feeds (2025). IMDB movie details dataset [Dataset]. https://crawlfeeds.com/datasets/imdb-movie-details-dataset
    Available download formats: zip, csv
    Dataset updated
    Nov 9, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description
    The IMDB Movie Details Dataset is a comprehensive collection of information about movies, TV shows, and streaming content listed on IMDB. It includes detailed data such as titles, release years, genres, cast, crew, ratings, and more, making it a useful resource for film and entertainment enthusiasts. It is suited to data analysis, with applications spanning machine learning projects, predictive modeling, and insights into industry trends.
    Researchers can explore patterns in movie ratings and genre popularity, while developers can use the dataset to build recommendation systems or applications. Movie buffs can dive deep into historical and contemporary trends in the world of cinema. This dataset not only supports academic and professional pursuits but also opens doors for creative projects in storytelling, content creation, and audience engagement. Whether you’re a developer, researcher, or film enthusiast, the IMDB movie dataset is a powerful tool for uncovering trends and gaining deeper insights into the evolving entertainment landscape.
  4. Movies dataset from allmovie

    • crawlfeeds.com
    json, zip
    Updated Dec 26, 2024
    Cite
    Crawl Feeds (2024). Movies dataset from allmovie [Dataset]. https://crawlfeeds.com/datasets/movies-dataset-form-allmovie
    Available download formats: json, zip
    Dataset updated
    Dec 26, 2024
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    Movies Dataset from AllMovie is a comprehensive collection featuring over 430,000 records, encompassing a wide range of films across various genres and languages. This extensive dataset includes essential data points such as movie titles, genres, release dates, posters, languages, directors, durations, synopses, trailers, average ratings, cast information, and URLs. Such detailed metadata is invaluable for developers, researchers, and enthusiasts aiming to analyze trends, build recommendation systems, or conduct in-depth studies of the film industry.

    For those interested in alternative datasets, the IMDb Non-Commercial Datasets provide subsets of IMDb data accessible for personal and non-commercial use. These datasets allow users to hold local copies of movie information, facilitating various analytical projects.

    Additionally, the MovieLens datasets offer a range of movie rating data suitable for research purposes. For instance, the MovieLens 20M dataset comprises 20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users, making it a valuable resource for studies in user preferences and recommendation algorithms.

    Incorporating these datasets into your projects can significantly enhance the quality and depth of your analyses, providing a solid foundation for exploring various aspects of the cinematic world.

    Why Choose Crawl Feeds for Your Data Needs?

    Crawl Feeds is your trusted partner in acquiring high-quality, curated datasets tailored to your specific requirements. With a vast repository that includes the Movies Dataset, we empower developers and businesses to drive innovation. Explore our easy-to-use platform and transform your ideas into actionable insights.

    Get Started with Crawl Feeds Today

  5. National box office statistics

    • data.gov.tw
    csv, json
    Cite
    Ministry of Culture, National box office statistics [Dataset]. https://data.gov.tw/en/datasets/94224
    Available download formats: json, csv
    Dataset authored and provided by
    Ministry of Culture
    License

    https://data.gov.tw/license

    Description

    This dataset provides national theater box office statistics for films distributed by the Administrative Institution National Film and Audiovisual Culture Center. The data runs up to the last Sunday before the announcement date and does not include films that have been screened for fewer than 7 calendar days. The earliest CSV data in this dataset begins on July 30, 2018, and the earliest JSON data begins on March 1, 2020. JSON queries require a start date and an end date (given as year, month, and day) and can return data for a maximum of 90 days at a time.
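    Because a JSON query can cover at most 90 days, a longer period has to be split into several queries. The sketch below shows one way to cut a date range into consecutive windows. It assumes the 90-day limit counts both the start and end dates; the function name date_windows is hypothetical, and no request is actually sent because the query URL format is not given here.

```python
from datetime import date, timedelta

MAX_DAYS = 90  # per-query limit stated in the dataset description

def date_windows(start, end, max_days=MAX_DAYS):
    """Yield inclusive (window_start, window_end) pairs of at most max_days days each."""
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=max_days - 1), end)
        yield current, window_end
        current = window_end + timedelta(days=1)

# Example: cover March 1, 2020 (the earliest JSON data) through the end of 2020.
windows = list(date_windows(date(2020, 3, 1), date(2020, 12, 31)))
for window_start, window_end in windows:
    print(window_start.isoformat(), window_end.isoformat())
```

    Each pair can then be used as the start and end date of one query.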

  6. Movie data (100K+ titles with budget, credits)

    • kaggle.com
    zip
    Updated Sep 10, 2020
    Cite
    ramcharan (2020). Movie data (100K+ titles with budget, credits) [Dataset]. https://www.kaggle.com/datasets/kakarlaramcharan/tmdb-data-0920
    Available download formats: zip (126738859 bytes)
    Dataset updated
    Sep 10, 2020
    Authors
    ramcharan
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This data contains information on 119K movies and TV shows released internationally, scraped from TMDB (https://www.themoviedb.org/), a community-built movie and TV database. The dataset is a pipe-delimited CSV file. It has rich information on title, synopsis, year of release, budget, revenue, popularity, the original language in which the movie or TV show was produced, production companies, production countries, user vote averages, runtime, release date, tagline, actors and directors.

    Content

    | Variable              | Description                                                                                  |
    |:----------------------|:---------------------------------------------------------------------------------------------|
    | belongs_to_collection | Indicates whether the movie belongs to a collection; the collection is specified if it exists |
    | budget                | Movie's budget                                                                               |
    | id                    | Unique identifier for the movie                                                              |
    | original_language     | Original language in which the movie is produced                                             |
    | original_title        | Title of the movie                                                                           |
    | overview              | Summary (synopsis) of the movie                                                              |
    | popularity            | Popularity index of the movie                                                                |
    | production_companies  | List of companies that produced the movie                                                    |
    | production_countries  | Country where the movie is produced                                                          |
    | release_date          | Movie release date                                                                           |
    | revenue               | Movie collection; a missing value is represented by 0                                        |
    | runtime               | Movie runtime in minutes                                                                     |
    | status                | Indicates whether the movie is released or not                                               |
    | tagline               | Movie tagline                                                                                |
    | title                 | Movie alias English title                                                                    |
    | vote_average          | Average vote rating by the viewers                                                           |
    | cast                  | Cast credits (actors)                                                                        |
    | directors             | Director credits                                                                             |
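    Since the file is described as pipe delimited and a revenue of 0 stands for a missing value, a reader needs a custom delimiter and a small clean-up step. The following is a minimal sketch using Python's standard csv module; the sample rows and the helper name read_pipe_delimited are made up for illustration, and only a few of the listed variables are shown.

```python
import csv
import io

# Hypothetical sample; only a few of the documented columns are shown.
SAMPLE = """id|title|budget|revenue|runtime
1|Example One|5000000|12000000|101
2|Example Two|750000|0|88
"""

def read_pipe_delimited(handle):
    """Read pipe-delimited rows, mapping a revenue of 0 to None (missing)."""
    rows = []
    for row in csv.DictReader(handle, delimiter="|"):
        revenue = int(row["revenue"])
        row["revenue"] = revenue if revenue != 0 else None
        row["budget"] = int(row["budget"])
        row["runtime"] = int(row["runtime"])
        rows.append(row)
    return rows

rows = read_pipe_delimited(io.StringIO(SAMPLE))
known = [r for r in rows if r["revenue"] is not None]
print(f"{len(known)} of {len(rows)} rows have a revenue figure")
```

    Converting the 0 placeholder to None keeps missing revenue from being averaged in as if it were a real figure.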

    Acknowledgements

    Thanks to TMDB for making their data available

    Inspiration

    Hope this will be helpful for your research or any academic work

    Thank you Ram

  7. Film Genre Statistics

    • kaggle.com
    zip
    Updated Dec 19, 2023
    Cite
    The Devastator (2023). Film Genre Statistics [Dataset]. https://www.kaggle.com/datasets/thedevastator/film-genre-statistics
    Available download formats: zip (36435 bytes)
    Dataset updated
    Dec 19, 2023
    Authors
    The Devastator
    Description

    Film Genre Statistics

    Movie genre statistics and revenue data from 1995-2018

    By Throwback Thursday [source]

    About this dataset

    This dataset contains genre statistics for movies released between 1995 and 2018. It provides information on various aspects of the movies, such as gross revenue, tickets sold, and inflation-adjusted figures. The dataset includes columns for genre, year of release, number of movies released in each genre and year, total gross revenue generated by movies in each genre and year, total number of tickets sold for movies in each genre and year, inflation-adjusted gross revenue that takes into account changes in the value of money over time, title of the highest-grossing movie in each genre and year, gross revenue generated by the highest-grossing movie in each genre and year, and inflation-adjusted gross revenue of the highest-grossing movie in each genre and year. This dataset offers insights into film industry trends over a span of more than two decades

    How to use the dataset

    Understanding the Columns

    Before diving into the analysis, let's familiarize ourselves with the different columns in this dataset:

    • Genre: This column represents the genre of each movie.
    • Year: The year in which the movies were released.
    • Movies Released: The number of movies released in a particular genre and year.
    • Gross: The total gross revenue generated by movies in a specific genre and year.
    • Tickets Sold: The total number of tickets sold for movies in a specific genre and year.
    • Inflation-Adjusted Gross: The gross revenue adjusted for inflation, taking into account changes in the value of money over time.
    • Top Movie: The title of the highest-grossing movie in a specific genre and year.
    • Top Movie Gross (That Year): The gross revenue generated by the highest-grossing movie in a specific genre and year.
    • Top Movie Inflation-Adjusted Gross (That Year): The inflation-adjusted gross revenue of the highest-grossing movie in a specific genre and year.

    Analyzing Data

    To make use of this dataset effectively, here are some potential analyses you can perform:

    • Find popular genres: You can determine which genres are popular by looking at columns like Movies Released or Tickets Sold. Analyzing these numbers will give you insights into what types of movies attract more audiences.

    • Measure financial success: Explore columns like Gross, Inflation-Adjusted Gross, or Top Movie Gross (That Year) to compare the financial success of different genres. This will allow you to identify genres that generate higher revenue.

    • Understand movie trends: By analyzing the dataset over different years, you can observe trends in movie releases and gross revenue for specific genres. This information is crucial for understanding how movie preferences change over time.

    • Identify highest-grossing movies: The column Top Movie gives you the title of the highest-grossing movie in each genre and year. You can use this information to analyze the success of specific movies within their respective genres.
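    The analyses above mostly come down to grouping rows by genre and aggregating a column. A minimal sketch, assuming the rows have already been loaded as dictionaries keyed by the column names listed earlier (the figures and the helper name totals_by_genre are invented for illustration):

```python
from collections import defaultdict

# Hypothetical rows using the documented column names; the figures are made up.
ROWS = [
    {"Genre": "Action", "Year": 1995, "Tickets Sold": 100, "Gross": 450},
    {"Genre": "Action", "Year": 1996, "Tickets Sold": 120, "Gross": 560},
    {"Genre": "Comedy", "Year": 1995, "Tickets Sold": 150, "Gross": 640},
    {"Genre": "Comedy", "Year": 1996, "Tickets Sold": 90, "Gross": 400},
]

def totals_by_genre(rows, column):
    """Sum the given numeric column over all years for each genre."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["Genre"]] += row[column]
    return dict(totals)

tickets = totals_by_genre(ROWS, "Tickets Sold")
# Rank genres from most to fewest tickets sold.
ranking = sorted(tickets, key=tickets.get, reverse=True)
print(ranking)
```

    The same helper works for Gross or Inflation-Adjusted Gross by passing a different column name.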

    Data Visualization

    To enhance your analysis, consider using data visualization techniques

    Research Ideas

    • Predicting the popularity and success of movies in different genres: By analyzing the data on tickets sold and gross revenue, we can identify trends and patterns in movie genres that attract more audiences and generate higher revenue. This information can be useful for filmmakers, production studios, and investors to make informed decisions about which genres to focus on for future movie releases.
    • Comparing the performance of movies over time: With the inclusion of inflation-adjusted figures, this dataset allows us to compare the box office success of movies across different years. We can analyze how movies in specific genres have performed over time in terms of gross revenue and adjust these figures for inflation to get a better understanding of their true financial success.
    • Analyzing the impact of genre popularity on ticket sales: By examining the relationship between genre popularity (measured by tickets sold) and total gross revenue, we can gain insights into audience preferences and behavior. This information is valuable for marketing strategies, as it helps determine which movie genres are most likely to attract a larger audience base and generate higher ticket sales

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    See the dataset description for more information.

    Columns...

  8. Movie Data - X - Test - w2v

    • data.researchdatafinder.qut.edu.au
    Updated Apr 8, 2018
    Cite
    (2018). Movie Data - X - Test - w2v [Dataset]. https://data.researchdatafinder.qut.edu.au/dataset/survey-word-vector/resource/e638fc06-7ef3-4a41-85e2-21f7fad2dfb3
    Dataset updated
    Apr 8, 2018
    License

    http://researchdatafinder.qut.edu.au/display/n15252

    Description

    This file contains the features for the test portion of the movie dataset. The data has been converted into average word vectors. The test portion is 50% of the total movie results. QUT Research Data Repository dataset resource, available for download.
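    The description does not say how the average word vector was computed. A common approach, shown here only as an assumed illustration, is to look up an embedding for each token in a text and take the element-wise mean. The tiny 3-dimensional embedding table and the function name average_word_vector are hypothetical.

```python
# Hypothetical 3-dimensional embeddings; real word2vec vectors are much longer.
EMBEDDINGS = {
    "great": [1.0, 0.0, 2.0],
    "movie": [0.0, 2.0, 2.0],
    "boring": [-1.0, 1.0, 0.0],
}

def average_word_vector(tokens, embeddings, dim=3):
    """Return the element-wise mean of the vectors of known tokens.

    Tokens without an embedding are skipped; if none are known, a zero
    vector of length dim is returned.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(component) / len(vectors) for component in zip(*vectors)]

print(average_word_vector(["great", "movie", "unknown"], EMBEDDINGS))
```

    Averaging gives every text a fixed-length feature vector regardless of how many words it contains.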

  9. Box office revenue in the U.S. & Canada 1995-2025, by movie rating

    • statista.com
    Updated Nov 27, 2025
    Cite
    Statista (2025). Box office revenue in the U.S. & Canada 1995-2025, by movie rating [Dataset]. https://www.statista.com/statistics/433709/highest-grossing-movies-domestic-box-office-rating/
    Dataset updated
    Nov 27, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Canada, United States
    Description

    Between 1995 and 2025, PG-13-rated movies grossed approximately 129.7 billion U.S. dollars at the North American box office – a term that excludes Mexico and includes Canada and the United States. R-rated and PG-rated films grossed around 72.22 billion and 58.41 billion dollars, respectively.

  10. letterboxd-all-movie-data

    • huggingface.co
    Updated Jul 21, 2025
    Cite
    Salih Mert Canseven (2025). letterboxd-all-movie-data [Dataset]. https://huggingface.co/datasets/pkchwy/letterboxd-all-movie-data
    Dataset updated
    Jul 21, 2025
    Authors
    Salih Mert Canseven
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Letterboxd Film Dataset

    This dataset contains a comprehensive collection of 847,209 films from the Letterboxd platform, including movie information, user reviews, and ratings.

    Dataset Summary

    • Total Films: 847,209
    • File Size: ~1.12 GB (1,120,572,122 bytes)
    • Format: JSONL (JSON Lines)
    • Language: Primarily English, with some multilingual content

    Data Structure

    Each line contains a JSON object with the following fields: { "url":… See the full description on the dataset page: https://huggingface.co/datasets/pkchwy/letterboxd-all-movie-data.
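    Because the file is in JSONL format, it can be read one line at a time without loading the whole ~1.12 GB into memory. The sketch below parses an in-memory sample. Only the "url" field is visible in the truncated description, so the sample records contain just that key, with placeholder URLs; the helper name iter_jsonl is hypothetical.

```python
import io
import json

# Hypothetical lines; only the "url" key appears in the truncated description,
# so the URLs below are placeholders.
SAMPLE = io.StringIO(
    '{"url": "https://example.com/film/a"}\n'
    '\n'
    '{"url": "https://example.com/film/b"}\n'
)

def iter_jsonl(handle):
    """Yield one parsed object per non-empty line."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)

records = list(iter_jsonl(SAMPLE))
print(len(records), records[0]["url"])
```

    With a real file, passing an open file handle to iter_jsonl and looping over the generator (rather than building a list) keeps memory use low.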

  11. IMDb Movies Metadata Dataset – 4.5M Records (Global Coverage)

    • crawlfeeds.com
    csv, zip
    Updated Nov 9, 2025
    Cite
    Crawl Feeds (2025). IMDb Movies Metadata Dataset – 4.5M Records (Global Coverage) [Dataset]. https://crawlfeeds.com/datasets/imdb-movies-metadata-dataset-4-5m-records-global-coverage
    Available download formats: csv, zip
    Dataset updated
    Nov 9, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    Unlock one of the most comprehensive movie datasets available—4.5 million structured IMDb movie records, extracted and enriched for data science, machine learning, and entertainment research.

    This dataset includes a vast collection of global movie metadata, including details on title, release year, genre, country, language, runtime, cast, directors, IMDb ratings, reviews, and synopsis. Whether you're building a recommendation engine, benchmarking trends, or training AI models, this dataset is designed to give you deep and wide access to cinematic data across decades and continents.

    Perfect for use in film analytics, OTT platforms, review sentiment analysis, knowledge graphs, and LLM fine-tuning, the dataset is cleaned, normalized, and exportable in multiple formats.

    What’s Included:

    • Genres: Drama, Comedy, Horror, Action, Sci-Fi, Documentary, and more

    • Delivery: Direct download

    Use Cases:

    • Train LLMs or chatbots on cinematic language and metadata

    • Build or enrich movie recommendation engines

    • Run cross-lingual or multi-region film analytics

    • Benchmark genre popularity across time periods

    • Power academic studies or entertainment dashboards

    • Feed into knowledge graphs, search engines, or NLP pipelines

  12. Bollywood Movies data

    • data.mendeley.com
    • kaggle.com
    Updated May 12, 2020
    Cite
    Prashant Premkumar (2020). Bollywood Movies data [Dataset]. http://doi.org/10.17632/3c57btcxy9.1
    Dataset updated
    May 12, 2020
    Authors
    Prashant Premkumar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Using a Python script to scrape data from the web, we collected data on all 1698 Hindi-language movies released in India across a 13-year period (2005-2017) from the website of Box Office India.

  13. Movies and Tv Shows Dataset

    • crawlfeeds.com
    • kaggle.com
    csv, zip
    Updated Jul 4, 2025
    Cite
    Crawl Feeds (2025). Movies and Tv Shows Dataset [Dataset]. https://crawlfeeds.com/datasets/movies-and-tv-shows-dataset
    Available download formats: zip, csv
    Dataset updated
    Jul 4, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    Explore our meticulously curated Movies dataset and TV shows dataset, designed to cater to diverse analytical and research needs. Whether you're a data scientist, a student, or a business professional, these datasets provide valuable insights into the entertainment industry.

    Key Features of the Movies Dataset:

    1. Extensive collection of global movies across various genres and languages.

    2. Detailed metadata, including titles, release dates, genres, directors, cast, and ratings.

    3. Regularly updated to ensure relevance and accuracy.

    Why Choose Our TV Shows Dataset?

    Our TV shows dataset is your gateway to understanding trends in episodic content. It includes:

    • Comprehensive details about popular and niche TV shows.

    • Information on episode counts, seasons, ratings, and networks.

    • Insights into audience preferences and regional programming.

    Applications of These Datasets

    These datasets are perfect for:

    • Machine learning models for recommendation systems.

    • Academic research on media trends and audience behavior.

    • Business strategies for entertainment platforms.

    Unlock the power of TV show data with our Crawl Feeds TV Shows Dataset. Start analyzing today and gain valuable insights into your favorite shows!

  14. movielens

    • tensorflow.org
    • opendatalab.com
    • +1 more
    Updated Jul 8, 2020
    Cite
    (2020). movielens [Dataset]. https://www.tensorflow.org/datasets/catalog/movielens
    Dataset updated
    Jul 8, 2020
    Description

    This dataset contains a set of movie ratings from the MovieLens website, a movie recommendation service. This dataset was collected and maintained by GroupLens, a research group at the University of Minnesota. There are 5 versions included: "25m", "latest-small", "100k", "1m", "20m". In all datasets, the movies data and ratings data are joined on "movieId". The 25m dataset, latest-small dataset, and 20m dataset contain only movie data and rating data. The 1m dataset and 100k dataset contain demographic data in addition to movie and rating data.

    • "25m": This is the latest stable version of the MovieLens dataset. It is recommended for research purposes.
    • "latest-small": This is a small subset of the latest version of the MovieLens dataset. It is changed and updated over time by GroupLens.
    • "100k": This is the oldest version of the MovieLens datasets. It is a small dataset with demographic data.
    • "1m": This is the largest MovieLens dataset that contains demographic data.
    • "20m": This is one of the most used MovieLens datasets in academic papers along with the 1m dataset.

    For each version, users can view either only the movies data by adding the "-movies" suffix (e.g. "25m-movies") or the ratings data joined with the movies data (and users data in the 1m and 100k datasets) by adding the "-ratings" suffix (e.g. "25m-ratings").

    The features below are included in all versions with the "-ratings" suffix.

    • "movie_id": a unique identifier of the rated movie
    • "movie_title": the title of the rated movie with the release year in parentheses
    • "movie_genres": a sequence of genres to which the rated movie belongs
    • "user_id": a unique identifier of the user who made the rating
    • "user_rating": the score of the rating on a five-star scale
    • "timestamp": the timestamp of the ratings, represented in seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970

    The "100k-ratings" and "1m-ratings" versions in addition include the following demographic features.

    • "user_gender": gender of the user who made the rating; a true value corresponds to male
    • "bucketized_user_age": bucketized age values of the user who made the rating, the values and the corresponding ranges are:
      • 1: "Under 18"
      • 18: "18-24"
      • 25: "25-34"
      • 35: "35-44"
      • 45: "45-49"
      • 50: "50-55"
      • 56: "56+"
    • "user_occupation_label": the occupation of the user who made the rating represented by an integer-encoded label; labels are preprocessed to be consistent across different versions
    • "user_occupation_text": the occupation of the user who made the rating in the original string; different versions can have different set of raw text labels
    • "user_zip_code": the zip code of the user who made the rating

    In addition, the "100k-ratings" dataset also has a feature "raw_user_age", which gives the exact age of the user who made the rating.

    Datasets with the "-movies" suffix contain only "movie_id", "movie_title", and "movie_genres" features.
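    Two of the features above are encoded rather than human-readable: "timestamp" is in seconds since the Unix epoch, and "bucketized_user_age" is a bucket code. The sketch below decodes both for a single record. It assumes the record has already been converted to a plain Python dict of ordinary values (TensorFlow Datasets yields tensors by default); the sample record and the helper name decode_rating are hypothetical.

```python
from datetime import datetime, timezone

# Bucket codes and labels as listed in the description above.
AGE_BUCKETS = {
    1: "Under 18",
    18: "18-24",
    25: "25-34",
    35: "35-44",
    45: "45-49",
    50: "50-55",
    56: "56+",
}

def decode_rating(example):
    """Return a readable view of one rating record given as a plain dict."""
    return {
        "movie_title": example["movie_title"],
        "user_rating": example["user_rating"],
        # Seconds since midnight UTC on January 1, 1970.
        "rated_at": datetime.fromtimestamp(example["timestamp"], tz=timezone.utc),
        # Only present in the "100k-ratings" and "1m-ratings" versions.
        "age_range": AGE_BUCKETS.get(example.get("bucketized_user_age")),
    }

# Hypothetical record for illustration only.
sample = {
    "movie_title": "Example Movie (1995)",
    "user_rating": 4.0,
    "timestamp": 86400,
    "bucketized_user_age": 25,
}
decoded = decode_rating(sample)
print(decoded["rated_at"].isoformat(), decoded["age_range"])
```

    For versions without demographic features, "age_range" is simply None.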

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split of the MovieLens dataset.
    ds = tfds.load('movielens', split='train')
    # Print the first four examples.
    for ex in ds.take(4):
        print(ex)


    See the guide for more information on tensorflow_datasets.

  15. Average revenue of films in the U.S. & Canada 1995-2025, by selected source...

    • statista.com
    Cite
    Statista, Average revenue of films in the U.S. & Canada 1995-2025, by selected source material [Dataset]. https://www.statista.com/statistics/188689/movie-sources-in-north-america-by-average-box-office-revenue/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Canada, United States
    Description

    Between 1995 and 2025, a movie based on comics or graphic novels grossed, on average, about 88.36 million U.S. dollars across the United States and Canada – collectively known as the North American box office. Spin-offs followed as the second-most commercially successful film source material, with average box office revenue of around 86.32 million dollars.

  16. Global Movie Franchise Revenue and Budget Data

    • kaggle.com
    zip
    Updated Jan 16, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Devastator (2023). Global Movie Franchise Revenue and Budget Data [Dataset]. https://www.kaggle.com/datasets/thedevastator/global-movie-franchise-revenue-and-budget-data
    Explore at:
    Available download formats: zip (8820 bytes)
    Dataset updated
    Jan 16, 2023
    Authors
    The Devastator
    Description

    Global Movie Franchise Revenue and Budget Data

    Tracks Lifetime Gross, Budgets, Ratings, and Release Dates

    By Emma Culwell [source]

    About this dataset

    This dataset offers an extensive look at some of the most popular movie franchises in history, shedding light on their financial success and public reception. It includes data on the lifetime gross sales, budgets, ratings, and release dates of each featured movie. It also gives insight into how elements such as ratings and runtime can affect a film's performance at the box office. Whether you are an aspiring or established filmmaker looking for inspiration for your own blockbuster, or simply a fan curious about these films' inner workings, this dataset offers a detailed view of many beloved franchises.

    How to use the dataset

    This dataset provides comprehensive information on movie franchises released worldwide between 2000 and 2020. It includes data such as lifetime gross, budget, rating, runtime, release date and vote count/average. This dataset can be used to gain insights on the global movie industry trends over this time period.

    The data can be explored in various ways to identify patterns of success or failure among movie franchises across countries, genres or decades. For example, you may want to examine the average budget for movies released each year or calculate the average number of votes received by movies of a particular genre. Additionally, you could use this dataset to compare different types of media (e.g., cable vs streaming) and understand how they impact box-office performance.

    To get the most out of this dataset, first familiarize yourself with the columns provided:

    • Title: the title of the movie
    • Lifetime Gross: the total amount of money earned by a franchise in all territories
    • Year: the year in which the movie was first made publicly available
    • Studio: the production company behind the production
    • Rating: the classification given by the MPAA/BBFC
    • Runtime: the length in minutes/hours
    • Budget: the amount spent on production
    • Release Date: the date on which availability was publicly announced
    • Vote Average: the average rating based on user reviews
    • Vote Count: the number of people who rated the franchise

    Once you are comfortable with these variables, try larger analyses such as predictive analytics (predicting future success from existing trends) or clustering (grouping similar outcomes together). Whichever methods you use, always validate your assumptions. Good luck exploring!
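
    The per-year average mentioned above can be sketched with the Python standard library alone. This is a minimal illustration, not the dataset author's method: the inline rows are made-up placeholders rather than real records, and the helper name average_by is hypothetical. Only the column names (Year, Budget, VoteCount) come from the column listing for MovieFranchises.csv. That listing shows no genre column, so the sketch groups by Year.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Placeholder rows (NOT real data); only the column names follow the
# documented layout of MovieFranchises.csv.
SAMPLE_CSV = """Title,Year,Budget,VoteCount
Film A,2001,100,10
Film B,2001,300,30
Film C,2002,200,20
"""


def average_by(rows, key, value):
    """Group rows by `key` and return the mean of numeric column `value` per group."""
    groups = defaultdict(list)
    for row in rows:
        raw = (row.get(value) or "").strip()
        if not raw:
            continue  # skip rows where the value is missing
        groups[row[key]].append(float(raw))
    return {group: mean(values) for group, values in groups.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
avg_budget_by_year = average_by(rows, "Year", "Budget")
avg_votes_by_year = average_by(rows, "Year", "VoteCount")
print(avg_budget_by_year)
print(avg_votes_by_year)
```

    To run this against the real file, replace io.StringIO(SAMPLE_CSV) with an opened file handle. Check the header row first, since the exact column spellings in the CSV may differ from the listing.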

    Research Ideas

    • A comparison of movie budget to box office returns, to identify over/underperforming movies.
    • A study of the correlation between movie rating and viewership.
    • An analysis of what types of movies tend to become franchise success stories (big budget, PG-13 rating, etc.)
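
    As a rough sketch of the first idea, a movie's lifetime gross can be divided by its budget and the results ranked. The figures below are invented placeholders, the helper return_ratio is a hypothetical name, and a gross-to-budget ratio is only one possible definition of over- or underperformance.

```python
def return_ratio(lifetime_gross, budget):
    """Return lifetime gross divided by budget, or None when the budget is missing or zero."""
    if not budget:
        return None
    return lifetime_gross / budget


# Placeholder figures (NOT taken from the dataset).
movies = [
    {"Title": "Film A", "Lifetime Gross": 300, "Budget": 100},
    {"Title": "Film B", "Lifetime Gross": 50, "Budget": 100},
    {"Title": "Film C", "Lifetime Gross": 80, "Budget": 0},  # unusable budget
]

ranked = []
for movie in movies:
    ratio = return_ratio(movie["Lifetime Gross"], movie["Budget"])
    if ratio is not None:
        ranked.append((movie["Title"], ratio))

# Highest return first; entries near the top are candidate overperformers.
ranked.sort(key=lambda pair: pair[1], reverse=True)
print(ranked)
```

    Rows with a missing or zero budget are skipped rather than given an infinite ratio, which keeps the ranking meaningful.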

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    File: MovieFranchises.csv

    | Column name    | Description                                                             |
    |:---------------|:------------------------------------------------------------------------|
    | Title          | The title of the movie. (String)                                        |
    | Lifetime Gross | The total amount of money the movie has made in its lifetime. (Integer) |
    | Year           | The year the movie was released. (Integer)                              |
    | Studio         | The studio that produced the movie. (String)                            |
    | Rating         | The rating of the movie (e.g. PG-13, R, etc). (String)                  |
    | Runtime        | The length of the movie in minutes. (Integer)                           |
    | Budget         | The budget of the movie in USD. (Integer)                               |
    | ReleaseDate    | The date the movie was released. (Date)                                 |
    | VoteAvg        | The average rating of the movie from users. (Float)                     |
    | VoteCount      | The total number of votes the movie has received from users. (Integer)  |

    Acknowledgements

    If you use this dataset in your research, please credit the original author, Emma Culwell.

  17. Movie Dataset for ML

    • kaggle.com
    zip
    Updated Oct 2, 2023
    Cite
    Abhik Dhar (2023). Movie Dataset for ML [Dataset]. https://www.kaggle.com/datasets/abhikdhar/movie-dataset-random
    Explore at:
    Available download formats: zip (19713 bytes)
    Dataset updated
    Oct 2, 2023
    Authors
    Abhik Dhar
    Description

    Description: This dataset contains information about 616 movies spanning various genres, years of release, and creative talents involved in their production. The dataset is intended for use in data analysis, visualization, and machine learning projects related to the film industry. Each row represents a single movie entry, and the dataset includes the following columns:

    • Movie: The title of the movie.
    • Year: The year of release for the movie.
    • Genres: The genres or categories associated with the movie.
    • Certification/Rating: The film's certification or rating according to the relevant rating board or organization.
    • IMDb ID: The unique IMDb identifier for the movie.
    • Writer: The name(s) of the writer(s) or screenwriter(s) responsible for the movie's screenplay.
    • Director: The name of the movie's director.

    Potential Use Cases:

    • Film industry analysis: Analyze trends in movie genres and ratings over time.
    • Predicting movie success: Build predictive models to forecast a movie's success based on its features.
    • Recommender systems: Develop movie recommendation systems for users based on their preferences.
    • Creative insights: Explore relationships between directors, writers, and movie genres.

  18. Movie releases in the U.S. & Canada 2000-2024

    • statista.com
    Cite
    Statista, Movie releases in the U.S. & Canada 2000-2024 [Dataset]. https://www.statista.com/statistics/187122/movie-releases-in-north-america-since-2001/
    Explore at:
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Canada, United States
    Description

    In 2024, a total of 569 movies were released in the United States and Canada, up from 506 in the previous year. Still, these figures are under the 792 titles released in 2019, before the COVID-19 outbreak.

    Will moviegoers return?

    The box office revenue in the U.S. and Canada more than tripled between 2020 and 2022, when it reached almost 7.4 billion U.S. dollars. The 2022 result still fell way behind the 11.3-billion-dollar annual revenue recorded just before the pandemic. But there are ways to attract newcomers to the moviegoing experience. During a mid-2022 survey conducted among members of the Generation Z – aged between 13 and 24 years – more than half of respondents mentioned movie offering as a leading motivation to go to the movies. About 40 percent of interviewees included the quality of the service and the physical comfort of the seats at the movie theater among their main incentives.

    Cinema circuits

    As the industry tries to reinvent itself for a post-pandemic scenario, the top movie theater chains in North America slowly bounce back. Their financial results improved since the coronavirus outbreak, but when or if they will see figures similar to those recorded before 2020 remains an open question. The leading circuit, AMC Theatres, reported a revenue of more than 2.5 billion dollars in 2021, over twice as much as in the previous year.

  19. Description of data from IMDb.

    • plos.figshare.com
    zip
    Updated Jun 2, 2023
    Cite
    Marlon Ramos; Angelo M. Calvão; Celia Anteneodo (2023). Description of data from IMDb. [Dataset]. http://doi.org/10.1371/journal.pone.0136083.s001
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Marlon Ramos; Angelo M. Calvão; Celia Anteneodo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We collected votes (from 1 to 10 stars) for all movies, excluding TV episodes (total number of 336,090,882 votes for 300,723 movies), from March 19 to 28, 2013 (set #1). Using the same list of movies, we collected the number of votes again from December 8 to 18, 2014 (set #2, 465,292,451 votes) and from January 5 to 10, 2015 (set #3, 471,222,420 votes), as shown in Fig 10. For budgets, we used a new list and collected data from February 5 to 8, 2015. Results with fewer than 5 votes (in 2013) are not exhibited.

    Number of items by type: 33,941 (Documentary); 133,775 (Feature Film); 3,172 (Mini-Series); 50,408 (Short Film); 1,071 (TV Episode); 25,168 (TV Movie); 33,165 (TV Series); 2,450 (TV Special); 12,120 (Video); 5,453 (Video Game).

    By genre: 24,911 (Action); 93 (Adult); 15,651 (Adventure); 18,918 (Animation); 5,385 (Biography); 74,393 (Comedy); 18,693 (Crime); 37,250 (Documentary); 97,087 (Drama); 16,022 (Family); 8,677 (Fantasy); 567 (Film Noir); 1,575 (Game Show); 5,525 (History); 15,072 (Horror); 10,212 (Music); 5,840 (Musical); 8,170 (Mystery); 1,036 (News); 3,605 (Reality TV); 21,165 (Romance); 8,239 (Sci-Fi); 61,538 (Short); 4,360 (Sport); 1,467 (Talk Show); 16,246 (Thriller); 5,080 (War); 4,549 (Western). An item can be assigned more than one genre.

    As a final observation, a user can remove his or her vote; as a consequence, some movies show a decreasing number of votes. However, this affects only a negligible fraction of the movies. We used the following list: http://www.imdb.com/search/title?title_type=feature,tv_movie,tv_series,tv_special,mini_series,documentary,game,short,video,unknown&user_rating=1.0,10. (ZIP)
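
    To put the vote totals in perspective, the short sketch below derives the average number of votes per movie and the relative growth between collection rounds. It uses only the totals quoted in the description. It assumes the movie count stays at 300,723 for all three sets, since the description says the same list of movies was reused. The variable names are illustrative.

```python
# Totals quoted in the dataset description.
votes = {
    "set 1 (Mar 2013)": 336_090_882,
    "set 2 (Dec 2014)": 465_292_451,
    "set 3 (Jan 2015)": 471_222_420,
}
movie_count = 300_723  # assumed constant across sets (same movie list reused)

baseline = votes["set 1 (Mar 2013)"]

# Average votes per movie and growth relative to set 1, for each round.
summary = {
    label: {
        "votes_per_movie": total / movie_count,
        "growth_vs_set_1": total / baseline - 1.0,
    }
    for label, total in votes.items()
}

for label, stats in summary.items():
    print(f"{label}: {stats['votes_per_movie']:.1f} votes per movie, "
          f"{stats['growth_vs_set_1']:.1%} growth vs set 1")
```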

  20. Movie genre distribution in the U.S. & Canada 2010-2025, by box office...

    • statista.com
    Cite
    Statista, Movie genre distribution in the U.S. & Canada 2010-2025, by box office market share [Dataset]. https://www.statista.com/statistics/668712/movie-genres-in-north-america-by-average-box-office-revenue/
    Explore at:
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Canada, United States
    Description

    Throughout 2024, the action movie genre accounted for almost ** percent of the box office revenue in the United States and Canada, collectively known as the North American film market. Adventure, which historically tends to lead the market, ranked second with around ** percent and comedy ranked third with around ***** percent in 2024.

Cite
The Devastator (2023). Movies Performance and Feature Statistics [Dataset]. https://www.kaggle.com/datasets/thedevastator/movies-performance-and-feature-statistics

Movies Performance and Feature Statistics

Analyzing Box Office Performance, Rating and Audience Reactions

Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jan 16, 2023
Dataset provided by
Kaggle
Authors
The Devastator
Description

By Yashwanth Sharaff [source]

About this dataset

This dataset contains essential characteristics of a variety of movies, including basic information such as the movie's title and budget, as well as attributes and performance indicators such as MPAA rating, gross revenue, release date, genre, runtime, rating count and summary. With this dataset we can better understand the film industry and explore how different features and performance metrics relate to one another and to a movie's success. It can also help you judge which features are key indicators of a high-grossing feature film.

How to use the dataset

To get the most out of this dataset, you need to understand what each column represents. The 'Title' column gives the title of the movie, which can be used for further search or exploration on popular streaming services and on websites dedicated to detailed movie information. The 'MPAA Rating' column lists the movie's Motion Picture Association (MPAA) rating: G (General Audiences), PG (Parental Guidance Suggested), PG-13 (Parents Strongly Cautioned), R (Under 17 Requires Accompanying Parent or Guardian), etc. The 'Budget' column gives an approximate idea of how much a production cost, while the 'Gross' column gives its earnings if it was released in theaters. The 'Release Date' column shows when each film was released or is due to be released. The 'Genre' column gives the type of movie, 'Runtime' gives its length, and 'Rating Count' gives the number of people who have rated or reviewed it, whether on IMDb, on other streaming services, or in print media such as newspapers. Finally, the 'Summary' field gives an overview of what to expect from the film, so take it into account before watching, especially if there are children in your family.

So go ahead - start exploring this interesting dataset today!

Research Ideas

  • Creating a box office prediction model using budget, genre, release date and MPAA rating
  • Using the summary data to create a sentiment analysis tool for movie reviews
  • Building a recommendation engine for users based on their prior ratings and what other users with similar tastes have rated as highly
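
As a starting point for the first idea, the sketch below fits a one-variable least-squares line that predicts Gross from Budget. It is a deliberately reduced baseline, not a full model: genre, release date and MPAA rating are left out, the numbers are invented placeholders rather than rows from movies.csv, and fit_line is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (single feature)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Placeholder values (NOT real data), standing in for the Budget and Gross columns.
budgets = [10.0, 20.0, 30.0]
grosses = [25.0, 45.0, 65.0]

slope, intercept = fit_line(budgets, grosses)
predicted_gross = slope * 40.0 + intercept  # prediction for a budget of 40
print(slope, intercept, predicted_gross)
```

A fuller model would encode the categorical columns (Genre, MPAA Rating) and hold out part of the data for evaluation before trusting any prediction.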

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

See the dataset description for more information.

Columns

File: movies.csv

| Column name  | Description                                                                    |
|:-------------|:-------------------------------------------------------------------------------|
| Title        | The title of the movie. (String)                                               |
| MPAA Rating  | The Motion Picture Association of America (MPAA) rating of the movie. (String) |
| Budget       | The budget of the movie in US dollars. (Integer)                               |
| Gross        | The gross revenue of the movie in US dollars. (Integer)                        |
| Release Date | The date the movie was released. (Date)                                        |
| Genre        | The genre of the movie. (String)                                               |
| Runtime      | The length of the movie in minutes. (Integer)                                  |
| Rating Count | The number of ratings the movie has received. (Integer)                        |
| Summary      | A brief summary of the movie. (String)                                         |

Acknowledgements

If you use this dataset in your research, please credit the original author, Yashwanth Sharaff.
