https://creativecommons.org/publicdomain/zero/1.0/
Description: This dataset provides comprehensive movie statistics compiled from multiple sources, including Wikipedia, The Numbers, and IMDb. It offers a rich collection of information and insights into various aspects of movies, such as movie titles, production dates, genres, runtime minutes, director information, average ratings, number of votes, approval index, production budgets, domestic gross earnings, and worldwide gross earnings.
The dataset combines data scraped from Wikipedia, which includes details about movie titles, production dates, genres, runtime minutes, and director information, with data from The Numbers, a reliable source for box office statistics. Additionally, IMDb data is integrated to provide information on average ratings, number of votes, and other movie-related attributes.
With this dataset, users can analyze and explore trends in the film industry, assess the financial success of movies, identify popular genres, and investigate the relationship between average ratings and box office performance. Researchers, movie enthusiasts, and data analysts can leverage this dataset for various purposes, including data visualization, predictive modeling, and deeper understanding of the movie landscape.
Features:
- Movie_title
- Production_date
- Genres
- Runtime_minutes
- Director_name (primaryName)
- Director_professions (primaryProfession)
- Director_birthYear
- Director_deathYear
- Movie_averageRating: the average rating given by online users for a particular movie
- Movie_numberOfVotes: the number of votes given by online users for a particular movie
- Approval_Index: a normalized indicator (on a 0-10 scale) calculated by multiplying the logarithm of the number of votes by the average user rating. It provides a concise measure of a movie's overall popularity and approval among online viewers, penalizing both films that got too few reviews and blockbusters that got too many.
- Production_budget ($)
- Domestic_gross ($)
- Worldwide_gross ($)
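The Approval_Index description above can be sketched in a few lines. This is only an illustration: the source does not state the logarithm base or how the score is normalized to 0-10, so the base-10 logarithm and the "divide by the largest raw score" rescaling below are assumptions, and the function names are hypothetical.

```python
import math


def raw_approval(avg_rating: float, num_votes: int) -> float:
    """Unnormalized score: average rating times log10 of the vote count (base is assumed)."""
    if num_votes < 1:
        return 0.0
    return avg_rating * math.log10(num_votes)


def approval_index(movies: list[tuple[float, int]]) -> list[float]:
    """Rescale raw scores to 0-10 by dividing by the largest raw score (assumed normalization)."""
    raw = [raw_approval(rating, votes) for rating, votes in movies]
    top = max(raw, default=0.0)
    if top == 0:
        return [0.0 for _ in raw]
    return [10 * score / top for score in raw]


# Two films with the same average rating but very different vote counts.
print(approval_index([(8.0, 1000), (8.0, 10)]))
```

Because the vote count enters through a logarithm, a film needs many more votes to gain each additional point, which is one way to read the "penalizing" remark in the description.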
Potential Applications:
- Box office analysis: Analyze the relationship between production budgets, domestic and worldwide gross earnings, and profitability.
- Genre analysis: Identify the most popular genres based on movie counts and analyze their performance.
- Rating analysis: Explore the relationship between average ratings, number of votes, and financial success.
- Director analysis: Investigate the impact of directors on movie ratings and financial performance.
- Time-based analysis: Study movie trends over different production years and observe changes in production budgets, box office earnings, and genre preferences.
By utilizing this dataset, users can gain valuable insights into the movie industry and uncover patterns that can inform decision-making, market research, and creative strategies.
https://data.gov.tw/license
This dataset provides national theater box office statistics for films, published by the Administrative Institution National Film and Audiovisual Culture Center. The data runs up to the last Sunday before the announcement date and does not include films that have been screened for fewer than 7 calendar days. The earliest CSV format data in this dataset begins on July 30, 2018, and the earliest JSON format data begins on March 1, 2020. JSON format queries require a start date and an end date (in year, month, day format) and can return data for a maximum of 90 days at a time.
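Because a single JSON query covers at most 90 days, a longer period has to be requested in pieces. The sketch below only splits a date range into such windows; it does not call any endpoint, the function name is hypothetical, and treating the 90-day limit as inclusive of both the start and end date is an assumption.

```python
from datetime import date, timedelta


def split_into_windows(start: date, end: date, max_days: int = 90):
    """Yield (window_start, window_end) pairs, each spanning at most max_days days inclusive."""
    if end < start:
        raise ValueError("end must not precede start")
    cursor = start
    while cursor <= end:
        window_end = min(cursor + timedelta(days=max_days - 1), end)
        yield cursor, window_end
        cursor = window_end + timedelta(days=1)


# Cover the JSON data from its stated earliest date to the end of 2020.
for window_start, window_end in split_into_windows(date(2020, 3, 1), date(2020, 12, 31)):
    print(window_start.isoformat(), window_end.isoformat())
```

Each pair can then be formatted into whatever start and end date parameters the query interface expects.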
https://crawlfeeds.com/privacy_policy
Between 1995 and 2025, a movie based on comics or graphic novels grossed, on average, about 88.36 million U.S. dollars across the United States and Canada – collectively known as the North American box office. Spin-offs followed as the second-most commercially successful film source material, with average box office revenue of around 86.32 million dollars.
https://choosealicense.com/licenses/unknown/
Dataset Card for "rotten_tomatoes"
Dataset Summary
Movie Review Dataset. This dataset contains 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. This data was first used in Bo Pang and Lillian Lee, "Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales", Proceedings of the ACL, 2005.
Supported Tasks and Leaderboards
More Information Needed
Languages… See the full description on the dataset page: https://huggingface.co/datasets/cornell-movie-review-data/rotten_tomatoes.
This data set was scraped from https://www.the-numbers.com/ using Python 3. It covers more than 13k movies and contains monetary data (Domestic Box Office, Infl. Adj. Dom. BO, Opening Weekend, and more) as well as "creative" cinema data (Comparisons, Creative Type, Genre, and more). The complete scraping code I wrote to create the data set is available in my profile: https://www.kaggle.com/code/mayasoffer/movies-data-scraper
Please note that the data was scraped entirely from The Numbers website. Therefore:
- Some data is missing, matching the gaps on the site.
- The scraping was performed on 01.03.22 (March 2022), so all the data is accurate as of that date.
- For more information on how the columns were created and where the site originally got its data, please refer to the site itself.
- I scraped the data and saved it as CSV, but every column was kept in its original form, exactly as written on the website, so some cleaning of the columns is necessary before any analysis can take place.
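As an example of the cleaning mentioned above, monetary columns scraped as text need to be converted to numbers. The sketch below assumes values look like "$1,234,567" and that missing values appear as empty or non-numeric strings; the actual formats in the CSV may differ, and the function name is hypothetical.

```python
def parse_money(text):
    """Convert a string such as '$1,234,567' to an int; return None if it is not a whole number."""
    cleaned = (text or "").replace("$", "").replace(",", "").strip()
    try:
        return int(cleaned)
    except ValueError:
        return None


# Illustrative values only, not taken from the data set.
for value in ["$1,234,567", "", "N/A"]:
    print(repr(value), "->", parse_money(value))
```

Returning None rather than 0 keeps missing values distinguishable from films that genuinely earned nothing.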
The data is diverse, contains many columns, and goes back to 1995, so there are many options for analysis. Here are a few leads I thought about:
- How have genres changed over the years? Which genres have been the most popular (by revenue, legs, opening week, and so on)? Which new genres have gained popularity (animation, for example)?
- Does MPAA rating impact revenue?
And much more.
Thank you for using my dataset!
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Using data from Polly, sourced from an independent sample of 2,950,625 people on Twitter, Reddit and TikTok worldwide from March 13, 2023, to March 13, 2024, we delved deeper into what people really think about the state of the film industry. By engagement, the USA, aka Hollywood (68%), leads overwhelmingly over India, aka Bollywood (5.8%), followed by Italy (5.6%), Japan (5%), South Korea (4.1%), France (35%), Nigeria, aka Nollywood (29%), and then China (1.1%). This report includes a breakdown by gender, age and worldwide region.
Between 1995 and 2024, PG-13-rated movies grossed approximately 126.64 billion U.S. dollars at the North American box office – a term that excludes Mexico and includes Canada and the United States. R-rated and PG-rated films grossed around 69.28 billion and 56.04 billion dollars, respectively.
https://creativecommons.org/publicdomain/zero/1.0/
Title: 9,565 Top-Rated Movies Dataset
Description:
This dataset offers a comprehensive collection of 9,565 of the highest-rated movies according to audience ratings on the Movie Database (TMDb). The dataset includes detailed information about each movie, such as its title, overview, release date, popularity score, average vote, and vote count. It is designed to be a valuable resource for anyone interested in exploring trends in popular cinema, analyzing factors that contribute to a movie’s success, or building recommendation engines.
Key Features:
- Title: The official title of each movie.
- Overview: A brief synopsis or description of the movie's plot.
- Release Date: The release date of the movie, formatted as YYYY-MM-DD.
- Popularity: A score indicating the current popularity of the movie on TMDb, which can be used to gauge current interest.
- Vote Average: The average rating of the movie, based on user votes.
- Vote Count: The total number of votes the movie has received.
Data Source:
The data was sourced from the TMDb API, a well-regarded platform for movie information, using the /movie/top_rated endpoint. The dataset represents a snapshot of the highest-rated movies as of the time of data collection.
Data Collection Process:
- API Access: Data was retrieved programmatically using TMDb’s API.
- Pagination Handling: Multiple API requests were made to cover all pages of top-rated movies, ensuring the dataset’s comprehensiveness.
- Data Aggregation: Collected data was aggregated into a single, unified dataset using the pandas library.
- Cleaning: Basic data cleaning was performed to remove duplicates and handle missing or malformed data entries.
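The collection steps above can be sketched as a loop over pages followed by de-duplication. This is not the author's script: the `fetch_page` callable is a hypothetical stand-in for the real HTTP request, the `results` and `total_pages` keys are assumed to match the shape of a paginated TMDb response, and plain dictionaries are used here instead of pandas to keep the example dependency-free.

```python
from typing import Callable


def collect_all_pages(fetch_page: Callable[[int], dict]) -> list[dict]:
    """Fetch pages 1..total_pages and return the movies, de-duplicated by their 'id'."""
    seen: dict[int, dict] = {}
    page, total_pages = 1, 1
    while page <= total_pages:
        payload = fetch_page(page)
        # Trust the server's page count; fall back to stopping if it is absent.
        total_pages = payload.get("total_pages", page)
        for movie in payload.get("results", []):
            seen.setdefault(movie["id"], movie)  # keep the first copy of each movie
        page += 1
    return list(seen.values())


def fake_fetch(page: int) -> dict:
    """Stand-in for a real request; returns two made-up pages with one overlapping movie."""
    pages = {
        1: [{"id": 1, "title": "A"}, {"id": 2, "title": "B"}],
        2: [{"id": 2, "title": "B"}, {"id": 3, "title": "C"}],
    }
    return {"page": page, "total_pages": 2, "results": pages[page]}


movies = collect_all_pages(fake_fetch)
print([m["title"] for m in movies])
```

Passing the fetcher in as a parameter keeps the paging logic testable without network access; a real fetcher would issue the request with an API key and return the decoded JSON.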
Potential Uses:
- Trend Analysis: Analyze trends in movie ratings over time or compare ratings across different genres.
- Recommendation Systems: Build and train models to recommend movies based on user preferences.
- Sentiment Analysis: Perform text analysis on movie overviews to understand common themes and sentiments.
- Statistical Analysis: Explore the relationship between popularity, vote count, and average ratings.
Data Format: The dataset is provided in a structured tabular format (e.g., CSV), making it easy to load into data analysis tools like Python, R, or Excel.
Usage License: The dataset is shared under [appropriate license], ensuring that it can be used for educational, research, or commercial purposes, with proper attribution to the data source (TMDb).
This statistic shows the box office revenue of CGI, 3D and animated movies in the United States from 2008 to 2018. According to RenderThat, the total revenue in the U.S. for all movies containing CGI (computer-generated imagery), animation and 3D effects amounted to **** billion U.S. dollars in 2018.
Annual summary statistics, classified by the North American Industry Classification System (NAICS), for motion picture and video production (NAICS 512110), covering five years of data. The statistics include operating revenue (dollars x 1,000,000), operating expenses (dollars x 1,000,000), salaries, wages and benefits (dollars x 1,000,000), and operating profit margin (percent).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Transformed, cleaned dataset with reduced number of columns for all 45,000 movies listed in the full MovieLens dataset of movies released in July 2017 or earlier. Data points include movie ID, title, budget, languages, and genres. This dataset also includes 26 million ratings from 270,000 users for all 45,000 movies. Ratings are given on a scale of 1 to 5 and include user ID, movie ID, rating, and timestamp.
This dataset consists of the following files:
* movies.csv: The main movie metadata file. Contains information on 45,000 movies included in the full MovieLens dataset.
* ratings.csv: The full MovieLens dataset with 26 million ratings and 750,000 tag applications from 270,000 users on all 45,000 movies in this dataset.
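A first pass over ratings.csv is often a per-movie average. The sketch below assumes the header names `userId`, `movieId`, `rating`, and `timestamp`, which follow the usual MovieLens convention but are not confirmed by the description above; the sample rows are made up.

```python
import csv
import io
from collections import defaultdict


def mean_rating_per_movie(rows) -> dict:
    """Average the 'rating' field for each 'movieId' in an iterable of dict rows."""
    totals = defaultdict(lambda: [0.0, 0])  # movieId -> [sum of ratings, count]
    for row in rows:
        entry = totals[row["movieId"]]
        entry[0] += float(row["rating"])
        entry[1] += 1
    return {movie: total / count for movie, (total, count) in totals.items()}


# Made-up sample in the assumed layout; a real run would open ratings.csv instead.
sample = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,10,4.0,100\n"
    "2,10,5.0,101\n"
    "1,20,3.0,102\n"
)
means = mean_rating_per_movie(csv.DictReader(sample))
print(means)
```

Streaming rows through `csv.DictReader` avoids loading all 26 million ratings into memory at once.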
This dataset is a further development of the following public domain dataset published on Kaggle:
https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset
This data was obtained from the official GroupLens website. The data was originally obtained from The Movies DataBase (TMDB) via the TMDB AP
http://researchdatafinder.qut.edu.au/display/n15252
This file contains the features for the test portion of the movie dataset. The data has been converted into average word vectors. This is 50% of the total movie results. QUT Research Data Repository dataset resource, available for download.
https://crawlfeeds.com/privacy_policy
We provide a high-quality Rotten Tomatoes movie dataset that includes key metadata for thousands of movies. This dataset is ideal for anyone working with movie-related platforms, entertainment analytics, content curation, or movie discovery tools.
Our collection is structured, clean, and designed to support real-time apps, dashboards, and research use cases.
Each record in the dataset contains core information pulled directly from Rotten Tomatoes, including:
Movie Name – The official title of the movie.
Poster URL – High-resolution image link to the movie poster.
Trailer URL – Direct link to the official trailer (when available).
Genre – One or more genres associated with the movie, such as Action, Drama, Comedy, or Horror.
Release Date – The date the movie was released to the public.
Actors – Main cast members listed on Rotten Tomatoes.
Directors – Director(s) responsible for the movie.
Rating – Audience or critic scores, where available.
This dataset spans a wide range of movies across all major genres and decades. From modern releases to timeless classics, from Hollywood blockbusters to independent films — we’ve included movies of all types with relevant data points.
You can expect data on:
U.S. theatrical releases
Netflix, Amazon, and other streaming exclusives
Festival films and limited releases
Animated and documentary films
Here are just a few ways this dataset can be useful:
Movie Recommendation Engines – Use metadata and genre info to power personalized movie suggestions.
Entertainment Search Tools – Build searchable movie listings with visual poster previews and trailer links.
Data Visualization Projects – Create dashboards showing trends by genre, release periods, or actor participation.
AI/ML Training – Use metadata to train classification models or sentiment prediction tools.
Research & Academic Use – Analyze patterns in movie releases, cast dynamics, and genre evolution.
Clean & ready-to-use: No raw HTML, just clean structured data.
Minimal but meaningful fields: Focused on useful movie attributes without clutter.
Updated info: Covers both classic and current titles.
Simple integration: Easy to use for developers, analysts, and product teams.
If you're working on a movie-based product or looking for reliable film metadata for your project, this dataset offers an ideal foundation.
Let us know if you’d like to explore it further.
https://creativecommons.org/publicdomain/zero/1.0/
The data was scraped from Box Office Mojo. This data set includes all-time worldwide box office collections from 2010 to 2022.
Web scrape source: https://github.com/Somnath4/Python-Web-Sraping
Photo by Kilyan Sockalingum on Unsplash Photo link: https://unsplash.com/photos/nW1n9eNHOsc
About Box Office Mojo: Box Office Mojo is a website that provides box office collection data for movies. It is a valuable resource for movie studios, producers, and film fans alike. The website lets users view box office collection data for movies released in the US and around the world. It also provides data on movie budgets and on box office performance relative to budget. Users can access this data by browsing the website or by using the search function to find specific movies. The website is updated regularly, so users can stay up to date on the latest box office collection data.
About the data set: The data set covers worldwide box office collections and includes information on the top 200 grossing films of each year from 2010 to 2022. The data consists of the title of the film, its worldwide box office collection, domestic box office collection, the percentage of domestic box office collection, foreign box office collection, and the percentage of foreign box office collection. This data can be useful for analyzing trends in the film industry, understanding the performance of different films, and predicting future box office success.
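The percentage columns described above can be recomputed from the gross figures, which is a quick sanity check after loading the data. The sketch assumes the foreign collection equals worldwide minus domestic, which the description implies but does not state; the function name and the sample figures are hypothetical.

```python
def gross_shares(worldwide: float, domestic: float) -> tuple[float, float]:
    """Return (domestic %, foreign %) of the worldwide gross, with foreign = worldwide - domestic."""
    if worldwide <= 0:
        raise ValueError("worldwide gross must be positive")
    domestic_pct = 100 * domestic / worldwide
    return domestic_pct, 100 - domestic_pct


# Illustrative figures only: 50 of 200 earned domestically.
print(gross_shares(200.0, 50.0))
```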
According to a survey led in several markets all around the world in January 2025, more than half of respondents across all age brackets wanted to see more action and adventure movies. While younger consumers would like to see more horror movies in theaters, older viewers were hoping to see more dramas.
https://choosealicense.com/licenses/cc0-1.0/
+9000 Movie Dataset
Overview
This dataset is sourced from Kaggle and has been released by the original author under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. This means you can copy, modify, distribute, and perform the work, even for commercial purposes, all without asking permission. I would like to express my gratitude to the original author for their contribution to the data community.
License
This dataset is released under the CC0 1.0 Universal… See the full description on the dataset page: https://huggingface.co/datasets/Pablinho/movies-dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Using a Python script to scrape the website of Box Office India, we collected data on all 1,698 Hindi-language movies released in India over a 13-year period (2005-2017).
https://crawlfeeds.com/privacy_policy
Explore our meticulously curated Movies dataset and TV shows dataset, designed to cater to diverse analytical and research needs. Whether you're a data scientist, a student, or a business professional, these datasets provide valuable insights into the entertainment industry.
Extensive collection of global movies across various genres and languages.
Detailed metadata, including titles, release dates, genres, directors, cast, and ratings.
Regularly updated to ensure relevance and accuracy.
Our TV shows dataset is your gateway to understanding trends in episodic content. It includes:
Comprehensive details about popular and niche TV shows.
Information on episode counts, seasons, ratings, and networks.
Insights into audience preferences and regional programming.
These datasets are perfect for:
Machine learning models for recommendation systems.
Academic research on media trends and audience behavior.
Business strategies for entertainment platforms.
Unlock the power of TV show data with our Crawl Feeds TV Shows Dataset. Start analyzing today and gain valuable insights into your favorite shows!
Attribution-ShareAlike 3.0 (CC BY-SA 3.0) https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Dataset of movie plot summaries and associated metadata. This data was collected by David Bamman, Brendan O'Connor, and Noah Smith at the Language Technologies Institute and Machine Learning Department at Carnegie Mellon University.
Plot summaries of 42,306 movies extracted from the November 2, 2012 dump of English-language Wikipedia. Each line contains the Wikipedia movie ID (which indexes into movie.metadata.tsv) followed by the summary.
Metadata for 81,741 movies, extracted from the November 4, 2012 dump of Freebase. Tab-separated; columns:
- Wikipedia movie ID
- Freebase movie ID
- Movie name
- Movie release date
- Movie box office revenue
- Movie runtime
- Movie languages (Freebase ID:name tuples)
- Movie countries (Freebase ID:name tuples)
- Movie genres (Freebase ID:name tuples)
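A line of the movie metadata file can be split into named fields following the column order listed above. In this sketch the field names are hypothetical labels, the sample line is made up, and the "Freebase ID:name tuples" are assumed to be serialized as JSON objects mapping ID to name; check a real line before relying on that.

```python
import json

# Hypothetical labels for the nine columns, in the order given in the description.
COLUMNS = [
    "wikipedia_id", "freebase_id", "name", "release_date", "revenue",
    "runtime", "languages", "countries", "genres",
]


def parse_movie_line(line: str) -> dict:
    """Split one tab-separated metadata line and decode the ID:name columns into name lists."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    record = dict(zip(COLUMNS, fields))
    for key in ("languages", "countries", "genres"):
        # Assumed encoding: a JSON object such as {"/m/x": "Drama"}; empty means no values.
        record[key] = list(json.loads(record[key] or "{}").values())
    return record


# Made-up line for illustration; it is not taken from the dataset.
sample = '1\t/m/x\tExample Film\t2001-01-01\t1000\t90.0\t{"/m/a": "English"}\t{"/m/b": "Canada"}\t{"/m/c": "Drama", "/m/d": "Comedy"}'
print(parse_movie_line(sample)["genres"])
```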
Metadata for 450,669 characters aligned to the movies above, extracted from the November 4, 2012 dump of Freebase. Tab-separated; columns:
72 character types drawn from tvtropes.com, along with 501 instances of those types. The ID field indexes into the Freebase character/actor map ID in character.metadata.tsv.
970 unique character names used in at least two different movies, along with 2,666 instances of those types. The ID field indexes into the Freebase character/actor map ID in character.metadata.tsv.
This research was supported in part by U.S. National Science Foundation grant IIS-0915187.
All data is released under a Creative Commons Attribution-ShareAlike License. For questions or comments, please contact David Bamman (dbamman@cs.cmu.edu).
Photo by Jakob Owens on Unsplash