By Yashwanth Sharaff [source]
This dataset contains essential characteristics of a variety of movies: basic information such as the movie's title, MPAA rating, release date, genre, runtime, summary and budget, as well as performance indicators such as gross revenue and rating count. With this dataset we can better understand the film industry and explore how different features and performance metrics relate to one another and to a movie's success. It can also help you judge which features are key indicators of a high-grossing feature film.
To get the most out of this dataset, you need to understand what each column represents. The 'Title' column gives the title of the movie, which can be used to look the film up on streaming services and on websites dedicated to detailed movie information. The 'MPAA Rating' column lists the movie's Motion Picture Association (MPAA) rating, such as G (General Audiences), PG (Parental Guidance Suggested), PG-13 (Parents Strongly Cautioned) or R (Under 17 Requires Accompanying Parent or Guardian). The 'Budget' column gives an approximate idea of how much the production cost, and the 'Gross' column shows what the movie earned in theaters. 'Release Date' records when the film was released or is scheduled to be released. 'Genre' describes the type of movie, 'Runtime' gives its length, and 'Rating Count' is the number of people who have rated or reviewed it, whether on IMDb or elsewhere. Finally, the 'Summary' field gives a short overview of what to expect from the film, which is worth reading before you watch, especially with children.
So go ahead - start exploring this interesting dataset today!
- Creating a box office prediction model using budget, genre, release date and MPAA rating
- Using the summary data to create a sentiment analysis tool for movie reviews
- Building a recommendation engine for users based on their prior ratings and what other users with similar tastes have rated as highly
File: movies.csv

| Column name | Description |
|:---|:---|
| Title | The title of the movie. (String) |
| MPAA Rating | The Motion Picture Association of America (MPAA) rating of the movie. (String) |
| Budget | The budget of the movie in US dollars. (Integer) |
| Gross | The gross revenue of the movie in US dollars. (Integer) |
| Release Date | The date the movie was released. (Date) |
| Genre | The genre of the movie. (String) |
| Runtime | The length of the movie in minutes. (Integer) |
| Rating Count | The number of ratings the movie has received. (Integer) |
| Summary | A brief summary of the movie. (String) |
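As a rough illustration of working with these columns, the sketch below parses a few rows and computes a gross-to-budget ratio per title. The sample rows are invented, and the header spellings and value formats are assumptions taken from the column table; the real movies.csv may differ (for example, in how missing budgets or dates are written).

```python
import csv
import io

# Hypothetical sample rows; the real movies.csv may use different header
# spellings or value formats than the column table suggests.
SAMPLE_CSV = """Title,MPAA Rating,Budget,Gross,Release Date,Genre,Runtime,Rating Count,Summary
Example Film A,PG-13,1000000,3500000,2001-05-04,Action,110,1200,A made-up summary.
Example Film B,R,2000000,1500000,2003-10-17,Drama,95,800,Another made-up summary.
Example Film C,PG,,900000,2005-07-01,Comedy,101,450,Budget is missing here.
"""


def gross_to_budget(rows):
    """Return {title: gross / budget} for rows where both values are usable."""
    ratios = {}
    for row in rows:
        try:
            budget = int(row["Budget"])
            gross = int(row["Gross"])
        except (KeyError, ValueError):
            continue  # skip rows with missing or non-numeric money fields
        if budget > 0:
            ratios[row["Title"]] = gross / budget
    return ratios


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
ratios = gross_to_budget(rows)

# Print titles from the highest ratio to the lowest.
for title, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{title}: {ratio:.2f}x")
```

To try this on the real file, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle and adjust the column names if the header differs.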
If you use this dataset in your research, please credit Yashwanth Sharaff.
By Throwback Thursday [source]
This dataset contains genre statistics for movies released between 1995 and 2018. It provides information on various aspects of the movies, such as gross revenue, tickets sold, and inflation-adjusted figures. The dataset includes columns for genre, year of release, number of movies released in each genre and year, total gross revenue generated by movies in each genre and year, total number of tickets sold for movies in each genre and year, inflation-adjusted gross revenue that takes into account changes in the value of money over time, title of the highest-grossing movie in each genre and year, gross revenue generated by the highest-grossing movie in each genre and year, and inflation-adjusted gross revenue of the highest-grossing movie in each genre and year. This dataset offers insights into film industry trends over a span of more than two decades.
Understanding the Columns
Before diving into the analysis, let's familiarize ourselves with the different columns in this dataset:
- Genre: This column represents the genre of each movie.
- Year: The year in which the movies were released.
- Movies Released: The number of movies released in a particular genre and year.
- Gross: The total gross revenue generated by movies in a specific genre and year.
- Tickets Sold: The total number of tickets sold for movies in a specific genre and year.
- Inflation-Adjusted Gross: The gross revenue adjusted for inflation, taking into account changes in the value of money over time.
- Top Movie: The title of the highest-grossing movie in a specific genre and year.
- Top Movie Gross (That Year): The gross revenue generated by the highest-grossing movie in a specific genre and year.
- Top Movie Inflation-Adjusted Gross (That Year): The inflation-adjusted gross revenue of the highest-grossing movie in a specific genre and year.
Analyzing Data
To make use of this dataset effectively, here are some potential analyses you can perform:
Find popular genres: You can determine which genres are popular by looking at columns like Movies Released or Tickets Sold. Analyzing these numbers will give you insights into what types of movies attract more audiences.
Measure financial success: Explore columns like Gross, Inflation-Adjusted Gross, or Top Movie Gross (That Year) to compare the financial success of different genres. This will allow you to identify genres that generate higher revenue.
Understand movie trends: By analyzing the dataset over different years, you can observe trends in movie releases and gross revenue for specific genres. This information is crucial for understanding how movie preferences change over time.
Identify highest-grossing movies: The column Top Movie gives you the title of the highest-grossing movie in each genre and year. You can use this information to analyze the success of specific movies within their respective genres.
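The first two analyses above can be sketched in a few lines. The records below are made up and only mirror the column names listed earlier (Genre, Year, Movies Released, Gross, Tickets Sold); the helper names are hypothetical and the figures are not from the dataset.

```python
from collections import defaultdict

# Made-up records shaped like the columns described above; not real figures.
records = [
    {"Genre": "Action", "Year": 2000, "Movies Released": 40,
     "Gross": 2_000_000_000, "Tickets Sold": 400_000_000},
    {"Genre": "Action", "Year": 2001, "Movies Released": 45,
     "Gross": 2_400_000_000, "Tickets Sold": 420_000_000},
    {"Genre": "Drama", "Year": 2000, "Movies Released": 90,
     "Gross": 1_000_000_000, "Tickets Sold": 200_000_000},
    {"Genre": "Drama", "Year": 2001, "Movies Released": 95,
     "Gross": 1_100_000_000, "Tickets Sold": 190_000_000},
]


def tickets_by_genre(rows):
    """Sum Tickets Sold across all years for each genre."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["Genre"]] += row["Tickets Sold"]
    return dict(totals)


def gross_per_movie(rows):
    """Average gross per released movie for each (genre, year) pair."""
    return {
        (row["Genre"], row["Year"]): row["Gross"] / row["Movies Released"]
        for row in rows
        if row["Movies Released"]  # avoid dividing by zero
    }


totals = tickets_by_genre(records)
ranking = sorted(totals, key=totals.get, reverse=True)
per_movie = gross_per_movie(records)

print("Genres by tickets sold:", ranking)
print("Gross per movie, Action 2000:", per_movie[("Action", 2000)])
```

Dividing by Movies Released is one way to separate "many releases" from "high earnings per release"; whether that is the right normalization depends on the question being asked.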
Data Visualization
To enhance your analysis, consider using data visualization techniques.
- Predicting the popularity and success of movies in different genres: By analyzing the data on tickets sold and gross revenue, we can identify trends and patterns in movie genres that attract more audiences and generate higher revenue. This information can be useful for filmmakers, production studios, and investors to make informed decisions about which genres to focus on for future movie releases.
- Comparing the performance of movies over time: With the inclusion of inflation-adjusted figures, this dataset allows us to compare the box office success of movies across different years. We can analyze how movies in specific genres have performed over time in terms of gross revenue and adjust these figures for inflation to get a better understanding of their true financial success.
- Analyzing the impact of genre popularity on ticket sales: By examining the relationship between genre popularity (measured by tickets sold) and total gross revenue, we can gain insights into audience preferences and behavior. This information is valuable for marketing strategies, as it helps determine which movie genres are most likely to attract a larger audience base and generate higher ticket sales.
If you use this dataset in your research, please credit the original authors.
https://crawlfeeds.com/privacy_policy
Movies Dataset from AllMovie is a comprehensive collection featuring over 430,000 records, encompassing a wide range of films across various genres and languages. This extensive dataset includes essential data points such as movie titles, genres, release dates, posters, languages, directors, durations, synopses, trailers, average ratings, cast information, and URLs. Such detailed metadata is invaluable for developers, researchers, and enthusiasts aiming to analyze trends, build recommendation systems, or conduct in-depth studies of the film industry.
For those interested in alternative datasets, the IMDb Non-Commercial Datasets provide subsets of IMDb data accessible for personal and non-commercial use. These datasets allow users to hold local copies of movie information, facilitating various analytical projects.
Additionally, the MovieLens datasets offer a range of movie rating data suitable for research purposes. For instance, the MovieLens 20M dataset comprises 20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users, making it a valuable resource for studies in user preferences and recommendation algorithms.
Incorporating these datasets into your projects can significantly enhance the quality and depth of your analyses, providing a solid foundation for exploring various aspects of the cinematic world.
Why Choose Crawl Feeds for Your Data Needs?
Crawl Feeds is your trusted partner in acquiring high-quality, curated datasets tailored to your specific requirements. With a vast repository that includes the Movies Dataset, we empower developers and businesses to drive innovation. Explore our easy-to-use platform and transform your ideas into actionable insights.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was collected from TMDB and consists of 1 file.
The data has more than 43,000 rows and 8 columns.
| Column name | Description |
|---|---|
| popularity | how popular the movie is |
| release_date | release date of the movie |
| title | title of the movie |
| overview | brief description of the movie |
| vote_average | average vote (rating) |
| vote_count | number of people who voted on the movie |
| original_language | language the movie was originally made in |
| original_title | original title of the movie |
Data collected from TMDB.
The findings of a survey held in the United States in September 2021 revealed that ** percent of adults aged between 35 and 44 years old said that they watched or streamed movies every day, making respondents in this age group the most likely to do so. By comparison, ** percent of total respondents reported watching movies on a daily basis.
Summary statistics by North American Industry Classification System (NAICS) for motion picture and video production (NAICS 512110): operating revenue (dollars x 1,000,000), operating expenses (dollars x 1,000,000), salaries, wages and benefits (dollars x 1,000,000), and operating profit margin (percent), reported annually for five years of data.
During a survey carried out in the United States in April and May 2022, approximately 41 percent of responding internet users said they rarely went to the movies. Roughly one-third stated that they went to see a film in theaters sometimes, while eight percent reported doing it often. Almost one out of five interviewees, 18 percent, said they never went to the movies.
Do wage and age affect moviegoing frequency?
According to the same source, a little more than one-third of Americans whose household income stood below 50 thousand U.S. dollars reported going to the movies often or sometimes in mid-2022. Meanwhile, more than half of those with an income above 100 thousand dollars said the same. The gap added up to 17 percentage points. There was also a generational gap among cinephiles. About half of respondents aged 18 to 34 stated that they usually went to the movies, whereas a little more than one-fourth of consumers aged 65 and over reported doing it.
Regional and gender differences in film viewing
The moviegoing frequency also varies across the U.S.'s regions. In the Northeast, for example, the share of interviewees saying they went to see a film in theaters either sometimes or often amounted to 45 percent. Within the Midwest, more than 60 percent of respondents in the South said they rarely or never went to the movies as of May 2022. Furthermore, nearly half of American male adults surveyed stated that they visited a movie theater often or sometimes, while a little more than one-third of women said the same.
According to a study held in July 2023 in the United States about AI use cases in the TV and film industries, ** percent of respondents supported the use of such technology to create special effects or to alter actors' appearances. It was the most supported AI use case, while generating voices for animated characters came in second position with ** percent. The majority of entertainment-industry professionals agreed that the role of AI was a point of contention in the then-ongoing writers' strike.
https://data.gov.tw/license
This dataset provides national theater box office statistics for films distributed by the Administrative Institution National Film and Audiovisual Culture Center. The data is current up to the last Sunday before the announcement date and excludes films that have been screened for fewer than 7 calendar days. The earliest CSV format data in this dataset begins on July 30, 2018, and the earliest JSON format data begins on March 1, 2020. JSON format queries require a start date and an end date (each given as year, month, and day) and return data for a maximum of 90 days at a time.
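Because a JSON query can cover at most 90 days, a longer period has to be split into several requests. The sketch below only computes the date windows; it assumes that "90 days" counts both the start and end dates, which the description does not state, and it makes no assumption about the query URL or parameter names. The function name is hypothetical.

```python
from datetime import date, timedelta


def split_into_windows(start, end, max_days=90):
    """Yield inclusive (window_start, window_end) pairs of at most max_days days."""
    if start > end:
        raise ValueError("start must not be after end")
    current = start
    while current <= end:
        # A window of max_days calendar days ends max_days - 1 days after it starts.
        window_end = min(current + timedelta(days=max_days - 1), end)
        yield current, window_end
        current = window_end + timedelta(days=1)


# Example: cover the period from the earliest JSON data to the end of 2020.
windows = list(split_into_windows(date(2020, 3, 1), date(2020, 12, 31)))
for window_start, window_end in windows:
    print(window_start.isoformat(), "to", window_end.isoformat())
```

If the service turns out to treat the limit as exclusive of one endpoint, passing a smaller `max_days` keeps every request within bounds.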
http://researchdatafinder.qut.edu.au/display/n15252
This file contains the features for the test portion of the movie dataset. The data has been converted into average word vectors. The test portion makes up 50% of the total movie results. QUT Research Data Repository Dataset Resource available for download.
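For readers unfamiliar with the representation, an average word vector is the element-wise mean of the embedding vectors of the words in a text. The sketch below uses a tiny invented embedding table; the description does not say which embeddings, tokenization, or dimensionality were used for this dataset, so all of those are assumptions.

```python
def average_word_vector(tokens, embeddings, dim):
    """Mean of the vectors for tokens found in embeddings; zeros if none match."""
    total = [0.0] * dim
    count = 0
    for token in tokens:
        vec = embeddings.get(token)
        if vec is None:
            continue  # out-of-vocabulary words are ignored
        for i, value in enumerate(vec):
            total[i] += value
        count += 1
    if count == 0:
        return total
    return [value / count for value in total]


# Toy 3-dimensional embeddings, invented for illustration only.
toy_embeddings = {
    "great": [1.0, 0.0, 2.0],
    "movie": [0.0, 1.0, 0.0],
    "boring": [-1.0, 0.0, -2.0],
}

features = average_word_vector("a great movie".split(), toy_embeddings, dim=3)
print(features)
```

Here "a" is not in the toy vocabulary, so the result is the mean of the vectors for "great" and "movie".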
https://crawlfeeds.com/privacy_policy
Unlock one of the most comprehensive movie datasets available—4.5 million structured IMDb movie records, extracted and enriched for data science, machine learning, and entertainment research.
This dataset includes a vast collection of global movie metadata, including details on title, release year, genre, country, language, runtime, cast, directors, IMDb ratings, reviews, and synopsis. Whether you're building a recommendation engine, benchmarking trends, or training AI models, this dataset is designed to give you deep and wide access to cinematic data across decades and continents.
Perfect for use in film analytics, OTT platforms, review sentiment analysis, knowledge graphs, and LLM fine-tuning, the dataset is cleaned, normalized, and exportable in multiple formats.
Genres: Drama, Comedy, Horror, Action, Sci-Fi, Documentary, and more
Train LLMs or chatbots on cinematic language and metadata
Build or enrich movie recommendation engines
Run cross-lingual or multi-region film analytics
Benchmark genre popularity across time periods
Power academic studies or entertainment dashboards
Feed into knowledge graphs, search engines, or NLP pipelines
In 2024, a total of 569 movies were released in the United States and Canada, up from 506 in the previous year. Still, these figures are under the 792 titles released in 2019, before the COVID-19 outbreak.
Will moviegoers return?
The box office revenue in the U.S. and Canada more than tripled between 2020 and 2022, when it reached almost 7.4 billion U.S. dollars. The 2022 result still fell well behind the 11.3-billion-dollar annual revenue recorded just before the pandemic. But there are ways to attract newcomers to the moviegoing experience. During a mid-2022 survey conducted among members of Generation Z, aged between 13 and 24 years, more than half of respondents mentioned the movie offering as a leading motivation to go to the movies. About 40 percent of interviewees included the quality of the service and the physical comfort of the seats at the movie theater among their main incentives.
Cinema circuits
As the industry tries to reinvent itself for a post-pandemic scenario, the top movie theater chains in North America are slowly bouncing back. Their financial results have improved since the coronavirus outbreak, but when or if they will see figures similar to those recorded before 2020 remains an open question. The leading circuit, AMC Theatres, reported a revenue of more than 2.5 billion dollars in 2021, over twice as much as in the previous year.
According to a survey conducted in several markets around the world in January 2025, more than half of respondents across all age brackets wanted to see more action and adventure movies. While younger consumers would like to see more horror movies in theaters, older viewers were hoping to see more dramas.
The most common way in which Americans and Canadians discover new movies and TV shows is through word of mouth or from friends, cited by ***** percent of respondents to a survey in the fourth quarter of 2024. Getting to know new content via news articles or stories outside social media was less common, mentioned by around ***** percent of people interviewed.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
For details about the scraping process, visit the code repository on GitHub.
The final_data.csv file is a consolidated dataset combining data for the most popular 500–600 movies per year from 1920 to 2025, extracted from IMDb. This dataset aggregates all the yearly merged_movies_data_[year].csv files into a comprehensive CSV file for streamlined analysis.
The final_data.csv file includes:
- Basic movie details: id, title, year, duration, MPA, rating, votes, meta_score, description, Movie_Link.
- Financial data: budget, opening_weekend_gross, gross_worldwide, gross_us_canada.
- Credits: directors, writers, stars.
- Additional details: genres, countries_origin, filming_locations, production_companies, languages.
- Awards: awards_content (wins, nominations, Oscars).
- Release info: release_date.
Columns:
id,title,year,duration,MPA,rating,votes,meta_score,description,Movie_Link,writers,directors,stars,budget,opening_weekend_gross,gross_worldwide,gross_us_canada,release_date,countries_origin,filming_locations,production_companies,awards_content,genres,languages
The final_data.csv file is updated annually in December to reflect the most recent data additions and corrections.
This dataset is ideal for:
- Longitudinal Analysis: Studying trends in movie production, popularity, and financial performance over a century.
- Predictive Analytics: Building models to forecast box office performance or award outcomes.
- Recommender Systems: Leveraging attributes like genres, cast, and ratings for personalized recommendations.
- Comparative Studies: Comparing cinematic trends across different eras, regions, or genres.
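As one example of the longitudinal analysis mentioned above, the sketch below averages ratings by decade. It uses a handful of invented rows with a subset of the listed columns (id, title, year, rating, votes); how the real final_data.csv formats these values (for instance, abbreviated vote counts or empty ratings) is an assumption to verify against the file.

```python
import csv
import io
from collections import defaultdict

# Invented rows using a subset of the documented columns.
SAMPLE = """id,title,year,rating,votes
tt0000001,Made-Up Film A,1925,7.0,1000
tt0000002,Made-Up Film B,1928,8.0,2000
tt0000003,Made-Up Film C,1994,6.5,50000
tt0000004,Made-Up Film D,1999,,100
"""


def mean_rating_by_decade(rows):
    """Return {decade: mean rating}, skipping rows without a usable year or rating."""
    sums = defaultdict(lambda: [0.0, 0])  # decade -> [rating total, row count]
    for row in rows:
        try:
            year = int(row["year"])
            rating = float(row["rating"])
        except (KeyError, ValueError):
            continue
        decade = year // 10 * 10
        sums[decade][0] += rating
        sums[decade][1] += 1
    return {decade: total / n for decade, (total, n) in sorted(sums.items())}


by_decade = mean_rating_by_decade(csv.DictReader(io.StringIO(SAMPLE)))
for decade, mean_rating in by_decade.items():
    print(f"{decade}s: {mean_rating:.2f}")
```

Since the dataset holds only the most popular 500 to 600 movies per year, decade averages computed this way describe that popular subset rather than all films released.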
Please feel free to contact me by mail or open an issue on GitHub to request more features, report errors in the data, or suggest enhancements.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Using a Python script to scrape data from the web, we collected data on all 1698 Hindi-language movies that were released in India across a 13-year period (2005-2017) from the website of Box Office India.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We collected votes (from 1 to 10 stars) for all movies, excluding TV episodes (a total of 336,090,882 votes for 300,723 movies), from March 19 to 28, 2013 (set #1). Using the same list of movies, we collected the number of votes again from December 8 to 18, 2014 (set #2, 465,292,451 votes) and from January 5 to 10, 2015 (set #3, 471,222,420 votes), as shown in Fig 10. For budgets, we used a new list and collected data from February 5 to 8, 2015. Results with fewer than 5 votes (in 2013) are not shown.
Number of items by type: 33,941 (Documentary); 133,775 (Feature Film); 3,172 (Mini-Series); 50,408 (Short Film); 1,071 (TV Episode); 25,168 (TV Movie); 33,165 (TV Series); 2,450 (TV Special); 12,120 (Video); 5,453 (Video Game).
By genre: 24,911 (Action); 93 (Adult); 15,651 (Adventure); 18,918 (Animation); 5,385 (Biography); 74,393 (Comedy); 18,693 (Crime); 37,250 (Documentary); 97,087 (Drama); 16,022 (Family); 8,677 (Fantasy); 567 (Film Noir); 1,575 (Game Show); 5,525 (History); 15,072 (Horror); 10,212 (Music); 5,840 (Musical); 8,170 (Mystery); 1,036 (News); 3,605 (Reality TV); 21,165 (Romance); 8,239 (Sci-Fi); 61,538 (Short); 4,360 (Sport); 1,467 (Talk Show); 16,246 (Thriller); 5,080 (War); 4,549 (Western). An item could be assigned more than one genre.
As a final observation, it is possible for a user to remove his or her vote; as a consequence, some movies have a decreasing number of votes. However, this represents a negligible fraction of the movies.
We used the following list: http://www.imdb.com/search/title?title_type=feature,tv_movie,tv_series,tv_special,mini_series,documentary,game,short,video,unknown&user_rating=1.0,10. (ZIP)
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Letterboxd Film Dataset
This dataset contains a comprehensive collection of 847,209 films from the Letterboxd platform, including movie information, user reviews, and ratings.
Dataset Summary
- Total Films: 847,209
- File Size: ~1.12 GB (1,120,572,122 bytes)
- Format: JSONL (JSON Lines)
- Language: Primarily English, with some multilingual content
Data Structure
Each line contains a JSON object with the following fields: { "url":… See the full description on the dataset page: https://huggingface.co/datasets/pkchwy/letterboxd-all-movie-data.
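Since the file is JSONL, it can be read one record at a time without loading the whole ~1.12 GB into memory. The sketch below is a minimal reader run on two invented lines; only the `url` field is used because it is the only field visible in the truncated description, and the example URLs are placeholders.

```python
import io
import json


def iter_jsonl(lines):
    """Yield one parsed JSON object per non-empty line."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON") from exc


# Invented two-record sample; a real run would pass an open file object instead.
sample = io.StringIO(
    '{"url": "https://example.com/film/a"}\n'
    "\n"
    '{"url": "https://example.com/film/b"}\n'
)
urls = [record.get("url") for record in iter_jsonl(sample)]
print(urls)
```

Passing a file opened with `open(path, encoding="utf-8")` in place of `sample` streams the records lazily, which matters at this file size.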
Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
The TMDb (The Movie Database) is a comprehensive movie database that provides information about movies, including details like titles, ratings, release dates, revenue, genres, and much more.
This dataset contains a collection of 1,000,000 movies from the TMDB database.
The dataset is updated daily.