By Throwback Thursday [source]
This dataset contains genre statistics for movies released between 1995 and 2018. For each genre and year, it records the number of movies released, the total gross revenue, the total number of tickets sold, and the inflation-adjusted gross revenue, along with the title, gross revenue, and inflation-adjusted gross revenue of the highest-grossing movie. The dataset offers insights into film industry trends over a span of more than two decades.
Understanding the Columns
Before diving into the analysis, let's familiarize ourselves with the different columns in this dataset:
- Genre: This column represents the genre of each movie.
- Year: The year in which the movies were released.
- Movies Released: The number of movies released in a particular genre and year.
- Gross: The total gross revenue generated by movies in a specific genre and year.
- Tickets Sold: The total number of tickets sold for movies in a specific genre and year.
- Inflation-Adjusted Gross: The gross revenue adjusted for inflation, taking into account changes in the value of money over time.
- Top Movie: The title of the highest-grossing movie in a specific genre and year.
- Top Movie Gross (That Year): The gross revenue generated by the highest-grossing movie in a specific genre and year.
- Top Movie Inflation-Adjusted Gross (That Year): The inflation-adjusted gross revenue of the highest-grossing movie in a specific genre and year.
Analyzing Data
To make use of this dataset effectively, here are some potential analyses you can perform:
Find popular genres: You can determine which genres are popular by looking at columns like Movies Released or Tickets Sold. Analyzing these numbers will give you insights into what types of movies attract more audiences.
Measure financial success: Explore columns like Gross, Inflation-Adjusted Gross, or Top Movie Gross (That Year) to compare the financial success of different genres. This will allow you to identify genres that generate higher revenue.
Understand movie trends: By analyzing the dataset over different years, you can observe trends in movie releases and gross revenue for specific genres. This information is crucial for understanding how movie preferences change over time.
Identify highest-grossing movies: The column Top Movie gives you the title of the highest-grossing movie in each genre and year. You can use this information to analyze the success of specific movies within their respective genres.
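As a rough illustration of the first two analyses, the sketch below ranks genres by total tickets sold and computes the average gross per released movie. It assumes the data is available as CSV with the column names listed above. The sample rows are invented for illustration and are not taken from the dataset, and the helper names are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows that only mimic the column layout described above;
# the numbers are made up for illustration and are not from the dataset.
SAMPLE = """Genre,Year,Movies Released,Gross,Tickets Sold
Action,2017,40,2000,220
Action,2018,45,3000,330
Comedy,2017,90,1500,170
Comedy,2018,80,1000,110
"""


def tickets_by_genre(rows):
    """Sum Tickets Sold per genre across all years."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["Genre"]] += int(row["Tickets Sold"])
    return dict(totals)


def gross_per_movie(rows):
    """Average gross per released movie for each (genre, year) pair."""
    return {
        (row["Genre"], int(row["Year"])): int(row["Gross"]) / int(row["Movies Released"])
        for row in rows
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
totals = tickets_by_genre(rows)
# Genres ordered from most to fewest tickets sold.
ranking = sorted(totals, key=totals.get, reverse=True)
per_movie = gross_per_movie(rows)
```

To run this against the real file, replace `io.StringIO(SAMPLE)` with an opened file handle. The real file may format numbers differently (for example with thousands separators), in which case the `int(...)` conversions would need adjusting.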
Data Visualization
To enhance your analysis, consider using data visualization techniques
- Predicting the popularity and success of movies in different genres: By analyzing the data on tickets sold and gross revenue, we can identify trends and patterns in movie genres that attract more audiences and generate higher revenue. This information can be useful for filmmakers, production studios, and investors to make informed decisions about which genres to focus on for future movie releases.
- Comparing the performance of movies over time: With the inclusion of inflation-adjusted figures, this dataset allows us to compare the box office success of movies across different years. We can analyze how movies in specific genres have performed over time in terms of gross revenue and adjust these figures for inflation to get a better understanding of their true financial success.
- Analyzing the impact of genre popularity on ticket sales: By examining the relationship between genre popularity (measured by tickets sold) and total gross revenue, we can gain insights into audience preferences and behavior. This information is valuable for marketing strategies, as it helps determine which movie genres are most likely to attract a larger audience base and generate higher ticket sales
If you use this dataset in your research, please credit the original authors. Data Source
See the dataset description for more information.
https://creativecommons.org/publicdomain/zero/1.0/
Description: This dataset provides comprehensive movie statistics compiled from multiple sources, including Wikipedia, The Numbers, and IMDb. It offers a rich collection of information and insights into various aspects of movies, such as movie titles, production dates, genres, runtime minutes, director information, average ratings, number of votes, approval index, production budgets, domestic gross earnings, and worldwide gross earnings.
The dataset combines data scraped from Wikipedia, which includes details about movie titles, production dates, genres, runtime minutes, and director information, with data from The Numbers, a reliable source for box office statistics. Additionally, IMDb data is integrated to provide information on average ratings, number of votes, and other movie-related attributes.
With this dataset, users can analyze and explore trends in the film industry, assess the financial success of movies, identify popular genres, and investigate the relationship between average ratings and box office performance. Researchers, movie enthusiasts, and data analysts can leverage this dataset for various purposes, including data visualization, predictive modeling, and deeper understanding of the movie landscape.
Features:
- Movie_title
- Production_date
- Genres
- Runtime_minutes
- Director_name (primaryName)
- Director_professions (primaryProfession)
- Director_birthYear
- Director_deathYear
- Movie_averageRating: the average rating given by online users for a particular movie
- Movie_numberOfVotes: the number of votes given by online users for a particular movie
- Approval_Index: a normalized indicator (on a scale of 0-10) calculated by multiplying the logarithm of the number of votes by the average user rating. It provides a concise measure of a movie's overall popularity and approval among online viewers, penalizing both films that got too few reviews and blockbusters that got too many.
- Production_budget ($)
- Domestic_gross ($)
- Worldwide_gross ($)
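The description of Approval_Index gives the core operation (logarithm of votes times average rating) but not the logarithm base or the normalization method. The sketch below is therefore only one plausible reading: it assumes a base-10 logarithm and min-max scaling to the 0-10 range, and the function names and sample values are hypothetical. It does not attempt to reproduce the penalty on heavily reviewed blockbusters that the description mentions.

```python
import math


def raw_score(average_rating, number_of_votes):
    """Product of log10(votes) and the average rating (log base is an assumption)."""
    return math.log10(number_of_votes) * average_rating


def approval_index(movies):
    """Min-max scale raw scores to 0-10 (the scaling method is an assumption).

    `movies` maps a title to an (average_rating, number_of_votes) pair.
    """
    scores = {title: raw_score(r, v) for title, (r, v) in movies.items()}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        # All movies scored the same; avoid dividing by zero.
        return {title: 10.0 for title in scores}
    return {t: 10 * (s - low) / (high - low) for t, s in scores.items()}


# Invented sample values, for illustration only.
movies = {"Film A": (8.0, 1000), "Film B": (6.0, 100), "Film C": (5.0, 10)}
index = approval_index(movies)
```

Because the scaling is relative to the movies passed in, the resulting values depend on which films are included, so they should not be expected to match the Approval_Index column in the dataset.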
Potential Applications:
- Box office analysis: Analyze the relationship between production budgets, domestic and worldwide gross earnings, and profitability.
- Genre analysis: Identify the most popular genres based on movie counts and analyze their performance.
- Rating analysis: Explore the relationship between average ratings, number of votes, and financial success.
- Director analysis: Investigate the impact of directors on movie ratings and financial performance.
- Time-based analysis: Study movie trends over different production years and observe changes in production budgets, box office earnings, and genre preferences.
By utilizing this dataset, users can gain valuable insights into the movie industry and uncover patterns that can inform decision-making, market research, and creative strategies.
Summary statistics by North American Industry Classification System (NAICS), which include operating revenue (dollars x 1,000,000), operating expenses (dollars x 1,000,000), salaries, wages and benefits (dollars x 1,000,000), and operating profit margin (by percent), for motion picture and video production (NAICS 512110), annual, for five years of data.
The source forecast that, by the end of 2022, the annual revenue of the global film production and distribution industry would amount to **** billion U.S. dollars. As of mid-2022, the sector employed almost *** thousand people in a little more than ** thousand businesses worldwide. China, the North American market (a term that includes the United States and Canada and excludes Mexico), and Japan were the world's leading box office markets by revenue in 2021.
By Yashwanth Sharaff [source]
This dataset contains essential characteristics of a variety of movies, including basic information such as the movie's title and budget, as well as attributes and performance indicators such as MPAA rating, gross revenue, release date, genre, runtime, rating count, and summary. With this dataset we can better understand the film industry and explore how different features and performance metrics relate to one another and to a movie's success. It can also help identify which features are key indicators of a high-grossing feature film.
To get the most out of this dataset, you need to understand what each column represents. The “Title” column gives the title of the movie, which can be used for further searches on streaming services and movie information websites. The “MPAA Rating” column lists the Motion Picture Association (MPAA) rating of the movie: G (General Audiences), PG (Parental Guidance Suggested), PG-13 (Parents Strongly Cautioned), R (Under 17 Requires Accompanying Parent or Guardian), and so on. The “Budget” column gives an approximate idea of how much a production cost, the “Gross” column shows its theatrical earnings, and the “Release Date” column shows when the film was released or is scheduled for release. The “Genre”, “Runtime”, and “Rating Count” columns describe the type of movie, its length, and the number of people who have rated or reviewed it, whether on IMDb, on other streaming services, or in print media such as newspapers. Finally, the “Summary” field gives an overview of what to expect from the film, which is worth reading before watching, especially with children.
So go ahead - start exploring this interesting dataset today!
- Creating a box office prediction model using budget, genre, release date and MPAA rating
- Using the summary data to create a sentiment analysis tool for movie reviews
- Building a recommendation engine for users based on their prior ratings and what other users with similar tastes have rated as highly
If you use this dataset in your research, please credit the original authors. Data Source
See the dataset description for more information.
File: movies.csv
| Column name | Description |
|:-------------|:-------------------------------------------------------------------------------|
| Title | The title of the movie. (String) |
| MPAA Rating | The Motion Picture Association of America (MPAA) rating of the movie. (String) |
| Budget | The budget of the movie in US dollars. (Integer) |
| Gross | The gross revenue of the movie in US dollars. (Integer) |
| Release Date | The date the movie was released. (Date) |
| Genre | The genre of the movie. (String) |
| Runtime | The length of the movie in minutes. (Integer) |
| Rating Count | The number of ratings the movie has received. (Integer) |
| Summary | A brief summary of the movie. (String) |
If you use this dataset in your research, please credit Yashwanth Sharaff.
In the first quarter of 2025, Greater Los Angeles reported *** days spent shooting feature films, a sharp decrease in comparison to the first quarter of 2024. Still, the values recorded remained higher than during the second half of 2023, which was impacted by the writers' and actors' strikes.
https://data.gov.tw/license
This dataset provides national theater box office statistics for films distributed by the Administrative Institution National Film and Audiovisual Culture Center. The data is current up to the last Sunday before the announcement date and does not include films that have been screened for fewer than 7 calendar days. The earliest CSV format data in this dataset begins on July 30, 2018, and the earliest JSON format data begins on March 1, 2020. JSON format queries require a start date and an end date (in year, month, day format) and can return data for a maximum of 90 days at a time.
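Because a single JSON query covers at most 90 days, retrieving a longer period means splitting it into consecutive windows. The sketch below shows one way to do that. It assumes the 90-day limit counts both the start and end dates, which the description does not state, and the function name is hypothetical. No query URL or parameter names are shown because the description does not give them.

```python
from datetime import date, timedelta

MAX_DAYS = 90  # per-query limit stated in the description


def split_into_windows(start, end, max_days=MAX_DAYS):
    """Yield (window_start, window_end) pairs, each covering at most max_days days.

    Both endpoints are treated as inclusive (an assumption).
    """
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=max_days - 1), end)
        yield current, window_end
        current = window_end + timedelta(days=1)


# Example: everything from the earliest JSON date to the end of August 2020.
windows = list(split_into_windows(date(2020, 3, 1), date(2020, 8, 31)))
```

Each pair can then be formatted as the start and end dates of one query.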
According to a study held in July 2023 in the United States about AI use cases in TV and film industries, ** percent of respondents supported the use of such technology to create special effects or to alter actors' appearances. It was the most supported AI use case, while generating voices for animated characters came in second position with ** percent. The majority of entertainment-industry professionals agreed that the role of AI was a point of contention for the ongoing writers' strike.
https://crawlfeeds.com/privacy_policy
By Emma Culwell [source]
This dataset offers an extensive look at some of the most popular movie franchises in history, shedding light on their financial success and public reception. It includes data on the lifetime gross sales, budgets, ratings, and release dates of each featured movie. Furthermore, this dataset provides insights into how different elements such as ratings and runtime can affect the performance of a film at the box office. Whether you are an aspiring or established filmmaker looking for inspiration to craft your own successful blockbuster or simply a fan curious about these films' inner workings, this dataset offers a high level of detail regarding many beloved franchises.
This dataset provides comprehensive information on movie franchises released worldwide between 2000 and 2020. It includes data such as lifetime gross, budget, rating, runtime, release date and vote count/average. This dataset can be used to gain insights on the global movie industry trends over this time period.
The data can be explored in various ways to identify patterns of success or failure among movie franchises across countries, genres or decades. For example, you may want to examine the average budget for movies released each year or calculate the average number of votes received by movies of a particular genre. Additionally, you could use this dataset to compare different types of media (e.g., cable vs streaming) and understand how they impact box-office performance.
To get the most out of this dataset, first familiarize yourself with the columns provided:
- Title: The title of the movie.
- Lifetime Gross: The total amount of money earned in all territories.
- Year: The year in which the movie was first made publicly available.
- Studio: The production company behind the movie.
- Rating: The classification given by the MPAA/BBFC.
- Runtime: The length of the movie.
- Budget: The amount spent producing the movie.
- Release Date: The date on which the movie was publicly released.
- Vote Average: The average rating based on user reviews.
- Vote Count: The number of people who rated the movie.
Once you are comfortable with these variables, feel free to try larger analysis techniques such as predictive analytics (predicting future success based on existing trends) or clustering (grouping similar outcomes together). Whichever methods you use, remember to always validate your assumptions. Good luck exploring!
- A comparison of movie budget to box office returns, to identify over/underperforming movies.
- A study of the correlation between movie rating and viewership.
- An analysis of what types of movies tend to become franchise success stories (big budget, PG-13 rating, etc.)
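The first idea above, comparing budget to box office returns, can be sketched as a gross-to-budget ratio per movie. The example uses the Title, Lifetime Gross, and Budget column names listed for MovieFranchises.csv. The sample rows, the function names, and the threshold of 1.0 (earning back the budget) are illustrative assumptions, not part of the dataset.

```python
import csv
import io

# Hypothetical rows using the column names from MovieFranchises.csv;
# titles and figures are invented for illustration.
SAMPLE = """Title,Lifetime Gross,Budget
Movie A,300,100
Movie B,50,100
Movie C,120,0
"""


def gross_to_budget_ratios(rows):
    """Return {title: lifetime gross / budget}, skipping rows with no usable budget."""
    ratios = {}
    for row in rows:
        budget = float(row["Budget"] or 0)
        if budget <= 0:
            continue  # avoid dividing by zero or by a missing budget
        ratios[row["Title"]] = float(row["Lifetime Gross"]) / budget
    return ratios


def classify(ratios, threshold=1.0):
    """Label a movie 'over' if it earned more than threshold x budget, else 'under'."""
    return {t: ("over" if r > threshold else "under") for t, r in ratios.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
ratios = gross_to_budget_ratios(rows)
labels = classify(ratios)
```

A ratio of 1.0 is a simplistic break-even line, since budgets typically exclude marketing and distribution costs, so a higher threshold may be more realistic for a real analysis.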
If you use this dataset in your research, please credit the original authors. Data Source
See the dataset description for more information.
File: MovieFranchises.csv
| Column name | Description |
|:---------------|:------------------------------------------------------------------------|
| Title | The title of the movie. (String) |
| Lifetime Gross | The total amount of money the movie has made in its lifetime. (Integer) |
| Year | The year the movie was released. (Integer) |
| Studio | The studio that produced the movie. (String) |
| Rating | The rating of the movie (e.g. PG-13, R, etc). (String) |
| Runtime | The length of the movie in minutes. (Integer) |
| Budget | The budget of the movie in USD. (Integer) |
| ReleaseDate | The date the movie was released. (Date) |
| VoteAvg | The average rating of the movie from users. (Float) |
| VoteCount | The total number of votes the movie has received from users. (Integer) |
If you use this dataset in your research, please credit Emma Culwell.
This table contains 56 series, with data for years 2006 - 2011 (not all combinations necessarily have data for all years), and was last released on 2015-07-28. This table contains data described by the following dimensions (not all combinations are available): Geography (14 items: Canada; Prince Edward Island; Nova Scotia; Newfoundland and Labrador ...), North American Industry Classification System (NAICS) (1 item: Motion picture and video production ...), Summary statistics (4 items: Operating revenue; Operating profit margin; Salaries, wages and benefits; Operating expenses ...).
The revenue of the motion picture and video production and distribution industry in the United States sharply increased in 2022 to almost ** billion U.S. dollars. The number of movie tickets sold in the U.S. and Canada amounted to around ***** million tickets that same year.
https://crawlfeeds.com/privacy_policy
Movies Dataset from AllMovie is a comprehensive collection featuring over 430,000 records, encompassing a wide range of films across various genres and languages. This extensive dataset includes essential data points such as movie titles, genres, release dates, posters, languages, directors, durations, synopses, trailers, average ratings, cast information, and URLs. Such detailed metadata is invaluable for developers, researchers, and enthusiasts aiming to analyze trends, build recommendation systems, or conduct in-depth studies of the film industry.
For those interested in alternative datasets, the IMDb Non-Commercial Datasets provide subsets of IMDb data accessible for personal and non-commercial use. These datasets allow users to hold local copies of movie information, facilitating various analytical projects.
Additionally, the MovieLens datasets offer a range of movie rating data suitable for research purposes. For instance, the MovieLens 20M dataset comprises 20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users, making it a valuable resource for studies in user preferences and recommendation algorithms.
Incorporating these datasets into your projects can significantly enhance the quality and depth of your analyses, providing a solid foundation for exploring various aspects of the cinematic world.
Why Choose Crawl Feeds for Your Data Needs?
Crawl Feeds is your trusted partner in acquiring high-quality, curated datasets tailored to your specific requirements. With a vast repository that includes the Movies Dataset, we empower developers and businesses to drive innovation. Explore our easy-to-use platform and transform your ideas into actionable insights.
Abstract copyright UK Data Service and data collection copyright owner.
UIS produces data on 'movie-watching', one of the world's most popular and lucrative cultural industries. It has developed a biennial survey on feature film statistics to monitor global trends in selected areas of this industry (e.g. production, distribution, infrastructure and audience behaviour), which have been transformed in the recent past by digital technology. The UNESCO Feature Films Statistics includes 80 indicators for over 200 countries from 1995 onwards.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This file contains data extracted from DPC, “Statistics for: www.cinemacontext.nl” (https://dpc.uba.uva.nl/awstats/awstats.pl?config=www.cinemacontext.nl). The statistics for the Cinema Context website are collected by the Digital Production Center (DPC) of the University Library Amsterdam (UBA), the organization that hosts and maintains the database and web interface. DPC collects the web statistics with the program Advanced Web Statistics (AWStats, version 7.0). The extracted data in this spreadsheet support the analysis of the use of Cinema Context for the article “Writing Cinema Histories with Digital Databases. The Case of Cinema Context”, authored by Julia Noordegraaf, Kathleen Lotze and Jaap Boter. Tijdschrift voor Mediageschiedenis vol. 21, no. 2 (2018), 106-126. http://www.tijdschriftmediageschiedenis.nl/index.php/tmg/article/view/369.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Complete dataset of “Film Circulation on the International Film Festival Network and the Impact on Global Film Culture”
A peer-reviewed data paper for this dataset is in review to be published in NECSUS_European Journal of Media Studies - an open access journal aiming at enhancing data transparency and reusability, and will be available from https://necsus-ejms.org/ and https://mediarep.org
Please cite this when using the dataset.
Detailed description of the dataset:
1 Film Dataset: Festival Programs
The Film Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.
The codebook (csv file “1_codebook_film-dataset_festival-program”) offers a detailed description of all variables within the Film Dataset. Along with the definition of variables it lists explanations for the units of measurement, data sources, coding and information on missing data.
The csv file “1_film-dataset_festival-program_long” comprises a dataset of all films and the festivals, festival sections, and the year of the festival edition that they were sampled from. The dataset is structured in the long format, i.e. the same film can appear in several rows when it appeared in more than one sample festival. However, films are identifiable via their unique ID.
The csv file “1_film-dataset_festival-program_wide” consists of the dataset listing only unique films (n=9,348). The dataset is in the wide format, i.e. each row corresponds to a unique film, identifiable via its unique ID. For easy analysis, and since the overlap is only six percent, in this dataset the variable sample festival (fest) corresponds to the first sample festival where the film appeared. For instance, if a film was first shown at Berlinale (in February) and then at Frameline (in June of the same year), the sample festival will list “Berlinale”. This file includes information on unique and IMDb IDs, the film title, production year, length, categorization in length, production countries, regional attribution, director names, genre attribution, the festival, festival section and festival edition the film was sampled from, and information whether there is festival run information available through the IMDb data.
2 Survey Dataset
The Survey Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.
The codebook “2_codebook_survey-dataset” includes coding information for both survey datasets. It lists the definition of the variables or survey questions (corresponding to Samoilova/Loist 2019), units of measurement, data source, variable type, range and coding, and information on missing data.
The csv file “2_survey-dataset_long-festivals_shared-consent” consists of a subset (n=161) of the original survey dataset (n=454), where respondents provided festival run data for films (n=206) and gave consent to share their data for research purposes. This dataset consists of the festival data in a long format, so that each row corresponds to the festival appearance of a film.
The csv file “2_survey-dataset_wide-no-festivals_shared-consent” consists of a subset (n=372) of the original dataset (n=454) of survey responses corresponding to sample films. It includes data only for those films for which respondents provided consent to share their data for research purposes. This dataset presents the survey data in wide format, i.e. information for each response corresponding to a film is listed in one row. This includes data on film IDs, film title, survey questions regarding completeness and availability of provided information, information on number of festival screenings, screening fees, budgets, marketing costs, market screenings, and distribution. As the file name suggests, no data on festival screenings is included in the wide format dataset.
3 IMDb & Scripts
The IMDb dataset consists of a data scheme image file, one codebook and eight datasets, all in csv format. It also includes the R scripts that we used for scraping and matching.
The codebook “3_codebook_imdb-dataset” includes information for all IMDb datasets. This includes ID information and their data source, coding and value ranges, and information on missing data.
The csv file “3_imdb-dataset_aka-titles_long” contains film title data in different languages scraped from IMDb in a long format, i.e. each row corresponds to a title in a given language.
The csv file “3_imdb-dataset_awards_long” contains film award data in a long format, i.e. each row corresponds to an award of a given film.
The csv file “3_imdb-dataset_companies_long” contains data on production and distribution companies of films. The dataset is in a long format, so that each row corresponds to a particular company of a particular film.
The csv file “3_imdb-dataset_crew_long” contains data on names and roles of crew members in a long format, i.e. each row corresponds to each crew member. The file also contains binary gender assigned to directors based on their first names using the GenderizeR application.
The csv file “3_imdb-dataset_festival-runs_long” contains festival run data scraped from IMDb in a long format, i.e. each row corresponds to the festival appearance of a given film. The dataset does not include each film screening, but the first screening of a film at a festival within a given year. The data includes festival runs up to 2019.
The csv file “3_imdb-dataset_general-info_wide” contains general information about films such as genre as defined by IMDb, languages in which a film was shown, ratings, and budget. The dataset is in wide format, so that each row corresponds to a unique film.
The csv file “3_imdb-dataset_release-info_long” contains data about non-festival release (e.g., theatrical, digital, TV, DVD/Blu-ray). The dataset is in a long format, so that each row corresponds to a particular release of a particular film.
The csv file “3_imdb-dataset_websites_long” contains data on available websites (official websites, miscellaneous, photos, video clips). The dataset is in a long format, so that each row corresponds to a website of a particular film.
The dataset includes 8 text files containing the script for webscraping. They were written using the R-3.6.3 version for Windows.
The R script “r_1_unite_data” demonstrates the structure of the dataset that we use in the following steps to identify, scrape, and match the film data.
The R script “r_2_scrape_matches” reads in the dataset with the film characteristics described in “r_1_unite_data” and uses various R packages to create a search URL for each film from the core dataset on the IMDb website. The script attempts to match each film from the core dataset to IMDb records by first conducting an advanced search based on the movie title and year, and then potentially using an alternative title and a basic search if no matches are found in the advanced search. The script scrapes the title, release year, directors, running time, genre, and IMDb film URL from the first page of the suggested records from the IMDb website. The script then defines a loop that matches (including matching scores) each film in the core dataset with suggested films on the IMDb search page. Matching was based on directors, production year (+/- one year), and title, using a fuzzy matching approach with two methods, “cosine” and “osa”: cosine similarity is used to match titles with a high degree of similarity, and the OSA algorithm is used to match titles that may have typos or minor variations.
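To make the two title-matching methods concrete, here is a minimal Python sketch of optimal string alignment (OSA) distance and of cosine similarity over character bigrams, combined with the +/- one year rule. It is not the authors' R script. The bigram size, the thresholds, and the function names are illustrative assumptions, and director matching is left out.

```python
import math
from collections import Counter


def osa_distance(a, b):
    """Optimal string alignment distance: insertions, deletions, substitutions,
    and transpositions of adjacent characters, each counted as one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Adjacent transposition, e.g. "ab" -> "ba".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def qgrams(text, q=2):
    """Count overlapping character q-grams in text."""
    return Counter(text[i:i + q] for i in range(len(text) - q + 1))


def cosine_similarity(a, b, q=2):
    """Cosine similarity between the q-gram count vectors of two strings."""
    va, vb = qgrams(a, q), qgrams(b, q)
    dot = sum(va[g] * vb[g] for g in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def is_candidate(title, year, cand_title, cand_year, min_cos=0.8, max_osa=2):
    """Hypothetical rule: years within one of each other, and titles similar
    under either measure. The thresholds are illustrative assumptions."""
    if abs(year - cand_year) > 1:
        return False
    t1, t2 = title.lower(), cand_title.lower()
    return cosine_similarity(t1, t2) >= min_cos or osa_distance(t1, t2) <= max_osa
```

For example, `is_candidate("The Matrx", 1999, "The Matrix", 1999)` accepts the pair because the titles differ by a single edit, while the same titles with years 1999 and 2003 are rejected by the year rule.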
The script “r_3_matching” creates a dataset with the matches for a manual check. Each pair of films (the original film from the core dataset and the suggested match from the IMDb website) was categorized into one of the following five categories: a) 100% match: perfect match on title, year, and director; b) likely good match; c) maybe match; d) unlikely match; and e) no match. The script also checks for possible doubles in the dataset and identifies them for a manual check.
The script “r_4_scraping_functions” creates a function for scraping the data from the identified matches (based on the scripts described above and manually checked). These functions are used for scraping the data in the next script.
The script “r_5a_extracting_info_sample” uses the function defined in “r_4_scraping_functions” to scrape the IMDb data for the identified matches. This script does that for the first 100 films, to check whether everything works. Scraping for the entire dataset took a few hours. Therefore, a test with a subsample of 100 films is advisable.
The script “r_5b_extracting_info_all” extracts the data for the entire dataset of the identified matches.
The script “r_5c_extracting_info_skipped” checks the films with missing data (where data was not scraped) and tries to extract the data one more time, to make sure that the errors were not caused by disruptions in the internet connection or other technical issues.
The script “r_check_logs” is used for troubleshooting and tracking the progress of all of the R scripts used. It gives information on the amount of missing values and errors.
4 Festival Library Dataset
The Festival Library Dataset consists of a data scheme image file, one codebook and one dataset, all in csv format.
The codebook (csv file “4_codebook_festival-library_dataset”) offers a detailed description of all variables within the Library Dataset. It lists the definition of variables, such as location and festival name, and festival categories,
As of 2024, the top film-producing market globally is the United States, with over **** thousand movies made in the country throughout the entire cinematic production period. The UK and France lead film production in Europe, while China and India dominate the Asia-Pacific region in that regard.
In 2024, Disney alone accounted for over one-quarter (21.4 percent) of the box office revenue in the United States and Canada, thanks to blockbusters such as "Inside Out 2". Universal ranked second in box office market share at about 20 percent. Warner Bros held a share of approximately 13 percent that year.
Disney's superpowers
The company's performance at the so-called North American box office led to yet another outstanding placement in the U.S.'s mediascape. In 2024, Disney's box office market share once again stood above 25 percent, a milestone the studio has been achieving every other year since the second half of the 2010s. But an overreliance on superhero stories, noticeable since Disney acquired Marvel in 2009, may have its days numbered. The share of moviegoers in the U.S. saying they were getting tired of so many superhero movies grew by six percentage points between mid-2018 and the end of 2021.
Who has the range?
Diversity in film genres also seems to be important for attracting newer audiences. During a mid-2021 survey, over a third of responding Gen Zers said their main motivation for attending movie theaters was a variety of movie offerings. This segment is key for the cinema industry. Historically, the 12-17 age group has recorded the highest average number of movies seen per capita in a theater in the U.S. In 2021, the figure stood at 2.5. Among people aged 50 and above, the average stood below one.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Letterboxd Film Dataset
This dataset contains a comprehensive collection of 847,209 films from the Letterboxd platform, including movie information, user reviews, and ratings.
Dataset Summary
- Total Films: 847,209
- File Size: ~1.12 GB (1,120,572,122 bytes)
- Format: JSONL (JSON Lines)
- Language: Primarily English, with some multilingual content
Data Structure
Each line contains a JSON object with the following fields: { "url":…
See the full description on the dataset page: https://huggingface.co/datasets/pkchwy/letterboxd-all-movie-data.
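As a rough illustration of how a JSONL file of this kind can be consumed, the sketch below parses one JSON object per line using only the Python standard library. The helper name `iter_records` and the sample lines are hypothetical; only the "url" key is confirmed by the description above, and the example URLs are placeholders.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of JSONL input."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines rather than failing on them
            yield json.loads(line)


# Hypothetical sample lines; the real records carry more fields than "url".
sample = [
    '{"url": "https://example.com/film/a"}',
    "",
    '{"url": "https://example.com/film/b"}',
]

records = list(iter_records(sample))
urls = [record.get("url") for record in records]
print(len(records), urls)
```

Because `iter_records` accepts any iterable of strings, an open file handle can be passed in place of `sample`, so a file of roughly 1 GB can be streamed line by line instead of being loaded into memory at once.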
By Throwback Thursday [source]
This dataset contains genre statistics for movies released between 1995 and 2018. For each genre and year, it records the number of movies released, total gross revenue, total tickets sold, and inflation-adjusted gross revenue, which accounts for changes in the value of money over time. It also gives the title of the highest-grossing movie in each genre and year, along with that movie's gross and inflation-adjusted gross revenue. The dataset offers insights into film industry trends over a span of more than two decades.
Understanding the Columns
Before diving into the analysis, let's familiarize ourselves with the different columns in this dataset:
- Genre: The movie genre that the row's figures describe.
- Year: The year in which the movies were released.
- Movies Released: The number of movies released in a particular genre and year.
- Gross: The total gross revenue generated by movies in a specific genre and year.
- Tickets Sold: The total number of tickets sold for movies in a specific genre and year.
- Inflation-Adjusted Gross: The gross revenue adjusted for inflation, taking into account changes in the value of money over time.
- Top Movie: The title of the highest-grossing movie in a specific genre and year.
- Top Movie Gross (That Year): The gross revenue generated by the highest-grossing movie in a specific genre and year.
- Top Movie Inflation-Adjusted Gross (That Year): The inflation-adjusted gross revenue of the highest-grossing movie in a specific genre and year.
Analyzing Data
To make use of this dataset effectively, here are some potential analyses you can perform:
- Find popular genres: Compare Tickets Sold across genres to see which types of movies draw the largest audiences. Movies Released shows how many titles each genre put out, which reflects supply rather than audience demand.
- Measure financial success: Compare Gross, Inflation-Adjusted Gross, or Top Movie Gross (That Year) across genres to identify the genres that generate the most revenue.
- Understand movie trends: By analyzing the dataset across years, you can observe trends in movie releases and gross revenue for specific genres, which shows how movie preferences change over time.
- Identify highest-grossing movies: The Top Movie column gives the title of the highest-grossing movie in each genre and year. You can use it to analyze the success of specific movies within their genres.
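The genre comparisons listed above can be sketched with the standard library alone. This is a minimal sketch, not the dataset author's method: the rows below are made up for illustration, and it assumes the CSV headers match the column names listed earlier and that the numeric columns are plain integers (the real file may format them with currency symbols or thousands separators, which would need cleaning first). The helper `totals_by_genre` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; these are not real dataset figures.
SAMPLE_CSV = """Genre,Year,Movies Released,Gross,Tickets Sold
Action,2000,10,1000,200
Action,2001,12,1500,250
Comedy,2000,20,800,160
Comedy,2001,18,900,150
"""


def totals_by_genre(rows, column):
    """Sum one numeric column per genre across all years."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["Genre"]] += int(row[column])
    return dict(totals)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
tickets = totals_by_genre(rows, "Tickets Sold")
gross = totals_by_genre(rows, "Gross")

# Rank genres by audience size, largest first.
ranked = sorted(tickets, key=tickets.get, reverse=True)
print(ranked, tickets, gross)
```

Summing per genre hides year-to-year movement, so a trend analysis would group by both Genre and Year instead of collapsing the years.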
Data Visualization
To enhance your analysis, consider using data visualization techniques.
- Predicting the popularity and success of movies in different genres: By analyzing the data on tickets sold and gross revenue, we can identify trends and patterns in movie genres that attract more audiences and generate higher revenue. This information can be useful for filmmakers, production studios, and investors to make informed decisions about which genres to focus on for future movie releases.
- Comparing the performance of movies over time: With the inclusion of inflation-adjusted figures, this dataset allows us to compare the box office success of movies across different years. We can analyze how movies in specific genres have performed over time in terms of gross revenue and use the inflation-adjusted figures to get a better understanding of their true financial success.
- Analyzing the impact of genre popularity on ticket sales: By examining the relationship between genre popularity (measured by tickets sold) and total gross revenue, we can gain insights into audience preferences and behavior. This information is valuable for marketing strategies, as it helps determine which movie genres are most likely to attract a larger audience base and generate higher ticket sales.
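To make the cross-year comparison concrete, the sketch below derives two simple ratios from the listed columns: an implied average ticket price (Gross divided by Tickets Sold) and an implied adjustment factor (Inflation-Adjusted Gross divided by Gross). Both helpers are hypothetical, the figures are made up, and the sketch does not assume anything about how the dataset's publisher actually computed the adjusted values.

```python
def implied_ticket_price(gross: float, tickets_sold: float) -> float:
    """Average revenue per ticket for one genre-year row."""
    if tickets_sold <= 0:
        raise ValueError("tickets_sold must be positive")
    return gross / tickets_sold


def adjustment_factor(adjusted_gross: float, gross: float) -> float:
    """Ratio of inflation-adjusted gross to nominal gross."""
    if gross <= 0:
        raise ValueError("gross must be positive")
    return adjusted_gross / gross


# Made-up figures for two years of one genre; not real dataset values.
early = {"Gross": 1000.0, "Tickets Sold": 250.0, "Inflation-Adjusted Gross": 2000.0}
late = {"Gross": 1800.0, "Tickets Sold": 200.0, "Inflation-Adjusted Gross": 1800.0}

early_price = implied_ticket_price(early["Gross"], early["Tickets Sold"])
late_price = implied_ticket_price(late["Gross"], late["Tickets Sold"])
early_factor = adjustment_factor(early["Inflation-Adjusted Gross"], early["Gross"])

# Nominal gross favors the later year, while the adjusted figure favors the earlier one.
nominal_winner = "late" if late["Gross"] > early["Gross"] else "early"
adjusted_winner = (
    "late"
    if late["Inflation-Adjusted Gross"] > early["Inflation-Adjusted Gross"]
    else "early"
)
print(early_price, late_price, early_factor, nominal_winner, adjusted_winner)
```

In this toy case the later year earns more in nominal terms only because each ticket costs more, which is the kind of distortion the inflation-adjusted columns are meant to remove.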
If you use this dataset in your research, please credit the original authors.
Data Source
See the dataset description for more information.