19 datasets found

IMDB movie details dataset
crawlfeeds.com
csv, zip
Updated Jul 5, 2025
Crawl Feeds (2025). IMDB movie details dataset [Dataset]. https://crawlfeeds.com/datasets/imdb-movie-details-dataset
Available download formats: zip, csv
Dataset updated
Jul 5, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description

The IMDB Movie Details Dataset is a comprehensive collection of information about movies, TV shows, and streaming content listed on IMDB. It includes detailed data such as titles, release years, genres, cast, crew, and ratings, making it a useful resource for film and entertainment enthusiasts. It is well suited to data analysis, with applications spanning machine learning projects, predictive modeling, and insights into industry trends.

Researchers can explore patterns in movie ratings and genre popularity, while developers can use the dataset to build recommendation systems or applications. Movie buffs can dive deep into historical and contemporary trends in the world of cinema. This dataset not only supports academic and professional pursuits but also opens doors for creative projects in storytelling, content creation, and audience engagement. Whether you’re a developer, researcher, or film enthusiast, the IMDB movie dataset is a powerful tool for uncovering trends and gaining deeper insights into the evolving entertainment landscape.
imdb_reviews
tensorflow.org
kaggle.com
Updated Sep 20, 2024
(2024). imdb_reviews [Dataset]. https://www.tensorflow.org/datasets/catalog/imdb_reviews
Dataset updated
Sep 20, 2024
Description
Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

To use this dataset:

```python
import tensorflow_datasets as tfds

ds = tfds.load('imdb_reviews', split='train')
for ex in ds.take(4):
    print(ex)
```

See the guide for more information on tensorflow_datasets.
Filtered IMDb Movies & TV Shows Dataset
opendatabay.com
Updated Jul 3, 2025
Datasimple (2025). Filtered IMDb Movies & TV Shows Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/ca25d396-b298-4765-ab3b-8adf955bfc63
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Entertainment & Media Consumption
Description
This dataset provides detailed information on IMDb movies and television shows, integrating descriptions sourced from Rotten Tomatoes. It contains data for approximately 7800 titles, primarily from the 1990s onwards, and has been filtered to include English language content with specific criteria for ratings and votes. The purpose of this dataset is to facilitate projects involving cross-content analysis, content-based recommendation systems, and genre prediction tasks. It offers a rich resource for understanding entertainment media consumption and developing machine learning applications.

Columns

SNo.: Serial number for each record.

index: An internal index for the record.

tconst: A unique identifier for the title.

titleType: Specifies the type of content, such as 'movie' or 'tvSeries'.

primaryTitle: The most commonly known title for the content.

originalTitle: The official original title of the content.

isAdult?: A boolean indicator for adult content.

startYear: The year the title was released or started.

endYear: The year the title concluded (for TV series) or was released.

runtimeMinutes: The duration of the content in minutes.

Genres: Categories or types of content (multiple values may be present).

Average Rating: The average rating of the title as found on IMDb.

Num. of Votes: The total number of votes received for the rating on IMDb.

Region: The geographic region associated with the title's availability or origin.

Number of Ratings Types: Details related to how ratings are categorised.

Attributes: Additional characteristics or tags associated with the title.

Description: A textual description of the title, likely from Rotten Tomatoes.

Distribution

The dataset comprises approximately 7800 individual movie and TV show records and is typically provided as a CSV file. The data has been curated by filtering the original IMDb dataset to focus on content from the 1990s through 2023. Only titles in English ('en') have been retained, and specific rating and vote thresholds have been applied, such as movies/shows from the 90s-00s with ratings of 7.9 or higher, and those from the 2000s onwards with ratings of 6.5 or higher. Titles from Canada, Great Britain, India, and the USA are represented.
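As a rough illustration, the filtering rules above can be expressed as a single predicate. This is a minimal Python sketch, not the curator's actual code. It assumes the two rating thresholds split at the year 2000 (titles from 1990 to 1999 need 7.9 or higher, titles from 2000 to 2023 need 6.5 or higher), because the description's "90s-00s" and "2000s onwards" ranges overlap, and it omits the vote threshold, which is not stated. The function name `passes_filter` and the sample tconst values are hypothetical.

```python
def passes_filter(start_year: int, language: str, rating: float) -> bool:
    """Return True if a title meets the (assumed) year, language and rating criteria."""
    if language != "en":  # only English-language titles are kept
        return False
    if 1990 <= start_year <= 1999:  # ASSUMPTION: 1990s titles need >= 7.9
        return rating >= 7.9
    if 2000 <= start_year <= 2023:  # ASSUMPTION: 2000-2023 titles need >= 6.5
        return rating >= 6.5
    return False  # outside the 1990-2023 window


# Hypothetical records: (tconst, startYear, language, average rating)
titles = [
    ("tt0000001", 1995, "en", 8.1),
    ("tt0000002", 1995, "en", 7.0),
    ("tt0000003", 2010, "en", 6.7),
    ("tt0000004", 2010, "fr", 9.0),
    ("tt0000005", 1985, "en", 9.0),
]
kept = [t[0] for t in titles if passes_filter(*t[1:])]
print(kept)  # ['tt0000001', 'tt0000003']
```

Encoding the rule as a pure function keeps the thresholds in one place, so they are easy to adjust if the curator's actual cut-off years turn out to differ.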

Usage

This dataset is highly suitable for various analytical and machine learning tasks, including:
* Developing content-based recommendation systems using genres, descriptions, and ratings.
* Performing exploratory data analysis on movie and TV show trends.
* Implementing Natural Language Processing (NLP) techniques on title descriptions for insights or feature extraction.
* Executing multi-label classification to predict genres from description data.
* Clustering movies and shows based on their descriptions and genre attributes.
* Aiding projects that require cross-content analysis across different media types.

Coverage

The dataset primarily covers movies and TV shows released from 1990 to 2023. Geographically, the data includes titles relevant to Canada, Great Britain, India, and the USA. No demographic scope is specified beyond the inclusion of English-language titles. Titles were selected using rating-score and vote-count thresholds, which focuses the dataset on well-received or highly engaged content.

License

CC0

Who Can Use It

This dataset is ideal for:
* Data Scientists and Analysts: For conducting exploratory data analysis, building predictive models, and deriving insights into media consumption.
* Machine Learning Engineers: For developing and training recommendation engines, NLP models, and classification algorithms.
* Researchers: Studying trends in film and television, cross-media analysis, and content categorisation.
* Developers: Creating applications that require rich movie and TV show data, such as content discovery platforms.
* Academics and Students: For educational purposes, coursework, and research projects in data science, AI, and media studies.

Dataset Name Suggestions

IMDb Films & Shows with Descriptions

Nineties and Beyond IMDb Data

Rotten Tomatoes-IMDb Integrated Dataset

Filtered IMDb Movies & TV Shows

Entertainment Content Analytics Dataset

Attributes

Original Data Source: IMDb Movies/Shows with Descriptions
IMDb Movie Review Sentiment
kaggle.com
Updated Dec 2, 2023
The Devastator (2023). IMDb Movie Review Sentiment [Dataset]. https://www.kaggle.com/datasets/thedevastator/imdb-movie-review-sentiment-dataset/suggestions?status=pending&yourSuggestions=true
Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Dec 2, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
IMDb Movie Review Sentiment

Movie Review Sentiment

By imdb (from Hugging Face)

About this dataset

The IMDb Large Movie Review Dataset is a collection of movie reviews used for sentiment classification. Each review is paired with a sentiment label indicating whether it is positive or negative. The dataset is intended to support sentiment analysis and classification tasks in natural language processing.

The main purpose of the train.csv file within this dataset is to provide a curated collection of movie reviews, each accompanied by its respective sentiment label. This file proves particularly useful for training machine learning models to accurately predict sentiment and classify reviews based on their emotional tone.

Similarly, the test.csv file contains another set of movie reviews along with corresponding sentiment labels. It is meant for testing and validating trained models, letting researchers and developers measure how well their models perform on reviews they were not trained on.

Additionally, the unsupervised.csv file offers an alternative subset within the dataset. Unlike train.csv and test.csv, unsupervised.csv does not include any associated sentiment labels for individual movie reviews. This specific subset serves as a valuable resource for exploring unsupervised learning techniques within the domain of sentiment classification.

Using the IMDb Large Movie Review Dataset, researchers and data scientists can study many aspects of sentiment in textual data. With labeled examples covering both positive and negative sentiment across diverse film reviews, the dataset lets users develop machine learning models that assess subjective opinions expressed in text.

How to use the dataset

Introduction:

Dataset Overview:
- train.csv: This file contains a set of movie reviews along with their sentiment labels. It is intended for training your sentiment analysis models.
- test.csv: This file provides another set of movie reviews along with their corresponding sentiment labels. You can use it to evaluate the performance of your trained models.
- unsupervised.csv: This file includes movie reviews without any associated sentiment labels. It can be used for unsupervised sentiment classification tasks.

Columns in the Dataset:
- text: The main column, containing the text of each movie review.
- label: The sentiment label assigned to each review, indicating whether it is positive or negative.
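A minimal sketch of loading one of the CSV files with Python's standard `csv` module. It assumes each file has a header row with `text` and `label` columns and that labels are stored as integers (0 for negative, 1 for positive, matching the label encoding of the Hugging Face `imdb` dataset); the Kaggle export may encode them differently. The helper `load_reviews` is hypothetical, and an in-memory sample stands in for the real file.

```python
import csv
import io
from typing import List, TextIO, Tuple


def load_reviews(f: TextIO) -> List[Tuple[str, int]]:
    """Read (text, label) pairs from a CSV with 'text' and 'label' columns."""
    reader = csv.DictReader(f)
    # ASSUMPTION: labels are integer strings such as "0" and "1".
    return [(row["text"], int(row["label"])) for row in reader]


# Stand-in for open("train.csv", newline="", encoding="utf-8").
sample = io.StringIO(
    'text,label\n'
    '"A moving, beautifully shot film.",1\n'
    '"Dull plot, wooden acting.",0\n'
)
reviews = load_reviews(sample)
print(reviews)
```

The `csv` module handles quoted fields, so commas inside a review do not split it into extra columns.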

Guidelines for Using the Dataset:

Training Your Model:

Begin by loading and preprocessing the data from train.csv

Treat 'text' as your input feature and 'label' as your target variable

Explore different machine learning or deep learning algorithms suitable for text classification

Train your model using various techniques, such as bag-of-words, word embeddings, or transformers

Evaluate and fine-tune your model's performance using test.csv
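The bag-of-words approach mentioned above can be illustrated with a small multinomial Naive Bayes classifier in plain Python. This is one common baseline chosen here for illustration, not a method prescribed by the dataset page; the function names are hypothetical and the four training reviews are made up. On the real data, the same functions would receive the (text, label) pairs read from train.csv.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_nb(data: List[Tuple[str, int]]) -> Tuple[Dict[int, float], Dict[int, Counter], set]:
    """Fit a multinomial Naive Bayes model (bag-of-words counts) on (text, label) pairs."""
    label_counts = Counter(label for _, label in data)
    word_counts: Dict[int, Counter] = {label: Counter() for label in label_counts}
    vocab = set()
    for text, label in data:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {label: math.log(n / len(data)) for label, n in label_counts.items()}
    return log_prior, word_counts, vocab


def predict_nb(model, text: str) -> int:
    """Return the label with the highest log-posterior, using Laplace (add-one) smoothing."""
    log_prior, word_counts, vocab = model
    best_label, best_score = None, -math.inf
    for label, prior in log_prior.items():
        total = sum(word_counts[label].values())
        score = prior
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen during training
                score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


train = [
    ("great acting and a wonderful story", 1),
    ("wonderful, moving and great", 1),
    ("boring plot and terrible acting", 0),
    ("terrible, boring waste of time", 0),
]
model = train_nb(train)
print(predict_nb(model, "a great, wonderful film"))  # 1
print(predict_nb(model, "boring and terrible"))  # 0
```

Working in log space avoids floating-point underflow when multiplying many small word probabilities over long reviews.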

Evaluating Your Model:

Load test.csv and preprocess the data similar to what you did with train.csv

Use this preprocessed test data to evaluate the accuracy, precision, recall, F1 score or other relevant metrics of your trained model on unseen data

Analyze these metrics to understand how well your model is performing in predicting sentiments
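The metrics listed above follow directly from counts of true positives, false positives, and false negatives. Here is a minimal sketch, assuming label 1 marks a positive review; the helper name `binary_metrics` is hypothetical.

```python
from typing import Dict, List


def binary_metrics(y_true: List[int], y_pred: List[int], positive: int = 1) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 with respect to one positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(y_true), "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

With the sample predictions (2 true positives, 1 false positive, 1 false negative), accuracy, precision, recall, and F1 all come out to 2/3.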

Advancing Your Model (Unsupervised Classification):

Utilize unsupervised.csv for unsupervised sentiment classification tasks

Preprocess the movie reviews in this file and explore techniques like clustering, topic modeling, or self-supervised learning

Extract patterns, themes, or sentiments from the reviews without any guidance from labeled data

Conclusion:

Research Ideas

Sentiment Analysis: This dataset can be used to train models for sentiment analysis, where the goal is to predict whether a movie review is positive or negative based on its text.

NLP Research: The dataset can be used for various natural language processing (NLP) tasks such as text classification, information extraction, or named entity recognition. Researchers and practitioners can leverage this dataset to develop and evaluate new algorithms and techniques in the field of NLP.

Recommendation Systems: The sentiment labels in this dataset can be used as a source of feedback or user preferences for recommendation systems. By analyzing the sentiments expressed in reviews,...
Popular Movies of IMDb
opendatabay.com
Updated Jun 9, 2025
Datasimple (2025). Popular Movies of IMDb [Dataset]. https://www.opendatabay.com/data/web-social/c9597b23-d205-46ff-abb3-674815373730
Dataset updated
Jun 9, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Entertainment & Media Consumption
Description
Introduction

TMDB.org is a crowd-sourced movie information database used by many film-related consoles, sites and apps, such as XBMC, MythTV and Plex. Dozens of media managers, mobile apps and social sites use its API. At the time of writing, TMDb lists some 80,000 films, considerably fewer than IMDb; while not as complete as IMDb, it holds extensive information for most popular and Hollywood films. This dataset of the 10,000 most popular movies worldwide was fetched through TMDb's read API. The free API lets developers programmatically fetch and use TMDb's data, provided that TMDb is attributed as the source of the data and/or images. TMDb also updates the API from time to time.

The data was fetched with exception handling, so the dataset contains some null values where fields are missing in the TMDb database. That makes it good practice for a young analyst learning to deal with missing values. Hey analysts, are you excited?
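Since the TMDb export contains nulls, a first step is usually to measure how many values each column is missing. Below is a minimal standard-library sketch. The column names in the sample (`title`, `release_date`, `vote_average`) and the set of placeholder strings treated as missing are assumptions, since the description does not list the file's columns; the helper `count_missing` is hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Dict, TextIO, Tuple


def count_missing(f: TextIO, markers: Tuple[str, ...] = ("", "NA", "null", "None")) -> Dict[str, int]:
    """Count empty or placeholder cells per column in a CSV file."""
    counts: Counter = Counter()
    reader = csv.DictReader(f)
    for row in reader:
        for column, value in row.items():
            if column is None:  # extra cells beyond the header row; skip them
                continue
            # DictReader fills short rows with None; treat that as missing too.
            if value is None or value.strip() in markers:
                counts[column] += 1
    return {name: counts[name] for name in (reader.fieldnames or [])}


# Hypothetical rows standing in for the real CSV export.
sample = io.StringIO(
    "title,release_date,vote_average\n"
    "Movie A,2019-05-01,7.1\n"
    "Movie B,,6.4\n"
    "Movie C,,\n"
)
missing = count_missing(sample)
print(missing)  # {'title': 0, 'release_date': 2, 'vote_average': 1}
```

Listing every column, including those with zero missing values, makes it easy to see at a glance which fields need imputation or filtering.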

Original Data Source: Popular Movies of IMDb
Wikipedia Movie Plot Collection
opendatabay.com
Updated Jul 8, 2025
Datasimple (2025). Wikipedia Movie Plot Collection [Dataset]. https://www.opendatabay.com/data/ai-ml/624e3736-74ea-4f5c-9ee5-fda14c16c770
Dataset updated
Jul 8, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Entertainment & Media Consumption
Description
This dataset contains movie plots extracted from Wikipedia, along with other key metadata. It is curated for movies released between 1950 and 2023 that have accumulated over 1,000 ratings on IMDb. Its primary purpose is to support development with Large Language Models (LLMs) for applications such as movie search or recommendation systems. The plot summaries have been cleaned to remove irrelevant elements such as links and references, leaving plain text. Where Wikipedia plots were unavailable, IMDb synopses were used as a fallback. 89% of the movies have a detailed plot, and 100% include a short summary taken unmodified from Wikipedia, which is useful for matching metadata in retriever applications. Columns such as 'stars', 'directors', and 'genres' are provided as lists of values, making them suitable for direct loading into vector databases.

Columns

title: The title of the film, presented in lowercase.

stars: The names of the actors featured in the film, also in lowercase.

directors: The names of the film's directors, in lowercase.

year: The year when the movie was released.

genre: The genres associated with the film, listed in lowercase.

runtime: The duration of the film, measured in minutes.

ratingCount: An indication of the film's popularity, showing the number of people who have rated it on IMDb.

plot: Detailed storyline of the film.

summary: A short overview and additional details about the film.

imdb_rating: The film's rating on IMDb, on a scale of 1 to 10.

Distribution

The data file is typically in CSV format. The dataset spans movies released from 1950 up to 2023. There are 20,617 unique movie titles, 21,596 unique star names, and 9,863 unique director names. The genres column contains 21,675 unique values. Movie runtimes range from -1 to 776 minutes, with a significant majority (17,433 entries) falling between 76.70 and 115.55 minutes. The number of ratings (ratingCount) varies widely, starting from 1,001 and going up to 2.73 million. IMDb ratings range from 1.2 to 9.3. While specific total row/record counts are not available, the distribution data for year, runtime, ratingCount, and imdb_rating show various value counts within different ranges.
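Two details above affect parsing: the list-valued columns ('stars', 'directors', 'genres') and a minimum runtime of -1, which looks like a placeholder for an unknown value rather than a real duration. Here is a minimal sketch, assuming the lists are serialized in the CSV as Python-style list literals and that any negative runtime means "unknown"; neither assumption is confirmed by the description, and the helper names are hypothetical.

```python
import ast
from typing import List, Optional


def parse_list_cell(cell: str) -> List[str]:
    """Parse a list stored as a Python-style literal, e.g. "['drama', 'crime']"."""
    # ASSUMPTION: list columns are serialized as Python list literals.
    value = ast.literal_eval(cell) if cell.strip() else []
    return [str(item) for item in value]


def clean_runtime(raw: str) -> Optional[int]:
    """Return the runtime in minutes, or None for the (assumed) -1 'unknown' placeholder."""
    minutes = int(float(raw))
    return None if minutes < 0 else minutes


print(parse_list_cell("['drama', 'crime']"))  # ['drama', 'crime']
print(clean_runtime("-1"), clean_runtime("142"))  # None 142
```

`ast.literal_eval` only evaluates literals, so it is safer than `eval` for parsing cells from a downloaded file.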

Usage

This dataset is ideal for:
* Developing demonstration projects leveraging Large Language Models (LLMs).
* Creating movie search applications, such as the movie search app cinemattr.ca.
* Building retriever applications where the 'summary' column can be used for metadata matching.
* Populating vector databases with structured information from 'stars', 'directors', and 'genres' for advanced querying and analysis.

Coverage

The dataset's geographic scope is global. It includes movies released within the time frame of 1950 to 2023. The data availability specifies that 89% of the movies have detailed plot information, and all movies (100%) include a short summary. The dataset focuses on films with more than 1000 ratings on IMDb.

License

CC0

Who Can Use It

This dataset is suitable for:
* AI and machine learning developers who are building models based on natural language processing.
* Data scientists and researchers interested in film data and entertainment analytics.
* Software engineers developing applications that require movie plot summaries or metadata, such as recommendation engines.
* Students and enthusiasts looking for high-quality, pre-processed text data for LLM projects.

Dataset Name Suggestions

IMDb Verified Movie Plots

Historical Film Summaries (1950-2023)

Wikipedia Movie Plot Collection

LLM-Ready Movie Dataset

Global Cinema Plot Archive

Attributes

Original Data Source: Movie Plots from Wikipedia
Relational In-Memory Database Report
datainsightsmarket.com
doc, pdf, ppt
Updated May 28, 2025
Data Insights Market (2025). Relational In-Memory Database Report [Dataset]. https://www.datainsightsmarket.com/reports/relational-in-memory-database-1978756
Available download formats: doc, pdf, ppt
Dataset updated
May 28, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The relational in-memory database (IMDB) market is experiencing robust growth, driven by the increasing demand for real-time analytics and applications requiring ultra-low latency data processing. The market, estimated at $15 billion in 2025, is projected to expand at a Compound Annual Growth Rate (CAGR) of 18% between 2025 and 2033, reaching approximately $60 billion by 2033. This growth is fueled by several key factors. Firstly, the rise of big data and the need for faster insights across various sectors like finance, healthcare, and telecommunications are propelling adoption. Secondly, advancements in technology, such as improved memory capacity and processing power, are making IMDBs more affordable and efficient. Finally, cloud computing platforms are playing a significant role, offering scalable and cost-effective IMDB solutions. Major players like Microsoft, IBM, Oracle, and Amazon are investing heavily in this space, leading to increased competition and innovation. While the market faces challenges such as data security concerns and the complexity of integrating IMDBs into existing systems, these are likely to be mitigated by continuous technological advancements and increasing industry expertise. Despite the overall positive outlook, market segmentation reveals distinct growth patterns. The financial services sector is currently the largest adopter of IMDB technology, followed by the telecommunications and healthcare industries. Geographic distribution shows that North America and Europe currently hold the largest market shares, but significant growth is anticipated in Asia-Pacific regions due to increasing digitalization and data generation. Challenges remain in ensuring data consistency and managing the potential cost overhead associated with high-memory requirements. 
However, the continuous development of efficient memory management techniques and the integration of IMDBs with advanced analytics tools are likely to address these concerns and further fuel market expansion. The long-term outlook for the relational in-memory database market remains exceptionally promising, suggesting consistent high-growth potential well into the next decade.
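The growth figures above can be checked with compound-growth arithmetic. Compounding $15 billion at 18% a year over the eight years from 2025 to 2033 gives about $56.4 billion, so "approximately $60 billion" is a generous rounding; reaching exactly $60 billion would take a CAGR of about 18.9%. A short Python sketch of the calculation (the function names are illustrative):

```python
def project(value: float, cagr: float, years: int) -> float:
    """Compound a starting value at a constant annual growth rate."""
    return value * (1 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Constant annual growth rate that turns start into end over the given number of years."""
    return (end / start) ** (1 / years) - 1


print(round(project(15, 0.18, 8), 1))  # 56.4 (USD billions, 2025 -> 2033)
print(round(implied_cagr(15, 60, 8) * 100, 1))  # 18.9 (% per year)
```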
Popular Movies of IMDb
kaggle.com
Updated Jul 9, 2025
Sankha Subhra Mondal (2025). Popular Movies of IMDb [Dataset]. https://www.kaggle.com/sankha1998/tmdb-top-10000-popular-movies-dataset/notebooks
Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Jul 9, 2025
Dataset provided by
Kaggle
Authors
Sankha Subhra Mondal
License
https://creativecommons.org/publicdomain/zero/1.0/
Description

Introduction

TMDB.org is a crowd-sourced movie information database used by many film-related consoles, sites and apps, such as XBMC, MythTV and Plex. Dozens of media managers, mobile apps and social sites use its API. At the time of writing, TMDb lists some 80,000 films, considerably fewer than IMDb; while not as complete as IMDb, it holds extensive information for most popular and Hollywood films. This dataset of the 10,000 most popular movies worldwide was fetched through TMDb's read API. The free API lets developers programmatically fetch and use TMDb's data, provided that TMDb is attributed as the source of the data and/or images. TMDb also updates the API from time to time.

The data was fetched with exception handling, so the dataset contains some null values where fields are missing in the TMDb database. That makes it good practice for a young analyst learning to deal with missing values.
Hey analysts, are you excited?
Full Netflix Dataset
kaggle.com
Updated Jun 9, 2025
OctopusTeam (2025). Full Netflix Dataset [Dataset]. https://www.kaggle.com/datasets/octopusteam/full-netflix-dataset
Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Jun 9, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
OctopusTeam
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset provides a comprehensive collection of all titles (Movies and TV Series) available on Netflix. In addition to basic information, it includes IMDb-specific data like IMDb ID, Average Rating, and Number of Votes.

The dataset is updated daily at 10:00 AM CET. If you find this dataset helpful, feel free to give it an upvote! 😊

You can find all our APIs, maintained and developed by us, at the following link: octopusteam.dev. These APIs provide access to various features and data, ensuring high-quality and reliable integration options for your needs.

All Datasets:

Full Netflix Dataset

Full Apple TV+ Dataset

Full Amazon Prime Dataset

Full Hulu Dataset

Full HBO Max Dataset
In Memory Database Industry Report
datainsightsmarket.com
doc, pdf, ppt
Updated Feb 15, 2025
Data Insights Market (2025). In Memory Database Industry Report [Dataset]. https://www.datainsightsmarket.com/reports/in-memory-database-industry-13053
Available download formats: ppt, doc, pdf
Dataset updated
Feb 15, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
Market Overview: The global in-memory database (IMDB) market is poised for substantial growth, with a projected CAGR of 19.00% from 2025 to 2033. The market, valued at XX million in 2025, is growing on the back of increasing IMDB adoption in industries including telecommunications, BFSI, logistics, retail, entertainment, and healthcare. Key drivers include the need for real-time data processing, improved performance, and the rise of big data and analytics. Market Dynamics: The IMDB market is influenced by several trends and challenges. The growing adoption of cloud-based IMDB solutions is a key trend, as they provide flexibility and cost-effectiveness; however, security concerns and latency issues associated with cloud-based deployments pose challenges. Additionally, the increasing demand for high-performance computing and faster data processing is driving the development of advanced IMDB technologies. The market is fragmented, with established players such as IBM, Oracle, and Microsoft competing alongside emerging startups like VoltDB and MemSQL. Market maturity and adoption rates vary by region, with North America leading in market penetration.
Recent developments include:
- May 2022: IBM and SAP announced an extension of their collaboration as IBM embarks on a corporate transformation initiative to optimize its business operations using RISE and SAP S/4HANA Cloud. As part of the extended relationship, IBM said it is moving to SAP S/4HANA, SAP's most recent ERP system, to execute work for over 1,000 legal entities in more than 120 countries and multiple IBM companies supporting hardware, software, consulting, and finance. SAP S/4HANA, the successor to SAP R/3 and SAP ERP, is SAP's ERP system for large businesses and is intended to work optimally with SAP's in-memory database, SAP HANA.
- November 2022: Redis, a provider of real-time in-memory databases, and Amazon Web Services announced a multi-year strategic alliance. Redis is a networked, open-source NoSQL system that stores data on disk for durability before moving it to DRAM as necessary. It can function as a streaming engine, message broker, database, or cache. The company claims that when Redis is used as a database, apps can instantly search across tens of millions of rows of customer data to locate information about a single customer. Redis Enterprise Cloud is a managed, real-time database-as-a-service product on AWS.
- December 2022: The National Stock Exchange, the largest stock exchange in India, chose the Raima Database Manager (RDM) Workgroup 12.0 in-memory system as a foundational component for the next iterations of its trading platform front end, the National Exchange for Automated Trading (NEAT).
Key drivers for this market are: decreasing hardware cost; increasing penetration of trends like big data and IoT; and an increase in the volume of data generated and the shift of enterprise operations. Potential restraints include: resilience in integration with VLDBs. Notable trends are: the telecommunication end-user industry is expected to hold a significant market share.
DataSheet1_Quantifying Award Network and Career Development in the Movie...
frontiersin.figshare.com
pdf
Updated Jun 3, 2023
Yixuan Liu; Yifang Ma (2023). DataSheet1_Quantifying Award Network and Career Development in the Movie Industry.pdf [Dataset]. http://doi.org/10.3389/fphy.2022.902890.s001
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fphy.2022.902890.s001
Dataset updated
Jun 3, 2023
Dataset provided by
Frontiers
Authors
Yixuan Liu; Yifang Ma
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
In show business, awards are conferred on people and films through periodic film festivals and events, providing incentives for performers' future career development. In this work, we focused on exploring the growth and dynamics of the film award system, the structure of the award network, and the relationships between historical performance, collaborations, and future career success of performers in the movie industry. We collected data from IMDb covering more than 3.5K movie events and 520K individuals, with their award-winning and career records spanning over 90 years. Using network analysis and regression models, we obtained several novel results. First, we found an exponential proliferation of awards across all film genres and all professions, and an uneven distribution of the number of awards in careers across time. More than 30% of the performers have won multiple awards. Second, we built an award network to reveal the interlocks between awards based on multiple award-winning phenomena. We found that for prestigious awards, 47% of the linkages were over-represented relative to the expectations of the null model. Furthermore, the performers' collaboration network was highly clustered, exhibiting a high propensity for linkages between awarded performers. Lastly, our regression models revealed that multiple factors were related to performers' early career success and award winning. Specifically, we showed that along with the performers' historical achievements, their collaborators play an important role in winning an award after being nominated, with the scope and depth of the impact differing with the awards' prestige. This work has strong implications for the harmonious dynamics of the movie industry and the career development of performers.
Oppenheimer IMDb reviews
opendatabay.com
Updated Jun 28, 2025
Datasimple (2025). Oppenheimer IMDb reviews [Dataset]. https://www.opendatabay.com/data/ai-ml/5fff8d2c-4db6-426f-9a39-64d7daa3059e
Dataset updated
Jun 28, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Entertainment & Media Consumption
Description
"Oppenheimer," directed by the legendary Christopher Nolan, is set to grace theaters on July 21, 2023. This cinematic masterpiece offers an enthralling journey into history, recounting the extraordinary life of J. Robert Oppenheimer, a pivotal figure in the development of the atomic bomb during World War II.

License

CC0

Original Data Source: Oppenheimer IMDb reviews

FiveThirtyEight Biopics Dataset

kaggle.com

Updated Mar 26, 2019


FiveThirtyEight (2019). FiveThirtyEight Biopics Dataset [Dataset]. https://www.kaggle.com/fivethirtyeight/fivethirtyeight-biopics-dataset/discussion

Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)

Dataset updated

Mar 26, 2019

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

FiveThirtyEight

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Content

Biopics

This folder contains the data behind the story 'Straight Outta Compton' Is The Rare Biopic Not About White Dudes.

biopics.csv contains the following variables:

Variable	Definition
`title`	Title of the film.
`site`	URL from IMDB.
`country`	Country of origin.
`year_released`	Year of release.
`box_office`	Gross earnings at U.S. box office.
`director`	Director of film.
`number_of_subjects`	The number of subjects featured in the film.
`subject`	The actual name of the featured subject.
`type_of_subject`	The occupation of subject or reason for recognition.
`race_known`	Indicates whether the subject’s race was discernible based on background of self, parent, or grandparent.
`subject_race`	Race of the subject.
`person_of_color`	Dummy variable that indicates person of color.
`subject_sex`	Sex of subject.
`lead_actor_actress`	The actor or actress who played the subject.

Source: IMDb.
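The variables above can be combined in a short sketch. It computes, for each decade of `year_released`, the share of rows flagged by the `person_of_color` dummy. It assumes the dummy is stored as "1"/"0". It also counts rows rather than films, because a film with several subjects may contribute several rows. Both points should be checked against the actual biopics.csv, and the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict


def person_of_color_share_by_decade(csv_text: str) -> dict[int, float]:
    """Fraction of rows whose subject is flagged as a person of color, per decade.

    Assumes `person_of_color` is coded "1"/"0" and `year_released` is an integer year.
    Counts rows, not distinct films.
    """
    totals: dict[int, int] = defaultdict(int)
    flagged: dict[int, int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        decade = int(row["year_released"]) // 10 * 10  # e.g. 1998 -> 1990
        totals[decade] += 1
        if row["person_of_color"].strip() == "1":
            flagged[decade] += 1
    return {decade: flagged[decade] / totals[decade] for decade in sorted(totals)}
```

With the real file, the same function could be called on `open("biopics.csv", encoding="utf-8").read()`. The encoding is another assumption.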

Context

This is a dataset from FiveThirtyEight hosted on their GitHub. Explore FiveThirtyEight data using Kaggle and all of the data sources available through the FiveThirtyEight organization page!

Update Frequency: This dataset is updated daily.

Acknowledgements

This dataset is maintained using GitHub's API and Kaggle's API.

This dataset is distributed under the Attribution 4.0 International (CC BY 4.0) license.

Cover photo by Denisse Leon on Unsplash
Unsplash Images are distributed under a unique Unsplash License.

Amazon prime tv shows and movies dataset
crawlfeeds.com
csv, zip
Updated Jul 4, 2025
Crawl Feeds (2025). Amazon prime tv shows and movies dataset [Dataset]. https://crawlfeeds.com/datasets/amazon-prime-tv-shows-and-movies-dataset
Available download formats: zip, csv
Dataset updated
Jul 4, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
Amazon Prime TV Shows and Movies Dataset offered by Crawl Feeds is an extensive resource containing over 92,000 records in JSON format. This dataset encompasses a wide array of data points, including links, titles, descriptions, release dates, genres, posters, streaming platforms, countries, number of seasons, content ratings, IMDb ratings, cast and crew details, unique identifiers, and scraping timestamps. Such comprehensive information is invaluable for researchers, data analysts, and developers aiming to conduct in-depth analyses, develop recommendation systems, or explore trends within Amazon Prime's content library.

For those interested in broader media datasets, Crawl Feeds also offers the Movies and TV Shows Dataset, which includes 118,000 records, and the IMDb Movie Details Dataset, comprising 250,000 records. These datasets provide extensive information across various platforms, facilitating comparative studies and cross-platform analyses.

Integrating these datasets into your projects can significantly enhance the depth and quality of your analyses, providing a robust foundation for exploring various facets of the entertainment industry. Whether you're developing a new application, conducting market research, or performing academic studies, these datasets serve as a valuable resource for gaining insights into the dynamic world of streaming media.

Explore the Amazon Prime TV Shows and Movies Dataset and other related datasets on Crawl Feeds to elevate your data-driven projects.
imdb_ckb
huggingface.co
Updated Aug 29, 2009
Razhan Hameed (2009). imdb_ckb [Dataset]. https://huggingface.co/datasets/razhan/imdb_ckb
Dataset updated
Aug 29, 2009
Authors
Razhan Hameed
License
https://choosealicense.com/licenses/other/
Description
Dataset Card for IMDB Kurdish

Dataset Summary

Central Kurdish translation of the famous IMDB movie reviews dataset. The dataset contains 50K highly polar movie reviews, divided into two equal classes of positive and negative reviews. We can perform binary sentiment classification using this dataset. The availability of datasets in Kurdish, such as the IMDB movie reviews dataset, can help researchers and developers train and evaluate machine learning models for Kurdish… See the full description on the dataset page: https://huggingface.co/datasets/razhan/imdb_ckb.
Oppenheimer Film Audience Sentiment Data
opendatabay.com
Updated Jul 8, 2025
Datasimple (2025). Oppenheimer Film Audience Sentiment Data [Dataset]. https://www.opendatabay.com/data/consumer/5fff8d2c-4db6-426f-9a39-64d7daa3059e
Dataset updated
Jul 8, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Entertainment & Media Consumption
Description
This dataset provides IMDb user reviews for Christopher Nolan's highly anticipated film "Oppenheimer," released on July 21, 2023. The film offers an engaging journey into history, recounting the extraordinary life of J. Robert Oppenheimer, a pivotal figure in the development of the atomic bomb during World War II. This collection of reviews allows for an insightful examination of public sentiment and audience reactions to this cinematic masterpiece.

Columns

Title: The title given by the user to their review.

Rating: The numerical score assigned by the user, expressed out of a maximum of 10.0.

Review: The full textual content of the user's opinion or critique.

Distribution

The dataset is presented in a tabular format, comprising individual user reviews linked with their respective ratings. It contains 2445 entries or rows. The ratings span from 1.00 to 10.00, with a significant proportion of scores concentrated in the higher ranges. While specific file type details are not provided, data files of this nature are typically available in formats such as CSV.

Usage

This dataset is ideally suited for:
* Analysing audience sentiment and public opinion regarding the film "Oppenheimer."
* Performing Natural Language Processing (NLP) tasks on unstructured movie review text, such as topic modelling or entity extraction.
* Developing and evaluating sentiment analysis models to predict review polarity.
* Visualising movie ratings distribution and identifying trends in audience reception.
* Academic and market research into film criticism, audience engagement, and the public's response to historical dramas.
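For the sentiment-analysis use case, the 1-10 `Rating` column must first be turned into polarity labels. The sketch below borrows the cut-offs of the Large Movie Review Dataset: ratings of 4 or less are negative, 7 or more are positive, and the middle is dropped. That threshold choice is an assumption, not something this dataset prescribes. Rows are assumed to be dicts keyed by the Title, Rating and Review columns listed above. The function names are hypothetical.

```python
from typing import Optional


def rating_to_polarity(rating: float, neg_max: float = 4.0, pos_min: float = 7.0) -> Optional[int]:
    """Map a 1-10 user rating to 1 (positive), 0 (negative) or None (neutral, to be dropped).

    The default cut-offs are an assumption borrowed from the Large Movie Review Dataset.
    """
    if rating <= neg_max:
        return 0
    if rating >= pos_min:
        return 1
    return None


def label_reviews(rows: list[dict]) -> list[tuple[str, int]]:
    """Turn {'Title', 'Rating', 'Review'} rows into (review_text, label) pairs, skipping neutral ratings."""
    labelled = []
    for row in rows:
        label = rating_to_polarity(float(row["Rating"]))
        if label is not None:
            labelled.append((row["Review"], label))
    return labelled
```

Dropping the middle band trades data volume for cleaner labels. Keeping it would instead call for a three-class or regression setup.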

Coverage

Geographic Scope: Global, reflecting the worldwide reach of IMDb users.

Time Range: The reviews are specifically for the film "Oppenheimer," released on July 21, 2023. The dataset itself was listed on 27 June 2025, suggesting it captures reactions around or after its release.

Demographic Scope: Comprises user-submitted reviews from the IMDb platform; specific demographic breakdowns of reviewers are not included in the dataset details.

License

CC0

Who Can Use It

Data Scientists and Analysts: To train machine learning models for text classification, sentiment analysis, and recommender systems.

Film Critics and Researchers: To gain deeper insights into audience perceptions, identify recurring themes in feedback, and study the social impact of historical films.

Academics: For studies on online review platforms, collective intelligence, and the dynamics of public opinion in entertainment.

Marketing and Media Professionals: To understand audience reception, inform promotional strategies, and identify key discussion points surrounding the film.

Dataset Name Suggestions

Oppenheimer IMDb Reviews Dataset

Oppenheimer Film Audience Sentiment Data

Christopher Nolan's Oppenheimer User Reviews

Movie Ratings: Oppenheimer Audience Feedback

Attributes

Original Data Source: Oppenheimer IMDb reviews
350 000+ movies from themoviedb.org
kaggle.com
zip
Updated Oct 12, 2017
Stephanerappeneau (2017). 350 000+ movies from themoviedb.org [Dataset]. https://www.kaggle.com/stephanerappeneau/350-000-movies-from-themoviedborg
Available download formats: zip (70,483,259 bytes)
Dataset updated
Oct 12, 2017
Authors
Stephanerappeneau
Description
Context

I love movies.

I tend to avoid Marvel-Transformers-standardized products, and prefer a mix of classic Hollywood golden-age and obscure Polish artsy movies. Throw in an occasional Japanese-zombie-slasher-giallo as an alibi. Good movies don't exist without bad movies.

On average I watch 200+ movies a year, with peaks of more than 500. Nine years ago I started logging my movies, both to avoid watching the same movie twice and to assign them scores. Over the years, this gave me a couple of insights into my viewing habits, but nothing more than what a tenth-grader would learn at school.

I've recently subscribed to Netflix, and it pains me to see how poorly recommendation systems serve people like me, who mostly swear by "la politique des auteurs". The term comes from the French New Wave critics at Cahiers du cinéma, the circle around André Bazin, and is usually credited to François Truffaut. It means that the quality of a movie is essentially linked to the director and their capacity to execute their vision with their crew. One could argue that this depends on the production pipeline, but let's not debate that for now. In practice, it means that I mostly watch movies by directors who made films I've liked.

I suspect Netflix calibrates its recommendation models around the way the "average Joe" chooses a movie. A few months ago I read a study based on a survey showing that people choose a movie mostly based on genre (55%), then on leading actors (45%); director and release date were far behind, at around 10% each. This is not surprising, since most people I know don't care who the director is, and lots of US blockbusters don't even mention the director on the poster. I am aware that collaborative filtering is based on user proximity, which I believe reduces (or even eliminates) the need to characterize a movie. So here I'm more interested in content-based filtering, which is based on product proximity, for several reasons:

Users' tastes are not easily accessible; they are, after all, Netflix's treasure chest.

Netflix's catalog is so poor for someone who likes auteur films that it wouldn't help.

Modeling a movie's intrinsic qualities is a nice challenge.
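Content-based filtering in the spirit of la politique des auteurs can be sketched as item-to-item similarity in which the director dominates. The `Movie` type, the Jaccard measure and the 0.8 director weight below are illustrative assumptions, not the author's actual model.

```python
from dataclasses import dataclass, field


@dataclass
class Movie:
    """Hypothetical minimal movie record used only for this sketch."""
    title: str
    directors: set[str]
    genres: set[str] = field(default_factory=set)


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(m1: Movie, m2: Movie, director_weight: float = 0.8) -> float:
    """Weighted product proximity: director overlap outweighs genre overlap."""
    return (director_weight * jaccard(m1.directors, m2.directors)
            + (1 - director_weight) * jaccard(m1.genres, m2.genres))


def recommend(liked: list[Movie], candidates: list[Movie], k: int = 3) -> list[str]:
    """Rank candidates by their best similarity to any liked movie; ties broken by title."""
    scored = [(max(similarity(c, m) for m in liked), c.title) for c in candidates]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [title for _, title in scored[:k]]
```

No user data is needed, which matches the first reason above. A candidate by a liked director but in a new genre (score 0.8) still outranks a same-genre film by an unknown director (score 0.2).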

Enough.

"*The secret of getting ahead is getting started*" (Mark Twain)

[Image: network graph (https://img11.hostingpics.net/pics/117765networkgraph.png)]

Content

The primary source is www.themoviedb.org. If you watch obscure artsy Romanian homemade movies, you may find only 95% of your movies referenced... but for anyone else it should be in the 98%+ range.

Movie details come from the www.themoviedb.org API: movies/details

Movie crew and casting come from the www.themoviedb.org API: movies/credits

Both can be joined on id.

Together they contain all ~350k movies, from the end of the 19th century up to August 2017. If you remove short films from IMDb, you get a similar number of movies.
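The join on id can be sketched with the standard library alone. Only the `id` column comes from the description above. The other column names in the usage example are hypothetical. Credits are grouped into a list on the assumption that a movie has many cast and crew entries.

```python
import csv
import io
from collections import defaultdict


def join_on_id(details_csv: str, credits_csv: str) -> list[dict]:
    """Inner-join movie details with their credits on the shared `id` column.

    Each credits row is assumed to be one cast/crew entry, so credits are grouped
    per id and attached as a list under the key "credits". Movies without any
    credits row are dropped (inner join).
    """
    credits_by_id: dict[str, list[dict]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(credits_csv)):
        credits_by_id[row["id"]].append(row)

    joined = []
    for movie in csv.DictReader(io.StringIO(details_csv)):
        if movie["id"] in credits_by_id:  # membership test does not create keys
            joined.append({**movie, "credits": credits_by_id[movie["id"]]})
    return joined
```

For the full 350k-row files, a dataframe library would be faster. The dictionary approach only shows the join logic.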

I uploaded the program that retrieves incremental movie details to GitHub: https://github.com/stephanerappeneau/scienceofmovies/tree/master/PycharmProjects/GetAllMovies (you need a developer API key from themoviedb.org, though).

I have tried various supervised (decision tree) and unsupervised (clustering, NLP) approaches, described in the discussions; the source code is on GitHub: https://github.com/stephanerappeneau/scienceofmovies

As a bonus, I've uploaded the bio summaries of the top 500 critically acclaimed directors from Wikipedia, for some interesting NLTK analysis.

Here is an overview of the sources I've tried:

• The free IMDb.com CSV dumps (ftp://ftp.funet.fi/pub/mirrors/ftp.imdb.com/pub/temporaryaccess/) are badly documented, incomplete, loosely structured and impossible to join/merge. There's an API hosted by Amazon Web Services at €1 per 100,000 requests. With around 1 million movies it could become expensive, and its features are bare. So I searched for other sources.

• www.themoviedb.org is crowdsourced and has an excellent API, limited to 40 requests every 10 seconds. That limit is quite generous, the API is well documented, and it is enough to sweep the 450,000 movies in a few days. For my purposes, data quality is not significantly worse than IMDb's, and since the IMDb key is also included, there's always the possibility of completing my dataset later (I actually did).

• www.Boxofficemojo.com has some interesting budget/revenue figures (which are sorely lacking in both IMDb and TMDb), but it tracks only a few thousand movies, mainly blockbusters. The film industry uses other professional sources to get better predictive/marketing insights, but those are beyond my reach for this experiment.

• www.wikipedia.com is an interesting source with no real cap on API calls. However, it requires a bit of web scraping, and for movies or directors the layout and quality vary a lot. I suspected it would take a lot of work to get insights, so I gave this source a lower priority.

• www.google.com will ban you after a few minutes of web scraping, because their job is to scrape data from others and then sell it, duh.

• It's worth mentioning that there are a few dumps of anonymized Netflix user tastes on Kaggle, because Netflix has organised a few competitions to improve its recommendation models: https://www.kaggle.com/netflix-inc/netflix-prize-data

• Online databases are largely white and Anglo-Saxon-centric, so Bollywood films are mostly absent from the datasets, even though India is one of the world's largest film producers. I'm fine with that, as it's not my cup of tea and I lack domain knowledge. The sheer number of Indian movies would probably skew my results anyway (I don't want too many martial-arts musicals in my recommendations ;-)). I do, however, have tremendous respect for the Indian movie industry, so I'd love to collaborate with an Indian cinephile!

[Image: Westerns (https://img11.hostingpics.net/pics/340226westerns.png)]
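The themoviedb.org limit quoted above is 40 requests every 10 seconds. That works out to 4 requests per second, or about 345,600 per day, which fits the "few days" estimate. Suppose each movie needs one details call and one credits call. Then 450,000 movies mean roughly 900,000 requests, about 2.6 days at full speed. A client can stay under such a limit with a sliding-window limiter like the hypothetical class below. Its `clock` and `sleep` parameters exist only so it can be tested without real waiting.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any rolling `period`-second window."""

    def __init__(self, max_calls: int = 40, period: float = 10.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls: deque = deque()  # timestamps of recent calls, oldest first

    def _evict(self, now: float) -> None:
        """Forget calls that have slid out of the window."""
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()

    def acquire(self) -> None:
        """Block until another call is allowed, then record it."""
        now = self.clock()
        self._evict(now)
        if len(self.calls) >= self.max_calls:
            # Wait until the oldest recorded call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self._evict(now)
        self.calls.append(now)
```

Usage: create one `SlidingWindowLimiter()` and call `limiter.acquire()` before every API request.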

Inspiration

Starting from there, I had multiple problem statements for both supervised and unsupervised machine learning:

Can I program a tailored recommendation system based on my own criteria?

What are the characteristics of the movies/directors I like the most?

What is the probability that I will like my next movie?

Can I find the data?

One of the objectives of sharing my work here is to find cinephile data scientists who might be interested and, hopefully, contribute or share insights :) Other interesting leads: use the tagline for NLP/clustering/genre guessing, leverage budget/revenue, link with other data sources using the IMDb normalized title, etc.

[Image: Correlation matrix (https://img11.hostingpics.net/pics/977004matrice.png)]

Motivation, Disclaimer and Acknowledgements

I graduated from a French engineering school, majoring in artificial intelligence, but that was 17 years ago, right in the middle of an AI winter. Like a lot of white male rocket scientists, I ended up at one of the leading European investment banks, quickly abandoning IT development to specialize in trading/risk project management and internal politics. My recent appointment to the Data Office made me aware of recent breakthroughs in data science, and I thought that developing a side project would be an excellent occasion to learn something new. Plus, it would give me some much-needed credibility, which decision makers too often lack when it comes to data science.

I've worked on some of the features with Cédric Paternotte, a friend of mine who is a professor of philosophy of science at the Sorbonne. Working with someone from a different background seemed a good idea for motivation, creativity and rigor.

Kudos to www.themoviedb.org and www.wikipedia.com, which really have a great attitude towards open data. This is typically NOT the case for modern big-data companies, which mostly keep data to themselves to try to monetize it. Such a huge contrast with the IMDb or Instagram APIs, which generously let you grab your last 3 comments at a miserable rate. Even if 15 years ago this seemed a mandatory path to get services for free, I predict that one day governments will need to break this data monopoly.

[Disclaimer: I apologize in advance for my English (I'm French ^-^); for any bad code I've written (there are probably hundreds of ways to do it better and faster); for any pseudo-scientific assumption I've made (I'm slowly getting back into statistics and lack senior guidance: one day I regress a non-stationary time series and the next day I discover I shouldn't have); and for any incorrect use of machine-learning models.]

[Image: powered by themoviedb.org (https://img11.hostingpics.net/pics/898068408x161poweredbyrectanglegreen.png)]
Data from: ACTIV-ES: a comparable Spanish corpus comprised of film dialogue...
data.niaid.nih.gov
live.european-language-grid.eu
Updated Jan 24, 2020
Jerid Francom (2020). ACTIV-ES: a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1492612
Dataset updated
Jan 24, 2020
Dataset authored and provided by
Jerid Francom
License
https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html
Description
DESCRIPTION: ACTIV-ES is a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions. Titles for each of the three countries were seeded from the Internet Movie Database. Subtitle data for the hearing impaired was provided by Opensubtitles.org; it was post-processed to correct or remove subtitle, OCR and diacritic artifacts, and annotated for part of speech.

The data is available in two main formats: 1) running text for each document and 2) aggregate files of 1- to 5-grams. Each format includes a plain-text and a part-of-speech-annotated version. Document names reflect the language code, country, year, title, type, genre (the first genre listed on IMDb), and IMDb ID.

For more information about the development and evaluation of these resources and to cite this work refer to:

Francom, J., Hulden, M. and Ussishkin, A. (2014). ACTIV-ES: a comparable, cross-dialect corpus of 'everyday' Spanish from Argentina, Mexico, and Spain. In Proceedings of the Ninth Annual Language Resources and Evaluation Conference, Reykjavik, Iceland. European Language Resources Association (ELRA).

Version .02 adds, in the /eagles directory, a tagged running-text version of the corpus that uses the EAGLES tagset. This tagset is much more fleshed out than the simplified tagset in the /tagged directory. For information on the tagset, see http://nlp.lsi.upc.edu/freeling/doc/tagsets/tagset-es.html.
Yelp Text Sentiment Analysis 2015
kaggle.com
Updated Feb 16, 2024
Dev Makwana (2024). Yelp Text Sentiment Analysis 2015 [Dataset]. https://www.kaggle.com/datasets/channingfisher/yelp-text-sentiment-analysis-2015
Dataset updated
Feb 16, 2024
Dataset provided by
Kaggle
Authors
Dev Makwana
License
https://cdla.io/sharing-1-0/
Description
This dataset contains sentences labelled with positive or negative sentiment, extracted from reviews of products, movies, and restaurants.

UPDATE: A newer version includes similar data from Amazon and IMDb.

Format:

sentence \t score

=======

Details:

Score is either 1 (for positive) or 0 (for negative)

The source for these sentences is: yelp.com
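The `sentence \t score` format can be parsed in a few lines. This sketch assumes one example per line with the label after the last tab, and it rejects any score other than 0 or 1. The function name is hypothetical.

```python
def parse_labelled_sentences(text: str) -> list[tuple[str, int]]:
    """Parse lines of the form 'sentence<TAB>score', where score is 1 (positive) or 0 (negative)."""
    pairs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        sentence, sep, score = line.rpartition("\t")  # split on the last tab only
        if not sep or score.strip() not in {"0", "1"}:
            raise ValueError(f"line {line_no}: expected 'sentence\\tscore', got {line!r}")
        pairs.append((sentence.strip(), int(score)))
    return pairs
```

Splitting on the last tab keeps any stray tab inside a sentence attached to the sentence rather than the label.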

This dataset is an extract of a dataset created for the paper 'From Group to Individual Labels using Deep Features', Kotzias et al., KDD 2015.