https://choosealicense.com/licenses/other/
Dataset Card for "imdb"
Dataset Summary
Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.
Supported Tasks and Leaderboards
More Information Needed
Languages
More Information Needed
Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/stanfordnlp/imdb.
https://choosealicense.com/licenses/other/
This is the sentiment analysis dataset based on IMDB reviews initially released by Stanford University. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well. Raw text and already processed bag of words formats are provided. See the README file contained in the release for more… See the full description on the dataset page: https://huggingface.co/datasets/scikit-learn/imdb.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains detailed information about movies listed on IMDb, including titles, genres, release dates, and ratings. It also includes user reviews and ratings, making it a useful resource for sentiment analysis and trend analysis in the movie industry. It can be used to gain insights into movie trends, audience preferences, and the correlation between movie attributes and ratings. The second file has an additional feature called poster_src, which is a link to the movie's poster image. The second file is larger than the first and covers a wider range of movies.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The IMDb dataset is a collection of 50,000 reviews from the Internet Movie Database (IMDb). The reviews are labeled as either positive or negative and are split into two sets of 25,000 reviews for training and testing. Each set contains an equal number of positive and negative reviews.
The IMDb dataset is a binary sentiment analysis dataset for natural language processing or text analytics. It contains more data than previous benchmark datasets.
IMDb is a rich source of film data that includes cast and crew lists, movie release dates, box office information, plot summaries, trailers, actor and director biographies, and other trivia. Information on IMDb comes from a variety of sources, such as filmmakers, film studios, on-screen credits, and other official sources.
https://choosealicense.com/licenses/other/
Q-b1t/IMDB-Dataset-of-50K-Movie-Reviews-Backup dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
For details about the scraping process, explore the complete code repository on GitHub.
This dataset provides annual data for the most popular 500–600 movies per year from 1920 to 2025, extracted from IMDb. It includes over 60,000 movies, spanning more than 100 years of cinematic history. Each year’s data is divided into three CSV files for flexibility and ease of use:
- imdb_movies_[year].csv: Basic movie details.
- advanced_movies_details_[year].csv: Comprehensive metadata and financial details.
- merged_movies_data_[year].csv: A unified dataset combining both files.
imdb_movies_[year].csv: Essential movie information, including:
- Title: Movie title.
- Description: Movie Description.
- meta_score: IMDB's meta score.
- Movie Link: IMDb URL for the movie.
- Year: Year of release.
- Duration: Runtime (in minutes).
- MPA: Motion Picture Association rating (e.g., PG, R).
- Rating: IMDb rating (scale of 1–10).
- Votes: Total user votes on IMDb.
advanced_movies_details_[year].csv: Detailed movie metadata:
- Link: IMDb URL (for linking with other data).
- budget: Production budget (in USD).
- grossWorldWide: Global box office revenue.
- gross_US_Canada: North American box office earnings.
- opening_weekend_Gross: Opening weekend revenue.
- directors: List of directors.
- writers: List of writers.
- stars: Main cast members.
- genres: Movie genres.
- countries_origin: Countries of production.
- filming_locations: Primary filming locations.
- production_companies: Associated production companies.
- Languages: Languages spoken in the movie.
- Award_information: Information about awards, nominations and wins.
- release_date: Official release date.
merged_movies_data_[year].csv: A unified dataset combining all columns from the previous two files:
- Basic Details: Title, Year, Rating, Votes.
- Advanced Features: budget, grossWorldWide, directors, genres, and awards.
Template Columns:
- imdb_movies_[year].csv: Title, Year, Duration, MPA, Rating, Votes, meta_score, description, Movie Link
- advanced_movies_details_[year].csv: link, writers, directors, stars, budget, opening_weekend_Gross, grossWorldWide, gross_US_Canada, release_date, countries_origin, filming_locations, production_company, awards_content, genres, Languages
- merged_movies_data_[year].csv: Title, Year, Duration, MPA, Rating, Votes, meta_score, description, Movie Link, writers, directors, stars, budget, opening_weekend_Gross, grossWorldWide, gross_US_Canada, release_date, countries_origin, filming_locations, production_company, awards_content, genres, Languages
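The description does not say how the merged file is produced. The sketch below shows one plausible way to combine the two per-year files, assuming that "Movie Link" in the basic file and "link" in the advanced file hold the same IMDb URL and can serve as a join key. The sample rows and the helper name merge_on_link are made up for illustration.

```python
import csv
import io

# Tiny in-memory stand-ins for imdb_movies_[year].csv and
# advanced_movies_details_[year].csv; the rows are invented for illustration.
BASIC_CSV = """Title,Year,Rating,Movie Link
Example One,1999,7.5,https://www.imdb.com/title/tt0000001/
Example Two,1999,6.1,https://www.imdb.com/title/tt0000002/
"""

ADVANCED_CSV = """link,budget,genres
https://www.imdb.com/title/tt0000001/,1000000,Drama
"""


def merge_on_link(basic_text, advanced_text):
    """Left-join basic rows with advanced rows, matching 'Movie Link' to 'link'.

    Assumption: both columns contain the same URL for the same movie.
    """
    advanced_by_link = {
        row["link"]: row for row in csv.DictReader(io.StringIO(advanced_text))
    }
    merged = []
    for row in csv.DictReader(io.StringIO(basic_text)):
        details = advanced_by_link.get(row["Movie Link"], {})
        combined = dict(row)
        # Copy every advanced column except the join key itself.
        for key, value in details.items():
            if key != "link":
                combined[key] = value
        merged.append(combined)
    return merged


merged_rows = merge_on_link(BASIC_CSV, ADVANCED_CSV)
```

Movies without a matching advanced row keep only their basic columns here; whether the published merged files drop or keep such rows is not stated in the description.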
The dataset is updated annually in December to include the latest data.
This dataset is ideal for:
- Trend Analysis: Explore changes in the movie industry over more than a century (1920 to 2025).
- Predictive Modeling: Build models to forecast box office revenue, ratings, or awards.
- Recommendation Systems: Use attributes like genres, cast, and ratings for personalized recommendations.
- Comparative Analysis: Study differences across eras, genres, or regions.
IMDB Movie Reviews
This is a dataset for binary sentiment classification containing a substantial amount of data. It contains a set of 50,000 highly polar movie reviews for training text classification models. The dataset is downloaded from https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz. The data is processed and split into training and test datasets (20% test split). The training dataset contains 40,000 reviews and the test dataset contains 10,000… See the full description on the dataset page: https://huggingface.co/datasets/ajaykarthick/imdb-movie-reviews.
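As a rough illustration of how 50,000 reviews end up as 40,000 training and 10,000 test examples, the sketch below shuffles and slices a list. The helper split_reviews, the seed, and the placeholder review strings are assumptions for this example; the description does not say whether the actual split was shuffled or stratified by label.

```python
import random


def split_reviews(reviews, test_fraction=0.2, seed=0):
    """Shuffle reviews reproducibly and hold out a fraction for testing."""
    shuffled = list(reviews)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # The first n_test items form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder data standing in for the 50,000 real reviews.
reviews = [f"review {i}" for i in range(50_000)]
train, test = split_reviews(reviews)
```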
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
dvilasuero/mini-imdb dataset hosted on Hugging Face and contributed by the HF Datasets community
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset was created by Yueming
Released under Database: Open Database, Contents: Database Contents
https://choosealicense.com/licenses/unknown/
ImdbClassification: an MTEB (Massive Text Embedding Benchmark) dataset
Large Movie Review Dataset
- Task category: t2c
- Domains: Reviews, Written
- Reference: http://www.aclweb.org/anthology/P11-1015
How to evaluate on this task
You can evaluate an embedding model on this dataset using the following code:
import mteb
# Select the task by name; YOUR_MODEL is a placeholder for a model name.
task = mteb.get_tasks(tasks=["ImdbClassification"])
evaluator = mteb.MTEB(task)
model = mteb.get_model(YOUR_MODEL)
evaluator.run(model)
To learn more… See the full description on the dataset page: https://huggingface.co/datasets/mteb/imdb.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This is a dump generated by pg_dump -Fc of the IMDb data used in the "How Good are Query Optimizers, Really?" paper. PostgreSQL compatible SQL queries and scripts to automatically create a VM with this dataset can be found here: https://git.io/imdb
https://crawlfeeds.com/privacy_policy
Unlock one of the most comprehensive movie datasets available—4.5 million structured IMDb movie records, extracted and enriched for data science, machine learning, and entertainment research.
This dataset includes a vast collection of global movie metadata, including details on title, release year, genre, country, language, runtime, cast, directors, IMDb ratings, reviews, and synopsis. Whether you're building a recommendation engine, benchmarking trends, or training AI models, this dataset is designed to give you deep and wide access to cinematic data across decades and continents.
Perfect for use in film analytics, OTT platforms, review sentiment analysis, knowledge graphs, and LLM fine-tuning, the dataset is cleaned, normalized, and exportable in multiple formats.
Genres: Drama, Comedy, Horror, Action, Sci-Fi, Documentary, and more
Train LLMs or chatbots on cinematic language and metadata
Build or enrich movie recommendation engines
Run cross-lingual or multi-region film analytics
Benchmark genre popularity across time periods
Power academic studies or entertainment dashboards
Feed into knowledge graphs, search engines, or NLP pipelines
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The "IMDB Dataset of Movies Reviews and Translation" dataset has been expanded significantly and is now available on Kaggle in a modified version. Three new columns have been added to the dataset: genres, descriptions, and emotions. The original dataset only had four columns: ratings, reviews, movies, and resenhas. This extension adds to the dataset's richness and offers insightful information about movie genres, in-depth synopses, and the sentimentality of the reviews.
The addition of the Genres column provides an extensive movie classification that enables scholars and film aficionados to explore particular genres and their traits in greater detail. By examining patterns, trends, and preferences across various genres, analysts can use this data to create more specialized research and moviegoer suggestions.
The newly added Descriptions column is a valuable addition as it provides textual summaries or synopses of each movie. These descriptions offer a concise overview of the plot, characters, and themes, making it easier for users to understand and evaluate movies of interest. Researchers can leverage this information to conduct sentiment analysis, topic modeling, or recommendation systems based on movie summaries.
Finally, the Emotions column adds another dimension to the dataset. By capturing the emotional tone expressed within each description, this column allows for a deeper understanding of sentiments toward the movies. Sentiment analysis techniques can be applied to this data to gain insights into emotions such as joy, anger, and sadness that are associated with different movies. This information can be particularly valuable for filmmakers, production companies, and marketers looking to gauge audience reactions and tailor their strategies accordingly, and especially for moviegoers who like to choose movies based on emotions.
Overall, the expanded version of the "50k Movie Reviews" dataset offers a wealth of new information that fosters detailed analysis and exploration of movie genres, descriptions, and emotional responses. This dataset presents a valuable resource for researchers, data scientists, and movie enthusiasts alike, enabling a deeper understanding of the movie landscape and facilitating the development of innovative tools and applications in the field of movie analysis and recommendation systems.
https://choosealicense.com/licenses/other/
title.akas.csv
- titleId (string): a tconst, an alphanumeric unique identifier of the title
- ordering (integer): a number to uniquely identify rows for a given titleId
- title (string): the localized title
- region (string): the region for this version of the title
- language (string): the language of the title
- types (array): enumerated set of attributes for this alternative title. One or more of the following: "alternative", "dvd", "festival", "tv", "video", "working", "original"…
See the full description on the dataset page: https://huggingface.co/datasets/labofsahil/IMDb-Dataset.
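To make the listed column types concrete, here is a minimal sketch of turning one title.akas.csv row into a typed record. Only the six columns named above are covered, since the description is truncated. The TitleAka class, the parse_aka helper, the comma separator for types, and the sample values are all assumptions for illustration and may not match the file's actual encoding.

```python
from dataclasses import dataclass


@dataclass
class TitleAka:
    title_id: str
    ordering: int
    title: str
    region: str
    language: str
    types: list


def parse_aka(row, types_sep=","):
    """Convert a raw string-valued row into a typed TitleAka record.

    Assumption: the 'types' array is stored as one delimited string.
    """
    raw_types = row.get("types", "")
    return TitleAka(
        title_id=row["titleId"],
        ordering=int(row["ordering"]),
        title=row["title"],
        region=row["region"],
        language=row["language"],
        types=[t for t in raw_types.split(types_sep) if t],
    )


# Invented sample row, shaped like csv.DictReader output.
example = parse_aka({
    "titleId": "tt0000001",
    "ordering": "2",
    "title": "Example Title",
    "region": "US",
    "language": "en",
    "types": "dvd,working",
})
```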
https://creativecommons.org/publicdomain/zero/1.0/
From IMDB's database, I downloaded two datasets of actors and movies. I then cleaned and merged the datasets for a combined dataset containing known actors and relevant information, including a movie they appeared in.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘IMDB Movies Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/harshitshankhdhar/imdb-dataset-of-top-1000-movies-and-tv-shows on 13 November 2021.
--- Dataset description provided by original source is as follows ---
IMDB Dataset of top 1000 movies and tv shows. You can find the EDA Process on - https://www.kaggle.com/harshitshankhdhar/eda-on-imdb-movies-dataset
Please consider UPVOTE if you found it useful.
Data:
- Poster_Link: Link to the poster that IMDb uses
- Series_Title: Name of the movie
- Released_Year: Year in which the movie was released
- Certificate: Certificate earned by the movie
- Runtime: Total runtime of the movie
- Genre: Genre of the movie
- IMDB_Rating: Rating of the movie on the IMDb site
- Overview: Mini story/summary
- Meta_score: Score earned by the movie
- Director: Name of the director
- Star1, Star2, Star3, Star4: Names of the stars
- No_of_votes: Total number of votes
- Gross: Money earned by the movie
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
IMDB movie review sentiment classification dataset (Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). Learning Word Vectors for Sentiment Analysis. The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011)). For more information please refer to: https://ai.stanford.edu/~amaas/data/sentiment/
The IMDB dataset was modified as follows to prepare it for use in a Galaxy Training Tutorial (https://training.galaxyproject.org/):
The 50 most frequent words are excluded (mostly stop words). The next 10,000 most frequent words are included. Reviews are limited to 500 words (longer reviews are trimmed and shorter reviews are padded). 25,000 reviews each are used for training and testing. Files are in TSV (tab-separated values) format to be consumed by Galaxy (www.usegalaxy.org).
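The preprocessing steps described here can be sketched roughly as follows. This is not the Galaxy tutorial's actual code. It assumes that out-of-vocabulary words are dropped, that long reviews are trimmed from the end, and that padding is appended at the end with id 0; the real pipeline may differ on each point. The function names and the toy reviews are illustrative only.

```python
from collections import Counter


def build_vocab(tokenized_reviews, skip_top=50, vocab_size=10_000):
    """Rank words by frequency, skip the most frequent, keep the next ones."""
    counts = Counter(tok for review in tokenized_reviews for tok in review)
    ranked = [word for word, _ in counts.most_common()]
    kept = ranked[skip_top:skip_top + vocab_size]
    # Id 0 is reserved for padding, so word ids start at 1.
    return {word: i for i, word in enumerate(kept, start=1)}


def encode(review, vocab, max_len=500, pad_id=0):
    """Map words to ids, drop unknown words, then trim or pad to max_len."""
    ids = [vocab[tok] for tok in review if tok in vocab]
    ids = ids[:max_len]
    return ids + [pad_id] * (max_len - len(ids))


# Toy corpus with small limits so the effect is visible.
toy_reviews = [
    ["the", "movie", "was", "great"],
    ["the", "was", "the"],
    ["the", "movie", "was"],
]
toy_vocab = build_vocab(toy_reviews, skip_top=1, vocab_size=2)
toy_encoded = encode(toy_reviews[0], toy_vocab, max_len=4)
```

In the toy run, "the" is the single most frequent word and is skipped, "was" and "movie" are kept, and "great" falls outside the two-word vocabulary.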
https://choosealicense.com/licenses/unknown/
Dataset Card for IMDB-BINARY (IMDb-B)
Dataset Summary
The IMDb-B dataset is "a movie collaboration dataset that consists of the ego-networks of 1,000 actors/actresses who played roles in movies in IMDB. In each graph, nodes represent actors/actress, and there is an edge between them if they appear in the same movie. These graphs are derived from the Action and Romance genres".
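The graph construction in the quoted summary can be illustrated with a small sketch: actors become nodes, and two actors share an edge if they appear in the same movie. The cast lists below are invented, and the ego-network is taken here to mean the ego, its direct neighbours, and all edges among them, which is an assumption about how the original graphs were extracted.

```python
from itertools import combinations


def co_appearance_edges(casts):
    """Build undirected edges between actors who share at least one movie."""
    edges = set()
    for cast in casts.values():
        # Sorting gives each undirected edge one canonical (a, b) form.
        for a, b in combinations(sorted(set(cast)), 2):
            edges.add((a, b))
    return edges


def ego_network(edges, ego):
    """Return the ego, its direct neighbours, and the edges among them."""
    neighbours = {b if a == ego else a for a, b in edges if ego in (a, b)}
    nodes = neighbours | {ego}
    ego_edges = {(a, b) for a, b in edges if a in nodes and b in nodes}
    return nodes, ego_edges


# Invented cast lists for illustration.
casts = {
    "Movie A": ["Ann", "Bob", "Cy"],
    "Movie B": ["Bob", "Dee"],
    "Movie C": ["Dee", "Eve"],
}
all_edges = co_appearance_edges(casts)
ann_nodes, ann_edges = ego_network(all_edges, "Ann")
```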
Supported Tasks and Leaderboards
IMDb-B should be used for graph classification… See the full description on the dataset page: https://huggingface.co/datasets/graphs-datasets/IMDB-BINARY.