https://choosealicense.com/licenses/other/
Dataset Card for "imdb"
Dataset Summary
Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.
Supported Tasks and Leaderboards
More Information Needed
Languages
More Information Needed
Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/stanfordnlp/imdb.
https://choosealicense.com/licenses/other/
This is the sentiment analysis dataset based on IMDB reviews initially released by Stanford University. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well. Raw text and already processed bag of words formats are provided. See the README file contained in the release for more… See the full description on the dataset page: https://huggingface.co/datasets/scikit-learn/imdb.
This is the IMDB dataset, identical to the IMDb Movie Reviews Dataset; it contains movie reviews.
The original dataset consists of text files for training and testing, but I created two CSV files from those text files to ease the task ✌️. Now you only need to download the files and apply your model. Each file contains 25,000 reviews, labeled 0 for negative and 1 for positive. Each file has two columns, 0 and 1: column 0 holds the reviews and column 1 holds the labels.
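As a minimal sketch of reading such a file, the snippet below parses a CSV whose columns are named 0 and 1 using only the standard library. The two sample rows, the helper name `load_reviews`, and the assumption that the header row is literally `0,1` are illustrative and not taken from the dataset itself.

```python
import csv
import io

# Hypothetical two-row sample standing in for one of the CSV files;
# the header row "0,1" mirrors the column names described above (assumed).
SAMPLE_CSV = '0,1\n"A wonderful, moving film.",1\n"Dull and far too long.",0\n'

def load_reviews(handle):
    """Return (texts, labels) from a CSV whose columns are named 0 and 1."""
    texts, labels = [], []
    for row in csv.DictReader(handle):
        texts.append(row["0"])        # column 0 holds the review text
        labels.append(int(row["1"]))  # column 1 holds the label: 0 negative, 1 positive
    return texts, labels

texts, labels = load_reviews(io.StringIO(SAMPLE_CSV))
print(len(texts), labels)
```

For a downloaded file, the same function should work on a handle opened with `open(path, newline="", encoding="utf-8")`, assuming the file is UTF-8 encoded.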
https://crawlfeeds.com/privacy_policy
dvilasuero/mini-imdb dataset hosted on Hugging Face and contributed by the HF Datasets community
IMDB Movie Reviews
This is a dataset for binary sentiment classification containing a substantial amount of data. It contains a set of 50,000 highly polar movie reviews for training models for text classification tasks. The dataset is downloaded from https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz. The data is processed and split into training and test datasets (20% test split). Training dataset contains 40000 reviews and test dataset contains 10000… See the full description on the dataset page: https://huggingface.co/datasets/ajaykarthick/imdb-movie-reviews.
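The split sizes follow from holding out one fifth of the 50,000 reviews. The sketch below reproduces that arithmetic with a shuffled split; the shuffling, the seed, and the helper name `train_test_split` are assumptions, since the description does not say how the split was actually drawn.

```python
import random

def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of items and hold out test_fraction of them for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Integer ids stand in for the 50,000 reviews.
train, test = train_test_split(range(50_000))
print(len(train), len(test))  # 40000 10000
```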
By mahesh
This IMDb Movies dataset contains information about some of the most beloved and critically praised films of all time. It includes a variety of features, such as the movie's title, original title, year published, date released, genre, duration in minutes, country of origin, language spoken in the movie, director and writer credits, and the production company responsible for its creation and distribution. Additionally, we've included fields for the actors involved, as well as other members who had a role in its making or promotion. Along with these fields, you can also see the number of reviews from users and critics, providing a comprehensive set to evaluate how different generations have rated the film throughout the years. Our selection even offers a description field giving viewers a peek into the plot line before watching, if desired! Finally, you can discover what budget was appropriated to make the movie, along with its gross income both domestically and worldwide! So grab your popcorn and search within this dataset today to find out more about some classic cinematic favorites!
To use this dataset properly, it is important to become familiar with the columns that make up the data set. The columns include: title, original_title, year, date_published, genre, duration, country, language, director, writer, production_company, actors, description, avg_vote, votes, budget, usa_gross_income, metascore, reviews_from_users, and reviews_from_critics.
By studying the various columns in this dataset, you can discover trends in movies over time, such as genres gaining in popularity or budgets increasing or decreasing annually. Additionally, you can compare production companies or directors over time to see how their output has changed or whether they consistently produce well-regarded content. Finally, by looking at actors over time, you can track whether particular actors have experienced ups and downs in their careers, and see which actors have remained popular for extended periods thanks to larger bodies of work.
With so many data points available, it is easy to come up with dozens of questions that this dataset could help answer about movies past, present, and future! Have fun exploring!
- Identifying movie trends in different countries, such as genre preference and budget size.
- Studying how aspects of the movie, such as actors, writers and crew, influence ratings and gross income.
- Analysing reviews from critics and users to understand correlations between reviews and metascores or vote values.
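One way to start on the trend questions above is to count genres per decade. This is only a sketch: the four rows are made-up examples, and the assumption that `genre` holds a comma-separated list of genres is not confirmed by the description.

```python
from collections import Counter, defaultdict

# Hypothetical rows using the `year` and `genre` columns; values are illustrative.
rows = [
    {"year": "1994", "genre": "Drama, Crime"},
    {"year": "1999", "genre": "Drama"},
    {"year": "2008", "genre": "Action, Crime, Drama"},
    {"year": "2010", "genre": "Action, Sci-Fi"},
]

def genre_counts_by_decade(rows):
    """Count how often each genre appears in each decade."""
    counts = defaultdict(Counter)
    for row in rows:
        decade = int(row["year"]) // 10 * 10
        # Assumes genres are separated by commas within one field.
        for genre in row["genre"].split(","):
            counts[decade][genre.strip()] += 1
    return counts

counts = genre_counts_by_decade(rows)
print(counts[1990].most_common(1))  # [('Drama', 2)]
```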
If you use this dataset in your research, please credit the original authors.
See the dataset description for more information.
File: IMDb names.csv

| Column name | Description |
|:---|:---|
| title | The title of the movie. (String) |
| original_title | The original title of the movie (in case it was changed in other languages)... |
This dataset is compiled from a dataset published on GitHub.
Data Description Table
| Variable Name | Description |
|---|---|
| movie_title | Title of the movie |
| duration | Duration in minutes |
| director_name | Name of the director of the movie |
| director_facebook_likes | Number of likes on the director's Facebook page |
| actor_1_name | Primary actor starring in the movie |
| actor_1_facebook_likes | Number of likes on actor_1's Facebook page |
| actor_2_name | Other actor starring in the movie |
| actor_2_facebook_likes | Number of likes on actor_2's Facebook page |
| actor_3_name | Other actor starring in the movie |
| actor_3_facebook_likes | Number of likes on actor_3's Facebook page |
| num_user_for_reviews | Number of users who gave a review |
| num_critic_for_reviews | Number of critic reviews on IMDb |
| num_voted_users | Number of people who voted for the movie |
| cast_total_facebook_likes | Total number of Facebook likes of the entire cast of the movie |
| movie_facebook_likes | Number of Facebook likes on the movie page |
| plot_keywords | Keywords describing the movie plot |
| facenumber_in_poster | Number of actors featured in the movie poster |
| color | Film colorization: ‘Black and White’ or ‘Color’ |
| genres | Film categorization like ‘Animation’, ‘Comedy’, etc. |
| title_year | The year in which the movie was released (1916 to 2016) |
| language | Languages like English, Arabic, Chinese, etc. |
| country | Country where the movie was produced |
| content_rating | Content rating of the movie |
| aspect_ratio | Aspect ratio the movie was made in |
| movie_imdb_link | IMDb link of the movie |
| gross | Gross earnings of the movie in dollars |
| budget | Budget of the movie in dollars |
| imdb_score | IMDb score of the movie |
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset is a collection used for sentiment analysis of movie comments on the IMDB website. This dataset includes comments from IMDB users classified into three different sentiment levels: negative, neutral, and positive. Each comment is accompanied by a sentiment label.
The file stopwords.txt contains a list of common words that are often removed during text preprocessing. The list is tailored to sentiment analysis.
This dataset is used to train and evaluate deep learning models in the task of classifying sentiment of movie comments from IMDB into negative, neutral, and positive groups.
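A stopword list of this kind is typically applied before training. The sketch below shows one way to do that, assuming stopwords.txt holds one word per line (the file format is not described) and using simple whitespace tokenization; the helper names and the sample words are illustrative.

```python
def load_stopwords(lines):
    """Build a stopword set from an iterable of lines, one word per line (assumed format)."""
    return {line.strip().lower() for line in lines if line.strip()}

def remove_stopwords(text, stopwords):
    """Lowercase the text, split on whitespace, and drop stopwords."""
    return [tok for tok in text.lower().split() if tok not in stopwords]

# Hypothetical stand-in for the contents of stopwords.txt.
stopwords = load_stopwords(["the", "a", "is", "of", ""])
print(remove_stopwords("The plot is a mess of cliches", stopwords))
# ['plot', 'mess', 'cliches']
```

With the real file, `load_stopwords` can be passed an open file handle directly, since iterating over a handle yields its lines.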
https://choosealicense.com/licenses/other/
title.akas.csv
- titleId (string) - a tconst, an alphanumeric unique identifier of the title
- ordering (integer) - a number to uniquely identify rows for a given titleId
- title (string) - the localized title
- region (string) - the region for this version of the title
- language (string) - the language of the title
- types (array) - Enumerated set of attributes for this alternative title. One or more of the following: "alternative", "dvd", "festival", "tv", "video", "working", "original"… See the full description on the dataset page: https://huggingface.co/datasets/labofsahil/IMDb-Dataset.
https://choosealicense.com/licenses/other/
Q-b1t/IMDB-Dataset-of-50K-Movie-Reviews-Backup dataset hosted on Hugging Face and contributed by the HF Datasets community
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was scraped based on the popularity of IMDb movies (highest to lowest popularity).
There are 9,083 movies in total in the dataset.
UNCLEAN VERSION: IMDbMovies
Title: The name of the movie.
Summary: A brief overview of the movie's plot.
Director: The person responsible for overseeing the creative aspects of the film.
Writer: The individual who crafted the screenplay and story for the movie.
Main Genres: The primary categories or styles that the movie falls under.
Motion Picture Rating: The age-appropriate classification for viewers.
Motion Picture Rating Categories:
G (General Audience): Suitable for all ages; no offensive content.
PG (Parental Guidance): May contain mild language, violence, or thematic elements; parental guidance advised.
PG-13 (Parents Strongly Cautioned): Some material may be inappropriate for those under 13; more intense violence, language, or suggestive content.
R (Restricted): Restricted to viewers over 17 or 18; may contain adult themes, strong language, sexual content, or violence.
NC-17 (Adults Only): Restricted to adults 17 and older; may contain explicit sexual content or graphic violence.
Runtime: The total duration of the movie.
Release Year: The year in which the movie was officially released.
Rating: The average score given to the movie by viewers.
Number of Ratings: The total count of ratings submitted by viewers.
Budget: The estimated cost of producing the movie.
Gross in US & Canada: The total earnings from the movie's screening in the United States and Canada.
Gross worldwide: The overall worldwide earnings of the movie.
Opening Weekend Gross in US & Canada: The amount generated during the initial weekend of the movie's release in the United States and Canada.
CLEAN VERSION: IMDbMovies-Clean
What I did:
I kept all missing values. In most cases, missing values stem from a lack of information on the website; in a few cases, they stem from the scraper. For example, some movies will be released in 2024, so there are no runtimes or ratings for them.
I changed the format of the 'Runtime', 'Rating', 'Number of Ratings', 'Budget', 'Gross in US & Canada', 'Gross worldwide', and 'Opening Weekend Gross in US & Canada' columns.
In some cases, I utilized the information from a single column to create two separate columns.
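The description does not show what the raw values look like, so the following is only a sketch of this kind of cleaning under assumed formats: a runtime written as `2h 22m` and a money value written as `$25,000,000 (estimated)`. Both formats and both helper names are hypothetical. Missing values are returned as `None` rather than dropped, in line with keeping them.

```python
import re

def runtime_to_minutes(raw):
    """Convert a string such as '2h 22m' (assumed format) to minutes; None if missing."""
    if not raw:
        return None
    match = re.fullmatch(r"\s*(?:(\d+)h)?\s*(?:(\d+)m)?\s*", raw)
    if not match or not any(match.groups()):
        return None  # unrecognised or empty value is kept as missing
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes

def money_to_int(raw):
    """Convert a string such as '$25,000,000 (estimated)' (assumed format) to an int."""
    if not raw:
        return None
    # Drop any parenthesised note, then keep only the digits.
    digits = re.sub(r"[^\d]", "", raw.split("(")[0])
    return int(digits) if digits else None

print(runtime_to_minutes("2h 22m"), money_to_int("$25,000,000 (estimated)"))
```

This sketch ignores the currency symbol, so a real cleaning step would need to handle non-dollar budgets separately if they occur.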
https://crawlfeeds.com/privacy_policy
Unlock one of the most comprehensive movie datasets available—4.5 million structured IMDb movie records, extracted and enriched for data science, machine learning, and entertainment research.
This dataset includes a vast collection of global movie metadata, including details on title, release year, genre, country, language, runtime, cast, directors, IMDb ratings, reviews, and synopsis. Whether you're building a recommendation engine, benchmarking trends, or training AI models, this dataset is designed to give you deep and wide access to cinematic data across decades and continents.
Perfect for use in film analytics, OTT platforms, review sentiment analysis, knowledge graphs, and LLM fine-tuning, the dataset is cleaned, normalized, and exportable in multiple formats.
Genres: Drama, Comedy, Horror, Action, Sci-Fi, Documentary, and more
Train LLMs or chatbots on cinematic language and metadata
Build or enrich movie recommendation engines
Run cross-lingual or multi-region film analytics
Benchmark genre popularity across time periods
Power academic studies or entertainment dashboards
Feed into knowledge graphs, search engines, or NLP pipelines
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Analysis of ‘IMDB Movies Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/harshitshankhdhar/imdb-dataset-of-top-1000-movies-and-tv-shows on 13 November 2021.
--- Dataset description provided by original source is as follows ---
IMDB Dataset of top 1000 movies and tv shows. You can find the EDA Process on - https://www.kaggle.com/harshitshankhdhar/eda-on-imdb-movies-dataset
Please consider upvoting if you found it useful.
Data:
- Poster_Link - Link of the poster that IMDb uses
- Series_Title - Name of the movie
- Released_Year - Year in which the movie was released
- Certificate - Certificate earned by the movie
- Runtime - Total runtime of the movie
- Genre - Genre of the movie
- IMDB_Rating - Rating of the movie on the IMDb site
- Overview - Mini story/summary
- Meta_score - Score earned by the movie
- Director - Name of the director
- Star1, Star2, Star3, Star4 - Names of the stars
- No_of_votes - Total number of votes
- Gross - Money earned by the movie
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
IMDB movie review sentiment classification dataset (Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). Learning Word Vectors for Sentiment Analysis. The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011)). For more information please refer to: https://ai.stanford.edu/~amaas/data/sentiment/
The IMDB dataset was modified as follows to prepare it for use in a Galaxy Training Tutorial (https://training.galaxyproject.org/):
The 50 most frequent words (mostly stop words) are excluded, and the next 10,000 most frequent words are kept. Reviews are limited to 500 words: longer reviews are trimmed and shorter reviews are padded. 25,000 reviews each are used for training and testing. Files are in TSV (tab-separated values) format to be consumed by Galaxy (www.usegalaxy.org).
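The preprocessing described above can be sketched as follows. Several details are assumptions rather than facts about the Galaxy files: out-of-vocabulary words are dropped instead of mapped to an unknown token, reviews are trimmed from the end and padded at the end, and id 0 is reserved for padding. The helper names and the toy reviews are illustrative.

```python
from collections import Counter

def build_vocab(tokenized_reviews, skip_top=50, vocab_size=10_000):
    """Skip the skip_top most frequent words, then map the next vocab_size words to ids from 1."""
    counts = Counter(tok for review in tokenized_reviews for tok in review)
    ranked = [word for word, _ in counts.most_common()]
    kept = ranked[skip_top:skip_top + vocab_size]
    return {word: idx for idx, word in enumerate(kept, start=1)}  # 0 is reserved for padding

def encode(review, vocab, max_len=500, pad_id=0):
    """Drop out-of-vocabulary words (assumption), trim to max_len, pad shorter reviews."""
    ids = [vocab[tok] for tok in review if tok in vocab][:max_len]
    return ids + [pad_id] * (max_len - len(ids))

# Toy data with small parameters so the effect is visible.
reviews = [
    ["the", "film", "was", "great"],
    ["the", "film", "was", "dull"],
    ["the", "plot", "was", "great"],
]
vocab = build_vocab(reviews, skip_top=2, vocab_size=3)
print(vocab)                                 # {'film': 1, 'great': 2, 'dull': 3}
print(encode(reviews[0], vocab, max_len=5))  # [1, 2, 0, 0, 0]
```

With the defaults (`skip_top=50`, `vocab_size=10_000`, `max_len=500`) the functions match the numbers quoted in the description.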
https://choosealicense.com/licenses/unknown/
ImdbClassification An MTEB dataset Massive Text Embedding Benchmark
Large Movie Review Dataset
Task category t2c
Domains Reviews, Written
Reference http://www.aclweb.org/anthology/P11-1015
How to evaluate on this task
You can evaluate an embedding model on this dataset using the following code:

    import mteb

    task = mteb.get_tasks(["ImdbClassification"])
    evaluator = mteb.MTEB(task)

    model = mteb.get_model(YOUR_MODEL)
    evaluator.run(model)

To learn more… See the full description on the dataset page: https://huggingface.co/datasets/mteb/imdb.
The MovieLens-IMDB dataset is a collection of user ratings for movies, with each rating indicating the user's preference for the movie.
https://ai.stanford.edu/~amaas/data/sentiment
The IMDb Movie Reviews dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb) labeled as positive or negative. The providers also include an additional 50,000 unlabeled documents for unsupervised learning.
The dataset contains an even number of positive and negative reviews. Only highly polarizing reviews are considered. A negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. No more than 30 reviews are included per movie. The dataset also contains an additional 50,000 unlabeled documents for unsupervised learning. See the README file contained in the release for more details.
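The selection rules above can be written down as a short filter. This is a sketch of the stated thresholds only: the input tuple layout, the helper names, and the choice to keep the first 30 qualifying reviews seen per movie are assumptions, since the description does not say how reviews were chosen when a movie had more than 30.

```python
from collections import defaultdict

def label_for_score(score):
    """Return 0 (negative) for scores <= 4, 1 (positive) for scores >= 7, else None."""
    if score <= 4:
        return 0
    if score >= 7:
        return 1
    return None  # scores of 5 or 6 are not polar enough to be labeled

def select_reviews(reviews, max_per_movie=30):
    """Keep polar reviews only, with at most max_per_movie reviews per movie."""
    per_movie = defaultdict(int)
    kept = []
    for movie_id, score, text in reviews:
        label = label_for_score(score)
        if label is None or per_movie[movie_id] >= max_per_movie:
            continue
        per_movie[movie_id] += 1
        kept.append((text, label))
    return kept

# Hypothetical (movie_id, score, text) tuples.
sample = [("m1", 2, "awful"), ("m1", 5, "fine"), ("m1", 9, "superb"), ("m2", 7, "good")]
print(select_reviews(sample))
# [('awful', 0), ('superb', 1), ('good', 1)]
```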
The data is split into a train (25k reviews) and test (25k reviews) set. A preview file cannot be provided - please download the data directly from the data provider's website.
When using the dataset, please cite: Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). Learning Word Vectors for Sentiment Analysis. The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011).
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
test3534/imdb dataset hosted on Hugging Face and contributed by the HF Datasets community