50 datasets found

imdb_reviews
tensorflow.org
kaggle.com
Updated Sep 20, 2024
Cite
(2024). imdb_reviews [Dataset]. https://www.tensorflow.org/datasets/catalog/imdb_reviews
Dataset updated
Sep 20, 2024
Description
Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

To use this dataset:

    import tensorflow_datasets as tfds

    ds = tfds.load('imdb_reviews', split='train')
    for ex in ds.take(4):
        print(ex)

See the guide for more information on tensorflow_datasets.
IMDB Dataset of 50K Movie Reviews - CLEANED
kaggle.com
zip
Updated Nov 4, 2025
Cite
HQ Data Profiler (2025). IMDB Dataset of 50K Movie Reviews - CLEANED [Dataset]. https://www.kaggle.com/datasets/hqdataprofiler/imdb-dataset-of-50k-movie-reviews-cleaned
Available download formats: zip (26469422 bytes)
Dataset updated
Nov 4, 2025
Authors
HQ Data Profiler
Description
The "IMDB Dataset of 50K Movie Reviews" dataset is a tabular dataset with listings for 50k reviews from IMDB. There are two fields: "review", containing the review text, and "sentiment", containing either the value "positive" or the value "negative".

Using HQ Data Profiler, data quality issues in the original dataset were identified and fixed, and this CLEANED version was prepared. HQ Data Profiler's comprehensive profile report showed that the original dataset contained 418 duplicated "review" values. All rows with duplicated review values were removed. The dataset was then balanced by randomly removing rows in the more populated sentiment category. Result: 24698 "positive" and 24698 "negative" reviews, with no duplicates.
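The cleaning steps described above (remove duplicated reviews, then balance the two sentiment classes) can be sketched roughly as follows. This is not HQ Data Profiler's actual procedure: the function name `clean_and_balance`, the fixed seed, and the reading of "removed" as dropping every copy of a duplicated review are assumptions made for illustration. Only the field names "review" and "sentiment" come from the description.

```python
import random
from collections import Counter


def clean_and_balance(rows, seed=0):
    """Drop every row whose review text occurs more than once, then
    randomly downsample each sentiment class to the smallest class size.

    `rows` is a list of dicts with "review" and "sentiment" keys.
    Assumption: all copies of a duplicated review are dropped (not
    "keep the first copy"); the source does not say which was done.
    """
    counts = Counter(row["review"] for row in rows)
    unique = [row for row in rows if counts[row["review"]] == 1]

    # Group the surviving rows by sentiment label.
    by_label = {}
    for row in unique:
        by_label.setdefault(row["sentiment"], []).append(row)
    if not by_label:
        return []

    # Randomly keep `target` rows per class so that the classes match.
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    return balanced
```

In practice the rows could come from `csv.DictReader` applied to the downloaded file; the file name is not given in the description, so it is left out here.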

Original dataset link (uncleaned): https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews

Dataset citation ( https://ai.stanford.edu/~amaas/data/sentiment/ ): @InProceedings{maas-EtAl:2011:ACL-HLT2011, author = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher}, title = {Learning Word Vectors for Sentiment Analysis}, booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies}, month = {June}, year = {2011}, address = {Portland, Oregon, USA}, publisher = {Association for Computational Linguistics}, pages = {142--150}, url = {http://www.aclweb.org/anthology/P11-1015} }
Datasets for Sentiment Analysis
zenodo.org
csv
Updated Dec 10, 2023
Cite
Julie R. Repository creator - Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.10157504
Dataset updated
Dec 10, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Julie R. Repository creator - Campos Arias
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.

----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains ratings and reviews for 1K+ Amazon products, as listed on the official Amazon website. The data was scraped from the official Amazon website in January 2023.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
product_id - Product ID
product_name - Name of the Product
category - Category of the Product
discounted_price - Discounted Price of the Product
actual_price - Actual Price of the Product
discount_percentage - Percentage of Discount for the Product
rating - Rating of the Product
rating_count - Number of people who voted for the Amazon rating
about_product - Description about the Product
user_id - ID of the user who wrote review for the Product
user_name - Name of the user who wrote review for the Product
review_id - ID of the user review
review_title - Short review
review_content - Long review
img_link - Image Link of the Product
product_link - Official Website Link of the Product
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
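Because the rows in this file are ordered by class, a reproducible shuffle is a sensible first step before splitting. The sketch below uses only the Python standard library; the helper name `shuffle_rows` and the default seed are illustrative choices, not part of the dataset.

```python
import random


def shuffle_rows(rows, seed=42):
    """Return a shuffled copy of `rows`, leaving the input untouched.

    A fixed seed makes the order reproducible across runs, which helps
    when the shuffled data is later split into train and test portions.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled
```

The rows themselves could be read with `csv.reader` or `csv.DictReader` from data_rt.csv before being passed in.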
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in.
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
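As an illustration of how a categorical label can be derived from a polarity score, here is a minimal sketch. TextBlob polarity values lie in the range [-1, 1], but the thresholds and label names actually used to build the division column are not stated above, so `neutral_band` and the three labels below are assumptions.

```python
def division_from_polarity(polarity, neutral_band=0.0):
    """Map a polarity score in [-1, 1] to a categorical label.

    Assumption: scores within +/- `neutral_band` of zero count as
    "neutral"; the dataset's real cut-offs are not documented here.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"
```

With TextBlob installed, the score for a review would typically come from `TextBlob(text).sentiment.polarity`.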
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9930 Amazon reviews and star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review - raw) and division (manually added - categorical label generated using overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
Movie Reviews Word2Vec Embeddings Dataset
kaggle.com
zip
Updated Jan 17, 2023
Cite
The Devastator (2023). Movie Reviews Word2Vec Embeddings Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/movie-reviews-word2vec-embeddings-dataset
Available download formats: zip (23254182 bytes)
Dataset updated
Jan 17, 2023
Authors
The Devastator
Description
Movie Reviews Word2Vec Embeddings Dataset

Capturing Semantics in Textual Reviews

By Jared Fernandez [source]

About this dataset

This dataset contains a collection of Word2Vec embeddings for nearly 12,000 reviews from movies and other films. These embeddings allow the reviews to be represented in a meaningful way, providing insight into topics and trends present in the reviews. By utilizing this source of data, researchers can gain better understanding of language patterns that appear across various types of movie reviews. Additionally, models with these embeddings can be used to help create/improve models for sentiment analysis and other natural language processing tasks. Each row includes the reviewer's unique ID along with their review text and related word2vec embedding representing textual relationships found therein

How to use the dataset

Download the dataset ‘Movie Reviews Word2Vec Embeddings’ from Kaggle.

This dataset contains word2vec embeddings. Word2vec is a neural-network-based technique that creates high-dimensional vector representations of words based on their context in a training corpus.

Before using these embeddings, it is important to understand what they represent and how they can be matched with other datasets for analysis. The word2vec embeddings contain two columns: word (the specific word) and vec (the vector representation associated with that word).

To use the data from this text corpus effectively, first extract meaningful information from it, such as sentiment ratings or the topics that appear most frequently in movie reviews. Sorting through thousands of reviews requires automated processing, using either machine learning algorithms or natural language processing to determine sentiment polarity and extract relevant keywords or topics for each review.

You can also use the pre-processed word vectors (embeddings) with supervised or unsupervised approaches, such as logistic regression or BERT models, to create features for sentiment scoring or topic modelling (classifying texts into distinct categories). These may be useful for predictive analysis, such as predicting movie ratings from user reviews.

Once you have made use of the pre-processed data, you can improve your model further by examining how words relate to one another through their vectors, for example with cosine similarity, which measures relatedness between words. This gives additional insight into relationships among text fragments or paragraphs and helps a model capture contextual relationships when analyzing movie review corpora.
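Cosine similarity between two embedding vectors is the dot product divided by the product of the vector lengths. A minimal standard-library sketch is shown below; the function name and the choice to return 0.0 for a zero-length vector are illustrative, and how the vec column is parsed into a list of floats depends on the file format, which is not described here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors.

    Returns a value in [-1, 1]; by convention here, 0.0 is returned if
    either vector has zero length (an assumption, since the cosine is
    undefined in that case).
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Values near 1 indicate vectors pointing in nearly the same direction, values near 0 indicate unrelated directions, and values near -1 indicate opposite directions.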

Research Ideas

Automatically clustering movies with similar sentiment and themes.

Automatically generating movie plot summaries based on sentiment analysis of reviews.

Developing a movie recommendation system based on users’ preference in different genres or topics related to the movie in question

Acknowledgements

If you use this dataset in your research, please credit the original author, Jared Fernandez.

License

License: Dataset copyright by authors
You are free to:
- Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
- Adapt: remix, transform, and build upon the material for any purpose, even commercially.
You must:
- Give appropriate credit: provide a link to the license, and indicate if changes were made.
- ShareAlike: distribute your contributions under the same license as the original.
- Keep intact: all notices that refer to this license, including copyright notices.

IMDb Movie Reviews Dataset
berd-platform.de
bin
Updated Jul 31, 2025
Cite
Andrew L. Maas; Raymond E. Daly; Peter T. Pham; Dan Huang; Andrew Y. Ng; Christopher Potts (2025). IMDb Movie Reviews Dataset [Dataset]. http://doi.org/10.82939/z8gxk-w3567
Available download formats: bin
Unique identifier
https://doi.org/10.82939/z8gxk-w3567
Dataset updated
Jul 31, 2025
Dataset provided by
Stanford University
Authors
Andrew L. Maas; Raymond E. Daly; Peter T. Pham; Dan Huang; Andrew Y. Ng; Christopher Potts
License
https://ai.stanford.edu/~amaas/data/sentiment
Description
The IMDb Movie Reviews dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb) labeled as positive or negative. The providers also include an additional 50,000 unlabeled documents for unsupervised learning.
The dataset contains an equal number of positive and negative reviews. Only highly polarized reviews are included: a negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. No more than 30 reviews are included per movie. See the README file contained in the release for more details.
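The score thresholds described above can be written as a small labeling rule. The function name and the use of `None` for excluded mid-range scores are illustrative choices; only the cut-offs (≤ 4 negative, ≥ 7 positive) come from the description.

```python
def label_from_rating(score):
    """Map an IMDb rating on a 1-10 scale to a sentiment label.

    Scores of 4 or lower are "negative", scores of 7 or higher are
    "positive", and anything in between returns None because such
    reviews are not part of the labeled set.
    """
    if score <= 4:
        return "negative"
    if score >= 7:
        return "positive"
    return None
```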
The data is split into a train (25k reviews) and test (25k reviews) set. A preview file cannot be provided - please download the data directly from the data provider's website.
When using the dataset, please cite: Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). Learning Word Vectors for Sentiment Analysis. The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011).
IMDB 5000 Movie Dataset
kaggle.com
zip
Updated Dec 16, 2017
Cite
Yueming (2017). IMDB 5000 Movie Dataset [Dataset]. https://www.kaggle.com/datasets/carolzhangdc/imdb-5000-movie-dataset
Available download formats: zip (567524 bytes)
Dataset updated
Dec 16, 2017
Authors
Yueming
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Dataset

This dataset was created by Yueming

Released under Database: Open Database, Contents: Database Contents

Contents
IMDb Movies Metadata Dataset – 4.5M Records (Global Coverage)
crawlfeeds.com
csv, zip
Updated Nov 9, 2025
Cite
Crawl Feeds (2025). IMDb Movies Metadata Dataset – 4.5M Records (Global Coverage) [Dataset]. https://crawlfeeds.com/datasets/imdb-movies-metadata-dataset-4-5m-records-global-coverage
Available download formats: csv, zip
Dataset updated
Nov 9, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
Unlock one of the most comprehensive movie datasets available—4.5 million structured IMDb movie records, extracted and enriched for data science, machine learning, and entertainment research.

This dataset includes a vast collection of global movie metadata, including details on title, release year, genre, country, language, runtime, cast, directors, IMDb ratings, reviews, and synopsis. Whether you're building a recommendation engine, benchmarking trends, or training AI models, this dataset is designed to give you deep and wide access to cinematic data across decades and continents.

Perfect for use in film analytics, OTT platforms, review sentiment analysis, knowledge graphs, and LLM fine-tuning, the dataset is cleaned, normalized, and exportable in multiple formats.

What’s Included:

Genres: Drama, Comedy, Horror, Action, Sci-Fi, Documentary, and more

Delivery: Direct download

Use Cases:

Train LLMs or chatbots on cinematic language and metadata

Build or enrich movie recommendation engines

Run cross-lingual or multi-region film analytics

Benchmark genre popularity across time periods

Power academic studies or entertainment dashboards

Feed into knowledge graphs, search engines, or NLP pipelines
Amazon review data 2018
cseweb.ucsd.edu
nijianmo.github.io
Cite
UCSD CSE Research Project, Amazon review data 2018 [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/
Dataset authored and provided by
UCSD CSE Research Project
Description
Context

This Dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:

More reviews:
- The total number of reviews is 233.1 million (142.8 million in 2014).

New reviews:
- Current data includes reviews in the range May 1996 - Oct 2018.

Metadata:
- We have added transaction metadata for each review shown on the review page.
- Added more detailed metadata of the product landing page.

Acknowledgements

If you publish articles based on this dataset, please cite the following paper:

Jianmo Ni, Jiacheng Li, Julian McAuley. Justifying recommendations using distantly-labeled reviews and fine-grained aspects. EMNLP, 2019.
IMDb Dataset (2024) updated
kaggle.com
zip
Updated Jul 6, 2024
Cite
Parth (2024). IMDb Dataset (2024) updated [Dataset]. https://www.kaggle.com/datasets/parthdande/imdb-dataset-2024-updated
Available download formats: zip (335942 bytes)
Dataset updated
Jul 6, 2024
Authors
Parth
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset contains detailed information about movies listed on IMDb, including titles, genres, release dates, and ratings. It also includes user reviews and ratings, making it a useful resource for sentiment analysis and trend analysis in the movie industry. The dataset can be used to gain insights into movie trends, audience preferences, and the correlation between movie attributes and ratings. The second file has an additional feature called poster_src, which is a link to the movie's poster image. The second file is larger than the first and covers a wider range of movies.
Full TMDB Movies Dataset 2024 (1M Movies)
kaggle.com
zip
Updated Nov 11, 2025
Cite
asaniczka (2025). Full TMDB Movies Dataset 2024 (1M Movies) [Dataset]. https://www.kaggle.com/datasets/asaniczka/tmdb-movies-dataset-2023-930k-movies
Available download formats: zip (239404730 bytes)
Dataset updated
Nov 11, 2025
Authors
asaniczka
License
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Description
The TMDb (The Movie Database) is a comprehensive movie database that provides information about movies, including details like titles, ratings, release dates, revenue, genres, and much more.

This dataset contains a collection of 1,000,000 movies from the TMDB database.

Dataset is updated daily.

Interesting Task Ideas:

Predict movie ratings based on features such as revenue, popularity, genre, and runtime.

Identify trends in movie release dates and analyze their impact on revenue.

Analyze the relationship between budget, revenue, and popularity to determine factors that contribute to a movie's success.

Build a recommendation system that suggests similar movies based on genres, production companies, and language.

Perform sentiment analysis on movie reviews to understand audience reactions.

Explore the impact of movie genres on popularity and revenue.

Investigate the correlation between runtime and audience engagement.

Identify successful production companies and analyze their strategies.

Utilize natural language processing techniques to extract meaningful insights from movie overviews.

Visualize movie popularity over time and identify popular genres in different periods.

IMDb Reviews Dataset of Spider-Man: No Way Home Film
dataverse.telkomuniversity.ac.id
tsv
Updated Apr 13, 2022
Cite
Telkom University Dataverse (2022). IMDb Reviews Dataset of Spider-Man: No Way Home Film [Dataset]. http://doi.org/10.34820/FK2/BUS4WO
Available download formats: tsv (342144)
Unique identifier
https://doi.org/10.34820/FK2/BUS4WO
Dataset updated
Apr 13, 2022
Dataset provided by
Telkom University Dataverse
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset is used in the paper entitled "The Sentiment Analysis of Spider-Man: No Way Home Film Based on IMDb Reviews". Download full paper at http://jurnal.iaii.or.id/index.php/RESTI/article/view/3851.
IMDB Movie Ratings Dataset
kaggle.com
zip
Updated Jan 17, 2023
Cite
The Devastator (2023). IMDB Movie Ratings Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/imdb-movie-ratings-dataset
Available download formats: zip (319960 bytes)
Dataset updated
Jan 17, 2023
Authors
The Devastator
Description
IMDB Movie Ratings Dataset

Evaluating Directors, Actors, Genres, and Movie Titles

By Himanshu Sekhar Paul [source]

About this dataset

This inspiring IMDB Movie Dataset is a comprehensive database of movie ratings, featuring director_name, duration, actor_2_name, genres, actor_1_name, movie title and more. Whether you're a fan of dramatic thrillers or nostalgic '90s classics from our childhoods; here you'll find information about the most voted movies from users across the world. Delve into num_voted_users trends and discover the language each movie was released in to craft your very own personal film library of country-specific titles released in any given year. With this dataset at your disposal comparing imdb scores will never be easier! Who will come out top when the votes have been tallied? Dive into data for a journey unparalleled!

How to use the dataset

This dataset offers a comprehensive overview of the movie ratings from IMDB. It includes data about director name, duration, actors, genres, movie title, number of votes, language, country of origin, year released and IMDB score.

To use this dataset to get a deeper understanding of how movies are rated on IMDB you can take the following steps:

Look through each column of the data to get an overall understanding. This will help you identify any specific trends or correlations in the data that you can then analyze further in later steps.

Take some time to explore relationships between different columns, such as 'Number Voted Users' and 'IMDB Score'. It could be interesting to look at how these numbers relate to each other in order to better understand rating trends on IMDB.

Analyze how particular sub-groups perform within categories such as genre or country. This could provide insight into preferences for certain types of movies, or into countries whose movies have higher scores than others.

Through your analysis, try to answer questions about specific demographic groups on IMDB: are there distinct preferences among age groups when it comes to what they watch? Are there clear correlations between rating and genre within certain countries?

By using the questions above and taking an initial 'big picture' view before diving into more detailed analysis, users should be able to find value in this dataset by uncovering useful insights about movie ratings on IMDB.
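One simple way to quantify how two numeric columns relate, such as the number of voting users and the IMDB score, is the Pearson correlation coefficient. The sketch below is a plain standard-library version; the function name `pearson` is illustrative, and reading the two columns out of movie_data.csv (for example with `csv.DictReader`) is left to the reader because the name of the score column is not shown in the column list on this page.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Returns a value in [-1, 1]. Raises ValueError if the inputs differ
    in length, have fewer than two points, or if either is constant
    (the coefficient is undefined in that case).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

A coefficient near 0 suggests no linear relationship. Vote counts are often heavily skewed, so it may be worth applying a log transform before computing the correlation.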

Research Ideas

Movie Recommendation System: The dataset can be used to build a movie recommendation system using machine learning algorithms like k-nearest neighbors or collaborative filtering. Based on the user's past ratings, the system can suggest relevant movies with similar genres, actors and directors.

Movie Popularity Index: Using the data, a metric could be designed that provides an overall popularity index for movies released over the years. This index could be constructed from factors such as IMDb score, number of votes, and number of reviews collected.

Genre-based Over/Under Performance Analysis: Based on the genres of movies released each year, this dataset can provide insight into which genres are performing well and which are not. This kind of analysis could inform decisions about allocating resources to production budgets or marketing campaigns for upcoming films in different genres across different regions or markets.

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

See the dataset description for more information.

Columns

File: movie_data.csv

| Column name | Description |
|:-------------------------|:---------------------------------------------------|
| director_name | Name of the director of the movie. (String) |
| duration | Length of the movie in minutes. (Integer) |
| actor_2_name | Name of the second actor in the movie. (String) |
| genres | Genre of the movie. (String) |
| actor_1_name | Name of the first actor in the movie. (String) |
| movie_title | Title of the movie. (String) |
| num_voted_users | Number of users who voted for the movie. (Integer) |
| actor_3_name | Name of the third actor in the movie. (String) |
| movie_imdb_link | Link to the movie's IMDB page. (String) |
| num_user_for_reviews |...
Real Movies Dataset
kaggle.com
zip
Updated Feb 9, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Harshit Sharma (2024). Real Movies Dataset [Dataset]. https://www.kaggle.com/datasets/harshitstark/real-movies-dataset
Available download formats: zip (104062 bytes)
Dataset updated
Feb 9, 2024
Authors
Harshit Sharma
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
The "Real Movies Dataset" offers a comprehensive repository of diverse movie information, facilitating in-depth analysis and meaningful comparisons across various cinematic attributes. With its wealth of key details, this dataset serves as an invaluable resource for researchers, enthusiasts, and industry professionals alike.

Each entry in the dataset includes the following attributes:
* Movie Name: The title of the movie.
* Year of Release: The year in which the movie was officially released to the public.
* Watch Time: The duration of the movie in terms of hours and minutes, indicating the length of time required to watch the entire film.
* Movie Rating: This refers to the rating assigned to the movie based on various criteria such as content, suitability for different age groups, and overall quality. Ratings could be numerical (e.g., out of 10).
* Meatscore of Movie: This is a unique metric that represents the "meatiness" or substance of the movie. It might be a score assigned based on the complexity of the plot, character development, thematic depth, or other qualitative aspects.
* Votes: The number of votes or ratings received by the movie from viewers or critics. This metric provides an indication of the movie's popularity or reception.
* Gross: The total box office gross earnings generated by the movie, typically measured in a specific currency (e.g., USD). This metric reflects the commercial success of the film.
* Description: The dataset includes a brief description field providing a summary or overview of the movie's plot, genre, themes, or notable aspects. This description offers context and insight into the content and style of each film, aiding in understanding and analysis.

Overall, the "Real Movies Dataset" serves as a valuable resource for researchers, analysts, and enthusiasts interested in exploring and studying the dynamics of the film industry, including trends in movie production, audience preferences, and financial performance.
IMDB Movies From 1920 to 2025
kaggle.com
zip
Updated Mar 27, 2025
Cite
Raed Addala (2025). IMDB Movies From 1920 to 2025 [Dataset]. https://www.kaggle.com/datasets/raedaddala/imdb-movies-from-1960-to-2023
Available download formats: zip (46688739 bytes)
Dataset updated
Mar 27, 2025
Authors
Raed Addala
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Over 60,000 Movies, 100+ Years of Data, and Rich Metadata!

Links:

Cleaned Data on Kaggle.

Step-by-step Kaggle Notebook that merges and cleans the data.

Download from GitHub Releases.

For details about the scraping process, explore the complete code repository on GitHub.

About the Dataset

This dataset provides annual data for the most popular 500–600 movies per year from 1920 to 2025, extracted from IMDb. It includes over 60,000 movies, spanning more than 100 years of cinematic history. Each year’s data is divided into three CSV files for flexibility and ease of use:
- imdb_movies_[year].csv: Basic movie details.
- advanced_movies_details_[year].csv: Comprehensive metadata and financial details.
- merged_movies_data_[year].csv: A unified dataset combining both files.

File Descriptions

1. imdb_movies_[year].csv

Essential movie information, including:
- Title: Movie title.
- Description: Movie description.
- meta_score: IMDB's meta score.
- Movie Link: IMDb URL for the movie.
- Year: Year of release.
- Duration: Runtime (in minutes).
- MPA: Motion Picture Association rating (e.g., PG, R).
- Rating: IMDb rating (scale of 1–10).
- Votes: Total user votes on IMDb.

2. advanced_movies_details_[year].csv

Detailed movie metadata:
- Link: IMDb URL (for linking with other data).
- budget: Production budget (in USD).
- grossWorldWide: Global box office revenue.
- gross_US_Canada: North American box office earnings.
- opening_weekend_Gross: Opening weekend revenue.
- directors: List of directors.
- writers: List of writers.
- stars: Main cast members.
- genres: Movie genres.
- countries_origin: Countries of production.
- filming_locations: Primary filming locations.
- production_companies: Associated production companies.
- Languages: Languages spoken in the movie.
- Award_information: Information about awards, nominations and wins.
- release_date: Official release date.

3. merged_movies_data_[year].csv

A unified dataset combining all columns from the previous two files:
- Basic Details: Title, Year, Rating, Votes.
- Advanced Features: budget, grossWorldWide, directors, genres, and awards.

Data Structure

Template Columns:
imdb_movies_[year].csv:
Title, Year, Duration, MPA, Rating, Votes, meta_score, description, Movie Link

advanced_movies_details_[year].csv:
link, writers, directors, stars, budget, opening_weekend_Gross, grossWorldWide, gross_US_Canada, release_date, countries_origin, filming_locations, production_company, awards_content, genres, Languages

merged_movies_data_[year].csv:
Title, Year, Duration, MPA, Rating, Votes, meta_score, description, Movie Link, writers, directors, stars, budget, opening_weekend_Gross, grossWorldWide, gross_US_Canada, release_date, countries_origin, filming_locations, production_company, awards_content, genres, Languages
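
The template columns above suggest that the merged file is a join of the two per-year files on the IMDb URL. The following is a minimal sketch of such a join using only the Python standard library. It assumes that `Movie Link` in the basic file and `link` in the advanced file hold identical URL strings, which the listing does not state explicitly. The two in-memory CSV strings, the example.com URLs, and the `merge_year` helper are made up for illustration; real files would be opened with `open(...)` instead.

```python
import csv
import io

# Made-up stand-ins for imdb_movies_[year].csv and
# advanced_movies_details_[year].csv; only a few template columns are shown.
BASIC_CSV = """Title,Year,Rating,Movie Link
Example Film A,1999,7.5,https://example.com/title/a
Example Film B,1999,6.1,https://example.com/title/b
"""

ADVANCED_CSV = """link,budget,genres
https://example.com/title/a,1000000,Drama
"""


def merge_year(basic_file, advanced_file):
    """Left-join basic rows to advanced rows on the IMDb URL (hypothetical helper)."""
    # Index the advanced rows by their join key for constant-time lookup.
    advanced_by_link = {row["link"]: row for row in csv.DictReader(advanced_file)}
    merged = []
    for row in csv.DictReader(basic_file):
        extra = advanced_by_link.get(row["Movie Link"], {})
        combined = dict(row)
        # Copy every advanced column except the join key, which would
        # duplicate "Movie Link".
        combined.update({k: v for k, v in extra.items() if k != "link"})
        merged.append(combined)
    return merged


rows = merge_year(io.StringIO(BASIC_CSV), io.StringIO(ADVANCED_CSV))
for row in rows:
    print(row["Title"], row.get("budget", "n/a"))
```

A left join keeps movies that have no advanced details, so missing financial fields surface as absent keys rather than silently dropping rows. Whether the published merged files were built this way is not confirmed by the listing.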

Updates

The dataset is updated annually in December to include the latest data.

Applications

This dataset is ideal for:
- Trend Analysis: Explore changes in the movie industry over more than a century.
- Predictive Modeling: Build models to forecast box office revenue, ratings, or awards.
- Recommendation Systems: Use attributes like genres, cast, and ratings for personalized recommendations.
- Comparative Analysis: Study differences across eras, genres, or regions.

Dataset Features

Over 60,000 Movies: Detailed data from 1920 to 2025.

Rich Metadata: Financial, creative, and recognition-related attributes.

User-friendly: Modular files for tailored use or comprehensive merged files.

Consistency: Uniform structure enables seamless analysis.

Notes

For issues, suggestions, or feature requests, please feel free to contact me: send me an email or open an issue on GitHub. Your input is highly appreciated.
IMDB Movies Analysis - SQL
kaggle.com
zip
Updated Feb 21, 2023
Cite
Gaurav B R (2023). IMDB Movies Analysis - SQL [Dataset]. https://www.kaggle.com/datasets/gauravbr/imdb-movies-data-erd
Explore at:
Available download formats: zip (3818401 bytes)
Dataset updated
Feb 21, 2023
Authors
Gaurav B R
Description
SQL IMDB Movies Analysis for RSVP (Film Production Company)

RSVP Movies is an Indian film production company which has produced many super-hit movies. They have usually released movies for the Indian audience but for their next project, they are planning to release a movie for the global audience in 2022.

The production company wants to plan their every move analytically based on data. We took the last three years of IMDB movie data and analysed it using SQL, drawing insights that could help them start their new project.

For our convenience, the entire analytics process has been divided into four segments, where each segment leads to significant insights from different combinations of tables. The questions in each segment with business objectives are written in the script given below. We have written the solution code below every question.
Movie Reviews Dataset
kaggle.com
zip
Updated Jan 30, 2023
Cite
czyzi0 (2023). Movie Reviews Dataset [Dataset]. https://www.kaggle.com/datasets/czyzi0/movie-reviews-dataset
Explore at:
Available download formats: zip (161459498 bytes)
Dataset updated
Jan 30, 2023
Authors
czyzi0
Description
Description: This dataset contains movie reviews and their sentiment labels. All texts were scraped from various websites on the Internet in 2020. Reviews are available in several languages: cs, de, es, fr, pl, sk. A split into training and testing data is provided. There are three sentiment labels:
- pos - for positive sentiment,
- neg - for negative sentiment,
- n\a - not assigned; can be used for unsupervised learning.

Distribution of training data: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F4135817%2F5035e68ab296b928f1511957cd2052fa%2Ftraining.png?generation=1675604158298685&alt=media

Distribution of testing data: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F4135817%2Ffccbe9806c21850cab6f4d9fe035ff5e%2Ftesting.png?generation=1675604176597583&alt=media

License and copyright: The Movie Reviews Dataset is distributed under the CC BY-NC 4.0. The copyright remains with the original owners of the texts.

Notice and take down policy: Should you consider that the data contains material that is owned by you and should therefore not be reproduced here, please:
- Identify yourself, with contact data such as an email address at which you can be contacted.
- Identify the copyrighted work claimed to be infringed.
- Identify the material that is claimed to be infringing and information reasonably sufficient to allow me to locate the material.
- Send the request to me.

I will comply with legitimate requests by removing the affected sources from the corpus.

I've collected these reviews for scientific purposes. It has been more than 2 years since the publication date of any of these reviews. That's why I've decided to share this collection, so that other people can also use it for educational purposes.
Football Game Film Angle Dataset
universe.roboflow.com
zip
Updated Jul 17, 2024
Cite
Football Analysis (2024). Football Game Film Angle Dataset [Dataset]. https://universe.roboflow.com/football-analysis-fm44i/football-game-film-angle
Explore at:
Available download formats: zip
Dataset updated
Jul 17, 2024
Dataset authored and provided by
Football Analysis
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Variables measured
Film Angles
Description
Football Game Film Angle

## Overview

Football Game Film Angle is a dataset for classification tasks; it contains Film Angles annotations for 595 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Game of Thrones - A naturalistic viewing dataset
openneuro.org
Updated Nov 16, 2023
Cite
Kira Noad; David Watson; Timothy Andrews (2023). Game of Thrones - A naturalistic viewing dataset [Dataset]. http://doi.org/10.18112/openneuro.ds004848.v1.0.0
Explore at:
Unique identifier
https://doi.org/10.18112/openneuro.ds004848.v1.0.0
Dataset updated
Nov 16, 2023
Dataset provided by
OpenNeuro (https://openneuro.org/)
Authors
Kira Noad; David Watson; Timothy Andrews
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Game of Thrones - A naturalistic viewing dataset

Overview

This dataset contains fMRI movie-watching and category localiser data in 28 developmental prosopagnosics and 45 neurologically healthy controls. Participants are additionally grouped by their familiarity with the Game of Thrones television series.

In movie-watching scans, participants passively viewed a series of short audiovisual clips (ranging from 50 to 117 s duration; total duration = 12 min 58 s) taken from the Game of Thrones television series.

In category localiser scans, participants viewed images of faces, scenes, and phase scrambled versions of the face images. These can be used to define face and scene selective regions of interest.

Please refer to the following paper when using this dataset:

Noad, K., Watson, D.M., Andrews, T.J. (In review). Natural viewing reveals an extended network of regions for familiar faces that is disrupted in developmental prosopagnosia.

Data Contents

participants.tsv - List of subject IDs in control and developmental prosopagnosic groups, along with whether they were familiar or unfamiliar with Game of Thrones.

slice_timings.tsv, fsl_slice_timings.txt - Slice timings for functional scans. The TSV file gives the times in milliseconds, and the text file gives the times in normalised units of the TR suitable for entering into FEAT.

Scans were acquired with the HCP/CMRR multiband sequence. More information on slice timings can be found at: https://wiki.humanconnectome.org/download/attachments/40534057/CMRR_MB_Slice_Order.pdf

behavioural_measures.tsv - Scores on PI20, CFMT, and Game of Thrones quiz tasks (see below for more details). PI20 scores are out of 100. CFMT scores are given as percentage accuracies. Quiz scores are given as percentage accuracies over all questions as well as broken down by face, scene, and narrative questions.

Subject Directories - MRI data directories for each subject:

anat - T1 anatomical images

fmap - Magnitude and phase difference fieldmap images

func - Movie-watching (Game of Thrones) and category localiser data

Behavioural Measures

We provide two measures of face processing ability (PI20 and CFMT) and a quiz assessing familiarity with the Game of Thrones TV series. All participants completed the Game of Thrones quiz, and all developmental prosopagnosics completed the PI20 and CFMT assessments. Approximately half of the control subjects also completed the CFMT.

PI20 - 20-item prosopagnosia index, used as initial screening for developmental prosopagnosia. All developmental prosopagnosic participants completed this.

Reference: Shah et al. (2015), Royal Society Open Science, 2(150305), 1-6.

CFMT - Cambridge Face Memory Test, used as secondary screening for developmental prosopagnosia. All developmental prosopagnosic participants and approximately half of the control participants completed this.

Reference: Duchaine & Nakayama (2006), Neuropsychologia, 44(4), 576-585.

Game of Thrones Quiz - We developed this quiz to assess familiarity with the Game of Thrones television series. All participants completed this quiz. The quiz comprised 3 types of questions:

Face questions presented participants with a picture of a character, and participants had to provide a name or some defining biographical information for that character (e.g., "Jon Snow").

Scene questions similarly presented participants with a picture of a scene from the show, and participants had to name or provide some details of the location (e.g., "King's Landing").

Narrative questions were 4-option multiple choice questions about key elements of the Game of Thrones story. For example, "Which character was Lord of Winterfell and was beheaded at the end of Season 1 - A) Daenerys Targaryen, B) Jon Snow, C) Ned Stark, or D) Tyrion Lannister?"

Notes

sub-DP15 is missing category localiser and fieldmap scans due to time constraints during the scanning session.
🎬📽️The MOTHER OF ALL MOVIE REVIEW DATASETS
kaggle.com
zip
Updated Jul 17, 2024
Cite
BwandoWando (2024). 🎬📽️The MOTHER OF ALL MOVIE REVIEW DATASETS [Dataset]. https://www.kaggle.com/datasets/bwandowando/rotten-tomatoes-9800-movie-critic-and-user-reviews/code
Explore at:
Available download formats: zip (4253772949 bytes)
Dataset updated
Jul 17, 2024
Authors
BwandoWando
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
Description
Banner

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F1842206%2F0411cd02654d97cd74132c69908feae3%2FMEGAPACK3A.png?generation=1721222705178453&alt=media

Context

The MOTHER OF ALL MOVIE REVIEW DATASETS for all your NLP, research, and learning needs!

Contents

10500 movies

56M+ user reviews!

1M+ critic reviews!

Movies from as early as the early 1900s through 2024 can be found here!

English, French, Japanese, Hindi, and many more movies!

Varying movies, from very bad to blockbusters!

(and many more!)

Possible Usages

NLP

Sentiment Analysis

Topic Modelling

Research

Sentiment Analysis

Studying

Visualizations

(and many more!)

Collection Methodology

I wrote my own scripts to get data from Rotten Tomatoes

Image

Generated with Bing Image Generator

Note

I'm looking forward to the community creating and generating analyses, content, and insights from this MOTHER OF ALL MOVIE REVIEW DATASETS! @bwandowando
SEHI (Secondary Electron Hyperspectral Imaging) dataset of Metal alloy and...
orda.shef.ac.uk
zip
Updated Aug 19, 2025
Cite
Jingqiong Zhang; James Nohl; Nicholas Farr; Cornelia Rodenburg; Kerry Abrams; Kate Black; Lyudmila Mihaylova (2025). SEHI (Secondary Electron Hyperspectral Imaging) dataset of Metal alloy and Carbon film (Palladium Silver Carbon complex film) [Dataset]. http://doi.org/10.15131/shef.data.22202923.v2
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.15131/shef.data.22202923.v2
Dataset updated
Aug 19, 2025
Dataset provided by
The University of Sheffield
Authors
Jingqiong Zhang; James Nohl; Nicholas Farr; Cornelia Rodenburg; Kerry Abrams; Kate Black; Lyudmila Mihaylova
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Description
This data repository can be used as benchmark data for material characterization, particularly for investigating nanostructures and chemical properties in materials using SEHI (Secondary Electron Hyperspectral Imaging), as well as for research in Scanning Electron Microscopy and Secondary Electron (SE) spectroscopy, and in advanced image processing and data analysis (computer vision and machine learning) techniques.

This work is supported by the UK EPSRC grant EP/V012126/1, "SEE MORE, MAKE MORE: Secondary Electron Energy Measurement Optimisation for Reliable Manufacturing of Key Materials". Contact: SM3 (SEE MORE MAKE MORE) project PI, Professor Cornelia Rodenburg, c.rodenburg@shefield.ac.uk. We also acknowledge the support from the Insigneo Institute for In Silico Medicine in Sheffield.

The complex metal alloy (palladium silver, abbreviated as Pd-Ag) and carbon films were printed by the University of Liverpool, and a Helios Nanolab G3 UC microscope was used to acquire the raw image stacks [1]. More information on the sample preparation and experimental conditions can be found in [1]. This dataset contains four processed SEHI stacks (cropped and aligned) collected from different regions of interest, and the associated metadata.

[1] Abrams, K.J., Dapor, M., Stehling, N., Azzolini, M., Kyle, S.J., Schäfer, J., Quade, A., Mika, F., Kratky, S., Pokorna, Z., et al., 2019. Making sense of complex carbon and metal/carbon systems by secondary electron hyperspectral imaging. Advanced Science 6, 1900719.

Cite

(2024). imdb_reviews [Dataset]. https://www.tensorflow.org/datasets/catalog/imdb_reviews

imdb_reviews

Explore at:

35 scholarly articles cite this dataset (View in Google Scholar)

Dataset updated

Sep 20, 2024

Description

Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

To use this dataset:

import tensorflow_datasets as tfds

ds = tfds.load('imdb_reviews', split='train')
for ex in ds.take(4):
    print(ex)

See the guide for more information on tensorflow_datasets.
