# 🏆 IMDB Top 100 Movies Dataset
This dataset contains detailed information about the Top 100 movies from IMDb, collected to assist film enthusiasts, data analysts, and machine learning practitioners in exploring trends and insights in the film industry.
Each movie entry includes:
- 🎬 Title – Name of the movie
- 📅 Year – Year of release
- ⭐ Rating – IMDb user rating (out of 10)
- 📣 Genres – List of genres the movie belongs to
- 🎥 Director – Director(s) of the movie
- 👥 Stars – Leading cast
- ⏱️ Runtime – Duration in minutes
- 📝 Summary – A brief synopsis of the movie
- 🧾 Votes – Number of user votes
- 💰 Gross – Box office gross (if available)

- Data Visualization: Create graphs showing rating trends, genre distributions, etc.
- Recommendation Systems: Build a content-based movie recommender.
- NLP Projects: Use summaries for natural language processing tasks.
- Exploratory Data Analysis: Great dataset for practicing EDA techniques.
The data is derived from IMDb's public listings and compiled into JSON format for easy use in Python-based projects.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The dataset is a JSON file that contains movie information from IMDb. Fields in the JSON: title, year, rating, genre, director, votes.
The first 100 entries were taken directly from IMDB Top 100 Movies - https://www.kaggle.com/datasets/prakash27x/imdb-top-100-movies
The next 10 entries are movies produced in 2007 (added for a database management project), which I scraped from IMDb.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset Card for IMDb Multi-Movie Review Dataset
Dataset Summary
The IMDb Multi-Movie Review Dataset contains approximately 114,000 user reviews collected from over 150 movies on IMDb. Each movie is stored as a separate JSON file, identified by its movie_id (IMDb ID). Each JSON file includes a list of structured reviews, where every review consists of:
- title: A short summary or headline of the review.
- review: The full detailed user review.
- rating: A numeric rating (1–10)…
See the full description on the dataset page: https://huggingface.co/datasets/Daksh0505/IMDB-Reviews.
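As a rough illustration of working with one-JSON-file-per-movie layouts like this, the sketch below loads each file and reports the review count and mean rating per movie. The directory name, the assumption that each file's top level is the list of reviews, and the helper names are all hypothetical; only the `title`, `review`, and `rating` fields come from the description above.

```python
import json
from pathlib import Path
from statistics import mean


def load_reviews(path):
    """Return the list of review dicts stored in one per-movie JSON file."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    # Assumption: the top level of the file is the list of reviews itself.
    return data if isinstance(data, list) else []


def average_rating(reviews):
    """Mean of the numeric ratings, skipping reviews with a missing rating."""
    ratings = [float(r["rating"]) for r in reviews if r.get("rating") not in (None, "")]
    return mean(ratings) if ratings else None


def summarize(directory):
    """Map movie_id (taken from the file name) to (review count, mean rating)."""
    summary = {}
    for path in sorted(Path(directory).glob("*.json")):
        reviews = load_reviews(path)
        summary[path.stem] = (len(reviews), average_rating(reviews))
    return summary
```

Calling `summarize("reviews/")` on a local copy (the folder name is a placeholder) would return one entry per movie file, keyed by IMDb ID.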
https://crawlfeeds.com/privacy_policy
Explore our meticulously curated Movies dataset and TV shows dataset, designed to cater to diverse analytical and research needs. Whether you're a data scientist, a student, or a business professional, these datasets provide valuable insights into the entertainment industry.
Extensive collection of global movies across various genres and languages.
Detailed metadata, including titles, release dates, genres, directors, cast, and ratings.
Regularly updated to ensure relevance and accuracy.
Our TV shows dataset is your gateway to understanding trends in episodic content. It includes:
Comprehensive details about popular and niche TV shows.
Information on episode counts, seasons, ratings, and networks.
Insights into audience preferences and regional programming.
These datasets are perfect for:
Machine learning models for recommendation systems.
Academic research on media trends and audience behavior.
Business strategies for entertainment platforms.
Unlock the power of TV show data with our Crawl Feeds TV Shows Dataset. Start analyzing today and gain valuable insights into your favorite shows!
https://creativecommons.org/publicdomain/zero/1.0/
Context:
I made this dataset with the following goal: "Unlock cinematic gems with a dataset featuring IMDB's top-rated movies, ensuring precise and exceptional movie recommendations for an unparalleled viewing experience."
Source: The dataset was collected from The Movie Database (TMDB) using a valid API key. The CSV data was retrieved from https://api.themoviedb.org/3/movie/top_rated/ with proper authorization to access their database.
The raw data obtained from API responses was processed to extract relevant information. This may include parsing JSON responses, handling pagination, and cleaning the data to ensure consistency.
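The pagination and cleaning steps described above might look roughly like the sketch below. It is not the author's script: `fetch_page` is a hypothetical callable that returns the decoded JSON for one page, and the field names (`results`, `total_pages`, `vote_average`, and so on) reflect the usual shape of TMDB list responses, which should be checked against the API documentation.

```python
def collect_all_pages(fetch_page, max_pages=None):
    """Gather the 'results' of every page returned by fetch_page(page_number)."""
    movies = []
    page = 1
    while True:
        payload = fetch_page(page)
        movies.extend(payload.get("results", []))
        # Assumption: the response reports how many pages exist in total.
        total_pages = payload.get("total_pages", page)
        if page >= total_pages or (max_pages is not None and page >= max_pages):
            break
        page += 1
    return movies


def clean_movie(raw):
    """Keep a fixed set of fields and normalize empty strings to None."""
    keep = ("id", "title", "release_date", "vote_average", "vote_count")
    return {key: (raw.get(key) if raw.get(key) != "" else None) for key in keep}
```

Passing the fetcher in as a parameter keeps the paging logic independent of any HTTP library or API key handling, so it can be exercised with canned responses.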
Inspiration: You can use this dataset to build a recommendation system, or to practice EDA as part of a mini project.
https://creativecommons.org/publicdomain/zero/1.0/
Bringing you another scraping exercise with BeautifulSoup and Selenium. If you are interested in the scraper, you can check out this link.
MovieFolder/
-metadata.json
-movieReviews.csv
Movie: Number of User Reviews
- SpiderMan No Way Home: 6034
- Joker: 11357
- Avengers Endgame: 9513
- The Dark Knight: 7642
- Forrest Gump: 2960
- Pulp Fiction: 3475
- The Avengers: 2081
- Morbius: 1910
- Thor: 1864
- John Wick 3: 2417
https://crawlfeeds.com/privacy_policy
Amazon Prime TV Shows and Movies Dataset offered by Crawl Feeds is an extensive resource containing over 92,000 records in JSON format. This dataset encompasses a wide array of data points, including links, titles, descriptions, release dates, genres, posters, streaming platforms, countries, number of seasons, content ratings, IMDb ratings, cast and crew details, unique identifiers, and scraping timestamps. Such comprehensive information is invaluable for researchers, data analysts, and developers aiming to conduct in-depth analyses, develop recommendation systems, or explore trends within Amazon Prime's content library.
For those interested in broader media datasets, Crawl Feeds also offers the Movies and TV Shows Dataset, which includes 118,000 records, and the IMDb Movie Details Dataset, comprising 250,000 records. These datasets provide extensive information across various platforms, facilitating comparative studies and cross-platform analyses.
Integrating these datasets into your projects can significantly enhance the depth and quality of your analyses, providing a robust foundation for exploring various facets of the entertainment industry. Whether you're developing a new application, conducting market research, or performing academic studies, these datasets serve as a valuable resource for gaining insights into the dynamic world of streaming media.
Explore the Amazon Prime TV Shows and Movies Dataset and other related datasets on Crawl Feeds to elevate your data-driven projects.
Around 100,000 movies acquired from IMDB. The most popular items from each year since 1950. The dataset is organized as a JSON file. The JSON is of the following format:
{
  year1 : {
    movie_title1 : { 'genre' : [genre1, genre2,...], 'synopsis' : synopsis_string }
    movie_title2 : ...
  }
  year2 : ....
}
It should be noted that the list of genres could be empty.
We only took movies with a short synopsis, and not the longer "summary" format in IMDB.
The script is also attached as "Crawler.py".
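A minimal sketch of walking the nested year, then title, structure described above. The file path passed to `load_dataset` is up to the reader and the function names are illustrative; the `genre` and `synopsis` keys follow the format given in the description, and movies with an empty genre list are counted under a made-up `unknown` bucket.

```python
import json
from collections import Counter


def load_dataset(path):
    """Read the year -> title -> details mapping from a JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def iter_movies(dataset):
    """Yield (year, title, genres, synopsis) for every movie in the mapping."""
    for year, movies in dataset.items():
        for title, info in movies.items():
            yield year, title, info.get("genre", []), info.get("synopsis", "")


def genre_counts(dataset):
    """Count genre occurrences; an empty genre list is counted as 'unknown'."""
    counts = Counter()
    for _, _, genres, _ in iter_movies(dataset):
        counts.update(genres or ["unknown"])
    return counts
```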
https://crawlfeeds.com/privacy_policy
This comprehensive dataset features detailed metadata for over 190,000 movies and TV shows, with a strong concentration in the Horror genre. It is ideal for entertainment research, machine learning models, genre-specific trend analysis, and content recommendation systems.
Each record contains rich information, making it perfect for streaming platforms, film industry analysts, or academic media researchers.
Primary Genre Focus: Horror
Build movie recommendation systems or genre classifiers
Train NLP models on movie descriptions
Analyze Horror content trends over time
Explore box office vs. rating correlations
Enrich entertainment datasets with directorial and cast metadata
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
A TMDb movies database with id_imdb attached, which can be used for batch processing of TMDb films as an alternative to requesting the TMDb API 6 million times with the IMDb ID to find the links.
IMDb provides snapshots of their databases on titles, casting, etc. However, they do not provide user reviews. Furthermore, it is against their Terms of Use to do any form of Scraping of their webpages.
TMDb, an Alternative to IMDb
TMDb (The Movie Database), on the other hand, does provide user reviews through its API. It is even possible to search for a film by its imdb_id.
However, if for any reason you must stick to IMDb as your base dataset and collect information for a good portion of IMDb's 6,782,091 entries, you are doomed.
10% of 6,782,091 would amount to 678,209 API requests, and even though you may not be rate limited, it will still take days.
I therefore created this script (https://github.com/hudsonmendes/lambda-tmdb-distributed-downloader), which can be used to download TMDb movies by their IMDb ID with a good level of parallelism.
Apart from the extra data that TMDb makes available (like the full release date, for example), we attach the IMDb ID that was found (as id_imdb) to the TMDb movie JSON, and save it in S3.
It would not have been possible to put this data together without the data snapshots provided by IMDb or the API provided by TMDb. Special thanks to both providers for supplying the data and the API, writing the documentation, running the infrastructure, and allowing us, through their terms, to access such data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These are extended versions of the MM-IMDB [Arevalo+ ICLRW'17] and Ads-Parallelity [Zhang+ BMVC'18] datasets, with added features from the Google Cloud Vision API. The datasets are stored in JSONL (JSON Lines) format.
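For readers unfamiliar with JSON Lines, a small reader might look like the sketch below: each non-empty line is parsed as one independent JSON record. The function name is illustrative and nothing here assumes particular field names in these datasets.

```python
import json


def read_jsonl(path):
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as fh:
        for line_number, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report which line failed so a bad record is easy to locate.
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc
```

Because the function is a generator, large files can be streamed record by record instead of being loaded into memory at once.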
Abstract (from our paper):
There is increasing interest in the use of multimodal data in various web applications, such as digital advertising and e-commerce. Typical methods for extracting important information from multimodal data rely on a mid-fusion architecture that combines the feature representations from multiple encoders. However, as the number of modalities increases, several potential problems with the mid-fusion model structure arise, such as an increase in the dimensionality of the concatenated multimodal features and missing modalities. To address these problems, we propose a new concept that considers multimodal inputs as a set of sequences, namely, deep multimodal sequence sets (DM2S2). Our set-aware concept consists of three components that capture the relationships among multiple modalities: (a) a BERT-based encoder to handle the inter- and intra-order of elements in the sequences, (b) intra-modality residual attention (IntraMRA) to capture the importance of the elements in a modality, and (c) inter-modality residual attention (InterMRA) to enhance the importance of elements with modality-level granularity further. Our concept exhibits performance that is comparable to or better than the previous set-aware models. Furthermore, we demonstrate that the visualization of the learned InterMRA and IntraMRA weights can provide an interpretation of the prediction results.
Dataset (MM-IMDB and Ads-Parallelity):
We extended two multimodal datasets, namely, MM-IMDB [Arevalo+ ICLRW'17], Ads-Parallelity [Zhang+ BMVC'18] for the empirical experiments. The MM-IMDB dataset contains 25,925 movies with multiple labels (genres). We used the original split provided in the dataset and reported the F1 scores (micro, macro, and samples) of the test set. The Ads-Parallelity dataset contains 670 images and slogans from persuasive advertisements to understand the implicit relationship (parallel and non-parallel) between these two modalities. A binary classification task is used to predict whether the text and image in the same ad convey the same message.
We transformed the following multimodal information (i.e., visual, textual, and categorical data) into textual tokens and fed these into our proposed model. We used the Google Cloud Vision API for the visual features to obtain the following four pieces of information as tokens: (1) text from the OCR, (2) category labels from the label detection, (3) object tags from the object detection, and (4) the number of faces from the facial detection. We input the labels and object detection results as a sequence in order of confidence, as obtained from the API. We describe the visual, textual, and categorical features of each dataset below.
MM-IMDB: We used the title and plot of movies as the textual features, and the aforementioned API results based on poster images as visual features.
Ads-Parallelity: We used the same API-based visual features as in MM-IMDB. Furthermore, we used textual and categorical features consisting of textual inputs of transcriptions and messages, and categorical inputs of natural and text concrete images.
Data Source: https://www.kaggle.com/datasets/gufukuro/movie-scripts-corpus
Data Description: Movie Scripts Corpus
This corpus was collected to use for screenplay analysis with machine learning methods. Corpus includes movie scripts, crawled from different sources, their annotations by script structural elements and movies metadata.
Corpus description
Screenplay data consists of:
- Movie scripts TXT-documents with raw full text (2858 docs)
- Movie scripts TXT-documents with full text lemmas (2858 docs)
- Manual annotation TXT-documents for some movie scripts (33 docs, more than 6000 annotated rows)
- Movie scripts annotations TXT-documents obtained by BERT
- Movie scripts annotations json-documents obtained by rule-based annotator ScreenPy
Movies metadata consists of:
- Cut versions of movie reviews and scores from metacritic: Number of reviews: 21025; Number of movies with reviews: 2038
- Metadata for movies, including: title, akas, launch year, score from metacritic, imdb user rating and number of votes from imdb.com, movie awards, opening weekend, producers, budget, script department, production companies, writers, directors, cast info, countries involved in production, age restrict, plot (with outline), keywords, genres, taglines, critics' synopsis
- Screenplay awards information: Academy Awards adapted screenplay, Academy Awards original screenplay, BAFTA, Golden Globe Award for Best Screenplay, Writers Guild Awards Winners & Nominees 2020-2013; nominations information for 462 movies in total
Movie characters data consists of:
- Script text fragments with dialogs and scene descriptions for characters, gathered with annotators: 2153 movies and text fragments for 32114 characters in total
- Gender labels for 4792 characters
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was scraped from my IMDb (Internet Movie Database) list of Palestinian movies.
An Apify actor was used to scrape the data and download the file: https://console.apify.com/actors/poWuYPmbfLGBn5Mf8/console
This is the list: https://www.imdb.com/list/ls563010565/?sort=alpha,asc&st_dt=&mode=detail&page=1
To use this dataset
The raw JSON response is available at:
https://raw.githubusercontent.com/sondosaabed/Palestinian-Movies-JSON-Dataset/main/palestinian_movies.json
https://creativecommons.org/publicdomain/zero/1.0/
These files contain metadata for all 45,000 movies listed in the Full MovieLens Dataset. The dataset consists of movies released on or before July 2017. Data points include cast, crew, plot keywords, budget, revenue, posters, release dates, languages, production companies, countries, TMDB vote counts and vote averages.
This dataset also has files containing 26 million ratings from 270,000 users for all 45,000 movies. Ratings are on a scale of 1-5 and have been obtained from the official GroupLens website.
This dataset consists of the following files:
movies_metadata.csv: The main Movies Metadata file. Contains information on 45,000 movies featured in the Full MovieLens dataset. Features include posters, backdrops, budget, revenue, release dates, languages, production countries and companies.
keywords.csv: Contains the movie plot keywords for our MovieLens movies. Available in the form of a stringified JSON Object.
credits.csv: Consists of Cast and Crew Information for all our movies. Available in the form of a stringified JSON Object.
links.csv: The file that contains the TMDB and IMDB IDs of all the movies featured in the Full MovieLens dataset.
links_small.csv: Contains the TMDB and IMDB IDs of a small subset of 9,000 movies of the Full Dataset.
ratings_small.csv: The subset of 100,000 ratings from 700 users on 9,000 movies.
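Since keywords.csv and credits.csv store their values as stringified objects, each cell has to be parsed before use. The sketch below tries strict JSON first and then falls back to Python-literal syntax; that fallback is an assumption about how such cells may be serialized, and the `name` key used in `keyword_names` is likewise assumed rather than taken from the description.

```python
import ast
import json


def parse_stringified(value):
    """Parse a cell holding a stringified list or dict; return [] on failure."""
    if not isinstance(value, str) or not value.strip():
        return []
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        pass
    try:
        # Assumed fallback: Python-style literals (single quotes, None).
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return []


def keyword_names(cell):
    """Extract the 'name' of every entry in a parsed list of dicts."""
    parsed = parse_stringified(cell)
    if not isinstance(parsed, list):
        return []
    return [item["name"] for item in parsed if isinstance(item, dict) and "name" in item]
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for untrusted cell contents.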
The Full MovieLens Dataset consisting of 26 million ratings and 750,000 tag applications from 270,000 users on all the 45,000 movies in this dataset can be accessed here
This dataset is an ensemble of data collected from TMDB and GroupLens. The Movie Details, Credits and Keywords have been collected from the TMDB Open API. This product uses the TMDb API but is not endorsed or certified by TMDb. Their API also provides access to data on many additional movies, actors and actresses, crew members, and TV shows. You can try it for yourself here.
The Movie Links and Ratings have been obtained from the Official GroupLens website. The files are a part of the dataset available here
This dataset was assembled as part of my second Capstone Project for Springboard's Data Science Career Track. I wanted to perform an extensive EDA on Movie Data to narrate the history and the story of Cinema and use this metadata in combination with MovieLens ratings to build various types of Recommender Systems.
Both my notebooks are available as kernels with this dataset: The Story of Film and Movie Recommender Systems
Some of the things you can do with this dataset: Predicting movie revenue and/or movie success based on a certain metric. What movies tend to get higher vote counts and vote averages on TMDB? Building Content Based and Collaborative Filtering Based Recommendation Engines.
http://opendatacommons.org/licenses/dbcl/1.0/
10,000 film posters and descriptions parsed from the IMDB site for a Genre Classification Task.
Folder contains:
labels.json - Information about the mapping of film genres to labels from 0 to 23.
parsed_data.json:
Information about each film, keyed by its own film_id (based on the IMDB film_id):
- title - title of the film
- description - film description, parsed from IMDB
- poster_url - link to the film's poster
- genre - main genre of the film
- labels - additional genres of the film
- releaseDate - release date of the film
- film_year - year of the release date
Sample:
"tt6443346": {
  "title": "Black Adam",
  "description": "Nearly 5,000 years after he was bestowed with the almighty powers of the Egyptian gods - and imprisoned just as quickly - Black Adam is freed from his earthly tomb, ready to unleash his unique form of justice on the modern world.",
  "poster_url": "https://m.media-amazon.com/images/M/MV5BYzZkOGUwMzMtMTgyNS00YjFlLTg5NzYtZTE3Y2E5YTA5NWIyXkEyXkFqcGdeQXVyMjkwOTAyMDU@._V1_QL75_UX190_CR0,0,190,281_.jpg",
  "genre": "SuperHero",
  "labels": ["Action", "Adventure", "Fantasy"],
  "releaseDate": null,
  "film_year": 2022
}
The IMDB movie data is a comprehensive data set that contains information about movies from the Internet Movie Database (IMDB). It is an extensive collection of movie-related data, including movie titles, release dates, genres, ratings, and reviews.
The data set contains information about a wide range of movies, including both old and new films from various countries and languages. It is an excellent resource for those interested in movie analysis, as it includes information such as the movie's budget, box office revenue, and cast and crew details.
The IMDB movie data set is widely used by data scientists, researchers, and movie enthusiasts to perform analysis and draw insights. By analyzing the data, one can gain valuable insights about the movie industry, such as the most popular genres, the most successful directors, and the impact of ratings and reviews on box office performance.
The data set is available for free and can be downloaded in various formats, including CSV, JSON, and SQL. This makes it easily accessible and usable by anyone interested in conducting analysis on movie-related data
This is a huge dataset and takes around 400 seconds to load into a kernel. If you need to load IMDB data quickly in a Keras kernel, use the following dataset instead:
https://www.kaggle.com/pankrzysiu/keras-imdb-reviews
A set of 50,000 highly-polarized reviews from the Internet Movie Database.
This file is to be used directly in your code. The .zip file will be automatically uncompressed by Kaggle.
from os import listdir, makedirs
from os.path import join, exists, expanduser

# Keras looks for cached datasets under ~/.keras/datasets
cache_dir = expanduser(join('~', '.keras'))
if not exists(cache_dir):
    makedirs(cache_dir)
datasets_dir = join(cache_dir, 'datasets')
if not exists(datasets_dir):
    makedirs(datasets_dir)

# If you have multiple input files, change the below cp commands accordingly, typically:
# !cp ../input/keras-imdb/imdb* ~/.keras/datasets/
# The line below is a notebook shell command (Kaggle/Jupyter), not plain Python
!cp ../input/imdb* ~/.keras/datasets/
The files are on the net in these locations:
https://s3.amazonaws.com/text-datasets/imdb.npz
https://s3.amazonaws.com/text-datasets/imdb_word_index.json
They are used by keras imdb.py:
https://github.com/keras-team/keras/blob/master/keras/datasets/imdb.py
"Python Deep Learning" Book example is using this:
https://creativecommons.org/publicdomain/zero/1.0/
These files contain data for 48,000 movies. They are useful for researchers who want to build online recommender systems. The collection contains almost all of the movies in the MovieLens 1M dataset.
================================================================================
To acknowledge use of the dataset in publications, please cite the following:
RJ Ziarani, 48K IMDB Movies With Datasets, accessed 25 July 2021 ,2021
================================================================================
These folders contain the movie data. You can access each movie with the following pattern:
Data/Year/IMDBID/IMDBID.json
Example:
Data/2020/tt4532038/tt4532038.json
Example for a movie's data:
{"@context": "http://schema.org", "@type": "Movie", "url": "/title/tt4532038/", "name": "The War with Grandpa", "image": "https://m.media-amazon.com/images/M/MV5BNTlkZDQ1ODEtY2ZiMS00OGNhLWJlZDctYzY0NTFmNmQ2NDAzXkEyXkFqcGdeQXVyMTkxNjUyNQ@@._V1_.jpg", "genre": ["Comedy", "Drama", "Family"], "contentRating": "PG", "actor": [{"@type": "Person", "url": "/name/nm0000134/", "name": "Robert De Niro"}, {"@type": "Person", "url": "/name/nm0000235/", "name": "Uma Thurman"}, {"@type": "Person", "url": "/name/nm1443527/", "name": "Rob Riggle"}, {"@type": "Person", "url": "/name/nm4625502/", "name": "Oakes Fegley"}], "director": {"@type": "Person", "url": "/name/nm0384722/", "name": "Tim Hill"}, "creator": [{"@type": "Person", "url": "/name/nm0040022/", "name": "Tom J. Astle"}, {"@type": "Person", "url": "/name/nm0256079/", "name": "Matt Ember"}, {"@type": "Person", "url": "/name/nm0809759/", "name": "Robert Kimmel Smith"}, {"@type": "Organization", "url": "/company/co0482253/"}, {"@type": "Organization", "url": "/company/co0017712/"}, {"@type": "Organization", "url": "/company/co0639852/"}, {"@type": "Organization", "url": "/company/co0437328/"}, {"@type": "Organization", "url": "/company/co0641417/"}], "description": "The War with Grandpa is a movie starring Robert De Niro, Uma Thurman, and Rob Riggle. 
Upset that he has to share the room he loves with his grandfather, Peter decides to declare war in an attempt to get it back.", "datePublished": "2020-08-27", "keywords": "mother son relationship,christmas,room,family conflict,family relationships", "aggregateRating": {"@type": "AggregateRating", "ratingCount": 9310, "bestRating": "10.0", "worstRating": "1.0", "ratingValue": "5.5"}, "review": {"@type": "Review", "itemReviewed": {"@type": "CreativeWork", "url": "/title/tt4532038/"}, "author": {"@type": "Person", "name": "byron-116"}, "dateCreated": "2020-08-28", "inLanguage": "English", "name": "Suitable for juveniles only...", "reviewBody": "It's pathetic to watch such great stars in this film apt for juveniles only. Watch it if you are under 14 years old.....", "reviewRating": {"@type": "Rating", "worstRating": "1", "bestRating": "10", "ratingValue": "4"}}, "duration": "PT1H34M", "trailer": {"@type": "VideoObject", "name": "Official Trailer", "embedUrl": "/video/imdb/vi911785497", "thumbnail": {"@type": "ImageObject", "contentUrl": "https://m.media-amazon.com/images/M/MV5BMTdhNWI1N2QtMjQ5Yi00M2M5LWE3YWQtMDE5YmNhMmFmZTVkXkEyXkFqcGdeQXRyYW5zY29kZS13b3JrZmxvdw@@._V1_.jpg"}, "thumbnailUrl": "https://m.media-amazon.com/images/M/MV5BMTdhNWI1N2QtMjQ5Yi00M2M5LWE3YWQtMDE5YmNhMmFmZTVkXkEyXkFqcGdeQXRyYW5zY29kZS13b3JrZmxvdw@@._V1_.jpg", "description": "The next big family-fun film is hitting theaters soon! Check out the trailer for THE WAR WITH GRANDPA starring Robert De Niro, Christopher Walken, Uma Thurman, Rob Riggle, Cheech Marin, Laura Marano and Oakes Fegly. Coming soon to theaters!", "uploadDate": "2020-08-13T17:40:20Z"}}
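To show how a record like the one above might be consumed, the sketch below builds the file path from the Data/Year/IMDBID pattern, converts the ISO 8601 style `duration` (for example `PT1H34M`) to minutes, and pulls out a few fields. The helper names and the `root` default are illustrative; the keys used (`name`, `genre`, `aggregateRating`, `duration`) are those visible in the sample record.

```python
import re
from pathlib import PurePosixPath

# Matches durations such as PT1H34M, PT45M or PT2H (hours, minutes, seconds optional).
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def movie_json_path(year, imdb_id, root="Data"):
    """Build the Data/Year/IMDBID/IMDBID.json path for one movie."""
    return PurePosixPath(root) / str(year) / imdb_id / f"{imdb_id}.json"


def duration_minutes(iso_duration):
    """Convert a PT#H#M#S string to minutes; return None if it cannot be parsed."""
    match = _DURATION.match(iso_duration or "")
    if not match or not any(match.groups()):
        return None
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes + seconds / 60


def summarize(movie):
    """Pick out a few commonly used fields from one movie record."""
    rating = movie.get("aggregateRating", {})
    return {
        "name": movie.get("name"),
        "genres": movie.get("genre", []),
        # ratingValue is stored as a string in the sample, so convert it.
        "rating": float(rating["ratingValue"]) if "ratingValue" in rating else None,
        "votes": rating.get("ratingCount"),
        "minutes": duration_minutes(movie.get("duration")),
    }
```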
Netflix Prize data is one of the popular datasets available today for OTT recommendation. The Netflix Prize Dataset contains title, userid, rating, and date of rating as its only attributes for recommendation. We extend the Netflix Prize dataset by scraping IMDB data about the titles it contains. Any copyright to the scraped data belongs to its respective owners.
The dataset contains information on approximately 9000 movies and TV shows available in the Netflix Prize dataset, such as duration, cast and crew, genre, and languages. For columns that hold multiple values in a row, arrays are used to store those values. Please use the .json file to access the dataset to avoid string-related errors.
Could you build a hybrid recommendation system by combining our dataset with the Netflix Prize Dataset?
Some entries in imdb.csv and imdb.json describe movies that share a title with an entry in the Netflix Prize Dataset but were made after 2005 (the release of the Netflix Prize Dataset). This has been corrected in imdb_processed.csv and imdb_processed.json. Please use the processed data for tasks specific to the Netflix Prize Dataset.
Using Keras inside Kaggle requires you to provide cached datasets. This dataset loads quickly into kernels and Keras.
A set of 50,000 highly-polarized reviews from the Internet Movie Database.
from os import listdir, makedirs
from os.path import join, exists, expanduser

# Keras looks for cached datasets under ~/.keras/datasets
cache_dir = expanduser(join('~', '.keras'))
if not exists(cache_dir):
    makedirs(cache_dir)
datasets_dir = join(cache_dir, 'datasets')
if not exists(datasets_dir):
    makedirs(datasets_dir)

# If you have multiple input files, change the below cp commands accordingly, typically:
# !cp ../input/keras-imdb-reviews/imdb* ~/.keras/datasets/
# The line below is a notebook shell command (Kaggle/Jupyter), not plain Python
!cp ../input/imdb* ~/.keras/datasets/
The files are on the net in these locations:
https://s3.amazonaws.com/text-datasets/imdb.npz
https://s3.amazonaws.com/text-datasets/imdb_word_index.json
They are used by keras imdb.py:
https://github.com/keras-team/keras/blob/master/keras/datasets/imdb.py
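Reviews in this dataset are stored as sequences of word indices, and imdb_word_index.json maps words to indices. A minimal sketch of turning one encoded review back into text is shown below. It assumes the default Keras encoding, in which indices are shifted by 3 because the lowest values are reserved for padding, start-of-sequence, and unknown words; `decode_review` is an illustrative helper, not part of Keras.

```python
def decode_review(encoded, word_index, index_from=3):
    """Turn a list of word indices back into a space-separated string."""
    reverse_index = {index: word for word, index in word_index.items()}
    # Indices that fall in the reserved range (or are otherwise unknown)
    # are rendered as '?'.
    return " ".join(reverse_index.get(i - index_from, "?") for i in encoded)
```

With a toy index `{"the": 1, "movie": 2, "great": 3}`, the sequence `[1, 4, 5, 2, 6]` decodes to `? the movie ? great`.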
"Python Deep Learning" Book example is using this: https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/3.5-classifying-movie-reviews.ipynb