Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset Card for Wikipedia Movie Plots with AI Plot Summaries
Dataset Summary
Context
Wikipedia Movies Plots dataset by JustinR ( https://www.kaggle.com/jrobischon/wikipedia-movie-plots )
Content
Everything is the same as in https://www.kaggle.com/jrobischon/wikipedia-movie-plots
Acknowledgements
Please go upvote the https://www.kaggle.com/jrobischon/wikipedia-movie-plots dataset, since this one is 100% based on it.
Supported Tasks and… See the full description on the dataset page: https://huggingface.co/datasets/vishnupriyavr/wiki-movie-plots-with-summaries.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains information about movies gathered from IMDB and other sources. It includes the following features:
Title: The title of the movie.
Plot: A brief summary or description of the movie's story.
Genres: The genres or categories the movie belongs to (e.g., Drama, Action, Comedy).
Countries: The countries where the movie was produced.
Languages: The primary languages spoken in the movie.
Average Rating: The average user rating given to the movie.
Number of Votes: The total number of user votes or reviews the movie has received.
The data has been cleaned and preprocessed to remove unnecessary symbols and text, providing a more streamlined and usable version for analysis.
Citation: Imran Ahmed and Arman Sakif (2025). 7k+ Movie Plot Dataset [Data set]. Kaggle. https://doi.org/10.34740/KAGGLE/DSV/10697654
CC0 1.0 Universal https://creativecommons.org/publicdomain/zero/1.0/
IMDb (an acronym for Internet Movie Database) is an online database of information related to films, television programs, home videos, video games, and streaming content online – including cast, production crew and personal biographies, plot summaries, trivia, ratings, and fan and critical reviews. An additional fan feature, message boards, was abandoned in February 2017. Originally a fan-operated website, the database is now owned and operated by IMDb.com, Inc., a subsidiary of Amazon.
As of December 2020, IMDb has approximately 7.5 million titles (including episodes) and 10.4 million personalities in its database,[2] as well as 83 million registered users.
CC0 1.0 Universal https://creativecommons.org/publicdomain/zero/1.0/
Plots extracted from Wikipedia for all movies with more than 1000 ratings on IMDb and released between 1950 and 2023. Useful for demo projects on large language models (LLMs), e.g. a movie search app (https://www.cinemattr.ca).
The plot summary section of each movie was cleaned of all links, references, and other irrelevant content to get a plain-text value. Missing plots fall back to IMDb synopses.
89% of movies have plot details, and 100% have a short summary (untouched from Wikipedia, useful for matching metadata and other details in a retriever application).
The columns stars, directors, and genres each hold a list of values, useful for loading into a vector database.
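The list-valued columns can be turned into native lists before indexing. The following is a minimal sketch, assuming the lists are serialized in the CSV as Python-style list literals; the description does not state the serialization, and the sample rows, the `title` column, and the helper names `parse_rows` and `to_documents` are hypothetical.

```python
import ast
import csv
import io

# Hypothetical sample rows; the real file name and serialization are assumptions.
SAMPLE_CSV = '''title,stars,directors,genres
Example Movie,"['Actor A', 'Actor B']","['Director X']","['Drama', 'Comedy']"
Another Movie,"['Actor C']","['Director Y', 'Director Z']","['Action']"
'''

# Column names taken from the dataset description.
LIST_COLUMNS = ("stars", "directors", "genres")


def parse_rows(text):
    """Read CSV text and convert the list-like string columns into Python lists."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for col in LIST_COLUMNS:
            # literal_eval only evaluates literals, so it is safer than eval here.
            row[col] = ast.literal_eval(row[col]) if row[col] else []
        rows.append(row)
    return rows


def to_documents(rows):
    """Build (text, metadata) pairs of the kind a vector store typically ingests."""
    return [(r["title"], {c: r[c] for c in LIST_COLUMNS}) for r in rows]


rows = parse_rows(SAMPLE_CSV)
docs = to_documents(rows)
print(docs[0])
```

If the real file uses a different delimiter inside these columns (for example a comma-separated string), the `ast.literal_eval` call would need to be replaced with a plain `split`.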
CC0 1.0 Universal https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains the plots of more than 100,000 movies in .csv format.
https://crawlfeeds.com/privacy_policy
Unlock one of the most comprehensive movie datasets available—4.5 million structured IMDb movie records, extracted and enriched for data science, machine learning, and entertainment research.
This dataset includes a vast collection of global movie metadata, including details on title, release year, genre, country, language, runtime, cast, directors, IMDb ratings, reviews, and synopsis. Whether you're building a recommendation engine, benchmarking trends, or training AI models, this dataset is designed to give you deep and wide access to cinematic data across decades and continents.
Perfect for use in film analytics, OTT platforms, review sentiment analysis, knowledge graphs, and LLM fine-tuning, the dataset is cleaned, normalized, and exportable in multiple formats.
Genres: Drama, Comedy, Horror, Action, Sci-Fi, Documentary, and more
Train LLMs or chatbots on cinematic language and metadata
Build or enrich movie recommendation engines
Run cross-lingual or multi-region film analytics
Benchmark genre popularity across time periods
Power academic studies or entertainment dashboards
Feed into knowledge graphs, search engines, or NLP pipelines
CC0 1.0 Universal https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains information about the top 1000 IMDb movies, including their titles, certificates, durations, genres, IMDb ratings, Metascores, directors, cast members, the number of votes they received, gross earnings, and plot summaries. The data is a curated list of highly acclaimed and popular movies.
Columns/Variables:
Movie Name: The title of the movie.
Certificate: The certificate or rating assigned to the movie.
Duration: The duration of the movie in minutes.
Genre: The genre(s) to which the movie belongs.
IMDb Rating: The IMDb rating of the movie.
Metascore: The Metascore rating of the movie.
Director: The director of the movie.
Stars: The main cast members of the movie.
Votes: The number of user votes/ratings the movie has received.
Grossed in $: The gross earnings in dollars (if available).
Plot: A brief summary or plot description of the movie.
Size: The dataset contains 1000 rows and 11 columns.
Data Quality: The dataset appears to be well-structured and complete. There are no missing values, and it seems to be ready for analysis.
Use Cases: This dataset can be used for various analyses, such as exploring the relationship between IMDb ratings and Metascores, identifying top-rated directors, or understanding the distribution of movie ratings across genres.
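One way to explore the relationship between IMDb ratings and Metascores is a Pearson correlation coefficient. The sketch below is illustrative only: the values are made up rather than taken from the dataset, and the function name `pearson` is hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values, not real rows from the dataset.
imdb_ratings = [8.0, 8.5, 9.0, 7.5]
metascores = [70, 90, 85, 65]

r = pearson(imdb_ratings, metascores)
print(f"Pearson r between IMDb Rating and Metascore: {r:.3f}")
```

A value near 1 would suggest that critics and users tend to agree, while a value near 0 would suggest little linear relationship.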
Data Source: https://www.kaggle.com/datasets/gufukuro/movie-scripts-corpus
Data Description: Movie Scripts Corpus
This corpus was collected for screenplay analysis with machine learning methods. The corpus includes movie scripts crawled from different sources, their annotations by script structural elements, and movie metadata.
Corpus description
Screenplay data consists of:
Movie scripts: TXT documents with raw full text (2858 docs)
Movie scripts: TXT documents with full-text lemmas (2858 docs)
Manual annotation TXT documents for some movie scripts (33 docs, more than 6000 annotated rows)
Movie script annotations: TXT documents obtained by BERT
Movie script annotations: JSON documents obtained by the rule-based annotator ScreenPy
Movies metadata consists of:
Cut versions of movie reviews and scores from Metacritic (number of reviews: 21025; number of movies with reviews: 2038)
Metadata for movies, including: title, akas, launch year, score from Metacritic, IMDb user rating and number of votes from imdb.com, movie awards, opening weekend, producers, budget, script department, production companies, writers, directors, cast info, countries involved in production, age restriction, plot (with outline), keywords, genres, taglines, critics' synopsis
Screenplay awards information: Academy Awards adapted screenplay, Academy Awards original screenplay, BAFTA, Golden Globe Award for Best Screenplay, Writers Guild Awards Winners & Nominees 2020-2013; nominations information for 462 movies in total
Movie characters data consists of:
Script text fragments with dialogs and scene descriptions for characters, gathered with annotators: 2153 movies and text fragments for 32114 characters in total
Gender labels for 4792 characters
CC0 1.0 Universal https://creativecommons.org/publicdomain/zero/1.0/
This movies dataset can be used for a variety of purposes, depending on your goals and the insights you want to derive from the data. Here are some potential use cases:
Movie Analysis
Recommendation Systems
Popularity Measurement
Audience Engagement
Comparative Analysis
The dataset consists of various attributes related to movies. These attributes provide information about each entry in the dataset:
1. Index: Index for each row.
2. Title: The name of the movie.
3. Original Language: The language in which the movie was originally produced. It could offer insights into the target audience and geographical scope of the content.
4. Release Date: When the movie was officially released for public viewing. The release date can affect factors like marketing strategies, competition with other releases, and audience anticipation.
5. Popularity: Likely a measure of how well-known or talked-about a movie is within a given context. It could be based on factors such as online discussions, social media mentions, and viewer interest.
6. Vote Average: Likely the average rating or score given to the movie by viewers who have voted. A higher average could imply that the content is generally well received.
7. Vote Count: The number of votes or ratings that the movie has received from viewers. A higher vote count might suggest a larger viewer base or more engaging content.
8. Overview: A concise summary or description of the movie's plot, themes, and overall content. It offers a glimpse into what the content is about.
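Because Vote Average is only meaningful when enough viewers have voted, a common first step is to rank by average while requiring a minimum Vote Count. The sketch below is one possible approach, not something the dataset prescribes; the `Movie` class, the `top_rated` helper, the threshold, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Movie:
    # Field names mirror the Title, Vote Average, and Vote Count attributes.
    title: str
    vote_average: float
    vote_count: int


def top_rated(movies, min_votes):
    """Return movies with at least min_votes votes, best average first."""
    eligible = [m for m in movies if m.vote_count >= min_votes]
    return sorted(eligible, key=lambda m: m.vote_average, reverse=True)


# Made-up illustrative entries, not real rows from the dataset.
sample = [
    Movie("Widely Seen", 7.9, 12000),
    Movie("Niche Favourite", 9.5, 12),
    Movie("Solid Hit", 8.3, 4500),
]

for movie in top_rated(sample, min_votes=1000):
    print(movie.title, movie.vote_average, movie.vote_count)
```

The threshold trades coverage for reliability: a higher `min_votes` drops more titles but makes the remaining averages less sensitive to a handful of votes.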
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Selection of the top 1000 entries of each gender in IMDb.
Contains information of:
This is a simulated dataset.
Data of the relevant plots of the thesis
Movies that summarize key TORUS-LItE deployments. The movies were generated from IDV (Integrated Data Viewer) and include the positions of all assets operating on the particular day, updated at one-minute intervals, the radar reflectivity from the nearest WSR-88D, and scanning symbols for remote-sensing instruments.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset offers a wide-ranging collection of movie information, featuring essential details like movie titles, URLs to IMDb pages, release years, genres, and ratings. Additional data include synopses, director names, leading actors, and links to IMDb images. Compiled from IMDb, this dataset is perfect for anyone interested in movie analytics, trend analysis, or creating data-driven applications related to the film industry.
The data was scraped using Scrapy.
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
This table contains 420 series, with data for years 1996/1997 - 2004/2005 (not all combinations necessarily have data for all years), and is no longer being released. This table contains data described by the following dimensions (Not all combinations are available): Geography (12 items: Canada; Newfoundland and Labrador; Prince Edward Island; Nova Scotia; ...), Type of venue (3 items: Total movie theatres and drive-ins; Movie theatres; Drive-ins), Summary characteristics (14 items: Number of theatres; Paid admissions; Average ticket prices; Number of screens; ...).
This dataset contains the predicted prices of the asset Ix Shells: An Ethereum Story (Short Film) over the next 16 years. This data is calculated initially using a default 5 percent annual growth rate, and after page load, it features a sliding scale component where the user can then further adjust the growth rate to their own positive or negative projections. The maximum positive adjustable growth rate is 100 percent, and the minimum adjustable growth rate is -100 percent.
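A projection of this kind can be sketched as follows. This is a minimal illustration, assuming the growth rate compounds once per year, which the description does not state explicitly; the function name `project_prices` and the starting price of 100.0 are hypothetical.

```python
def project_prices(start_price, annual_rate=0.05, years=16):
    """Project a price forward, assuming the rate compounds once per year.

    annual_rate is a fraction (0.05 means 5 percent) and is limited to the
    adjustable range given in the description: -100 percent to +100 percent.
    """
    if not -1.0 <= annual_rate <= 1.0:
        raise ValueError("annual_rate must be between -1.0 and 1.0")
    return [start_price * (1 + annual_rate) ** year for year in range(1, years + 1)]


# Hypothetical starting price; the description does not give one.
projection = project_prices(100.0)
for year, price in enumerate(projection, start=1):
    print(f"year {year:2d}: {price:.2f}")
```

Moving the slider corresponds to calling the function with a different `annual_rate`; at -100 percent every projected price is zero, and at 0 percent the price stays flat.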
Movies that summarize key TORUS 2019 deployments. The movies were generated from IDV (Integrated Data Viewer) and include the positions of all assets operating on the particular day, updated at one-minute intervals, the radar reflectivity from the nearest WSR-88D, and scanning symbols for remote-sensing instruments.
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Summary statistics by North American Industry Classification System (NAICS), including operating revenue (dollars x 1,000,000), operating expenses (dollars x 1,000,000), salaries, wages and benefits (dollars x 1,000,000), and operating profit margin (percent), for post-production and other motion picture and video industries (NAICS 512190), annual, for five years of data.
This dataset contains the predicted prices of the asset ConstitutionDAO: An Ethereum Story (Short Film) over the next 16 years. This data is calculated initially using a default 5 percent annual growth rate, and after page load, it features a sliding scale component where the user can then further adjust the growth rate to their own positive or negative projections. The maximum positive adjustable growth rate is 100 percent, and the minimum adjustable growth rate is -100 percent.
This table contains 98 series, with data for years 1998/1999 - 2004/2005 (not all combinations necessarily have data for all years), and is no longer being released. This table contains data described by the following dimensions (Not all combinations are available): Geography (7 items: Canada; Atlantic provinces; Quebec; Ontario; ...), Summary characteristics (14 items: Total number of firms; Total number of employees; Full-time employees; Part-time employees; ...).