19 datasets found
  1. IMDB movie details dataset

    • crawlfeeds.com
    csv, zip
    Updated Jul 5, 2025
    Cite
    Crawl Feeds (2025). IMDB movie details dataset [Dataset]. https://crawlfeeds.com/datasets/imdb-movie-details-dataset
    Available download formats: zip, csv
    Dataset updated
    Jul 5, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description
    The IMDB Movie Details Dataset is a collection of information about movies, TV shows, and streaming content listed on IMDB. It includes titles, release years, genres, cast, crew, ratings, and more, making it a go-to resource for film and entertainment enthusiasts. Applications include data analysis, machine learning projects, predictive modeling, and insights into industry trends.
    Researchers can explore patterns in movie ratings and genre popularity, while developers can use the dataset to build recommendation systems or applications. Movie buffs can dive deep into historical and contemporary trends in the world of cinema. This dataset not only supports academic and professional pursuits but also opens doors for creative projects in storytelling, content creation, and audience engagement. Whether you’re a developer, researcher, or film enthusiast, the IMDB movie dataset is a powerful tool for uncovering trends and gaining deeper insights into the evolving entertainment landscape.
  2. imdb_reviews

    • tensorflow.org
    • kaggle.com
    Updated Sep 20, 2024
    + more versions
    Cite
    (2024). imdb_reviews [Dataset]. https://www.tensorflow.org/datasets/catalog/imdb_reviews
    Dataset updated
    Sep 20, 2024
    Description

    Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

    To use this dataset:

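    import tensorflow_datasets as tfds

    # Load the labeled training split (25,000 reviews).
    ds = tfds.load('imdb_reviews', split='train')
    # Print the first four examples; each is a dict with 'text' and 'label'.
    for ex in ds.take(4):
        print(ex)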
    

    See the guide for more information on tensorflow_datasets.

  3. Filtered IMDb Movies & TV Shows Dataset

    • opendatabay.com
    Updated Jul 3, 2025
    Cite
    Datasimple (2025). Filtered IMDb Movies & TV Shows Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/ca25d396-b298-4765-ab3b-8adf955bfc63
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Entertainment & Media Consumption
    Description

    This dataset provides detailed information on IMDb movies and television shows, integrating descriptions sourced from Rotten Tomatoes. It contains data for approximately 7800 titles, primarily from the 1990s onwards, and has been filtered to include English language content with specific criteria for ratings and votes. The purpose of this dataset is to facilitate projects involving cross-content analysis, content-based recommendation systems, and genre prediction tasks. It offers a rich resource for understanding entertainment media consumption and developing machine learning applications.

    Columns

    • SNo.: Serial number for each record.
    • index: An internal index for the record.
    • tconst: A unique identifier for the title.
    • titleType: Specifies the type of content, such as 'movie' or 'tvSeries'.
    • primaryTitle: The most commonly known title for the content.
    • originalTitle: The official original title of the content.
    • isAdult?: A boolean indicator for adult content.
    • startYear: The year the title was released or started.
    • endYear: The year the title concluded (for TV series) or was released.
    • runtimeMinutes: The duration of the content in minutes.
    • Genres: Categories or types of content (multiple values may be present).
    • Average Rating: The average rating of the title as found on IMDb.
    • Num. of Votes: The total number of votes received for the rating on IMDb.
    • Region: The geographic region associated with the title's availability or origin.
    • Number of Ratings Types: Details related to how ratings are categorised.
    • Attributes: Additional characteristics or tags associated with the title.
    • Description: A textual description of the title, likely from Rotten Tomatoes.

    Distribution

    The dataset comprises approximately 7800 individual movie and TV show records. It is typically provided in a CSV file format. The data has been curated, filtering the original IMDb dataset to focus on content from the 1990s through to 2023. Only titles in English ('en') have been retained, and specific rating and vote thresholds have been applied, such as movies/shows from the 90s-00s with ratings of 7.9 or higher, and those from the 2000s onwards with ratings of 6.5 or higher. Titles from Canada, Great Britain, India, and the USA are represented.
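
    The rating thresholds described above can be sketched as a small predicate. This is only an illustration under stated assumptions: the source describes the eras loosely ("90s-00s" and "2000s onwards" overlap), so the year-2000 cutoff used here is a guess, and the function name passes_rating_filter is hypothetical. The keys startYear and Average Rating follow the column list above. The vote-count and language filters are not modelled because the source gives no thresholds for them.

```python
def passes_rating_filter(start_year: int, average_rating: float) -> bool:
    """Return True if a title meets the rating threshold for its era.

    Assumption: titles starting before 2000 need a rating of 7.9 or higher,
    and titles from 2000 onwards need 6.5 or higher. The year-2000 cutoff is
    a guess based on loose wording, not the curator's documented rule.
    """
    if not 1990 <= start_year <= 2023:
        return False  # outside the 1990-2023 coverage window
    threshold = 7.9 if start_year < 2000 else 6.5
    return average_rating >= threshold


# Made-up rows using two of the dataset's column names.
rows = [
    {"startYear": 1994, "Average Rating": 8.8},
    {"startYear": 1997, "Average Rating": 7.2},
    {"startYear": 2010, "Average Rating": 6.9},
    {"startYear": 1985, "Average Rating": 9.0},
]
kept = [r for r in rows if passes_rating_filter(r["startYear"], r["Average Rating"])]
print(len(kept))
```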

    Usage

    This dataset is highly suitable for various analytical and machine learning tasks, including:

    • Developing content-based recommendation systems using genres, descriptions, and ratings.
    • Performing exploratory data analysis on movie and TV show trends.
    • Implementing Natural Language Processing (NLP) techniques on title descriptions for insights or feature extraction.
    • Executing multi-label classification to predict genres from description data.
    • Clustering movies and shows based on their descriptions and genre attributes.
    • Aiding projects that require cross-content analysis across different media types.
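
    For the multi-label genre task mentioned above, the targets are usually prepared as multi-hot vectors, one slot per genre. The sketch below shows that encoding step only. It assumes the Genres column holds comma-separated values, which the source does not confirm, and the helper names are hypothetical.

```python
def build_genre_vocabulary(genre_strings):
    """Collect the sorted set of distinct genres across all titles."""
    vocab = set()
    for genres in genre_strings:
        vocab.update(g.strip() for g in genres.split(",") if g.strip())
    return sorted(vocab)


def multi_hot(genres, vocabulary):
    """Encode one comma-separated genre string as a 0/1 vector."""
    present = {g.strip() for g in genres.split(",")}
    return [1 if genre in present else 0 for genre in vocabulary]


# Made-up example values; the real delimiter in the Genres column may differ.
samples = ["Drama,Crime", "Comedy", "Crime,Thriller,Drama"]
vocabulary = build_genre_vocabulary(samples)
targets = [multi_hot(s, vocabulary) for s in samples]
print(vocabulary)
print(targets)
```

    Each row of targets lines up with vocabulary, so a classifier can predict every genre independently from the description text.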

    Coverage

    The dataset primarily covers movies and TV shows released from 1990 to 2023. Geographically, the data includes titles relevant to Canada, Great Britain, India, and the USA. There is no specific demographic scope mentioned beyond the inclusion of English-language titles. The dataset has specific filtering criteria for data availability based on rating scores and the number of votes, ensuring a focus on well-received or highly-engaged content.

    License

    CC0

    Who Can Use It

    This dataset is ideal for:

    • Data Scientists and Analysts: For conducting exploratory data analysis, building predictive models, and deriving insights into media consumption.
    • Machine Learning Engineers: For developing and training recommendation engines, NLP models, and classification algorithms.
    • Researchers: Studying trends in film and television, cross-media analysis, and content categorisation.
    • Developers: Creating applications that require rich movie and TV show data, such as content discovery platforms.
    • Academics and Students: For educational purposes, coursework, and research projects in data science, AI, and media studies.

    Dataset Name Suggestions

    • IMDb Films & Shows with Descriptions
    • Nineties and Beyond IMDb Data
    • Rotten Tomatoes-IMDb Integrated Dataset
    • Filtered IMDb Movies & TV Shows
    • Entertainment Content Analytics Dataset

    Attributes

    Original Data Source: IMDb Movies/Shows with Descriptions

  4. IMDb Movie Review Sentiment

    • kaggle.com
    Updated Dec 2, 2023
    Cite
    The Devastator (2023). IMDb Movie Review Sentiment [Dataset]. https://www.kaggle.com/datasets/thedevastator/imdb-movie-review-sentiment-dataset/suggestions?status=pending&yourSuggestions=true
    Dataset updated
    Dec 2, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    IMDb Movie Review Sentiment

    Movie Review Sentiment

    By imdb (From Huggingface) [source]

    About this dataset

    The IMDb Large Movie Review Dataset is a comprehensive collection of movie reviews used for sentiment classification. The dataset includes a wide range of movie reviews along with their corresponding sentiment labels, which indicate whether the review is positive or negative in nature. This invaluable dataset is aimed at facilitating sentiment analysis and classification tasks in the field of natural language processing.

    The main purpose of the train.csv file within this dataset is to provide a curated collection of movie reviews, each accompanied by its respective sentiment label. This file proves particularly useful for training machine learning models to accurately predict sentiment and classify reviews based on their emotional tone.

    Similarly, the test.csv file contains another set of movie reviews along with corresponding sentiment labels. Meant for testing and validating the performance of trained models, this dataset enables researchers and developers to evaluate their models' effectiveness in real-world scenarios.

    Additionally, the unsupervised.csv file offers an alternative subset within the dataset. Unlike train.csv and test.csv, unsupervised.csv does not include any associated sentiment labels for individual movie reviews. This specific subset serves as a valuable resource for exploring unsupervised learning techniques within the domain of sentiment classification.

    By utilizing this meticulously compiled IMDb Large Movie Review Dataset, researchers and data scientists can delve into various aspects related to analyzing sentiments in textual data. With its carefully labeled data points covering both positive and negative sentiments expressed in diverse film critiques, this dataset empowers users to develop sophisticated machine learning algorithms that accurately assess subjective opinions from text data.

    How to use the dataset

    Introduction:

    Dataset Overview:

    • train.csv: This file contains a set of movie reviews along with their sentiment labels. It is intended for training your sentiment analysis models.
    • test.csv: This file provides another set of movie reviews along with their corresponding sentiment labels. You can use this file to evaluate the performance of your trained models.
    • unsupervised.csv: This file includes movie reviews without any associated sentiment labels. It can be used for unsupervised sentiment classification tasks.

    Columns in the Dataset:

    • text: The main column containing the text of each movie review.
    • label: The sentiment label assigned to each review, indicating whether it is positive or negative.

    Guidelines for Using the Dataset:

    • Training Your Model:

      • Begin by loading and preprocessing the data from train.csv
      • Treat 'text' as your input feature and 'label' as your target variable
      • Explore different machine learning or deep learning algorithms suitable for text classification
      • Train your model using various techniques, such as bag-of-words, word embeddings, or transformers
      • Evaluate and fine-tune your model's performance using test.csv
    • Evaluating Your Model:

      • Load test.csv and preprocess the data similar to what you did with train.csv
      • Use this preprocessed test data to evaluate the accuracy, precision, recall, F1 score or other relevant metrics of your trained model on unseen data
      • Analyze these metrics to understand how well your model is performing in predicting sentiments
    • Advancing Your Model (Unsupervised Classification):

      • Utilize unsupervised.csv for unsupervised sentiment classification tasks
      • Preprocess the movie reviews in this file and explore techniques like clustering, topic modeling, or self-supervised learning
      • Extract patterns, themes, or sentiments from the reviews without any guidance from labeled data
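
    The evaluation step in the guidelines above names accuracy, precision, recall, and F1. A minimal sketch of how those metrics are computed from true and predicted labels follows. It assumes labels are encoded as 1 for positive and 0 for negative, which the source does not state, and the function name binary_metrics is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard every ratio against an empty denominator.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels standing in for the 'label' column of test.csv and a model's output.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

    In practice y_true would come from the label column of test.csv and y_pred from the trained model's predictions on the text column.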

    Conclusion:

    Research Ideas

    • Sentiment Analysis: This dataset can be used to train models for sentiment analysis, where the goal is to predict whether a movie review is positive or negative based on its text.
    • NLP Research: The dataset can be used for various natural language processing (NLP) tasks such as text classification, information extraction, or named entity recognition. Researchers and practitioners can leverage this dataset to develop and evaluate new algorithms and techniques in the field of NLP.
    • Recommendation Systems: The sentiment labels in this dataset can be used as a source of feedback or user preferences for recommendation systems. By analyzing the sentiments expressed in reviews,...
  5. Popular Movies of IMDb

    • opendatabay.com
    Updated Jun 9, 2025
    Cite
    Datasimple (2025). Popular Movies of IMDb [Dataset]. https://www.opendatabay.com/data/web-social/c9597b23-d205-46ff-abb3-674815373730
    Dataset updated
    Jun 9, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Entertainment & Media Consumption
    Description

    Introduction

    TMDB.org is a crowd-sourced movie information database used by many film-related consoles, sites and apps, such as XBMC, MythTV and Plex. Dozens of media managers, mobile apps and social sites make use of its API. TMDb lists some 80,000 films at time of writing, which is considerably fewer than IMDb. While not as complete as IMDb, it holds extensive information for most popular/Hollywood films. This dataset of the 10,000 most popular movies across the world was fetched through the read API. TMDb's free API lets developers and their teams programmatically fetch and use TMDb's data. The API is free to use as long as you attribute TMDb as the source of the data and/or images. Also, they update their API from time to time.

    This data set was fetched using an exception-handling process, so it contains some null values where fields are missing in the TMDb database. Still, it's good for a young analyst to deal with missing values. Hey analysts, are you all excited?

    Original Data Source: Popular Movies of IMDb

  6. Wikipedia Movie Plot Collection

    • opendatabay.com
    Updated Jul 8, 2025
    Cite
    Datasimple (2025). Wikipedia Movie Plot Collection [Dataset]. https://www.opendatabay.com/data/ai-ml/624e3736-74ea-4f5c-9ee5-fda14c16c770
    Dataset updated
    Jul 8, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Entertainment & Media Consumption
    Description

    This dataset contains movie plots extracted from Wikipedia, along with other key metadata. It is specifically curated for movies released between 1950 and 2023 that have accumulated over 1000 ratings on IMDb. The primary purpose of this dataset is to facilitate development in Large Language Models (LLMs) for applications such as movie searching or recommendation systems. The plot summaries have been meticulously cleaned to remove irrelevant elements like links and references, ensuring a pure text value. Where Wikipedia plots were unavailable, IMDb synopses were used as a fallback. The dataset includes 89% of movies with detailed plot information, while 100% include a short summary untouched from Wikipedia, which is useful for matching metadata in retriever applications. Columns like 'stars', 'directors', and 'genres' are provided as lists of values, making them suitable for direct loading into vector databases.

    Columns

    • title: The title of the film, presented in lowercase.
    • stars: The names of the actors featured in the film, also in lowercase.
    • directors: The names of the film's directors, in lowercase.
    • year: The year when the movie was released.
    • genre: The genres associated with the film, listed in lowercase.
    • runtime: The duration of the film, measured in minutes.
    • ratingCount: An indication of the film's popularity, showing the number of people who have rated it on IMDb.
    • plot: Detailed storyline of the film.
    • summary: A short overview and additional details about the film.
    • imdb_rating: The film's rating on IMDb, on a scale of 1 to 10.

    Distribution

    The data file is typically in CSV format. The dataset spans movies released from 1950 up to 2023. There are 20,617 unique movie titles, 21,596 unique star names, and 9,863 unique director names. The genres column contains 21,675 unique values. Movie runtimes range from -1 to 776 minutes, with a significant majority (17,433 entries) falling between 76.70 and 115.55 minutes. The number of ratings (ratingCount) varies widely, starting from 1,001 and going up to 2.73 million. IMDb ratings range from 1.2 to 9.3. While specific total row/record counts are not available, the distribution data for year, runtime, ratingCount, and imdb_rating show various value counts within different ranges.
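
    The runtime range quoted above starts at -1, which is not a plausible duration. A minimal sketch of one way to handle such values before computing statistics is shown below. It assumes that non-positive runtimes mark missing data, which the source does not confirm, and the helper name clean_runtime is hypothetical.

```python
from statistics import median


def clean_runtime(raw):
    """Convert a raw runtime value to minutes, or None if it is not usable.

    Assumption: non-positive values such as -1 mark a missing runtime.
    """
    try:
        minutes = float(raw)
    except (TypeError, ValueError):
        return None  # empty or non-numeric cell
    return minutes if minutes > 0 else None


# Made-up values covering the extremes quoted above (-1 and 776).
raw_runtimes = ["95", "-1", "776", "", "102.5"]
cleaned = [clean_runtime(r) for r in raw_runtimes]
valid = [m for m in cleaned if m is not None]
print(cleaned)
print(median(valid))
```

    Using the median rather than the mean keeps a single extreme value, such as the 776-minute entry, from dominating the summary.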

    Usage

    This dataset is ideal for:

    • Developing demonstration projects leveraging Large Language Models (LLMs).
    • Creating movie search applications, such as cinemattr.ca.
    • Building retriever applications where the 'summary' column can be used for metadata matching.
    • Populating vector databases with structured information from 'stars', 'directors', and 'genres' for advanced querying and analysis.

    Coverage

    The dataset's geographic scope is global. It includes movies released within the time frame of 1950 to 2023. The data availability specifies that 89% of the movies have detailed plot information, and all movies (100%) include a short summary. The dataset focuses on films with more than 1000 ratings on IMDb.

    License

    CC0

    Who Can Use It

    This dataset is suitable for:

    • AI and machine learning developers who are building models based on natural language processing.
    • Data scientists and researchers interested in film data and entertainment analytics.
    • Software engineers developing applications that require movie plot summaries or metadata, such as recommendation engines.
    • Students and enthusiasts looking for high-quality, pre-processed text data for LLM projects.

    Dataset Name Suggestions

    • IMDb Verified Movie Plots
    • Historical Film Summaries (1950-2023)
    • Wikipedia Movie Plot Collection
    • LLM-Ready Movie Dataset
    • Global Cinema Plot Archive

    Attributes

    Original Data Source: Movie Plots from Wikipedia

  7. Relational In-Memory Database Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 28, 2025
    Cite
    Data Insights Market (2025). Relational In-Memory Database Report [Dataset]. https://www.datainsightsmarket.com/reports/relational-in-memory-database-1978756
    Available download formats: doc, pdf, ppt
    Dataset updated
    May 28, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The relational in-memory database (IMDB) market is experiencing robust growth, driven by the increasing demand for real-time analytics and applications requiring ultra-low latency data processing. The market, estimated at $15 billion in 2025, is projected to expand at a Compound Annual Growth Rate (CAGR) of 18% between 2025 and 2033, reaching approximately $60 billion by 2033. This growth is fueled by several key factors. Firstly, the rise of big data and the need for faster insights across various sectors like finance, healthcare, and telecommunications are propelling adoption. Secondly, advancements in technology, such as improved memory capacity and processing power, are making IMDBs more affordable and efficient. Finally, cloud computing platforms are playing a significant role, offering scalable and cost-effective IMDB solutions. Major players like Microsoft, IBM, Oracle, and Amazon are investing heavily in this space, leading to increased competition and innovation. While the market faces challenges such as data security concerns and the complexity of integrating IMDBs into existing systems, these are likely to be mitigated by continuous technological advancements and increasing industry expertise. Despite the overall positive outlook, market segmentation reveals distinct growth patterns. The financial services sector is currently the largest adopter of IMDB technology, followed by the telecommunications and healthcare industries. Geographic distribution shows that North America and Europe currently hold the largest market shares, but significant growth is anticipated in Asia-Pacific regions due to increasing digitalization and data generation. Challenges remain in ensuring data consistency and managing the potential cost overhead associated with high-memory requirements. 
    However, the continuous development of efficient memory management techniques and the integration of IMDBs with advanced analytics tools are likely to address these concerns and further fuel market expansion. The long-term outlook for the relational in-memory database market remains exceptionally promising, suggesting consistent high-growth potential well into the next decade.
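
    The growth figures in this description can be checked with the standard compound-growth formula, end = start × (1 + CAGR)^years. The sketch below is a worked calculation only. It assumes eight compounding periods (2025 as the base year through 2033), and the function names are hypothetical.

```python
def project_value(start_value, cagr, years):
    """Compound a starting value forward at a constant annual growth rate."""
    return start_value * (1 + cagr) ** years


def implied_cagr(start_value, end_value, years):
    """Solve for the constant annual growth rate linking two values."""
    return (end_value / start_value) ** (1 / years) - 1


years = 2033 - 2025  # assumption: 8 compounding periods
projected = project_value(15.0, 0.18, years)  # USD billions
required = implied_cagr(15.0, 60.0, years)
print(f"Projected 2033 size at 18% CAGR: ${projected:.1f}B")
print(f"CAGR needed to reach $60B: {required:.1%}")
```

    Under these assumptions, 18% growth from $15 billion gives roughly $56.4 billion in 2033, and reaching $60 billion would need about 18.9% per year, so the report's "approximately $60 billion" is a rounded figure.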

  8. Popular Movies of IMDb

    • kaggle.com
    Updated Jul 9, 2025
    Cite
    Sankha Subhra Mondal (2025). Popular Movies of IMDb [Dataset]. https://www.kaggle.com/sankha1998/tmdb-top-10000-popular-movies-dataset/notebooks
    Dataset updated
    Jul 9, 2025
    Dataset provided by
    Kaggle
    Authors
    Sankha Subhra Mondal
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Introduction

    TMDB.org is a crowd-sourced movie information database used by many film-related consoles, sites and apps, such as XBMC, MythTV and Plex. Dozens of media managers, mobile apps and social sites make use of its API. TMDb lists some 80,000 films at time of writing, which is considerably fewer than IMDb. While not as complete as IMDb, it holds extensive information for most popular/Hollywood films. This dataset of the 10,000 most popular movies across the world was fetched through the read API. TMDb's free API lets developers and their teams programmatically fetch and use TMDb's data. The API is free to use as long as you attribute TMDb as the source of the data and/or images. Also, they update their API from time to time.

    This data set was fetched using an exception-handling process, so it contains some null values where fields are missing in the TMDb database. Still, it's good for a young analyst to deal with missing values.
    Hey analysts, are you all excited?
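
    The description above says missing fields ended up as null values because the data was fetched with exception handling. A minimal sketch of that pattern follows. It does not call the TMDb API: the field names, the sample records, and the helper names are all hypothetical and stand in for whatever the author actually used.

```python
# Hypothetical field names, chosen for illustration only.
FIELDS = ("title", "release_date", "vote_average", "overview")


def extract_record(raw):
    """Pull the expected fields from one raw record, using None for gaps."""
    record = {}
    for field in FIELDS:
        try:
            record[field] = raw[field]
        except KeyError:
            record[field] = None  # the field is absent in the source record
    return record


def count_missing(records):
    """Count None values per field so gaps can be reviewed before analysis."""
    return {f: sum(1 for r in records if r[f] is None) for f in FIELDS}


# Made-up records; they are not taken from TMDb.
raw_records = [
    {"title": "Example A", "release_date": "2020-01-01",
     "vote_average": 7.1, "overview": "Placeholder text"},
    {"title": "Example B", "vote_average": 6.4},
]
records = [extract_record(r) for r in raw_records]
missing = count_missing(records)
print(missing)
```

    Counting the gaps per field first makes it easier to decide whether to drop rows, fill defaults, or leave the nulls in place.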

  9. Full Netflix Dataset

    • kaggle.com
    Updated Jun 9, 2025
    Cite
    OctopusTeam (2025). Full Netflix Dataset [Dataset]. https://www.kaggle.com/datasets/octopusteam/full-netflix-dataset
    Dataset updated
    Jun 9, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    OctopusTeam
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset provides a comprehensive collection of all titles (Movies and TV Series) available on Netflix. In addition to basic information, it includes IMDb-specific data like IMDb ID, Average Rating, and Number of Votes.

    The dataset is updated daily at 10:00 AM CET. If you find this dataset helpful, feel free to give it an upvote! 😊

    You can find all our APIs, maintained and developed by us, at the following link: octopusteam.dev. These APIs provide access to various features and data, ensuring high-quality and reliable integration options for your needs.

    All Datasets:

  10. In Memory Database Industry Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Feb 15, 2025
    Cite
    Data Insights Market (2025). In Memory Database Industry Report [Dataset]. https://www.datainsightsmarket.com/reports/in-memory-database-industry-13053
    Available download formats: ppt, doc, pdf
    Dataset updated
    Feb 15, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Market Overview: The global in-memory database (IMDB) market is poised for substantial growth, with a projected CAGR of 19.00% from 2025 to 2033. The market size, valued at XX million in 2025, is attributed to the increasing adoption of IMDBs in various industries, including telecommunications, BFSI, logistics, retail, entertainment, and healthcare. Key drivers behind this growth include the need for real-time data processing, improved performance, and the rise of big data and analytics.

    Market Dynamics: The IMDB market is influenced by several trends and challenges. The growing adoption of cloud-based IMDB solutions is a key trend, as it provides flexibility and cost-effectiveness. However, security concerns and latency issues associated with cloud-based deployments pose challenges. Additionally, the increasing demand for high-performance computing and the need for faster data processing are driving the development of advanced IMDB technologies. The market is fragmented, with established players such as IBM, Oracle, and Microsoft competing alongside emerging startups like VoltDB and MemSQL. Regional variations in market maturity and adoption rates are also observed, with North America leading the way in terms of market penetration.

    Recent developments include:

    • May 2022: IBM and SAP announced the extension of their collaboration as IBM embarks on a corporate transformation initiative to optimize its business operations using RISE and SAP S/4HANA Cloud. To execute work for over 1,000 legal entities in more than 120 countries and multiple IBM companies supporting hardware, software, consulting, and finance, IBM said it is transferring to SAP S/4HANA, SAP's most recent ERP system, as part of the extended relationship. The replacement for SAP R/3 and SAP ERP, SAP S/4HANA, is SAP's ERP system for large businesses. It is intended to work optimally with SAP's in-memory database, SAP HANA.
    • November 2022: Redis, a provider of real-time in-memory databases, and Amazon Web Services have announced a multi-year strategic alliance. Redis is a networked, open-source NoSQL system that stores data on disk for durability before moving it to DRAM as necessary. It can function as a streaming engine, message broker, database, or cache. The business claims that when Redis is used as a database, apps may instantly search across tens of millions of rows of customer data to locate information specific to one particular customer. A managed database-as-a-service product on AWS is called the real-time Redis Enterprise Cloud.
    • December 2022: The National Stock Exchange, the largest stock exchange in India, chose the Raima Database Manager (RDM) Workgroup 12.0 in-memory system as a foundational component for the next iterations of its trading platform front-end, the National Exchange for Automated Trading (NEAT).

    Key drivers for this market are: Decreasing Hardware Cost, Increasing Penetration Of Trends Like Big Data And IOT; Increase In The Volume Of Data Generated And Shift Of Enterprise Operations.

    Potential restraints include: Resilience In Integration With VLDB'S.

    Notable trends are: Telecommunication End-User Industry to Hold Significant Market Share.

  11. DataSheet1_Quantifying Award Network and Career Development in the Movie...

    • frontiersin.figshare.com
    pdf
    Updated Jun 3, 2023
    Cite
    Yixuan Liu; Yifang Ma (2023). DataSheet1_Quantifying Award Network and Career Development in the Movie Industry.pdf [Dataset]. http://doi.org/10.3389/fphy.2022.902890.s001
    Available download formats: pdf
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Yixuan Liu; Yifang Ma
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In show business, awards are conferred to persons and films to provide incentives to performers’ future career development through periodic film festivals and events. In this work, we focused on exploring the growth and dynamics of the film award system, the structure of the award network, and the relationships between historical performance, collaborations, and future career success of performers in the movie industry. We collected data from IMDb, which covers more than 3.5K movie events for 520K individuals with their award-winning and career records for over 90 years. By using network analysis and regression models, we find several novel results. At first, we found the exponential proliferation of awards across all genres of films and all professions of individuals and the uneven distribution of the number of awards in careers across time. More than 30% of the performers have won multiple awards. Second, we built an award network to reveal the interlocks between awards based on multiple award-winning phenomena. We found that for prestigious awards, 47% of the linkages were over-representative than the expectations from the null model. Furthermore, the performers’ collaboration network was highly clustered, exhibiting a high propensity of linkages between awarded performers. Lastly, our regression models revealed that multiple factors were related to performers’ early career success and award winning. Specifically, we showed that along with the performers’ historical achievements, their collaborators serve an important role in award winning after being nominated, with the scope and depth of the impact differing in the awards’ prestige. This work has strong implications for the harmonious dynamics of the movie industry and the career development of performers.

  12. o

    Oppenheimer IMDb reviews

    • opendatabay.com
    Updated Jun 28, 2025
    Cite
    Datasimple (2025). Oppenheimer IMDb reviews [Dataset]. https://www.opendatabay.com/data/ai-ml/5fff8d2c-4db6-426f-9a39-64d7daa3059e
    Dataset updated
    Jun 28, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Entertainment & Media Consumption
    Description

    "Oppenheimer," directed by the legendary Christopher Nolan, is set to grace theaters on July 21, 2023. This cinematic masterpiece offers an enthralling journey into history, recounting the extraordinary life of J. Robert Oppenheimer, a pivotal figure in the development of the atomic bomb during World War II.

    License

    CC0

    Original Data Source: Oppenheimer IMDb reviews

  13. FiveThirtyEight Biopics Dataset

    • kaggle.com
    Updated Mar 26, 2019
    Cite
    FiveThirtyEight (2019). FiveThirtyEight Biopics Dataset [Dataset]. https://www.kaggle.com/fivethirtyeight/fivethirtyeight-biopics-dataset/discussion
    Dataset updated
    Mar 26, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    FiveThirtyEight
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Content

    Biopics

    This folder contains the data behind the story 'Straight Outta Compton' Is The Rare Biopic Not About White Dudes.

    biopics.csv contains the following variables:

    Variable: Definition

    • title: Title of the film.
    • site: URL from IMDB.
    • country: Country of origin.
    • year_released: Year of release.
    • box_office: Gross earnings at U.S. box office.
    • director: Director of film.
    • number_of_subjects: The number of subjects featured in the film.
    • subject: The actual name of the featured subject.
    • type_of_subject: The occupation of subject or reason for recognition.
    • race_known: Indicates whether the subject’s race was discernible based on background of self, parent, or grandparent.
    • subject_race: Race of the subject.
    • person_of_color: Dummy variable that indicates person of color.
    • subject_sex: Sex of subject.
    • lead_actor_actress: The actor or actress who played the subject.
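    As a rough illustration of how the variables listed above might be used, the sketch below reads rows with Python's standard csv module and counts them by one column. The two sample rows are hypothetical placeholders rather than records from biopics.csv, the helper name count_by is made up for this example, and the 0/1 encoding of person_of_color is an assumption.

```python
import csv
import io
from collections import Counter

# Hypothetical sample using a few of the column names listed above.
# The values are placeholders, not rows from the real biopics.csv,
# and the 0/1 coding of person_of_color is an assumption.
SAMPLE = """title,year_released,subject,person_of_color,subject_sex
Example Film A,1999,Subject One,0,Male
Example Film B,2005,Subject Two,1,Female
"""


def count_by(rows, column):
    """Count how many rows fall under each distinct value of `column`."""
    return Counter(row[column] for row in rows)


# DictReader maps each row to a dict keyed by the header names.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
by_sex = count_by(rows, "subject_sex")
print(dict(by_sex))
```

    To run this against the real file, one would replace io.StringIO(SAMPLE) with an open file handle for biopics.csv, after checking the actual header names and value encodings.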

    Source: IMDb.

    Context

    This is a dataset from FiveThirtyEight hosted on their GitHub. Explore FiveThirtyEight data using Kaggle and all of the data sources available through the FiveThirtyEight organization page!

    • Update Frequency: This dataset is updated daily.

    Acknowledgements

    This dataset is maintained using GitHub's API and Kaggle's API.

    This dataset is distributed under the Attribution 4.0 International (CC BY 4.0) license.

    Cover photo by Denisse Leon on Unsplash
    Unsplash Images are distributed under a unique Unsplash License.

  14. c

    Amazon prime tv shows and movies dataset

    • crawlfeeds.com
    csv, zip
    Updated Jul 4, 2025
    Cite
    Crawl Feeds (2025). Amazon prime tv shows and movies dataset [Dataset]. https://crawlfeeds.com/datasets/amazon-prime-tv-shows-and-movies-dataset
    Available download formats: zip, csv
    Dataset updated
    Jul 4, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    Amazon Prime TV Shows and Movies Dataset offered by Crawl Feeds is an extensive resource containing over 92,000 records in JSON format. This dataset encompasses a wide array of data points, including links, titles, descriptions, release dates, genres, posters, streaming platforms, countries, number of seasons, content ratings, IMDb ratings, cast and crew details, unique identifiers, and scraping timestamps. Such comprehensive information is invaluable for researchers, data analysts, and developers aiming to conduct in-depth analyses, develop recommendation systems, or explore trends within Amazon Prime's content library.

    For those interested in broader media datasets, Crawl Feeds also offers the Movies and TV Shows Dataset, which includes 118,000 records, and the IMDb Movie Details Dataset, comprising 250,000 records. These datasets provide extensive information across various platforms, facilitating comparative studies and cross-platform analyses.

    Integrating these datasets into your projects can significantly enhance the depth and quality of your analyses, providing a robust foundation for exploring various facets of the entertainment industry. Whether you're developing a new application, conducting market research, or performing academic studies, these datasets serve as a valuable resource for gaining insights into the dynamic world of streaming media.

    Explore the Amazon Prime TV Shows and Movies Dataset and other related datasets on Crawl Feeds to elevate your data-driven projects.

  15. h

    imdb_ckb

    • huggingface.co
    Updated Aug 29, 2009
    Cite
    Razhan Hameed (2009). imdb_ckb [Dataset]. https://huggingface.co/datasets/razhan/imdb_ckb
    Dataset updated
    Aug 29, 2009
    Authors
    Razhan Hameed
    License

    https://choosealicense.com/licenses/other/

    Description

    Dataset Card for IMDB Kurdish

    Dataset Summary

    Central Kurdish translation of the famous IMDB movie reviews dataset. The dataset contains 50K highly polar movie reviews, divided into two equal classes of positive and negative reviews. We can perform binary sentiment classification using this dataset. The availability of datasets in Kurdish, such as the IMDB movie reviews dataset, can help researchers and developers train and evaluate machine learning models for Kurdish… See the full description on the dataset page: https://huggingface.co/datasets/razhan/imdb_ckb.

  16. o

    Oppenheimer Film Audience Sentiment Data

    • opendatabay.com
    Updated Jul 8, 2025
    Cite
    Datasimple (2025). Oppenheimer Film Audience Sentiment Data [Dataset]. https://www.opendatabay.com/data/consumer/5fff8d2c-4db6-426f-9a39-64d7daa3059e
    Dataset updated
    Jul 8, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Entertainment & Media Consumption
    Description

    This dataset provides IMDb user reviews for Christopher Nolan's highly anticipated film "Oppenheimer," which premiered on July 21, 2023. The film offers an engaging journey into history, recounting the extraordinary life of J. Robert Oppenheimer, a pivotal figure in the development of the atomic bomb during World War II. This collection of reviews allows for an insightful examination of public sentiment and audience reactions to this cinematic masterpiece.

    Columns

    • Title: The title given by the user to their review.
    • Rating: The numerical score assigned by the user, expressed out of a maximum of 10.0.
    • Review: The full textual content of the user's opinion or critique.

    Distribution

    The dataset is presented in a tabular format, comprising individual user reviews linked with their respective ratings. It contains 2445 entries or rows. The ratings span from 1.00 to 10.00, with a significant proportion of scores concentrated in the higher ranges. While specific file type details are not provided, data files of this nature are typically available in formats such as CSV.
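    A minimal sketch of how the rating distribution described above could be summarised, assuming the data is delivered as a CSV with the Title, Rating and Review columns listed earlier. The sample rows, the rating_summary helper and the high_cutoff threshold of 8.0 are all hypothetical choices made for illustration.

```python
import csv
import io
from statistics import mean

# Placeholder rows that follow the Title / Rating / Review columns above.
# They are not taken from the actual dataset.
SAMPLE = """Title,Rating,Review
Placeholder title one,9.0,Placeholder review text.
Placeholder title two,4.0,Another placeholder review.
Placeholder title three,10.0,A third placeholder review.
"""


def rating_summary(rows, high_cutoff=8.0):
    """Return the count, mean rating, and share of ratings >= high_cutoff."""
    # Skip rows with an empty Rating field instead of failing on float("").
    ratings = [float(r["Rating"]) for r in rows if r["Rating"]]
    high = sum(1 for x in ratings if x >= high_cutoff)
    return {
        "count": len(ratings),
        "mean": mean(ratings),
        "high_share": high / len(ratings),
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
summary = rating_summary(rows)
print(summary)
```

    The high_share figure is one simple way to check the claim that scores concentrate in the higher ranges; a full histogram would give a finer picture.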

    Usage

    This dataset is ideally suited for:

    • Analysing audience sentiment and public opinion regarding the film "Oppenheimer."
    • Performing Natural Language Processing (NLP) tasks on unstructured movie review text, such as topic modelling or entity extraction.
    • Developing and evaluating sentiment analysis models to predict review polarity.
    • Visualising movie ratings distribution and identifying trends in audience reception.
    • Academic and market research into film criticism, audience engagement, and the public's response to historical dramas.

    Coverage

    • Geographic Scope: Global, reflecting the worldwide reach of IMDb users.
    • Time Range: The reviews are specifically for the film "Oppenheimer," released on July 21, 2023. The dataset itself was listed on 27 June 2025, suggesting it captures reactions around or after its release.
    • Demographic Scope: Comprises user-submitted reviews from the IMDb platform; specific demographic breakdowns of reviewers are not included in the dataset details.

    License

    CC0

    Who Can Use It

    • Data Scientists and Analysts: To train machine learning models for text classification, sentiment analysis, and recommender systems.
    • Film Critics and Researchers: To gain deeper insights into audience perceptions, identify recurring themes in feedback, and study the social impact of historical films.
    • Academics: For studies on online review platforms, collective intelligence, and the dynamics of public opinion in entertainment.
    • Marketing and Media Professionals: To understand audience reception, inform promotional strategies, and identify key discussion points surrounding the film.

    Dataset Name Suggestions

    • Oppenheimer IMDb Reviews Dataset
    • Oppenheimer Film Audience Sentiment Data
    • Christopher Nolan's Oppenheimer User Reviews
    • Movie Ratings: Oppenheimer Audience Feedback

    Attributes

    Original Data Source: Oppenheimer IMDb reviews

  17. 350 000+ movies from themoviedb.org

    • kaggle.com
    zip
    Updated Oct 12, 2017
    Cite
    Stephanerappeneau (2017). 350 000+ movies from themoviedb.org [Dataset]. https://www.kaggle.com/stephanerappeneau/350-000-movies-from-themoviedborg
    Available download formats: zip (70483259 bytes)
    Dataset updated
    Oct 12, 2017
    Authors
    Stephanerappeneau
    Description

    Context

    I love movies.

    I tend to avoid marvel-transformers-standardized products, and prefer a mix of classic hollywood-golden-age and obscure polish artsy movies. Throw in an occasional japanese-zombie-slasher-giallo as an alibi. Good movies don't exist without bad movies.

    On average I watch 200+ movies each year, with peaks at more than 500 movies. Nine years ago I started to log my movies to avoid watching the same movie twice, and also assign scores. Over the years, it gave me a couple insights on my viewing habits but nothing more than what a tenth-grader would learn at school.

    I've recently subscribed to Netflix and it pains me to see the global inefficiency of recommendation systems for people like me, who mostly swear by "La politique des auteurs". It's a term coined by famous new-wave French movie critic André Bazin, meaning that the quality of a movie is essentially linked to the director and his capacity to execute his vision with his crew. We could debate whether it depends on the movie production pipeline, but let's not for now. Practically, what it means is that I essentially watch movies from directors who made films I've liked.

    I suspect Netflix calibrates its recommendation models taking into account the way the "average-joe" chooses a movie. A few months ago I read a study based on a survey, showing that people chose a movie mostly based on genre (55%), then by leading actors (45%). Director or Release Date were far behind, around 10% each. It is not surprising, since most people I know don't care who the director is. Lots of US blockbusters don't even mention it on the movie poster. I am aware that collaborative filtering is based on user proximity, which I believe decreases (or even eliminates) the need to characterize a movie. So here I'm more interested in content-based filtering, which is based on product proximity, for several reasons:

    • Users' tastes are not easily accessible. It is, after all, Netflix's treasure chest

    • Movie offer on Netflix is so bad for someone who likes author's movies that it wouldn't help

    • Modeling a movie's intrinsic qualities is a nice challenge

    Enough.

    "The secret of getting ahead is getting started" (Mark Twain)

    [Figure: network graph (https://img11.hostingpics.net/pics/117765networkgraph.png)]

    Content

    The primary source is www.themoviedb.org. If you watch obscure artsy romanian homemade movies you may find only 95% of your movies referenced...but for anyone else it should be in the 98%+ range.

    Here is an overview of the available sources that I've tried:

    • Imdb.com free csv dumps (ftp://ftp.funet.fi/pub/mirrors/ftp.imdb.com/pub/temporaryaccess/) are badly documented, incomplete, loosely structured and impossible to join/merge. There's an API hosted by Amazon Web Services: 1€ every 100 000 requests. With around 1 million movies, it could become expensive; also, features are bare. So I've searched for other sources.

    • www.themoviedb.org is based on crowdsourcing and has an excellent API, limited to 40 requests every 10 seconds. It is quite generous, well documented, and enough to sweep the 450 000 movies in a few days. For my purpose, data quality is not significantly worse than imdb, and as imdb key is also included there's always the possibility to complete my dataset later (I actually did it)

    • www.Boxofficemojo.com has some interesting budget/revenue figures (which are sorely lacking in both imdb & tmdb), but it actually tracks only a few thousand movies, mainly blockbusters. There are other professional sources that are used by film industry to get better predictive / marketing insights but that's beyond my reach for this experiment.

    • www.wikipedia.com is an interesting source with no real cap on API calls; however, it requires a bit of web scraping, and for movies or directors the layout and quality vary a lot, so I suspected it'd take a lot of work to get insights and I put this source in lower priority.

    • www.google.com will ban you after a few minutes of web scraping because their job is to scrape data from others, then sell it, duh.

    • It's worth mentioning that there are a few dumps of Netflix anonymized user tastes on kaggle, because they've organised a few competitions to improve their recommendation models. https://www.kaggle.com/netflix-inc/netflix-prize-data

    • Online databases are largely white anglo-saxon centric, meaning bollywood (India is the 2nd biggest producer of movies) offer is mostly absent from datasets. I'm fine with that, as it's not my cup of tea plus I lack domain knowledge. The sheer amount of indian movies would probably skew my results anyway (I don't want to have too many martial-arts-musicals in my recommendations ;-)). I have, however, tremendous respect for the indian movie industry so I'd love to collaborate with an indian cinephile !

    [Figure: Westerns (https://img11.hostingpics.net/pics/340226westerns.png)]
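    The themoviedb.org limit quoted above (40 requests every 10 seconds) can be respected with a small client-side limiter. The sketch below is only an illustration under stated assumptions: the class SlidingWindowLimiter and the function min_sweep_hours are hypothetical names, no real TMDB endpoint is called, and the estimate assumes one request per movie unless told otherwise.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `period`-second window."""

    def __init__(self, max_calls=40, period=10.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock      # injectable so the limiter can be tested
        self.sleep = sleep
        self.calls = deque()    # timestamps of calls inside the window

    def _drop_expired(self, now):
        # Forget calls that have left the sliding window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()

    def wait(self):
        """Block until one more call is allowed, then record it."""
        now = self.clock()
        self._drop_expired(now)
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self._drop_expired(now)
        self.calls.append(now)


def min_sweep_hours(n_movies, max_calls=40, period=10.0, requests_per_movie=1):
    """Lower bound on sweep time in hours at the maximum allowed rate."""
    return n_movies * requests_per_movie * period / max_calls / 3600


print(min_sweep_hours(450_000))  # 31.25 hours at one request per movie
```

    Calling limiter.wait() before each API request keeps the client under the cap. At one request per movie the 450 000 movies would take a little over a day at full speed; the "few days" mentioned above is consistent with needing several requests per movie, although the exact number is not stated.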

    Inspiration

    Starting from there, I had multiple problem statements for both supervised / unsupervised machine learning

    • Can I program a tailored-recommendation system based on my own criteria ?

    • What are the characteristics of movies/directors I like the most ?

    • What is the probability that I will like my next movie ?

    • Can I find the data ?

    One of the objectives of sharing my work here is to find cinephile data-scientists who might be interested and, hopefully, contribute or share insights :) Other interesting leads : use tagline for NLP/Clustering/Genre guessing, leverage on budget/revenue, link with other data sources using the imdb normalized title, etc.

    [Figure: Correlation matrix (https://img11.hostingpics.net/pics/977004matrice.png)]

    Motivation, Disclaimer and Acknowledgements

    • I graduated from a French engineering school, majoring in artificial intelligence, but that was 17 years ago, right in the middle of the A.I. winter. Like a lot of white male rocket scientists, I ended up in one of the leading European investment banks, quickly abandoning IT development to specialize in trading/risk project management and internal politics. My recent appointment in the Data Office made me aware of recent breakthroughs in data science, and I thought that developing a side project would be an excellent occasion to learn something new. Plus it'd give me a much-needed credibility, which decision makers too often lack when it comes to data science.

    • I've worked on some of the features with Cédric Paternotte, a friend of mine who is a professor of philosophy of sciences at La Sorbonne. Working with someone with a different background seemed a good idea for motivation, creativity and rigor.

    • Kudos to www.themoviedb.org or www.wikipedia.com sites, who really have a great attitude towards open data. This is typically NOT the case of modern-bigdata companies who mostly keep data to themselves to try to monetize it. Such a huge contrast with imdb or instagram API, which generously let you grab your last 3 comments at a miserable rate. Even if 15 years ago this seemed a mandatory path to get services for free, I predict one day governments will need to break this data monopoly.

    [Disclaimer : I apologize in advance for my engrish (I'm french ^-^), any bad-code I've written (there are probably hundreds of way to do it better and faster), any pseudo-scientific assumption I've made, I'm slowly getting back in statistics and lack senior guidance, one day I regress a non-stationary time series and the day after I'll discover I shouldn't have, and any incorrect use of machine-learning models]

    [Figure: powered by themoviedb.org (https://img11.hostingpics.net/pics/898068408x161poweredbyrectanglegreen.png)]

  18. Z

    Data from: ACTIV-ES: a comparable Spanish corpus comprised of film dialogue...

    • data.niaid.nih.gov
    • live.european-language-grid.eu
    Updated Jan 24, 2020
    Cite
    Jerid Francom (2020). ACTIV-ES: a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1492612
    Dataset updated
    Jan 24, 2020
    Dataset authored and provided by
    Jerid Francom
    License

    https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html

    Description

    DESCRIPTION: ACTIV-ES is a comparable Spanish corpus comprised of film dialogue from Argentine, Mexican and Spanish productions. Titles for each of these three countries were seeded from the Internet Movie Database, subtitle data for the hearing impaired was provided by Opensubtitles.org and was post-processed to correct/remove subtitle, OCR and diacritic artifacts and annotated for part-of-speech.

    The data is available in two main formats: 1) running text for each document and 2) 1:5 gram aggregate files. Each format includes a plain text and part-of-speech annotated version. Document names reflect the language code, country, year, title, type, genre (first genre listed in the IMDb), and IMDb ID.

    For more information about the development and evaluation of these resources and to cite this work refer to:

    Francom, J., Hulden, M. and Ussishkin, A.. (2014) ACTIV-ES: a comparable, cross-dialect corpus of 'everyday' Spanish from Argentina, Mexico, and Spain. In Proceedings of the Ninth Annual Language Resources and Evaluation Conference, Reykjavik, Iceland. European Language Resources Association (ELRA).

    In version .02, a tagged running-format version of the corpus has been added in the /eagles directory, which uses the EAGLES tagset. This tagset is much more fleshed out than the simplified tagset in the /tagged directory. For information on the tagset refer here: http://nlp.lsi.upc.edu/freeling/doc/tagsets/tagset-es.html.

  19. Yelp Text Sentiment Analysis 2015

    • kaggle.com
    Updated Feb 16, 2024
    Cite
    Dev Makwana (2024). Yelp Text Sentiment Analysis 2015 [Dataset]. https://www.kaggle.com/datasets/channingfisher/yelp-text-sentiment-analysis-2015
    Dataset updated
    Feb 16, 2024
    Dataset provided by
    Kaggle
    Authors
    Dev Makwana
    License

    https://cdla.io/sharing-1-0/

    Description

    This dataset contains sentences labelled with positive or negative sentiment, extracted from reviews of products, movies, and restaurants.

    UPDATE: Newer Version includes similar data from amazon, imdb.

    Format:

    sentence \t score

    =======

    Details:

    Score is either 1 (for positive) or 0 (for negative)
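    The "sentence \t score" layout described above can be read with a few lines of Python. This is a sketch under assumptions: the function name parse_labelled_lines is hypothetical, the sample sentences are placeholders rather than lines from the dataset, and splitting on the last tab is a defensive choice in case a sentence itself contains a tab.

```python
def parse_labelled_lines(lines):
    """Parse 'sentence<TAB>score' lines into (sentence, label) tuples."""
    examples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the LAST tab so tabs inside the sentence are preserved.
        sentence, _, score = line.rpartition("\t")
        score = score.strip()
        if score not in {"0", "1"}:
            raise ValueError(f"unexpected score field in line: {line!r}")
        examples.append((sentence.strip(), int(score)))
    return examples


# Placeholder sentences, not taken from the dataset.
sample = ["Great place.\t1\n", "\n", "Slow service.\t0\n"]
parsed = parse_labelled_lines(sample)
print(parsed)
```

    With a real file, passing the open file object (which iterates line by line) in place of the sample list should work the same way.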

    The source for these sentences is: yelp.com

    This dataset is an extract of a dataset created for the paper 'From Group to Individual Labels using Deep Features', Kotzias et al., KDD 2015.

