6 datasets found
  1. imdb_reviews

    • tensorflow.org
    Updated Sep 20, 2024
    Cite
    (2024). imdb_reviews [Dataset]. https://www.tensorflow.org/datasets/catalog/imdb_reviews
    Description

    Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the labeled training split and print the first four examples.
    ds = tfds.load('imdb_reviews', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.
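
For classification work it can be handier to read each review as a (text, label) pair than as a feature dictionary. The following is a minimal sketch, not part of the catalog page: the helper names `load_imdb_pairs` and `label_counts` are made up for illustration, and it assumes `tensorflow_datasets` is installed and that the dataset can be downloaded on first use. `as_supervised=True` and `tfds.as_numpy` are existing tensorflow_datasets options.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def load_imdb_pairs(split: str = "train", limit: int = 100) -> Iterator[Tuple[bytes, int]]:
    """Yield up to `limit` (review_text, label) pairs from imdb_reviews.

    Hypothetical helper: tensorflow_datasets is imported lazily, so nothing
    is downloaded until the generator is actually consumed.
    """
    import tensorflow_datasets as tfds

    # as_supervised=True yields (text, label) tuples instead of dicts.
    ds = tfds.load("imdb_reviews", split=split, as_supervised=True)
    for text, label in tfds.as_numpy(ds.take(limit)):
        yield text, int(label)


def label_counts(pairs: Iterable[Tuple[bytes, int]]) -> Counter:
    """Count how many examples carry each integer label."""
    return Counter(label for _, label in pairs)
```

Calling `label_counts(load_imdb_pairs("train", limit=1000))` would tally the labels among 1,000 training reviews. The exact counts depend on the order in which the records are read, so none are shown here.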

  2. IMDb Movie Reviews Dataset

    • ieee-dataport.org
    Updated Aug 2, 2022
    Cite
    Aditya Pal (2022). IMDb Movie Reviews Dataset [Dataset]. https://ieee-dataport.org/open-access/imdb-movie-reviews-dataset
    Authors
    Aditya Pal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R

  3. IMDb Movies Metadata Dataset – 4.5M Records (Global Coverage)

    • crawlfeeds.com
    csv, zip
    Updated Aug 10, 2025
    Cite
    Crawl Feeds (2025). IMDb Movies Metadata Dataset – 4.5M Records (Global Coverage) [Dataset]. https://crawlfeeds.com/datasets/imdb-movies-metadata-dataset-4-5m-records-global-coverage
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    Unlock one of the most comprehensive movie datasets available—4.5 million structured IMDb movie records, extracted and enriched for data science, machine learning, and entertainment research.

    This dataset includes a vast collection of global movie metadata, including details on title, release year, genre, country, language, runtime, cast, directors, IMDb ratings, reviews, and synopsis. Whether you're building a recommendation engine, benchmarking trends, or training AI models, this dataset is designed to give you deep and wide access to cinematic data across decades and continents.

    Perfect for use in film analytics, OTT platforms, review sentiment analysis, knowledge graphs, and LLM fine-tuning, the dataset is cleaned, normalized, and exportable in multiple formats.

    What’s Included:

    • Genres: Drama, Comedy, Horror, Action, Sci-Fi, Documentary, and more

    • Delivery: Direct download

    Use Cases:

    • Train LLMs or chatbots on cinematic language and metadata

    • Build or enrich movie recommendation engines

    • Run cross-lingual or multi-region film analytics

    • Benchmark genre popularity across time periods

    • Power academic studies or entertainment dashboards

    • Feed into knowledge graphs, search engines, or NLP pipelines

  4. Simplified MM-IMDb

    • kaggle.com
    Updated Dec 5, 2024
    Cite
    Javier Ureña (2024). Simplified MM-IMDb [Dataset]. https://www.kaggle.com/datasets/javierurea/simplified-mm-imdb
    Dataset provided by
    Kaggle
    Authors
    Javier Ureña
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The original dataset, contributed by Arevalo et al. in their work Gated Multimodal Units for Information Fusion, can be downloaded from their git repository, where you can also use the web-scraping scripts they used to create it. From there you can download the hdf5 file and the metadata.

    The main problem is that the original dataset contains data that is often unnecessary, for example the image latent features, the word n-grams, and the IMDb ids. Furthermore, the poster captions are already tokenized, so to see the real text you must apply the ix_to_word dictionary from the metadata, which adds an extra step if you are trying different word tokenizers. The hdf5 file is 15.6GB and the metadata npy file is another 65MB, which makes for a rather large dataset to work with if you only need the minimal information.

    Simplified MM-IMDb only has two files:

    • data.npy (18.1MB). Stores image index, one-hot encoding of the genre, and the caption/description of the poster.

    • images.npz (3.2GB). Stores all dataset images as numpy arrays.

    With this dataset you can start training your multimodal models for multi-class classification, modality alignment, Masked-Language-Modelling, caption-based image retrieval, visual question answering, and many more.

  5. Scooby Doo Episodes

    • kaggle.com
    Updated Nov 17, 2022
    Cite
    The Devastator (2022). Scooby Doo Episodes [Dataset]. https://www.kaggle.com/datasets/thedevastator/the-unsolved-mysteries-of-scooby-doo
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Scooby Doo Episodes

    A Dataset for exploration of the best and worst Scooby Doo episodes

    About this dataset

    Scooby-Doo is one of the most iconic cartoon characters of all time. The lovable Great Dane and his human friends have been solving mysteries and catching bad guys for over 50 years.

    This dataset contains information on every Scooby-Doo episode and movie, including the title, air date, run time, and various other variables. It took me over a year to watch every Scooby-Doo iteration and track every variable. Many values are subjective by nature of watching but I tried my hardest to keep the data collection consistent.

    If you plan to use this data for anything school- or entertainment-related, you are free to do so (credit is always welcome).

    How to use the dataset

    To use this dataset, simply download it and then import it into your preferred software program. Once you have imported the dataset, you can then begin to analyze the data.

    There are a number of different ways that you can analyze this data. For example, you could look at the distribution of Scooby Doo episodes by season, or by year. You could also look at the popularity of different Scooby Doo characters by looking at how often they are mentioned in the dataset.

    This dataset is a great resource for anyone interested in Scooby Doo, or in analyzing television data more generally. Enjoy!

    Research Ideas

    • Using the IMDB rating, run time, and engagement score, predict how much I will enjoy an episode/movie.

    • Determine which network airs the best Scooby-Doo content based on average IMDB rating and engagement score.

    • Analyze the impact of gender on catch rate for monsters/culprits.
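
The idea of comparing networks by average IMDB rating can be sketched with the standard library alone. This is an illustration under assumptions: the column names `network` and `imdb` come from the scoobydoo.csv column list, but the function name `average_imdb_by_network` and the four sample rows are invented for the example and are not values from the dataset. Rows whose rating is not numeric are skipped, on the assumption that missing ratings may be stored as text.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO


def average_imdb_by_network(csv_file: TextIO) -> Dict[str, float]:
    """Return the mean `imdb` rating per `network`, skipping unparsable rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        try:
            network = row["network"]
            rating = float(row["imdb"])
        except (KeyError, TypeError, ValueError):
            continue  # missing column, missing value, or non-numeric rating
        totals[network] += rating
        counts[network] += 1
    return {network: totals[network] / counts[network] for network in totals}


# Invented sample rows, NOT taken from scoobydoo.csv.
SAMPLE = """network,imdb
Network A,8.0
Network A,7.0
Network B,NULL
Network B,6.5
"""

averages = average_imdb_by_network(io.StringIO(SAMPLE))
for network, mean in sorted(averages.items()):
    print(f"{network}: {mean:.2f}")
```

For the downloaded file, the same function could be called on `open("scoobydoo.csv", newline="", encoding="utf-8")`; the file's encoding is an assumption here.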

    Acknowledgements

    License

    License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

    You are free to:

    • Share: copy and redistribute the material in any medium or format for non-commercial purposes only.

    • Adapt: remix, transform, and build upon the material for non-commercial purposes only.

    You must:

    • Give appropriate credit: provide a link to the license, and indicate if changes were made.

    • ShareAlike: distribute your contributions under the same license as the original.

    You may not:

    • Use the material for commercial purposes.

    Columns

    File: scoobydoo.csv

    | Column name     | Description |
    |:----------------|:------------|
    | level_0         | The level of the episode or movie. (Numeric) |
    | series_name     | The name of the series the episode or movie is from. (String) |
    | network         | The network the episode or movie aired on. (String) |
    | season          | The season of the series the episode or movie is from. (Numeric) |
    | title           | The title of the episode or movie. (String) |
    | imdb            | The IMDB rating of the episode or movie. (Numeric) |
    | engagement      | The engagement rating of the episode or movie. (Numeric) |
    | date_aired      | The date the episode or movie aired. (Date) |
    | run_time        | The run time of the episode or movie. (Time) |
    | format          | The format of the episode or movie. (String) |
    | monster_name    | The name of the monster in the episode or movie. (String) |
    | monster_gender  | The gender of the monster in the episode or movie. (String) |
    | monster_type    | The type of monster in the episode or movie. (String) |
    | monster_subtype | The subtype of monster in the episode or movie. (String) |
    | monster_species | The species of monster in the episode or movie. (String) |
    | monster_real    | Whether the monster is real or not. (Boolean) |
    | monster_amount  | The number of monsters in the episode or movie. (Numeric) |
    ...

  6. Data from: Collaboration Networks

    • networkrepository.com
    csv
    Updated Jan 6, 2017
    Cite
    Network Data Repository (2017). Collaboration Networks [Dataset]. https://networkrepository.com/ca.php
    Dataset authored and provided by
    Network Data Repository
    License

    https://networkrepository.com/policy.php

    Description

    Co-authorship networks, collaboration networks, collaboration graphs, communication networks, email networks, IMDB, aminer data, DBLP data, network science co-authorship network, citeseer, HepPh, CondMat, download information networks, collaboration graph data

imdb_reviews: 29 scholarly articles cite this dataset (View in Google Scholar).