100+ datasets found
  1. Data from: imdb

    • huggingface.co
    Updated Aug 3, 2003
    + more versions
    Cite
    Stanford NLP (2003). imdb [Dataset]. https://huggingface.co/datasets/stanfordnlp/imdb
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Aug 3, 2003
    Dataset authored and provided by
    Stanford NLP
    License

    https://choosealicense.com/licenses/other/

    Description

    Dataset Card for "imdb"

      Dataset Summary
    

    Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.
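
A minimal sketch of how one might check the 25,000/25,000 split described above. It assumes the Hugging Face `datasets` library is installed, and that the dataset exposes `train` and `test` splits with a `label` column; neither detail is stated in this listing, and the helper names are hypothetical.

```python
from collections import Counter


def label_counts(records):
    """Count how many records carry each label value."""
    return Counter(record["label"] for record in records)


def load_imdb_splits():
    """Download the dataset (needs network access and `pip install datasets`)."""
    # Imported lazily so label_counts() stays usable without the library.
    from datasets import load_dataset

    return load_dataset("stanfordnlp/imdb")


# Possible usage (split and column names are assumptions):
#   splits = load_imdb_splits()
#   for name in ("train", "test"):
#       print(name, len(splits[name]), label_counts(splits[name]))
```

If the assumptions hold, each split should report 25,000 rows with the two labels evenly represented.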

      Supported Tasks and Leaderboards
    

    More Information Needed

      Languages
    

    More Information Needed

      Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/stanfordnlp/imdb.
    
  2. Data from: imdb

    • huggingface.co
    Updated May 10, 2025
    + more versions
    Cite
    scikit-learn (2025). imdb [Dataset]. https://huggingface.co/datasets/scikit-learn/imdb
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    May 10, 2025
    Dataset authored and provided by
    scikit-learn
    License

    https://choosealicense.com/licenses/other/

    Description

    This is the sentiment analysis dataset based on IMDB reviews initially released by Stanford University. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well. Raw text and already processed bag of words formats are provided. See the README file contained in the release for more… See the full description on the dataset page: https://huggingface.co/datasets/scikit-learn/imdb.

  3. ImDb Movie Reviews Dataset

    • kaggle.com
    zip
    Updated Sep 12, 2019
    Cite
    Nidhi Mantri (2019). ImDb Movie Reviews Dataset [Dataset]. https://www.kaggle.com/datasets/mantri7/imdb-movie-reviews-dataset
    Explore at:
    Available download formats: zip (26921499 bytes)
    Dataset updated
    Sep 12, 2019
    Authors
    Nidhi Mantri
    Description

    Context

    This is exactly the same IMDB dataset as the ImDb Movie Reviews Dataset; it contains the movie reviews.

    Content

    The original dataset contains text files for training and testing, but I created two CSV files from those text files to ease the task ✌️. Now you only need to download the files and apply your model. Each file contains 25,000 reviews, labeled 0 for negative and 1 for positive. Each file has two columns, named 0 and 1: column 0 holds the reviews and column 1 holds the labels.
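
The two-column layout described above could be read along these lines. This is only a sketch: it assumes the header row literally contains `0` and `1`, and it uses a tiny made-up sample in place of the real CSV files, whose names are not given here.

```python
import csv
import io


def read_reviews(csv_file):
    """Yield (review_text, label) pairs from a file with columns named 0 and 1."""
    reader = csv.DictReader(csv_file)
    for row in reader:
        # Column "0" holds the review text, column "1" the 0/1 sentiment label.
        yield row["0"], int(row["1"])


# Made-up sample standing in for one of the two CSV files.
sample = io.StringIO(
    '0,1\n"Loved it, start to finish.",1\n"Dull and far too long.",0\n'
)
pairs = list(read_reviews(sample))
```

For a real file, pass an open file handle (for example `open(path, newline="", encoding="utf-8")`) instead of the in-memory sample.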

  4. IMDB movie details dataset

    • crawlfeeds.com
    csv, zip
    Updated Jul 5, 2025
    Cite
    Crawl Feeds (2025). IMDB movie details dataset [Dataset]. https://crawlfeeds.com/datasets/imdb-movie-details-dataset
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Jul 5, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description
    The IMDB Movie Details Dataset is a comprehensive collection of movie datasets that offers a treasure trove of information about movies, TV shows, and streaming content listed on IMDB. This dataset includes detailed data such as titles, release years, genres, cast, crew, ratings, and more, making it a go-to resource for film and entertainment enthusiasts. Ideal for data analysis, IMDB movie dataset applications span machine learning projects, predictive modeling, and insights into industry trends.
    Researchers can explore patterns in movie ratings and genre popularity, while developers can use the dataset to build recommendation systems or applications. Movie buffs can dive deep into historical and contemporary trends in the world of cinema. This dataset not only supports academic and professional pursuits but also opens doors for creative projects in storytelling, content creation, and audience engagement. Whether you’re a developer, researcher, or film enthusiast, the IMDB movie dataset is a powerful tool for uncovering trends and gaining deeper insights into the evolving entertainment landscape.
  5. mini-imdb

    • huggingface.co
    Updated Sep 30, 2022
    Cite
    Daniel Vila (2022). mini-imdb [Dataset]. https://huggingface.co/datasets/dvilasuero/mini-imdb
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Sep 30, 2022
    Authors
    Daniel Vila
    Description

    dvilasuero/mini-imdb dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. imdb-movie-reviews

    • huggingface.co
    Updated Mar 31, 2023
    Cite
    Ajay Karthick Senthil Kumar (2023). imdb-movie-reviews [Dataset]. https://huggingface.co/datasets/ajaykarthick/imdb-movie-reviews
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Mar 31, 2023
    Authors
    Ajay Karthick Senthil Kumar
    Description

    IMDB Movie Reviews

    This is a dataset for binary sentiment classification containing a substantial amount of data. It contains a set of 50,000 highly polar movie reviews for training models for text classification tasks. The dataset was downloaded from https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz, then processed and split into training and test datasets (20% test split). The training dataset contains 40,000 reviews and the test dataset contains 10,000… See the full description on the dataset page: https://huggingface.co/datasets/ajaykarthick/imdb-movie-reviews.
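
The description reports 40,000 training and 10,000 test reviews, that is, one fifth of the 50,000 reviews held out for testing. A split with those proportions might be sketched as follows; the shuffling, the seed, and the function name are assumptions, since the listing does not say how the split was actually produced.

```python
import random


def split_train_test(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of items and hold out test_fraction of them for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Integers stand in for the 50,000 reviews.
train, test = split_train_test(range(50_000))
print(len(train), len(test))  # 40000 10000
```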

  7. IMDb Movie and Crew Data

    • kaggle.com
    Updated Jan 16, 2023
    Cite
    The Devastator (2023). IMDb Movie and Crew Data [Dataset]. https://www.kaggle.com/datasets/thedevastator/imdb-movie-and-crew-data
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Jan 16, 2023
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    Description

    IMDb Movie and Crew Data

    Insights into Movie Performance and Crew Performance

    By mahesh [source]

    About this dataset

    This IMDb Movies dataset contains information about some of the most beloved and critically praised films of all time. It includes a variety of features, such as the movie's title, original title, year published, date released, genre, duration in minutes, country of origin, language spoken in the movie, director and writer credits, and the production company responsible for its creation and distribution. Additionally, we've included fields for each actor involved, as well as for crew members who had a role in its makeup or promotion. Alongside these fields are reviews from users and critics, which give a comprehensive basis for evaluating how different generations have rated each film over the years. A description field offers viewers a peek at the plot line before watching, if desired. Finally, you can discover what budget was allocated to make each movie, along with its gross income both domestically and worldwide. So grab your popcorn and search this dataset today to find out more about some classic cinematic favorites!

    How to use the dataset

    To use this dataset properly, it is important to become familiar with the columns that make up the data set. The columns include: title, original_title, year, date_published, genre, duration, country, language, director, writer, production_company, actors, description, avg_vote, votes, budget, usa_gross income, metascore, reviews from users, reviews from critics.

    By studying the various columns in this dataset, you can discover trends in movies over time, such as genres gaining in popularity or budgets increasing or decreasing annually. Additionally, you can compare production companies or directors over time to see how their output has changed or whether they consistently produce well-regarded content. Finally, by looking at actors over time, you can track whether particular actors have experienced ups and downs in their careers, and see which actors have remained popular for extended periods thanks to larger bodies of work.

    With so many data points available, it is easy to come up with dozens of questions that this dataset could help answer about movies past, present, and future! Have fun exploring!

    Research Ideas

    • Identifying movie trends in different countries, such as genre preference and budget size.
    • Studying how aspects of the movie, such as actors, writers and crew, influence ratings and gross income.
    • Analysing reviews from critics and users to understand correlations between reviews and metascores or vote values

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    File: IMDb names.csv

    | Column name | Description |
    |:---|:---|
    | title | The title of the movie. (String) |
    | original_title | The original title of the movie (in case it was changed in other languages)... |

  8. IMDB & Social Media Dataset

    • kaggle.com
    Updated Nov 5, 2023
    Cite
    momo5577 (2023). IMDB & Social Media Dataset [Dataset]. https://www.kaggle.com/datasets/momo5577/imdb-and-social-media-dataset
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Nov 5, 2023
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    momo5577
    Description

    This dataset is compiled using this dataset from GitHub.

    Data Description Table

    | Variable Name | Description |
    |:---|:---|
    | movie_title | Title of the Movie |
    | duration | Duration in minutes |
    | director_name | Name of the Director of the Movie |
    | director_facebook_likes | Number of likes of the Director on his Facebook Page |
    | actor_1_name | Primary actor starring in the movie |
    | actor_1_facebook_likes | Number of likes of the Actor_1 on his/her Facebook Page |
    | actor_2_name | Other actor starring in the movie |
    | actor_2_facebook_likes | Number of likes of the Actor_2 on his/her Facebook Page |
    | actor_3_name | Other actor starring in the movie |
    | actor_3_facebook_likes | Number of likes of the Actor_3 on his/her Facebook Page |
    | num_user_for_reviews | Number of users who gave a review |
    | num_critic_for_reviews | Number of critical reviews on imdb |
    | num_voted_users | Number of people who voted for the movie |
    | cast_total_facebook_likes | Total number of facebook likes of the entire cast of the movie |
    | movie_facebook_likes | Number of Facebook likes in the movie page |
    | plot_keywords | Keywords describing the movie plot |
    | facenumber_in_poster | Number of the actor who featured in the movie poster |
    | color | Film colorization. ‘Black and White’ or ‘Color’ |
    | genres | Film categorization like ‘Animation’, ‘Comedy’, etc |
    | title_year | The year in which the movie is released (1916:2016) |
    | language | Languages like English, Arabic, Chinese, etc |
    | country | Country where the movie is produced |
    | content_rating | Content rating of the movie |
    | aspect_ratio | Aspect ratio the movie was made in |
    | movie_imdb_link | IMDB link of the movie |
    | gross | Gross earnings of the movie in Dollars |
    | budget | Budget of the movie in Dollars |
    | imdb_score | IMDB Score of the movie on IMDB |
  9. IMDB Movie Reviews for Sentiment Analysis

    • kaggle.com
    Updated May 27, 2024
    Cite
    Tuan Nguyen (2024). IMDB Movie Reviews for Sentiment Analysis [Dataset]. https://www.kaggle.com/datasets/tuannguyen8531/imdb-movie-reviews-for-sentiment-analysis
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    May 27, 2024
    Dataset provided by
    Kaggle
    Authors
    Tuan Nguyen
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    IMDB Movie Reviews for Sentiment Analysis

    Description:

    This dataset is a collection used for sentiment analysis of movie comments on the IMDB website. This dataset includes comments from IMDB users classified into three different sentiment levels: negative, neutral, and positive. Each comment is accompanied by a sentiment label.

    Columns:

    • sentiment: The sentiment label of the comment: 0 (negative), 1 (neutral), 2 (positive).
    • review: The content of the comment.

    Stopwords:

    The file stopwords.txt contains a list of common words that are often removed during text preprocessing. These words are specifically designed for sentiment analysis.
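
As an illustration of how the three sentiment labels and the stopword list might be used together, here is a small sketch. It assumes `stopwords.txt` holds one word per line, which the description does not confirm; the stopwords shown are placeholders rather than the contents of the real file, and the helper names are hypothetical.

```python
# Label encoding as given in the column description above.
LABEL_NAMES = {0: "negative", 1: "neutral", 2: "positive"}


def load_stopwords(lines):
    """Build a stopword set from an iterable of lines, one word per line."""
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stopwords(review, stopwords):
    """Drop stopwords from a whitespace-tokenized review."""
    return " ".join(w for w in review.split() if w.lower() not in stopwords)


# Placeholder words; with the real file, pass open("stopwords.txt") instead.
stopwords = load_stopwords(["the", "a", "was"])
cleaned = remove_stopwords("The plot was a mess", stopwords)
print(LABEL_NAMES[0], "->", cleaned)  # negative -> plot mess
```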

    Purpose of use:

    This dataset is used to train and evaluate deep learning models in the task of classifying sentiment of movie comments from IMDB into negative, neutral, and positive groups.

  10. IMDb-Dataset

    • huggingface.co
    Updated Oct 16, 2025
    Cite
    Sahil (2025). IMDb-Dataset [Dataset]. https://huggingface.co/datasets/labofsahil/IMDb-Dataset
    Explore at:
    Dataset updated
    Oct 16, 2025
    Dataset authored and provided by
    Sahil
    License

    https://choosealicense.com/licenses/other/

    Description

    title.akas.csv

    • titleId (string) - a tconst, an alphanumeric unique identifier of the title
    • ordering (integer) - a number to uniquely identify rows for a given titleId
    • title (string) - the localized title
    • region (string) - the region for this version of the title
    • language (string) - the language of the title
    • types (array) - Enumerated set of attributes for this alternative title. One or more of the following: "alternative", "dvd", "festival", "tv", "video", "working", "original"… See the full description on the dataset page: https://huggingface.co/datasets/labofsahil/IMDb-Dataset.

  11. IMDB-Dataset-of-50K-Movie-Reviews-Backup

    • huggingface.co
    Cite
    Q-b1t, IMDB-Dataset-of-50K-Movie-Reviews-Backup [Dataset]. https://huggingface.co/datasets/Q-b1t/IMDB-Dataset-of-50K-Movie-Reviews-Backup
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Authors
    Q-b1t
    License

    https://choosealicense.com/licenses/other/

    Description

    Q-b1t/IMDB-Dataset-of-50K-Movie-Reviews-Backup dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. IMDb Movies

    • kaggle.com
    Updated Dec 7, 2023
    Cite
    Elvin Rustamov (2023). IMDb Movies [Dataset]. https://www.kaggle.com/datasets/elvinrustam/imdb-movies-dataset
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Dec 7, 2023
    Dataset provided by
    Kaggle
    Authors
    Elvin Rustamov
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    IMDb Movies Dataset

    Start Date: November 29, 2023

    Finish Date: December 1, 2023

    This dataset was scraped based on the popularity of IMDb movies (highest to lowest popularity).

    There are 9083 movies in total in the dataset.

    !UNCLEAN VERSION: IMDbMovies

    About Features:

    Title: The name of the movie.

    Summary: A brief overview of the movie's plot.

    Director: The person responsible for overseeing the creative aspects of the film.

    Writer: The individual who crafted the screenplay and story for the movie.

    Main Genres: The primary categories or styles that the movie falls under.

    Motion Picture Rating: The age-appropriate classification for viewers.

    Motion Picture Rating Categories:

    • G (General Audience): Suitable for all ages; no offensive content.

    • PG (Parental Guidance): May contain mild language, violence, or thematic elements; parental guidance advised.

    • PG-13 (Parents Strongly Cautioned): Some material may be inappropriate for those under 13; more intense violence, language, or suggestive content.

    • R (Restricted): Restricted to viewers over 17 or 18; may contain adult themes, strong language, sexual content, or violence.

    • NC-17 (Adults Only): Restricted to adults 17 and older; may contain explicit sexual content or graphic violence.

    Runtime: The total duration of the movie.

    Release Year: The year in which the movie was officially released.

    Rating: The average score given to the movie by viewers.

    Number of Ratings: The total count of ratings submitted by viewers.

    Budget: The estimated cost of producing the movie.

    Gross in US & Canada: The total earnings from the movie's screening in the United States and Canada.

    Gross worldwide: The overall worldwide earnings of the movie.

    Opening Weekend Gross in US & Canada: The amount generated during the initial weekend of the movie's release in the United States and Canada.

    !CLEAN VERSION: IMDbMovies-Clean

    What I did:

    • I kept all missing values. In most cases, missing values stem from a lack of information on the website; in a few cases, they stem from the scraper. For example, some movies will be released in 2024, and there are no runtimes or ratings for these movies.

    • I changed the syntax of the 'Runtime', 'Rating', 'Number of Ratings', 'Budget', 'Gross in US & Canada', 'Gross worldwide', and 'Opening Weekend Gross in US & Canada' columns.

    • In some cases, I utilized the information from a single column to create two separate columns.

  13. IMDb Movies Metadata Dataset – 4.5M Records (Global Coverage)

    • crawlfeeds.com
    csv, zip
    Updated Aug 10, 2025
    Cite
    Crawl Feeds (2025). IMDb Movies Metadata Dataset – 4.5M Records (Global Coverage) [Dataset]. https://crawlfeeds.com/datasets/imdb-movies-metadata-dataset-4-5m-records-global-coverage
    Explore at:
    Available download formats: csv, zip
    Dataset updated
    Aug 10, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    Unlock one of the most comprehensive movie datasets available—4.5 million structured IMDb movie records, extracted and enriched for data science, machine learning, and entertainment research.

    This dataset includes a vast collection of global movie metadata, including details on title, release year, genre, country, language, runtime, cast, directors, IMDb ratings, reviews, and synopsis. Whether you're building a recommendation engine, benchmarking trends, or training AI models, this dataset is designed to give you deep and wide access to cinematic data across decades and continents.

    Perfect for use in film analytics, OTT platforms, review sentiment analysis, knowledge graphs, and LLM fine-tuning, the dataset is cleaned, normalized, and exportable in multiple formats.

    What’s Included:

    • Genres: Drama, Comedy, Horror, Action, Sci-Fi, Documentary, and more

    • Delivery: Direct download

    Use Cases:

    • Train LLMs or chatbots on cinematic language and metadata

    • Build or enrich movie recommendation engines

    • Run cross-lingual or multi-region film analytics

    • Benchmark genre popularity across time periods

    • Power academic studies or entertainment dashboards

    • Feed into knowledge graphs, search engines, or NLP pipelines

  14. IMDB: Movie Reviews - Dataset - LDM

    • service.tib.eu
    Updated Dec 3, 2024
    Cite
    (2024). IMDB: Movie Reviews - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/imdb--movie-reviews
    Explore at:
    Dataset updated
    Dec 3, 2024
    Description

    1000 randomly sampled movie reviews from IMDB dataset

  15. ‘IMDB Movies Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Nov 13, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘IMDB Movies Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-imdb-movies-dataset-f301/9b433bd2/?iid=018-445&v=presentation
    Explore at:
    Dataset updated
    Nov 13, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘IMDB Movies Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/harshitshankhdhar/imdb-dataset-of-top-1000-movies-and-tv-shows on 13 November 2021.

    --- Dataset description provided by original source is as follows ---

    Context

    IMDB Dataset of top 1000 movies and tv shows. You can find the EDA Process on - https://www.kaggle.com/harshitshankhdhar/eda-on-imdb-movies-dataset

    Please consider UPVOTE if you found it useful.

    Content

    Data:
    • Poster_Link - Link to the poster that IMDb uses
    • Series_Title - Name of the movie
    • Released_Year - Year in which the movie was released
    • Certificate - Certificate earned by the movie
    • Runtime - Total runtime of the movie
    • Genre - Genre of the movie
    • IMDB_Rating - Rating of the movie on the IMDB site
    • Overview - Mini story / summary
    • Meta_score - Score earned by the movie
    • Director - Name of the director
    • Star1, Star2, Star3, Star4 - Names of the stars
    • No_of_votes - Total number of votes
    • Gross - Money earned by the movie

    Inspiration

    • Analysis of the gross of a movie vs. directors.
    • Analysis of the gross of a movie vs. different stars.
    • Analysis of the No_of_votes of a movie vs. directors.
    • Analysis of the No_of_votes of a movie vs. different stars.
    • Which actors prefer which genre?
    • Which combinations of actors most often get a good IMDB_Rating?
    • Which combinations of actors achieve a good gross?

    --- Original source retains full ownership of the source dataset ---

  16. Sentiment analysis in Galaxy with IMDB movie review dataset

    • zenodo.org
    • data.niaid.nih.gov
    tsv
    Updated Aug 4, 2022
    Cite
    Kaivan Kamali (2022). Sentiment analysis in Galaxy with IMDB movie review dataset [Dataset]. http://doi.org/10.5281/zenodo.4477881
    Explore at:
    Available download formats: tsv
    Dataset updated
    Aug 4, 2022
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Kaivan Kamali
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    IMDB movie review sentiment classification dataset (Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). Learning Word Vectors for Sentiment Analysis. The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011)). For more information please refer to: https://ai.stanford.edu/~amaas/data/sentiment/

    The IMDB dataset was modified as follows to prepare it for use in a Galaxy Training Tutorial (https://training.galaxyproject.org/):

    The top 50 words (mostly stop words) are excluded, and the next 10,000 most frequent words are included. Reviews are limited to 500 words (longer reviews are trimmed and shorter reviews are padded). 25,000 reviews each are used for training and testing. Files are in TSV (tab-separated values) format so that they can be consumed by Galaxy (www.usegalaxy.org).
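
The preprocessing steps above can be sketched roughly as follows. This is an illustration under assumptions, not the tutorial's actual pipeline: dropping out-of-vocabulary words, keeping the start of long reviews, padding on the right, and using 0 as the pad id are all choices made here for the example, and the function names are hypothetical.

```python
from collections import Counter


def build_vocab(tokenized_reviews, skip_top=50, num_words=10_000):
    """Give ids >= 1 to the num_words most frequent words after the top skip_top."""
    counts = Counter(word for review in tokenized_reviews for word in review)
    ranked = [word for word, _ in counts.most_common()]
    kept = ranked[skip_top:skip_top + num_words]
    return {word: index for index, word in enumerate(kept, start=1)}


def encode(review, vocab, max_len=500, pad_id=0):
    """Drop out-of-vocabulary words, then trim or right-pad to max_len ids."""
    ids = [vocab[word] for word in review if word in vocab][:max_len]
    return ids + [pad_id] * (max_len - len(ids))


# Toy corpus with tiny parameters so the effect is easy to see.
reviews = [["the", "film", "was", "great"], ["the", "film", "dragged"]]
vocab = build_vocab(reviews, skip_top=1, num_words=2)
encoded = [encode(r, vocab, max_len=3) for r in reviews]
```

With the defaults (`skip_top=50`, `num_words=10_000`, `max_len=500`) the same functions mirror the numbers quoted in the description.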

  17. Data from: imdb

    • huggingface.co
    Cite
    Massive Text Embedding Benchmark, imdb [Dataset]. https://huggingface.co/datasets/mteb/imdb
    Explore at:
    Dataset authored and provided by
    Massive Text Embedding Benchmark
    License

    https://choosealicense.com/licenses/unknown/

    Description

    ImdbClassification An MTEB dataset Massive Text Embedding Benchmark

    Large Movie Review Dataset

    Task category t2c

    Domains Reviews, Written

    Reference http://www.aclweb.org/anthology/P11-1015

      How to evaluate on this task
    

    You can evaluate an embedding model on this dataset using the following code:

        import mteb

        task = mteb.get_tasks(["ImdbClassification"])
        evaluator = mteb.MTEB(task)

        model = mteb.get_model(YOUR_MODEL)
        evaluator.run(model)

    To learn more… See the full description on the dataset page: https://huggingface.co/datasets/mteb/imdb.

  18. MovieLens-IMDB - Dataset - LDM

    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    (2024). MovieLens-IMDB - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/movielens-imdb
    Explore at:
    Dataset updated
    Dec 16, 2024
    Description

    The MovieLens-IMDB dataset is a collection of user ratings for movies, with each rating indicating the user's preference for the movie.

  19. IMDb Movie Reviews Dataset

    • berd-platform.de
    bin
    Updated Jul 31, 2025
    Cite
    Andrew L. Maas; Raymond E. Daly; Peter T. Pham; Dan Huang; Andrew Y. Ng; Christopher Potts (2025). IMDb Movie Reviews Dataset [Dataset]. http://doi.org/10.82939/z8gxk-w3567
    Explore at:
    Available download formats: bin
    Dataset updated
    Jul 31, 2025
    Dataset provided by
    Stanford University
    Authors
    Andrew L. Maas; Raymond E. Daly; Peter T. Pham; Dan Huang; Andrew Y. Ng; Christopher Potts
    License

    https://ai.stanford.edu/~amaas/data/sentiment

    Description

    The IMDb Movie Reviews dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb) labeled as positive or negative. The providers also include an additional 50,000 unlabeled documents for unsupervised learning.

    The dataset contains an even number of positive and negative reviews. Only highly polarizing reviews are considered. A negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. No more than 30 reviews are included per movie. The dataset also contains an additional 50,000 unlabeled documents for unsupervised learning. See the README file contained in the release for more details.
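
The selection rule in the paragraph above (scores of 4 or below count as negative, 7 or above as positive, and at most 30 reviews per movie) can be expressed as a short sketch. The 0/1 label encoding, the `(movie_id, score, text)` tuple layout, and keeping the first reviews seen for each movie are assumptions made for illustration only.

```python
from collections import defaultdict


def label_for(score):
    """Return 0 (negative) for scores <= 4, 1 (positive) for >= 7, else None."""
    if score <= 4:
        return 0
    if score >= 7:
        return 1
    return None  # scores of 5 and 6 are not polar enough to include


def select_reviews(rated_reviews, max_per_movie=30):
    """Keep polar reviews only, with at most max_per_movie reviews per movie."""
    kept_per_movie = defaultdict(int)
    selected = []
    for movie_id, score, text in rated_reviews:
        label = label_for(score)
        if label is None or kept_per_movie[movie_id] >= max_per_movie:
            continue
        kept_per_movie[movie_id] += 1
        selected.append((text, label))
    return selected
```

For example, `select_reviews([("m1", 2, "bad"), ("m1", 6, "meh")])` keeps only the first review, since a score of 6 falls in the excluded middle range.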

    The data is split into a train (25k reviews) and test (25k reviews) set. A preview file cannot be provided - please download the data directly from the data provider's website.

    When using the dataset, please cite: Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). Learning Word Vectors for Sentiment Analysis. The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011).

  20. Data from: imdb

    • huggingface.co
    Updated May 4, 2025
    + more versions
    Cite
    testtest (2025). imdb [Dataset]. https://huggingface.co/datasets/test3534/imdb
    Explore at:
    Dataset updated
    May 4, 2025
    Dataset authored and provided by
    testtest
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    test3534/imdb dataset hosted on Hugging Face and contributed by the HF Datasets community

Cite
Stanford NLP (2003). imdb [Dataset]. https://huggingface.co/datasets/stanfordnlp/imdb

Data from: imdb

IMDB

stanfordnlp/imdb

Related Article
Explore at:
24 scholarly articles cite this dataset (View in Google Scholar)
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Aug 3, 2003
Dataset authored and provided by
Stanford NLP
License

https://choosealicense.com/licenses/other/
