2 datasets found
  1. IMDB Movie Reviews (Binary Sentiment)

    • opendatabay.com
    .csv
    Updated Jun 18, 2025
    Cite
    Datasimple (2025). IMDB Movie Reviews (Binary Sentiment) [Dataset]. https://www.opendatabay.com/data/ai-ml/c48f7110-3d06-45be-9cae-aa8799720eec
    Available download formats: .csv
    Dataset updated: Jun 18, 2025
    Dataset authored and provided by: Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Entertainment & Media Consumption
    Description

    Source Huggingface Hub: link

    About this dataset

    This is a large dataset for binary sentiment classification, containing substantially more data than previous benchmark datasets. It provides 25,000 highly polar movie reviews for training and 25,000 for testing, plus additional unlabeled data. The data fields are consistent across all splits of the dataset.

    How to use the dataset

    To use this dataset, first download the IMDB Large Movie Review Dataset. You can then use it in its original form or split it into training and testing sets. To split the dataset, create a new file called unsupervised.csv and copy the text column from train.csv into it. You can then split unsupervised.csv into two files: train_unsupervised.csv and test_unsupervised.csv.
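
The file-splitting steps described above can be sketched with Python's standard library. This is a minimal sketch under stated assumptions, not the dataset author's own tooling: the column name `text` follows the description, while the helper names `extract_text_column` and `split_csv`, the 80/20 split ratio, and the fixed shuffle seed are hypothetical choices made for illustration.

```python
import csv
import random
from pathlib import Path


def extract_text_column(src_csv, dst_csv, column="text"):
    """Copy one column of src_csv into a new single-column CSV file."""
    with open(src_csv, newline="", encoding="utf-8") as src:
        rows = [row[column] for row in csv.DictReader(src)]
    with open(dst_csv, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow([column])
        writer.writerows([text] for text in rows)
    return len(rows)


def split_csv(src_csv, train_csv, test_csv, test_fraction=0.2, seed=0):
    """Shuffle the data rows of src_csv and write them to two files.

    The 80/20 ratio and the seed are assumptions; the source does not
    specify how the split should be made.
    """
    with open(src_csv, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        rows = list(reader)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    for path, part in ((test_csv, rows[:n_test]), (train_csv, rows[n_test:])):
        with open(path, "w", newline="", encoding="utf-8") as out:
            writer = csv.writer(out)
            writer.writerow(header)
            writer.writerows(part)
    return len(rows) - n_test, n_test


# Runs only when a train.csv is present in the working directory.
if __name__ == "__main__" and Path("train.csv").exists():
    extract_text_column("train.csv", "unsupervised.csv")
    print(split_csv("unsupervised.csv", "train_unsupervised.csv",
                    "test_unsupervised.csv"))
```

Using the `csv` module rather than splitting lines on commas matters here, because review text routinely contains commas, quotes, and line breaks inside a single field.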

    With either the original dataset or the training and testing sets in hand, you can begin binary sentiment classification. This requires a machine learning algorithm capable of binary classification, such as logistic regression or a support vector machine. Once you have trained your model on the training set, you can evaluate its performance on the test set by predicting the labels of the reviews in test_unsupervised.csv and comparing the predictions against the known labels for those reviews.
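
To make the train-then-evaluate workflow concrete, here is a minimal sketch of bag-of-words logistic regression written in plain Python. Everything in it is an assumption made for illustration: the four training reviews and two test reviews are invented stand-ins for rows loaded from the CSV files, labels are taken to be 1 for positive and 0 for negative, and the tokenizer, learning rate, and epoch count are arbitrary. In practice a library implementation would normally replace the hand-written training loop.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a review and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def predict_proba(text, weights, bias):
    """Estimated probability that a review is positive (label 1)."""
    counts = Counter(tokenize(text))
    score = bias + sum(weights[w] * c for w, c in counts.items())
    score = max(-30.0, min(30.0, score))  # clamp so math.exp stays in range
    return 1.0 / (1.0 + math.exp(-score))


def train_logistic_regression(texts, labels, epochs=50, lr=0.5):
    """Fit word weights and a bias with stochastic gradient descent."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for text, label in zip(texts, labels):
            # Gradient of the log loss w.r.t. the score is (p - label).
            error = predict_proba(text, weights, bias) - label
            for word, count in Counter(tokenize(text)).items():
                weights[word] -= lr * error * count
            bias -= lr * error
    return weights, bias


def accuracy(texts, labels, weights, bias):
    """Fraction of reviews whose predicted label matches the known label."""
    hits = sum(
        (predict_proba(t, weights, bias) >= 0.5) == bool(y)
        for t, y in zip(texts, labels)
    )
    return hits / len(labels)


# Invented toy data standing in for the real training and test splits.
train_texts = [
    "a wonderful, moving film",
    "great acting and a great story",
    "terrible plot and awful acting",
    "boring, awful, a waste of time",
]
train_labels = [1, 1, 0, 0]
test_texts = ["a great and wonderful story", "awful and boring"]
test_labels = [1, 0]

weights, bias = train_logistic_regression(train_texts, train_labels)
print("test accuracy:", accuracy(test_texts, test_labels, weights, bias))
```

Note that `accuracy` needs the known labels of the held-out reviews; predictions on text alone cannot be scored.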

    Research Ideas

    This dataset can be used to train a binary sentiment classification model that sorts movie reviews into positive and negative categories. It can also be used to build a large movie review database for research purposes.

    License

    CC0

    Original Data Source: IMDB Movie Reviews (Binary Sentiment)

  2. IMDb Movie Reviews

    • opendatalab.com
    • huggingface.co
    zip
    Updated Aug 25, 2022
    Cite
    Stanford University (2022). IMDb Movie Reviews [Dataset]. https://opendatalab.com/OpenDataLab/IMDb_Movie_Reviews
    Available download formats: zip (221218527 bytes)
    Dataset updated: Aug 25, 2022
    Dataset provided by: Stanford University
    Description

    This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well. Raw text and already processed bag of words formats are provided.


