Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('imdb_reviews', split='train')
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
This dataset contains CSV versions of the Large Movie Review Dataset by Maas et al. (2011) from its original Stanford AI Repository. It contains 50k highly polar movie reviews, evenly split into 25k positives and 25k negatives. Each sample is labeled with a 0 (positive) or 1 (negative). The additional ~11k unlabeled reviews have also been included in CSV format for your convenience.
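A minimal sketch of reading such a CSV file and counting the labels, using only the Python standard library. The column names `text` and `label` and the inline sample rows are assumptions made for illustration; the actual files may use different headers.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for one of the CSV files.
# The real file names and column headers may differ.
SAMPLE_CSV = """text,label
"A moving, beautifully shot film.",0
"Dull plot and wooden acting.",1
"I loved every minute of it.",0
"""


def count_labels(csv_file):
    """Count how many rows carry each integer label value."""
    reader = csv.DictReader(csv_file)
    return Counter(int(row["label"]) for row in reader)


label_counts = count_labels(io.StringIO(SAMPLE_CSV))
print(label_counts)
```

To run this against a real file, pass an open file handle (opened with `newline=""`) instead of the `io.StringIO` wrapper.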
Works using this dataset must use the appropriate citations via this bibtex entry:
@InProceedings{maas-EtAl:2011:ACL-HLT2011,
author = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher},
title = {Learning Word Vectors for Sentiment Analysis},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
year = {2011},
address = {Portland, Oregon, USA},
publisher = {Association for Computational Linguistics},
pages = {142--150},
url = {http://www.aclweb.org/anthology/P11-1015}
}
https://academictorrents.com/nolicensespecified
A dataset for binary sentiment classification containing 25,000 highly polarized movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.
http://www.gnu.org/licenses/lgpl-3.0.html
Large Movie Review Dataset v1.0
This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. It provides a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well. Raw text and already processed bag-of-words formats are provided.
In the entire collection, no more than 30 reviews are allowed for any given movie because reviews for the same movie tend to have correlated ratings. Further, the train and test sets contain disjoint sets of movies, so no significant performance gain can be obtained by memorising movie-unique terms and their association with observed labels. In the labelled train/test sets, a negative review has a score <= 4 out of 10, and a positive review has a score >= 7 out of 10. Thus reviews with more neutral ratings are not included in the train/test sets. In the unsupervised set, reviews of any rating are included, and there are equal numbers of reviews with ratings > 5 and <= 5.
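The selection rules above can be sketched in a few lines of Python. This is an illustration only, not the authors' preprocessing code: the `(movie_id, rating)` tuples are a hypothetical record layout, and keeping the first 30 reviews per movie is an assumption, since the description does not say which reviews were kept.

```python
from collections import defaultdict

MAX_REVIEWS_PER_MOVIE = 30  # cap stated in the dataset description


def label_from_rating(rating):
    """Map a 1-10 rating to a sentiment label, or None if it is excluded."""
    if rating <= 4:
        return "neg"
    if rating >= 7:
        return "pos"
    return None  # ratings 5 and 6 are left out of the labelled sets


def cap_reviews_per_movie(reviews, limit=MAX_REVIEWS_PER_MOVIE):
    """Keep at most `limit` reviews per movie (assumption: the first ones seen)."""
    kept = []
    seen = defaultdict(int)
    for movie_id, rating in reviews:
        if seen[movie_id] < limit:
            seen[movie_id] += 1
            kept.append((movie_id, rating))
    return kept


labels = [label_from_rating(r) for r in [1, 4, 5, 6, 7, 10]]

# 35 reviews of a hypothetical movie "m1" and 3 of "m2".
reviews = [("m1", 8)] * 35 + [("m2", 2)] * 3
capped = cap_reviews_per_movie(reviews)
print(labels, len(capped))
```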
Reference:
http://ai.stanford.edu/~amaas/data/sentiment/
NOTE
A starter kernel is here:
https://www.kaggle.com/atulanandjha/bert-testing-on-imdb-dataset-starter-kernel
A kernel to expose Dataset collection :
Now let's understand the task at hand: given a movie review, predict whether it's positive or negative.
The dataset we use is 50,000 IMDB reviews (25K for train and 25K for test) from the PyTorch-NLP library.
Each review is tagged pos or neg.
Both the train and test sets contain 50% positive and 50% negative reviews.
text: Reviews from people.
Sentiment: Negative or Positive tag on the review/feedback (Boolean).
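As a quick sanity check of the 50/50 balance described above, the share of each tag can be computed as in the sketch below. The records are toy data and the dictionary keys `text` and `sentiment` are assumptions based on the field names listed here; the actual loader may expose the fields differently.

```python
from collections import Counter

# Hypothetical toy records using the two fields described above.
reviews = [
    {"text": "A wonderful, heartfelt story.", "sentiment": "pos"},
    {"text": "Two hours I will never get back.", "sentiment": "neg"},
    {"text": "Great cast and a sharp script.", "sentiment": "pos"},
    {"text": "Flat, predictable and far too long.", "sentiment": "neg"},
]


def class_shares(records):
    """Return the fraction of records carrying each sentiment tag."""
    counts = Counter(record["sentiment"] for record in records)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


shares = class_shares(reviews)
print(shares)
```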
When using this dataset, please cite this ACL paper using:
@InProceedings{maas-EtAl:2011:ACL-HLT2011,
author = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher},
title = {Learning Word Vectors for Sentiment Analysis},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
year = {2011},
address = {Portland, Oregon, USA},
publisher = {Association for Computational Linguistics},
pages = {142--150},
}
Link to ref Dataset: https://pytorchnlp.readthedocs.io/en/latest/_modules/torchnlp/datasets/imdb.html
https://www.samyzaf.com/ML/imdb/imdb.html
BERT and other Transformer-architecture models have attracted a lot of hype recently thanks to the breakthrough of transfer learning in NLP. So let's use this simple yet efficient dataset to test these models and compare our results with theirs. I also invite fellow researchers to try out their state-of-the-art algorithms on this dataset.
https://choosealicense.com/licenses/other/
Dataset Card for IMDb Movie Reviews
Dataset Summary
This is a custom train/test/validation split of the IMDb Large Movie Review Dataset available from http://ai.stanford.edu/~amaas/data/sentiment/.
Supported Tasks and Leaderboards
[More Information Needed]
Languages
[More Information Needed]
Dataset Structure
IMDb_movie_reviews
An example of 'train': { "text": "Beautifully photographed and ably acted… See the full description on the dataset page: https://huggingface.co/datasets/jahjinx/IMDb_movie_reviews.
IMDB Movie Reviews
This is a dataset for binary sentiment classification containing a substantial amount of data. It contains a set of 50,000 highly polar movie reviews for training text classification models. The dataset is downloaded from https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz. The data is processed and split into training and test datasets (20% test split). The training dataset contains 40,000 reviews and the test dataset contains 10,000… See the full description on the dataset page: https://huggingface.co/datasets/ajaykarthick/imdb-movie-reviews.
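The 40,000/10,000 counts quoted above correspond to holding out one fifth of the 50,000 reviews. A minimal sketch of such a split is shown below; the shuffling, the seed, and the function name are assumptions, since the card does not say how the split was actually produced (for example, whether it was stratified by label).

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle items deterministically and hold out a fraction for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Integer ids stand in for the 50,000 reviews.
train_ids, test_ids = train_test_split(range(50_000))
print(len(train_ids), len(test_ids))
```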
https://choosealicense.com/licenses/odbl/
Large Movie Review Dataset translated to Javanese.
This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well. We translated the original IMDB Dataset to Javanese using the multi-lingual MarianMT Transformer model from Helsinki-NLP/opus-mt-en-mul.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for IMDB 3000 Sphere
Homepage: http://ai.stanford.edu/~amaas/data/sentiment/
Dataset Summary
Large Movie Review Dataset. This is a 3000 item selection from the imdb dataset for binary sentiment classification for use in the Sphere course on AutoTrain.
Dataset Structure
An example of 'train' looks as follows. { "label": 0, "text": "Goodbye world2 " }
https://choosealicense.com/licenses/other/
Large Movie Review Dataset translated to Dutch.
This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 24,992 highly polar movie reviews for training, and 24,992 for testing. There is additional unlabeled data for use as well.
https://choosealicense.com/licenses/odbl/
Dataset Description
In this task, we conducted one-shot sentiment analysis on a subset of the IMDb movie reviews dataset using multiple language models. The goal was to predict the sentiment (positive or negative) of movie reviews without fine-tuning the models on the specific task. We utilized three different pre-trained language models for zero-shot classification: BART-large, DistilBERT-base, and RoBERTa-base. For each model, we generated predicted sentiment labels for a… See the full description on the dataset page: https://huggingface.co/datasets/Mouwiya/imdb_reviews_with_labels.
Dataset Card for "HebrewMetaphors"
Dataset Summary
A common dataset for text classification tasks is IMDb, the Large Movie Review Dataset, which is a dataset for binary sentiment classification. The first step in our project was to create a Hebrew dataset with an IMDB-like structure, with one difference: in addition to the sentences, it also contains verb names and a classification of whether each verb name is literal or metaphorical in the given sentence.… See the full description on the dataset page: https://huggingface.co/datasets/tdklab/HebrewMetaphors.