11 datasets found

imdb_reviews
tensorflow.org
lejournall24.net
Updated Sep 20, 2024
Cite
(2024). imdb_reviews [Dataset]. https://www.tensorflow.org/datasets/catalog/imdb_reviews
Dataset updated
Sep 20, 2024
Description
Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

To use this dataset:

import tensorflow_datasets as tfds

ds = tfds.load('imdb_reviews', split='train')
for ex in ds.take(4):
    print(ex)

See the guide for more information on tensorflow_datasets.
IMDB Large Movie Reviews Sentiment Dataset
kaggle.com
zip
Updated Nov 18, 2019
Cite
Jan Christian Blaise Cruz (2019). IMDB Large Movie Reviews Sentiment Dataset [Dataset]. https://www.kaggle.com/jcblaise/imdb-sentiments
Explore at:
zip (38677807 bytes)
Dataset updated
Nov 18, 2019
Authors
Jan Christian Blaise Cruz
Description
IMDB Movie Reviews Sentiment Dataset

This dataset contains CSV versions of the Large Movie Review dataset by Maas, et al. (2011) from its original Stanford AI Repository. It contains 50k highly polar movie reviews, evenly split to 25k positives and 25k negatives. Each sample is labeled with a 0 (positive) or 1 (negative). The additional ~11k unlabeled review data has also been included in CSV format for your convenience.
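As a minimal sketch of working with such CSV files, the snippet below counts reviews per sentiment using the label convention stated in the description above. The column names `text` and `label` and the sample rows are assumptions made for illustration; the actual files may use different headers.

```python
import csv
import io
from collections import Counter

# Label convention as stated in the dataset description above.
LABEL_NAMES = {"0": "positive", "1": "negative"}

# Tiny in-memory stand-in for one of the CSV files.
# The header names "text" and "label" are assumed, not taken from the dataset.
SAMPLE_CSV = """text,label
"A wonderful, moving film.",0
"Dull and far too long.",1
"Great cast, great script.",0
"""


def count_labels(csv_file):
    """Count reviews per sentiment name in a CSV with 'text' and 'label' columns."""
    reader = csv.DictReader(csv_file)
    return Counter(LABEL_NAMES[row["label"]] for row in reader)


counts = count_labels(io.StringIO(SAMPLE_CSV))
print(counts)
```

On the real files, a check like this would be expected to show the even 25k/25k split the description mentions.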

Citations

Works using this dataset must use the appropriate citations via this bibtex entry:

@InProceedings{maas-EtAl:2011:ACL-HLT2011,
  author    = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher},
  title     = {Learning Word Vectors for Sentiment Analysis},
  booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
  month     = {June},
  year      = {2011},
  address   = {Portland, Oregon, USA},
  publisher = {Association for Computational Linguistics},
  pages     = {142--150},
  url       = {http://www.aclweb.org/anthology/P11-1015}
}
IMDb Large Movie Review Dataset
academictorrents.com
bittorrent
Updated Nov 19, 2018
Cite
Andrew L. Maas et al., 2011 (2018). IMDb Large Movie Review Dataset [Dataset]. https://academictorrents.com/details/fd24bc44d461b10288469e05a64a8344eb079f15
Explore at:
bittorrent
Dataset updated
Nov 19, 2018
Dataset authored and provided by
Andrew L. Maas et al., 2011
License
https://academictorrents.com/nolicensespecified
Description
A dataset for binary sentiment classification containing 25,000 highly polarized movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.
IMDB 50K Movie Reviews (TEST your BERT)
airtestest.uk
zip
Updated Dec 18, 2019
Cite
Atul Anand {Jha} (2019). IMDB 50K Movie Reviews (TEST your BERT) [Dataset]. https://www.airtestest.uk/datasets/atulanandjha/imdb-50k-movie-reviews-test-your-bert
Explore at:
zip (26933554 bytes)
Dataset updated
Dec 18, 2019
Authors
Atul Anand {Jha}
License
http://www.gnu.org/licenses/lgpl-3.0.html
Description
Context

Large Movie Review Dataset v1.0. 😃

Image (IMDB wall): https://static.amazon.jobs/teams/53/images/IMDb_Header_Page.jpg?1501027252

This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. It provides a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well. Raw text and already-processed bag-of-words formats are provided.

In the entire collection, no more than 30 reviews are allowed for any given movie because reviews for the same movie tend to have correlated ratings. Further, the train and test sets contain disjoint sets of movies, so no significant performance gain can be obtained by memorising movie-unique terms and their association with observed labels. In the labelled train/test sets, a negative review has a score <= 4 out of 10, and a positive review has a score >= 7 out of 10. Thus reviews with more neutral ratings are not included in the train/test sets. In the unsupervised set, reviews of any rating are included, and there are equal numbers of reviews with scores > 5 and <= 5.
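The labelling rule and the per-movie cap described above can be sketched as follows. This is an illustrative reconstruction from the description only, not the original authors' preprocessing code; the function names and the `(movie_id, text)` tuple layout are hypothetical.

```python
from typing import Optional

NEGATIVE_MAX = 4  # a negative review has a score <= 4 out of 10
POSITIVE_MIN = 7  # a positive review has a score >= 7 out of 10


def label_from_rating(rating: int) -> Optional[str]:
    """Map a 1-10 rating to 'neg', 'pos', or None for neutral ratings."""
    if not 1 <= rating <= 10:
        raise ValueError(f"rating must be between 1 and 10, got {rating}")
    if rating <= NEGATIVE_MAX:
        return "neg"
    if rating >= POSITIVE_MIN:
        return "pos"
    return None  # ratings 5 and 6 are left out of the labelled sets


def cap_reviews_per_movie(reviews, max_per_movie=30):
    """Keep at most max_per_movie reviews per movie id, preserving input order."""
    kept, counts = [], {}
    for movie_id, text in reviews:
        if counts.get(movie_id, 0) < max_per_movie:
            counts[movie_id] = counts.get(movie_id, 0) + 1
            kept.append((movie_id, text))
    return kept
```

For example, `label_from_rating(5)` returns `None`, which is why neutral reviews never appear in the labelled train/test sets.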

Reference: http://ai.stanford.edu/~amaas/data/sentiment/

NOTE

A starter kernel is here: https://www.kaggle.com/atulanandjha/bert-testing-on-imdb-dataset-starter-kernel

A kernel to expose Dataset collection :

Content

Now let’s understand the task at hand: given a movie review, predict whether it’s positive or negative.

The dataset we use is 50,000 IMDB reviews (25K for train and 25K for test) from the PyTorch-NLP library.

Each review is tagged pos or neg.

Both the train and test sets contain 50% positive and 50% negative reviews.

Columns:

text: Reviews from people.

Sentiment: Negative or Positive tag on the review/feedback (Boolean).

Acknowledgements

When using this dataset, please cite this ACL paper using:

@InProceedings{maas-EtAl:2011:ACL-HLT2011,
  author    = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher},
  title     = {Learning Word Vectors for Sentiment Analysis},
  booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
  month     = {June},
  year      = {2011},
  address   = {Portland, Oregon, USA},
  publisher = {Association for Computational Linguistics},
  pages     = {142--150},
  url       = {http://www.aclweb.org/anthology/P11-1015}
}

Link to ref Dataset: https://pytorchnlp.readthedocs.io/en/latest/_modules/torchnlp/datasets/imdb.html

https://www.samyzaf.com/ML/imdb/imdb.html

Inspiration

BERT and other Transformer architecture models have attracted a great deal of attention recently, thanks to the breakthrough of introducing transfer learning in NLP. So let's use this simple yet effective dataset to test these models and compare our results with theirs. I also invite fellow researchers to try out their state-of-the-art algorithms on this dataset.
IMDb_movie_reviews
huggingface.co
opendatalab.com
Updated Aug 14, 2023
Cite
Jarrad Jinx (2023). IMDb_movie_reviews [Dataset]. https://huggingface.co/datasets/jahjinx/IMDb_movie_reviews
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Aug 14, 2023
Authors
Jarrad Jinx
License
https://choosealicense.com/licenses/other/
Description
Dataset Card for IMDb Movie Reviews

Dataset Summary

This is a custom train/test/validation split of the IMDb Large Movie Review Dataset available from http://ai.stanford.edu/~amaas/data/sentiment/.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

[More Information Needed]

Dataset Structure IMDb_movie_reviews

An example of 'train': { "text": "Beautifully photographed and ably acted… See the full description on the dataset page: https://huggingface.co/datasets/jahjinx/IMDb_movie_reviews.
imdb-movie-reviews
huggingface.co
Updated Aug 23, 2024
Cite
Ajay Karthick Senthil Kumar (2024). imdb-movie-reviews [Dataset]. https://huggingface.co/datasets/ajaykarthick/imdb-movie-reviews
Explore at:
Croissant
Dataset updated
Aug 23, 2024
Authors
Ajay Karthick Senthil Kumar
Description
IMDB Movie Reviews

This is a dataset for binary sentiment classification containing a substantial amount of data. It contains a set of 50,000 highly polar movie reviews for training text classification models. The dataset is downloaded from https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz and is processed and split into training and test datasets (a 0.2 test fraction, i.e. 20%). The training dataset contains 40000 reviews and the test dataset contains 10000… See the full description on the dataset page: https://huggingface.co/datasets/ajaykarthick/imdb-movie-reviews.
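The sizes quoted above (40,000 training and 10,000 test reviews out of 50,000) correspond to a test fraction of 0.2. The sketch below shows one way such a split could be produced; it is an assumption for illustration, not the dataset author's actual procedure, and the function name and seed are hypothetical.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of items and split off test_fraction of them as the test set."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Integers stand in for the 50,000 reviews.
train, test = train_test_split(range(50_000), test_fraction=0.2)
print(len(train), len(test))  # 40000 10000
```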
imdb-javanese
huggingface.co
Updated Feb 12, 2022
Cite
Wilson Wongso (2022). imdb-javanese [Dataset]. https://huggingface.co/datasets/w11wo/imdb-javanese
Explore at:
Croissant
Dataset updated
Feb 12, 2022
Authors
Wilson Wongso
License
https://choosealicense.com/licenses/odbl/
Description
Large Movie Review Dataset translated to Javanese. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well. We translated the original IMDB Dataset to Javanese using the multi-lingual MarianMT Transformer model from Helsinki-NLP/opus-mt-en-mul.
imdb_3000_sphere
huggingface.co
Updated Apr 24, 2023
Cite
Eno Reyes (2023). imdb_3000_sphere [Dataset]. https://huggingface.co/datasets/enoreyes/imdb_3000_sphere
Explore at:
Croissant
Dataset updated
Apr 24, 2023
Authors
Eno Reyes
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Dataset Card for IMDB 3000 Sphere

Homepage: http://ai.stanford.edu/~amaas/data/sentiment/

Dataset Summary

Large Movie Review Dataset. This is a 3000 item selection from the imdb dataset for binary sentiment classification for use in the Sphere course on AutoTrain.

Dataset Structure

An example of 'train' looks as follows. { "label": 0, "text": "Goodbye world2 " }
imdb_dutch
huggingface.co
Updated Mar 8, 2003
Cite
Yeb Havinga (2003). imdb_dutch [Dataset]. https://huggingface.co/datasets/yhavinga/imdb_dutch
Explore at:
Croissant
Dataset updated
Mar 8, 2003
Authors
Yeb Havinga
License
https://choosealicense.com/licenses/other/
Description
Large Movie Review Dataset translated to Dutch.

This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 24,992 highly polar movie reviews for training, and 24,992 for testing. There is additional unlabeled data for use as well.
imdb_reviews_with_labels
huggingface.co
Updated Apr 21, 2024
Cite
Mouwiya S. A. Al-Qaisieh (2024). imdb_reviews_with_labels [Dataset]. https://huggingface.co/datasets/Mouwiya/imdb_reviews_with_labels
Explore at:
Croissant
Dataset updated
Apr 21, 2024
Authors
Mouwiya S. A. Al-Qaisieh
License
https://choosealicense.com/licenses/odbl/
Description
Dataset Description

In this task, we conducted one-shot sentiment analysis on a subset of the IMDb movie reviews dataset using multiple language models. The goal was to predict the sentiment (positive or negative) of movie reviews without fine-tuning the models on the specific task. We utilized three different pre-trained language models for zero-shot classification: BART-large, DistilBERT-base, and RoBERTa-base. For each model, we generated predicted sentiment labels for a… See the full description on the dataset page: https://huggingface.co/datasets/Mouwiya/imdb_reviews_with_labels.
HebrewMetaphors
huggingface.co
Updated Oct 3, 2023
Cite
Technion Data and Knowledge Lab (2023). HebrewMetaphors [Dataset]. https://huggingface.co/datasets/tdklab/HebrewMetaphors
Explore at:
Croissant
Dataset updated
Oct 3, 2023
Dataset authored and provided by
Technion Data and Knowledge Lab
Description
Dataset Card for "HebrewMetaphors"

Dataset Summary

A common dataset for text classification tasks is the IMDb Large Movie Review Dataset. This is a dataset for binary sentiment classification. The first step in our project was to create a Hebrew dataset with an IMDB-like structure but different in that, in addition to the sentences we have, there will also be verb names, and a classification of whether the verb name is literal or metaphorical in the given sentence.… See the full description on the dataset page: https://huggingface.co/datasets/tdklab/HebrewMetaphors.
28 scholarly articles cite the imdb_reviews dataset (View in Google Scholar)
