11 datasets found
  1. imdb_reviews

    • tensorflow.org
    • lejournall24.net
    • +1more
    Updated Sep 20, 2024
    Cite
    (2024). imdb_reviews [Dataset]. https://www.tensorflow.org/datasets/catalog/imdb_reviews
    Description

    Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

    To use this dataset:

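    import tensorflow_datasets as tfds

    # Load the labeled training split (the data is downloaded on first use).
    ds = tfds.load('imdb_reviews', split='train')
    # Print the first four examples.
    for ex in ds.take(4):
        print(ex)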
    

    See the guide for more information on tensorflow_datasets.

  2. IMDB Large Movie Reviews Sentiment Dataset

    • kaggle.com
    zip
    Updated Nov 18, 2019
    Cite
    Jan Christian Blaise Cruz (2019). IMDB Large Movie Reviews Sentiment Dataset [Dataset]. https://www.kaggle.com/jcblaise/imdb-sentiments
    Available download formats: zip (38677807 bytes)
    Authors
    Jan Christian Blaise Cruz
    Description

    IMDB Movie Reviews Sentiment Dataset

    This dataset contains CSV versions of the Large Movie Review Dataset by Maas et al. (2011) from its original Stanford AI Repository. It contains 50k highly polar movie reviews, evenly split into 25k positives and 25k negatives. Each sample is labeled with a 0 (positive) or 1 (negative). The additional ~11k unlabeled reviews are also included in CSV format for your convenience.

    Citations

    Works using this dataset must use the appropriate citations via this bibtex entry:

    @InProceedings{maas-EtAl:2011:ACL-HLT2011,
     author  = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher},
     title   = {Learning Word Vectors for Sentiment Analysis},
     booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
     month   = {June},
     year   = {2011},
     address  = {Portland, Oregon, USA},
     publisher = {Association for Computational Linguistics},
     pages   = {142--150},
     url    = {http://www.aclweb.org/anthology/P11-1015}
    }
    
  3. IMDb Large Movie Review Dataset

    • academictorrents.com
    bittorrent
    Updated Nov 19, 2018
    Cite
    Andrew L. Maas et al., 2011 (2018). IMDb Large Movie Review Dataset [Dataset]. https://academictorrents.com/details/fd24bc44d461b10288469e05a64a8344eb079f15
    Available download formats: bittorrent
    Dataset authored and provided by
    Andrew L. Maas et al., 2011
    License

    https://academictorrents.com/nolicensespecified

    Description

    A dataset for binary sentiment classification containing 25,000 highly polarized movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

  4. IMDB 50K Movie Reviews (TEST your BERT)

    • airtestest.uk
    zip
    Updated Dec 18, 2019
    Cite
    Atul Anand {Jha} (2019). IMDB 50K Movie Reviews (TEST your BERT) [Dataset]. https://www.airtestest.uk/datasets/atulanandjha/imdb-50k-movie-reviews-test-your-bert
    Available download formats: zip (26933554 bytes)
    Authors
    Atul Anand {Jha}
    License

    http://www.gnu.org/licenses/lgpl-3.0.html

    Description

    Context

    Large Movie Review Dataset v1.0. 😃

    This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. It provides a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well. Both raw text and an already processed bag-of-words format are provided.

    In the entire collection, no more than 30 reviews are allowed for any given movie, because reviews for the same movie tend to have correlated ratings. Further, the train and test sets contain disjoint sets of movies, so no significant performance gain can be obtained by memorising movie-unique terms and their association with observed labels. In the labelled train/test sets, a negative review has a score <= 4 out of 10, and a positive review has a score >= 7 out of 10. Thus reviews with more neutral ratings are not included in the train/test sets. In the unsupervised set, reviews of any rating are included, and there is an even number of reviews > 5 and <= 5.
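
    The labelling rule and the per-movie cap described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the dataset authors' actual preprocessing code: the function names `label_from_rating` and `cap_per_movie`, the "pos"/"neg" strings, and the `(movie_id, rating)` tuple layout are all hypothetical.

```python
from collections import defaultdict
from typing import List, Optional, Tuple


def label_from_rating(rating: int) -> Optional[str]:
    """Map a 1-10 rating to a label using the thresholds quoted above.

    Returns "neg" for ratings <= 4, "pos" for ratings >= 7, and None for
    the neutral ratings (5 and 6) that the labelled sets exclude.
    """
    if not 1 <= rating <= 10:
        raise ValueError(f"rating must be between 1 and 10, got {rating}")
    if rating <= 4:
        return "neg"
    if rating >= 7:
        return "pos"
    return None


def cap_per_movie(
    reviews: List[Tuple[str, int]], limit: int = 30
) -> List[Tuple[str, int]]:
    """Keep at most `limit` reviews per movie id, preserving input order.

    `reviews` is a hypothetical list of (movie_id, rating) pairs.
    """
    seen = defaultdict(int)
    kept = []
    for movie_id, rating in reviews:
        if seen[movie_id] < limit:
            seen[movie_id] += 1
            kept.append((movie_id, rating))
    return kept


# Toy data: 35 reviews of movie "a" and one review of movie "b".
toy_reviews = [("a", 2)] * 35 + [("b", 9)]
capped = cap_per_movie(toy_reviews)
labels = [label_from_rating(rating) for _, rating in capped]
print(len(capped), labels.count("neg"), labels.count("pos"))
```

    Returning None for ratings of 5 and 6 makes the exclusion of neutral reviews explicit, so a caller can drop those rows before building a labelled split.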

    Reference: http://ai.stanford.edu/~amaas/data/sentiment/

    NOTE

    A starter kernel is here : https://www.kaggle.com/atulanandjha/bert-testing-on-imdb-dataset-starter-kernel

    A kernel to expose Dataset collection :

    Content

    Now let’s understand the task at hand: given a movie review, predict whether it’s positive or negative.

    The dataset we use is 50,000 IMDB reviews (25K for train and 25K for test) from the PyTorch-NLP library.

    Each review is tagged pos or neg.

    There are 50% positive reviews and 50% negative reviews both in train and test sets.

    Columns:

    text : Reviews from people.

    Sentiment : Negative or Positive tag on the review/feedback (Boolean).
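
    As a quick sanity check of the 50/50 class balance mentioned above, the sketch below counts labels in a CSV with a text column and a sentiment column. The in-memory sample rows, the lowercase `sentiment` header, and the "pos"/"neg" values are assumptions made for illustration; the real files may use different header capitalisation or label encoding.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for the real CSV file.
SAMPLE_CSV = """text,sentiment
"A moving, beautifully shot film.",pos
"Two hours I will never get back.",neg
"Sharp writing and a great cast.",pos
"Flat characters, predictable plot.",neg
"""


def class_balance(csv_file, label_column="sentiment"):
    """Return the fraction of rows carrying each label value."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row[label_column] for row in reader)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


balance = class_balance(io.StringIO(SAMPLE_CSV))
print(balance)
```

    To run the same check on a downloaded file, pass an open file handle (for example `open(path, newline="", encoding="utf-8")`) instead of the `io.StringIO` wrapper.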

    Acknowledgements

    When using this dataset, please cite this ACL paper:

    @InProceedings{maas-EtAl:2011:ACL-HLT2011,
     author  = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher},
     title   = {Learning Word Vectors for Sentiment Analysis},
     booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
     month   = {June},
     year   = {2011},
     address  = {Portland, Oregon, USA},
     publisher = {Association for Computational Linguistics},
     pages   = {142--150},
     url    = {http://www.aclweb.org/anthology/P11-1015}
    }

    Link to ref Dataset: https://pytorchnlp.readthedocs.io/en/latest/_modules/torchnlp/datasets/imdb.html

    https://www.samyzaf.com/ML/imdb/imdb.html

    Inspiration

    BERT and other Transformer-based models have attracted a lot of attention recently, thanks to the breakthrough of bringing transfer learning to NLP. So let's use this simple yet effective dataset to test these models and compare our results with theirs. I also invite fellow researchers to try their state-of-the-art algorithms on this dataset.

  5. IMDb_movie_reviews

    • huggingface.co
    • opendatalab.com
    Updated Aug 14, 2023
    Cite
    Jarrad Jinx (2023). IMDb_movie_reviews [Dataset]. https://huggingface.co/datasets/jahjinx/IMDb_movie_reviews
    Authors
    Jarrad Jinx
    License

    https://choosealicense.com/licenses/other/

    Description

    Dataset Card for IMDb Movie Reviews

      Dataset Summary
    

    This is a custom train/test/validation split of the IMDb Large Movie Review Dataset available from http://ai.stanford.edu/~amaas/data/sentiment/.

      Supported Tasks and Leaderboards
    

    [More Information Needed]

      Languages
    

    [More Information Needed]

      Dataset Structure
    
      IMDb_movie_reviews
    

    An example of 'train': { "text": "Beautifully photographed and ably acted… See the full description on the dataset page: https://huggingface.co/datasets/jahjinx/IMDb_movie_reviews.

  6. imdb-movie-reviews

    • huggingface.co
    Updated Aug 23, 2024
    Cite
    Ajay Karthick Senthil Kumar (2024). imdb-movie-reviews [Dataset]. https://huggingface.co/datasets/ajaykarthick/imdb-movie-reviews
    Authors
    Ajay Karthick Senthil Kumar
    Description

    IMDB Movie Reviews

    This is a dataset for binary sentiment classification containing a substantial amount of data. It contains 50,000 highly polar movie reviews for training text classification models. The dataset was downloaded from https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz and then processed and split into training and test sets (20% test split). The training set contains 40000 reviews and the test set contains 10000… See the full description on the dataset page: https://huggingface.co/datasets/ajaykarthick/imdb-movie-reviews.
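
    The sizes quoted above (40000 training and 10000 test reviews out of 50,000) correspond to a test fraction of 0.2, that is, 20% of the data. The sketch below reproduces that arithmetic with a seeded shuffle of row indices. It is a minimal illustration, not the dataset author's actual procedure; the function name `split_indices` and the seed value are hypothetical.

```python
import random


def split_indices(n_items, test_fraction, seed=0):
    """Shuffle row indices and split them into (train, test) lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = round(n_items * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(50_000, 0.2)
print(len(train_idx), len(test_idx))
```

    Splitting index lists rather than the reviews themselves keeps the example independent of how the text is stored.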

  7. imdb-javanese

    • huggingface.co
    Updated Feb 12, 2022
    Cite
    Wilson Wongso (2022). imdb-javanese [Dataset]. https://huggingface.co/datasets/w11wo/imdb-javanese
    Authors
    Wilson Wongso
    License

    https://choosealicense.com/licenses/odbl/

    Description

    Large Movie Review Dataset translated to Javanese. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well. We translated the original IMDB Dataset to Javanese using the multi-lingual MarianMT Transformer model from Helsinki-NLP/opus-mt-en-mul.

  8. imdb_3000_sphere

    • huggingface.co
    Updated Apr 24, 2023
    Cite
    Eno Reyes (2023). imdb_3000_sphere [Dataset]. https://huggingface.co/datasets/enoreyes/imdb_3000_sphere
    Authors
    Eno Reyes
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for IMDB 3000 Sphere

    Homepage: http://ai.stanford.edu/~amaas/data/sentiment/

      Dataset Summary
    

    Large Movie Review Dataset. This is a 3000 item selection from the imdb dataset for binary sentiment classification for use in the Sphere course on AutoTrain.

      Dataset Structure
    

    An example of 'train' looks as follows. { "label": 0, "text": "Goodbye world2 " }

  9. imdb_dutch

    • huggingface.co
    Updated Mar 8, 2003
    Cite
    Yeb Havinga (2003). imdb_dutch [Dataset]. https://huggingface.co/datasets/yhavinga/imdb_dutch
    Authors
    Yeb Havinga
    License

    https://choosealicense.com/licenses/other/

    Description

    Large Movie Review Dataset translated to Dutch.

    This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 24,992 highly polar movie reviews for training, and 24,992 for testing. There is additional unlabeled data for use as well.

  10. imdb_reviews_with_labels

    • huggingface.co
    Updated Apr 21, 2024
    Cite
    Mouwiya S. A. Al-Qaisieh (2024). imdb_reviews_with_labels [Dataset]. https://huggingface.co/datasets/Mouwiya/imdb_reviews_with_labels
    Authors
    Mouwiya S. A. Al-Qaisieh
    License

    https://choosealicense.com/licenses/odbl/

    Description

    Dataset Description

    In this task, we conducted one-shot sentiment analysis on a subset of the IMDb movie reviews dataset using multiple language models. The goal was to predict the sentiment (positive or negative) of movie reviews without fine-tuning the models on the specific task. We utilized three different pre-trained language models for zero-shot classification: BART-large, DistilBERT-base, and RoBERTa-base. For each model, we generated predicted sentiment labels for a… See the full description on the dataset page: https://huggingface.co/datasets/Mouwiya/imdb_reviews_with_labels.

  11. HebrewMetaphors

    • huggingface.co
    Updated Oct 3, 2023
    Cite
    Technion Data and Knowledge Lab (2023). HebrewMetaphors [Dataset]. https://huggingface.co/datasets/tdklab/HebrewMetaphors
    Dataset authored and provided by
    Technion Data and Knowledge Lab
    Description

    Dataset Card for "HebrewMetaphors"

      Dataset Summary
    

    A common dataset for text classification tasks is the IMDb Large Movie Review Dataset, a dataset for binary sentiment classification. The first step in our project was to create a Hebrew dataset with an IMDB-like structure, but different in that, in addition to the sentences, it also contains verb names and a classification of whether each verb name is literal or metaphorical in the given sentence.… See the full description on the dataset page: https://huggingface.co/datasets/tdklab/HebrewMetaphors.

imdb_reviews

28 scholarly articles cite this dataset (View in Google Scholar)