5 datasets found

h
Data from: imdb
huggingface.co
Updated Aug 3, 2003
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Stanford NLP (2003). imdb [Dataset]. https://huggingface.co/datasets/stanfordnlp/imdb
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 3, 2003
Dataset authored and provided by
Stanford NLP
License
https://choosealicense.com/licenses/other/https://choosealicense.com/licenses/other/
Description
Dataset Card for "imdb"

Dataset Summary

Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

Supported Tasks and Leaderboards

More Information Needed

Languages

More Information Needed

Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/stanfordnlp/imdb.
i
IMDb Movie Reviews Dataset
ieee-dataport.org
Updated Aug 2, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aditya Pal (2022). IMDb Movie Reviews Dataset [Dataset]. https://ieee-dataport.org/open-access/imdb-movie-reviews-dataset
Explore at:
Dataset updated
Aug 2, 2022
Authors
Aditya Pal
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
R
o
IMDB Movie Reviews (Binary Sentiment)
opendatabay.com
.csv
Updated Jun 18, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Datasimple (2025). IMDB Movie Reviews (Binary Sentiment) [Dataset]. https://www.opendatabay.com/data/ai-ml/c48f7110-3d06-45be-9cae-aa8799720eec
Explore at:
.csvAvailable download formats
Dataset updated
Jun 18, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Entertainment & Media Consumption
Description
Source Huggingface Hub: link

About this dataset This is a large dataset for binary sentiment classification containing a substantial amount of data compared to previous benchmark datasets. Provided are 25,000 highly polar movie reviews for training and 25,000 for testing. There is also additional unlabeled data available for use. The data fields are consistent among all splits of the dataset

How to use the dataset In order to use this dataset, you will need to first download the IMDB Large Movie Review Dataset. Once you have downloaded the dataset, you can either use it in its original form or split it into training and testing sets. To split the dataset, you will need to create a new file called unsupervised.csv and copy the text column from train.csv into it. You can then split unsupervised.csv into two files: train_unsupervised.csv and test_unsupervised.csv.

Once you have either the original dataset or the training and testing sets, you can begin using them for binary sentiment classification. In order to do this, you will need to use a machine learning algorithm that is capable of performing binary classification, such as logistic regression or support vector machines. Once you have trained your model on the training set, you can then evaluate its performance on the test set by predicting the labels of the reviews in test_unsupervised.csv

Research Ideas This dataset can be used to train a binary sentiment classification model. This dataset can be used to train a model to classify movie reviews into positive and negative sentiment categories. This dataset can be used to build a large movie review database for research purposes

License

CC0

Original Data Source: IMDB Movie Reviews (Binary Sentiment)
P
IMDB-MULTI Dataset
paperswithcode.com
opendatalab.com
Updated Sep 1, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Pinar Yanardag; S. V. N. Vishwanathan (2021). IMDB-MULTI Dataset [Dataset]. https://paperswithcode.com/dataset/imdb-multi
Explore at:
Dataset updated
Sep 1, 2021
Authors
Pinar Yanardag; S. V. N. Vishwanathan
Description
IMDB-MULTI is a relational dataset that consists of a network of 1000 actors or actresses who played roles in movies in IMDB. A node represents an actor or actress, and an edge connects two nodes when they appear in the same movie. In IMDB-MULTI, the edges are collected from three different genres: Comedy, Romance and Sci-Fi.
How I Met Your Mother Episodes Data
kaggle.com
Updated Jan 2, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bill Cruise (2022). How I Met Your Mother Episodes Data [Dataset]. https://www.kaggle.com/datasets/bcruise/how-i-met-your-mother-episodes-data
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 2, 2022
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Bill Cruise
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Series created by: Carter Bays and Craig Thomas Number of seasons: 9 Number of episodes: 208 Original air dates: September 19, 2005 – March 31, 2014

Content

Data was acquired through downloading IMDb TV episodes datasets and scraping information from Wikipedia.

Acknowledgements

Thanks to IMDb, Wikipedia, and community curators.

Use

It should be easy to join these data files together on Title and Air Date fields to compare (for example) US viewers and IMDb ratings.

Motivation

I wanted to share a dataset about How I Met Your Mother, one of my favorite TV shows to binge watch.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Stanford NLP (2003). imdb [Dataset]. https://huggingface.co/datasets/stanfordnlp/imdb

Data from: imdb

IMDB

stanfordnlp/imdb

Explore at:

21 scholarly articles cite this dataset (View in Google Scholar)

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Aug 3, 2003

Dataset authored and provided by

Stanford NLP

License

https://choosealicense.com/licenses/other/https://choosealicense.com/licenses/other/

Description

Dataset Card for "imdb"

  Dataset Summary

Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

  Supported Tasks and Leaderboards

More Information Needed

  Languages

More Information Needed

  Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/stanfordnlp/imdb.

Clear search

Close search

Google apps

Main menu

Data from: imdb

IMDb Movie Reviews Dataset

IMDB Movie Reviews (Binary Sentiment)

License

IMDB-MULTI Dataset

How I Met Your Mother Episodes Data

Context

Content

Acknowledgements

Use

Motivation

Data from: imdbSee More Versions

IMDB

stanfordnlp/imdb

Data from: imdb