4 datasets found

movielens
tensorflow.org
opendatalab.com
Updated Jul 8, 2020
(2020). movielens [Dataset]. https://www.tensorflow.org/datasets/catalog/movielens
Description
This dataset contains a set of movie ratings from the MovieLens website, a movie recommendation service. This dataset was collected and maintained by GroupLens, a research group at the University of Minnesota. There are 5 versions included: "25m", "latest-small", "100k", "1m", "20m". In all datasets, the movies data and ratings data are joined on "movieId". The 25m dataset, latest-small dataset, and 20m dataset contain only movie data and rating data. The 1m dataset and 100k dataset contain demographic data in addition to movie and rating data.

"25m": This is the latest stable version of the MovieLens dataset. It is recommended for research purposes.

"latest-small": This is a small subset of the latest version of the MovieLens dataset. It is changed and updated over time by GroupLens.

"100k": This is the oldest version of the MovieLens datasets. It is a small dataset with demographic data.

"1m": This is the largest MovieLens dataset that contains demographic data.

"20m": This is one of the most used MovieLens datasets in academic papers along with the 1m dataset.

For each version, users can view either only the movies data by adding the "-movies" suffix (e.g. "25m-movies") or the ratings data joined with the movies data (and users data in the 1m and 100k datasets) by adding the "-ratings" suffix (e.g. "25m-ratings").
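The suffix convention above can be sketched as a small helper that assembles a config name. This is a minimal sketch, not part of the dataset documentation: the helper names `config_name` and `load_config` are hypothetical, the `movielens/<config>` form assumes the usual TensorFlow Datasets `name/config` convention, and `load_config` assumes `tensorflow_datasets` is installed.

```python
# Versions and suffixes as listed in the description above.
VERSIONS = ("25m", "latest-small", "100k", "1m", "20m")
KINDS = ("movies", "ratings")


def config_name(version: str, kind: str) -> str:
    """Build a name such as 'movielens/25m-ratings' (hypothetical helper)."""
    if version not in VERSIONS:
        raise ValueError(f"unknown version: {version!r}")
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    return f"movielens/{version}-{kind}"


def load_config(version: str, kind: str, split: str = "train"):
    """Load one config; the import is lazy so the helper above works without TFDS."""
    import tensorflow_datasets as tfds  # assumed to be installed

    return tfds.load(config_name(version, kind), split=split)


print(config_name("25m", "movies"))  # movielens/25m-movies
```

Calling `load_config("100k", "ratings")` would then request the ratings data joined with the movie and user data, under the assumptions stated above.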

The features below are included in all versions with the "-ratings" suffix.

"movie_id": a unique identifier of the rated movie

"movie_title": the title of the rated movie with the release year in parentheses

"movie_genres": a sequence of genres to which the rated movie belongs

"user_id": a unique identifier of the user who made the rating

"user_rating": the score of the rating on a five-star scale

"timestamp": the timestamp of the ratings, represented in seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970
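Two of the features above need a little decoding before use: the release year is embedded in "movie_title", and "timestamp" is a count of seconds since the Unix epoch. The following is a minimal sketch of both steps. The function names are hypothetical, and it assumes the values have already been converted to a plain Python `str` and `int` (the loader may return them as byte strings or tensors).

```python
import re
from datetime import datetime, timezone

# Title followed by a four-digit year in parentheses at the end of the string.
_TITLE_RE = re.compile(r"^(?P<title>.*?)\s*\((?P<year>\d{4})\)\s*$")


def split_title(movie_title: str):
    """Return (title, year); year is None when no trailing '(YYYY)' is found."""
    match = _TITLE_RE.match(movie_title)
    if match is None:
        return movie_title, None
    return match.group("title"), int(match.group("year"))


def to_utc(timestamp: int) -> datetime:
    """Convert seconds since 1970-01-01 00:00:00 UTC to an aware datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


print(split_title("Toy Story (1995)"))  # ('Toy Story', 1995)
print(to_utc(0).isoformat())            # 1970-01-01T00:00:00+00:00
```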

The "100k-ratings" and "1m-ratings" versions in addition include the following demographic features.

"user_gender": gender of the user who made the rating; a true value corresponds to male

"bucketized_user_age": the bucketized age of the user who made the rating; the values and their corresponding ranges are:

1: "Under 18"

18: "18-24"

25: "25-34"

35: "35-44"

45: "45-49"

50: "50-55"

56: "56+"

"user_occupation_label": the occupation of the user who made the rating represented by an integer-encoded label; labels are preprocessed to be consistent across different versions

"user_occupation_text": the occupation of the user who made the rating, as the original string; different versions can have different sets of raw text labels

"user_zip_code": the zip code of the user who made the rating

In addition, the "100k-ratings" dataset includes a "raw_user_age" feature, which gives the exact age of the user who made the rating.
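The relationship between an exact age and the "bucketized_user_age" codes listed above can be sketched as a lookup. This is an illustration only: it assumes each bucket code is the lower bound of its range (with 1 standing for "Under 18"), which matches the table but is not stated as the dataset's actual preprocessing. The function name is hypothetical.

```python
from bisect import bisect_right

# Bucket codes and labels as listed in the description.
AGE_BUCKETS = {
    1: "Under 18",
    18: "18-24",
    25: "25-34",
    35: "35-44",
    45: "45-49",
    50: "50-55",
    56: "56+",
}

_CODES = sorted(AGE_BUCKETS)  # [1, 18, 25, 35, 45, 50, 56]
_LOWER_BOUNDS = _CODES[1:]    # ages at which the next bucket starts


def bucket_for_age(raw_age: int) -> int:
    """Map an exact age to its bucket code (assumed mapping, see lead-in)."""
    if raw_age < 0:
        raise ValueError("age must be non-negative")
    return _CODES[bisect_right(_LOWER_BOUNDS, raw_age)]


for age in (17, 18, 49, 55, 56):
    code = bucket_for_age(age)
    print(age, code, AGE_BUCKETS[code])
```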

Datasets with the "-movies" suffix contain only "movie_id", "movie_title", and "movie_genres" features.

To use this dataset:

import tensorflow_datasets as tfds

ds = tfds.load('movielens', split='train')
for ex in ds.take(4):
    print(ex)

See the guide for more information on tensorflow_datasets.
MovieLens 100K
grouplens.org
Updated Oct 12, 2015
(2015). MovieLens 100K [Dataset]. https://grouplens.org/datasets/movielens/100k/
Description
Stable benchmark dataset. 100,000 ratings from 1000 users on 1700 movies. Released 4/1998.
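As a rough worked example, the figures above imply how sparse the user-movie rating matrix is. The sketch below uses the rounded counts from the description (1000 users, 1700 movies), so the result is approximate; the function name is hypothetical.

```python
def rating_density(n_ratings: int, n_users: int, n_movies: int) -> float:
    """Fraction of all possible (user, movie) pairs that carry a rating."""
    return n_ratings / (n_users * n_movies)


# Rounded counts quoted in the description of MovieLens 100K.
density = rating_density(100_000, 1000, 1700)
print(f"{density:.2%}")  # 5.88%
```

With these numbers, roughly 94% of the possible user-movie pairs have no rating.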
‘Movie Lens Small Latest Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Sep 30, 2021
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Movie Lens Small Latest Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-movie-lens-small-latest-dataset-6de3/a98cfad2/?iid=001-258&v=presentation
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Movie Lens Small Latest Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/shubhammehta21/movie-lens-small-latest-dataset on 30 September 2021.

--- Dataset description provided by original source is as follows ---

Summary

This dataset (ml-latest-small) describes 5-star rating and free-text tagging activity from MovieLens, a movie recommendation service. It contains 100836 ratings and 3683 tag applications across 9742 movies. These data were created by 610 users between March 29, 1996 and September 24, 2018. This dataset was generated on September 26, 2018.

Users were selected at random for inclusion. All selected users had rated at least 20 movies. No demographic information is included. Each user is represented by an id, and no other information is provided.

The data are contained in the files links.csv, movies.csv, ratings.csv and tags.csv. More details about the contents and use of all these files follow.
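A minimal sketch of combining two of these files follows. It assumes the column headers `userId,movieId,rating,timestamp` for ratings.csv and `movieId,title,genres` (pipe-separated genres) for movies.csv, as described in the GroupLens README. The rows embedded here are illustrative stand-ins for the real files, and `join_ratings` is a hypothetical helper.

```python
import csv
import io

# Illustrative stand-ins for movies.csv and ratings.csv (not the real files).
MOVIES_CSV = """movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
"""

RATINGS_CSV = """userId,movieId,rating,timestamp
1,1,4.0,964982703
2,2,3.5,1141415820
"""


def join_ratings(movies_file, ratings_file):
    """Attach title and genre list to each rating, joining on movieId."""
    movies = {row["movieId"]: row for row in csv.DictReader(movies_file)}
    joined = []
    for rating in csv.DictReader(ratings_file):
        movie = movies[rating["movieId"]]
        joined.append({
            "userId": int(rating["userId"]),
            "movieId": int(rating["movieId"]),
            "rating": float(rating["rating"]),
            "timestamp": int(rating["timestamp"]),
            "title": movie["title"],
            "genres": movie["genres"].split("|"),
        })
    return joined


joined = join_ratings(io.StringIO(MOVIES_CSV), io.StringIO(RATINGS_CSV))
for row in joined:
    print(row["userId"], row["title"], row["rating"])
```

To run this against the downloaded data, the `io.StringIO` objects could be replaced with files opened via `open("movies.csv", newline="", encoding="utf-8")` and the equivalent for ratings.csv.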

This is a development dataset. As such, it may change over time and is not an appropriate dataset for shared research results. See available benchmark datasets if that is your intent.

This and other GroupLens data sets are publicly available for download at

--- Original source retains full ownership of the source dataset ---
MovieLens Dataset - 100K Ratings
kaggle.com
Updated Feb 28, 2025
Sriharsha (2025). MovieLens Dataset - 100K Ratings [Dataset]. https://www.kaggle.com/datasets/sriharshabsprasad/movielens-dataset-100k-ratings/data
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sriharsha
Description
This dataset (ml-latest-small) describes 5-star rating and free-text tagging activity from MovieLens, a movie recommendation service. It contains 100836 ratings and 3683 tag applications across 9742 movies. These data were created by 610 users between March 29, 1996 and September 24, 2018. This dataset was generated on September 26, 2018.

Users were selected at random for inclusion. All selected users had rated at least 20 movies. No demographic information is included. Each user is represented by an id, and no other information is provided.

The data are contained in the files links.csv, movies.csv, ratings.csv, and tags.csv.

This and other GroupLens data sets are publicly available for download at http://grouplens.org/datasets/.

License: This dataset is sourced from the GroupLens Research Group at the University of Minnesota. It is provided for non-commercial research and educational purposes only. License details can be found here under Usage License - https://files.grouplens.org/datasets/movielens/ml-latest-small-README.html

Important:

This dataset is provided "as is" without warranty.

For commercial use, please contact grouplens-info@umn.edu.

Citation: F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. https://doi.org/10.1145/2827872
3 scholarly articles cite the movielens (tensorflow.org) dataset.