4 datasets found
  1. movielens

    • tensorflow.org
    • opendatalab.com
    Updated Jul 8, 2020
    Cite
    (2020). movielens [Dataset]. https://www.tensorflow.org/datasets/catalog/movielens
    Description

    This dataset contains a set of movie ratings from the MovieLens website, a movie recommendation service. This dataset was collected and maintained by GroupLens, a research group at the University of Minnesota. There are 5 versions included: "25m", "latest-small", "100k", "1m", "20m". In all datasets, the movies data and ratings data are joined on "movieId". The 25m dataset, latest-small dataset, and 20m dataset contain only movie data and rating data. The 1m dataset and 100k dataset contain demographic data in addition to movie and rating data.

    • "25m": This is the latest stable version of the MovieLens dataset. It is recommended for research purposes.
    • "latest-small": This is a small subset of the latest version of the MovieLens dataset. It is changed and updated over time by GroupLens.
    • "100k": This is the oldest version of the MovieLens datasets. It is a small dataset with demographic data.
    • "1m": This is the largest MovieLens dataset that contains demographic data.
    • "20m": This is one of the most used MovieLens datasets in academic papers along with the 1m dataset.

    For each version, users can view either only the movies data by adding the "-movies" suffix (e.g. "25m-movies") or the ratings data joined with the movies data (and users data in the 1m and 100k datasets) by adding the "-ratings" suffix (e.g. "25m-ratings").
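The naming rule above can be sketched as a small helper. This is only an illustration: the function name `movielens_config` is hypothetical, and the "movielens/" prefix assumes the usual TensorFlow Datasets "dataset/config" naming form.

```python
# Versions and suffixes as listed in the description above.
VERSIONS = ("25m", "latest-small", "100k", "1m", "20m")
TABLES = ("movies", "ratings")


def movielens_config(version: str, table: str) -> str:
    """Build a name such as 'movielens/25m-ratings' (hypothetical helper)."""
    if version not in VERSIONS:
        raise ValueError(f"unknown version: {version!r}")
    if table not in TABLES:
        raise ValueError(f"unknown table: {table!r}")
    # Assumption: TFDS addresses a configuration as '<dataset>/<config>'.
    return f"movielens/{version}-{table}"


print(movielens_config("25m", "ratings"))
```

The resulting string would then presumably be passed as the first argument to `tfds.load` in place of the bare 'movielens' name.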

    The features below are included in all versions with the "-ratings" suffix.

    • "movie_id": a unique identifier of the rated movie
    • "movie_title": the title of the rated movie with the release year in parentheses
    • "movie_genres": a sequence of genres to which the rated movie belongs
    • "user_id": a unique identifier of the user who made the rating
    • "user_rating": the score of the rating on a five-star scale
    • "timestamp": the timestamp of the ratings, represented in seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970
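Two of the features above are easier to work with after a little decoding. The sketch below assumes the values have already been converted to plain Python `str` and `int` objects; the helper names and the example record are hypothetical.

```python
import re
from datetime import datetime, timezone

# Matches titles of the form "Some Title (1995)".
_TITLE_YEAR = re.compile(r"^(?P<title>.*)\s\((?P<year>\d{4})\)\s*$")


def split_title(movie_title):
    """Split a title into (name, release_year); year is None if absent."""
    match = _TITLE_YEAR.match(movie_title)
    if match is None:
        return movie_title, None
    return match.group("title"), int(match.group("year"))


def to_utc(timestamp):
    """Convert seconds since 1970-01-01 00:00 UTC to an aware datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


# Illustrative record, not taken from the dataset.
example = {"movie_title": "Example Movie (1995)", "timestamp": 86400}
print(split_title(example["movie_title"]))
print(to_utc(example["timestamp"]).isoformat())
```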

    The "100k-ratings" and "1m-ratings" versions in addition include the following demographic features.

    • "user_gender": gender of the user who made the rating; a true value corresponds to male
    • "bucketized_user_age": the bucketized age of the user who made the rating; the values and their corresponding ranges are:
      • 1: "Under 18"
      • 18: "18-24"
      • 25: "25-34"
      • 35: "35-44"
      • 45: "45-49"
      • 50: "50-55"
      • 56: "56+"
    • "user_occupation_label": the occupation of the user who made the rating represented by an integer-encoded label; labels are preprocessed to be consistent across different versions
    • "user_occupation_text": the occupation of the user who made the rating in the original string; different versions can have different sets of raw text labels
    • "user_zip_code": the zip code of the user who made the rating

    In addition, the "100k-ratings" dataset also has a "raw_user_age" feature, which is the exact age of the user who made the rating.
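The relationship between an exact age and the bucket values listed above can be sketched as follows. This is an assumption about how the buckets line up (each key is treated as the lower bound of its range), not the dataset's own preprocessing code; `bucketize_age` and `age_label` are hypothetical names.

```python
from bisect import bisect_right

# Bucket value -> range label, copied from the list above.
AGE_BUCKETS = {
    1: "Under 18",
    18: "18-24",
    25: "25-34",
    35: "35-44",
    45: "45-49",
    50: "50-55",
    56: "56+",
}
_LOWER_BOUNDS = sorted(AGE_BUCKETS)


def bucketize_age(raw_age):
    """Map an exact age to the largest bucket value not exceeding it."""
    if raw_age < 0:
        raise ValueError("age must be non-negative")
    index = bisect_right(_LOWER_BOUNDS, raw_age) - 1
    # Ages below the first bound still fall in the "Under 18" bucket.
    return _LOWER_BOUNDS[max(index, 0)]


def age_label(raw_age):
    """Return the human-readable range for an exact age."""
    return AGE_BUCKETS[bucketize_age(raw_age)]


print(bucketize_age(30), age_label(30))
```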

    Datasets with the "-movies" suffix contain only "movie_id", "movie_title", and "movie_genres" features.

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the 'train' split and print the first four examples.
    ds = tfds.load('movielens', split='train')
    for ex in ds.take(4):
        print(ex)

    See the guide for more information on tensorflow_datasets.

  2. MovieLens 100K

    • grouplens.org
    Updated Oct 12, 2015
    Cite
    (2015). MovieLens 100K [Dataset]. https://grouplens.org/datasets/movielens/100k/
    Description

    Stable benchmark dataset. 100,000 ratings from 1000 users on 1700 movies. Released 4/1998.
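As a rough worked example, the figures above imply how sparse the user-by-movie rating matrix is. The counts are the rounded numbers quoted in the description, so the result is only approximate; the `density` helper is hypothetical.

```python
def density(num_ratings, num_users, num_movies):
    """Fraction of all possible (user, movie) pairs that have a rating."""
    return num_ratings / (num_users * num_movies)


# Rounded counts from the description: 100,000 ratings, 1000 users, 1700 movies.
d = density(100_000, 1000, 1700)
print(f"{d:.2%}")  # prints 5.88%
```

Under these rounded counts, roughly 94% of the possible user-movie pairs have no rating.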

  3. ‘Movie Lens Small Latest Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Sep 30, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Movie Lens Small Latest Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-movie-lens-small-latest-dataset-6de3/a98cfad2/?iid=001-258&v=presentation
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Movie Lens Small Latest Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/shubhammehta21/movie-lens-small-latest-dataset on 30 September 2021.

    --- Dataset description provided by original source is as follows ---

    Summary

    This dataset (ml-latest-small) describes 5-star rating and free-text tagging activity from MovieLens, a movie recommendation service. It contains 100836 ratings and 3683 tag applications across 9742 movies. These data were created by 610 users between March 29, 1996 and September 24, 2018. This dataset was generated on September 26, 2018.

    Users were selected at random for inclusion. All selected users had rated at least 20 movies. No demographic information is included. Each user is represented by an id, and no other information is provided.

    The data are contained in the files links.csv, movies.csv, ratings.csv and tags.csv. More details about the contents and use of all these files follow.
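A minimal sketch of joining ratings.csv with movies.csv on the movie id, using only the standard library. The column headers (`userId`, `movieId`, `rating`, `timestamp`, `title`, `genres`) and the pipe-separated genre format are assumptions not stated in this description, and the sample rows are made up for illustration.

```python
import csv
import io

# Illustrative stand-ins for movies.csv and ratings.csv (assumed headers).
MOVIES_CSV = """movieId,title,genres
1,Example Movie (1995),Adventure|Comedy
2,Another Example (1995),Drama
"""
RATINGS_CSV = """userId,movieId,rating,timestamp
1,1,4.0,964982703
1,2,3.5,964981247
2,1,5.0,964982224
"""


def join_ratings_with_movies(ratings_file, movies_file):
    """Attach title and genres to every rating that has a matching movie."""
    movies = {row["movieId"]: row for row in csv.DictReader(movies_file)}
    joined = []
    for rating in csv.DictReader(ratings_file):
        movie = movies.get(rating["movieId"])
        if movie is None:
            continue  # skip ratings whose movie is missing
        joined.append({
            "user_id": rating["userId"],
            "movie_id": rating["movieId"],
            "movie_title": movie["title"],
            "movie_genres": movie["genres"].split("|"),
            "user_rating": float(rating["rating"]),
            "timestamp": int(rating["timestamp"]),
        })
    return joined


rows = join_ratings_with_movies(io.StringIO(RATINGS_CSV), io.StringIO(MOVIES_CSV))
for row in rows:
    print(row)
```

With the real files, the `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`.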

    This is a development dataset. As such, it may change over time and is not an appropriate dataset for shared research results. See available benchmark datasets if that is your intent.

    This and other GroupLens data sets are publicly available for download at

    --- Original source retains full ownership of the source dataset ---

  4. MovieLens Dataset - 100K Ratings

    • kaggle.com
    Updated Feb 28, 2025
    Cite
    Sriharsha (2025). MovieLens Dataset - 100K Ratings [Dataset]. https://www.kaggle.com/datasets/sriharshabsprasad/movielens-dataset-100k-ratings/data
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sriharsha
    Description

    This dataset (ml-latest-small) describes 5-star rating and free-text tagging activity from MovieLens, a movie recommendation service. It contains 100836 ratings and 3683 tag applications across 9742 movies. These data were created by 610 users between March 29, 1996 and September 24, 2018. This dataset was generated on September 26, 2018.

    Users were selected at random for inclusion. All selected users had rated at least 20 movies. No demographic information is included. Each user is represented by an id, and no other information is provided.

    The data are contained in the files links.csv, movies.csv, ratings.csv and tags.csv.

    This and other GroupLens data sets are publicly available for download at http://grouplens.org/datasets/.

    License: This dataset is sourced from the GroupLens Research Group at the University of Minnesota. It is provided for non-commercial research and educational purposes only. License details can be found here under Usage License - https://files.grouplens.org/datasets/movielens/ml-latest-small-README.html

    Important:

    • This dataset is provided "as is" without warranty.
    • For commercial use, please contact grouplens-info@umn.edu.

    Citation: F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. https://doi.org/10.1145/2827872

3 scholarly articles cite the movielens dataset (https://www.tensorflow.org/datasets/catalog/movielens).