19 datasets found

MovieLens 100K
grouplens.org
kaggle.com
Updated Oct 12, 2015
Cite
(2015). MovieLens 100K [Dataset]. https://grouplens.org/datasets/movielens/100k/
Dataset updated
Oct 12, 2015
Description
Stable benchmark dataset. 100,000 ratings from 1000 users on 1700 movies. Released 4/1998.
MovieLens 1M
grouplens.org
kaggle.com
Updated Mar 19, 2016
+ more versions
Cite
(2016). MovieLens 1M [Dataset]. https://grouplens.org/datasets/movielens/1m/
Dataset updated
Mar 19, 2016
Description
Stable benchmark dataset. 1 million ratings from 6000 users on 4000 movies. Released 2/2003.
MovieLens 20M
grouplens.org
academictorrents.com
Updated Mar 19, 2016
+ more versions
Cite
(2016). MovieLens 20M [Dataset]. https://grouplens.org/datasets/movielens/20m/
Dataset updated
Mar 19, 2016
Description
Stable benchmark dataset. 20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users. Includes tag genome data with 12 million relevance scores across 1,100 tags. Released 4/2015; updated 10/2016 to update links.csv and add tag genome data.
MovieLens 25M
grouplens.org
Updated Dec 11, 2019
Cite
(2019). MovieLens 25M [Dataset]. https://grouplens.org/datasets/movielens/25m/
Dataset updated
Dec 11, 2019
Description
Stable benchmark dataset. 25 million ratings and one million tag applications applied to 62,000 movies by 162,000 users. Includes tag genome data with 15 million relevance scores across 1,129 tags. Released 12/2019.
movielens
tensorflow.org
opendatalab.com
+1 more
Updated Jul 8, 2020
Cite
(2020). movielens [Dataset]. https://www.tensorflow.org/datasets/catalog/movielens
Dataset updated
Jul 8, 2020
Description
This dataset contains a set of movie ratings from the MovieLens website, a movie recommendation service. This dataset was collected and maintained by GroupLens, a research group at the University of Minnesota. There are 5 versions included: "25m", "latest-small", "100k", "1m", "20m". In all datasets, the movies data and ratings data are joined on "movieId". The 25m dataset, latest-small dataset, and 20m dataset contain only movie data and rating data. The 1m dataset and 100k dataset contain demographic data in addition to movie and rating data.

"25m": This is the latest stable version of the MovieLens dataset. It is recommended for research purposes.

"latest-small": This is a small subset of the latest version of the MovieLens dataset. It is changed and updated over time by GroupLens.

"100k": This is the oldest version of the MovieLens datasets. It is a small dataset with demographic data.

"1m": This is the largest MovieLens dataset that contains demographic data.

"20m": This is one of the most used MovieLens datasets in academic papers along with the 1m dataset.

For each version, users can view either only the movies data by adding the "-movies" suffix (e.g. "25m-movies") or the ratings data joined with the movies data (and users data in the 1m and 100k datasets) by adding the "-ratings" suffix (e.g. "25m-ratings").
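The naming rule above can be sketched as a small helper. This is only an illustration: `movielens_config` and `has_demographics` are hypothetical names, not part of tensorflow_datasets, and the version and suffix lists are copied from the description above.

```python
# Versions and suffixes as listed in the description above.
VERSIONS = ("25m", "latest-small", "100k", "1m", "20m")
SUFFIXES = ("movies", "ratings")


def movielens_config(version: str, suffix: str) -> str:
    """Build a config name such as "25m-ratings" (hypothetical helper)."""
    if version not in VERSIONS:
        raise ValueError(f"unknown version: {version!r}")
    if suffix not in SUFFIXES:
        raise ValueError(f"unknown suffix: {suffix!r}")
    return f"{version}-{suffix}"


def has_demographics(version: str) -> bool:
    """Per the description, only the 1m and 100k versions carry demographic data."""
    return version in ("1m", "100k")


print(movielens_config("25m", "ratings"))
print(has_demographics("100k"))
```

Assuming the usual TFDS `name/config` convention applies here, the returned string would be appended to the dataset name, for example `'movielens/' + movielens_config('100k', 'ratings')`.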

The features below are included in all versions with the "-ratings" suffix.

"movie_id": a unique identifier of the rated movie

"movie_title": the title of the rated movie with the release year in parentheses

"movie_genres": a sequence of genres to which the rated movie belongs

"user_id": a unique identifier of the user who made the rating

"user_rating": the score of the rating on a five-star scale

"timestamp": the timestamp of the ratings, represented in seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970

The "100k-ratings" and "1m-ratings" versions in addition include the following demographic features.

"user_gender": gender of the user who made the rating; a true value corresponds to male

"bucketized_user_age": the bucketized age of the user who made the rating; the values and their corresponding ranges are:

1: "Under 18"

18: "18-24"

25: "25-34"

35: "35-44"

45: "45-49"

50: "50-55"

56: "56+"

"user_occupation_label": the occupation of the user who made the rating represented by an integer-encoded label; labels are preprocessed to be consistent across different versions

"user_occupation_text": the occupation of the user who made the rating, as the original string; different versions can have different sets of raw text labels

"user_zip_code": the zip code of the user who made the rating

In addition, the "100k-ratings" dataset has a "raw_user_age" feature, which is the exact age of the user who made the rating.

Datasets with the "-movies" suffix contain only "movie_id", "movie_title", and "movie_genres" features.
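The feature encodings listed above (age buckets, the boolean gender flag, and the epoch timestamp) could be decoded roughly as follows. This is a minimal sketch under several assumptions: `describe_rating` is a hypothetical helper, the record is a plain Python dict (TFDS itself yields tensors that would first need converting), the sample record is made up, and a false `user_gender` is assumed to mean female because the description only defines the true value.

```python
from datetime import datetime, timezone

# Bucket codes and ranges as listed in the description above.
AGE_BUCKETS = {
    1: "Under 18",
    18: "18-24",
    25: "25-34",
    35: "35-44",
    45: "45-49",
    50: "50-55",
    56: "56+",
}


def describe_rating(example: dict) -> dict:
    """Return a human-readable copy of one rating record (hypothetical helper)."""
    return {
        "user_id": example["user_id"],
        "movie_title": example["movie_title"],
        "user_rating": example["user_rating"],
        "age_range": AGE_BUCKETS[example["bucketized_user_age"]],
        # True means male per the description; False -> female is an assumption.
        "gender": "male" if example["user_gender"] else "female",
        # "timestamp" is seconds since 1970-01-01 00:00 UTC.
        "rated_at": datetime.fromtimestamp(example["timestamp"], tz=timezone.utc),
    }


# Made-up record for illustration only; not taken from the dataset.
sample = {
    "user_id": "42",
    "movie_title": "Example Movie (1995)",
    "user_rating": 4.0,
    "bucketized_user_age": 25,
    "user_gender": True,
    "timestamp": 0,
}
print(describe_rating(sample))
```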

To use this dataset:

import tensorflow_datasets as tfds
ds = tfds.load('movielens', split='train')
for ex in ds.take(4):
    print(ex)

See the guide for more information on tensorflow_datasets.
MovieLens 20M Dataset
kaggle.com
Updated Aug 15, 2018
Cite
GroupLens (2018). MovieLens 20M Dataset [Dataset]. https://www.kaggle.com/datasets/grouplens/movielens-20m-dataset/code
Dataset updated
Aug 15, 2018
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
GroupLens
Description
Context

This dataset describes ratings and free-text tagging activities from MovieLens, a movie recommendation service. It contains 20000263 ratings and 465564 tag applications across 27278 movies. These data were created by 138493 users between January 09, 1995 and March 31, 2015. This dataset was generated on October 17, 2016.

Users were selected at random for inclusion. All selected users had rated at least 20 movies.

Content

No demographic information is included. Each user is represented by an id, and no other information is provided.

The data are contained in six files.

tag.csv that contains tags applied to movies by users:

userId

movieId

tag

timestamp

rating.csv that contains ratings of movies by users:

userId

movieId

rating

timestamp

movie.csv that contains movie information:

movieId

title

genres

link.csv that contains identifiers that can be used to link to other sources:

movieId

imdbId

tmdbId

genome_scores.csv that contains movie-tag relevance data:

movieId

tagId

relevance

genome_tags.csv that contains tag descriptions:

tagId

tag
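The shared movieId column in the files listed above is what lets ratings be tied back to movie titles. The following is a minimal sketch of that join using only the standard library. It assumes each file has a header row with the column names as listed; the CSV text, titles, and timestamps are made up, and `join_ratings_with_titles` is a hypothetical helper.

```python
import csv
import io

# Tiny made-up stand-ins for rating.csv and movie.csv (illustration only).
RATING_CSV = """userId,movieId,rating,timestamp
1,10,4.0,1000000000
1,20,3.5,1000000100
2,10,5.0,1000000200
"""

MOVIE_CSV = """movieId,title,genres
10,Example Movie A (1995),Comedy
20,Example Movie B (1999),Drama
"""


def join_ratings_with_titles(rating_file, movie_file):
    """Attach each rating's movie title via the shared movieId column."""
    titles = {row["movieId"]: row["title"] for row in csv.DictReader(movie_file)}
    joined = []
    for row in csv.DictReader(rating_file):
        joined.append({
            "userId": row["userId"],
            "title": titles.get(row["movieId"]),  # None if the movie is missing
            "rating": float(row["rating"]),
        })
    return joined


rows = join_ratings_with_titles(io.StringIO(RATING_CSV), io.StringIO(MOVIE_CSV))
for r in rows:
    print(r)
```

With the real files, the `io.StringIO` objects would be replaced by opened file handles. At 20 million ratings, a streaming or dataframe-based approach is likely a better fit than building a list in memory.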

Acknowledgements

The original datasets can be found here. To acknowledge use of the dataset in publications, please cite the following paper:

F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages. DOI=http://dx.doi.org/10.1145/2827872

Inspiration

Some ideas worth exploring:

Which genres receive the highest ratings? How does this change over time?

Determine the temporal trends in the genres/tagging activity of the movies released
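As a starting point for the first question, mean ratings per genre could be computed along these lines. This sketch assumes the genres field is a pipe-separated string (as in the GroupLens MovieLens files); the data and the helper name `mean_rating_by_genre` are made up for illustration.

```python
from collections import defaultdict


def mean_rating_by_genre(ratings, movie_genres):
    """ratings: iterable of (movieId, rating); movie_genres: movieId -> "A|B" string."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for movie_id, rating in ratings:
        for genre in movie_genres.get(movie_id, "").split("|"):
            if genre:  # skip movies with no genre string
                totals[genre] += rating
                counts[genre] += 1
    return {g: totals[g] / counts[g] for g in totals}


# Made-up data for illustration only.
movie_genres = {10: "Comedy|Romance", 20: "Drama"}
ratings = [(10, 4.0), (10, 5.0), (20, 3.0)]
means = mean_rating_by_genre(ratings, movie_genres)
for genre, mean in sorted(means.items(), key=lambda kv: kv[1], reverse=True):
    print(genre, round(mean, 2))
```

To look at change over time, the same accumulation could be keyed by (genre, year), with the year derived from each rating's timestamp.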
Standardized Hudup dataset based on Movielens 100k
dataverse.harvard.edu
data.mendeley.com
Updated Feb 16, 2021
+ more versions
Cite
Loc Nguyen (2021). Standardized Hudup dataset based on Movielens 100k [Dataset]. http://doi.org/10.7910/DVN/ZF3GWF
Unique identifier
https://doi.org/10.7910/DVN/ZF3GWF
Dataset updated
Feb 16, 2021
Dataset provided by
Harvard Dataverse
Authors
Loc Nguyen
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The standardized Hudup dataset is derived from raw data and is composed of ten units: “hdp_config”, “hdp_account”, “hdp_attribute_map”, “hdp_nominal”, “hdp_user”, “hdp_item”, “hdp_rating”, “hdp_context_template”, “hdp_context”, and “hdp_sample”. Each unit has particular functions, which are described in the data description section. The Hudup dataset is metadata that models any raw data at an abstract level. The default raw data source here is the MovieLens 100K dataset (GroupLens, 1998), which has 100,000 ratings from 943 users on 1682 movies (items) and is available at https://files.grouplens.org/datasets/movielens/ml-100k.zip.
MovieLens 32M
grouplens.org
Updated May 19, 2024
+ more versions
Cite
(2024). MovieLens 32M [Dataset]. https://grouplens.org/datasets/movielens/32m/
Dataset updated
May 19, 2024
Description
Stable benchmark dataset. 32 million ratings and two million tag applications applied to 87,585 movies by 200,948 users. Collected 10/2023. Released 05/2024.
Book Genome Dataset
kaggle.com
Updated May 30, 2023
Cite
Daniel Young (2023). Book Genome Dataset [Dataset]. https://www.kaggle.com/datasets/youngdaniel/book-genome-dataset
Dataset updated
May 30, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Daniel Young
License
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0), https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Description
I uploaded GroupLens' Book Genome dataset on Kaggle. It doesn't seem like they're active here any more and I want to use this here for some exploratory learning work I did.

Official link here: https://grouplens.org/datasets/book-genome/

Tag Genome is a data structure containing scores indicating the degree to which tags apply to items, such as movies or books. This dataset contains a Tag Genome generated for a set of books along with the data used for its generation (raw data). Raw data consists of a subset of the Goodreads dataset [Wan and McAuley, 2018, Wan et al., 2019] and book-tag ratings. The Goodreads subset includes information on popular books, such as titles, authors, release years, user ratings, reviews and shelves. Shelves are lists that users use to organize books in Goodreads (https://www.goodreads.com/). In these instructions, we refer to adding books to shelves as attaching tags (shelf names) to books. To collect book-tag ratings, we conducted a survey on Amazon Mechanical Turk, where we asked users to indicate the degree to which tags apply to books from this subset. To generate book-tag scores, we used two state-of-the-art algorithms: Glmer [Vig et al., 2012] and TagDL [Kotkov et al., 2021]. The code is available in the following GitHub repository: https://github.com/Bionic1251/Revisiting-the-Tag-Relevance-Prediction-Problem
the_movies_dataset
kaggle.com
zip
Updated Jun 19, 2021
+ more versions
Cite
sezgin ildes (2021). the_movies_dataset [Dataset]. https://www.kaggle.com/sezginildes/the-movies-dataset
Available download formats: zip (15456686 bytes)
Dataset updated
Jun 19, 2021
Authors
sezgin ildes
Description
Context

These files contain metadata for all 45,000 movies listed in the Full MovieLens Dataset. The dataset consists of movies released on or before July 2017. Data points include cast, crew, plot keywords, budget, revenue, posters, release dates, languages, production companies, countries, TMDB vote counts and vote averages.

This dataset also has files containing 26 million ratings from 270,000 users for all 45,000 movies. Ratings are on a scale of 1-5 and have been obtained from the official GroupLens website.

Content

This dataset consists of the following files:

movies_metadata.csv: The main Movies Metadata file. Contains information on 45,000 movies featured in the Full MovieLens dataset. Features include posters, backdrops, budget, revenue, release dates, languages, production countries and companies.

keywords.csv: Contains the movie plot keywords for our MovieLens movies. Available in the form of a stringified JSON Object.

credits.csv: Consists of Cast and Crew Information for all our movies. Available in the form of a stringified JSON Object.

links.csv: The file that contains the TMDB and IMDB IDs of all the movies featured in the Full MovieLens dataset.

links_small.csv: Contains the TMDB and IMDB IDs of a small subset of 9,000 movies of the Full Dataset.

ratings_small.csv: The subset of 100,000 ratings from 700 users on 9,000 movies.
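Cells described above as a "stringified JSON Object" need to be parsed back into lists or dicts before use. The following is a minimal sketch of one way to do that. It assumes that some cells may be Python-literal style text (single quotes) rather than strict JSON, so it falls back to `ast.literal_eval`; `parse_stringified` is a hypothetical helper and the sample values are made up.

```python
import ast
import json


def parse_stringified(value: str):
    """Parse a stringified list/dict; return None if it cannot be parsed."""
    try:
        return json.loads(value)
    except (json.JSONDecodeError, TypeError):
        pass
    try:
        # Fallback for Python-literal style text (single-quoted keys and strings).
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return None


# Made-up cell values for illustration only.
print(parse_stringified('[{"id": 1, "name": "example keyword"}]'))
print(parse_stringified("[{'id': 1, 'name': 'example keyword'}]"))
print(parse_stringified("not a list"))
```

Returning None for unparseable cells keeps a bulk load from failing on a single malformed row; whether to drop or log such rows is left to the caller.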

The Full MovieLens Dataset consisting of 26 million ratings and 750,000 tag applications from 270,000 users on all the 45,000 movies in this dataset can be accessed here

Acknowledgements

This dataset is an ensemble of data collected from TMDB and GroupLens. The Movie Details, Credits and Keywords have been collected from the TMDB Open API. This product uses the TMDb API but is not endorsed or certified by TMDb. Their API also provides access to data on many additional movies, actors and actresses, crew members, and TV shows. You can try it for yourself here.

The Movie Links and Ratings have been obtained from the Official GroupLens website. The files are a part of the dataset available here

Inspiration

This dataset was assembled as part of my second Capstone Project for Springboard's Data Science Career Track. I wanted to perform an extensive EDA on Movie Data to narrate the history and the story of Cinema and use this metadata in combination with MovieLens ratings to build various types of Recommender Systems.

Both my notebooks are available as kernels with this dataset: The Story of Film and Movie Recommender Systems

Some of the things you can do with this dataset: Predicting movie revenue and/or movie success based on a certain metric. What movies tend to get higher vote counts and vote averages on TMDB? Building Content Based and Collaborative Filtering Based Recommendation Engines.
MovieLens Dataset - 100K Ratings - Dataset - 海数据
haidatas.com
Updated Mar 9, 2025
Cite
(2025). MovieLens Dataset - 100K Ratings - Dataset - 海数据 [Dataset]. https://haidatas.com/dataset/movielens-shujuji-100k-pingji
Dataset updated
Mar 9, 2025
Description
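About the dataset: This dataset (ml-latest-small) describes 5-star rating and free-text tagging activity from MovieLens, a movie recommendation service. It contains 100836 ratings and 3683 tag applications across 9742 movies. These data were created by 610 users between March 29, 1996 and September 24, 2018. This dataset was generated on September 26, 2018. Users were selected at random for inclusion. All selected users had rated at least 20 movies. No demographic information is included. Each user is represented by an id, and no other information is provided. The data are contained in the files links.csv, movies.csv, ratings.csv and tags.csv. This and other GroupLens datasets are publicly available for download at http://grouplens.org/datasets/. License: This dataset originates from the GroupLens research group at the University of Minnesota. It is for non-commercial research and educational use only. License details can be found under the usage license: https://files.grouplens.org/datasets/movielens/ml-latest-small-README.html. Important: This dataset is provided "as is", without any warranty. For commercial use, contact grouplens-info@umn.edu. Citation: F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. https://doi.org/10.1145/2827872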
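The counts quoted for ml-latest-small (100836 ratings, 610 users, 9742 movies) give a sense of how sparse the user-movie rating matrix is: only about 1.7% of all possible user-movie pairs carry a rating. The following is a minimal sketch of that arithmetic; `rating_density` is a hypothetical helper name.

```python
def rating_density(num_ratings: int, num_users: int, num_movies: int) -> float:
    """Fraction of all possible user-movie pairs that have a rating."""
    return num_ratings / (num_users * num_movies)


# Counts quoted in the ml-latest-small description above.
density = rating_density(100836, 610, 9742)
print(f"density: {density:.4%}, sparsity: {1 - density:.4%}")
```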
movie_rating_data
kaggle.com
Updated Nov 9, 2017
Cite
pooh (2017). movie_rating_data [Dataset]. https://www.kaggle.com/ashukr/movie-rating-data/activity
Dataset updated
Nov 9, 2017
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
pooh
Description
Context

Stable benchmark dataset. 20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users.

Content

This dataset has three files: ratings.csv, movies.csv and tags.csv.

movies.csv: Its three columns store movieId, title and genre. The title includes the movie's release year in parentheses. The movies range from Dickson Greeting (1891) to releases from 2015, for a total of 27278 movies.

ratings.csv: The movies have been rated by 138493 users on a scale of 1 to 5. The file has the columns 'userId', 'movieId', 'rating' and 'timestamp'.

tags.csv: This file has the columns 'userId', 'movieId' and 'tag'.

Acknowledgements

I got this data from MovieLens for a mini project. The original dataset is at http://grouplens.org/datasets/movielens/20m/

Inspiration

You have a ton of data. You can use it to make fun decisions, like which is the best movie series of all time, or to create a completely new story out of the data you have.
data1.tar.gz
figshare.com
application/x-gzip
Updated May 20, 2020
+ more versions
Cite
Douglas Bates (2020). data1.tar.gz [Dataset]. http://doi.org/10.6084/m9.figshare.12343910.v1
Available download formats: application/x-gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.12343910.v1
Dataset updated
May 20, 2020
Dataset provided by
Figshare (http://figshare.com/)
Authors
Douglas Bates
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A collection of several data sets used to illustrate fitting (generalized) linear mixed-effects models. Individual data sets are in Feather format (https://github.com/wesm/feather). They include Dyestuff, Dyestuff2, Penicillin, Pastes, InstEval, sleepstudy, cbpp, Contraception, grouseticks and VerbAgg from the lme4 package for R. The kb07 data is from github.com/dalejbarr/kronmueller-barr-2007 and ml1m is from https://grouplens.org/datasets/movielens/1m/
Fair RecSys Datasets
data.niaid.nih.gov
Updated Feb 22, 2023
Cite
Kowald Dominik (2023). Fair RecSys Datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6123878
Dataset updated
Feb 22, 2023
Dataset authored and provided by
Kowald Dominik
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Four multimedia recommender systems datasets to study popularity bias and fairness:

Last.fm (lfm.zip), based on the LFM-1b dataset of JKU Linz (http://www.cp.jku.at/datasets/LFM-1b/)

MovieLens (ml.zip), based on MovieLens-1M dataset (https://grouplens.org/datasets/movielens/1m/)

BookCrossing (book.zip), based on the BookCrossing dataset of Uni Freiburg (http://www2.informatik.uni-freiburg.de/~cziegler/BX/)

MyAnimeList (anime.zip), based on the MyAnimeList dataset of Kaggle (https://www.kaggle.com/CooperUnion/anime-recommendations-database)

Each dataset consists of user interactions (user_events.txt) and three user groups that differ in their inclination to popular/mainstream items: LowPop (low_main_users.txt), MedPop (med_main_users.txt), and HighPop (high_main_users.txt).

The format of the three user files is "user,mainstreaminess".

The format of the user-events files is "user,item,preference".
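The two file formats described above could be read along the following lines. This is a minimal sketch that assumes comma-separated rows without a header line and numeric preference and mainstreaminess values; the rows, `UserEvent`, `read_user_events`, and `read_user_group` are all made up for illustration and are not taken from the FairRecSys repository.

```python
import csv
import io
from typing import NamedTuple


class UserEvent(NamedTuple):
    user: str
    item: str
    preference: float


def read_user_events(lines):
    """Parse "user,item,preference" rows into UserEvent tuples."""
    return [UserEvent(u, i, float(p)) for u, i, p in csv.reader(lines)]


def read_user_group(lines):
    """Parse "user,mainstreaminess" rows into a user -> score mapping."""
    return {u: float(m) for u, m in csv.reader(lines)}


# Made-up rows for illustration only.
events = read_user_events(io.StringIO("u1,i1,4.0\nu1,i2,2.5\nu2,i1,5.0\n"))
low_pop = read_user_group(io.StringIO("u1,0.12\n"))

# Keep only the events of users who belong to the group.
low_pop_events = [e for e in events if e.user in low_pop]
print(len(low_pop_events))
```

With the real archives, the `io.StringIO` objects would be replaced by opened user_events.txt and low_main_users.txt files. If the files turn out to have a header row, it would need to be skipped first.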

Example Python code for analyzing the datasets, as well as more information on the user groups, can be found on GitHub (https://github.com/domkowald/FairRecSys) and on arXiv (https://arxiv.org/abs/2203.00376).
FEDORA-Recsys Test Traces
zenodo.org
zip
Updated Mar 28, 2025
Cite
Jinyu Liu (2025). FEDORA-Recsys Test Traces [Dataset]. http://doi.org/10.5281/zenodo.14818428
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.14818428
Dataset updated
Mar 28, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jinyu Liu
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Synthetic trace generated using techniques from DLRM from the data distributions of the Taobao Ad Display/Click Dataset and the Movielens 20M Dataset. Intended for testing of FEDORA-OramSim simulator.
Grouplens Datasets (ml-1m, ml-100K, and hetrec2011-movielens-2k-v2)
figshare.com
zip
Updated Jun 6, 2023
Cite
F. Maxwell Harper; Joseph A. Konstan (2023). Grouplens Datasets (ml-1m, ml-100K, and hetrec2011-movielens-2k-v2) [Dataset]. http://doi.org/10.6084/m9.figshare.7093595.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.7093595.v1
Dataset updated
Jun 6, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
F. Maxwell Harper; Joseph A. Konstan
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
GroupLens is a research lab in the Department of Computer Science and Engineering at the University of Minnesota, Twin Cities specializing in recommender systems, online communities, mobile and ubiquitous technologies, digital libraries, and local geographic information systems.
Movielens 100k dataset
kaggle.com
Updated Dec 1, 2020
Cite
Fakhre Alam (2020). Movielens 100k dataset [Dataset]. https://www.kaggle.com/datasets/fakhrealam0786/movielens-100k-dataset
Dataset updated
Dec 1, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Fakhre Alam
Description
Context

This dataset is a subset of the MovieLens 100k data, which were collected by the GroupLens Research Project at the University of Minnesota. The full dataset can be found here.

Content

This dataset consists of 6 columns:

movie_id -- unique id for each movie

title -- title of the movie

year -- year in which the movie was released

directors -- director of the movie

actors -- actors of the movie

genres -- genres of the movie (e.g. comedy, action, horror)

Acknowledgements

Thanks to GroupLens for providing this data.
MovieLens Latest Small
kaggle.com
zip
Updated Oct 12, 2018
+ more versions
Cite
GroupLens (2018). MovieLens Latest Small [Dataset]. https://www.kaggle.com/grouplens/movielens-latest-small
Available download formats: zip (993937 bytes)
Dataset updated
Oct 12, 2018
Dataset authored and provided by
GroupLens
Description
Dataset

This dataset was created by Max Harper

Released under Other (specified in description)

Contents

It contains the following files:
Movielens - Case Study
kaggle.com
Updated Mar 23, 2020
Cite
Khushboo Nagdewani (2020). Movielens - Case Study [Dataset]. https://www.kaggle.com/khushboon/movielens-case-study/discussion
Dataset updated
Mar 23, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Khushboo Nagdewani
Description
Background of Problem Statement

The GroupLens Research Project is a research group in the Department of Computer Science and Engineering at the University of Minnesota. Members of the GroupLens Research Project are involved in many research projects related to the fields of information filtering, collaborative filtering, and recommender systems. The project is led by professors John Riedl and Joseph Konstan. The project began to explore automated collaborative filtering in 1992 but is best known for its worldwide trial of an automated collaborative filtering system for Usenet news in 1996. Since then, the project has expanded its scope to research overall information filtering solutions, integrating content-based methods as well as improving current collaborative filtering technology.

Problem Objective :

Here, we ask you to perform the analysis using the Exploratory Data Analysis technique. You need to find features affecting the ratings of any particular movie and build a model to predict the movie ratings.