The MovieLens datasets, first released in 1998, describe people’s expressed preferences for movies. These preferences take the form of tuples, each the result of a person expressing a preference (a 0-5 star rating) for a movie at a particular time. These preferences were entered by way of the MovieLens web site, a recommender system that asks its users to give movie ratings in order to receive personalized movie recommendations.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a synthetic collection of data for movies, users, and ratings. It is intended for use in developing and testing recommendation algorithms, particularly those used in movie recommendation systems. The dataset includes:
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
These datasets include Douban movies and NetEase songs with attributes such as actors, directors, singers, and albums. The source code of the ACAM model, a feature-level co-attention based recommendation model, is also provided.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains visual features extracted from 12875 movie trailers. The visual features are extracted from key-frames of movie trailers with the VGG-19 CNN, pre-trained on ImageNet.
Movies in the dataset are identified by their MovieLens movieId.
Features_sparse.zip contains the 4096-dimensional feature vectors of each key-frame from every movie.
Visual labels.zip contains the 1000-dimensional label feature vectors of each key-frame from every movie.
DeepCineProp-f.p combines the label features of each movie into a vector space model using tf-idf.
CineSub.p contains the subtitles of each movie represented in a vector space model, pre-processed with various NLP techniques and weighted using tf-idf.
Abstract:
When a movie is uploaded to a movie Recommender System (e.g., YouTube), the system can exploit various forms of descriptive features (e.g., tags and genre) in order to generate personalized recommendations for users. However, there are situations where the descriptive features are missing or very limited and the system may fail to include such a movie in the recommendation list, which is known as the Cold-start problem. This thesis investigates recommendation based on a novel form of content features, extracted from movies, in order to generate recommendations for users. Such features represent the visual aspects of movies, based on Deep Learning models, and hence do not require any human annotation when extracted. The proposed technique has been evaluated in both offline and online evaluations using a large dataset of movies. The online evaluation has been carried out in an evaluation framework developed for this thesis. Results from the offline and online evaluation (N=150) show that automatically extracted visual features can mitigate the cold-start problem by generating recommendations of superior quality compared to different baselines, including recommendation based on human-annotated features. The results also point to subtitles as a high-quality future source of automatically extracted features.
https://creativecommons.org/publicdomain/zero/1.0/
"Movie Recommendation on the IMDB Dataset: A Journey into Machine Learning" is an exciting project focused on leveraging the IMDB Dataset for developing an advanced movie recommendation system. This project aims to explore the vast potential of machine learning techniques in providing personalized movie recommendations to users.
The IMDB Dataset, comprising a wealth of movie information including genres, ratings, and user reviews, serves as the foundation for this project. By harnessing the power of machine learning algorithms and data analysis, the project seeks to build a recommendation system that can accurately suggest movies tailored to each individual's preferences.
vatsal1704/movie-recommendation-system dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary of the MovieLens 1M data set.
This dataset was created by Tanisha Saggar765
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Movielens is a movie recommendation dataset widely used for benchmarking. There are nearly 100,000 hard ratings on 19 different types of movies (Action, Comedy, and so on).
Movie Recommender Dataset
This dataset contains the pickled files for a Streamlit-based movie recommendation system.
Movies.pkl: Preprocessed movie metadata and tags
Similarity.pkl: Cosine similarity matrix
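A minimal sketch of how a pair of files like these could back a recommender. The internal layout of Movies.pkl and Similarity.pkl is not described here, so the record structure (a list of dicts with "title" and "tags"), the sample movies, and the `recommend` helper are all assumptions made for illustration. The pickles are kept in memory instead of being read from disk.

```python
import math
import pickle
from collections import Counter

# Hypothetical stand-in for the contents of Movies.pkl: titles plus free-text tags.
movies = [
    {"title": "Movie A", "tags": "space adventure robot"},
    {"title": "Movie B", "tags": "space adventure alien"},
    {"title": "Movie C", "tags": "romance paris comedy"},
]


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Bag-of-words vectors over the tags, then the full pairwise similarity matrix
# (the assumed content of Similarity.pkl).
vectors = [Counter(m["tags"].split()) for m in movies]
similarity = [[cosine(u, v) for v in vectors] for u in vectors]

# Round-trip both objects through pickle, as an app would when loading the files.
movies_blob = pickle.dumps(movies)
similarity_blob = pickle.dumps(similarity)


def recommend(title, k=2):
    """Return the k titles most similar to `title` (hypothetical helper)."""
    loaded_movies = pickle.loads(movies_blob)
    loaded_sim = pickle.loads(similarity_blob)
    idx = next(i for i, m in enumerate(loaded_movies) if m["title"] == title)
    others = [i for i in range(len(loaded_movies)) if i != idx]
    others.sort(key=lambda i: loaded_sim[idx][i], reverse=True)
    return [loaded_movies[i]["title"] for i in others[:k]]


print(recommend("Movie A", k=1))
```

Precomputing the similarity matrix trades memory for lookup speed: each recommendation becomes a single row sort rather than a fresh pass over all tag vectors.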
Uploaded for use in Hugging Face Spaces.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MA14KD (Movie Attract 14K Dataset) provides a set of 181 aggregated visual features extracted from 14074 movie and TV series trailers. The movie IDs match those of another rating dataset that also contains movie genres and tags (see the description within the file). More details can be found in the following publication:
Farshad B. Moghaddam, Mehdi Elahi, Reza Hosseini, Christoph Trattner, Marko Tkalcic, Predicting Movie Popularity and Ratings with Visual Features, IEEE SMAP’19, 9-10 June 2019, Larnaca, Cyprus
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Nowadays, there are lots of datasets available for training and experimentation in the field of recommender systems. Specifically, in the recommendation of audiovisual content, the MovieLens dataset is a prominent example. It is focused on the user-item relationship, providing actual interaction data between users and movies. However, although movies can be described with several characteristics, this dataset only offers limited information about the movie genres.
In this work, we propose enriching the MovieLens dataset by incorporating metadata available on the web (such as cast, description, keywords, etc.) and movie trailers. By leveraging the trailers, we extract audio information and generate transcriptions for each trailer, introducing a crucial textual dimension to the dataset. The audio information was extracted through waveform and frequency analysis, followed by the application of dimensionality reduction techniques. For the transcription generation, the deep learning model Whisper was used. Finally, metadata was obtained from TMDB, and the BERT model was applied to extract embeddings.
These additional attributes enrich the original dataset and enable deeper and more precise analysis. The extended dataset could therefore drive significant advancements in recommendation systems, enhancing user experiences by providing more relevant movie recommendations tailored to users' tastes and preferences.
MHMirzaei/movie-recommendation-queries dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset contains users' historical viewing records crawled from the Douban platform. It includes 27,819 ratings from 198 users, with a sparsity of 97.8%. In addition to the basic attribute information of each movie, the dataset includes each user's short-term and long-term interest values and the score after resetting.
https://www.marketreportanalytics.com/privacy-policy
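As a worked check on the sparsity figure: sparsity is conventionally defined as 1 - ratings / (users × items). The number of distinct movies is not stated above, so the sketch below derives an implied item count from the three reported numbers. That derived value is an assumption that follows from the rounded 97.8% figure, not a reported statistic, and the function names are illustrative.

```python
def sparsity(num_ratings, num_users, num_items):
    """Fraction of the user-item matrix that has no rating."""
    return 1.0 - num_ratings / (num_users * num_items)


def implied_item_count(num_ratings, num_users, reported_sparsity):
    """Item count implied by a reported sparsity (approximate, since the
    reported percentage is itself rounded)."""
    density = 1.0 - reported_sparsity
    return round(num_ratings / (num_users * density))


# Figures reported for the Douban dataset.
ratings, users = 27819, 198

items = implied_item_count(ratings, users, 0.978)
print(items, round(sparsity(ratings, users, items), 3))
```

Because 97.8% is rounded to one decimal place, the true item count could plausibly differ from the derived value by a few hundred.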
The global movie rating sites market is experiencing robust growth, driven by the increasing consumption of online streaming services and a surge in user-generated content. The market's expansion is fueled by several key factors. Firstly, the rising popularity of streaming platforms like Netflix, Hulu, and Amazon Prime Video has led to a greater demand for reliable movie rating and review information. Users rely on these sites to make informed decisions about which movies to watch, enhancing their overall viewing experience. Secondly, the proliferation of social media and online communities focused on film discussion fosters engagement with movie rating platforms, creating a network effect that increases usage and influence. The segmentation by application (movie promotion, research, audience choice) and type (user ratings, professional ratings) indicates a diverse market landscape with opportunities for both user-driven and expert-curated content. While established players like Rotten Tomatoes and IMDb dominate, newer platforms are emerging, offering specialized features and niche audiences. Geographic expansion, particularly in regions with rapidly growing internet penetration and a rising middle class, presents significant growth potential. However, challenges remain, including the need to manage fake reviews and maintain data accuracy to retain user trust. Furthermore, competition from within the streaming platforms themselves, which often integrate their own rating systems, presents an ongoing challenge. Despite these challenges, the market is projected for continued growth. A conservative estimate, considering a global CAGR of 15% (a reasonable figure based on the growth of the streaming industry and online movie engagement), predicts substantial market expansion over the forecast period (2025-2033). 
This growth will be driven by technological advancements that enhance user experience and the integration of AI-driven recommendation systems within movie rating platforms. The market is ripe for innovation, with opportunities for personalized recommendation engines and the incorporation of data analytics to provide more insightful reviews and audience sentiment analysis. The competitive landscape will likely see consolidation and further specialization, with platforms focusing on specific niches or geographical regions to gain a competitive edge.
This dataset contains a set of movie ratings from the MovieLens website, a movie recommendation service. This dataset was collected and maintained by GroupLens, a research group at the University of Minnesota. There are 5 versions included: "25m", "latest-small", "100k", "1m", "20m". In all datasets, the movies data and ratings data are joined on "movieId". The 25m dataset, latest-small dataset, and 20m dataset contain only movie data and rating data. The 1m dataset and 100k dataset contain demographic data in addition to movie and rating data.
For each version, users can view either only the movies data by adding the "-movies" suffix (e.g. "25m-movies") or the ratings data joined with the movies data (and users data in the 1m and 100k datasets) by adding the "-ratings" suffix (e.g. "25m-ratings").
The features below are included in all versions with the "-ratings" suffix.
The "100k-ratings" and "1m-ratings" versions in addition include the following demographic features.
In addition, the "100k-ratings" dataset also has a feature "raw_user_age", which holds the exact age of the user who made each rating.
Datasets with the "-movies" suffix contain only "movie_id", "movie_title", and "movie_genres" features.
To use this dataset:
import tensorflow_datasets as tfds

# Load the training split of the MovieLens dataset.
ds = tfds.load('movielens', split='train')
# Print the first four examples.
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
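The version and suffix scheme described above can be combined into a TFDS config name of the form "movielens/<version>-<suffix>". The sketch below builds and validates such names. The helpers `movielens_config` and `load_movielens` are hypothetical, and `tensorflow_datasets` is imported lazily so the name-building logic can be used without it installed.

```python
# Versions and suffixes as listed in the dataset description.
VERSIONS = ("25m", "latest-small", "100k", "1m", "20m")
SUFFIXES = ("movies", "ratings")


def movielens_config(version, suffix):
    """Build a TFDS dataset name such as 'movielens/100k-ratings'."""
    if version not in VERSIONS:
        raise ValueError(f"unknown version: {version!r}")
    if suffix not in SUFFIXES:
        raise ValueError(f"unknown suffix: {suffix!r}")
    return f"movielens/{version}-{suffix}"


def load_movielens(version, suffix, split="train"):
    """Load one MovieLens config (hypothetical wrapper around tfds.load)."""
    import tensorflow_datasets as tfds  # imported lazily; downloads data on first use

    return tfds.load(movielens_config(version, suffix), split=split)


print(movielens_config("100k", "ratings"))
```

For example, `load_movielens("1m", "ratings")` would request the ratings joined with movie and demographic data, while `load_movielens("25m", "movies")` would request only the movie table.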
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The dataset has a total of 1000 top English movies. Use it for building a movie recommendation system or classification based on genre. Possible tasks:
1. Box office prediction: Predict movie revenue based on factors like budget, genre, and release date.
2. Rating prediction: Predict movie ratings based on attributes like genre, director, and cast.
Genre annotations for movies
The file genre2movies.csv contains genre-movie tuples based on Wikidata annotations (https://www.wikidata.org/).
Data
Each line in genre2movies.csv represents one genre-movie tuple. The first entry is the genre. The second entry of each line is the movie name. There are 83,670 genre-movie tuples.
Joining with the Movielens 20M dataset
The movies considered are from the Movielens 20M corpus: https://grouplens.org/datasets/movielens/20m/ The movie names in genre2movies.csv match the movie 'titles' in Movielens 20M.
Compositions
The directory "compositions" contains movies assigned to compositions of genres. The compositions are of the form: "genre A and genre B", "genre A and not genre B", "genre A and genre B and genre C", "genre A and genre B and not genre C". These assignments have been automatically generated from genre2movies.csv. We try to generate genre compositions that are useful. For example, for a "genre A and genre B" composition we ensure that genre B is not a subgenre of genre A, because an intersection of a superset with a subset is identical to the subset and does not form a new concept.
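The composition forms above map directly onto set operations. The following sketch illustrates this on a few made-up genre-movie tuples in the two-column layout described for genre2movies.csv. How subgenres were actually detected is not stated, so the subset test used here is an assumed proxy, and the function names and sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the described layout: genre first, movie name second.
SAMPLE_CSV = (
    "comedy film,Movie A\n"
    "comedy film,Movie B\n"
    "comedy film,Movie C\n"
    "romantic comedy,Movie B\n"
    "science fiction film,Movie A\n"
    "science fiction film,Movie D\n"
)


def load_genres(text):
    """Parse genre-movie tuples into a mapping from genre to a set of movies."""
    genre_to_movies = defaultdict(set)
    for row in csv.reader(io.StringIO(text)):
        if row:  # skip blank lines
            genre, movie = row[0], row[1]
            genre_to_movies[genre].add(movie)
    return dict(genre_to_movies)


def a_and_b(genres, a, b):
    """'genre A and genre B'; returns None when one genre's movie set is
    contained in the other's (assumed proxy for a subgenre relation)."""
    if genres[b] <= genres[a] or genres[a] <= genres[b]:
        return None
    return genres[a] & genres[b]


def a_and_not_b(genres, a, b):
    """'genre A and not genre B' as a set difference."""
    return genres[a] - genres[b]


genres = load_genres(SAMPLE_CSV)
print(a_and_b(genres, "comedy film", "science fiction film"))
print(a_and_b(genres, "comedy film", "romantic comedy"))
print(sorted(a_and_not_b(genres, "comedy film", "science fiction film")))
```

Three-genre compositions follow the same pattern by chaining a further intersection or difference onto the two-genre result.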