Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
What does this dataset contain?
This dataset contains over 700 million time-stamped listening events collected from 3.4M anonymised users on the music streaming service Deezer between March and August 2022. It includes 50k anonymised songs, among the most popular on the service, together with their pre-trained embedding vectors, computed by our internal model. All files are in Parquet format and can be read with the pandas.read_parquet function.
What could this dataset be used for?
This dataset could be used for collaborative filtering as well as sequential recommendation (including both next-item and next-session recommendations).
Citation
If you use this dataset, please cite the following paper:
@inproceedings{tran-recsys2024, title={Transformers Meet ACT-R: Repeat-Aware and Sequential Listening Session Recommendation}, author={Viet-Anh Tran and Guillaume Salha-Galvan and Bruno Sguerra and Romain Hennequin}, booktitle = {Proceedings of the 18th ACM Conference on Recommender Systems}, year = {2024} }
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Music recommender systems can offer users personalized and contextualized recommendation and are therefore important for music information retrieval. An increasing number of datasets have been compiled to facilitate research on different topics, such as content-based, context-based or next-song recommendation. However, these topics are usually addressed separately using different datasets, due to the lack of a unified dataset that contains a large variety of feature types such as item features, user contexts, and timestamps. To address this issue, we propose a large-scale benchmark dataset called #nowplaying-RS, which contains 11.6 million music listening events (LEs) of 139K users and 346K tracks collected from Twitter. The dataset comes with a rich set of item content features and user context features, and the timestamps of the LEs. Moreover, some of the user context features imply the cultural origin of the users, and some others—like hashtags—give clues to the emotional state of a user underlying an LE. In this paper, we provide some statistics to give insight into the dataset, and some directions in which the dataset can be used for making music recommendation. We also provide standardized training and test sets for experimentation, and some baseline results obtained by using factorization machines.
The dataset contains three files:
user_track_hashtag_timestamp.csv contains basic information about each listening event. For each listening event, we provide the id, user_id, track_id, hashtag, and created_at.
context_content_features.csv contains all context and content features. For each listening event, we provide the id of the event, user_id, track_id, artist_id, content features regarding the track mentioned in the event (instrumentalness, liveness, speechiness, danceability, valence, loudness, tempo, acousticness, energy, mode, key) and context features regarding the listening event (coordinates (as geoJSON), place (as geoJSON), geo (as geoJSON), tweet_language, created_at, user_lang, time_zone, entities contained in the tweet).
sentiment_values.csv contains sentiment information for hashtags. It contains the hashtag itself and the sentiment values gathered via four different sentiment dictionaries: AFINN, Opinion Lexicon, Sentistrength Lexicon and vader. For each of these dictionaries we list the minimum, maximum, sum and average of all sentiments of the tokens of the hashtag (if available; else we list empty values). However, as most hashtags consist of only a single token, these values are equal in most cases. Please note that the lexica are rather diverse and are therefore able to resolve very different terms against a score; hence, the resulting csv is rather sparse. In the file's comma-separated columns, we abbreviate all scores gathered over the Opinion Lexicon with the prefix 'ol'. Similarly, 'ss' stands for SentiStrength.
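Since the sentiment file is sparse, a left join keeps every listening event even when its hashtag has no sentiment entry. The sketch below uses tiny in-memory stand-ins for the two CSV files; the column names are assumptions modeled on the description, not the exact file headers:

```python
import io

import pandas as pd

# Toy stand-in for user_track_hashtag_timestamp.csv (column names are assumed).
events_csv = io.StringIO(
    "id,user_id,track_id,hashtag,created_at\n"
    "1,u1,t1,happy,2014-01-01\n"
    "2,u2,t2,sad,2014-01-02\n"
)
# Toy stand-in for sentiment_values.csv; only one hashtag has a sentiment score.
sentiment_csv = io.StringIO(
    "hashtag,vader_avg\n"
    "happy,0.8\n"
)

events = pd.read_csv(events_csv)
sentiment = pd.read_csv(sentiment_csv)

# Left join: events whose hashtag is missing from the sparse sentiment file get NaN.
merged = events.merge(sentiment, on="hashtag", how="left")
print(int(merged["vader_avg"].isna().sum()))  # 1: 'sad' has no sentiment row here
```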
Please also find the training and test-splits for the dataset in this repo. Also, prototypical implementations of a context-aware recommender system based on the dataset can be found at https://github.com/asmitapoddar/nowplaying-RS-Music-Reco-FM.
If you make use of this dataset, please cite the following paper where we describe and experiment with the dataset:
@inproceedings{smc18, title = {#nowplaying-RS: A New Benchmark Dataset for Building Context-Aware Music Recommender Systems}, author = {Asmita Poddar and Eva Zangerle and Yi-Hsuan Yang}, url = {http://mac.citi.sinica.edu.tw/~yang/pub/poddar18smc.pdf}, year = {2018}, date = {2018-07-04}, booktitle = {Proceedings of the 15th Sound & Music Computing Conference}, address = {Limassol, Cyprus}, note = {code at https://github.com/asmitapoddar/nowplaying-RS-Music-Reco-FM}, tppubtype = {inproceedings} }
We introduce PDMX: a Public Domain MusicXML dataset for symbolic music processing, including over 250k musical scores in MusicXML format. PDMX is the largest publicly available, copyright-free MusicXML dataset in existence. PDMX includes genre, tag, description, and popularity metadata for every file.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
These datasets include Douban movies and NetEase songs with attributes such as actors, directors, singers, albums and so on. Furthermore, the source code of ACAM model is also provided, which is a feature-level co-attention based recommendation model.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset contains data collected during an experiment at Delft University of Technology, as part of Paul Raingeard de la Bletiere's PhD thesis project.
It is being made public both to serve as supplementary data for publications and the PhD thesis of Paul Raingeard de la Bletiere, and to allow other researchers to use this data in their own work.
The data in this dataset was collected through a website accessed by participants between August 2024 and December 2024.
This research project was made possible by a grant from the Dutch Research Council (NWO) (Grant Number KICH1.GZ02.20.008). Additional support from Alzheimer Nederland is gratefully acknowledged.
The purpose of this experiment was to test a music recommender system linking music with specific episodic memories chosen by participants, through a discussion with a virtual agent. This specific part of the data relates to the ratings of recommendations by participants.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The MSD-A is a dataset related to the Million Song Dataset (MSD). It is a collection of artist tags and biographies gathered from Last.fm for all the artists that have songs in the MSD. In addition, the MSD Taste Profile (recommendation dataset) is adapted to artists.
We provide the biographies, tags, data splits, and feature embeddings to reproduce the experiments from the paper:
Oramas S., Nieto O., Sordo M., & Serra X. (2017) A Deep Multimodal Approach for Cold-start Music Recommendation. https://arxiv.org/abs/1706.09739
Source code is available at https://github.com/sergiooramas/tartarus
The file dlrs-data.tar.gz in this Zenodo version is corrupted. You can download an intact copy from this link:
https://drive.google.com/open?id=0B-oq_x72w8NUbUpkMzZSc1JPd28
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Yambda-5B — A Large-Scale Multi-modal Dataset for Ranking And Retrieval
Industrial-scale music recommendation dataset with organic/recommendation interactions and audio embeddings.
Overview
The Yambda-5B dataset is a large-scale open database comprising 4.79 billion user-item interactions collected from 1 million users and spanning 9.39 million tracks. The dataset includes… See the full description on the dataset page: https://huggingface.co/datasets/yandex/yambda.
This is a common Zenodo repository for both lastfm-360K and lastfm-1K datasets. See below the details of both datasets, including license, acknowledgements, contact, and instructions to cite.
LASTFM-360K (version 1.2, March 2010).
What is this? This dataset contains tuples (for ~360,000 users) collected from Last.fm API, using the user.getTopArtists() method.
Files:
usersha1-artmbid-artname-plays.tsv (MD5: be672526eb7c69495c27ad27803148f1)
usersha1-profile.tsv (MD5: 51159d4edf6a92cb96f87768aa2be678)
mbox_sha1sum.py (MD5: feb3485eace85f3ba62e324839e6ab39)
Data Statistics:
File usersha1-artmbid-artname-plays.tsv:
Total Lines: 17,559,530
Unique Users: 359,347
Artists with MBID: 186,642
Artists without MBID: 107,373
Data Format: The data is formatted one entry per line as follows (tab separated "\t"):
File usersha1-artmbid-artname-plays.tsv:
user-mboxsha1 \t musicbrainz-artist-id \t artist-name \t plays
File usersha1-profile.tsv:
user-mboxsha1 \t gender (m|f|empty) \t age (int|empty) \t country (str|empty) \t signup (date|empty)
Example:
File usersha1-artmbid-artname-plays.tsv:
000063d3fe1cf2ba248b9e3c3f0334845a27a6be \t a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432 \t u2 \t 31 ...
File usersha1-profile.tsv:
000063d3fe1cf2ba248b9e3c3f0334845a27a6be \t m \t 19 \t Mexico \t Apr 28, 2008 ...
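A hedged parsing sketch for the tab-separated plays file, using the example row above; the files ship without a header row, so column names must be supplied (the names below are paraphrases of the field names in the format description):

```python
import io

import pandas as pd

# One row copied from the example above; the real file has ~17.5M such lines.
sample = io.StringIO(
    "000063d3fe1cf2ba248b9e3c3f0334845a27a6be\t"
    "a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432\tu2\t31\n"
)

# names= supplies the header the TSV lacks (these labels are our own choice).
cols = ["user_mboxsha1", "musicbrainz_artist_id", "artist_name", "plays"]
plays = pd.read_csv(sample, sep="\t", names=cols)
print(plays.loc[0, "artist_name"], plays.loc[0, "plays"])  # u2 31
```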
LASTFM-1K (version 1.0, March 2010).
What is this? This dataset contains tuples collected from the Last.fm API, using the user.getRecentTracks() method. This dataset represents the complete listening habits (up to May 5th, 2009) of nearly 1,000 users.
Files:
userid-timestamp-artid-artname-traid-traname.tsv (MD5: 64747b21563e3d2aa95751e0ddc46b68)
userid-profile.tsv (MD5: c53608b6b445db201098c1489ea497df)
Data Statistics:
File userid-timestamp-artid-artname-traid-traname.tsv:
Total Lines: 19,150,868
Unique Users: 992
Artists with MBID: 107,528
Artists without MBID: 69,420
Data Format: The data is formatted one entry per line as follows (tab separated, "\t"):
File userid-timestamp-artid-artname-traid-traname.tsv:
userid \t timestamp \t musicbrainz-artist-id \t artist-name \t musicbrainz-track-id \t track-name
File userid-profile.tsv:
userid \t gender ('m'|'f'|empty) \t age (int|empty) \t country (str|empty) \t signup (date|empty)
Example:
File userid-timestamp-artid-artname-traid-traname.tsv:
user_000639 \t 2009-04-08T01:57:47Z \t MBID \t The Dogs D'Amour \t MBID \t Fall in Love Again?
user_000639 \t 2009-04-08T01:53:56Z \t MBID \t The Dogs D'Amour \t MBID \t Wait Until I'm Dead
...
File userid-profile.tsv:
user_000639 \t m \t Mexico \t Apr 27, 2005 ...
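Because the 1K dataset is time-stamped, parsing the timestamp column directly is the natural first step (for example, to order events into listening sessions). A minimal sketch using one example row from above, with column names paraphrased from the format description:

```python
import io

import pandas as pd

# Column labels paraphrased from the format description; the file has no header row.
cols = ["userid", "timestamp", "musicbrainz_artist_id",
        "artist_name", "musicbrainz_track_id", "track_name"]

sample = io.StringIO(
    "user_000639\t2009-04-08T01:57:47Z\tMBID\tThe Dogs D'Amour\tMBID\tFall in Love Again?\n"
)

# ISO-8601 timestamps parse directly via parse_dates.
les = pd.read_csv(sample, sep="\t", names=cols, parse_dates=["timestamp"])
print(les["timestamp"].dt.year.iloc[0])  # 2009
```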
LICENSE OF BOTH DATASETS. The data contained in both datasets is distributed with permission of Last.fm. The data is made available for non-commercial use. Those interested in using the data or web services in a commercial context should contact:
partners [at] last [dot] fm
For more information see Last.fm terms of service
ACKNOWLEDGEMENTS. Thanks to Last.fm for providing the access to this data via their web services. Special thanks to Norman Casagrande.
REFERENCES. When using this dataset you must reference the Last.fm webpage. Optionally (not mandatory at all!), you can cite Chapter 3 of this book:
@book{Celma:Springer2010, author = {Celma, O.}, title = {{Music Recommendation and Discovery in the Long Tail}}, publisher = {Springer}, year = {2010} }
CONTACT: This data was collected by Òscar Celma @ MTG/UPF
https://dataverse.csuc.cat/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.34810/data444
Music Recommendation Dataset (KGRec-music). Number of items: 8,640. Number of users: 5,199. Number of item-user interactions: 751,531. All the data comes from the songfacts.com and last.fm websites. Items are songs, described in terms of textual descriptions extracted from songfacts.com and tags from last.fm.
Files and folders in the dataset:
/descriptions: one file per item with the textual description of the item. The name of the file is the id of the item plus the ".txt" extension.
/tags: one file per item with the tags of the item separated by spaces. Multiword tags are joined with "-". The name of the file is the id of the item plus the ".txt" extension. Not all items have tags; there are 401 items without tags.
implicit_lf_dataset.txt: the interactions between users and items. There is one line per interaction, with tab-separated fields in the following format: user_id \t sound_id \t 1 \n.
Sound Recommendation Dataset (KGRec-sound). Number of items: 21,552. Number of users: 20,000. Number of item-user interactions: 2,117,698. All the data comes from Freesound.org. Items are sounds, described in terms of the textual description and tags created by the sound creator at upload time.
Files and folders in the dataset:
/descriptions: one file per item with the textual description of the item. The name of the file is the id of the item plus the ".txt" extension.
/tags: one file per item with the tags of the item separated by spaces. The name of the file is the id of the item plus the ".txt" extension.
downloads_fs_dataset.txt: the interactions between users and items (a user that downloaded a sound, in this case). There is one line per interaction, with tab-separated fields in the following format: user_id \t sound_id \t 1 \n.
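The interaction files above are one-triple-per-line implicit feedback, which pivots directly into a user-item matrix for collaborative filtering. A self-contained sketch with toy ids standing in for the real ones:

```python
import io

import pandas as pd

# Toy stand-in for implicit_lf_dataset.txt: user_id \t item_id \t 1, one line per interaction.
raw = io.StringIO("u1\ti1\t1\nu1\ti2\t1\nu2\ti1\t1\n")
interactions = pd.read_csv(raw, sep="\t", names=["user_id", "item_id", "value"])

# Pivot into a user-item matrix; unobserved pairs become 0 (no interaction seen).
matrix = interactions.pivot_table(index="user_id", columns="item_id",
                                  values="value", fill_value=0)
print(matrix.shape)  # (2, 2)
```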
Two different datasets with users, items, implicit feedback interactions between users and items, item tags, and item text descriptions are provided: one for Music Recommendation (KGRec-music), and the other for Sound Recommendation (KGRec-sound).
This dataset is a summarized, sanitized subset of the one released at The 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011), currently hosted at the GroupLens website (here).
Sanitization included: (a) artist name misspelling correction and standardization; (b) reassignment of artists referenced with two or more artist ids; (c) removal of artists listed as 'unknown' or through their website addresses.
The original dataset contains a larger number of files, including tag-related information, in addition to users, artists and scrobble counts. The author contacted last.fm to request a recent version of this content in a similar format, but had received no reply as of June 15th, 2020.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
The nowplaying-RS dataset features context- and content features of listening events. It contains 11.6 million music listening events of 139K users and 346K tracks collected from Twitter. The dataset comes with a rich set of item content features and user context features, as well as timestamps of the listening events. Moreover, some of the user context features imply the cultural origin of the users, and some others - like hashtags - give clues to the emotional state of a user underlying a listening event.
user_track_hashtag_timestamp.csv contains basic information about each listening event. For each listening event, we provide the id, user_id, track_id, hashtag, and created_at.
context_content_features.csv contains all context and content features. For each listening event, we provide the id of the event, user_id, track_id, artist_id, content features regarding the track mentioned in the event (instrumentalness, liveness, speechiness, danceability, valence, loudness, tempo, acousticness, energy, mode, key) and context features regarding the listening event (coordinates (as geoJSON), place (as geoJSON), geo (as geoJSON), tweet_language, created_at, user_lang, time_zone, entities contained in the tweet).
sentiment_values.csv contains sentiment information for hashtags. It contains the hashtag itself and the sentiment values gathered via four different sentiment dictionaries: AFINN, Opinion Lexicon, Sentistrength Lexicon and vader. For each of these dictionaries we list the minimum, maximum, sum and average of all sentiments of the tokens of the hashtag (if available, else we list empty values). However, as most hashtags only consist of a single token, these values are equal in most cases. Please note that the lexica are rather diverse and therefore, are able to resolve very different terms against a score. Hence, the resulting csv is rather sparse.
@inproceedings{smc18, title = {#nowplaying-RS: A New Benchmark Dataset for Building Context-Aware Music Recommender Systems}, author = {Asmita Poddar and Eva Zangerle and Yi-Hsuan Yang}, url = {http://mac.citi.sinica.edu.tw/~yang/pub/poddar18smc.pdf}, year = {2018}, date = {2018-07-04}, booktitle = {Proceedings of the 15th Sound & Music Computing Conference}, address = {Limassol, Cyprus}, note = {code at https://github.com/asmitapoddar/nowplaying-RS-Music-Reco-FM}, tppubtype = {inproceedings} }
Can incorporating mood-related hashtags and timestamps into a neural network, to predict the variation in a user's emotion based on the track they are playing, improve the next-song recommendation model?
Spotify Million Playlist Dataset Challenge
Summary
The Spotify Million Playlist Dataset Challenge consists of a dataset and evaluation to enable research in music recommendations. It is a continuation of the RecSys Challenge 2018, which ran from January to July 2018. The dataset contains 1,000,000 playlists, including playlist titles and track titles, created by users on the Spotify platform between January 2010 and October 2017. The evaluation task is automatic playlist continuation: given a seed playlist title and/or initial set of tracks in a playlist, to predict the subsequent tracks in that playlist. This is an open-ended challenge intended to encourage research in music recommendations, and no prizes will be awarded (other than bragging rights).
Background
Playlists like Today’s Top Hits and RapCaviar have millions of loyal followers, while Discover Weekly and Daily Mix are just a couple of our personalized playlists made especially to match your unique musical tastes.
Our users love playlists too. In fact, the Digital Music Alliance, in their 2018 Annual Music Report, states that 54% of consumers say that playlists are replacing albums in their listening habits.
But our users don’t love just listening to playlists, they also love creating them. To date, over 4 billion playlists have been created and shared by Spotify users. People create playlists for all sorts of reasons: some playlists group together music categorically (e.g., by genre, artist, year, or city), by mood, theme, or occasion (e.g., romantic, sad, holiday), or for a particular purpose (e.g., focus, workout). Some playlists are even made to land a dream job, or to send a message to someone special.
The other thing we love here at Spotify is playlist research. By learning from the playlists that people create, we can learn all sorts of things about the deep relationship between people and music. Why do certain songs go together? What is the difference between “Beach Vibes” and “Forest Vibes”? And what words do people use to describe which playlists?
By learning more about the nature of playlists, we may also be able to suggest other tracks that a listener would enjoy in the context of a given playlist. This can make playlist creation easier, and ultimately help people find more of the music they love.
Dataset
To enable this type of research at scale, in 2018 we sponsored the RecSys Challenge 2018, which introduced the Million Playlist Dataset (MPD) to the research community. Sampled from the over 4 billion public playlists on Spotify, this dataset of 1 million playlists consists of over 2 million unique tracks by nearly 300,000 artists, and represents the largest public dataset of music playlists in the world. The dataset includes public playlists created by US Spotify users between January 2010 and November 2017. The challenge ran from January to July 2018, and received 1,467 submissions from 410 teams. A summary of the challenge and the top scoring submissions was published in the ACM Transactions on Intelligent Systems and Technology.
In September 2020, we re-released the dataset as an open-ended challenge on AIcrowd.com. The dataset can now be downloaded by registered participants from the Resources page.
Each playlist in the MPD contains a playlist title, the track list (including track IDs and metadata), and other metadata fields (last edit time, number of playlist edits, and more). All data is anonymized to protect user privacy. Playlists are sampled with some randomization, are manually filtered for playlist quality and to remove offensive content, and have some dithering and fictitious tracks added to them. As such, the dataset is not representative of the true distribution of playlists on the Spotify platform, and must not be interpreted as such in any research or analysis performed on the dataset.
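The MPD is distributed as JSON slice files, each holding a "playlists" list whose entries carry the title, metadata, and track list described above. The sketch below parses an illustrative in-memory slice; treat the exact field names as assumptions modeled on the published format:

```python
import json

# Illustrative slice content (field names are assumptions; check the real files).
slice_text = """
{"playlists": [
  {"name": "Beach Vibes", "pid": 0, "num_tracks": 2,
   "tracks": [{"pos": 0, "track_name": "Song A", "artist_name": "Artist X"},
              {"pos": 1, "track_name": "Song B", "artist_name": "Artist Y"}]}
]}
"""

data = json.loads(slice_text)
for pl in data["playlists"]:
    # Each playlist entry carries its title plus a list of track records.
    print(pl["name"], len(pl["tracks"]))  # Beach Vibes 2
```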
Dataset Contains
1000 examples of each scenario:
Title only (no tracks)
Title and first track
Title and first 5 tracks
First 5 tracks only
Title and first 10 tracks
First 10 tracks only
Title and first 25 tracks
Title and 25 random tracks
Title and first 100 tracks
Title and 100 random tracks
Download Link
Full Details: https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge
Download Link: https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge/dataset_files
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Questionnaire response data set Here, we include the data retrieved from participants at Eurosonic Noorderslag 2023, as described in the paper cited above. When using, analyzing, or publishing this data in any way, please make sure to attribute it to the authors and cite it accordingly.
We include the data in .xlsx, .csv (semicolon-separated), and .tsv (tab-separated) formats. We suggest using the Excel file, as its layout makes it the most easily readable.
The complete question list as used in the questionnaire is published separately on https://doi.org/10.5281/zenodo.8121151.
Paper title Looking at the FAccTs: Exploring Music Industry Professionals’ Perspectives on Music Streaming Services and Recommendations
Paper abstract Music recommender systems, commonly integrated into streaming services, help listeners find music. Previous research on such systems has focused on providing the best possible recommendations for these services' consumers, as well as on fairness for artists who release their music on streaming services. While those insights are imperative, another group of stakeholders has been omitted so far: the many other professionals working in the music industry. They, too, are (in)directly affected by music streaming services. Therefore, this work explores the perspective of music industry professionals. We present a study that addresses the role of streaming services and recommender systems in their jobs. Results indicate this role is significant. Furthermore, participants feel that music recommender systems lack transparency and are insufficiently controllable, for both customers and artists. Finally, participants desire that music streaming services take charge of increasing recommendation diversity, and variety in consumers' listening behavior and taste.
Citation Karlijn Dinnissen, Isabella Saccardi, Marloes Vredenborg, and Christine Bauer. 2023. Looking at the FAccTs: Exploring Music Industry Professionals’ Perspectives on Music Streaming Services and Recommendations. In 2nd International Conference of the ACM Greek SIGCHI Chapter (CHIGREECE 2023), September 27–28, 2023, Athens, Greece. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3609987.3610011
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
tezbytes/music-recommender-data dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a dump of the #nowplaying dataset, which contains so-called listening events of users who publish the music they are currently listening to on Twitter. In particular, this dataset includes tracks which have been tweeted using the hashtags #nowplaying, #listento or #listeningto. In this dataset, we provide the track and artist of a listening event and metadata on the tweet (date sent, user, source). Furthermore, we provide a mapping of tracks to their respective MusicBrainz identifiers. The dataset features a total of 126 million listening events.
This archive contains the nowplaying.csv file, the main file which contains the following fields:
In case you make use of our dataset in a scientific setting, we kindly ask you to cite the following paper:
Eva Zangerle, Martin Pichl, Wolfgang Gassler, and Günther Specht. 2014. #nowplaying Music Dataset: Extracting Listening Behavior from Twitter. In Proceedings of the First International Workshop on Internet-Scale Multimedia Management (WISMM '14). ACM, New York, NY, USA, 21-26.
If you have any questions or suggestions regarding the dataset, please do not hesitate to contact Eva Zangerle (eva.zangerle@uibk.ac.at).
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Aryan Mahawar
Released under Apache 2.0
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The aim of the paper "Beyond the Big Five Personality Traits for Music Recommendation Systems" is to investigate the influence of personality traits, characterized by the BFI (Big Five Inventory) and its significant revision called BFI-2, on music recommendation error. The BFI-2 describes the lower-order facets of the Big Five personality traits. We performed experiments with 279 participants, using an application (called Music Master) we developed for music listening and ranking, and for collecting personality profiles of the users. Additionally, 29-dimensional vectors of audio features were extracted to describe the music files.
In our paper, we used this data set to test several hypotheses about the influence of personality traits and the audio features on music recommendation error. The experiments showed that every combination of Big-Five personality traits produces worse results than using lower-order personality facets. Additionally, we found a small subset of personality facets that yielded the lowest recommendation error. This finding allows condensing the personality questionnaire to only the most essential questions.
The EXCEL file contains 5278 entries created for 279 participants. Each entry includes preferences expressed using the 5-point Likert scale: the cognitive aspect of listening to music is denoted Q1, and the motivational and interpersonal aspects are denoted Q2 and Q3, respectively. The following 20 variables (columns) contain 20-dimensional, extended Big Five personality trait values. The last 29 columns contain the values of low-level audio features, including emotions extracted from the audio files. The EXCEL file is ready to be saved as CSV and imported using a suitable programming language (e.g. Python, R, Java, Matlab and others) for further processing, i.e. for creating user-item matrices for collaborative filtering and evaluating its performance with the usage of the proposed new rating types (motivational and interpersonal ones) described in the article.
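The user-item matrix construction mentioned above can be sketched as follows; the toy rows and column names here are purely illustrative stand-ins, not the actual spreadsheet headers:

```python
import pandas as pd

# Toy rows mimicking participant ratings of tracks (column names are illustrative).
ratings = pd.DataFrame({
    "participant": ["p1", "p1", "p2"],
    "track": ["t1", "t2", "t1"],
    "q1_cognitive": [4, 2, 5],  # 5-point Likert rating for the cognitive aspect
})

# User-item matrix for collaborative filtering; unrated cells stay NaN.
ui = ratings.pivot(index="participant", columns="track", values="q1_cognitive")
print(ui.shape)  # (2, 2)
```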
The usage of the data set requires citing the paper.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Training dataset for a music recommendation system. The last 30 columns represent the labels, where:
1 = liked and saved the song
0.6 = liked but didn't save the song
0 = didn't like the song
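Splitting such a frame into features and the 30 label columns is a one-liner with positional indexing; the sketch uses a random toy frame since the real column layout beyond "last 30 are labels" is not specified:

```python
import numpy as np
import pandas as pd

# Toy frame: 5 feature columns + 30 label columns, mirroring the described layout.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((4, 35)))

X = df.iloc[:, :-30]   # features: everything except the last 30 columns
y = df.iloc[:, -30:]   # labels: values in {1, 0.6, 0} in the real dataset
print(X.shape, y.shape)  # (4, 5) (4, 30)
```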
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We publicly release the anonymized user_features.csv and playlist_features.csv datasets, from the music streaming platform Deezer, as described in the article "Carousel Personalization in Music Streaming Apps with Contextual Bandits" published in the proceedings of the 14th ACM Conference on Recommender Systems (RecSys 2020). The paper is available online.
These datasets are used in the GitHub repository deezer/carousel_bandits to reproduce experiments from the article.
Please cite our paper if you use our code or data in your work.
Dataset Description
This dataset is ideal for training a recommendation system that incorporates time and country information.
Task Summary
A recommender system, or a recommendation system, is a subclass of information filtering system that provides suggestions for items that are most pertinent to a particular user. Recommender systems are particularly useful when an individual needs to choose an item from a potentially overwhelming number of items that a service may… See the full description on the dataset page: https://huggingface.co/datasets/matthewfranglen/lastfm-1k.