Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
What does this dataset contain?
This dataset contains over 700 million time-stamped listening events collected from 3.4M anonymised users on the music streaming service Deezer between March and August 2022. It includes 50k anonymised songs, among the most popular on the service, together with their pre-trained embedding vectors, computed by our internal model. All files are in Parquet format and can be read with the pandas.read_parquet function.
What could this dataset be used for?
This dataset could be used for collaborative filtering as well as sequential recommendation (including both next-item and next-session recommendations).
Citation
If you use this dataset, please cite the following paper:
@inproceedings{tran-recsys2024, title={Transformers Meet ACT-R: Repeat-Aware and Sequential Listening Session Recommendation}, author={Viet-Anh Tran and Guillaume Salha-Galvan and Bruno Sguerra and Romain Hennequin}, booktitle = {Proceedings of the 18th ACM Conference on Recommender Systems}, year = {2024} }
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Music recommender systems can offer users personalized and contextualized recommendation and are therefore important for music information retrieval. An increasing number of datasets have been compiled to facilitate research on different topics, such as content-based, context-based or next-song recommendation. However, these topics are usually addressed separately using different datasets, due to the lack of a unified dataset that contains a large variety of feature types such as item features, user contexts, and timestamps. To address this issue, we propose a large-scale benchmark dataset called #nowplaying-RS, which contains 11.6 million music listening events (LEs) of 139K users and 346K tracks collected from Twitter. The dataset comes with a rich set of item content features and user context features, and the timestamps of the LEs. Moreover, some of the user context features imply the cultural origin of the users, and some others—like hashtags—give clues to the emotional state of a user underlying an LE. In this paper, we provide some statistics to give insight into the dataset, and some directions in which the dataset can be used for making music recommendation. We also provide standardized training and test sets for experimentation, and some baseline results obtained by using factorization machines.
The dataset contains three files:
user_track_hashtag_timestamp.csv contains basic information about each listening event. For each listening event, we provide the id, user_id, track_id, hashtag, and created_at.
context_content_features.csv contains all context and content features. For each listening event, we provide the id of the event, user_id, track_id, artist_id, content features regarding the track mentioned in the event (instrumentalness, liveness, speechiness, danceability, valence, loudness, tempo, acousticness, energy, mode, key) and context features regarding the listening event (coordinates (as geoJSON), place (as geoJSON), geo (as geoJSON), tweet_language, created_at, user_lang, time_zone, entities contained in the tweet).
sentiment_values.csv contains sentiment information for hashtags. It contains the hashtag itself and the sentiment values gathered via four different sentiment dictionaries: AFINN, Opinion Lexicon, Sentistrength Lexicon and vader. For each of these dictionaries we list the minimum, maximum, sum and average of all sentiments of the tokens of the hashtag (if available; else we list empty values). However, as most hashtags consist of only a single token, these values are equal in most cases. Please note that the lexica are rather diverse and are therefore able to resolve very different terms against a score; hence, the resulting csv is rather sparse. In the file's comma-separated columns, we abbreviate all scores gathered over the Opinion Lexicon with the prefix 'ol'. Similarly, 'ss' stands for SentiStrength.
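Since the sentiment file is sparse, a left join keeps every listening event even when its hashtag has no sentiment entry. The sketch below uses tiny in-memory stand-ins for the two CSV files; the column names are assumptions modeled on the description, not the exact file headers:

```python
import io

import pandas as pd

# Toy stand-in for user_track_hashtag_timestamp.csv (column names are assumed).
events_csv = io.StringIO(
    "id,user_id,track_id,hashtag,created_at\n"
    "1,u1,t1,happy,2014-01-01\n"
    "2,u2,t2,sad,2014-01-02\n"
)
# Toy stand-in for sentiment_values.csv; only one hashtag has a sentiment score.
sentiment_csv = io.StringIO(
    "hashtag,vader_avg\n"
    "happy,0.8\n"
)

events = pd.read_csv(events_csv)
sentiment = pd.read_csv(sentiment_csv)

# Left join: events whose hashtag is missing from the sparse sentiment file get NaN.
merged = events.merge(sentiment, on="hashtag", how="left")
print(int(merged["vader_avg"].isna().sum()))  # 1: 'sad' has no sentiment row here
```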
Please also find the training and test-splits for the dataset in this repo. Also, prototypical implementations of a context-aware recommender system based on the dataset can be found at https://github.com/asmitapoddar/nowplaying-RS-Music-Reco-FM.
If you make use of this dataset, please cite the following paper where we describe and experiment with the dataset:
@inproceedings{smc18, title = {#nowplaying-RS: A New Benchmark Dataset for Building Context-Aware Music Recommender Systems}, author = {Asmita Poddar and Eva Zangerle and Yi-Hsuan Yang}, url = {http://mac.citi.sinica.edu.tw/~yang/pub/poddar18smc.pdf}, year = {2018}, date = {2018-07-04}, booktitle = {Proceedings of the 15th Sound & Music Computing Conference}, address = {Limassol, Cyprus}, note = {code at https://github.com/asmitapoddar/nowplaying-RS-Music-Reco-FM}, tppubtype = {inproceedings} }
We introduce PDMX: a Public Domain MusicXML dataset for symbolic music processing, including over 250k musical scores in MusicXML format. PDMX is the largest publicly available, copyright-free MusicXML dataset in existence. PDMX includes genre, tag, description, and popularity metadata for every file.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
These datasets include Douban movies and NetEase songs with attributes such as actors, directors, singers, albums and so on. Furthermore, the source code of ACAM model is also provided, which is a feature-level co-attention based recommendation model.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset contains data collected during an experiment at Delft University of Technology, as part of Paul Raingeard de la Bletiere's PhD thesis project.
It is being made public both to serve as supplementary data for publications and the PhD thesis of Paul Raingeard de la Bletiere, and to allow other researchers to use this data in their own work.
The data in this dataset was collected through a website accessed by participants between August 2024 and December 2024.
This research project was made possible by a grant from the Dutch Research Council (NWO) (Grant Number KICH1.GZ02.20.008). Additional support from Alzheimer Nederland is gratefully acknowledged.
The purpose of this experiment was to test a music recommender system linking music with specific episodic memories chosen by participants, through a discussion with a virtual agent. This specific part of the data relates to the ratings of recommendations by participants.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The MSD-A is a dataset related to the Million Song Dataset (MSD). It is a collection of artist tags and biographies gathered from Last.fm for all the artists that have songs in the MSD. In addition, the MSD Taste Profile (recommendation dataset) is adapted to artists.
We provide the biographies, tags, data splits, and feature embeddings to reproduce the experiments from the paper:
Oramas S., Nieto O., Sordo M., & Serra X. (2017) A Deep Multimodal Approach for Cold-start Music Recommendation. https://arxiv.org/abs/1706.09739
Source code is available at https://github.com/sergiooramas/tartarus
The file dlrs-data.tar.gz in this Zenodo version is corrupted. You can download an intact copy from this link:
https://drive.google.com/open?id=0B-oq_x72w8NUbUpkMzZSc1JPd28
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Yambda-5B — A Large-Scale Multi-modal Dataset for Ranking And Retrieval
Industrial-scale music recommendation dataset with organic/recommendation interactions and audio embeddings.
Overview
The Yambda-5B dataset is a large-scale open database comprising 4.79 billion user-item interactions collected from 1 million users and spanning 9.39 million tracks. The dataset includes… See the full description on the dataset page: https://huggingface.co/datasets/yandex/yambda.
This is a common Zenodo repository for both lastfm-360K and lastfm-1K datasets. See below the details of both datasets, including license, acknowledgements, contact, and instructions to cite.
LASTFM-360K (version 1.2, March 2010).
What is this? This dataset contains tuples (for ~360,000 users) collected from Last.fm API, using the user.getTopArtists() method.
Files:
usersha1-artmbid-artname-plays.tsv (MD5: be672526eb7c69495c27ad27803148f1)
usersha1-profile.tsv (MD5: 51159d4edf6a92cb96f87768aa2be678)
mbox_sha1sum.py (MD5: feb3485eace85f3ba62e324839e6ab39)
Data Statistics:
File usersha1-artmbid-artname-plays.tsv:
Total Lines: 17,559,530
Unique Users: 359,347
Artists with MBID: 186,642
Artists without MBID: 107,373
Data Format: The data is formatted one entry per line as follows (tab separated "\t"):
File usersha1-artmbid-artname-plays.tsv:
user-mboxsha1 \t musicbrainz-artist-id \t artist-name \t plays
File usersha1-profile.tsv:
user-mboxsha1 \t gender (m|f|empty) \t age (int|empty) \t country (str|empty) \t signup (date|empty)
Example:
File usersha1-artmbid-artname-plays.tsv:
000063d3fe1cf2ba248b9e3c3f0334845a27a6be \t a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432 \t u2 \t 31 ...
File usersha1-profile.tsv:
000063d3fe1cf2ba248b9e3c3f0334845a27a6be \t m \t 19 \t Mexico \t Apr 28, 2008 ...
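A hedged parsing sketch for the tab-separated plays file, using the example row above; the files ship without a header row, so column names must be supplied (the names below are paraphrases of the field names in the format description):

```python
import io

import pandas as pd

# One row copied from the example above; the real file has ~17.5M such lines.
sample = io.StringIO(
    "000063d3fe1cf2ba248b9e3c3f0334845a27a6be\t"
    "a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432\tu2\t31\n"
)

# names= supplies the header the TSV lacks (these labels are our own choice).
cols = ["user_mboxsha1", "musicbrainz_artist_id", "artist_name", "plays"]
plays = pd.read_csv(sample, sep="\t", names=cols)
print(plays.loc[0, "artist_name"], plays.loc[0, "plays"])  # u2 31
```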
LASTFM-1K (version 1.0, March 2010).
What is this? This dataset contains tuples collected from the Last.fm API, using the user.getRecentTracks() method. This dataset represents the complete listening habits (up to May 5th, 2009) of nearly 1,000 users.
Files:
userid-timestamp-artid-artname-traid-traname.tsv (MD5: 64747b21563e3d2aa95751e0ddc46b68)
userid-profile.tsv (MD5: c53608b6b445db201098c1489ea497df)
Data Statistics:
File userid-timestamp-artid-artname-traid-traname.tsv:
Total Lines: 19,150,868
Unique Users: 992
Artists with MBID: 107,528
Artists without MBID: 69,420
Data Format: The data is formatted one entry per line as follows (tab separated, "\t"):
File userid-timestamp-artid-artname-traid-traname.tsv:
userid \t timestamp \t musicbrainz-artist-id \t artist-name \t musicbrainz-track-id \t track-name
File userid-profile.tsv:
userid \t gender ('m'|'f'|empty) \t age (int|empty) \t country (str|empty) \t signup (date|empty)
Example:
File userid-timestamp-artid-artname-traid-traname.tsv:
user_000639 \t 2009-04-08T01:57:47Z \t MBID \t The Dogs D'Amour \t MBID \t Fall in Love Again?
user_000639 \t 2009-04-08T01:53:56Z \t MBID \t The Dogs D'Amour \t MBID \t Wait Until I'm Dead
...
File userid-profile.tsv:
user_000639 \t m \t Mexico \t Apr 27, 2005 ...
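Because the 1K dataset is time-stamped, parsing the timestamp column directly is the natural first step (for example, to order events into listening sessions). A minimal sketch using one example row from above, with column names paraphrased from the format description:

```python
import io

import pandas as pd

# Column labels paraphrased from the format description; the file has no header row.
cols = ["userid", "timestamp", "musicbrainz_artist_id",
        "artist_name", "musicbrainz_track_id", "track_name"]

sample = io.StringIO(
    "user_000639\t2009-04-08T01:57:47Z\tMBID\tThe Dogs D'Amour\tMBID\tFall in Love Again?\n"
)

# ISO-8601 timestamps parse directly via parse_dates.
les = pd.read_csv(sample, sep="\t", names=cols, parse_dates=["timestamp"])
print(les["timestamp"].dt.year.iloc[0])  # 2009
```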
LICENSE OF BOTH DATASETS. The data contained in both datasets is distributed with permission of Last.fm. The data is made available for non-commercial use. Those interested in using the data or web services in a commercial context should contact:
partners [at] last [dot] fm
For more information see Last.fm terms of service
ACKNOWLEDGEMENTS. Thanks to Last.fm for providing the access to this data via their web services. Special thanks to Norman Casagrande.
REFERENCES. When using this dataset you must reference the Last.fm webpage. Optionally (not mandatory at all!), you can cite Chapter 3 of this book:
@book{Celma:Springer2010, author = {Celma, O.}, title = {{Music Recommendation and Discovery in the Long Tail}}, publisher = {Springer}, year = {2010} }
CONTACT: This data was collected by Òscar Celma @ MTG/UPF
https://dataverse.csuc.cat/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.34810/data444
Music Recommendation Dataset (KGRec-music). Number of items: 8,640. Number of users: 5,199. Number of item-user interactions: 751,531. All the data comes from the songfacts.com and last.fm websites. Items are songs, described in terms of textual descriptions extracted from songfacts.com and tags from last.fm.
Files and folders in the dataset:
/descriptions: one file per item with the textual description of the item. The name of the file is the id of the item plus the ".txt" extension.
/tags: one file per item with the tags of the item separated by spaces. Multiword tags are joined with "-". The name of the file is the id of the item plus the ".txt" extension. Not all items have tags; there are 401 items without tags.
implicit_lf_dataset.txt: the interactions between users and items. There is one line per interaction, with tab-separated fields in the following format: user_id \t sound_id \t 1 \n.
Sound Recommendation Dataset (KGRec-sound). Number of items: 21,552. Number of users: 20,000. Number of item-user interactions: 2,117,698. All the data comes from Freesound.org. Items are sounds, described in terms of the textual description and tags created by the sound creator at upload time.
Files and folders in the dataset:
/descriptions: one file per item with the textual description of the item. The name of the file is the id of the item plus the ".txt" extension.
/tags: one file per item with the tags of the item separated by spaces. The name of the file is the id of the item plus the ".txt" extension.
downloads_fs_dataset.txt: the interactions between users and items (a user that downloaded a sound, in this case). There is one line per interaction, with tab-separated fields in the following format: user_id \t sound_id \t 1 \n.
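The interaction files above are one-triple-per-line implicit feedback, which pivots directly into a user-item matrix for collaborative filtering. A self-contained sketch with toy ids standing in for the real ones:

```python
import io

import pandas as pd

# Toy stand-in for implicit_lf_dataset.txt: user_id \t item_id \t 1, one line per interaction.
raw = io.StringIO("u1\ti1\t1\nu1\ti2\t1\nu2\ti1\t1\n")
interactions = pd.read_csv(raw, sep="\t", names=["user_id", "item_id", "value"])

# Pivot into a user-item matrix; unobserved pairs become 0 (no interaction seen).
matrix = interactions.pivot_table(index="user_id", columns="item_id",
                                  values="value", fill_value=0)
print(matrix.shape)  # (2, 2)
```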
Two different datasets with users, items, implicit feedback interactions between users and items, item tags, and item text descriptions are provided: one for Music Recommendation (KGRec-music), and the other for Sound Recommendation (KGRec-sound).
This dataset is a summarized, sanitized subset of the one released at The 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011), currently hosted at the GroupLens website (here).
Sanitization included: (a) artist name misspelling correction and standardization; (b) reassignment of artists referenced with two or more artist ids; (c) removal of artists listed as 'unknown' or through their website addresses.
The original dataset contains a larger number of files, including tag-related information, in addition to users, artists and scrobble counts. The author contacted last.fm to request a recent version of this content in a similar format, but had received no reply as of June 15th, 2020.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
The nowplaying-RS dataset features context- and content features of listening events. It contains 11.6 million music listening events of 139K users and 346K tracks collected from Twitter. The dataset comes with a rich set of item content features and user context features, as well as timestamps of the listening events. Moreover, some of the user context features imply the cultural origin of the users, and some others - like hashtags - give clues to the emotional state of a user underlying a listening event.
user_track_hashtag_timestamp.csv contains basic information about each listening event. For each listening event, we provide the id, user_id, track_id, hashtag, and created_at.
context_content_features.csv contains all context and content features. For each listening event, we provide the id of the event, user_id, track_id, artist_id, content features regarding the track mentioned in the event (instrumentalness, liveness, speechiness, danceability, valence, loudness, tempo, acousticness, energy, mode, key) and context features regarding the listening event (coordinates (as geoJSON), place (as geoJSON), geo (as geoJSON), tweet_language, created_at, user_lang, time_zone, entities contained in the tweet).
sentiment_values.csv contains sentiment information for hashtags. It contains the hashtag itself and the sentiment values gathered via four different sentiment dictionaries: AFINN, Opinion Lexicon, Sentistrength Lexicon and vader. For each of these dictionaries we list the minimum, maximum, sum and average of all sentiments of the tokens of the hashtag (if available, else we list empty values). However, as most hashtags only consist of a single token, these values are equal in most cases. Please note that the lexica are rather diverse and therefore, are able to resolve very different terms against a score. Hence, the resulting csv is rather sparse.
@inproceedings{smc18, title = {#nowplaying-RS: A New Benchmark Dataset for Building Context-Aware Music Recommender Systems}, author = {Asmita Poddar and Eva Zangerle and Yi-Hsuan Yang}, url = {http://mac.citi.sinica.edu.tw/~yang/pub/poddar18smc.pdf}, year = {2018}, date = {2018-07-04}, booktitle = {Proceedings of the 15th Sound & Music Computing Conference}, address = {Limassol, Cyprus}, note = {code at https://github.com/asmitapoddar/nowplaying-RS-Music-Reco-FM}, tppubtype = {inproceedings} }
Can incorporating mood-related hashtags and timestamps into a neural network, to predict the variation in a user's emotion based on the track they are playing, improve the next-song recommendation model?
Spotify Million Playlist Dataset Challenge
Summary
The Spotify Million Playlist Dataset Challenge consists of a dataset and evaluation to enable research in music recommendations. It is a continuation of the RecSys Challenge 2018, which ran from January to July 2018. The dataset contains 1,000,000 playlists, including playlist titles and track titles, created by users on the Spotify platform between January 2010 and October 2017. The evaluation task is automatic playlist continuation: given a seed playlist title and/or initial set of tracks in a playlist, to predict the subsequent tracks in that playlist. This is an open-ended challenge intended to encourage research in music recommendations, and no prizes will be awarded (other than bragging rights).
Background
Playlists like Today’s Top Hits and RapCaviar have millions of loyal followers, while Discover Weekly and Daily Mix are just a couple of our personalized playlists made especially to match your unique musical tastes.
Our users love playlists too. In fact, the Digital Music Alliance, in their 2018 Annual Music Report, states that 54% of consumers say that playlists are replacing albums in their listening habits.
But our users don’t love just listening to playlists, they also love creating them. To date, over 4 billion playlists have been created and shared by Spotify users. People create playlists for all sorts of reasons: some playlists group together music categorically (e.g., by genre, artist, year, or city), by mood, theme, or occasion (e.g., romantic, sad, holiday), or for a particular purpose (e.g., focus, workout). Some playlists are even made to land a dream job, or to send a message to someone special.
The other thing we love here at Spotify is playlist research. By learning from the playlists that people create, we can learn all sorts of things about the deep relationship between people and music. Why do certain songs go together? What is the difference between “Beach Vibes” and “Forest Vibes”? And what words do people use to describe which playlists?
By learning more about the nature of playlists, we may also be able to suggest other tracks that a listener would enjoy in the context of a given playlist. This can make playlist creation easier, and ultimately help people find more of the music they love.
Dataset
To enable this type of research at scale, in 2018 we sponsored the RecSys Challenge 2018, which introduced the Million Playlist Dataset (MPD) to the research community. Sampled from the over 4 billion public playlists on Spotify, this dataset of 1 million playlists consists of over 2 million unique tracks by nearly 300,000 artists, and represents the largest public dataset of music playlists in the world. The dataset includes public playlists created by US Spotify users between January 2010 and November 2017. The challenge ran from January to July 2018, and received 1,467 submissions from 410 teams. A summary of the challenge and the top scoring submissions was published in the ACM Transactions on Intelligent Systems and Technology.
In September 2020, we re-released the dataset as an open-ended challenge on AIcrowd.com. The dataset can now be downloaded by registered participants from the Resources page.
Each playlist in the MPD contains a playlist title, the track list (including track IDs and metadata), and other metadata fields (last edit time, number of playlist edits, and more). All data is anonymized to protect user privacy. Playlists are sampled with some randomization, are manually filtered for playlist quality and to remove offensive content, and have some dithering and fictitious tracks added to them. As such, the dataset is not representative of the true distribution of playlists on the Spotify platform, and must not be interpreted as such in any research or analysis performed on the dataset.
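The MPD is distributed as JSON slice files, each holding a "playlists" list whose entries carry the title, metadata, and track list described above. The sketch below parses an illustrative in-memory slice; treat the exact field names as assumptions modeled on the published format:

```python
import json

# Illustrative slice content (field names are assumptions; check the real files).
slice_text = """
{"playlists": [
  {"name": "Beach Vibes", "pid": 0, "num_tracks": 2,
   "tracks": [{"pos": 0, "track_name": "Song A", "artist_name": "Artist X"},
              {"pos": 1, "track_name": "Song B", "artist_name": "Artist Y"}]}
]}
"""

data = json.loads(slice_text)
for pl in data["playlists"]:
    # Each playlist entry carries its title plus a list of track records.
    print(pl["name"], len(pl["tracks"]))  # Beach Vibes 2
```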
Dataset Contains
1000 examples of each scenario:
Title only (no tracks)
Title and first track
Title and first 5 tracks
First 5 tracks only
Title and first 10 tracks
First 10 tracks only
Title and first 25 tracks
Title and 25 random tracks
Title and first 100 tracks
Title and 100 random tracks
Download Link
Full Details: https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge
Download Link: https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge/dataset_files
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Questionnaire response data set Here, we include the data retrieved from participants at Eurosonic Noorderslag 2023, as described in the paper cited above. When using, analyzing, or publishing this data in any way, please make sure to attribute it to the authors and cite it accordingly.
We include the data in .xlsx, .csv (semicolon-separated), and .tsv (tab-separated) formats. We suggest using the Excel file, as its layout makes it the most easily readable.
The complete question list as used in the questionnaire is published separately on https://doi.org/10.5281/zenodo.8121151.
Paper title Looking at the FAccTs: Exploring Music Industry Professionals’ Perspectives on Music Streaming Services and Recommendations
Paper abstract Music recommender systems, commonly integrated into streaming services, help listeners find music. Previous research on such systems has focused on providing the best possible recommendations for these services' consumers, as well as on fairness for artists who release their music on streaming services. While those insights are imperative, another group of stakeholders has been omitted so far: the many other professionals working in the music industry. They, too, are (in)directly affected by music streaming services. Therefore, this work explores the perspective of music industry professionals. We present a study that addresses the role of streaming services and recommender systems in their jobs. Results indicate this role is significant. Furthermore, participants feel that music recommender systems lack transparency and are insufficiently controllable, for both customers and artists. Finally, participants desire that music streaming services take charge of increasing recommendation diversity, and variety in consumers' listening behavior and taste.
Citation Karlijn Dinnissen, Isabella Saccardi, Marloes Vredenborg, and Christine Bauer. 2023. Looking at the FAccTs: Exploring Music Industry Professionals’ Perspectives on Music Streaming Services and Recommendations. In 2nd International Conference of the ACM Greek SIGCHI Chapter (CHIGREECE 2023), September 27–28, 2023, Athens, Greece. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3609987.3610011
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
tezbytes/music-recommender-data dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a dump of the #nowplaying dataset, which contains so-called listening events of users who publish the music they are currently listening to on Twitter. In particular, this dataset includes tracks which have been tweeted using the hashtags #nowplaying, #listento or #listeningto. In this dataset, we provide the track and artist of a listening event and metadata on the tweet (date sent, user, source). Furthermore, we provide a mapping of tracks to their respective MusicBrainz identifiers. The dataset features a total of 126 million listening events.
This archive contains the nowplaying.csv file, the main file which contains the following fields:
In case you make use of our dataset in a scientific setting, we kindly ask you to cite the following paper:
Eva Zangerle, Martin Pichl, Wolfgang Gassler, and Günther Specht. 2014. #nowplaying Music Dataset: Extracting Listening Behavior from Twitter. In Proceedings of the First International Workshop on Internet-Scale Multimedia Management (WISMM '14). ACM, New York, NY, USA, 21-26.
If you have any questions or suggestions regarding the dataset, please do not hesitate to contact Eva Zangerle (eva.zangerle@uibk.ac.at).
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Aryan Mahawar
Released under Apache 2.0
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The aim of the paper "Beyond the Big Five Personality Traits for Music Recommendation Systems" is to investigate the influence of personality traits, characterized by the BFI (Big Five Inventory) and its significant revision called BFI-2, on music recommendation error. The BFI-2 describes the lower-order facets of the Big Five personality traits. We performed experiments with 279 participants, using an application (called Music Master) we developed for music listening and ranking, and for collecting personality profiles of the users. Additionally, 29-dimensional vectors of audio features were extracted to describe the music files.
In our paper, we used this data set to test several hypotheses about the influence of personality traits and the audio features on music recommendation error. The experiments showed that every combination of Big-Five personality traits produces worse results than using lower-order personality facets. Additionally, we found a small subset of personality facets that yielded the lowest recommendation error. This finding allows condensing the personality questionnaire to only the most essential questions.
The EXCEL file contains 5278 entries created for 279 participants. Each entry includes preferences expressed using the 5-point Likert scale: the cognitive aspect of listening to music is denoted Q1, and the motivational and interpersonal aspects are denoted Q2 and Q3, respectively. The following 20 variables (columns) contain 20-dimensional, extended Big Five personality trait values. The last 29 columns contain the values of low-level audio features, including emotions extracted from the audio files. The EXCEL file is ready to be saved as CSV and imported using a suitable programming language (e.g. Python, R, Java, Matlab and others) for further processing, i.e. for creating user-item matrices for collaborative filtering and evaluating its performance with the usage of the proposed new rating types (motivational and interpersonal ones) described in the article.
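The user-item matrix construction mentioned above can be sketched as follows; the toy rows and column names here are purely illustrative stand-ins, not the actual spreadsheet headers:

```python
import pandas as pd

# Toy rows mimicking participant ratings of tracks (column names are illustrative).
ratings = pd.DataFrame({
    "participant": ["p1", "p1", "p2"],
    "track": ["t1", "t2", "t1"],
    "q1_cognitive": [4, 2, 5],  # 5-point Likert rating for the cognitive aspect
})

# User-item matrix for collaborative filtering; unrated cells stay NaN.
ui = ratings.pivot(index="participant", columns="track", values="q1_cognitive")
print(ui.shape)  # (2, 2)
```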
The usage of the data set requires citing the paper.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Training dataset for a music recommendation system. The last 30 columns represent the labels, where:
1 = liked and saved the song
0.6 = liked but didn't save the song
0 = didn't like the song
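Splitting such a frame into features and the 30 label columns is a one-liner with positional indexing; the sketch uses a random toy frame since the real column layout beyond "last 30 are labels" is not specified:

```python
import numpy as np
import pandas as pd

# Toy frame: 5 feature columns + 30 label columns, mirroring the described layout.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((4, 35)))

X = df.iloc[:, :-30]   # features: everything except the last 30 columns
y = df.iloc[:, -30:]   # labels: values in {1, 0.6, 0} in the real dataset
print(X.shape, y.shape)  # (4, 5) (4, 30)
```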
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We publicly release the anonymized user_features.csv and playlist_features.csv datasets, from the music streaming platform Deezer, as described in the article "Carousel Personalization in Music Streaming Apps with Contextual Bandits" published in the proceedings of the 14th ACM Conference on Recommender Systems (RecSys 2020). The paper is available online.
These datasets are used in the GitHub repository deezer/carousel_bandits to reproduce experiments from the article.
Please cite our paper if you use our code or data in your work.
Dataset Description
This dataset is ideal for training a recommendation system that incorporates time and country information.
Task Summary
A recommender system, or a recommendation system, is a subclass of information filtering system that provides suggestions for items that are most pertinent to a particular user. Recommender systems are particularly useful when an individual needs to choose an item from a potentially overwhelming number of items that a service may… See the full description on the dataset page: https://huggingface.co/datasets/matthewfranglen/lastfm-1k.