62 datasets found
  1. Deezer listening events dataset

    • zenodo.org
    application/gzip
    Updated Oct 4, 2024
    Cite
    Viet Anh TRAN (2024). Deezer listening events dataset [Dataset]. http://doi.org/10.5281/zenodo.13890194
    Available download formats
    application/gzip
    Dataset updated
    Oct 4, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Viet Anh TRAN
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    What does this dataset contain?

    This dataset contains over 700 million time-stamped listening events, collected from 3.4M anonymised users of the music streaming service Deezer between March and August 2022. It includes 50k anonymised songs, among the most popular on the service, together with their pre-trained embedding vectors computed by our internal model. All files are in Parquet format and can be read with the pandas.read_parquet function.
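    A minimal sketch of loading the events with pandas.read_parquet and turning them into per-user chronological sequences, the usual input for next-item recommendation. The file name and the column names user_id, item_id and ts are hypothetical; check them against the real Parquet schema first. Reading Parquet also requires an engine such as pyarrow or fastparquet to be installed.

```python
import pandas as pd


def load_events(path, columns=None):
    """Read one Parquet file; `columns` limits the fields loaded into memory."""
    # pandas.read_parquet needs a Parquet engine (pyarrow or fastparquet).
    return pd.read_parquet(path, columns=columns)


def user_sequences(events, user_col="user_id", item_col="item_id", time_col="ts"):
    """Return {user: [item, ...]} with each user's items in chronological order.

    The default column names are assumptions, not the documented schema.
    """
    ordered = events.sort_values([user_col, time_col])
    return ordered.groupby(user_col)[item_col].apply(list).to_dict()


def leave_last_out(sequences):
    """Split each sequence into (history, next item) for next-item evaluation."""
    return {u: (seq[:-1], seq[-1]) for u, seq in sequences.items() if len(seq) >= 2}


# Usage (hypothetical file name):
# events = load_events("events.parquet", columns=["user_id", "item_id", "ts"])
# splits = leave_last_out(user_sequences(events))
```

    Holding out each user's last item is only one evaluation protocol; next-session recommendation would hold out a whole session instead.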

    What could this dataset be used for?

    This dataset could be used for collaborative filtering as well as sequential recommendation (including both next-item and next-session recommendations).

    Citation

    If you use this dataset, please cite the following paper:

    @inproceedings{tran-recsys2024,
     title={Transformers Meet ACT-R: Repeat-Aware and Sequential Listening Session Recommendation},
     author={Viet-Anh Tran and Guillaume Salha-Galvan and Bruno Sguerra and Romain Hennequin},
     booktitle = {Proceedings of the 18th ACM Conference on Recommender Systems},
     year = {2024}
    }
  2. #nowplaying-RS: A New Benchmark Dataset for Building Context-Aware Music...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 22, 2024
    Cite
    Eva Zangerle (2024). #nowplaying-RS: A New Benchmark Dataset for Building Context-Aware Music Recommender Systems [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1318037
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    Asmita Poddar
    Yi-Hsuan Yang
    Eva Zangerle
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Music recommender systems can offer users personalized and contextualized recommendations and are therefore important for music information retrieval. An increasing number of datasets have been compiled to facilitate research on different topics, such as content-based, context-based or next-song recommendation. However, these topics are usually addressed separately using different datasets, due to the lack of a unified dataset that contains a large variety of feature types such as item features, user contexts, and timestamps. To address this issue, we propose a large-scale benchmark dataset called #nowplaying-RS, which contains 11.6 million music listening events (LEs) of 139K users and 346K tracks collected from Twitter. The dataset comes with a rich set of item content features and user context features, and the timestamps of the LEs. Moreover, some of the user context features imply the cultural origin of the users, and some others, like hashtags, give clues to the emotional state of a user underlying an LE. In this paper, we provide some statistics to give insight into the dataset, and some directions in which the dataset can be used for making music recommendations. We also provide standardized training and test sets for experimentation, and some baseline results obtained by using factorization machines.

    The dataset contains three files:

    user_track_hashtag_timestamp.csv contains basic information about each listening event. For each listening event, we provide the event id, user_id, track_id, hashtag and created_at.

    context_content_features.csv: contains all context and content features. For each listening event, we provide the id of the event, user_id, track_id, artist_id, content features regarding the track mentioned in the event (instrumentalness, liveness, speechiness, danceability, valence, loudness, tempo, acousticness, energy, mode, key) and context features regarding the listening event (coordinates (as geoJSON), place (as geoJSON), geo (as geoJSON), tweet_language, created_at, user_lang, time_zone, entities contained in the tweet).

    sentiment_values.csv contains sentiment information for hashtags. It contains the hashtag itself and the sentiment values gathered via four different sentiment dictionaries: AFINN, Opinion Lexicon, SentiStrength Lexicon and VADER. For each of these dictionaries we list the minimum, maximum, sum and average of all sentiments of the tokens of the hashtag (if available, else we list empty values). However, as most hashtags consist of a single token, these values are equal in most cases. Please note that the lexica are rather diverse and therefore assign scores to very different sets of terms. Hence, the resulting CSV is rather sparse. In the file's comma-separated column names, scores gathered over the Opinion Lexicon are abbreviated with the prefix 'ol' and SentiStrength scores with the prefix 'ss'.
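    The per-dictionary aggregation described above can be sketched as follows. This is an illustration, not the authors' code: the tokenization of a hashtag and the toy lexicon are assumptions, and each real dictionary (AFINN, Opinion Lexicon, SentiStrength, VADER) has its own vocabulary and scale.

```python
from statistics import mean


def hashtag_sentiment(tokens, lexicon):
    """Aggregate one lexicon's scores over the tokens of a hashtag.

    `tokens` is the (assumed) tokenization of the hashtag and `lexicon`
    maps a term to its score. Returns None when no token is covered,
    mirroring the empty values in sentiment_values.csv.
    """
    scores = [lexicon[token] for token in tokens if token in lexicon]
    if not scores:
        return None
    return {"min": min(scores), "max": max(scores), "sum": sum(scores), "avg": mean(scores)}
```

    With a single covered token, min, max, sum and average all equal that token's score, which is why the four columns coincide for most hashtags.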

    The training and test splits for the dataset are also provided in this repository, and prototypical implementations of a context-aware recommender system based on the dataset are available at https://github.com/asmitapoddar/nowplaying-RS-Music-Reco-FM.

    If you make use of this dataset, please cite the following paper where we describe and experiment with the dataset:

    @inproceedings{smc18, title = {#nowplaying-RS: A New Benchmark Dataset for Building Context-Aware Music Recommender Systems}, author = {Asmita Poddar and Eva Zangerle and Yi-Hsuan Yang}, url = {http://mac.citi.sinica.edu.tw/~yang/pub/poddar18smc.pdf}, year = {2018}, date = {2018-07-04}, booktitle = {Proceedings of the 15th Sound & Music Computing Conference}, address = {Limassol, Cyprus}, note = {code at https://github.com/asmitapoddar/nowplaying-RS-Music-Reco-FM}, tppubtype = {inproceedings} }

  3. PDMX

    • cseweb.ucsd.edu
    json
    Cite
    UCSD CSE Research Project, PDMX [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
    Available download formats
    json
    Dataset authored and provided by
    UCSD CSE Research Project
    Description

    We introduce PDMX: a Public Domain MusicXML dataset for symbolic music processing, including over 250k musical scores in MusicXML format. PDMX is the largest publicly available, copyright-free MusicXML dataset in existence. PDMX includes genre, tag, description, and popularity metadata for every file.
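    MusicXML scores can be inspected with Python's standard xml.etree.ElementTree. A minimal sketch that lists the pitched notes of a score, assuming uncompressed, non-namespaced MusicXML text; compressed .mxl files would need unzipping first, and a dedicated library such as music21 handles far more of the format.

```python
import xml.etree.ElementTree as ET


def note_names(musicxml_text):
    """Return pitched notes in document order as strings such as 'C4', 'F#5' or 'Bb3'.

    Only <pitch> (step, optional alter, octave) is read; rests, unpitched
    notes, durations, ties and voices are ignored.
    """
    root = ET.fromstring(musicxml_text)
    names = []
    for note in root.iter("note"):
        pitch = note.find("pitch")
        if pitch is None:  # rests and unpitched notes carry no <pitch> element
            continue
        step = pitch.findtext("step")
        octave = pitch.findtext("octave")
        # <alter> counts semitones; fractional (microtonal) values are truncated here.
        alter = int(float(pitch.findtext("alter") or "0"))
        accidental = "#" * alter if alter > 0 else "b" * -alter
        names.append(f"{step}{accidental}{octave}")
    return names
```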

  4. Movie and Music recommendation dataset and model codes

    • dataverse.harvard.edu
    Updated Feb 29, 2020
    Cite
    NA Yang (2020). Movie and Music recommendation dataset and model codes [Dataset]. http://doi.org/10.7910/DVN/A5TLOZ
    Dataset updated
    Feb 29, 2020
    Dataset provided by
    Harvard Dataverse
    Authors
    NA Yang
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    These datasets include Douban movies and NetEase songs with attributes such as actors, directors, singers and albums. The source code of the ACAM model, a feature-level co-attention based recommendation model, is also provided.

  5. Dataset: A Music Recommender System for Constructed Music Evoked Episodic...

    • data.4tu.nl
    zip
    Updated Apr 8, 2025
    Cite
    Paul Raingeard de la Blétière; M.A. (Mark) Neerincx; Rebecca Schaefer; Catharine Oertel (2025). Dataset: A Music Recommender System for Constructed Music Evoked Episodic Memories (CMEEMs)- non-personal data [Dataset]. http://doi.org/10.4121/39b8137d-301e-4fa2-817f-bbd4b791ea30.v1
    Available download formats
    zip
    Dataset updated
    Apr 8, 2025
    Dataset provided by
    4TU.ResearchData
    Authors
    Paul Raingeard de la Blétière; M.A. (Mark) Neerincx; Rebecca Schaefer; Catharine Oertel
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Time period covered
    Aug 2024 - Dec 2024
    Description

    This dataset contains data collected during an experiment at Delft University of Technology, as part of Paul Raingeard de la Blétière's PhD thesis project.

    It is being made public both as supplementary data for publications and the PhD thesis of Paul Raingeard de la Blétière, and so that other researchers can use the data in their own work.

    The data in this dataset was collected through a website accessed by participants between August 2024 and December 2024.

    This research project was made possible by a grant from the Dutch Research Council (NWO) (Grant Number KICH1.GZ02.20.008). Additional support from Alzheimer Nederland is gratefully acknowledged.

    The purpose of this experiment was to test a music recommender system that links music to specific episodic memories chosen by participants, through a discussion with a virtual agent. This part of the data contains the participants' ratings of the recommendations.

  6. MSD-A: Million Song Dataset for Artists

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Sergio Oramas (2020). MSD-A: Million Song Dataset for Artists [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_831347
    Dataset updated
    Jan 24, 2020
    Dataset authored and provided by
    Sergio Oramas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The MSD-A is a dataset related to the Million Song Dataset (MSD). It is a collection of artist tags and biographies gathered from Last.fm for all the artists that have songs in the MSD. In addition, the MSD Taste Profile (recommendation dataset) is adapted to artists.
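    One plausible way to adapt a song-level play-count dataset such as the MSD Taste Profile to artists is to sum each user's plays over all songs by the same artist. This sketch is an illustration, not necessarily the MSD-A authors' procedure; the (user, song, plays) triplet layout and the plain song-to-artist dict are assumptions.

```python
from collections import defaultdict


def artist_play_counts(triplets, song_to_artist):
    """Sum (user, song, plays) triplets into {(user, artist): total plays}.

    Songs missing from `song_to_artist` are dropped.
    """
    counts = defaultdict(int)
    for user, song, plays in triplets:
        artist = song_to_artist.get(song)
        if artist is not None:
            counts[(user, artist)] += int(plays)
    return dict(counts)
```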

    We provide the biographies, tags, data splits, and feature embeddings to reproduce the experiments from the paper:

    Oramas S., Nieto O., Sordo M., & Serra X. (2017) A Deep Multimodal Approach for Cold-start Music Recommendation. https://arxiv.org/abs/1706.09739

    Source code is available at https://github.com/sergiooramas/tartarus

    The file dlrs-data.tar.gz in this Zenodo version is corrupted. A working copy can be downloaded from this link:

    https://drive.google.com/open?id=0B-oq_x72w8NUbUpkMzZSc1JPd28

  7. yambda

    • huggingface.co
    Updated Jun 6, 2025
    Cite
    Yandex (2025). yambda [Dataset]. https://huggingface.co/datasets/yandex/yambda
    Dataset updated
    Jun 6, 2025
    Dataset authored and provided by
    Yandex
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Yambda-5B — A Large-Scale Multi-modal Dataset for Ranking And Retrieval

    Industrial-scale music recommendation dataset with organic/recommendation interactions and audio embeddings.

    Overview

    The Yambda-5B dataset is a large-scale open database comprising 4.79 billion user-item interactions collected from 1 million users and spanning 9.39 million tracks. The dataset includes… See the full description on the dataset page: https://huggingface.co/datasets/yandex/yambda.

  8. lastfm Music Recommendation Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 15, 2022
    Cite
    Òscar Celma (2022). lastfm Music Recommendation Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6090213
    Dataset updated
    Feb 15, 2022
    Dataset authored and provided by
    Òscar Celma
    Description

    This is a common Zenodo repository for both lastfm-360K and lastfm-1K datasets. See below the details of both datasets, including license, acknowledgements, contact, and instructions to cite.

    LASTFM-360K (version 1.2, March 2010).

    What is this? This dataset contains (user, MusicBrainz artist id, artist name, plays) tuples for ~360,000 users, collected from the Last.fm API using the user.getTopArtists() method.

    Files:

    usersha1-artmbid-artname-plays.tsv (MD5: be672526eb7c69495c27ad27803148f1)

    usersha1-profile.tsv (MD5: 51159d4edf6a92cb96f87768aa2be678)

    mbox_sha1sum.py (MD5: feb3485eace85f3ba62e324839e6ab39)

    Data Statistics:

    File usersha1-artmbid-artname-plays.tsv:

    Total Lines: 17,559,530

    Unique Users: 359,347

    Artists with MBID: 186,642

    Artists without MBID: 107,373

    Data Format: The data is formatted one entry per line as follows (tab separated "\t"):

    File usersha1-artmbid-artname-plays.tsv:

    user-mboxsha1 \t musicbrainz-artist-id \t artist-name \t plays

    File usersha1-profile.tsv:

    user-mboxsha1 \t gender (m|f|empty) \t age (int|empty) \t country (str|empty) \t signup (date|empty)

    Example:

    File usersha1-artmbid-artname-plays.tsv:

    000063d3fe1cf2ba248b9e3c3f0334845a27a6be \t a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432 \t u2 \t 31 ...

    File usersha1-profile.tsv:

    000063d3fe1cf2ba248b9e3c3f0334845a27a6be \t m \t 19 \t Mexico \t Apr 28, 2008 ...
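    A minimal sketch for streaming usersha1-artmbid-artname-plays.tsv with Python's csv module. Quoting is disabled because artist names may contain quote characters, empty MBIDs become None, and rows without exactly four fields are skipped; the UTF-8 encoding in the usage line is an assumption.

```python
import csv
from collections import namedtuple

ArtistPlays = namedtuple("ArtistPlays", "user mbid artist plays")


def parse_plays(lines):
    """Yield one ArtistPlays record per well-formed tab-separated line."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 4:  # skip blank or malformed lines
            continue
        user, mbid, artist, plays = row
        yield ArtistPlays(user, mbid or None, artist, int(plays))


# Usage:
# with open("usersha1-artmbid-artname-plays.tsv", encoding="utf-8", newline="") as f:
#     for record in parse_plays(f):
#         ...
```

    A generator keeps memory flat over the 17.5 million lines instead of materializing them all.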

    LASTFM-1K (version 1.0, March 2010).

    What is this? This dataset contains (user, timestamp, MusicBrainz artist id, artist name, MusicBrainz track id, track name) tuples collected from the Last.fm API using the user.getRecentTracks() method. It represents the complete listening history (up to May 5, 2009) of nearly 1,000 users.

    Files:

    userid-timestamp-artid-artname-traid-traname.tsv (MD5: 64747b21563e3d2aa95751e0ddc46b68)

    userid-profile.tsv (MD5: c53608b6b445db201098c1489ea497df)

    Data Statistics:

    File userid-timestamp-artid-artname-traid-traname.tsv:

    Total Lines: 19,150,868

    Unique Users: 992

    Artists with MBID: 107,528

    Artists without MBID: 69,420

    Data Format: The data is formatted one entry per line as follows (tab separated, "\t"):

    File userid-timestamp-artid-artname-traid-traname.tsv:

    userid \t timestamp \t musicbrainz-artist-id \t artist-name \t musicbrainz-track-id \t track-name

    File userid-profile.tsv:

    userid \t gender ('m'|'f'|empty) \t age (int|empty) \t country (str|empty) \t signup (date|empty)

    Example:

    File userid-timestamp-artid-artname-traid-traname.tsv:

    user_000639 \t 2009-04-08T01:57:47Z \t MBID \t The Dogs D'Amour \t MBID \t Fall in Love Again?
    user_000639 \t 2009-04-08T01:53:56Z \t MBID \t The Dogs D'Amour \t MBID \t Wait Until I'm Dead ...

    File userid-profile.tsv:

    user_000639 \t m \t Mexico \t Apr 27, 2005 ...
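    The example rows above are listed newest first, so a user's events need sorting before sequential or session-based use. A minimal sketch that parses the UTC timestamps and splits one user's history into sessions wherever the pause between consecutive plays exceeds a gap; the 30-minute threshold is a common convention, not something the dataset defines.

```python
from datetime import datetime, timedelta


def parse_ts(text):
    """Parse timestamps such as 2009-04-08T01:57:47Z (UTC)."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")


def sessions(events, gap=timedelta(minutes=30)):
    """Group one user's (timestamp_text, track_name) pairs into chronological sessions.

    A new session starts when the pause since the previous play exceeds `gap`
    (30 minutes is an assumed default).
    """
    ordered = sorted(events, key=lambda event: parse_ts(event[0]))
    result = []
    last = None
    for ts_text, track in ordered:
        ts = parse_ts(ts_text)
        if last is None or ts - last > gap:
            result.append([])
        result[-1].append(track)
        last = ts
    return result
```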

    LICENSE OF BOTH DATASETS. The data contained in both datasets is distributed with permission of Last.fm. The data is made available for non-commercial use. Those interested in using the data or web services in a commercial context should contact:

    partners [at] last [dot] fm

    For more information, see the Last.fm terms of service.

    ACKNOWLEDGEMENTS. Thanks to Last.fm for providing access to this data via their web services. Special thanks to Norman Casagrande.

    REFERENCES. When using this dataset you must reference the Last.fm webpage. Optionally (not mandatory at all!), you can cite Chapter 3 of this book:

    @book{Celma:Springer2010, author = {Celma, O.}, title = {{Music Recommendation and Discovery in the Long Tail}}, publisher = {Springer}, year = {2010} }

    CONTACT: This data was collected by Òscar Celma @ MTG/UPF

  9. Data from: Sound and music recommendation with knowledge graphs [dataset]

    • dataverse.csuc.cat
    txt, zip
    Updated Oct 9, 2023
    Cite
    Sergio Oramas; Vito Claudio Ostuni; Gabriel Vigliensoni (2023). Sound and music recommendation with knowledge graphs [Dataset]. http://doi.org/10.34810/data444
    Available download formats
    txt (3,751 bytes), zip (56,553,416 bytes)
    Dataset updated
    Oct 9, 2023
    Dataset provided by
    CORA.Repositori de Dades de Recerca
    Authors
    Sergio Oramas; Vito Claudio Ostuni; Gabriel Vigliensoni
    License

    Custom license: https://dataverse.csuc.cat/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.34810/data444

    Description

    Two different datasets with users, items, implicit feedback interactions between users and items, item tags, and item text descriptions are provided: one for Music Recommendation (KGRec-music) and one for Sound Recommendation (KGRec-sound).

    Music Recommendation Dataset (KGRec-music). Number of items: 8,640. Number of users: 5,199. Number of item-user interactions: 751,531. All the data comes from the songfacts.com and last.fm websites. Items are songs, described by a textual description extracted from songfacts.com and by tags from last.fm. Files and folders in the dataset:

    • /descriptions: one file per item with the textual description of the item. The file name is the item id plus the ".txt" extension.
    • /tags: one file per item with the item's tags separated by spaces; multiword tags are joined with "-". The file name is the item id plus the ".txt" extension. Not all items have tags: 401 items have none.
    • implicit_lf_dataset.txt: the interactions between users and items, one line per interaction, with tab-separated fields: user_id \t sound_id \t 1 \n

    Sound Recommendation Dataset (KGRec-sound). Number of items: 21,552. Number of users: 20,000. Number of item-user interactions: 2,117,698. All the data comes from Freesound.org. Items are sounds, described by the textual description and tags that the sound's creator entered at upload time. Files and folders in the dataset:

    • /descriptions: one file per item with the textual description of the item. The file name is the item id plus the ".txt" extension.
    • /tags: one file per item with the item's tags separated by spaces. The file name is the item id plus the ".txt" extension.
    • downloads_fs_dataset.txt: the interactions between users and items, one line per interaction (a user who downloaded a sound), with tab-separated fields: user_id \t sound_id \t 1 \n
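    Both interaction files use the same tab-separated user_id, item id, 1 layout, so one loader covers either. A minimal sketch that collects each user's items into a set (the constant third column is ignored) and decodes KGRec-music tag files; turning "-" back into spaces is an assumption that would also alter any tag containing a genuine hyphen.

```python
from collections import defaultdict


def load_interactions(lines):
    """Build {user_id: set(item_ids)} from tab-separated interaction lines."""
    user_items = defaultdict(set)
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) < 2:  # skip blank or malformed lines
            continue
        user_items[fields[0]].add(fields[1])
    return dict(user_items)


def parse_tags(text):
    """Split a KGRec-music tag file on whitespace and restore multiword tags."""
    return [tag.replace("-", " ") for tag in text.split()]


# Usage:
# with open("implicit_lf_dataset.txt", encoding="utf-8") as f:
#     interactions = load_interactions(f)
```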

  10. last.fm Music Artist Scrobbles

    • kaggle.com
    Updated Jun 15, 2020
    Cite
    Paulo Breviglieri (2020). last.fm Music Artist Scrobbles [Dataset]. https://www.kaggle.com/pcbreviglieri/lastfm-music-artist-scrobbles/discussion
    Dataset updated
    Jun 15, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Paulo Breviglieri
    Description

    This dataset is a summarized, sanitized subset of the one released at The 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011), currently hosted on the GroupLens website.

    Sanitization included: (a) correction and standardization of misspelled artist names; (b) reassignment of artists referenced with two or more artist IDs; (c) removal of artists listed as 'unknown' or by their website addresses.
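    Steps (b) and (c) are mechanical enough to sketch. The code below is a hypothetical illustration, not the author's actual procedure: it keeps the first id seen for each case- and whitespace-normalized artist name as the canonical id and drops entries named 'unknown' or that look like web addresses. Step (a), spelling correction, needs manual or fuzzy matching and is not covered.

```python
def canonical_artist_ids(artists):
    """Map every kept artist id to one canonical id per normalized name.

    `artists` is an iterable of (artist_id, name) pairs. Names are compared
    case-insensitively after collapsing whitespace; the first id seen for a
    name becomes canonical. Dropped entries do not appear in the result.
    """
    canonical_by_name = {}
    mapping = {}
    for artist_id, name in artists:
        key = " ".join(name.split()).casefold()
        if key == "unknown" or key.startswith(("http://", "https://", "www.")):
            continue
        mapping[artist_id] = canonical_by_name.setdefault(key, artist_id)
    return mapping
```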

    The original dataset contains a larger number of files, including tag-related information, in addition to users, artists and scrobble counts. The author contacted last.fm to request a more recent version of this content in a similar format, but had received no response as of June 15, 2020.

  11. #nowplaying-rs

    • kaggle.com
    • explore.openaire.eu
    • +2 more
    zip
    Updated May 11, 2020
    Cite
    Chelsea Power (2020). #nowplaying-rs [Dataset]. https://www.kaggle.com/chelseapower/nowplayingrs
    Available download formats
    zip (1,324,201,132 bytes)
    Dataset updated
    May 11, 2020
    Authors
    Chelsea Power
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    Context

    The nowplaying-RS dataset features context- and content features of listening events. It contains 11.6 million music listening events of 139K users and 346K tracks collected from Twitter. The dataset comes with a rich set of item content features and user context features, as well as timestamps of the listening events. Moreover, some of the user context features imply the cultural origin of the users, and some others - like hashtags - give clues to the emotional state of a user underlying a listening event.

    Content

    user_track_hashtag_timestamp.csv contains basic information about each listening event. For each listening event, we provide the event id, user_id, track_id, hashtag and created_at.

    context_content_features.csv contains all context and content features. For each listening event, we provide the id of the event, user_id, track_id, artist_id, content features regarding the track mentioned in the event (instrumentalness, liveness, speechiness, danceability, valence, loudness, tempo, acousticness, energy, mode, key) and context features regarding the listening event (coordinates (as geoJSON), place (as geoJSON), geo (as geoJSON), tweet_language, created_at, user_lang, time_zone, entities contained in the tweet).

    sentiment_values.csv contains sentiment information for hashtags. It contains the hashtag itself and the sentiment values gathered via four different sentiment dictionaries: AFINN, Opinion Lexicon, SentiStrength Lexicon and VADER. For each of these dictionaries we list the minimum, maximum, sum and average of all sentiments of the tokens of the hashtag (if available, else we list empty values). However, as most hashtags consist of a single token, these values are equal in most cases. Please note that the lexica are rather diverse and therefore assign scores to very different sets of terms. Hence, the resulting CSV is rather sparse.
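    Because user_track_hashtag_timestamp.csv and context_content_features.csv both carry the listening-event id, the two files can be joined on it. A minimal sketch with Python's csv module, assuming the header rows use the names listed above (id, hashtag, danceability, valence, tempo, and so on); the real headers may differ, so check them first. Empty feature cells become None.

```python
import csv


def read_rows(lines):
    """Parse CSV text (an open file or any iterable of lines) into dicts keyed by the header."""
    return list(csv.DictReader(lines))


def join_on_event_id(hashtag_rows, feature_rows, fields=("danceability", "valence", "tempo")):
    """Attach selected numeric content features to every hashtag row of the same event.

    If an event id appears in several hashtag rows, each gets the same features.
    Events missing from the features file are skipped.
    """
    features_by_id = {row["id"]: row for row in feature_rows}
    joined = []
    for event in hashtag_rows:
        features = features_by_id.get(event["id"])
        if features is None:
            continue
        merged = dict(event)
        for name in fields:
            value = features.get(name)
            merged[name] = float(value) if value not in (None, "") else None
        joined.append(merged)
    return joined
```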

    Acknowledgements

    @inproceedings{smc18, title = {#nowplaying-RS: A New Benchmark Dataset for Building Context-Aware Music Recommender Systems}, author = {Asmita Poddar and Eva Zangerle and Yi-Hsuan Yang}, url = {http://mac.citi.sinica.edu.tw/~yang/pub/poddar18smc.pdf}, year = {2018}, date = {2018-07-04}, booktitle = {Proceedings of the 15th Sound & Music Computing Conference}, address = {Limassol, Cyprus}, note = {code at https://github.com/asmitapoddar/nowplaying-RS-Music-Reco-FM}, tppubtype = {inproceedings} }

    Inspiration

    Can incorporating mood-related hashtags and timestamps into a neural network that predicts a user's emotional variation from the track they are playing improve next-song recommendation?

  12. Spotify Million Playlist: Recsys Challenge 2018 Dataset

    • zenodo.org
    • explore.openaire.eu
    • +1 more
    Updated Apr 9, 2022
    Cite
    AIcrowd (2022). Spotify Million Playlist: Recsys Challenge 2018 Dataset [Dataset]. http://doi.org/10.5281/zenodo.6425593
    Dataset updated
    Apr 9, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    AIcrowd
    Description

    Spotify Million Playlist Dataset Challenge

    Summary

    The Spotify Million Playlist Dataset Challenge consists of a dataset and evaluation to enable research in music recommendations. It is a continuation of the RecSys Challenge 2018, which ran from January to July 2018. The dataset contains 1,000,000 playlists, including playlist titles and track titles, created by users on the Spotify platform between January 2010 and October 2017. The evaluation task is automatic playlist continuation: given a seed playlist title and/or initial set of tracks in a playlist, to predict the subsequent tracks in that playlist. This is an open-ended challenge intended to encourage research in music recommendations, and no prizes will be awarded (other than bragging rights).

    Background

    Playlists like Today’s Top Hits and RapCaviar have millions of loyal followers, while Discover Weekly and Daily Mix are just a couple of our personalized playlists made especially to match your unique musical tastes.

    Our users love playlists too. In fact, the Digital Music Alliance, in its 2018 Annual Music Report, states that 54% of consumers say that playlists are replacing albums in their listening habits.

    But our users don’t love just listening to playlists, they also love creating them. To date, over 4 billion playlists have been created and shared by Spotify users. People create playlists for all sorts of reasons: some playlists group together music categorically (e.g., by genre, artist, year, or city), by mood, theme, or occasion (e.g., romantic, sad, holiday), or for a particular purpose (e.g., focus, workout). Some playlists are even made to land a dream job, or to send a message to someone special.

    The other thing we love here at Spotify is playlist research. By learning from the playlists that people create, we can learn all sorts of things about the deep relationship between people and music. Why do certain songs go together? What is the difference between “Beach Vibes” and “Forest Vibes”? And what words do people use to describe which playlists?

    By learning more about the nature of playlists, we may also be able to suggest other tracks that a listener would enjoy in the context of a given playlist. This can make playlist creation easier, and ultimately help people find more of the music they love.

    Dataset

    To enable this type of research at scale, in 2018 we sponsored the RecSys Challenge 2018, which introduced the Million Playlist Dataset (MPD) to the research community. Sampled from the over 4 billion public playlists on Spotify, this dataset of 1 million playlists consists of over 2 million unique tracks by nearly 300,000 artists, and represents the largest public dataset of music playlists in the world. The dataset includes public playlists created by US Spotify users between January 2010 and November 2017. The challenge ran from January to July 2018, and received 1,467 submissions from 410 teams. A summary of the challenge and the top scoring submissions was published in the ACM Transactions on Intelligent Systems and Technology.

    In September 2020, we re-released the dataset as an open-ended challenge on AIcrowd.com. The dataset can now be downloaded by registered participants from the Resources page.

    Each playlist in the MPD contains a playlist title, the track list (including track IDs and metadata), and other metadata fields (last edit time, number of playlist edits, and more). All data is anonymized to protect user privacy. Playlists are sampled with some randomization, are manually filtered for playlist quality and to remove offensive content, and have some dithering and fictitious tracks added to them. As such, the dataset is not representative of the true distribution of playlists on the Spotify platform, and must not be interpreted as such in any research or analysis performed on the dataset.

    Dataset Contains

    1000 examples of each scenario:

    • Title only (no tracks)
    • Title and first track
    • Title and first 5 tracks
    • First 5 tracks only
    • Title and first 10 tracks
    • First 10 tracks only
    • Title and first 25 tracks
    • Title and 25 random tracks
    • Title and first 100 tracks
    • Title and 100 random tracks
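    Each scenario in this list can be derived from a complete playlist by keeping either the first k tracks or k randomly chosen tracks (with or without the title) as the seed and holding out the rest as prediction targets. A minimal sketch, assuming a playlist is a dict with a name and an ordered tracks list; the real MPD JSON nests more metadata per track, and this is not the official challenge-set generator.

```python
import random


def make_seed(playlist, n_tracks, with_title=True, shuffle=False, rng=None):
    """Build one challenge-style scenario from a full playlist.

    playlist: dict with 'name' and 'tracks' (ordered track identifiers); both
    keys are assumptions. Returns (seed, held_out). With shuffle=True the seed
    tracks are sampled at random instead of taken from the start.
    """
    tracks = list(playlist["tracks"])
    if shuffle:
        rng = rng or random.Random(0)  # fixed seed for reproducible scenarios
        chosen = set(rng.sample(range(len(tracks)), n_tracks))
        seed_tracks = [t for i, t in enumerate(tracks) if i in chosen]
        held_out = [t for i, t in enumerate(tracks) if i not in chosen]
    else:
        seed_tracks, held_out = tracks[:n_tracks], tracks[n_tracks:]
    seed = {"tracks": seed_tracks}
    if with_title:
        seed["name"] = playlist["name"]
    return seed, held_out
```

    For example, make_seed(playlist, 25, shuffle=True) corresponds to "Title and 25 random tracks", and make_seed(playlist, 5, with_title=False) to "First 5 tracks only".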

    Download Link

    Full Details: https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge
    Download Link: https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge/dataset_files

  13. Dataset: Music Industry Professionals' Perspectives on Music Streaming...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 11, 2023
    Cite
    Christine Bauer (2023). Dataset: Music Industry Professionals' Perspectives on Music Streaming Services and Recommendation [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8185735
    Dataset updated
    Aug 11, 2023
    Dataset provided by
    Karlijn Dinnissen
    Marloes Vredenborg
    Christine Bauer
    Isabella Saccardi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Questionnaire response data set

    Here, we include the data retrieved from participants at Eurosonic Noorderslag 2023, as described in the paper cited below. When using, analyzing, or publishing this data in any way, please make sure to attribute it to the authors and cite it accordingly.

    We include the data in .xlsx, .csv (semicolon-separated) and .tsv (tab-separated) formats. We suggest using the Excel file, as its layout is easier to read.
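    Because the .csv export is semicolon-separated rather than comma-separated, a CSV reader left on its default delimiter will misparse it. A minimal sketch with Python's csv module that picks the delimiter from the file extension; the UTF-8 encoding (with an optional byte-order mark, hence utf-8-sig) is an assumption.

```python
import csv
from pathlib import Path

# Delimiters as stated for this dataset: .csv is semicolon-separated, .tsv tab-separated.
DELIMITERS = {".csv": ";", ".tsv": "\t"}


def parse_responses(lines, suffix):
    """Parse questionnaire rows from an iterable of text lines into dicts."""
    return list(csv.DictReader(lines, delimiter=DELIMITERS[suffix.lower()]))


def read_responses(path):
    """Open a .csv or .tsv export and return one dict per respondent."""
    path = Path(path)
    with path.open(newline="", encoding="utf-8-sig") as handle:
        return parse_responses(handle, path.suffix)
```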

    The complete question list as used in the questionnaire is published separately on https://doi.org/10.5281/zenodo.8121151.

    Paper title: Looking at the FAccTs: Exploring Music Industry Professionals’ Perspectives on Music Streaming Services and Recommendations

    Paper abstract: Music recommender systems, commonly integrated into streaming services, help listeners find music. Previous research on such systems has focused on providing the best possible recommendations for these services' consumers, as well as on fairness for artists who release their music on streaming services. While those insights are imperative, another group of stakeholders has been omitted so far: the many other professionals working in the music industry. They, too, are (in)directly affected by music streaming services. Therefore, this work explores the perspective of music industry professionals. We present a study that addresses the role of streaming services and recommender systems in their jobs. Results indicate this role is significant. Furthermore, participants feel that music recommender systems lack transparency and are insufficiently controllable, for both customers and artists. Finally, participants desire that music streaming services take charge of increasing recommendation diversity, and variety in consumers' listening behavior and taste.

    Citation: Karlijn Dinnissen, Isabella Saccardi, Marloes Vredenborg, and Christine Bauer. 2023. Looking at the FAccTs: Exploring Music Industry Professionals’ Perspectives on Music Streaming Services and Recommendations. In 2nd International Conference of the ACM Greek SIGCHI Chapter (CHIGREECE 2023), September 27–28, 2023, Athens, Greece. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3609987.3610011

  14. music-recommender-data

    • huggingface.co
    Updated Jun 1, 2025
    TezBytes (2025). music-recommender-data [Dataset]. https://huggingface.co/datasets/tezbytes/music-recommender-data
    Dataset updated
    Jun 1, 2025
    Dataset authored and provided by
    TezBytes
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    tezbytes/music-recommender-data dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. #nowplaying

    • zenodo.org
    • explore.openaire.eu
    application/gzip
    Updated Jan 24, 2020
    Eva Zangerle (2020). #nowplaying [Dataset]. http://doi.org/10.5281/zenodo.2594483
    Available download formats: application/gzip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Eva Zangerle
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains a dump of the #nowplaying dataset, which consists of so-called listening events of users who publish the music they are currently listening to on Twitter. In particular, this dataset includes tracks that have been tweeted using the hashtags #nowplaying, #listento, or #listeningto. For each listening event, we provide the track and artist as well as metadata on the tweet (date sent, user, source). Furthermore, we provide a mapping of tracks to their respective MusicBrainz identifiers. The dataset features a total of 126 million listening events.

    This archive contains nowplaying.csv, the main file, with the following fields:

    • user id (each user is identified by a unique hash value)
    • source of the tweet (how it was sent; as provided by the Twitter API)
    • timestamp of the time the tweet underlying the listening event was sent
    • track title
    • artist name
    • musicbrainz identifier of the recording (cf. https://musicbrainz.org/)
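    These fields are enough to group the dump into per-user listening histories ordered by time. The following is a minimal sketch under loudly stated assumptions: the column names are hypothetical labels for the fields listed above, and the comma delimiter, missing header row, and ISO 8601 timestamps are guesses that are not confirmed by the description, so check the actual file before relying on them.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical column names, in the order of the field list above; the real
# file may use a different delimiter, a header row, or another timestamp format.
FIELDS = ["user_id", "source", "timestamp", "track", "artist", "mbid"]


def user_histories(text: str) -> dict[str, list[tuple[str, str]]]:
    """Group listening events by user, ordered by tweet timestamp."""
    events: dict[str, list[tuple[datetime, str, str]]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text), fieldnames=FIELDS):
        sent = datetime.fromisoformat(row["timestamp"])
        events[row["user_id"]].append((sent, row["artist"], row["track"]))
    # Sort each user's events chronologically, then drop the timestamps.
    return {
        user: [(artist, track) for _, artist, track in sorted(evts)]
        for user, evts in events.items()
    }


# Hypothetical sample rows (user hashes, tracks, and sources are made up).
sample = (
    "u1,web,2014-01-02T10:00:00,Song B,Artist Y,\n"
    "u1,web,2014-01-01T09:00:00,Song A,Artist X,\n"
    "u2,mobile,2014-01-01T12:00:00,Song C,Artist Z,\n"
)
histories = user_histories(sample)
print(histories)
```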

    In case you make use of our dataset in a scientific setting, we kindly ask you to cite the following paper:


    Eva Zangerle, Martin Pichl, Wolfgang Gassler, and Günther Specht. 2014. #nowplaying Music Dataset: Extracting Listening Behavior from Twitter. In Proceedings of the First International Workshop on Internet-Scale Multimedia Management (WISMM '14). ACM, New York, NY, USA, 21-26.

    If you have any questions or suggestions regarding the dataset, please do not hesitate to contact Eva Zangerle (eva.zangerle@uibk.ac.at).

  16. Data from: Music Recommendation

    • kaggle.com
    Updated Jun 5, 2024
    Aryan Mahawar (2024). Music Recommendation [Dataset]. https://www.kaggle.com/datasets/aryanmahawar/music-recommendation/code
    Croissant: a format for machine-learning datasets (learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 5, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Aryan Mahawar
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Aryan Mahawar

    Released under Apache 2.0

    Contents

  17. Data from: Beyond the Big Five Personality Traits for Music Recommendation...

    • figshare.com
    xls
    Updated Jan 11, 2023
    Mariusz Kleć; Alicja Wieczorkowska; Włodzimierz Strus; Krzysztof Szklanny (2023). Beyond the Big Five Personality Traits for Music Recommendation Systems - dataset [Dataset]. http://doi.org/10.6084/m9.figshare.19678962.v1
    Available download formats: xls
    Dataset updated
    Jan 11, 2023
    Dataset provided by
    figshare
    Authors
    Mariusz Kleć; Alicja Wieczorkowska; Włodzimierz Strus; Krzysztof Szklanny
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The aim of the paper "Beyond the Big Five Personality Traits for Music Recommendation Systems" is to investigate the influence of personality traits, characterized by the BFI (Big Five Inventory) and its significant revision called BFI-2, on music recommendation error. The BFI-2 describes the lower-order facets of the Big Five personality traits. We performed experiments with 279 participants, using an application (called Music Master) we developed for music listening and ranking, and for collecting personality profiles of the users. Additionally, 29-dimensional vectors of audio features were extracted to describe the music files.

    In our paper, we used this data set to test several hypotheses about the influence of personality traits and audio features on music recommendation error. The experiments showed that every combination of Big Five personality traits produces worse results than using the lower-order personality facets. Additionally, we found a small subset of personality facets that yielded the lowest recommendation error. This finding allows the personality questionnaire to be condensed to only the most essential questions.

    The Excel file contains 5278 entries created for 279 participants. Each entry includes the participant's music-listening preferences, expressed on a 5-point Likert scale: those referring to the cognitive aspect of listening to music are denoted Q1, and those referring to the motivational and interpersonal aspects are denoted Q2 and Q3, respectively. The following 20 variables (columns) contain the 20-dimensional extended Big Five personality trait values. The last 29 columns contain the values of low-level audio features, including emotions extracted from the audio files. The Excel file can be saved as CSV and loaded with a suitable programming language (e.g., Python, R, Java, or MATLAB) for further processing, for example to create user-item matrices for collaborative filtering and to evaluate its performance with the new rating types (motivational and interpersonal) proposed in the article.
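    The user-item matrix mentioned above can be sketched as follows. This is a minimal illustration, assuming each entry has already been reduced to a hypothetical (participant, song, Q1, Q2, Q3) tuple; the actual Excel column names and the song identifier column are not given in this description. Selecting Q2 or Q3 instead of Q1 switches to the motivational or interpersonal rating type, and unrated cells are left as None.

```python
from typing import Optional

# Hypothetical row layout: (participant_id, song_id, q1, q2, q3).
# The real Excel columns must be mapped onto this layout before use.
Row = tuple[str, str, int, int, int]
RATING_INDEX = {"Q1": 2, "Q2": 3, "Q3": 4}


def user_item_matrix(rows: list[Row], rating: str = "Q1"):
    """Build a dense user x item matrix for one rating type; None = unrated."""
    col = RATING_INDEX[rating]
    users = sorted({r[0] for r in rows})
    items = sorted({r[1] for r in rows})
    u_idx = {u: i for i, u in enumerate(users)}
    i_idx = {s: j for j, s in enumerate(items)}
    matrix: list[list[Optional[int]]] = [[None] * len(items) for _ in users]
    for r in rows:
        matrix[u_idx[r[0]]][i_idx[r[1]]] = r[col]
    return users, items, matrix


# Hypothetical ratings for two participants and two songs.
rows = [("p1", "s1", 5, 4, 2), ("p1", "s2", 1, 2, 3), ("p2", "s2", 4, 4, 5)]
users, items, m = user_item_matrix(rows, rating="Q2")
print(m)  # [[4, 2], [None, 4]]
```

    A dense list of lists keeps the sketch dependency-free; for all 279 participants a sparse representation would usually be preferable.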

    Use of the data set requires citing the paper.

  18. Music Recomendation Systems

    • figshare.com
    txt
    Updated Jul 8, 2020
    Adriana Huante (2020). Music Recomendation Systems [Dataset]. http://doi.org/10.6084/m9.figshare.11933157.v1
    Available download formats: txt
    Dataset updated
    Jul 8, 2020
    Dataset provided by
    figshare
    Authors
    Adriana Huante
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Training dataset for a music recommendation system. The last 30 columns represent the labels, where:

    • 1 = liked and saved the song
    • 0.6 = liked but didn't save the song
    • 0 = didn't like the song
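    A minimal sketch of separating the feature columns from the 30 trailing label columns and decoding the label values listed above. It assumes, hypothetically, that each line of the .txt file holds comma-separated numbers; the actual delimiter and the meaning of the feature columns are not stated.

```python
# Label values and meanings as stated in the dataset description.
LABELS = {1.0: "liked and saved", 0.6: "liked, not saved", 0.0: "not liked"}
N_LABELS = 30  # the last 30 columns are labels


def parse_line(line: str, sep: str = ",") -> list[float]:
    """Parse one line of numbers; the comma delimiter is an assumption."""
    return [float(x) for x in line.strip().split(sep)]


def split_row(values: list[float]) -> tuple[list[float], list[str]]:
    """Split a row into feature columns and decoded trailing label columns."""
    if len(values) < N_LABELS:
        raise ValueError("row has fewer than 30 columns")
    features, raw = values[:-N_LABELS], values[-N_LABELS:]
    # Round to one decimal so that values such as 0.60 map onto the 0.6 key.
    return features, [LABELS[round(v, 1)] for v in raw]


# Hypothetical row: two feature columns followed by 30 label columns.
line = ",".join(["0.2", "0.5"] + ["1"] * 10 + ["0.6"] * 10 + ["0"] * 10)
features, labels = split_row(parse_line(line))
print(features, labels[0], labels[10], labels[29])
```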

  19. Datasets from the RecSys 2020 article "Carousel Personalization in Music...

    • data.niaid.nih.gov
    Updated Oct 12, 2020
    Salha, Guillaume (2020). Datasets from the RecSys 2020 article "Carousel Personalization in Music Streaming Apps with Contextual Bandits" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4048677
    Dataset updated
    Oct 12, 2020
    Dataset provided by
    Bontempelli, Théo
    Bendada, Walid
    Salha, Guillaume
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We publicly release the anonymized user_features.csv and playlist_features.csv datasets, from the music streaming platform Deezer, as described in the article "Carousel Personalization in Music Streaming Apps with Contextual Bandits" published in the proceedings of the 14th ACM Conference on Recommender Systems (RecSys 2020). The paper is available here.

    These datasets are used in the GitHub repository deezer/carousel_bandits to reproduce experiments from the article.

    Please cite our paper if you use our code or data in your work.

  20. lastfm-1k

    • huggingface.co
    Matthew, lastfm-1k [Dataset]. https://huggingface.co/datasets/matthewfranglen/lastfm-1k
    Authors
    Matthew
    Description

    Dataset Description

    This dataset is ideal for training a recommendation system that incorporates time and country information.

    Task Summary

    A recommender system, or a recommendation system, is a subclass of information filtering system that provides suggestions for items that are most pertinent to a particular user. Recommender systems are particularly useful when an individual needs to choose an item from a potentially overwhelming number of items that a service may… See the full description on the dataset page: https://huggingface.co/datasets/matthewfranglen/lastfm-1k.
