100+ datasets found
  1. 🎶 Song Dataset: 10,000 Apple Music Tracks

    • kaggle.com
    zip
    Updated Feb 11, 2024
  2. Music Dataset: Lyrics and Metadata from 1950 to 2019

    • data.mendeley.com
    Updated Aug 24, 2020
    + more versions
  3. Spotify Global Music Dataset (2009–2025)

    • kaggle.com
    zip
    Updated Nov 11, 2025
  4. MusicCaps

    • huggingface.co
    • kaggle.com
    Updated Jan 26, 2023
  5. MuMu: Multimodal Music Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Dec 6, 2022
  6. Ludwig Music Dataset (Moods and Subgenres)

    • kaggle.com
    zip
    Updated Apr 16, 2022
  7. free-music-archive-full

    • huggingface.co
    Updated Sep 13, 2024
    + more versions
  8. Music-Instruct

    • huggingface.co
    Updated Sep 15, 2023
  9. MusicBench

    • huggingface.co
    Updated Nov 14, 2023
    + more versions
  10. lastfm Music Recommendation Dataset

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Feb 15, 2022
  11. Music Dataset: Lyrics and Metadata from 1950 to 2019

    • data.mendeley.com
    • narcis.nl
    Updated Oct 23, 2020
  12. Music-Dataset

    • huggingface.co
    Updated Nov 8, 2025
    + more versions
  13. ai-vs-human-music-dataset

    • huggingface.co
    Updated Sep 28, 2025
    + more versions
  14. FMA-music-dataset

    • huggingface.co
    Updated Feb 24, 2026
  15. Song Describer Dataset

    • zenodo.org
    • dataverse.csuc.cat
    • +2 more
    csv, pdf, tsv, txt +1
    Updated Jul 10, 2024
  16. MGD: Music Genre Dataset

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1 more
    Updated May 28, 2021
  17. Spotify Electronic Music Dataset

    • kaggle.com
    zip
    Updated Mar 15, 2022
  18. Data from: MusicOSet: An Enhanced Open Dataset for Music Data Mining

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    bin, zip
    Updated Jun 7, 2021
  19. music-analysis-dataset

    • huggingface.co
    Updated Jun 1, 2025
  20. my-music-dataset

    • huggingface.co
    Updated Mar 21, 2025
Cite
Oramas, Sergio (2022). MuMu: Multimodal Music Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_831188

MuMu: Multimodal Music Dataset

Dataset updated
Dec 6, 2022
Dataset provided by
Universitat Pompeu Fabra
Authors
Oramas, Sergio
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

MuMu is a Multimodal Music dataset with multi-label genre annotations that combines information from the Amazon Reviews dataset and the Million Song Dataset (MSD). The former contains millions of album customer reviews and album metadata gathered from Amazon.com. The latter is a collection of metadata and precomputed audio features for a million songs.

To map the information from both datasets we use MusicBrainz. This process yields a final set of 147,295 songs belonging to 31,471 albums. For the mapped set of albums, there are 447,583 customer reviews from the Amazon dataset. The dataset has been used for multi-label music genre classification experiments in the related publication. In addition to genre annotations, it provides further information about each album, such as average rating, selling rank, similar products, and cover image URL. For every text review it also provides the helpfulness score, the average rating, and a summary of the review.
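Because each album can carry several genre labels at once, a classifier typically needs the annotations as multi-hot vectors. The following is a minimal sketch of that encoding, not the authors' code: the album identifiers, genre names, and the `multi_hot` helper are hypothetical and do not reflect the actual MuMu file layout.

```python
from typing import Dict, List, Tuple


def multi_hot(labels_per_album: Dict[str, List[str]]) -> Tuple[List[str], Dict[str, List[int]]]:
    """Encode per-album genre lists as multi-hot vectors (hypothetical helper)."""
    # Build a sorted vocabulary so that column order is deterministic.
    vocab = sorted({genre for genres in labels_per_album.values() for genre in genres})
    index = {genre: i for i, genre in enumerate(vocab)}

    vectors = {}
    for album_id, genres in labels_per_album.items():
        row = [0] * len(vocab)
        for genre in genres:
            row[index[genre]] = 1  # an album may switch on several columns
        vectors[album_id] = row
    return vocab, vectors


# Illustrative data only; these are not real MuMu records.
albums = {
    "album_a": ["Rock", "Pop"],
    "album_b": ["Jazz"],
    "album_c": ["Pop", "Jazz", "Blues"],
}
vocab, vectors = multi_hot(albums)
print(vocab)
print(vectors["album_a"])
```

Each vector has one column per genre in the vocabulary, so an album with three genres has three ones in its row.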

The mapping between the three datasets (Amazon, MusicBrainz and MSD), genre annotations, metadata, data splits, text reviews and links to images are available here. Images and audio files cannot be released due to copyright issues.
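The three-way mapping can be pictured as a chained lookup: an Amazon album resolves to a MusicBrainz identifier, which in turn resolves to MSD tracks. The sketch below assumes MusicBrainz identifiers act as the shared key. All identifiers, dictionary names, and the `map_albums_to_tracks` function are hypothetical illustrations rather than the actual MuMu schema or the authors' procedure.

```python
from typing import Dict, List


def map_albums_to_tracks(
    amazon_to_mbid: Dict[str, str],
    mbid_to_msd: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Keep only Amazon albums that reach at least one MSD track via MusicBrainz."""
    mapped = {}
    for amazon_id, mbid in amazon_to_mbid.items():
        tracks = mbid_to_msd.get(mbid)
        if tracks:  # albums with no MSD match are dropped from the final set
            mapped[amazon_id] = list(tracks)
    return mapped


# Illustrative identifiers only (assumed, not taken from the dataset).
amazon_to_mbid = {"amz_1": "mb_1", "amz_2": "mb_2", "amz_3": "mb_3"}
mbid_to_msd = {"mb_1": ["TR_A", "TR_B"], "mb_2": ["TR_C"]}

mapped = map_albums_to_tracks(amazon_to_mbid, mbid_to_msd)
n_albums = len(mapped)
n_songs = sum(len(tracks) for tracks in mapped.values())
print(n_albums, n_songs)
```

Counting the surviving albums and their tracks in this way is the kind of tally that produces album and song totals like those reported for the dataset.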

MuMu dataset (mapping, metadata, annotations and text reviews)

Data splits and multimodal feature embeddings for ISMIR multi-label classification experiments

These data can be used together with the Tartarus deep learning library https://github.com/sergiooramas/tartarus.

NOTE: This version provides simplified files with metadata and splits.

Scientific References

Please cite the following papers if you use the MuMu dataset or the Tartarus library.

Oramas, S., Barbieri, F., Nieto, O., & Serra, X. (2018). Multimodal Deep Learning for Music Genre Classification. Transactions of the International Society for Music Information Retrieval, V(1).

Oramas, S., Nieto, O., Barbieri, F., & Serra, X. (2017). Multi-label Music Genre Classification from Audio, Text and Images Using Deep Features. In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR 2017). https://arxiv.org/abs/1707.04916
