19 datasets found
  1. MusicCaps

    • huggingface.co
    • kaggle.com
    Updated Jan 27, 2023
    Cite
    Google (2023). MusicCaps [Dataset]. https://huggingface.co/datasets/google/MusicCaps
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 27, 2023
    Dataset authored and provided by
    Google (http://google.com/)
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for MusicCaps

      Dataset Summary
    

    The MusicCaps dataset contains 5,521 music examples, each of which is labeled with an English aspect list and a free text caption written by musicians. An aspect list is for example "pop, tinny wide hi hats, mellow piano melody, high pitched female vocal melody, sustained pulsating synth lead", while the caption consists of multiple sentences about the music, e.g., "A low sounding male voice is rapping over a fast paced drums… See the full description on the dataset page: https://huggingface.co/datasets/google/MusicCaps.

  2. LP-MusicCaps-MSD

    • huggingface.co
    Updated Aug 1, 2023
    Cite
    seungheon.doh (2023). LP-MusicCaps-MSD [Dataset]. https://huggingface.co/datasets/seungheondoh/LP-MusicCaps-MSD
    Dataset updated
    Aug 1, 2023
    Authors
    seungheon.doh
    Description

    Important: Be careful when using caption_attribute_prediction (we do not recommend using it).

    Dataset Card for LP-MusicCaps-MSD

    Dataset Summary

    LP-MusicCaps is a Large Language Model based Pseudo Music Caption dataset for text-to-music and music-to-text tasks. We construct the music-to-caption pairs with tag-to-caption generation (using three existing multi-label tag datasets and four task… See the full description on the dataset page: https://huggingface.co/datasets/seungheondoh/LP-MusicCaps-MSD.

  3. Musiccaps

    • kaggle.com
    zip
    Updated Mar 4, 2024
    Cite
    Pinal03 (2024). Musiccaps [Dataset]. https://www.kaggle.com/datasets/pinal03/musiccaps/data
    Available download formats: zip (16582342991 bytes)
    Dataset updated
    Mar 4, 2024
    Authors
    Pinal03
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Musiccaps dataset with WAV files, plus checkpoints for the AudioLM-trained SoundStream model and Semantic Transformer model.

  4. MusicCaps

    • huggingface.co
    Updated Sep 15, 2025
    Cite
    Joseph (2025). MusicCaps [Dataset]. https://huggingface.co/datasets/EvanSirius/MusicCaps
    Dataset updated
    Sep 15, 2025
    Authors
    Joseph
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    EvanSirius/MusicCaps dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. MusicCaps-Spectrogram

    • kaggle.com
    zip
    Updated Apr 6, 2024
    Cite
    Brijesh Giri (2024). MusicCaps-Spectrogram [Dataset]. https://www.kaggle.com/datasets/mischiefhat/musiccaps-spectrogram/suggestions?status=pending
    Available download formats: zip (2103831170 bytes)
    Dataset updated
    Apr 6, 2024
    Authors
    Brijesh Giri
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Brijesh Giri

    Released under Apache 2.0


  6. Final MusicCaps Wavs

    • kaggle.com
    zip
    Updated Oct 29, 2025
    Cite
    Kianoosh Vadaei (2025). Final MusicCaps Wavs [Dataset]. https://www.kaggle.com/kianooshvadaei/final-musiccaps-wavs
    Available download formats: zip (15154930554 bytes)
    Dataset updated
    Oct 29, 2025
    Authors
    Kianoosh Vadaei
    Description

    Dataset

    This dataset was created by Kianoosh Vadaei


  7. SunoCaps

    • kaggle.com
    zip
    Updated Jun 21, 2024
    Cite
    Miguel Civit (2024). SunoCaps [Dataset]. https://www.kaggle.com/datasets/miguelcivit/sunocaps/discussion
    Available download formats: zip (602884968 bytes)
    Dataset updated
    Jun 21, 2024
    Authors
    Miguel Civit
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Music generated with Suno AI from MusicCaps captions and aspect lists, tagged for prompt alignment and main emotion.

    M. Civit, V. Drai-Zerbib, D. Lizcano, M.J. Escalona, SunoCaps: A Novel Dataset of Text-Prompt Based AI-Generated Music with Emotion Annotations, Data in Brief, 2024, 110743, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2024.110743. (https://www.sciencedirect.com/science/article/pii/S2352340924007078)

    Abstract: The SunoCaps dataset aims to provide an innovative contribution to music data. Expert descriptions of human-made musical pieces, from the widely used MusicCaps dataset, are used as prompts for generating complete songs for this dataset. This Automatic Music Generation is done with the state-of-the-art Suno generator of audio-based music. A subset of 64 pieces from MusicCaps is currently included, with a total of 256 generated entries. This total stems from generating four different variations for each human piece: two versions based on the original caption and two versions based on the original aspect description. As an AI-generated music dataset, SunoCaps also includes expert-based information on prompt alignment, with the main differences between prompt and final generation annotated. Furthermore, annotations describe the main discrete emotions induced by each piece. This dataset can have an array of implementations, such as creating and improving music generation validation tools, training systems for multi-layered architectures and the optimization of music emotion estimation systems.

    Keywords: Data; Automatic Music Generation; Emotion feature; Artificial Intelligence; Prompt alignment; Generative AI

  8. MusicCaps

    • huggingface.co
    Updated Jan 27, 2023
    Cite
    Ziyuan Zhao (2023). MusicCaps [Dataset]. https://huggingface.co/datasets/zhaoziyuan78/MusicCaps
    Dataset updated
    Jan 27, 2023
    Authors
    Ziyuan Zhao
    Description

    zhaoziyuan78/MusicCaps dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. Data from: WikiMuTe: A web-sourced dataset of semantic descriptions for...

    • zenodo.org
    csv
    Updated Apr 17, 2024
    Cite
    Benno Weck; Holger Kirchhoff; Peter Grosche; Xavier Serra (2024). WikiMuTe: A web-sourced dataset of semantic descriptions for music audio [Dataset]. http://doi.org/10.5281/zenodo.10223363
    Available download formats: csv
    Dataset updated
    Apr 17, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Benno Weck; Holger Kirchhoff; Peter Grosche; Xavier Serra
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    This upload contains the supplementary material for our paper presented at the MMM2024 conference.

    Dataset

    The dataset contains rich text descriptions for music audio files collected from Wikipedia articles.

    The audio files are freely accessible and available for download through the URLs provided in the dataset.

    Example

    A few hand-picked, simplified examples from the dataset.

    | file | aspects | sentences |
    |---|---|---|
    | Bongo sound.wav | ['bongoes', 'percussion instrument', 'cumbia', 'drums'] | ['a loop of bongoes playing a cumbia beat at 99 bpm'] |
    | Example of double tracking in a pop-rock song (3 guitar tracks).ogg | ['bass', 'rock', 'guitar music', 'guitar', 'pop', 'drums'] | ['a pop-rock song'] |
    | OriginalDixielandJassBand-JazzMeBlues.ogg | ['jazz standard', 'instrumental', 'jazz music', 'jazz'] | ['Considered to be a jazz standard', 'is an jazz composition'] |
    | Colin Ross - Etherea.ogg | ['chirping birds', 'ambient percussion', 'new-age', 'flute', 'recorder', 'single instrument', 'woodwind'] | ['features a single instrument with delayed echo, as well as ambient percussion and chirping birds', 'a new-age composition for recorder'] |
    | Belau rekid (instrumental).oga | ['instrumental', 'brass band'] | ['an instrumental brass band performance'] |
    | ... | ... | ... |

    Dataset structure

    We provide three variants of the dataset in the data folder.

    All are described in the paper.

    1. all.csv contains all the data we collected, without any filtering.
    2. filtered_sf.csv contains the data obtained using the self-filtering method.
    3. filtered_mc.csv contains the data obtained using the MusicCaps dataset method.

    File structure

    Each CSV file contains the following columns:

    • file: the name of the audio file
    • pageid: the ID of the Wikipedia article where the text was collected from
    • aspects: the short-form (tag) description texts collected from the Wikipedia articles
    • sentences: the long-form (caption) description texts collected from the Wikipedia articles
    • audio_url: the URL of the audio file
    • url: the URL of the Wikipedia article where the text was collected from
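
As a rough illustration of the column layout above, the following sketch reads rows with Python's standard csv module. It assumes, based only on the hand-picked examples shown earlier, that the aspects and sentences columns are stored as Python-style list literals; the real files may serialize them differently. The sample row, its pageid value, and both example.org URLs are hypothetical placeholders, and load_rows is a helper name invented for this sketch.

```python
import ast
import csv
import io

# Hypothetical one-row sample that mirrors the documented columns.
# The pageid and both URLs are placeholders, not real dataset values.
SAMPLE_CSV = '''file,pageid,aspects,sentences,audio_url,url
Bongo sound.wav,1,"['bongoes', 'percussion instrument', 'cumbia', 'drums']","['a loop of bongoes playing a cumbia beat at 99 bpm']",https://example.org/bongo.wav,https://example.org/wiki/Bongo
'''


def load_rows(text):
    """Parse CSV text and decode the list-valued columns.

    Assumption: 'aspects' and 'sentences' hold Python list literals,
    as the simplified examples suggest.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # literal_eval only evaluates literals, so it is safer than eval.
        row["aspects"] = ast.literal_eval(row["aspects"])
        row["sentences"] = ast.literal_eval(row["sentences"])
        rows.append(row)
    return rows


rows = load_rows(SAMPLE_CSV)
print(rows[0]["file"], len(rows[0]["aspects"]), "aspects")
```

To read one of the real variants, the same function could be given the contents of all.csv, filtered_sf.csv, or filtered_mc.csv, provided the list-literal assumption holds for those files.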

    Citation

    If you use this dataset in your research, please cite the following paper:

    @inproceedings{wikimute,
    title = {WikiMuTe: {A} Web-Sourced Dataset of Semantic Descriptions for Music Audio},
    author = {Weck, Benno and Kirchhoff, Holger and Grosche, Peter and Serra, Xavier},
    booktitle = "MultiMedia Modeling",
    year = "2024",
    publisher = "Springer Nature Switzerland",
    address = "Cham",
    pages = "42--56",
    doi = {10.1007/978-3-031-56435-2_4},
    url = {https://doi.org/10.1007/978-3-031-56435-2_4},
    }

    License

    The data is available under the Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license.

    Each entry in the dataset contains a URL linking to the article from which the text data was collected.

  10. MusicCaps-Wavs

    • huggingface.co
    Updated Nov 10, 2025
    Cite
    Kianoosh Vadaei (2025). MusicCaps-Wavs [Dataset]. https://huggingface.co/datasets/Kia-vadaei/MusicCaps-Wavs
    Dataset updated
    Nov 10, 2025
    Authors
    Kianoosh Vadaei
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Kia-vadaei/MusicCaps-Wavs dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. Musiccaps-image-aligned

    • huggingface.co
    Updated Nov 10, 2025
    Cite
    Kianoosh Vadaei (2025). Musiccaps-image-aligned [Dataset]. https://huggingface.co/datasets/Kia-vadaei/Musiccaps-image-aligned
    Dataset updated
    Nov 10, 2025
    Authors
    Kianoosh Vadaei
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Kia-vadaei/Musiccaps-image-aligned dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. MusicCaps_musicdata

    • kaggle.com
    zip
    Updated Nov 16, 2023
    Cite
    recha_wine (2023). MusicCaps_musicdata [Dataset]. https://www.kaggle.com/datasets/rechawine/musiccaps-musicdata
    Available download formats: zip (16606421553 bytes)
    Dataset updated
    Nov 16, 2023
    Authors
    recha_wine
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by recha_wine

    Released under Apache 2.0


  13. LP-MusicCaps-MTT

    • huggingface.co
    Updated Oct 27, 2024
    Cite
    Andrei Blahovici (2024). LP-MusicCaps-MTT [Dataset]. https://huggingface.co/datasets/AndreiBlahovici/LP-MusicCaps-MTT
    Explore at: Croissant
    Dataset updated
    Oct 27, 2024
    Authors
    Andrei Blahovici
    Description

    AndreiBlahovici/LP-MusicCaps-MTT dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. MusicBench

    • huggingface.co
    Updated Nov 16, 2023
    Cite
    Mustango (2023). MusicBench [Dataset]. https://huggingface.co/datasets/Z873bliwf988hj/MusicBench
    Explore at: Croissant
    Dataset updated
    Nov 16, 2023
    Authors
    Mustango
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    MusicBench Dataset

    The MusicBench dataset is a collection of music audio-text pairs designed for text-to-music generation and released along with the Mustango text-to-music model. MusicBench is based on the MusicCaps dataset, which it expands from 5,521 samples to 52,768 training and 400 test samples.

      Dataset Details
    

    MusicBench expands MusicCaps by:

    Including music features of chords, beats, tempo, and key that are extracted from the audio. Describing these music… See the full description on the dataset page: https://huggingface.co/datasets/Z873bliwf988hj/MusicBench.

  15. MuChoMusic dataset

    • dataverse.csuc.cat
    pdf, tsv, txt
    Updated Oct 13, 2025
    Cite
    Benno Weck; Ilaria Manco; Emmanouil Benetos; Elio Quinton; George Fazekas; Dmitry Bogdanov (2025). MuChoMusic dataset [Dataset]. http://doi.org/10.34810/data2642
    Available download formats: pdf (61027), txt (11323), tsv (301553)
    Dataset updated
    Oct 13, 2025
    Dataset provided by
    CORA. Repositori de Dades de Recerca
    Authors
    Benno Weck; Ilaria Manco; Emmanouil Benetos; Elio Quinton; George Fazekas; Dmitry Bogdanov
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models

    MuChoMusic is a benchmark designed to evaluate music understanding in multimodal language models focused on audio. It includes 1,187 multiple-choice questions validated by human annotators, based on 644 music tracks from two publicly available music datasets. These questions cover a wide variety of genres and assess knowledge and reasoning across several musical concepts and their cultural and functional contexts. The benchmark provides a holistic evaluation of five open-source models, revealing challenges such as over-reliance on the language modality and highlighting the need for better multimodal integration.

    Note on Audio Files

    This dataset comes without audio files. The audio files can be downloaded from two datasets: SongDescriberDataset (SDD) and MusicCaps. Please see the code repository for more information on how to download the audio.

    Citation

    If you use this dataset, please cite our paper:

    @inproceedings{weck2024muchomusic,
    title={MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models},
    author={Weck, Benno and Manco, Ilaria and Benetos, Emmanouil and Quinton, Elio and Fazekas, György and Bogdanov, Dmitry},
    booktitle = {Proceedings of the 25th International Society for Music Information Retrieval Conference (ISMIR)},
    year={2024}
    }

    Weck B, Manco I, Benetos E, Quinton E, Fazekas G, Bogdanov D. MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models. In: Kaneshiro B, Mysore G, Nieto O, Donahue C, Huang CZA, Lee JH, McFee B, McCallum M, editors. Proceedings of the 25th International Society for Music Information Retrieval Conference (ISMIR2024); 2024 November 10-14; San Francisco, USA.

  16. lp-music-caps-magnatagatune-3k-musicfm-embedding

    • huggingface.co
    Updated Nov 8, 2024
    Cite
    mulab-mir (2024). lp-music-caps-magnatagatune-3k-musicfm-embedding [Dataset]. https://huggingface.co/datasets/mulab-mir/lp-music-caps-magnatagatune-3k-musicfm-embedding
    Explore at: Croissant
    Dataset updated
    Nov 8, 2024
    Dataset authored and provided by
    mulab-mir
    Description

    mulab-mir/lp-music-caps-magnatagatune-3k-musicfm-embedding dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. music-audio-pseudo-captions

    • huggingface.co
    Updated Aug 19, 2023
    Cite
    seungheon.doh (2023). music-audio-pseudo-captions [Dataset]. https://huggingface.co/datasets/seungheondoh/music-audio-pseudo-captions
    Explore at: Croissant
    Dataset updated
    Aug 19, 2023
    Authors
    seungheon.doh
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for Music-Audio-Pseudo Captions

    Pseudo Music and Audio Captions from LP-MusicCaps, Music Negation/Temporal Ordering, and WavCaps

      Dataset Summary
    

    Unlike other domains, the music and audio domains lack well-written web caption data, and caption annotation is expensive. Therefore, we take the music (LP-MusicCaps, Music Negation/Temporal Ordering) and audio (WavCaps) datasets created with ChatGPT and reorganize them in the form of instructions, input… See the full description on the dataset page: https://huggingface.co/datasets/seungheondoh/music-audio-pseudo-captions.

  18. GraySpectrogram

    • huggingface.co
    Updated Oct 22, 2023
    Cite
    ML (2023). GraySpectrogram [Dataset]. https://huggingface.co/datasets/mickylan2367/GraySpectrogram
    Explore at: Croissant
    Dataset updated
    Oct 22, 2023
    Authors
    ML
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Spectrogram data generated from the audio of Google/Music-Caps.

    About MusicCaps: https://huggingface.co/datasets/google/MusicCaps
    There is also a non-grayscale version, so take a look (⋈◍>◡<◍)。✧♡ (this one)

    Basic information

    sampling_rate: int = 44100
    20-second WAV files -> converted to 1600×800 PNG files
    Per librosa's conventions, the image's vertical axis is (0-10000? Hz) and its horizontal axis is (0-40 seconds)
    For details, see librosa.specshow() -> https://librosa.org/doc/main/auto_examples/plot_display.html

    Usage

    0: Download the dataset

    from datasets import load_dataset
    data = load_dataset("mickylan2367/spectrogram")
    data… See the full description on the dataset page: https://huggingface.co/datasets/mickylan2367/GraySpectrogram.

  19. MusicCaps-ru

    • huggingface.co
    Updated Jun 30, 2023
    Cite
    Dmitry Balobin (2023). MusicCaps-ru [Dataset]. https://huggingface.co/datasets/d0rj/MusicCaps-ru
    Explore at: Croissant
    Dataset updated
    Jun 30, 2023
    Authors
    Dmitry Balobin
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    MusicCaps-ru

    Translated version of google/MusicCaps into Russian.

MusicCaps (google/MusicCaps): 275 scholarly articles cite this dataset (View in Google Scholar)