19 datasets found

MusicCaps
huggingface.co
kaggle.com
Updated Jan 27, 2023
Cite
Google (2023). MusicCaps [Dataset]. https://huggingface.co/datasets/google/MusicCaps
Dataset updated
Jan 27, 2023
Dataset authored and provided by
Google (http://google.com/)
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset Card for MusicCaps

Dataset Summary

The MusicCaps dataset contains 5,521 music examples, each of which is labeled with an English aspect list and a free text caption written by musicians. An aspect list is for example "pop, tinny wide hi hats, mellow piano melody, high pitched female vocal melody, sustained pulsating synth lead", while the caption consists of multiple sentences about the music, e.g., "A low sounding male voice is rapping over a fast paced drums… See the full description on the dataset page: https://huggingface.co/datasets/google/MusicCaps.
LP-MusicCaps-MSD
huggingface.co
Updated Aug 1, 2023
Cite
seungheon.doh (2023). LP-MusicCaps-MSD [Dataset]. https://huggingface.co/datasets/seungheondoh/LP-MusicCaps-MSD
Dataset updated
Aug 1, 2023
Authors
seungheon.doh
Description
Important: be careful when using caption_attribute_prediction (we do not recommend using it).

Dataset Card for LP-MusicCaps-MSD

Dataset Summary

LP-MusicCaps is a Large Language Model based Pseudo Music Caption dataset for text-to-music and music-to-text tasks. We construct the music-to-caption pairs with tag-to-caption generation (using three existing multi-label tag datasets and four task… See the full description on the dataset page: https://huggingface.co/datasets/seungheondoh/LP-MusicCaps-MSD.
Musiccaps
kaggle.com
zip
Updated Mar 4, 2024
Cite
Pinal03 (2024). Musiccaps [Dataset]. https://www.kaggle.com/datasets/pinal03/musiccaps/data
Available download formats: zip (16582342991 bytes)
Dataset updated
Mar 4, 2024
Authors
Pinal03
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
MusicCaps dataset with WAV files, along with AudioLM-trained SoundStream model and Semantic Transformer model checkpoints.
MusicCaps
huggingface.co
Updated Sep 15, 2025
Cite
Joseph (2025). MusicCaps [Dataset]. https://huggingface.co/datasets/EvanSirius/MusicCaps
Dataset updated
Sep 15, 2025
Authors
Joseph
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
EvanSirius/MusicCaps dataset hosted on Hugging Face and contributed by the HF Datasets community
MusicCaps-Spectrogram
kaggle.com
zip
Updated Apr 6, 2024
Cite
Brijesh Giri (2024). MusicCaps-Spectrogram [Dataset]. https://www.kaggle.com/datasets/mischiefhat/musiccaps-spectrogram/suggestions?status=pending
Available download formats: zip (2103831170 bytes)
Dataset updated
Apr 6, 2024
Authors
Brijesh Giri
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset

This dataset was created by Brijesh Giri

Released under Apache 2.0

Final MusicCaps Wavs
kaggle.com
zip
Updated Oct 29, 2025
Cite
Kianoosh Vadaei (2025). Final MusicCaps Wavs [Dataset]. https://www.kaggle.com/kianooshvadaei/final-musiccaps-wavs
Available download formats: zip (15154930554 bytes)
Dataset updated
Oct 29, 2025
Authors
Kianoosh Vadaei
Description
Dataset

This dataset was created by Kianoosh Vadaei

SunoCaps
kaggle.com
zip
Updated Jun 21, 2024
Cite
Miguel Civit (2024). SunoCaps [Dataset]. https://www.kaggle.com/datasets/miguelcivit/sunocaps/discussion
Available download formats: zip (602884968 bytes)
Dataset updated
Jun 21, 2024
Authors
Miguel Civit
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Music generated with Suno AI from MusicCaps captions and aspect lists, tagged for prompt alignment and main emotion.

M. Civit, V. Drai-Zerbib, D. Lizcano, M.J. Escalona, SunoCaps: A Novel Dataset of Text-Prompt Based AI-Generated Music with Emotion Annotations, Data in Brief, 2024, 110743, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2024.110743. (https://www.sciencedirect.com/science/article/pii/S2352340924007078)

Abstract: The SunoCaps dataset aims to provide an innovative contribution to music data. Expert descriptions of human-made musical pieces, from the widely used MusicCaps dataset, are used as prompts for generating complete songs for this dataset. This Automatic Music Generation is done with the state-of-the-art Suno generator of audio-based music. A subset of 64 pieces from MusicCaps is currently included, with a total of 256 generated entries. This total stems from generating four different variations for each human piece: two versions based on the original caption and two versions based on the original aspect description. As an AI-generated music dataset, SunoCaps also includes expert-based information on prompt alignment, with the main differences between prompt and final generation annotated. Furthermore, it includes annotations describing the main discrete emotions induced by each piece. This dataset can have an array of implementations, such as creating and improving music generation validation tools, training systems for multi-layered architectures and the optimization of music emotion estimation systems.

Keywords: Data; Automatic Music Generation; Emotion feature; Artificial Intelligence; Prompt alignment; Generative AI
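
The entry count quoted in the abstract can be checked with a short enumeration. This is only an illustrative sketch: the labels "caption" and "aspect_list" and the tuple layout are assumptions made for the example, not the field names used in the SunoCaps files.

```python
from itertools import product

# Figures taken from the abstract: 64 source pieces, two prompt types,
# two generated versions per prompt type.
N_SOURCE_PIECES = 64
PROMPT_SOURCES = ("caption", "aspect_list")  # illustrative labels only
VERSIONS_PER_PROMPT = 2

# One tuple per generated entry: (piece index, prompt type, version index).
entries = list(
    product(range(N_SOURCE_PIECES), PROMPT_SOURCES, range(VERSIONS_PER_PROMPT))
)

variations_per_piece = len(PROMPT_SOURCES) * VERSIONS_PER_PROMPT
print(variations_per_piece, len(entries))
```

The product of the three factors gives the four variations per piece and the overall total described above.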
MusicCaps
huggingface.co
Updated Jan 27, 2023
Cite
Ziyuan Zhao (2023). MusicCaps [Dataset]. https://huggingface.co/datasets/zhaoziyuan78/MusicCaps
Dataset updated
Jan 27, 2023
Authors
Ziyuan Zhao
Description
zhaoziyuan78/MusicCaps dataset hosted on Hugging Face and contributed by the HF Datasets community

Data from: WikiMuTe: A web-sourced dataset of semantic descriptions for...

zenodo.org

csv

Updated Apr 17, 2024

Cite

Benno Weck; Holger Kirchhoff; Peter Grosche; Xavier Serra (2024). WikiMuTe: A web-sourced dataset of semantic descriptions for music audio [Dataset]. http://doi.org/10.5281/zenodo.10223363

Available download formats: csv

Unique identifier

https://doi.org/10.5281/zenodo.10223363

Dataset updated

Apr 17, 2024

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Benno Weck; Holger Kirchhoff; Peter Grosche; Xavier Serra

License

Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically

Description

This upload contains the supplementary material for our paper presented at the MMM2024 conference.

Dataset

The dataset contains rich text descriptions for music audio files collected from Wikipedia articles.

The audio files are freely accessible and available for download through the URLs provided in the dataset.

Example

A few hand-picked, simplified examples of the dataset.

file	aspects	sentences
🔈 Bongo sound.wav	['bongoes', 'percussion instrument', 'cumbia', 'drums']	['a loop of bongoes playing a cumbia beat at 99 bpm']
🔈 Example of double tracking in a pop-rock song (3 guitar tracks).ogg	['bass', 'rock', 'guitar music', 'guitar', 'pop', 'drums']	['a pop-rock song']
🔈 OriginalDixielandJassBand-JazzMeBlues.ogg	['jazz standard', 'instrumental', 'jazz music', 'jazz']	['Considered to be a jazz standard', 'is an jazz composition']
🔈 Colin Ross - Etherea.ogg	['chirping birds', 'ambient percussion', 'new-age', 'flute', 'recorder', 'single instrument', 'woodwind']	['features a single instrument with delayed echo, as well as ambient percussion and chirping birds', 'a new-age composition for recorder']
🔈 Belau rekid (instrumental).oga	['instrumental', 'brass band']	['an instrumental brass band performance']
...	...	...

Dataset structure

We provide three variants of the dataset in the data folder.

All are described in the paper.

all.csv contains all the data we collected, without any filtering.
filtered_sf.csv contains the data obtained using the self-filtering method.
filtered_mc.csv contains the data obtained using the MusicCaps dataset method.

File structure

Each CSV file contains the following columns:

file: the name of the audio file
pageid: the ID of the Wikipedia article where the text was collected from
aspects: the short-form (tag) description texts collected from the Wikipedia articles
sentences: the long-form (caption) description texts collected from the Wikipedia articles
audio_url: the URL of the audio file
url: the URL of the Wikipedia article where the text was collected from
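
As a rough illustration of how these columns might be consumed, the sketch below reads rows with Python's csv module and parses the aspects and sentences cells into lists. It assumes, based on the example table above, that those cells are stored as Python-style list literals. The sample rows, the pageid values, and the example.org URLs are made up for the demonstration, and read_wikimute_rows is a hypothetical helper name.

```python
import ast
import csv
import io

# Hypothetical two-row sample that mimics the documented columns.
# The pageid values and URLs are placeholders, not real dataset values.
SAMPLE_CSV = '''file,pageid,aspects,sentences,audio_url,url
Bongo sound.wav,1,"['bongoes', 'percussion instrument', 'cumbia', 'drums']","['a loop of bongoes playing a cumbia beat at 99 bpm']",https://example.org/bongo.wav,https://example.org/wiki/Bongo
Belau rekid (instrumental).oga,2,"['instrumental', 'brass band']","['an instrumental brass band performance']",https://example.org/belau.oga,https://example.org/wiki/Belau_rekid
'''

LIST_COLUMNS = ("aspects", "sentences")


def read_wikimute_rows(handle):
    """Yield one dict per CSV row, with the list-like columns parsed."""
    for row in csv.DictReader(handle):
        for column in LIST_COLUMNS:
            # Assumption: each cell holds a Python-style list literal.
            row[column] = ast.literal_eval(row[column])
        yield row


rows = list(read_wikimute_rows(io.StringIO(SAMPLE_CSV)))
for row in rows:
    print(row["file"], row["aspects"], row["sentences"])
```

For a real file, the same helper could be called on a handle such as open("data/filtered_sf.csv", newline="", encoding="utf-8"); the exact path is an assumption about how the archive unpacks.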

Citation

If you use this dataset in your research, please cite the following paper:

@inproceedings{wikimute,
  title = {WikiMuTe: {A} Web-Sourced Dataset of Semantic Descriptions for Music Audio},
  author = {Weck, Benno and Kirchhoff, Holger and Grosche, Peter and Serra, Xavier},
  booktitle = "MultiMedia Modeling",
  year = "2024",
  publisher = "Springer Nature Switzerland",
  address = "Cham",
  pages = "42--56",
  doi = {10.1007/978-3-031-56435-2_4},
  url = {https://doi.org/10.1007/978-3-031-56435-2_4},
}

License

The data is available under the Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license.

Each entry in the dataset contains a URL linking to the article, where the text data was collected from.

MusicCaps-Wavs
huggingface.co
Updated Nov 10, 2025
Cite
Kianoosh Vadaei (2025). MusicCaps-Wavs [Dataset]. https://huggingface.co/datasets/Kia-vadaei/MusicCaps-Wavs
Dataset updated
Nov 10, 2025
Authors
Kianoosh Vadaei
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Kia-vadaei/MusicCaps-Wavs dataset hosted on Hugging Face and contributed by the HF Datasets community
Musiccaps-image-aligned
huggingface.co
Updated Nov 10, 2025
Cite
Kianoosh Vadaei (2025). Musiccaps-image-aligned [Dataset]. https://huggingface.co/datasets/Kia-vadaei/Musiccaps-image-aligned
Dataset updated
Nov 10, 2025
Authors
Kianoosh Vadaei
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Kia-vadaei/Musiccaps-image-aligned dataset hosted on Hugging Face and contributed by the HF Datasets community
MusicCaps_musicdata
kaggle.com
zip
Updated Nov 16, 2023
Cite
recha_wine (2023). MusicCaps_musicdata [Dataset]. https://www.kaggle.com/datasets/rechawine/musiccaps-musicdata
Available download formats: zip (16606421553 bytes)
Dataset updated
Nov 16, 2023
Authors
recha_wine
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset

This dataset was created by recha_wine

Released under Apache 2.0

LP-MusicCaps-MTT
huggingface.co
Updated Oct 27, 2024
Cite
Andrei Blahovici (2024). LP-MusicCaps-MTT [Dataset]. https://huggingface.co/datasets/AndreiBlahovici/LP-MusicCaps-MTT
Dataset updated
Oct 27, 2024
Authors
Andrei Blahovici
Description
AndreiBlahovici/LP-MusicCaps-MTT dataset hosted on Hugging Face and contributed by the HF Datasets community
MusicBench
huggingface.co
Updated Nov 16, 2023
Cite
Mustango (2023). MusicBench [Dataset]. https://huggingface.co/datasets/Z873bliwf988hj/MusicBench
Dataset updated
Nov 16, 2023
Authors
Mustango
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
MusicBench Dataset

The MusicBench dataset is a music audio-text pair dataset designed for text-to-music generation and released along with the Mustango text-to-music model. MusicBench is based on the MusicCaps dataset, which it expands from 5,521 samples to 52,768 training and 400 test samples.

Dataset Details

MusicBench expands MusicCaps by:

Including music features of chords, beats, tempo, and key that are extracted from the audio. Describing these music… See the full description on the dataset page: https://huggingface.co/datasets/Z873bliwf988hj/MusicBench.
MuChoMusic dataset
dataverse.csuc.cat
pdf, tsv, txt
Updated Oct 13, 2025
Cite
Benno Weck; Ilaria Manco; Emmanouil Benetos; Elio Quinton; George Fazekas; Dmitry Bogdanov (2025). MuChoMusic dataset [Dataset]. http://doi.org/10.34810/data2642
Available download formats: pdf (61027), txt (11323), tsv (301553)
Unique identifier
https://doi.org/10.34810/data2642
Dataset updated
Oct 13, 2025
Dataset provided by
CORA.Repositori de Dades de Recerca
Authors
Benno Weck; Ilaria Manco; Emmanouil Benetos; Elio Quinton; George Fazekas; Dmitry Bogdanov
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models

MuChoMusic is a benchmark designed to evaluate music understanding in multimodal language models focused on audio. It includes 1,187 multiple-choice questions validated by human annotators, based on 644 music tracks from two publicly available music datasets. These questions cover a wide variety of genres and assess knowledge and reasoning across several musical concepts and their cultural and functional contexts. The benchmark provides a holistic evaluation of five open-source models, revealing challenges such as over-reliance on the language modality and highlighting the need for better multimodal integration.

Note on Audio Files

This dataset comes without audio files. The audio files can be downloaded from two datasets: SongDescriberDataset (SDD) and MusicCaps. Please see the code repository for more information on how to download the audio.

Citation

If you use this dataset, please cite our paper:

@inproceedings{weck2024muchomusic,
  title = {MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models},
  author = {Weck, Benno and Manco, Ilaria and Benetos, Emmanouil and Quinton, Elio and Fazekas, György and Bogdanov, Dmitry},
  booktitle = {Proceedings of the 25th International Society for Music Information Retrieval Conference (ISMIR)},
  year = {2024}
}

Weck B, Manco I, Benetos E, Quinton E, Fazekas G, Bogdanov D. MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models. In: Kaneshiro B, Mysore G, Nieto O, Donahue C, Huang CZA, Lee JH, McFee B, McCallum M, editors. Proceedings of the 25th International Society for Music Information Retrieval Conference (ISMIR2024); 2024 November 10-14; San Francisco, USA.
lp-music-caps-magnatagatune-3k-musicfm-embedding
huggingface.co
Updated Nov 8, 2024
Cite
mulab-mir (2024). lp-music-caps-magnatagatune-3k-musicfm-embedding [Dataset]. https://huggingface.co/datasets/mulab-mir/lp-music-caps-magnatagatune-3k-musicfm-embedding
Dataset updated
Nov 8, 2024
Dataset authored and provided by
mulab-mir
Description
mulab-mir/lp-music-caps-magnatagatune-3k-musicfm-embedding dataset hosted on Hugging Face and contributed by the HF Datasets community
music-audio-pseudo-captions
huggingface.co
Updated Aug 19, 2023
Cite
seungheon.doh (2023). music-audio-pseudo-captions [Dataset]. https://huggingface.co/datasets/seungheondoh/music-audio-pseudo-captions
Dataset updated
Aug 19, 2023
Authors
seungheon.doh
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset Card for Music-Audio-Pseudo Captions

Pseudo Music and Audio Captions from LP-MusicCaps, Music Negation/Temporal Ordering WavCaps

Dataset Summary

Compared to other domains, music and audio domains cannot obtain well-written web caption data, and caption annotation is expensive. Therefore, we use the Music (LP-MusicCaps), (Music Negation/Temporal Ordering) and Audio (Wavcaps) datasets created with ChatGPT to re-organize them in the form of instructions, input… See the full description on the dataset page: https://huggingface.co/datasets/seungheondoh/music-audio-pseudo-captions.
GraySpectrogram
huggingface.co
Updated Oct 22, 2023
Cite
ML (2023). GraySpectrogram [Dataset]. https://huggingface.co/datasets/mickylan2367/GraySpectrogram
Dataset updated
Oct 22, 2023
Authors
ML
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Spectrograms generated from the audio data of Google/Music-Caps.

About MusicCaps: https://huggingface.co/datasets/google/MusicCaps
A non-grayscale version is also available, so take a look (⋈◍＞◡＜◍)。✧♡ (this one)

Basic information

sampling_rate: int = 44100
20-second WAV files are converted to 1600×800 PNG files.
Following librosa's conventions, the vertical axis of the image is (0-10000? Hz) and the horizontal axis is (0-40 seconds).
For details, see librosa.specshow() -> https://librosa.org/doc/main/auto_examples/plot_display.html

Usage 0: Download the dataset

from datasets import load_dataset
data = load_dataset("mickylan2367/spectrogram")
data… See the full description on the dataset page: https://huggingface.co/datasets/mickylan2367/GraySpectrogram.
MusicCaps-ru
huggingface.co
Updated Jun 30, 2023
Cite
Dmitry Balobin (2023). MusicCaps-ru [Dataset]. https://huggingface.co/datasets/d0rj/MusicCaps-ru
Dataset updated
Jun 30, 2023
Authors
Dmitry Balobin
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
MusicCaps-ru

Translated version of google/MusicCaps into Russian.

MusicCaps (google/MusicCaps)
275 scholarly articles cite this dataset (View in Google Scholar)