License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
Dataset Card for MusicCaps
Dataset Summary
The MusicCaps dataset contains 5,521 music examples, each labeled with an English aspect list and a free-text caption written by musicians. An example aspect list is "pop, tinny wide hi hats, mellow piano melody, high pitched female vocal melody, sustained pulsating synth lead", while the caption consists of multiple sentences about the music, e.g., "A low sounding male voice is rapping over a fast paced drums… See the full description on the dataset page: https://huggingface.co/datasets/google/MusicCaps.
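The card itself doesn't include a loading snippet; as a minimal sketch, the annotations can be pulled with the Hugging Face datasets library (the split name and the ytid, aspect_list, and caption fields are taken from the dataset page's schema; the audio itself is not distributed and must be fetched separately from YouTube):

```python
from datasets import load_dataset

# Load the MusicCaps annotations; the dataset ships captions and aspect
# lists only, so the ten-second audio clips must be downloaded separately.
ds = load_dataset("google/MusicCaps", split="train")

example = ds[0]
print(example["ytid"])         # YouTube ID of the source clip
print(example["aspect_list"])  # list of aspect tags
print(example["caption"])      # free-text caption written by a musician
```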
======================================
Dataset Card for LP-MusicCaps-MSD
Dataset Summary
LP-MusicCaps is a Large Language Model-based pseudo music caption dataset for text-to-music and music-to-text tasks. We construct music-to-caption pairs with tag-to-caption generation (using three existing multi-label tag datasets and four task… See the full description on the dataset page: https://huggingface.co/datasets/seungheondoh/LP-MusicCaps-MSD.
======================================
License: Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
MusicCaps dataset with WAV files, plus AudioLM-trained SoundStream and Semantic Transformer model checkpoints.
======================================
License: Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
EvanSirius/MusicCaps dataset hosted on Hugging Face and contributed by the HF Datasets community
======================================
License: Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
This dataset was created by Brijesh Giri
Released under Apache 2.0
======================================
This dataset was created by Kianoosh Vadaei
======================================
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
Suno AI-generated music created from the MusicCaps captions/aspect lists and tagged for prompt alignment and main emotion.
M. Civit, V. Drai-Zerbib, D. Lizcano, M.J. Escalona, SunoCaps: A Novel Dataset of Text-Prompt Based AI-Generated Music with Emotion Annotations, Data in Brief, 2024, 110743, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2024.110743 (https://www.sciencedirect.com/science/article/pii/S2352340924007078)

Abstract: The SunoCaps dataset aims to provide an innovative contribution to music data. Expert descriptions of human-made musical pieces from the widely used MusicCaps dataset are used as prompts for generating complete songs. This automatic music generation is done with the state-of-the-art Suno generator of audio-based music. A subset of 64 pieces from MusicCaps is currently included, giving a total of 256 generated entries: four variations for each human piece, two based on the original caption and two based on the original aspect description. As an AI-generated music dataset, SunoCaps also includes expert-based information on prompt alignment, with the main differences between prompt and final generation annotated, as well as annotations describing the main discrete emotions induced by each piece. The dataset can support a range of applications, such as creating and improving music generation validation tools, training systems for multi-layered architectures, and optimizing music emotion estimation systems.

Keywords: Data; Automatic Music Generation; Emotion feature; Artificial Intelligence; Prompt alignment; Generative AI
======================================
zhaoziyuan78/MusicCaps dataset hosted on Hugging Face and contributed by the HF Datasets community
======================================
License: Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
This upload contains the supplementary material for our paper presented at the MMM2024 conference.
The dataset contains rich text descriptions for music audio files collected from Wikipedia articles.
The audio files are freely accessible and available for download through the URLs provided in the dataset.
A few hand-picked, simplified examples from the dataset:

| file | aspects | sentences |
| --- | --- | --- |
|  | ['bongoes', 'percussion instrument', 'cumbia', 'drums'] | ['a loop of bongoes playing a cumbia beat at 99 bpm'] |
| 🔈 Example of double tracking in a pop-rock song (3 guitar tracks).ogg | ['bass', 'rock', 'guitar music', 'guitar', 'pop', 'drums'] | ['a pop-rock song'] |
|  | ['jazz standard', 'instrumental', 'jazz music', 'jazz'] | ['Considered to be a jazz standard', 'is an jazz composition'] |
|  | ['chirping birds', 'ambient percussion', 'new-age', 'flute', 'recorder', 'single instrument', 'woodwind'] | ['features a single instrument with delayed echo, as well as ambient percussion and chirping birds', 'a new-age composition for recorder'] |
|  | ['instrumental', 'brass band'] | ['an instrumental brass band performance'] |
| ... | ... | ... |
We provide three variants of the dataset in the data folder.
All are described in the paper.
all.csv contains all the data we collected, without any filtering.
filtered_sf.csv contains the data obtained using the self-filtering method.
filtered_mc.csv contains the data obtained using the MusicCaps dataset method.

Each CSV file contains the following columns:

file: the name of the audio file
pageid: the ID of the Wikipedia article where the text was collected from
aspects: the short-form (tag) description texts collected from the Wikipedia articles
sentences: the long-form (caption) description texts collected from the Wikipedia articles
audio_url: the URL of the audio file
url: the URL of the Wikipedia article where the text was collected from
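As a minimal loading sketch (assuming the CSVs sit in a local data/ folder and that the list-valued columns are stored as Python-style string literals, as in the examples above):

```python
import ast

import pandas as pd

# Read one of the three variants described above (the local path is an assumption).
df = pd.read_csv("data/filtered_sf.csv")

# aspects and sentences appear as Python-style list literals; parse them back
# into real lists (an assumption about the CSV encoding).
for col in ("aspects", "sentences"):
    df[col] = df[col].apply(ast.literal_eval)

print(df.loc[0, "file"], df.loc[0, "aspects"], df.loc[0, "audio_url"])
```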
If you use this dataset in your research, please cite the following paper:

@inproceedings{wikimute,
  title     = {WikiMuTe: {A} Web-Sourced Dataset of Semantic Descriptions for Music Audio},
  author    = {Weck, Benno and Kirchhoff, Holger and Grosche, Peter and Serra, Xavier},
  booktitle = {MultiMedia Modeling},
  year      = {2024},
  publisher = {Springer Nature Switzerland},
  address   = {Cham},
  pages     = {42--56},
  doi       = {10.1007/978-3-031-56435-2_4},
  url       = {https://doi.org/10.1007/978-3-031-56435-2_4},
}
The data is available under the Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license.
Each entry in the dataset contains a URL linking to the article from which the text data was collected.
======================================
License: MIT License: https://opensource.org/licenses/MIT
Kia-vadaei/MusicCaps-Wavs dataset hosted on Hugging Face and contributed by the HF Datasets community
======================================
License: MIT License: https://opensource.org/licenses/MIT
Kia-vadaei/Musiccaps-image-aligned dataset hosted on Hugging Face and contributed by the HF Datasets community
======================================
License: Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
This dataset was created by recha_wine
Released under Apache 2.0
======================================
AndreiBlahovici/LP-MusicCaps-MTT dataset hosted on Hugging Face and contributed by the HF Datasets community
======================================
License: Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
MusicBench Dataset
The MusicBench dataset is a music audio-text pair dataset designed for text-to-music generation and released along with the Mustango text-to-music model. MusicBench is based on the MusicCaps dataset, which it expands from 5,521 samples to 52,768 training and 400 test samples!
Dataset Details
MusicBench expands MusicCaps by:
Including music features of chords, beats, tempo, and key that are extracted from the audio. Describing these music… See the full description on the dataset page: https://huggingface.co/datasets/Z873bliwf988hj/MusicBench.
======================================
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models

MuChoMusic is a benchmark designed to evaluate music understanding in multimodal language models focused on audio. It includes 1,187 multiple-choice questions validated by human annotators, based on 644 music tracks from two publicly available music datasets. These questions cover a wide variety of genres and assess knowledge and reasoning across several musical concepts and their cultural and functional contexts. The benchmark provides a holistic evaluation of five open-source models, revealing challenges such as over-reliance on the language modality and highlighting the need for better multimodal integration.

Note on Audio Files

This dataset comes without audio files. The audio files can be downloaded from two datasets: SongDescriberDataset (SDD) and MusicCaps. Please see the code repository for more information on how to download the audio.

Citation

If you use this dataset, please cite our paper:

@inproceedings{weck2024muchomusic,
  title     = {MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models},
  author    = {Weck, Benno and Manco, Ilaria and Benetos, Emmanouil and Quinton, Elio and Fazekas, György and Bogdanov, Dmitry},
  booktitle = {Proceedings of the 25th International Society for Music Information Retrieval Conference (ISMIR)},
  year      = {2024},
}

Weck B, Manco I, Benetos E, Quinton E, Fazekas G, Bogdanov D. MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models. In: Kaneshiro B, Mysore G, Nieto O, Donahue C, Huang CZA, Lee JH, McFee B, McCallum M, editors. Proceedings of the 25th International Society for Music Information Retrieval Conference (ISMIR 2024); 2024 November 10-14; San Francisco, USA.
======================================
mulab-mir/lp-music-caps-magnatagatune-3k-musicfm-embedding dataset hosted on Hugging Face and contributed by the HF Datasets community
======================================
License: MIT License: https://opensource.org/licenses/MIT
Dataset Card for Music-Audio-Pseudo Captions
Pseudo music and audio captions from LP-MusicCaps, Music Negation/Temporal Ordering, and WavCaps
Dataset Summary
Unlike other domains, the music and audio domains offer little well-written web caption data, and caption annotation is expensive. We therefore take the music (LP-MusicCaps, Music Negation/Temporal Ordering) and audio (WavCaps) datasets created with ChatGPT and re-organize them in the form of instructions, input… See the full description on the dataset page: https://huggingface.co/datasets/seungheondoh/music-audio-pseudo-captions.
======================================
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
Spectrogram data created from the audio in Google/MusicCaps.
What is MusicCaps: https://huggingface.co/datasets/google/MusicCaps. A non-grayscale version also exists, so take a look (⋈◍>◡<◍)。✧♡ (this one).
Basic information
sampling_rate: int = 44100. Each 20-second WAV file is converted to a 1600×800 PNG. Following librosa's conventions, the image's vertical axis spans roughly 0-10000(?) Hz and the horizontal axis 0-40 seconds. For details, see librosa.specshow(): https://librosa.org/doc/main/auto_examples/plot_display.html
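The exact conversion code isn't given on the card; below is a minimal sketch of how such a PNG could be produced with librosa and matplotlib (the input file name clip.wav and the plain-STFT spectrogram are assumptions):

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Load a 20-second clip at the stated sampling rate (clip.wav is a placeholder).
y, sr = librosa.load("clip.wav", sr=44100, duration=20.0)

# Magnitude spectrogram in dB.
S_db = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)

# 1600x800 px output: figsize in inches multiplied by dpi.
fig, ax = plt.subplots(figsize=(16, 8), dpi=100)
librosa.display.specshow(S_db, sr=sr, x_axis="time", y_axis="hz", ax=ax)
fig.savefig("spectrogram.png")
plt.close(fig)
```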
Usage
0: Download the dataset
from datasets import load_dataset
data = load_dataset("mickylan2367/spectrogram")
data… See the full description on the dataset page: https://huggingface.co/datasets/mickylan2367/GraySpectrogram.
======================================
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
MusicCaps-ru
A version of google/MusicCaps translated into Russian.