100+ datasets found

MuMu: Multimodal Music Dataset
data.niaid.nih.gov
explore.openaire.eu
Updated Dec 6, 2022
Cite
Oramas, Sergio (2022). MuMu: Multimodal Music Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_831188
Dataset updated
Dec 6, 2022
Dataset authored and provided by
Oramas, Sergio
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MuMu is a Multimodal Music dataset with multi-label genre annotations that combines information from the Amazon Reviews dataset and the Million Song Dataset (MSD). The former contains millions of album customer reviews and album metadata gathered from Amazon.com. The latter is a collection of metadata and precomputed audio features for a million songs.

To map the information from both datasets we use MusicBrainz. This process yields the final set of 147,295 songs, which belong to 31,471 albums. For the mapped set of albums, there are 447,583 customer reviews from the Amazon Dataset. The dataset has been used for multi-label music genre classification experiments in the related publication. In addition to genre annotations, this dataset provides further information about each album, such as average rating, selling rank, similar products, and cover image URL. For every text review, it also provides the helpfulness score, average rating, and summary of the review.

The mapping between the three datasets (Amazon, MusicBrainz and MSD), genre annotations, metadata, data splits, text reviews and links to images are available here. Images and audio files cannot be released due to copyright issues.

MuMu dataset (mapping, metadata, annotations and text reviews)

Data splits and multimodal feature embeddings for ISMIR multi-label classification experiments

These data can be used together with the Tartarus deep learning library https://github.com/sergiooramas/tartarus.

NOTE: This version provides simplified files with metadata and splits.

Scientific References

Please cite the following papers if using MuMu dataset or Tartarus library.

Oramas, S., Barbieri, F., Nieto, O., and Serra, X. (2018). Multimodal Deep Learning for Music Genre Classification, Transactions of the International Society for Music Information Retrieval, V(1).

Oramas S., Nieto O., Barbieri F., & Serra X. (2017). Multi-label Music Genre Classification from audio, text and images using Deep Features. In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR 2017). https://arxiv.org/abs/1707.04916
MGD: Music Genre Dataset
data.niaid.nih.gov
zenodo.org
Updated May 28, 2021
Cite
Mariana O. Silva (2021). MGD: Music Genre Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4778562
Dataset updated
May 28, 2021
Dataset provided by
Danilo B. Seufitelli
Gabriel P. Oliveira
Mirella M. Moro
Mariana O. Silva
Anisio Lacerda
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MGD: Music Genre Dataset

Over recent years, the world has seen a dramatic change in the way people consume music, moving from physical records to streaming services. Since 2017, such services have become the main source of revenue within the global recorded music market. Therefore, this dataset is built using data from Spotify, which provides a weekly chart of the 200 most streamed songs for each country and territory in which it is present, as well as an aggregated global chart.

Considering that countries behave differently when it comes to musical tastes, we use chart data from global and regional markets from January 2017 to December 2019, considering eight of the top 10 music markets according to IFPI: United States (1st), Japan (2nd), United Kingdom (3rd), Germany (4th), France (5th), Canada (8th), Australia (9th), and Brazil (10th).

We also provide information about the hit songs and artists present in the charts, such as all collaborating artists within a song (since the charts only provide the main ones) and their respective genres, which is the core of this work. MGD also provides data about musical collaboration, as we build collaboration networks based on artist partnerships in hit songs. Therefore, this dataset contains:

Genre Networks: Success-based genre collaboration networks

Genre Mapping: Genre mapping from Spotify genres to super-genres

Artist Networks: Success-based artist collaboration networks

Artists: Some artist data

Hit Songs: Hit Song data and features

Charts: Enhanced data from Spotify Weekly Top 200 Charts

This dataset was originally built for a conference paper at ISMIR 2020. If you make use of the dataset, please also cite the following paper:

Gabriel P. Oliveira, Mariana O. Silva, Danilo B. Seufitelli, Anisio Lacerda, and Mirella M. Moro. Detecting Collaboration Profiles in Success-based Music Genre Networks. In Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR 2020), 2020.

@inproceedings{ismir/OliveiraSSLM20,
  title = {Detecting Collaboration Profiles in Success-based Music Genre Networks},
  author = {Gabriel P. Oliveira and Mariana O. Silva and Danilo B. Seufitelli and Anisio Lacerda and Mirella M. Moro},
  booktitle = {21st International Society for Music Information Retrieval Conference},
  pages = {726--732},
  year = {2020}
}
Data from: MusicOSet: An Enhanced Open Dataset for Music Data Mining
zenodo.org
data.niaid.nih.gov
bin, zip
Updated Jun 7, 2021
Cite
Mariana O. Silva; Laís Mota; Mirella M. Moro (2021). MusicOSet: An Enhanced Open Dataset for Music Data Mining [Dataset]. http://doi.org/10.5281/zenodo.4904639
Available download formats: zip, bin
Unique identifier
https://doi.org/10.5281/zenodo.4904639
Dataset updated
Jun 7, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Mariana O. Silva; Laís Mota; Mirella M. Moro
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MusicOSet is an open and enhanced dataset of musical elements (artists, songs and albums) based on musical popularity classification. It provides a directly accessible collection of data suitable for numerous tasks in music data mining (e.g., data visualization, classification, clustering, similarity search, MIR, HSS and so forth). To create MusicOSet, the potential information sources were divided into three main categories: music popularity sources, metadata sources, and acoustic and lyrical features sources. Data from all three categories were initially collected between January and May 2019, and were then updated and enhanced in June 2019.

The attractive features of MusicOSet include:

Integration and centralization of different musical data sources

Calculation of popularity scores and classification of hit and non-hit musical elements, spanning 1962 to 2018

Enriched metadata for music, artists, and albums from the US popular music industry

Availability of acoustic and lyrical resources

Unrestricted access in two formats: SQL database and compressed .csv files

| Data              | # Records |
|:-----------------:|:---------:|
| Songs             | 20,405    |
| Artists           | 11,518    |
| Albums            | 26,522    |
| Lyrics            | 19,664    |
| Acoustic Features | 20,405    |
| Genres            | 1,561     |
Music Dataset: Lyrics and Metadata from 1950 to 2019
data.mendeley.com
Updated Aug 24, 2020
Cite
Luan Moura (2020). Music Dataset: Lyrics and Metadata from 1950 to 2019 [Dataset]. http://doi.org/10.17632/3t9vbwxgr5.2
Unique identifier
https://doi.org/10.17632/3t9vbwxgr5.2
Dataset updated
Aug 24, 2020
Authors
Luan Moura
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset provides a list of lyrics from 1950 to 2019, described by music metadata such as sadness, danceability, loudness, acousticness, etc. We also provide some information, such as the lyrics, which can be used for natural language processing.

The audio data was scraped using the Echo Nest® API integrated engine with the spotipy Python package. The spotipy API permits the user to search for specific genres, artists, songs, release dates, etc. To obtain the lyrics we used the Lyrics Genius® API as the base URL for requesting data based on the song title and artist name.
youtube-music-hits
huggingface.co
Updated Nov 14, 2024
Cite
Akbar Gherbal (2024). youtube-music-hits [Dataset]. https://huggingface.co/datasets/akbargherbal/youtube-music-hits
Dataset updated
Nov 14, 2024
Authors
Akbar Gherbal
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Area covered
YouTube
Description
YouTube Music Hits Dataset

A collection of YouTube music video data sourced from Wikidata, focusing on videos with significant viewership metrics.

Dataset Description

Overview

24,329 music videos
View range: 1M to 5.5B views
Temporal range: 1977-2024

Features

youtubeId: YouTube video identifier
itemLabel: Video/song title
performerLabel: Artist/band name
youtubeViews: View count
year: Release year
genreLabel: Musical genre(s)

View… See the full description on the dataset page: https://huggingface.co/datasets/akbargherbal/youtube-music-hits.
Spotify Dataset
brightdata.com
.json, .csv, .xlsx
Cite
Bright Data, Spotify Dataset [Dataset]. https://brightdata.com/products/datasets/spotify
Available download formats: .json, .csv, .xlsx
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
Gain valuable insights into music trends, artist popularity, and streaming analytics with our comprehensive Spotify Dataset. Designed for music analysts, marketers, and businesses, this dataset provides structured and reliable data from Spotify to enhance market research, content strategy, and audience engagement.

Dataset Features

Track Information: Access detailed data on songs, including track name, artist, album, genre, and release date.
Streaming Popularity: Extract track popularity scores, listener engagement metrics, and ranking trends.
Artist & Album Insights: Analyze artist performance, album releases, and genre trends over time.
Related Searches & Recommendations: Track related search terms and suggested content for deeper audience insights.
Historical & Real-Time Data: Retrieve historical streaming data or access continuously updated records for real-time trend analysis.

Customizable Subsets for Specific Needs

Our Spotify Dataset is fully customizable, allowing you to filter data based on track popularity, artist, genre, release date, or listener engagement. Whether you need broad coverage for industry analysis or focused data for content optimization, we tailor the dataset to your needs.

Popular Use Cases

Market Analysis & Trend Forecasting: Identify emerging music trends, genre popularity, and listener preferences.
Artist & Label Performance Tracking: Monitor artist rankings, album success, and audience engagement.
Competitive Intelligence: Analyze competitor music strategies, playlist placements, and streaming performance.
AI & Machine Learning Applications: Use structured music data to train AI models for recommendation engines, playlist curation, and predictive analytics.
Advertising & Sponsorship Insights: Identify high-performing tracks and artists for targeted advertising and sponsorship opportunities.

Whether you're optimizing music marketing, analyzing streaming trends, or enhancing content strategies, our Spotify Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.
Music Dataset: Lyrics and Metadata from 1950 to 2019
data.mendeley.com
narcis.nl
Updated Oct 23, 2020
Cite
Luan Moura (2020). Music Dataset: Lyrics and Metadata from 1950 to 2019 [Dataset]. http://doi.org/10.17632/3t9vbwxgr5.3
Unique identifier
https://doi.org/10.17632/3t9vbwxgr5.3
Dataset updated
Oct 23, 2020
Authors
Luan Moura
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset was studied in the paper Temporal Analysis and Visualisation of Music, available at the following link:

https://sol.sbc.org.br/index.php/eniac/article/view/12155

This dataset provides a list of lyrics from 1950 to 2019, described by music metadata such as sadness, danceability, loudness, acousticness, etc. We also provide some information, such as the lyrics, which can be used for natural language processing.

The audio data was scraped using the Echo Nest® API integrated engine with the spotipy Python package. The spotipy API permits the user to search for specific genres, artists, songs, release dates, etc. To obtain the lyrics we used the Lyrics Genius® API as the base URL for requesting data based on the song title and artist name.
Music Genre fMRI Dataset
openneuro.org
Updated Aug 23, 2023
Cite
Tomoya Nakai; Naoko Koide-Majima; Shinji Nishimoto (2023). Music Genre fMRI Dataset [Dataset]. http://doi.org/10.18112/openneuro.ds003720.v1.0.1
Unique identifier
https://doi.org/10.18112/openneuro.ds003720.v1.0.1
Dataset updated
Aug 23, 2023
Dataset provided by
OpenNeuro (https://openneuro.org/)
Authors
Tomoya Nakai; Naoko Koide-Majima; Shinji Nishimoto
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Music Genre fMRI Dataset by Tomoya Nakai, Naoko Koide-Majima, and Shinji Nishimoto

References:

Nakai, Koide-Majima, and Nishimoto (2021). Correspondence of categorical and feature-based representations of music in the human brain. Brain and Behavior. 11(1), e01936. https://doi.org/10.1002/brb3.1936

Nakai, Koide-Majima, and Nishimoto (2022). Music genre neuroimaging dataset. Data in Brief. 40, 107675. https://doi.org/10.1016/j.dib.2021.107675

We measured brain activity using functional MRI while five subjects (“sub-001”, …, “sub-005”) listened to music stimuli of 10 different genres.

The entire folder consists of subject-wise subfolders (“sub-001”,…). Each subject’s folder contains the following subfolders:

anat: T1-weighted structural images

func: functional signals (multi-band echo-planar images)

Each subject performed 18 runs consisting of 12 training runs and 6 test runs. The training and test data were assigned with the following notations:

Training data: sub-00*_task-Training_run-**_bold.json

Test data: sub-00*_task-Test_run-**_bold.json
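
The naming scheme above can be expanded into a list of expected file names, which is handy for checking a download for missing runs. The following is a minimal sketch, not the authors' tooling: it assumes that run indices start at 01 and are zero-padded to two digits (suggested by the `**` placeholder, but not stated explicitly), and the helper name `bold_json_names` is hypothetical.

```python
# Sketch: enumerate the expected BOLD sidecar file names per subject.
# Assumption: run indices start at 01 and are zero-padded to two digits.
N_SUBJECTS = 5        # sub-001 ... sub-005
N_TRAINING_RUNS = 12  # training runs per subject
N_TEST_RUNS = 6       # test runs per subject


def bold_json_names(subject: int) -> list[str]:
    """Return the 18 expected file names for one subject (1-based index)."""
    names = []
    for task, n_runs in (("Training", N_TRAINING_RUNS), ("Test", N_TEST_RUNS)):
        for run in range(1, n_runs + 1):
            names.append(f"sub-{subject:03d}_task-{task}_run-{run:02d}_bold.json")
    return names


# All expected names across the five subjects (5 x 18 files).
all_names = [
    name for subject in range(1, N_SUBJECTS + 1) for name in bold_json_names(subject)
]
```

Comparing `all_names` against the files actually present in each subject's func folder would flag incomplete downloads, provided the padding assumption holds.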

Each *_event.tsv file contains the following information:

Onset: stimulus onset

Genre: genre type (out of 10 genres)

Track: index to identify the original track

Start: onset of excerpt from the original track (second)

End: offset of excerpt from the original track (second)

The duration of all stimuli is 15 s. For each clip, 2 s of fade-in and fade-out effects were applied, and the overall signal intensity was normalized in terms of the root mean square.
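
As an illustration of how the event files described above might be read, here is a hedged sketch using only the Python standard library. The column headers (`Onset`, `Genre`, `Track`, `Start`, `End`) are taken from the description and assumed to appear with that exact capitalization; the two sample rows are invented for the example and are not real data. The helper names `read_events` and `bad_durations` are hypothetical.

```python
import csv
import io

# Hypothetical two-row sample in the described layout (values are invented).
SAMPLE_TSV = (
    "Onset\tGenre\tTrack\tStart\tEnd\n"
    "0\tblues\t12\t5\t20\n"
    "15\tjazz\t47\t10\t25\n"
)


def read_events(handle):
    """Parse a tab-separated event file into a list of typed dictionaries."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "onset": float(row["Onset"]),   # stimulus onset
            "genre": row["Genre"],          # one of the 10 genres
            "track": int(row["Track"]),     # index of the original track
            "start": float(row["Start"]),   # excerpt onset in the track (s)
            "end": float(row["End"]),       # excerpt offset in the track (s)
        })
    return rows


def bad_durations(rows, expected=15.0, tol=1e-6):
    """Return rows whose excerpt length differs from the stated 15 s."""
    return [r for r in rows if abs((r["end"] - r["start"]) - expected) > tol]


events = read_events(io.StringIO(SAMPLE_TSV))
```

With a real file, `open(path, newline="")` would replace the `io.StringIO` wrapper, and a non-empty result from `bad_durations` would indicate rows that do not match the stated 15 s stimulus length.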

For the training runs, the 1st stimulus (0-15s) is the same as the last stimulus of the previous run (600-615s). For the test runs, the 1st stimulus (0-15s) is the same as the last stimulus of the same run (600-615s).

Preprocessed data are available from Zenodo (https://doi.org/10.5281/zenodo.8275363). Experimental stimuli can be generated using GTZAN_Preprocess.py included in the same repository.

The original music stimuli (GTZAN dataset) can be found here: https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification

Caution This dataset can be used for research purposes only. The data were anonymized, and users shall not perform analyses to re-identify individual subjects.
JVS-MuSiC Dataset
paperswithcode.com
Updated Jun 3, 2024
Cite
(2024). JVS-MuSiC Dataset [Dataset]. https://paperswithcode.com/dataset/jvs-music
Dataset updated
Jun 3, 2024
Description
JVS-MuSiC is a Japanese multispeaker singing-voice corpus created with the aim of analyzing and synthesizing a variety of voices. The corpus consists of 100 singers' recordings of the same song, Katatsumuri, which is a Japanese children's song. It also includes another song that is different for each singer.
Music Dataset
brightdata.com
.json, .csv, .xlsx
Updated Jan 6, 2017
Cite
Bright Data (2017). Music Dataset [Dataset]. https://brightdata.com/products/datasets/music
Available download formats: .json, .csv, .xlsx
Dataset updated
Jan 6, 2017
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
Unlock powerful insights with our custom music datasets, offering access to millions of records from popular music platforms like Spotify, SoundCloud, Amazon Music, YouTube Music, and more. These datasets provide comprehensive data points such as track titles, artists, albums, genres, release dates, play counts, playlist details, popularity scores, user-generated tags, and much more, allowing you to analyze music trends, listener behavior, and industry patterns with precision. Use these datasets to optimize your music strategies by identifying trending tracks, analyzing artist performance, understanding playlist dynamics, and tracking audience preferences across platforms. Gain valuable insights into streaming habits, regional popularity, and emerging genres to make data-driven decisions that enhance your marketing campaigns, content creation, and audience engagement. Whether you’re a music producer, marketer, data analyst, or researcher, our music datasets empower you with the data needed to stay ahead in the ever-evolving music industry. Available in various formats such as JSON, CSV, and Parquet, and delivered via flexible options like API, S3, or email, these datasets ensure seamless integration into your workflows.
taste-music-dataset Dataset
paperswithcode.com
Updated Mar 3, 2025
Cite
Matteo Spanio; Massimiliano Zampini; Antonio Rodà; Franco Pierucci (2025). taste-music-dataset Dataset [Dataset]. https://paperswithcode.com/dataset/taste-music-dataset
Dataset updated
Mar 3, 2025
Authors
Matteo Spanio; Massimiliano Zampini; Antonio Rodà; Franco Pierucci
Description
This dataset is a patched version of The Taste & Affect Music Database by D. Guedes et al. It is a set of captions that describe 100 musical pieces and associate gustatory keywords with them on the basis of Guedes et al.'s findings.

🎧 200K+ Spotify Songs Light Dataset

kaggle.com

Updated Apr 16, 2025

Cite

DevDope (2025). 🎧 200K+ Spotify Songs Light Dataset [Dataset]. https://www.kaggle.com/datasets/devdope/200k-spotify-songs-light-dataset

Dataset updated

Apr 16, 2025

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

DevDope

License

Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically

Description

This dataset was part of the Top 200 projects in the NVIDIA Llama-Index Contest, supporting the Abracadabra project — a Retrieval-Augmented Generation (RAG) system for intelligent playlist creation using LLMs.

Overview

This lightweight version of the music dataset offers a simplified yet powerful subset of features for each track, making it ideal for:

Quick experiments
Visualizations
Emotion-based music analysis

🧠 How Emotions Were Extracted

Emotions in the emotion column were automatically generated using the Hugging Face model:

🔗 mrm8488/t5-base-finetuned-emotion

📊 Column Descriptions

📌 Included Features

Column	Description	Example
`artist`	Name of the artist or music group.	Steven Wilson
`song`	Title of the song.	The Raven That Refused to Sing
`emotion`	Dominant emotion extracted from lyrics using a fine-tuned language model.	sadness
`variance`	Variability measure across audio features of the song.	0.83
`Genre`	Primary musical genre.	progressive
`Release Date`	Year of release.	2013
`Key`	Musical key.	F Maj
`Tempo`	Tempo in BPM.	137
`Loudness`	Average volume level in decibels (usually negative).	-13.07
`Explicit`	Indicates whether the track contains explicit content.	No
`Popularity`	Popularity score (0–100).	38
`Energy`	Perceived energy level (0–100).	23
`Danceability`	How danceable the track is (0–100).	29
`Positiveness`	Positivity or valence score (0–100).	7
`Speechiness`	Presence of spoken words (0–100).	3
`Liveness`	Likelihood of the track being a live performance (0–100).	6
`Acousticness`	Acoustic quality score (0–100).	20
`Instrumentalness`	Likelihood that the track is instrumental (0–100).	34
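
To show how the columns above might be used, the sketch below parses a small in-memory sample and rescales a few of the 0–100 scores to the 0–1 range. It is illustrative only: the single data row is copied from the Example column of the table, the sample is tab-separated and holds only a subset of the columns, and the delimiter and layout of the actual Kaggle file are not confirmed here. The names `load_tracks` and `by_emotion` are hypothetical helpers.

```python
import csv
import io

# One row built from the table's Example column (subset of columns only).
# Assumption: the real file's delimiter and column order may differ.
SAMPLE_TSV = (
    "artist\tsong\temotion\tTempo\tLoudness\tPopularity\tEnergy\tPositiveness\n"
    "Steven Wilson\tThe Raven That Refused to Sing\tsadness\t137\t-13.07\t38\t23\t7\n"
)

# Columns documented as 0-100 scores in the table above.
PERCENT_COLUMNS = ("Popularity", "Energy", "Positiveness")


def load_tracks(handle):
    """Read tab-separated rows and convert numeric columns."""
    tracks = []
    for row in csv.DictReader(handle, delimiter="\t"):
        track = dict(row)
        track["Tempo"] = float(row["Tempo"])        # BPM
        track["Loudness"] = float(row["Loudness"])  # decibels, usually negative
        for col in PERCENT_COLUMNS:
            track[col] = int(row[col]) / 100        # rescale 0-100 to 0-1
        tracks.append(track)
    return tracks


def by_emotion(tracks, emotion):
    """Return the tracks whose dominant emotion matches the given label."""
    return [t for t in tracks if t["emotion"] == emotion]


tracks = load_tracks(io.StringIO(SAMPLE_TSV))
sad_tracks = by_emotion(tracks, "sadness")
```

Rescaling to 0–1 is a design choice for this example, made so the scores are directly comparable with libraries that expect unit-range features; the dataset itself stores them as 0–100 values.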

🔗 Full Dataset Available

Looking for the full version with:

30+ features per track
Complete lyrics
Playlist context labels
Similar song recommendations

👉 Check it out on Kaggle

Enjoy exploring the sound of data 🎶

MusicCaps
huggingface.co
Updated Jan 27, 2023
Cite
Google (2023). MusicCaps [Dataset]. https://huggingface.co/datasets/google/MusicCaps
Dataset updated
Jan 27, 2023
Dataset authored and provided by
Google (http://google.com/)
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset Card for MusicCaps

Dataset Summary

The MusicCaps dataset contains 5,521 music examples, each of which is labeled with an English aspect list and a free text caption written by musicians. An aspect list is for example "pop, tinny wide hi hats, mellow piano melody, high pitched female vocal melody, sustained pulsating synth lead", while the caption consists of multiple sentences about the music, e.g., "A low sounding male voice is rapping over a fast paced drums… See the full description on the dataset page: https://huggingface.co/datasets/google/MusicCaps.
Song Describer Dataset
zenodo.org
huggingface.co
csv, pdf, tsv, txt +1
Updated Jul 10, 2024
Cite
Ilaria Manco; Benno Weck; Dmitry Bogdanov; Philip Tovstogan; Minz Won (2024). Song Describer Dataset [Dataset]. http://doi.org/10.5281/zenodo.10072001
Available download formats: tsv, csv, zip, txt, pdf
Unique identifier
https://doi.org/10.5281/zenodo.10072001
Dataset updated
Jul 10, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Ilaria Manco; Benno Weck; Dmitry Bogdanov; Philip Tovstogan; Minz Won
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation
A retro-futurist drum machine groove drenched in bubbly synthetic sound effects and a hint of an acid bassline.
The Song Describer Dataset (SDD) contains ~1.1k captions for 706 permissively licensed music recordings. It is designed for use in evaluation of models that address music-and-language (M&L) tasks such as music captioning, text-to-music generation and music-language retrieval. More information about the data, collection method and validation is provided in the paper describing the dataset.
If you use this dataset, please cite our paper:
The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation, Manco, Ilaria and Weck, Benno and Doh, Seungheon and Won, Minz and Zhang, Yixiao and Bogdanov, Dmitry and Wu, Yusong and Chen, Ke and Tovstogan, Philip and Benetos, Emmanouil and Quinton, Elio and Fazekas, György and Nam, Juhan, Machine Learning for Audio Workshop at NeurIPS 2023, 2023
Ways to discover new music worldwide 2022, by age
statista.com
Updated Aug 2, 2024
Cite
Statista (2024). Ways to discover new music worldwide 2022, by age [Dataset]. https://www.statista.com/statistics/1273351/new-music-discovery-by-age-worldwide/
Dataset updated
Aug 2, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Nov 2021
Area covered
Worldwide
Description
According to a study on music consumption worldwide in 2022, younger generations tended to find new songs via music apps and social media, while older generations also used the radio as a format to discover new audio content.
Indian Folk Music Dataset
zenodo.org
data.niaid.nih.gov
bin
Updated May 27, 2022
Cite
Yeshwant Singh; Lilapati Waikhom; Vivek Meena; Anupam Biswas (2022). Indian Folk Music Dataset [Dataset]. http://doi.org/10.5281/zenodo.6584021
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.6584021
Dataset updated
May 27, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Yeshwant Singh; Lilapati Waikhom; Vivek Meena; Anupam Biswas
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is a collection of mel-spectrogram features extracted from Indian folk music containing the following 15 folk styles:
Bauls, Bhavageethe, Garba, Kajri, Maand, Sohar, Tamang Selo, Veeragase, Bhatiali, Bihu, Gidha, Lavani, Naatupura Paatu, Sufi, Uttarakhandi.

The number of recordings per folk style varies from 16 to 50, reflecting how scarce recordings of these styles are on the Internet. Each folk style has at least 4 and at most 22 artists. Overall there are 125 artists (34 female + 91 male) across the 15 folk styles.

There is a total of 606 recordings in the dataset, with a total duration of 54.45 hrs.
Mel-spectrograms are extracted from 3-second segments of each song using a 1/2-second sliding window. The extracted mel-spectrogram for each segment is annotated with folk_style, state, artist, gender, song, source, no_of_artists, folk_style_id, state_id, artist_id, gender_id.
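
The segmentation described above can be sketched as follows. This is an interpretation rather than the authors' extraction code: it assumes the "1/2 second sliding window" means a hop of 0.5 s between consecutive 3-second segments, and that trailing audio shorter than a full segment is dropped. The function name `segment_starts` is hypothetical.

```python
SEGMENT_SECONDS = 3.0  # segment length stated in the description
HOP_SECONDS = 0.5      # assumption: "1/2 second sliding window" read as hop size


def segment_starts(duration_seconds):
    """Return the start times (s) of all full 3 s segments in a recording."""
    if duration_seconds < SEGMENT_SECONDS:
        return []  # recording too short for even one full segment
    n_segments = int((duration_seconds - SEGMENT_SECONDS) // HOP_SECONDS) + 1
    return [i * HOP_SECONDS for i in range(n_segments)]


# A 5-second recording yields segments starting at 0.0, 0.5, 1.0, 1.5 and 2.0 s.
starts = segment_starts(5.0)
```

Under these assumptions, a recording of D seconds (D ≥ 3) produces floor((D − 3) / 0.5) + 1 segments, so consecutive segments overlap by 2.5 s.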
This project was funded under the grant number: ECR/2018/000204 by the Science & Engineering Research Board (SERB).
Music and emotion dataset (Primary Musical Cues)
dataverse.harvard.edu
datamed.org
Updated Jan 18, 2016
Cite
Tuomas Eerola (2016). Music and emotion dataset (Primary Musical Cues) [Dataset]. http://doi.org/10.7910/DVN/IFOBRN
Unique identifier
https://doi.org/10.7910/DVN/IFOBRN
Dataset updated
Jan 18, 2016
Dataset provided by
Harvard Dataverse
Authors
Tuomas Eerola
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Stimulus materials, design matrix, and mean ratings for music and emotion study using optimal design in factorial manipulation of musical features (Eerola, Friberg & Bresin, 2013).
AAM: Artificial Audio Multitracks Dataset
zenodo.org
data.niaid.nih.gov
zip
Updated Mar 24, 2023
Cite
Fabian Ostermann; Igor Vatolkin (2023). AAM: Artificial Audio Multitracks Dataset [Dataset]. http://doi.org/10.5281/zenodo.5794629
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.5794629
Dataset updated
Mar 24, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Fabian Ostermann; Igor Vatolkin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains 3,000 artificial music audio tracks with rich annotations. It is based on real instrument samples and generated by algorithmic composition with respect to music theory.

It provides full mixes of the songs as well as single instrument tracks. The midis used for generation are also available. The annotation files include: Onsets, Pitches, Instruments, Keys, Tempos, Segments, Melody instrument, Beats, and Chords.

A presentation paper was published open-access in EURASIP Journal on Audio, Speech, and Music Processing.

Current development and source code of the generator tool can be found on GitHub.

For a tiny version for demonstration and testing purposes see: zenodo.6771120
MusicBench
huggingface.co
Updated Nov 16, 2023
Cite
AMAAI Lab (2023). MusicBench [Dataset]. https://huggingface.co/datasets/amaai-lab/MusicBench
Dataset updated
Nov 16, 2023
Dataset authored and provided by
AMAAI Lab
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
MusicBench Dataset

The MusicBench dataset is a music audio-text pair dataset that was designed for text-to-music generation and released along with the Mustango text-to-music model. MusicBench is based on the MusicCaps dataset, which it expands from 5,521 samples to 52,768 training and 400 test samples!

Dataset Details

MusicBench expands MusicCaps by:

Including music features of chords, beats, tempo, and key that are extracted from the audio. Describing these music… See the full description on the dataset page: https://huggingface.co/datasets/amaai-lab/MusicBench.
Arab-Andalusian music corpus
live.european-language-grid.eu
zenodo.org
audio mp3
Updated Sep 30, 2021
Cite
(2021). Arab-Andalusian music corpus [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7417
Available download formats: audio mp3
Dataset updated
Sep 30, 2021
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This repository contains Arab-Andalusian corpus collected in the CompMusic project.

The following files are available for 164 concert recordings (overall playable time more than 125 hours):
- Audio in mp3 format (44.1kHz or 48 kHz sampling, 128 Kbps and higher, mono or stereo)
- Score in music xml format (manual transcriptions by the first author)
- Automatically computed pitch (text format) and pitch distribution (json format) descriptors

The meta data of the recordings (title, form, mizan, nawba and tab) are provided in separate json files. The corresponding MusicBrainz collection is available at this link. Metadata is subject to improvements as it is collected via crowdsourcing on MusicBrainz. We gather and share new versions of the meta data (for the same audio content) at this link.

The lyrics for the recordings are available from the Arab-Andalusian music lyrics dataset.

For more information, please refer to http://compmusic.upf.edu/corpora

A scientific publication making use of this database is available here: Nawba Recognition for Arab-Andalusian Music Using Templates From Music Scores.

Cite

Oramas, Sergio (2022). MuMu: Multimodal Music Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_831188

MuMu: Multimodal Music Dataset

Explore at:

Dataset updated

Dec 6, 2022

Dataset authored and provided by

Oramas, Sergio

License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

MuMu is a Multimodal Music dataset with multi-label genre annotations that combines information from the Amazon Reviews dataset and the Million Song Dataset (MSD). The former contains millions of album customer reviews and album metadata gathered from Amazon.com. The latter is a collection of metadata and precomputed audio features for a million songs.

To map the information from both datasets we use MusicBrainz. This process yields the final set of 147,295 songs, which belong to 31,471 albums. For the mapped set of albums, there are 447,583 customer reviews from the Amazon Dataset. The dataset has been used for multi-label music genre classification experiments in the related publication. In addition to genre annotations, this dataset provides further information about each album, such as average rating, selling rank, similar products, and cover image URL. For every text review, it also provides the review's helpfulness score, average rating, and summary.
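To put the counts above in proportion, the following Python sketch derives per-album averages from the three totals quoted in the description. The constants are copied from the text; the helper name is hypothetical, and the averages are simple ratios that say nothing about how songs or reviews are actually distributed across albums.

```python
SONGS = 147_295    # songs in the mapped set
ALBUMS = 31_471    # albums those songs belong to
REVIEWS = 447_583  # Amazon customer reviews for the mapped albums


def per_album(total: int, albums: int = ALBUMS) -> float:
    """Mean count per album, rounded to two decimals."""
    return round(total / albums, 2)


if __name__ == "__main__":
    print("songs per album:  ", per_album(SONGS))
    print("reviews per album:", per_album(REVIEWS))
```

This gives roughly 4.68 mapped songs and 14.22 reviews per album on average.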

The mapping between the three datasets (Amazon, MusicBrainz and MSD), genre annotations, metadata, data splits, text reviews and links to images are available here. Images and audio files cannot be released due to copyright issues.

MuMu dataset (mapping, metadata, annotations and text reviews)

Data splits and multimodal feature embeddings for ISMIR multi-label classification experiments

These data can be used together with the Tartarus deep learning library https://github.com/sergiooramas/tartarus.

NOTE: This version provides simplified files with metadata and splits.

Scientific References

Please cite the following papers if you use the MuMu dataset or the Tartarus library.

Oramas, S., Barbieri, F., Nieto, O., & Serra, X. (2018). Multimodal Deep Learning for Music Genre Classification. Transactions of the International Society for Music Information Retrieval, V(1).

Oramas, S., Nieto, O., Barbieri, F., & Serra, X. (2017). Multi-label Music Genre Classification from audio, text and images using Deep Features. In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR 2017). https://arxiv.org/abs/1707.04916
