100+ datasets found
  1. Music Dataset: Song Information and Lyrics

    • kaggle.com
    zip
    Updated May 22, 2023
    Cite
    Suraj (2023). Music Dataset: Song Information and Lyrics [Dataset]. https://www.kaggle.com/datasets/suraj520/music-dataset-song-information-and-lyrics
    Explore at:
    zip (1992670 bytes). Available download formats
    Dataset updated
    May 22, 2023
    Authors
    Suraj
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Dataset's Purpose: This dataset aims to provide a comprehensive collection of song information and lyrics for research and development. It is intended as a useful resource for applications such as music analysis, natural language processing, sentiment analysis, and recommendation systems. By combining song information and lyrics, it can help academics, developers, and music fans examine the link between listeners' preferences and lyrical content.

    Dataset Description:

    The music dataset contains around 660 songs, each described by the following attributes:

    Name: The title of the song.
    Lyrics: The lyrics of the song.
    Singer: The name of the singer or artist who performed the song.
    Movie: The movie or album associated with the song (if applicable).
    Genre: The genre or genres to which the song belongs.
    Rating: The rating or popularity score of the song from Spotify.

    The dataset is intended to cover a wide variety of songs across genres, performers, and films, including popular songs from many eras and regions and a broad spectrum of musical styles. The lyrics were obtained from publicly accessible services such as Spotify and SoundCloud and were converted from audio to text using speech-recognition algorithms. While every effort has been made to ensure correctness, note that, owing to the limits of the data sources and the speech-recognition algorithms, the transcriptions may contain errors or missing lyrics.

    Use Cases in Research and Development:

    This music dataset has several research and development applications. Among the possible applications are:

    1. Music Analysis: By analysing the links between song attributes such as genre, singer, and rating, researchers can gain insights into the features and patterns of various music genres.
    2. Natural Language Processing (NLP): NLP researchers may use the lyrics to build language models, sentiment analysis algorithms, topic modelling approaches, and other text-based music studies.
    3. Recommendation Systems: Using this information, developers may create recommendation systems that suggest music based on user preferences, lyric sentiment, or genre similarities.
    4. Music Generation: The dataset may be used to train machine learning models that generate new lyrics or compose music.
    5. Music Sentiment Analysis: Researchers might analyse the feelings conveyed in song lyrics to gain insights into the emotional components of music and their influence on listeners.
    6. Movie Soundtrack Analysis: Researchers can explore the association between song attributes and their use in movie soundtracks by investigating the Movie attribute.

    Overall, this music dataset aims to provide a rich resource for academics, developers, and music fans to investigate the relationships between song features, lyrics, and the many research and development applications in the music domain.
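
    As a starting point for the use cases above, a minimal exploration sketch in pandas is shown below. The filename music_dataset.csv is an assumption (only the Kaggle zip is documented), and the column names follow the attribute list above.

```python
import pandas as pd

# Assumed filename; the actual CSV inside the Kaggle zip may be named differently.
df = pd.read_csv("music_dataset.csv")

# Documented attributes: Name, Lyrics, Singer, Movie, Genre, Rating.
print(df["Genre"].value_counts().head(10))  # genre distribution
print(df["Rating"].describe())              # Spotify popularity summary

# A simple lyric-length feature for downstream NLP or sentiment experiments.
df["lyric_words"] = df["Lyrics"].fillna("").str.split().str.len()
print(df.groupby("Genre")["lyric_words"].mean().sort_values(ascending=False))
```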

  2. Music Dataset: Lyrics and Metadata from 1950 to 2019

    • data.mendeley.com
    Updated Aug 24, 2020
    + more versions
    Cite
    Luan Moura (2020). Music Dataset: Lyrics and Metadata from 1950 to 2019 [Dataset]. http://doi.org/10.17632/3t9vbwxgr5.2
    Explore at:
    Dataset updated
    Aug 24, 2020
    Authors
    Luan Moura
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset provides lyrics from 1950 to 2019 together with music metadata such as sadness, danceability, loudness, and acousticness. The lyrics themselves can be used for natural language processing.

    The audio data was scraped using the Echo Nest® API through the spotipy Python package. The spotipy API lets the user search for specific genres, artists, songs, release dates, etc. To obtain the lyrics, we used the Lyrics Genius® API as the base URL for requesting data based on the song title and artist name.
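
    A hedged sketch of this kind of collection pipeline is below, using the spotipy and lyricsgenius Python packages; the credentials are placeholders, and the exact queries the authors ran are not documented.

```python
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import lyricsgenius

# Placeholder credentials; both services require registered API keys.
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
    client_id="SPOTIFY_CLIENT_ID", client_secret="SPOTIFY_CLIENT_SECRET"))
genius = lyricsgenius.Genius("GENIUS_ACCESS_TOKEN")

# Search Spotify by genre/year, then fetch lyrics by song title and artist name.
results = sp.search(q="genre:pop year:2019", type="track", limit=5)
for item in results["tracks"]["items"]:
    title, artist = item["name"], item["artists"][0]["name"]
    song = genius.search_song(title, artist)
    if song is not None:
        print(title, "|", artist, "|", len(song.lyrics), "characters of lyrics")
```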

  3. youtube-music-hits

    • huggingface.co
    Updated Nov 14, 2024
    Cite
    Akbar Gherbal (2024). youtube-music-hits [Dataset]. https://huggingface.co/datasets/akbargherbal/youtube-music-hits
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Nov 14, 2024
    Authors
    Akbar Gherbal
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    YouTube
    Description

    YouTube Music Hits Dataset

    A collection of YouTube music video data sourced from Wikidata, focusing on videos with significant viewership metrics.

    Dataset Description

    Overview

    24,329 music videos
    View range: 1M to 5.5B views
    Temporal range: 1977-2024

    Features
    youtubeId: YouTube video identifier
    itemLabel: Video/song title
    performerLabel: Artist/band name
    youtubeViews: View count
    year: Release year
    genreLabel: Musical genre(s)

      View… See the full description on the dataset page: https://huggingface.co/datasets/akbargherbal/youtube-music-hits.
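
    A minimal loading sketch with the Hugging Face datasets library is below; the split name "train" and a numeric youtubeViews field are assumptions based on the feature list above.

```python
from datasets import load_dataset

ds = load_dataset("akbargherbal/youtube-music-hits", split="train")

# Documented fields: youtubeId, itemLabel, performerLabel, youtubeViews, year, genreLabel.
# Cast youtubeViews to int in case it is stored as a string.
billion_club = ds.filter(lambda row: int(row["youtubeViews"]) >= 1_000_000_000)
for row in billion_club.select(range(min(5, len(billion_club)))):
    print(row["itemLabel"], "-", row["performerLabel"], "-", row["youtubeViews"])
```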
    
  4. music_genre

    • huggingface.co
    Updated Sep 30, 2023
    Cite
    CCMUSIC Database (2023). music_genre [Dataset]. https://huggingface.co/datasets/ccmusic-database/music_genre
    Explore at:
    Dataset updated
    Sep 30, 2023
    Dataset authored and provided by
    CCMUSIC Database
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Dataset Card for Music Genre

    The Default dataset comprises approximately 1,700 musical pieces in .mp3 format, sourced from NetEase Music. The lengths of these pieces range from 270 to 300 seconds, and all are sampled at 22,050 Hz. As the website providing the audio includes style labels for the downloaded music, no separate annotators were involved; validation was achieved concurrently with the downloading process. The pieces are categorized into a total of 16… See the full description on the dataset page: https://huggingface.co/datasets/ccmusic-database/music_genre.

  5. MuMu: Multimodal Music Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Dec 6, 2022
    Cite
    Oramas, Sergio (2022). MuMu: Multimodal Music Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_831188
    Explore at:
    Dataset updated
    Dec 6, 2022
    Dataset provided by
    Universitat Pompeu Fabra
    Authors
    Oramas, Sergio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MuMu is a Multimodal Music dataset with multi-label genre annotations that combines information from the Amazon Reviews dataset and the Million Song Dataset (MSD). The former contains millions of album customer reviews and album metadata gathered from Amazon.com. The latter is a collection of metadata and precomputed audio features for a million songs.

    To map the information from both datasets, we use MusicBrainz. This process yields a final set of 147,295 songs belonging to 31,471 albums. For the mapped set of albums, there are 447,583 customer reviews from the Amazon dataset. The dataset has been used for multi-label music genre classification experiments in the related publication. Beyond genre annotations, it provides further information about each album, such as average rating, selling rank, similar products, and cover image URL. For every text review, it also provides the review's helpfulness score, average rating, and summary.

    The mapping between the three datasets (Amazon, MusicBrainz and MSD), genre annotations, metadata, data splits, text reviews and links to images are available here. Images and audio files cannot be released due to copyright issues.

    MuMu dataset (mapping, metadata, annotations and text reviews)

    Data splits and multimodal feature embeddings for ISMIR multi-label classification experiments

    These data can be used together with the Tartarus deep learning library https://github.com/sergiooramas/tartarus.

    NOTE: This version provides simplified files with metadata and splits.
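
    For the multi-label genre classification experiments mentioned above, genre labels are typically binarized per album. The sketch below is illustrative only: the album IDs and genre lists are toy values, not the dataset's actual field names.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Toy album-to-genres mapping in the spirit of MuMu's multi-label annotations.
album_genres = {
    "album_1": ["Pop", "Rock"],
    "album_2": ["Jazz"],
    "album_3": ["Rock", "Blues", "Jazz"],
}

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(album_genres.values())  # binary indicator matrix

print(mlb.classes_)  # e.g. ['Blues' 'Jazz' 'Pop' 'Rock']
print(Y)             # one row per album, one column per genre
```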

    Scientific References

    Please cite the following papers if using MuMu dataset or Tartarus library.

    Oramas, S., Barbieri, F., Nieto, O., and Serra, X. (2018). Multimodal Deep Learning for Music Genre Classification, Transactions of the International Society for Music Information Retrieval, V(1).

    Oramas S., Nieto O., Barbieri F., & Serra X. (2017). Multi-label Music Genre Classification from audio, text and images using Deep Features. In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR 2017). https://arxiv.org/abs/1707.04916

  6. Worldwide Music Artists Dataset (with image)

    • kaggle.com
    zip
    Updated Aug 18, 2024
    Cite
    Harsh D. Prajapati (2024). Worldwide Music Artists Dataset (with image) [Dataset]. https://www.kaggle.com/datasets/harshdprajapati/worldwide-music-artists-dataset-with-image
    Explore at:
    zip (5859852 bytes). Available download formats
    Dataset updated
    Aug 18, 2024
    Authors
    Harsh D. Prajapati
    Description

    🎵 Worldwide Music Artists Dataset (with image) 🎤

    Welcome to the Worldwide Music Artists Dataset—your go-to resource for exploring the global music scene! 🌍🎶

    This dataset features 100,000+ music artists from around the world, complete with:
    📝 Name: Discover artists from every genre and corner of the globe.
    🎸 Genres: Whether it's pop, rock, jazz, or classical, find your favorite styles.
    📸 Profile Image: Visualize each artist with their unique profile picture.
    📍 Location: See where your favorite artists hail from.

    Whether you're a music enthusiast, a data scientist, or a developer, this dataset is perfect for your next project! 🚀

  7. Data from: MusicOSet: An Enhanced Open Dataset for Music Data Mining

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    bin, zip
    Updated Jun 7, 2021
    Cite
    Mariana O. Silva; Laís Mota; Mirella M. Moro (2021). MusicOSet: An Enhanced Open Dataset for Music Data Mining [Dataset]. http://doi.org/10.5281/zenodo.4904639
    Explore at:
    zip, bin. Available download formats
    Dataset updated
    Jun 7, 2021
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Mariana O. Silva; Laís Mota; Mirella M. Moro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MusicOSet is an open and enhanced dataset of musical elements (artists, songs, and albums) based on musical popularity classification. It provides a directly accessible collection of data suitable for numerous tasks in music data mining (e.g., data visualization, classification, clustering, similarity search, MIR, HSS, and so forth). To create MusicOSet, the potential information sources were divided into three main categories: music popularity sources, metadata sources, and acoustic and lyrical feature sources. Data from all three categories were initially collected between January and May 2019; the data were then updated and enhanced in June 2019.

    The attractive features of MusicOSet include:

    • Integration and centralization of different musical data sources
    • Calculation of popularity scores and classification of hit and non-hit musical elements, spanning 1962 to 2018
    • Enriched metadata for music, artists, and albums from the US popular music industry
    • Availability of acoustic and lyrical resources
    • Unrestricted access in two formats: SQL database and compressed .csv files
    | Data              | # Records |
    |:-----------------:|:---------:|
    | Songs             | 20,405    |
    | Artists           | 11,518    |
    | Albums            | 26,522    |
    | Lyrics            | 19,664    |
    | Acoustic Features | 20,405    |
    | Genres            | 1,561     |
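
    Since songs and acoustic features have matching record counts (20,405 each), a one-to-one join is natural. The sketch below assumes CSV filenames and a song_id key, which may differ in the actual MusicOSet release (also distributed as a SQL database).

```python
import pandas as pd

# Assumed filenames and join key; adjust to the actual MusicOSet files.
songs = pd.read_csv("songs.csv")
acoustic = pd.read_csv("acoustic_features.csv")

# 20,405 songs and 20,405 acoustic-feature records suggest a 1:1 mapping.
merged = songs.merge(acoustic, on="song_id", how="inner")
print(merged.shape)
```
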
  8. MusicCaps

    • huggingface.co
    • kaggle.com
    Updated Jan 27, 2023
    + more versions
    Cite
    Google (2023). MusicCaps [Dataset]. https://huggingface.co/datasets/google/MusicCaps
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Jan 27, 2023
    Dataset authored and provided by
    Google: http://google.com/
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for MusicCaps

      Dataset Summary
    

    The MusicCaps dataset contains 5,521 music examples, each of which is labeled with an English aspect list and a free text caption written by musicians. An aspect list is for example "pop, tinny wide hi hats, mellow piano melody, high pitched female vocal melody, sustained pulsating synth lead", while the caption consists of multiple sentences about the music, e.g., "A low sounding male voice is rapping over a fast paced drums… See the full description on the dataset page: https://huggingface.co/datasets/google/MusicCaps.
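
    A minimal inspection sketch using the Hugging Face datasets library is below; the field names caption and aspect_list are assumptions based on the summary above.

```python
from datasets import load_dataset

ds = load_dataset("google/MusicCaps", split="train")  # 5,521 labeled examples

example = ds[0]
print(example["caption"])      # free text caption written by a musician
print(example["aspect_list"])  # e.g. "pop, tinny wide hi hats, mellow piano melody, ..."
```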

  9. Indian Regional Music Dataset

    • zenodo.org
    bin
    Updated May 27, 2022
    Cite
    Yeshwant Singh; Anupam Biswas (2022). Indian Regional Music Dataset [Dataset]. http://doi.org/10.5281/zenodo.5825830
    Explore at:
    bin. Available download formats
    Dataset updated
    May 27, 2022
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Yeshwant Singh; Anupam Biswas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    This dataset is a collection of mel-spectrogram features extracted from Indian regional music containing the following languages:
    Hindi, Gujarati, Marathi, Konkani, Bengali, Oriya, Kashmiri, Assamese, Nepali, Konyak, Manipuri, Khasi & Jaintia, Tamil, Malayalam, Punjabi, Telugu, Kannada.

    Five recordings were collected for each artist, with four artists (two male and two female) per language. For each language, two of the four artists are veteran performers and the remaining two are contemporary performers. Overall, the dataset covers 17 languages and 68 artists (34 male and 34 female). There are 340 recordings in the dataset, with a total duration of 29.3 hours.

    A mel-spectrogram is extracted from 1-second segments with a 1/2-second sliding window for each song. The mel-spectrogram extracted for each segment is annotated with language, location, local_song_index, global_song_index, language_id, location_id, artist_id, and gender_id.
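
    The dataset ships precomputed features, but the described extraction (1-second segments, 1/2-second sliding window) can be reproduced roughly as sketched below with librosa; the input filename is hypothetical.

```python
import librosa

# Hypothetical input file; the released dataset contains precomputed features.
y, sr = librosa.load("recording.wav", sr=22050)

seg_len, hop = sr, sr // 2  # 1-second segments, 1/2-second sliding window
segments = [y[i:i + seg_len] for i in range(0, len(y) - seg_len + 1, hop)]

mels = [librosa.feature.melspectrogram(y=seg, sr=sr) for seg in segments]
print(len(mels), mels[0].shape)  # (n_segments, (n_mels, n_frames))
```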


    This project was funded under the grant number: ECR/2018/000204 by the Science & Engineering Research Board (SERB).

  10. Music-Generation-Dataset

    • kaggle.com
    zip
    Updated Dec 14, 2021
    Cite
    Aluru V N M Hemateja (2021). Music-Generation-Dataset [Dataset]. https://www.kaggle.com/datasets/ahemateja19bec1025/musicgenerationdataset
    Explore at:
    zip (402474 bytes). Available download formats
    Dataset updated
    Dec 14, 2021
    Authors
    Aluru V N M Hemateja
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This is the dataset that I gathered from different sources for Music Generation.

    There are around 30 music MIDI files, and handling code is provided in the notebook named "Official" at the end.
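
    The handling code lives in the Kaggle notebook, but as a hedged illustration, MIDI files like these can be inspected with the mido package; the filename below is hypothetical.

```python
import mido

# Hypothetical filename from the archive.
mid = mido.MidiFile("example.mid")

print(round(mid.length, 2), "seconds")  # total playback time
for i, track in enumerate(mid.tracks):
    print(i, track.name, len(track), "messages")
```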

    Please use the dataset wisely and only for good purposes.

    Please upvote the notebook and dataset if you like this.

  11. MSMD - Multimodal Sheet Music Dataset

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Jan 24, 2020
    Cite
    Matthias Dorfer; Hajič, Jan, jr.; Andreas Arzt; Harald Frostel; Gerhard Widmer (2020). MSMD - Multimodal Sheet Music Dataset [Dataset]. http://doi.org/10.5281/zenodo.2597505
    Explore at:
    zip. Available download formats
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Matthias Dorfer; Hajič, Jan, jr.; Andreas Arzt; Harald Frostel; Gerhard Widmer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MSMD is a synthetic dataset of 497 pieces of (classical) music that contains both audio and score representations of the pieces aligned at a fine-grained level (344,742 pairs of noteheads aligned to their audio/MIDI counterpart). It can be used for training and evaluating multimodal models that enable crossing from one modality to the other, such as retrieving sheet music using recordings or following a performance in the score image.

    Please find further information and a corresponding Python package on this GitHub page: https://github.com/CPJKU/msmd

    If you use this dataset, please cite:
    [1] Matthias Dorfer, Jan Hajič jr., Andreas Arzt, Harald Frostel, Gerhard Widmer.
    Learning Audio-Sheet Music Correspondences for Cross-Modal Retrieval and Piece Identification.
    Transactions of the International Society for Music Information Retrieval, issue 1, 2018.

  12. Data from: Indian Folk Music Dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated May 27, 2022
    Cite
    Yeshwant Singh; Lilapati Waikhom; Vivek Meena; Anupam Biswas (2022). Indian Folk Music Dataset [Dataset]. http://doi.org/10.5281/zenodo.6584021
    Explore at:
    bin. Available download formats
    Dataset updated
    May 27, 2022
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Yeshwant Singh; Lilapati Waikhom; Vivek Meena; Anupam Biswas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is a collection of mel-spectrogram features extracted from Indian folk music containing the following 15 folk styles:
    Bauls, Bhavageethe, Garba, Kajri, Maand, Sohar, Tamang Selo, Veeragase, Bhatiali, Bihu, Gidha, Lavani, Naatupura Paatu, Sufi, Uttarakhandi.

    The number of recordings per folk style varies from 16 to 50, reflecting how scarce some of these folk styles are on the Internet. Each folk style has between 4 and 22 artists; overall there are 125 artists (34 female, 91 male) across the 15 folk styles.

    There are 606 recordings in the dataset, with a total duration of 54.45 hours.
    A mel-spectrogram is extracted from 3-second segments with a 1/2-second sliding window for each song. The mel-spectrogram extracted for each segment is annotated with folk_style, state, artist, gender, song, source, no_of_artists, folk_style_id, state_id, artist_id, and gender_id.
    This project was funded under the grant number: ECR/2018/000204 by the Science & Engineering Research Board (SERB).

  13. Music Genre fMRI Dataset

    • openneuro.org
    Updated Jul 14, 2021
    Cite
    Tomoya Nakai; Naoko Koide-Majima; Shinji Nishimoto (2021). Music Genre fMRI Dataset [Dataset]. http://doi.org/10.18112/openneuro.ds003720.v1.0.0
    Explore at:
    Dataset updated
    Jul 14, 2021
    Dataset provided by
    OpenNeuro: https://openneuro.org/
    Authors
    Tomoya Nakai; Naoko Koide-Majima; Shinji Nishimoto
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    README

    Music Genre fMRI Dataset by Tomoya Nakai, Naoko Koide-Majima, and Shinji Nishimoto

    References: Nakai, Koide-Majima, and Nishimoto (2021). Correspondence of categorical and feature-based representations of music in the human brain. Brain and Behavior. 11(1), e01936. https://doi.org/10.1002/brb3.1936

    We measured brain activity using functional MRI while five subjects (“sub-001”, …, “sub-005”) listened to music stimuli of 10 different genres.

    The entire folder consists of subject-wise subfolders ("sub-001", …). Each subject's folder contains the following subfolders:
    1) anat: T1-weighted structural images
    2) func: functional signals (multi-band echo-planar images)

    Each subject performed 18 runs, consisting of 12 training runs and 6 test runs. The training and test data follow these naming conventions:
    Training data: sub-00*_task-Training_run-**_bold.json
    Test data: sub-00*_task-Test_run-**_bold.json

    Each *_event.tsv file contains the following information:
    onset: stimulus onset
    duration: stimulus duration
    genre: genre type (out of 10 genres)
    track: index to identify the original track
    start: onset of the excerpt from the original track (seconds)
    end: offset of the excerpt from the original track (seconds)
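
    These event files are plain TSV and can be inspected with pandas, as sketched below. The path is illustrative and follows the naming scheme above; the events suffix is assumed to follow the BIDS *_events.tsv convention.

```python
import pandas as pd

# Illustrative path; the suffix is assumed to follow BIDS (_events.tsv).
events = pd.read_csv(
    "sub-001/func/sub-001_task-Training_run-01_events.tsv", sep="\t")

# Documented columns: onset, duration, genre, track, start, end.
print(events[["onset", "duration", "genre"]].head())
print(events["genre"].value_counts())  # 10 genres expected
```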

    The duration of all stimuli is 15 s. For each clip, 2 s fade-in and fade-out effects were applied, and the overall signal intensity was normalized in terms of the root mean square.

    For the training runs, the 1st stimulus (0-15 s) is the same as the last stimulus of the previous run (600-615 s). For the test runs, the 1st stimulus (0-15 s) is the same as the last stimulus of the same run (600-615 s).

    The original music stimuli (GTZAN dataset) can be found here: http://marsyas.info/downloads/datasets.html

    Caution This dataset can be used for research purposes only. The data were anonymized, and users shall not perform analyses to re-identify individual subjects.

  14. Music-Dataset

    • huggingface.co
    Updated Nov 8, 2025
    + more versions
    Cite
    AI Generative Research Lab (2025). Music-Dataset [Dataset]. https://huggingface.co/datasets/AIGenLab/Music-Dataset
    Explore at:
    Dataset updated
    Nov 8, 2025
    Dataset authored and provided by
    AI Generative Research Lab
    Description

    The AIGenLab/Music-Dataset dataset is hosted on Hugging Face and contributed by the HF Datasets community.

  15. music-dataset-1

    • huggingface.co
    Updated Jan 21, 2024
    + more versions
    Cite
    Healthydater (2024). music-dataset-1 [Dataset]. https://huggingface.co/datasets/Healthydater/music-dataset-1
    Explore at:
    Dataset updated
    Jan 21, 2024
    Dataset authored and provided by
    Healthydater
    Description

    The Healthydater/music-dataset-1 dataset is hosted on Hugging Face and contributed by the HF Datasets community.

  16. Music : 1950 to 2019 Dataset

    • cubig.ai
    zip
    Updated May 29, 2025
    Cite
    CUBIG (2025). Music : 1950 to 2019 Dataset [Dataset]. https://cubig.ai/store/products/395/music-1950-to-2019-dataset
    Explore at:
    zip. Available download formats
    Dataset updated
    May 29, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
    Description

    1) Data Introduction
    • The Music Dataset: 1950 to 2019 is a large-scale music dataset covering the years 1950 to 2019 that includes per-song lyrics along with musical metadata such as sadness, danceability, loudness, and acousticness.

    2) Data Utilization
    (1) Characteristics of the Music Dataset: 1950 to 2019:
    • The dataset consists of more than 30 numerical and categorical variables, including artist name, song name, release year, lyrics, song length, emotion (e.g., sadness), danceability, loudness, acousticness, instrumentalness, energy, and topic, providing both lyric text and musical characteristics.
    (2) The Music Dataset: 1950 to 2019 can be used to:
    • Analyze music trends and emotional change: by analyzing year-by-year changes in major musical characteristics such as sadness, danceability, and loudness as a time series, you can explore music trends and emotional shifts by period.
    • Support lyrics-based natural language processing and genre classification: per-song lyrics and metadata can be used for text-and-music fusion analyses such as NLP-based sentiment analysis, music genre classification, and recommendation systems (see the sketch below).
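
    A hedged sketch of the trend analysis described above is shown below; the filename and column names (release_year, sadness, danceability, loudness) are assumptions paraphrased from the description.

```python
import pandas as pd

# Assumed filename and column names; adjust to the delivered files.
df = pd.read_csv("music_1950_2019.csv")

# Year-by-year averages of emotional/acoustic characteristics.
trends = df.groupby("release_year")[["sadness", "danceability", "loudness"]].mean()
print(trends.loc[1960:1969])  # e.g., inspect a single decade
```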

  17. free-music-archive-full

    • huggingface.co
    Updated Sep 13, 2024
    + more versions
    Cite
    Benjamin Paine (2024). free-music-archive-full [Dataset]. https://huggingface.co/datasets/benjamin-paine/free-music-archive-full
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Sep 13, 2024
    Authors
    Benjamin Paine
    License

    https://choosealicense.com/licenses/cc/

    Description

    FMA: A Dataset for Music Analysis

    Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, Xavier Bresson. International Society for Music Information Retrieval Conference (ISMIR), 2017.

    We introduce the Free Music Archive (FMA), an open and easily accessible dataset suitable for evaluating several tasks in MIR, a field concerned with browsing, searching, and organizing large music collections. The community's growing interest in feature and end-to-end learning is however restrained… See the full description on the dataset page: https://huggingface.co/datasets/benjamin-paine/free-music-archive-full.
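
    Given the size of the full FMA audio set, streaming access is the practical way to peek at it; in the sketch below the split name and available fields are assumptions, so check the dataset page for the real ones.

```python
from datasets import load_dataset

# Stream rather than download the full archive; split name is an assumption.
ds = load_dataset("benjamin-paine/free-music-archive-full",
                  split="train", streaming=True)

first = next(iter(ds))
print(first.keys())  # inspect available fields before committing to a download
```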

  18. ACMUS YouTube Music Dataset

    • kaggle.com
    zip
    Updated Feb 12, 2023
    Cite
    The Devastator (2023). ACMUS YouTube Music Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/acmus-youtube-music-dataset
    Explore at:
    zip (8111 bytes). Available download formats
    Dataset updated
    Feb 12, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    YouTube
    Description

    ACMUS YouTube Music Dataset

    Annotated Music for Instrumental Format Recognition and Vocal Classification

    By [source]

    About this dataset

    The ACMUS YouTube Music Dataset is an annotated collection of music from YouTube videos, designed to support cutting-edge computational methods for two key tasks: instrumental format identification and vocal music classification. Spanning a wide range of genres and eras, the dataset records information such as file name, title, genre, composer or artist, sampling rate, channels, bit depth, duration (sec), original file (if applicable), source collection, observations about the audio file (if any), number of instruments, presence or absence of guitar/bandola/tiple/bass/percussion, tempo, and language. It goes one step further by including vocal classification based on the presence or absence of female and male voices. This is a great resource for anyone exploring artificial-intelligence techniques for music recognition and vocal classification.


    How to use the dataset

    • Review the columns included in the dataset: file name, title, genre, composer or artist, sampling rate, channels, bit depth, duration (sec), collection, original file, last called (date), number of instruments, the per-instrument flags (guitar, bandola, tiple, bass, percussion), female voice, male voice, tempo, language, and artist/performer.
    • Start by exploring the audio file properties first: sampling rate, channels, bit depth, and duration.
    • Make sure you have a clear understanding of each column before you proceed, in particular the instrumentation flags and the voice-presence flags.
    • Establish relationships between different data points using visualization tools such as graphs, tables, and scatter plots: for example, compare genres, channel configurations, collections, and instrumentation, and identify cover or instrumental versions (see the sketch below).
    • Update your research regularly with new findings by revisiting your visualizations, comparing features between formats, and running clustering algorithms to better group music files.
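
    A hedged sketch of the vocal/instrumental split described above is shown below; the CSV filename and exact column spellings are assumptions paraphrased from this description, and the instrument and voice columns are assumed to be binary flags.

```python
import pandas as pd

# Assumed filename and column names; adjust to the actual Kaggle file.
df = pd.read_csv("acmus_youtube_music.csv")

# Vocal vs. instrumental split from the voice-presence flags (assumed 0/1).
vocal = df[(df["Female Voice"] == 1) | (df["Male Voice"] == 1)]
instrumental = df[(df["Female Voice"] == 0) & (df["Male Voice"] == 0)]
print(len(vocal), "vocal /", len(instrumental), "instrumental")

# Instrumental-format breakdown from the per-instrument flags.
for col in ["Guitar", "Bandola", "Tiple", "Bass", "Percussion"]:
    print(col, int(df[col].sum()))
```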

    Research Ideas

    • Using the Instrumental Format Recognition and Vocal Music Classification tasks with Machine Learning algorithms to create an automated music labeler. The data in this dataset could be used to create a tool that can identify various instruments in an audio file and also classify music as either vocal or instrumental, which can help streamline the process of cataloguing and labeling new music tracks.
    • This dataset could be used for training computer vision models for automatic instrument recognition from video files. By feeding the dataset into a convolutional neural network, algorithms can be developed to detect different types of instruments from video streams and differentiate between vocal or instrumental pieces.
    • This dataset could be used for audio source separation research, which is the process of isolating individual audio sources from a mix of sounds within an audio clip or recording. Source separation research often relies on datasets such as this one for providing labeled data about instrumentation and pitch levels that allow researchers to develop algorithms capable of separating multiple sound sources within a single mixture signal

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. [See Other Inf...

  19. Spotify Dataset

    • brightdata.com
    .json, .csv, .xlsx
    Updated Apr 10, 2024
    Cite
    Bright Data (2024). Spotify Dataset [Dataset]. https://brightdata.com/products/datasets/spotify
    Explore at:
    .json, .csv, .xlsx. Available download formats
    Dataset updated
    Apr 10, 2024
    Dataset authored and provided by
    Bright Data: https://brightdata.com/
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Gain valuable insights into music trends, artist popularity, and streaming analytics with our comprehensive Spotify Dataset. Designed for music analysts, marketers, and businesses, this dataset provides structured and reliable data from Spotify to enhance market research, content strategy, and audience engagement.

    Dataset Features

    Track Information: Access detailed data on songs, including track name, artist, album, genre, and release date.
    Streaming Popularity: Extract track popularity scores, listener engagement metrics, and ranking trends.
    Artist & Album Insights: Analyze artist performance, album releases, and genre trends over time.
    Related Searches & Recommendations: Track related search terms and suggested content for deeper audience insights.
    Historical & Real-Time Data: Retrieve historical streaming data or access continuously updated records for real-time trend analysis.

    Customizable Subsets for Specific Needs

    Our Spotify Dataset is fully customizable, allowing you to filter data based on track popularity, artist, genre, release date, or listener engagement. Whether you need broad coverage for industry analysis or focused data for content optimization, we tailor the dataset to your needs.

    Popular Use Cases

    Market Analysis & Trend Forecasting: Identify emerging music trends, genre popularity, and listener preferences.
    Artist & Label Performance Tracking: Monitor artist rankings, album success, and audience engagement.
    Competitive Intelligence: Analyze competitor music strategies, playlist placements, and streaming performance.
    AI & Machine Learning Applications: Use structured music data to train AI models for recommendation engines, playlist curation, and predictive analytics.
    Advertising & Sponsorship Insights: Identify high-performing tracks and artists for targeted advertising and sponsorship opportunities.

    Whether you're optimizing music marketing, analyzing streaming trends, or enhancing content strategies, our Spotify Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.

  20. MGD: Music Genre Dataset

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated May 28, 2021
    Cite
    Gabriel P. Oliveira; Mariana O. Silva; Danilo B. Seufitelli; Anisio Lacerda; Mirella M. Moro (2021). MGD: Music Genre Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4778562
    Explore at:
    Dataset updated
    May 28, 2021
    Dataset provided by
    Universidade Federal de Minas Gerais
    Authors
    Gabriel P. Oliveira; Mariana O. Silva; Danilo B. Seufitelli; Anisio Lacerda; Mirella M. Moro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MGD: Music Genre Dataset

    Over recent years, the world has seen a dramatic change in the way people consume music, moving from physical records to streaming services. Since 2017, such services have been the main source of revenue within the global recorded music market. This dataset is therefore built using data from Spotify, which provides a weekly chart of the 200 most-streamed songs for each country and territory where it is present, as well as an aggregated global chart.

    Considering that countries behave differently when it comes to musical tastes, we use chart data from global and regional markets from January 2017 to December 2019, considering eight of the top 10 music markets according to IFPI: United States (1st), Japan (2nd), United Kingdom (3rd), Germany (4th), France (5th), Canada (8th), Australia (9th), and Brazil (10th).

    We also provide information about the hit songs and artists present in the charts, such as all collaborating artists within a song (since the charts only list the main ones) and their respective genres, which are the core of this work. MGD also provides data on musical collaboration, as we build collaboration networks based on artist partnerships in hit songs; a toy illustration follows the list below. Therefore, this dataset contains:

    Genre Networks: Success-based genre collaboration networks

    Genre Mapping: Genre mapping from Spotify genres to super-genres

    Artist Networks: Success-based artist collaboration networks

    Artists: Some artist data

    Hit Songs: Hit Song data and features

    Charts: Enhanced data from Spotify Weekly Top 200 Charts
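
    MGD ships the genre and artist networks prebuilt, but the sketch below illustrates, on toy rows, how success-based collaboration edges arise from hit-song partnerships; it is not the authors' construction code.

```python
import networkx as nx
from itertools import combinations

# Toy chart rows; in MGD, edges come from artist partnerships in hit songs.
hit_songs = [
    {"song": "A", "artists": ["Artist 1", "Artist 2"]},
    {"song": "B", "artists": ["Artist 2", "Artist 3", "Artist 4"]},
]

G = nx.Graph()
for row in hit_songs:
    for a, b in combinations(row["artists"], 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1  # one more shared hit
        else:
            G.add_edge(a, b, weight=1)

print(G.number_of_nodes(), "artists,", G.number_of_edges(), "collaboration edges")
```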

    This dataset was originally built for a conference paper at ISMIR 2020. If you make use of the dataset, please also cite the following paper:

    Gabriel P. Oliveira, Mariana O. Silva, Danilo B. Seufitelli, Anisio Lacerda, and Mirella M. Moro. Detecting Collaboration Profiles in Success-based Music Genre Networks. In Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR 2020), 2020.

    @inproceedings{ismir/OliveiraSSLM20,
      title     = {Detecting Collaboration Profiles in Success-based Music Genre Networks},
      author    = {Gabriel P. Oliveira and Mariana O. Silva and Danilo B. Seufitelli and Anisio Lacerda and Mirella M. Moro},
      booktitle = {21st International Society for Music Information Retrieval Conference},
      pages     = {726--732},
      year      = {2020}
    }
