9 datasets found
  1. Spotify Million Song Dataset

    • opendatabay.com
    • huggingface.co
    Updated Jun 6, 2025
    Datasimple (2025). Spotify Million Song Dataset [Dataset]. https://www.opendatabay.com/data/dataset/db3c0ef7-dfe6-4d65-a588-ee33c43a002e
    Dataset updated
    Jun 6, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Data Science and Analytics
    Description

    This is the Spotify Million Song Dataset. It contains song names, artist names, links to the songs, and lyrics. It can be used for song recommendation and for classifying or clustering songs.

    Original Data Source: Spotify Million Song Dataset

  2. Playlist2vec: Spotify Million Playlist Dataset

    • zenodo.org
    • data.niaid.nih.gov
    Updated Jun 22, 2021
    Piyush Papreja (2021). Playlist2vec: Spotify Million Playlist Dataset [Dataset]. http://doi.org/10.5281/zenodo.5002584
    Available download formats: bin
    Dataset updated
    Jun 22, 2021
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Piyush Papreja
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was created using the Spotify developer API. It consists of both user-created and Spotify-curated playlists.
    It contains 1 million playlists, 3 million unique tracks, 3 million unique albums, and 1.3 million artists.
    The data is stored in a SQL database, with the primary entities being songs, albums, artists, and playlists.
    Each of these entities is represented by a unique ID (Spotify URI).
    The data is stored in the following tables:

    • album
    • artist
    • track
    • playlist
    • track_artist1
    • track_playlist1
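Because every entity in these tables is identified by a Spotify URI, it can be handy to split a URI into its entity type and ID. The following is a minimal sketch, assuming the colon-separated `spotify:<type>:<id>` form; older user-scoped playlist URIs of the form `spotify:user:<user>:playlist:<id>` may also appear, so the sketch takes the last two segments. The helper `parse_spotify_uri` and the `SpotifyRef` type are hypothetical and not part of the dataset.

```python
from typing import NamedTuple


class SpotifyRef(NamedTuple):
    kind: str  # entity type, e.g. "track", "album", "artist" or "playlist"
    id: str    # the Spotify ID that follows the type


def parse_spotify_uri(uri: str) -> SpotifyRef:
    """Split a colon-separated Spotify URI into (kind, id).

    Assumes 'spotify:<type>:<id>'; for longer legacy forms such as
    'spotify:user:<user>:playlist:<id>' the last two segments are used.
    """
    parts = uri.split(":")
    if len(parts) < 3 or parts[0] != "spotify" or not all(parts):
        raise ValueError(f"not a Spotify URI: {uri!r}")
    return SpotifyRef(kind=parts[-2], id=parts[-1])


# Illustrative URIs (the IDs are placeholders, not real Spotify IDs).
print(parse_spotify_uri("spotify:track:abc123"))
print(parse_spotify_uri("spotify:user:someone:playlist:xyz789").kind)  # playlist
```

Taking the last two segments keeps the helper tolerant of both URI shapes without special-casing playlists.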

    album

    | id | name | uri |

    id: Album ID as provided by Spotify
    name: Album Name as provided by Spotify
    uri: Album URI as provided by Spotify


    artist

    | id | name | uri |

    id: Artist ID as provided by Spotify
    name: Artist Name as provided by Spotify
    uri: Artist URI as provided by Spotify


    track

    | id | name | duration | popularity | explicit | preview_url | uri | album_id |

    id: Track ID as provided by Spotify
    name: Track Name as provided by Spotify
    duration: Track Duration (in milliseconds) as provided by Spotify
    popularity: Track Popularity as provided by Spotify
    explicit: Whether the track has explicit lyrics (true or false)
    preview_url: A link to a 30-second preview (MP3 format) of the track; can be null
    uri: Track URI as provided by Spotify
    album_id: ID of the album to which the track belongs


    playlist

    | id | name | followers | uri | total_tracks |

    id: Playlist ID as provided by Spotify
    name: Playlist Name as provided by Spotify
    followers: Playlist Followers as provided by Spotify
    uri: Playlist URI as provided by Spotify
    total_tracks: Total number of tracks in the playlist.

    track_artist1

    | track_id | artist_id |

    Track-Artist association table

    track_playlist1

    | track_id | playlist_id |

    Track-Playlist association table
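The table layout above can be mirrored in a small in-memory SQLite database to try out queries before loading the full dump. This is a minimal sketch under assumptions: SQLite, the column types, and the sample rows are illustrative choices, not taken from the dump (the real schema is in spotifydbdumpschemashare.sql), and `playlist_tracks` is a hypothetical helper. It lists the tracks of one playlist through the track_playlist1 association table, converting `duration` from milliseconds to seconds.

```python
import sqlite3

# Assumed column types; the real schema ships in spotifydbdumpschemashare.sql.
SCHEMA = """
CREATE TABLE album    (id TEXT PRIMARY KEY, name TEXT, uri TEXT);
CREATE TABLE artist   (id TEXT PRIMARY KEY, name TEXT, uri TEXT);
CREATE TABLE track    (id TEXT PRIMARY KEY, name TEXT, duration INTEGER,
                       popularity INTEGER, explicit INTEGER, preview_url TEXT,
                       uri TEXT, album_id TEXT REFERENCES album(id));
CREATE TABLE playlist (id TEXT PRIMARY KEY, name TEXT, followers INTEGER,
                       uri TEXT, total_tracks INTEGER);
CREATE TABLE track_artist1   (track_id TEXT, artist_id TEXT);
CREATE TABLE track_playlist1 (track_id TEXT, playlist_id TEXT);
"""


def playlist_tracks(conn: sqlite3.Connection, playlist_id: str):
    """Return (track name, duration in seconds) for every track in a playlist."""
    rows = conn.execute(
        """
        SELECT t.name, t.duration / 1000.0
        FROM track_playlist1 AS tp
        JOIN track AS t ON t.id = tp.track_id
        WHERE tp.playlist_id = ?
        ORDER BY t.name
        """,
        (playlist_id,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical sample rows, for illustration only.
conn.execute("INSERT INTO playlist VALUES ('p1', 'Demo', 0, 'spotify:playlist:p1', 2)")
conn.executemany(
    "INSERT INTO track (id, name, duration, uri) VALUES (?, ?, ?, ?)",
    [("t1", "Song A", 215000, "spotify:track:t1"),
     ("t2", "Song B", 180500, "spotify:track:t2")],
)
conn.executemany("INSERT INTO track_playlist1 VALUES (?, ?)", [("t1", "p1"), ("t2", "p1")])
print(playlist_tracks(conn, "p1"))  # [('Song A', 215.0), ('Song B', 180.5)]
```

The same join shape applies to track_artist1 for looking up the artists of a track.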

    - - - - - SETUP - - - - -


    The data is provided as a SQL dump. The download is about 10 GB, and the database populated from it is about 35 GB.

    spotifydbdumpschemashare.sql contains the database schema (for reference).
    spotifydbdumpshare.sql is the actual data dump.


    Setup steps:
    1. Create database

    - - - - - PAPER - - - - -


    The description of this dataset can be found in the following paper:

    Papreja P., Venkateswara H., Panchanathan S. (2020) Representation, Exploration and Recommendation of Playlists. In: Cellier P., Driessens K. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Communications in Computer and Information Science, vol 1168. Springer, Cham

  3. genius-lyrics

    • huggingface.co
    Updated Aug 25, 2017
    Bruno Kreiner (2017). genius-lyrics [Dataset]. https://huggingface.co/datasets/brunokreiner/genius-lyrics
    Metadata available in Croissant format (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Aug 25, 2017
    Authors
    Bruno Kreiner
    Description

    Dataset Card for Dataset Name

    Dataset Description

    Dataset Summary

    This dataset consists of roughly 480k English lyrics (language identified using an NLTK language classifier) with some additional metadata. The id corresponds to the Spotify ID. The metadata was taken from the Million Playlist Challenge @ AICrowd. The lyrics were crawled with the lyricsgenius Python package, which uses the genius.com search function, with "[song name] [artist name]" as the search string. There is no… See the full description on the dataset page: https://huggingface.co/datasets/brunokreiner/genius-lyrics.

  4. Data from: MusicOSet: An Enhanced Open Dataset for Music Data Mining

    • zenodo.org
    • data.niaid.nih.gov
    Updated Jun 7, 2021
    Mariana O. Silva; Laís Mota; Mirella M. Moro (2021). MusicOSet: An Enhanced Open Dataset for Music Data Mining [Dataset]. http://doi.org/10.5281/zenodo.4904639
    Available download formats: zip, bin
    Dataset updated
    Jun 7, 2021
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Mariana O. Silva; Laís Mota; Mirella M. Moro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MusicOSet is an open and enhanced dataset of musical elements (artists, songs, and albums) based on musical popularity classification. It provides a directly accessible collection of data suitable for numerous tasks in music data mining (e.g., data visualization, classification, clustering, similarity search, MIR, HSS, and so forth). To create MusicOSet, the potential information sources were divided into three main categories: music popularity sources, metadata sources, and acoustic and lyrical feature sources. Data from all three categories were initially collected between January and May 2019 and were then updated and enhanced in June 2019.

    The attractive features of MusicOSet include:

    • Integration and centralization of different musical data sources
    • Calculation of popularity scores and classification of musical elements as hits or non-hits, spanning 1962 to 2018
    • Enriched metadata for music, artists, and albums from the US popular music industry
    • Availability of acoustic and lyrical resources
    • Unrestricted access in two formats: SQL database and compressed .csv files

    |       Data        | # Records |
    |:-----------------:|:---------:|
    | Songs             |  20,405   |
    | Artists           |  11,518   |
    | Albums            |  26,522   |
    | Lyrics            |  19,664   |
    | Acoustic Features |  20,405   |
    | Genres            |   1,561   |

  5. Spotify-Lyrics-Summarized

    • huggingface.co
    Updated Apr 13, 2025
    Tran Quang Huyen (2025). Spotify-Lyrics-Summarized [Dataset]. https://huggingface.co/datasets/tqhuyen/Spotify-Lyrics-Summarized
    Dataset updated
    Apr 13, 2025
    Authors
    Tran Quang Huyen
    Description

    The tqhuyen/Spotify-Lyrics-Summarized dataset is hosted on Hugging Face and contributed by the HF Datasets community.

  6. Kendrick Lamar Dataset (Track features + Lyrics)

    • kaggle.com
    Updated May 19, 2024
    CeeBloop (2024). Kendrick Lamar Dataset (Track features + Lyrics) [Dataset]. https://www.kaggle.com/datasets/ceebloop/kendrick-lamar-albumslyrics-dataset/discussion
    Metadata available in Croissant format (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    May 19, 2024
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    CeeBloop
    License

    CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    NOTE: This dataset contains only songs featured on studio albums, plus the 4 diss tracks aimed at Drake. Audio features were obtained from Spotify and lyrics from Genius. The track "6:16 in LA" has no audio features because it was not released on DSPs (digital service providers).

    FEATURES:

    • track_name
    • album
    • release_date: Datetime format (YYYY-MM-DD)
    • duration_ms: Duration of track in milliseconds
    • popularity: Popularity rating of track on Spotify
    • speechiness: Describes the presence of spoken words in track (from Spotify)
    • danceability: Describes how suitable a track is for dancing based on a combination of musical elements (from Spotify)
    • tempo: The overall estimated tempo of a track in beats per minute (BPM) (from Spotify)
    • lyrics: Includes all lyrics performed by Kendrick (excludes featured artists' verses)
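Since release_date uses the YYYY-MM-DD format and duration_ms is in milliseconds, both can be converted into friendlier values with the standard library. A minimal sketch, assuming each row has been loaded as a dict keyed by the feature names listed above; the helpers `format_duration` and `normalize_row` and the sample row are hypothetical.

```python
from datetime import date


def format_duration(duration_ms: int) -> str:
    """Render a millisecond duration as M:SS (truncated to whole seconds)."""
    minutes, seconds = divmod(duration_ms // 1000, 60)
    return f"{minutes}:{seconds:02d}"


def normalize_row(row: dict) -> dict:
    """Parse release_date (YYYY-MM-DD) and add a human-readable duration."""
    out = dict(row)  # copy so the original row is left untouched
    out["release_date"] = date.fromisoformat(row["release_date"])
    out["duration"] = format_duration(row["duration_ms"])
    return out


# Hypothetical row using the column names listed above.
row = {"track_name": "Example", "release_date": "2017-04-14", "duration_ms": 177000}
print(normalize_row(row)["duration"])  # 2:57
```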

  7. spotify-20k-semantic

    • huggingface.co
    Ekansh Gupta, spotify-20k-semantic [Dataset]. https://huggingface.co/datasets/egupta/spotify-20k-semantic
    Authors
    Ekansh Gupta
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    GitHub Repo for processing this dataset and others

    Check out my GitHub repo for processing this data here

    Dataset Files: Description and Construction

    This dataset contains .pkl files derived from 20k playlists in the Spotify Million Playlist Dataset (MPD). These files contain artist genre information and, for the majority of the songs, lyric embeddings computed using MiniLM.
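Lyric embeddings like these are commonly compared with cosine similarity, for example to find songs with similar lyrics. The sketch below assumes the embeddings have already been loaded (e.g. from the .pkl files) into a dict mapping a track identifier to a list of floats; the actual layout of the pickled files is described on the dataset page and is not reproduced here. The helpers `cosine` and `most_similar` and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query_id: str, embeddings: Dict[str, List[float]],
                 k: int = 5) -> List[Tuple[str, float]]:
    """Rank the other tracks by cosine similarity to query_id's lyric embedding."""
    query = embeddings[query_id]
    scores = [(tid, cosine(query, vec))
              for tid, vec in embeddings.items() if tid != query_id]
    scores.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scores[:k]


# Toy 3-dimensional vectors standing in for real MiniLM embeddings.
toy = {"a": [1.0, 0.0, 0.0], "b": [0.9, 0.1, 0.0], "c": [0.0, 1.0, 0.0]}
print(most_similar("a", toy, k=2))
```

This brute-force scan is fine for a few thousand tracks; larger collections would usually call for a vectorized or indexed nearest-neighbour search.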

    File 1: embedding_genre.pkl

    Description

    This file is a pickled list… See the full description on the dataset page: https://huggingface.co/datasets/egupta/spotify-20k-semantic.

  8. poplyrics-1k

    • huggingface.co
    Updated Sep 27, 2024
    Ashutosh Sharma (2024). poplyrics-1k [Dataset]. https://huggingface.co/datasets/ashuwhy/poplyrics-1k
    Metadata available in Croissant format (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Sep 27, 2024
    Authors
    Ashutosh Sharma
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Pop Lyrics Dataset

    This dataset contains up to 1,000 pop songs with their lyrics, songwriters, genres, and other relevant metadata. The data was collected from Spotify and Genius.

    Dataset Structure

    • track_name: Name of the song.
    • album: Album name.
    • release_date: Release date of the song.
    • song_length: Duration of the song.
    • popularity: Popularity score from Spotify.
    • songwriters: List of songwriters.
    • artist: Name of the artist.
    • lyrics: Cleaned lyrics of the song.
    • genre: List…

    See the full description on the dataset page: https://huggingface.co/datasets/ashuwhy/poplyrics-1k.

  9. Lyrics1M

    • huggingface.co
    Updated Sep 14, 2024
    BIG data (2024). Lyrics1M [Dataset]. https://huggingface.co/datasets/bigdata-pw/Lyrics1M
    Metadata available in Croissant format (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Sep 14, 2024
    Dataset authored and provided by
    BIG data
    License

    ODC-By: https://choosealicense.com/licenses/odc-by/

    Description

    Dataset Card for Lyrics1M

    Dataset Details

    Dataset Description

    Lyrics for approximately 1 million tracks. Entries include a unique ID, artist, track title, lyrics, and language. This is a subset of bigdata-pw/Spotify, filtered for popularity >= 20 and deduplicated by track title.

    Curated by: hlky
    License: Open Data Commons Attribution License (ODC-By) v1.0
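The card describes Lyrics1M as a subset of bigdata-pw/Spotify filtered for popularity >= 20 and deduplicated by track title. The sketch below illustrates that kind of filtering on plain Python dicts; it is not the curator's actual pipeline. The field names (`popularity`, `title`), applying the popularity filter before deduplication, keeping the first occurrence of each title, and comparing titles exactly as written are all assumptions, and `popular_unique_by_title` is a hypothetical helper.

```python
from typing import Dict, Iterable, Iterator


def popular_unique_by_title(rows: Iterable[Dict],
                            min_popularity: int = 20) -> Iterator[Dict]:
    """Yield rows with popularity >= min_popularity, keeping the first row per title."""
    seen_titles = set()
    for row in rows:
        if row["popularity"] < min_popularity:
            continue  # below the popularity threshold
        if row["title"] in seen_titles:
            continue  # a row with this exact title was already kept
        seen_titles.add(row["title"])
        yield row


# Hypothetical input rows.
rows = [
    {"id": 1, "title": "Song A", "popularity": 55},
    {"id": 2, "title": "Song A", "popularity": 40},
    {"id": 3, "title": "Song B", "popularity": 12},
    {"id": 4, "title": "Song C", "popularity": 20},
]
print([r["id"] for r in popular_unique_by_title(rows)])  # [1, 4]
```

Filtering first means a low-popularity duplicate cannot block a later, more popular row with the same title; the opposite order would give different results.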

    Citation Information

    @misc{Lyrics1M, author = {hlky}, title = {Lyrics1M}, year = {2024}… See the full description on the dataset page: https://huggingface.co/datasets/bigdata-pw/Lyrics1M.


Spotify Million Song Dataset

3 scholarly articles cite this dataset (per Google Scholar).