CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This is the Spotify Million Song Dataset. It contains song names, artist names, links to the songs, and lyrics. It can be used for recommending, classifying, or clustering songs.
Original Data Source: Spotify Million Song Dataset
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was created using the Spotify developer API. It consists of user-created as well as Spotify-curated playlists.
The dataset consists of 1 million playlists, 3 million unique tracks, 3 million unique albums, and 1.3 million artists.
The data is stored in a SQL database, with the primary entities being songs, albums, artists, and playlists.
Each of these entities is identified by a unique ID (Spotify URI).
The data is stored in the following tables:
album
| id | name | uri |
id: Album ID as provided by Spotify
name: Album Name as provided by Spotify
uri: Album URI as provided by Spotify
artist
| id | name | uri |
id: Artist ID as provided by Spotify
name: Artist Name as provided by Spotify
uri: Artist URI as provided by Spotify
track
| id | name | duration | popularity | explicit | preview_url | uri | album_id |
id: Track ID as provided by Spotify
name: Track Name as provided by Spotify
duration: Track Duration (in milliseconds) as provided by Spotify
popularity: Track Popularity as provided by Spotify
explicit: Whether the track has explicit lyrics (true or false)
preview_url: A link to a 30-second preview (MP3 format) of the track. Can be null
uri: Track URI as provided by Spotify
album_id: ID of the album to which the track belongs
playlist
| id | name | followers | uri | total_tracks |
id: Playlist ID as provided by Spotify
name: Playlist Name as provided by Spotify
followers: Playlist Followers as provided by Spotify
uri: Playlist URI as provided by Spotify
total_tracks: Total number of tracks in the playlist.
track_artist1
| track_id | artist_id |
Track-Artist association table
track_playlist1
| track_id | playlist_id |
Track-Playlist association table
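The table layout above can be explored with a small join. The following is a minimal sketch, assuming an in-memory SQLite stand-in rather than the real dump: the column types, the sample rows, and the helper name `playlist_tracks` are all hypothetical and only the table and column names come from the description above.

```python
import sqlite3

# Hypothetical in-memory stand-in for the real database.
# Table and column names follow the description; column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE album  (id TEXT PRIMARY KEY, name TEXT, uri TEXT);
CREATE TABLE artist (id TEXT PRIMARY KEY, name TEXT, uri TEXT);
CREATE TABLE track (
    id TEXT PRIMARY KEY, name TEXT, duration INTEGER, popularity INTEGER,
    explicit INTEGER, preview_url TEXT, uri TEXT,
    album_id TEXT REFERENCES album(id)
);
CREATE TABLE playlist (
    id TEXT PRIMARY KEY, name TEXT, followers INTEGER, uri TEXT,
    total_tracks INTEGER
);
CREATE TABLE track_artist1   (track_id TEXT, artist_id TEXT);
CREATE TABLE track_playlist1 (track_id TEXT, playlist_id TEXT);
""")

# Made-up sample rows, for illustration only.
conn.execute("INSERT INTO album VALUES ('al1', 'Album One', 'spotify:album:al1')")
conn.executemany("INSERT INTO artist VALUES (?, ?, ?)", [
    ("ar1", "Artist A", "spotify:artist:ar1"),
    ("ar2", "Artist B", "spotify:artist:ar2"),
])
conn.executemany("INSERT INTO track VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
    ("t1", "Song One", 180000, 50, 0, None, "spotify:track:t1", "al1"),
    ("t2", "Song Two", 200500, 40, 1, None, "spotify:track:t2", "al1"),
])
conn.execute("INSERT INTO playlist VALUES ('p1', 'Mix', 10, 'spotify:playlist:p1', 2)")
conn.executemany("INSERT INTO track_artist1 VALUES (?, ?)",
                 [("t1", "ar1"), ("t2", "ar1"), ("t2", "ar2")])
conn.executemany("INSERT INTO track_playlist1 VALUES (?, ?)",
                 [("t1", "p1"), ("t2", "p1")])


def playlist_tracks(conn, playlist_id):
    """Return (track name, artist name, duration in seconds) for one playlist.

    A track with several artists yields one row per artist, because the
    track-artist relation is many-to-many.
    """
    query = """
        SELECT t.name, a.name, t.duration / 1000.0
        FROM track_playlist1 AS tp
        JOIN track AS t          ON t.id = tp.track_id
        JOIN track_artist1 AS ta ON ta.track_id = t.id
        JOIN artist AS a         ON a.id = ta.artist_id
        WHERE tp.playlist_id = ?
        ORDER BY t.name, a.name
    """
    return conn.execute(query, (playlist_id,)).fetchall()


for row in playlist_tracks(conn, "p1"):
    print(row)
```

The two association tables are what make the playlist-to-artist lookup possible; the division by 1000.0 converts the millisecond duration to seconds.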
- - - - - SETUP - - - - -
The data is provided as a SQL dump. The download is about 10 GB, and the database populated from it is about 35 GB.
spotifydbdumpschemashare.sql contains the schema for the database (for reference).
spotifydbdumpshare.sql is the actual data dump.
Setup steps:
1. Create database
- - - - - PAPER - - - - -
The description of this dataset can be found in the following paper:
Papreja P., Venkateswara H., Panchanathan S. (2020) Representation, Exploration and Recommendation of Playlists. In: Cellier P., Driessens K. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Communications in Computer and Information Science, vol 1168. Springer, Cham
Dataset Card for Dataset Name
Dataset Description
Dataset Summary
This dataset consists of roughly 480k English lyrics (classified using the nltk language classifier) with some additional metadata. The id corresponds to the Spotify ID. The metadata was taken from the million playlist challenge @ AICrowd. The lyrics were crawled using "[song name] [artist name]" as the search string with the lyricsgenius Python package, which uses the genius.com search function. There is no… See the full description on the dataset page: https://huggingface.co/datasets/brunokreiner/genius-lyrics.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MusicOSet is an open and enhanced dataset of musical elements (artists, songs and albums) based on musical popularity classification. It provides a directly accessible collection of data suitable for numerous tasks in music data mining (e.g., data visualization, classification, clustering, similarity search, MIR, HSS and so forth). To create MusicOSet, the potential information sources were divided into three main categories: music popularity sources, metadata sources, and acoustic and lyrical features sources. Data from all three categories were initially collected between January and May 2019; the data were then updated and enhanced in June 2019.
The attractive features of MusicOSet include:
| Data | # Records |
|:-----------------:|:---------:|
| Songs | 20,405 |
| Artists | 11,518 |
| Albums | 26,522 |
| Lyrics | 19,664 |
| Acoustic Features | 20,405 |
| Genres | 1,561 |
tqhuyen/Spotify-Lyrics-Summarized dataset hosted on Hugging Face and contributed by the HF Datasets community
https://creativecommons.org/publicdomain/zero/1.0/
NOTE: This dataset only contains songs featured on studio albums + the 4 diss tracks aimed at Drake.
- Audio features obtained from Spotify / Lyrics obtained from Genius
- Track 6:16 in LA has no audio features since it wasn't released on DSPs
track_name
album
release_date: Datetime format (YYYY-MM-DD)
duration_ms: Duration of track in milliseconds
popularity: Popularity rating of track on Spotify
speechiness: Describes the presence of spoken words in track (from Spotify)
danceability: Describes how suitable a track is for dancing based on a combination of musical elements (from Spotify)
tempo: The overall estimated tempo of a track in beats per minute (BPM) (from Spotify)
lyrics: Includes all lyrics performed by Kendrick (excludes feature verses)
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
GitHub Repo for processing this dataset and others
Check out my GitHub repo for processing this data here
Dataset Files: Description and Construction
This dataset contains .pkl files derived from 20k playlists in the Spotify million playlist dataset (MPD). These files contain the genre information of artists and lyric embeddings for the majority of the songs computed using MiniLM.
File 1: embedding_genre.pkl
Description
This file is a pickled list… See the full description on the dataset page: https://huggingface.co/datasets/egupta/spotify-20k-semantic.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Pop Lyrics Dataset
This dataset contains up to 1,000 pop songs with their lyrics, songwriters, genres, and other relevant metadata. The data was collected from Spotify and Genius.
Dataset Structure
track_name: Name of the song.
album: Album name.
release_date: Release date of the song.
song_length: Duration of the song.
popularity: Popularity score from Spotify.
songwriters: List of songwriters.
artist: Name of the artist.
lyrics: Cleaned lyrics of the song.
genre: List… See the full description on the dataset page: https://huggingface.co/datasets/ashuwhy/poplyrics-1k.
https://choosealicense.com/licenses/odc-by/
Dataset Card for Lyrics1M
Dataset Details
Dataset Description
Lyrics for approximately 1 million tracks. Entries include unique ID, artist, track title, lyrics and language. This is a subset of bigdata-pw/Spotify, filtered for popularity >= 20 and deduplicated by track title.
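The filtering described above (popularity >= 20, then deduplication by track title) might look roughly like the sketch below. This is an illustration under stated assumptions, not the curator's actual pipeline: the record keys `title` and `popularity`, the function name `filter_and_dedupe`, the exact-match comparison of titles, and the choice to keep the first occurrence are all hypothetical.

```python
def filter_and_dedupe(tracks, min_popularity=20):
    """Keep tracks at or above the popularity threshold, one per title.

    Assumptions: each track is a dict with "title" and "popularity" keys,
    titles are compared by exact string match, and the first occurrence
    of a title wins.
    """
    seen_titles = set()
    kept = []
    for track in tracks:
        if track["popularity"] < min_popularity:
            continue  # below the popularity threshold
        if track["title"] in seen_titles:
            continue  # a track with this title was already kept
        seen_titles.add(track["title"])
        kept.append(track)
    return kept


# Made-up sample records, for illustration only.
sample = [
    {"id": "a", "title": "Hello", "popularity": 35},
    {"id": "b", "title": "Hello", "popularity": 60},    # duplicate title
    {"id": "c", "title": "Goodbye", "popularity": 19},  # too unpopular
    {"id": "d", "title": "Goodbye", "popularity": 20},
]
print([t["id"] for t in filter_and_dedupe(sample)])
```

Note that deduplicating by title alone merges different songs that share a name, which is worth keeping in mind when using the subset for artist-level analysis.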
Curated by: hlky
License: Open Data Commons Attribution License (ODC-By) v1.0
Citation Information
@misc{Lyrics1M, author = {hlky}, title = {Lyrics1M}, year = {2024}… See the full description on the dataset page: https://huggingface.co/datasets/bigdata-pw/Lyrics1M.