9 datasets found

Spotify Million Song Dataset
opendatabay.com
huggingface.co
Updated Jun 6, 2025
Cite
Datasimple (2025). Spotify Million Song Dataset [Dataset]. https://www.opendatabay.com/data/dataset/db3c0ef7-dfe6-4d65-a588-ee33c43a002e
Dataset updated
Jun 6, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Data Science and Analytics
Description
This is the Spotify Million Song Dataset. It contains song names, artist names, links to the songs, and lyrics. It can be used for recommending, classifying, or clustering songs.

Original Data Source: Spotify Million Song Dataset
Playlist2vec: Spotify Million Playlist Dataset
zenodo.org
data.niaid.nih.gov
bin
Updated Jun 22, 2021
Cite
Piyush Papreja (2021). Playlist2vec: Spotify Million Playlist Dataset [Dataset]. http://doi.org/10.5281/zenodo.5002584
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.5002584
Dataset updated
Jun 22, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Piyush Papreja
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset was created using the Spotify developer API. It consists of user-created as well as Spotify-curated playlists.
The dataset consists of 1 million playlists, 3 million unique tracks, 3 million unique albums, and 1.3 million artists.
The data is stored in a SQL database, with the primary entities being songs, albums, artists, and playlists.
Each of these entities is represented by a unique ID (Spotify URI).
The data is stored in the following tables:

album

artist

track

playlist

track_artist1

track_playlist1

album

| id | name | uri |

id: Album ID as provided by Spotify
name: Album Name as provided by Spotify
uri: Album URI as provided by Spotify

artist

| id | name | uri |

id: Artist ID as provided by Spotify
name: Artist Name as provided by Spotify
uri: Artist URI as provided by Spotify

track

| id | name | duration | popularity | explicit | preview_url | uri | album_id |

id: Track ID as provided by Spotify
name: Track Name as provided by Spotify
duration: Track Duration (in milliseconds) as provided by Spotify
popularity: Track Popularity as provided by Spotify
explicit: Whether the track has explicit lyrics or not. (true or false)
preview_url: A link to a 30 second preview (MP3 format) of the track. Can be null
uri: Track Uri as provided by Spotify
album_id: Album Id to which the track belongs

playlist

| id | name | followers | uri | total_tracks |

id: Playlist ID as provided by Spotify
name: Playlist Name as provided by Spotify
followers: Playlist Followers as provided by Spotify
uri: Playlist Uri as provided by Spotify
total_tracks: Total number of tracks in the playlist.

track_artist1

| track_id | artist_id |

Track-Artist association table

track_playlist1

| track_id | playlist_id |

Track-Playlist association table
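
The table layout above can be exercised with a small in-memory sketch. This is only an illustration: it uses Python's built-in sqlite3 module rather than the database engine the dump targets, the column types and keys are guesses (the listing gives column names only), the table names are copied as listed, and every row is made up. The helper names `build_demo_db` and `tracks_in_playlist` are hypothetical.

```python
import sqlite3

# Illustrative schema only: column names follow the listing above,
# but types and keys are assumptions.
SCHEMA = """
CREATE TABLE album    (id TEXT PRIMARY KEY, name TEXT, uri TEXT);
CREATE TABLE artist   (id TEXT PRIMARY KEY, name TEXT, uri TEXT);
CREATE TABLE track    (id TEXT PRIMARY KEY, name TEXT, duration INTEGER,
                       popularity INTEGER, explicit INTEGER, preview_url TEXT,
                       uri TEXT, album_id TEXT REFERENCES album(id));
CREATE TABLE playlist (id TEXT PRIMARY KEY, name TEXT, followers INTEGER,
                       uri TEXT, total_tracks INTEGER);
CREATE TABLE track_artist1   (track_id TEXT, artist_id TEXT);
CREATE TABLE track_playlist1 (track_id TEXT, playlist_id TEXT);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO album VALUES ('al1', 'Demo Album', 'spotify:album:al1')"
    )
    conn.executemany("INSERT INTO artist VALUES (?, ?, ?)", [
        ("ar1", "Artist A", "spotify:artist:ar1"),
        ("ar2", "Artist B", "spotify:artist:ar2"),
    ])
    conn.executemany("INSERT INTO track VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
        ("t1", "Song One", 180000, 50, 0, None, "spotify:track:t1", "al1"),
        ("t2", "Song Two", 200000, 40, 1, None, "spotify:track:t2", "al1"),
    ])
    conn.execute(
        "INSERT INTO playlist VALUES "
        "('p1', 'Demo Playlist', 10, 'spotify:playlist:p1', 2)"
    )
    conn.executemany("INSERT INTO track_artist1 VALUES (?, ?)", [
        ("t1", "ar1"), ("t2", "ar1"), ("t2", "ar2"),
    ])
    conn.executemany("INSERT INTO track_playlist1 VALUES (?, ?)", [
        ("t1", "p1"), ("t2", "p1"),
    ])
    return conn


def tracks_in_playlist(conn: sqlite3.Connection, playlist_id: str):
    """Return (track name, artist name) pairs via the two association tables."""
    rows = conn.execute(
        """
        SELECT t.name, a.name
        FROM track_playlist1 AS tp
        JOIN track          AS t  ON t.id = tp.track_id
        JOIN track_artist1  AS ta ON ta.track_id = t.id
        JOIN artist         AS a  ON a.id = ta.artist_id
        WHERE tp.playlist_id = ?
        ORDER BY t.name, a.name
        """,
        (playlist_id,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    for track_name, artist_name in tracks_in_playlist(build_demo_db(), "p1"):
        print(track_name, "|", artist_name)
```

A track with several artists appears once per artist, because the track-artist association is many-to-many.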

- - - - - SETUP - - - - -

The data is in the form of a SQL dump. The download size is about 10 GB, and the database populated from it comes out to about 35 GB.

spotifydbdumpschemashare.sql contains the schema for the database (for reference).
spotifydbdumpshare.sql is the actual data dump.

Setup steps:
1. Create database

- - - - - PAPER - - - - -

The description of this dataset can be found in the following paper:

Papreja P., Venkateswara H., Panchanathan S. (2020) Representation, Exploration and Recommendation of Playlists. In: Cellier P., Driessens K. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Communications in Computer and Information Science, vol 1168. Springer, Cham
genius-lyrics
huggingface.co
Updated Aug 25, 2017
Cite
Bruno Kreiner (2017). genius-lyrics [Dataset]. https://huggingface.co/datasets/brunokreiner/genius-lyrics
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Aug 25, 2017
Authors
Bruno Kreiner
Description
Dataset Card for Dataset Name

Dataset Description Dataset Summary

This dataset consists of roughly 480k English lyrics (classified using the nltk language classifier) with some additional metadata. The id corresponds to the Spotify id. The metadata was taken from the Million Playlist Challenge @ AICrowd. The lyrics were crawled using "[song name] [artist name]" as the search string with the lyricsgenius Python package, which uses the genius.com search function. There is no… See the full description on the dataset page: https://huggingface.co/datasets/brunokreiner/genius-lyrics.
Data from: MusicOSet: An Enhanced Open Dataset for Music Data Mining
zenodo.org
data.niaid.nih.gov
bin, zip
Updated Jun 7, 2021
Cite
Mariana O. Silva; Laís Mota; Mirella M. Moro (2021). MusicOSet: An Enhanced Open Dataset for Music Data Mining [Dataset]. http://doi.org/10.5281/zenodo.4904639
Available download formats: zip, bin
Unique identifier
https://doi.org/10.5281/zenodo.4904639
Dataset updated
Jun 7, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Mariana O. Silva; Laís Mota; Mirella M. Moro
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
MusicOSet is an open and enhanced dataset of musical elements (artists, songs and albums) based on musical popularity classification. It provides a directly accessible collection of data suitable for numerous tasks in music data mining (e.g., data visualization, classification, clustering, similarity search, MIR, HSS and so forth). To create MusicOSet, the potential information sources were divided into three main categories: music popularity sources, metadata sources, and acoustic and lyrical features sources. Data from all three categories were initially collected between January and May 2019. The data were then updated and enhanced in June 2019.

The attractive features of MusicOSet include:

Integration and centralization of different musical data sources

Calculation of popularity scores and classification of musical elements as hits or non-hits, covering 1962 to 2018

Enriched metadata for music, artists, and albums from the US popular music industry

Availability of acoustic and lyrical resources

Unrestricted access in two formats: SQL database and compressed .csv files

| Data | # Records |
|:-----------------:|:---------:|
| Songs | 20,405 |
| Artists | 11,518 |
| Albums | 26,522 |
| Lyrics | 19,664 |
| Acoustic Features | 20,405 |
| Genres | 1,561 |
Spotify-Lyrics-Summarized
huggingface.co
Updated Apr 13, 2025
Cite
Tran Quang Huyen (2025). Spotify-Lyrics-Summarized [Dataset]. https://huggingface.co/datasets/tqhuyen/Spotify-Lyrics-Summarized
Dataset updated
Apr 13, 2025
Authors
Tran Quang Huyen
Description
tqhuyen/Spotify-Lyrics-Summarized dataset hosted on Hugging Face and contributed by the HF Datasets community
Kendrick Lamar Dataset (Track features + Lyrics)
kaggle.com
Updated May 19, 2024
Cite
CeeBloop (2024). Kendrick Lamar Dataset (Track features + Lyrics) [Dataset]. https://www.kaggle.com/datasets/ceebloop/kendrick-lamar-albumslyrics-dataset/discussion
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
May 19, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
CeeBloop
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
NOTE: This dataset only contains songs featured on studio albums + the 4 diss tracks aimed at Drake.
- Audio features obtained from Spotify / Lyrics obtained from Genius
- The track "6:16 in LA" has no audio features since it wasn't released on DSPs

FEATURES:

track_name

album

release_date: Datetime format (YYYY-MM-DD)

duration_ms: Duration of track in milliseconds

popularity: Popularity rating of track on Spotify

speechiness: Describes the presence of spoken words in track (from Spotify)

danceability: Describes how suitable a track is for dancing based on a combination of musical elements (from Spotify)

tempo: The overall estimated tempo of a track in beats per minute (BPM) (from Spotify)

lyrics: Includes all lyrics performed by Kendrick (excludes feature verses)
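
As a rough sketch of how one row with the features listed above might be read into typed values in Python: the `Track` class and `parse_row` helper are hypothetical names, the sample values are placeholders rather than real data, and treating a missing audio feature as an empty string is an assumption about how the file encodes the track without audio features.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Track:
    track_name: str
    album: str
    release_date: date
    duration_ms: int
    popularity: int
    # Audio features are optional because one track is described as having none.
    speechiness: Optional[float]
    danceability: Optional[float]
    tempo: Optional[float]
    lyrics: str

    @property
    def duration_minutes(self) -> float:
        """Convert the millisecond duration to minutes."""
        return self.duration_ms / 60_000


def _optional_float(value: Optional[str]) -> Optional[float]:
    # Assumption: a missing feature shows up as None or an empty string.
    if value is None or value.strip() == "":
        return None
    return float(value)


def parse_row(row: dict) -> Track:
    """Turn a dict of strings (e.g. from csv.DictReader) into a Track."""
    return Track(
        track_name=row["track_name"],
        album=row["album"],
        release_date=date.fromisoformat(row["release_date"]),  # YYYY-MM-DD
        duration_ms=int(row["duration_ms"]),
        popularity=int(row["popularity"]),
        speechiness=_optional_float(row.get("speechiness")),
        danceability=_optional_float(row.get("danceability")),
        tempo=_optional_float(row.get("tempo")),
        lyrics=row["lyrics"],
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = {
        "track_name": "Example Track", "album": "Example Album",
        "release_date": "2020-01-31", "duration_ms": "180000",
        "popularity": "70", "speechiness": "", "danceability": "0.5",
        "tempo": "120.0", "lyrics": "placeholder lyrics",
    }
    track = parse_row(example)
    print(track.release_date.year, track.duration_minutes, track.speechiness)
```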
spotify-20k-semantic
huggingface.co
Cite
Ekansh Gupta, spotify-20k-semantic [Dataset]. https://huggingface.co/datasets/egupta/spotify-20k-semantic
Authors
Ekansh Gupta
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
GitHub repo for processing this dataset and others

Check out my GitHub repo for processing this data here

Dataset Files: Description and Construction

This dataset contains .pkl files derived from 20k playlists in the Spotify million playlist dataset (MPD). These files contain the genre information of artists and lyric embeddings for the majority of the songs computed using MiniLM.

File 1: embedding_genre.pkl Description

This file is a pickled list… See the full description on the dataset page: https://huggingface.co/datasets/egupta/spotify-20k-semantic.
poplyrics-1k
huggingface.co
Updated Sep 27, 2024
Cite
Ashutosh Sharma (2024). poplyrics-1k [Dataset]. https://huggingface.co/datasets/ashuwhy/poplyrics-1k
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Sep 27, 2024
Authors
Ashutosh Sharma
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Pop Lyrics Dataset

This dataset contains up to 1,000 pop songs with their lyrics, songwriters, genres, and other relevant metadata. The data was collected from Spotify and Genius.

Dataset Structure

track_name: Name of the song.
album: Album name.
release_date: Release date of the song.
song_length: Duration of the song.
popularity: Popularity score from Spotify.
songwriters: List of songwriters.
artist: Name of the artist.
lyrics: Cleaned lyrics of the song.
genre: List… See the full description on the dataset page: https://huggingface.co/datasets/ashuwhy/poplyrics-1k.
Lyrics1M
huggingface.co
Updated Sep 14, 2024
Cite
BIG data (2024). Lyrics1M [Dataset]. https://huggingface.co/datasets/bigdata-pw/Lyrics1M
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Sep 14, 2024
Dataset authored and provided by
BIG data
License
https://choosealicense.com/licenses/odc-by/
Description
Dataset Card for Lyrics1M

Dataset Details Dataset Description

Lyrics for approximately 1 million tracks. Entries include unique ID, artist, track title, lyrics and language. This is a subset of bigdata-pw/Spotify, filtered for popularity >= 20 and deduplicated by track title.
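
The filtering described above (popularity >= 20, then deduplication by track title) could look roughly like the following Python sketch. It is not the curator's actual pipeline: the field names `title` and `popularity`, the exact-string title match, and keeping the first occurrence of each title are all assumptions, and the rows are made up.

```python
from typing import Dict, Iterable, List


def filter_and_dedupe(tracks: Iterable[Dict], min_popularity: int = 20) -> List[Dict]:
    """Keep tracks at or above the popularity threshold, one per title.

    Assumptions: titles are compared as exact strings, and the first
    qualifying occurrence of a title wins.
    """
    seen_titles = set()
    kept = []
    for track in tracks:
        if track["popularity"] < min_popularity:
            continue  # below the popularity threshold
        if track["title"] in seen_titles:
            continue  # a track with this title was already kept
        seen_titles.add(track["title"])
        kept.append(track)
    return kept


if __name__ == "__main__":
    # Made-up rows for illustration only.
    rows = [
        {"id": "a", "title": "Song X", "popularity": 35},
        {"id": "b", "title": "Song X", "popularity": 60},  # duplicate title
        {"id": "c", "title": "Song Y", "popularity": 5},   # too unpopular
        {"id": "d", "title": "Song Z", "popularity": 20},  # exactly at threshold
    ]
    print([t["id"] for t in filter_and_dedupe(rows)])
```

Deduplicating on title alone merges different songs that share a name, so a stricter key such as (artist, title) may be preferable in other settings.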

Curated by: hlky
License: Open Data Commons Attribution License (ODC-By) v1.0

Citation Information

@misc{Lyrics1M, author = {hlky}, title = {Lyrics1M}, year = {2024}… See the full description on the dataset page: https://huggingface.co/datasets/bigdata-pw/Lyrics1M.
3 scholarly articles cite the Spotify Million Song Dataset (View in Google Scholar)