100+ datasets found

Spotify dataset
kaggle.com
zip
Updated Jul 20, 2023
Cite
Sanjana chaudhari☑️ (2023). Spotify dataset [Dataset]. https://www.kaggle.com/datasets/sanjanchaudhari/spotify-dataset
Explore at:
Available download formats: zip (2045049 bytes)
Dataset updated
Jul 20, 2023
Authors
Sanjana chaudhari☑️
License
ODC Public Domain Dedication and Licence (PDDL) v1.0 (http://www.opendatacommons.org/licenses/pddl/1.0/)
License information was derived automatically
Description
Introduction to the Spotify Dataset: Overview of the Dataset Source and Purpose; Description of the Data Collection Process

Data Exploration: Understanding the Structure and Size of the Dataset; Overview of the Features and Columns

Key Features in the Spotify Dataset: Explanation of Important Columns (e.g., track name, artist, album, duration, popularity)

Genre and Music Category Analysis: Categorizing Songs by Genre and Music Type; Most Popular Genres on Spotify

Artist Analysis: Identifying Top Artists Based on Popularity and Number of Songs; Relationship between Artist and Song Attributes

Song Duration Analysis: Distribution of Song Durations; Impact of Song Duration on Popularity and Listener Engagement

Song Popularity and Listener Engagement: Analyzing the Popularity Scores of Songs; Correlation between Popularity and Other Song Features

Audio Features Analysis: Examination of Audio Features (danceability, energy, instrumentalness, etc.); Clustering Songs Based on Audio Features

Time-Based Analysis: Seasonal Trends in Song Releases and Popularity; Time Series Analysis of Listening Patterns

Collaborations and Featured Artists: Frequency of Collaborations and Featured Artists; Impact of Collaborations on Song Popularity

Recommendation Systems: Overview of Spotify's Recommendation Algorithms; Building Simple Recommendation Models

User Behavior and Playlist Analysis: Analysis of User-Generated Playlists; Common Song Additions and Removals
Spotify Dataset
brightdata.com
.json, .csv, .xlsx
Updated Apr 10, 2024
Cite
Bright Data (2024). Spotify Dataset [Dataset]. https://brightdata.com/products/datasets/spotify
Explore at:
Available download formats: .json, .csv, .xlsx
Dataset updated
Apr 10, 2024
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
Gain valuable insights into music trends, artist popularity, and streaming analytics with our comprehensive Spotify Dataset. Designed for music analysts, marketers, and businesses, this dataset provides structured and reliable data from Spotify to enhance market research, content strategy, and audience engagement.

Dataset Features

Track Information: Access detailed data on songs, including track name, artist, album, genre, and release date.

Streaming Popularity: Extract track popularity scores, listener engagement metrics, and ranking trends.

Artist & Album Insights: Analyze artist performance, album releases, and genre trends over time.

Related Searches & Recommendations: Track related search terms and suggested content for deeper audience insights.

Historical & Real-Time Data: Retrieve historical streaming data or access continuously updated records for real-time trend analysis.

Customizable Subsets for Specific Needs

Our Spotify Dataset is fully customizable, allowing you to filter data based on track popularity, artist, genre, release date, or listener engagement. Whether you need broad coverage for industry analysis or focused data for content optimization, we tailor the dataset to your needs.

Popular Use Cases

Market Analysis & Trend Forecasting: Identify emerging music trends, genre popularity, and listener preferences.

Artist & Label Performance Tracking: Monitor artist rankings, album success, and audience engagement.

Competitive Intelligence: Analyze competitor music strategies, playlist placements, and streaming performance.

AI & Machine Learning Applications: Use structured music data to train AI models for recommendation engines, playlist curation, and predictive analytics.

Advertising & Sponsorship Insights: Identify high-performing tracks and artists for targeted advertising and sponsorship opportunities.

Whether you're optimizing music marketing, analyzing streaming trends, or enhancing content strategies, our Spotify Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.
Spotify dataset
kaggle.com
zip
Updated Jun 17, 2024
Cite
Gati Ambaliya (2024). Spotify dataset [Dataset]. https://www.kaggle.com/datasets/ambaliyagati/spotify-dataset-for-playing-around-with-sql
Explore at:
Available download formats: zip (309669 bytes)
Dataset updated
Jun 17, 2024
Authors
Gati Ambaliya
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Description for Spotify Songs Dataset on Kaggle

Dataset Title: Spotify Songs Dataset

Description: This dataset contains a collection of songs fetched from the Spotify API, covering various genres including "acoustic", "afrobeat", "alt-rock", "alternative", "ambient", "anime", "black-metal", "bluegrass", "blues", "bossanova", "brazil", "breakbeat", "british", "cantopop", "chicago-house", "children", "chill", "classical", "club", "comedy", "country", "dance", "dancehall", "death-metal", "deep-house", "detroit-techno", "disco", "disney", "drum-and-bass", "dub", "dubstep", "edm", "electro", "electronic", "emo", "folk", "forro", "french", "funk", "garage", "german", "gospel", "goth", "grindcore", "groove", "grunge", "guitar", "happy", "hard-rock", "hardcore", "hardstyle", "heavy-metal", "hip-hop", "holidays", "honky-tonk", "house", "idm", "indian", "indie", "indie-pop", "industrial", "iranian", "j-dance", "j-idol", "j-pop", "j-rock", "jazz", "k-pop", "kids", "latin", "latino", "malay", "mandopop", "metal", "metal-misc", "metalcore", "minimal-techno", "movies", "mpb", "new-age", "new-release", "opera", "pagode", "party", "philippines-opm", "piano", "pop", "pop-film", "post-dubstep", "power-pop", "progressive-house", "psych-rock", "punk", "punk-rock", "r-n-b", "rainy-day", "reggae", "reggaeton", "road-trip", "rock", "rock-n-roll", "rockabilly", "romance", "sad", "salsa", "samba", "sertanejo", "show-tunes", "singer-songwriter", "ska", "sleep", "songwriter", "soul", "soundtracks", "spanish", "study", "summer", "swedish", "synth-pop", "tango", "techno", "trance", "trip-hop", "turkish", "work-out", "world-music". Each entry in the dataset provides detailed information about a song, including its name, artists, album, popularity, duration, and whether it is explicit.

Data Collection Method: The data was collected using the Spotify Web API through a Python script. The script performed searches for different genres and retrieved the top tracks for each genre. The fetched data was then compiled and saved into a CSV file.

Columns Description:
id: Unique identifier for the track on Spotify.
name: Name of the track.
genre: Genre of the song.
artists: Names of the artists who performed the track, separated by commas if there are multiple artists.
album: Name of the album the track belongs to.
popularity: Popularity score of the track (0-100, where higher is more popular).
duration_ms: Duration of the track in milliseconds.
explicit: Boolean indicating whether the track contains explicit content.
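
As a rough illustration of how the columns described above might be queried, the sketch below loads a few made-up rows into an in-memory SQLite table and computes the average popularity per genre. The table name `songs`, the column types, the sample rows, and the function name are all assumptions for illustration; none of them come from the dataset itself.

```python
import sqlite3

# Hypothetical sample rows following the documented column order:
# id, name, genre, artists, album, popularity, duration_ms, explicit
SAMPLE_ROWS = [
    ("t1", "Song A", "jazz", "Artist X", "Album 1", 40, 180000, 0),
    ("t2", "Song B", "jazz", "Artist Y, Artist Z", "Album 2", 60, 240000, 1),
    ("t3", "Song C", "rock", "Artist X", "Album 3", 80, 200000, 0),
]


def average_popularity_by_genre(rows):
    """Return {genre: mean popularity} using an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    # Column types are assumed; the source only names the columns.
    conn.execute(
        """CREATE TABLE songs (
               id TEXT PRIMARY KEY, name TEXT, genre TEXT, artists TEXT,
               album TEXT, popularity INTEGER, duration_ms INTEGER,
               explicit INTEGER)"""
    )
    conn.executemany("INSERT INTO songs VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    result = dict(
        conn.execute(
            "SELECT genre, AVG(popularity) FROM songs GROUP BY genre"
        ).fetchall()
    )
    conn.close()
    return result


averages = average_popularity_by_genre(SAMPLE_ROWS)
print(averages)
```

With the real CSV, the rows would come from `csv.reader` instead of the hard-coded list, assuming the file's column order matches the description.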

Potential Uses: This dataset can be used for a variety of purposes, including but not limited to:

Music Analysis: Analyze the popularity and characteristics of songs across different genres.

Recommendation Systems: Develop and test music recommendation algorithms.

Trend Analysis: Study trends in music preferences and popularity over time.

Machine Learning: Train machine learning models for tasks like genre classification or popularity prediction.

Acknowledgements: This dataset was created using the Spotify Web API. Special thanks to Spotify for providing access to their extensive music library through their API.

License: This dataset is made available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. You are free to use, modify, and distribute this dataset, provided you give appropriate credit to the original creator.
Playlist2vec: Spotify Million Playlist Dataset
zenodo.org
data.niaid.nih.gov
bin
Updated Jun 22, 2021
Cite
Piyush Papreja (2021). Playlist2vec: Spotify Million Playlist Dataset [Dataset]. http://doi.org/10.5281/zenodo.5002584
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.5002584
Dataset updated
Jun 22, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Piyush Papreja
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset was created using the Spotify developer API. It consists of user-created as well as Spotify-curated playlists.
The dataset consists of 1 million playlists, 3 million unique tracks, 3 million unique albums, and 1.3 million artists.
The data is stored in a SQL database, with the primary entities being songs, albums, artists, and playlists.
Each of these entities is identified by a unique ID (Spotify URI).
The data is stored in the following tables:

album

artist

track

playlist

track_artist1

track_playlist1

album

| id | name | uri |

id: Album ID as provided by Spotify
name: Album Name as provided by Spotify
uri: Album URI as provided by Spotify

artist

| id | name | uri |

id: Artist ID as provided by Spotify
name: Artist Name as provided by Spotify
uri: Artist URI as provided by Spotify

track

| id | name | duration | popularity | explicit | preview_url | uri | album_id |

id: Track ID as provided by Spotify
name: Track Name as provided by Spotify
duration: Track Duration (in milliseconds) as provided by Spotify
popularity: Track Popularity as provided by Spotify
explicit: Whether the track has explicit lyrics or not. (true or false)
preview_url: A link to a 30 second preview (MP3 format) of the track. Can be null
uri: Track Uri as provided by Spotify
album_id: Album Id to which the track belongs

playlist

| id | name | followers | uri | total_tracks |

id: Playlist ID as provided by Spotify
name: Playlist Name as provided by Spotify
followers: Playlist Followers as provided by Spotify
uri: Playlist Uri as provided by Spotify
total_tracks: Total number of tracks in the playlist.

track_artist1

| track_id | artist_id |

Track-Artist association table

track_playlist1

| track_id | playlist_id |

Track-Playlist association table
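
The association tables above imply a join when listing the tracks of a playlist. The following is a minimal sketch of that join, using SQLite purely as a stand-in: the source does not say which database engine the dump targets, and the column types, sample rows, and the helper name `tracks_in_playlist` are assumptions made for illustration.

```python
import sqlite3

# Table and column names follow the description above; the column types
# and the sample rows are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE track (id TEXT PRIMARY KEY, name TEXT, duration INTEGER,
                    popularity INTEGER, explicit INTEGER, preview_url TEXT,
                    uri TEXT, album_id TEXT);
CREATE TABLE playlist (id TEXT PRIMARY KEY, name TEXT, followers INTEGER,
                       uri TEXT, total_tracks INTEGER);
CREATE TABLE track_playlist1 (track_id TEXT, playlist_id TEXT);
"""


def tracks_in_playlist(conn, playlist_id):
    """Return the names of a playlist's tracks via the association table."""
    rows = conn.execute(
        """SELECT t.name
           FROM track AS t
           JOIN track_playlist1 AS tp ON tp.track_id = t.id
           WHERE tp.playlist_id = ?
           ORDER BY t.name""",
        (playlist_id,),
    ).fetchall()
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO track (id, name, duration, popularity, explicit) "
    "VALUES (?, ?, ?, ?, ?)",
    [("t1", "Alpha", 200000, 50, 0), ("t2", "Beta", 180000, 70, 1)],
)
conn.execute(
    "INSERT INTO playlist (id, name, followers, total_tracks) "
    "VALUES ('p1', 'Mix', 10, 2)"
)
conn.executemany(
    "INSERT INTO track_playlist1 VALUES (?, ?)",
    [("t1", "p1"), ("t2", "p1")],
)
names = tracks_in_playlist(conn, "p1")
print(names)
```

The same pattern would apply to `track_artist1` for resolving the artists of a track.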

- - - - - SETUP - - - - -

The data is provided as a SQL dump. The download size is about 10 GB, and the database populated from it is about 35 GB.

spotifydbdumpschemashare.sql contains the schema for the database (for reference).
spotifydbdumpshare.sql is the actual data dump.

Setup steps:
1. Create database

- - - - - PAPER - - - - -

The description of this dataset can be found in the following paper:

Papreja P., Venkateswara H., Panchanathan S. (2020) Representation, Exploration and Recommendation of Playlists. In: Cellier P., Driessens K. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Communications in Computer and Information Science, vol 1168. Springer, Cham
spotify-tracks-dataset
huggingface.co
Updated Jun 30, 2023
Cite
maharshipandya (2023). spotify-tracks-dataset [Dataset]. https://huggingface.co/datasets/maharshipandya/spotify-tracks-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jun 30, 2023
Authors
maharshipandya
License
https://choosealicense.com/licenses/bsd/
Description
Content

This is a dataset of Spotify tracks over a range of 125 different genres. Each track has some audio features associated with it. The data is in CSV format which is tabular and can be loaded quickly.

Usage

The dataset can be used for:

Building a Recommendation System based on some user input or preference

Classification purposes based on audio features and available genres

Any other application that you can think of. Feel free to discuss!

Column… See the full description on the dataset page: https://huggingface.co/datasets/maharshipandya/spotify-tracks-dataset.
🎧 Spotify Global Streaming Data (2024)
kaggle.com
zip
Updated Apr 30, 2025
Cite
Atharva Soundankar (2025). 🎧 Spotify Global Streaming Data (2024) [Dataset]. https://www.kaggle.com/datasets/atharvasoundankar/spotify-global-streaming-data-2024
Explore at:
Available download formats: zip (28022 bytes)
Dataset updated
Apr 30, 2025
Authors
Atharva Soundankar
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
📊 About the Dataset

This dataset captures the global music streaming trends on Spotify for the year 2024. It provides valuable insights into user preferences across various countries, top-performing artists and albums, streaming hours, and listener behavior patterns. It is designed to support data analysis, machine learning models, and business intelligence dashboards in the music and media industry.

With over 500 rows of clean, non-duplicated, and realistic entries from countries around the world, this dataset is ideal for uncovering:

Global music popularity patterns

Listener engagement across genres and demographics

Artist performance across countries

Revenue forecasting and content recommendations

spotify-million-song-dataset
huggingface.co
Updated Jun 16, 2024
Cite
Vishnu Priya VR (2024). spotify-million-song-dataset [Dataset]. https://huggingface.co/datasets/vishnupriyavr/spotify-million-song-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jun 16, 2024
Authors
Vishnu Priya VR
License
https://choosealicense.com/licenses/cc0-1.0/
Description
Dataset Card for Spotify Million Song Dataset

Dataset Summary

This is the Spotify Million Song Dataset. It contains song names, artist names, links to the songs, and lyrics. It can be used for recommending, classifying, or clustering songs.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

[More Information Needed]

Dataset Structure

Data Instances

[More Information Needed]

Data… See the full description on the dataset page: https://huggingface.co/datasets/vishnupriyavr/spotify-million-song-dataset.
My Spotify Data - Cleaned
kaggle.com
zip
Updated Jan 26, 2024
Cite
Malinga Rajapaksha (2024). My Spotify Data - Cleaned [Dataset]. https://www.kaggle.com/datasets/malingarajapaksha/my-spotify-data-cleaned
Explore at:
Available download formats: zip (2952139 bytes)
Dataset updated
Jan 26, 2024
Authors
Malinga Rajapaksha
Description
The dataset contains records of the user's Spotify streaming history, with each row representing a specific instance of a played track. The data includes various attributes providing insights into the user's music listening habits.

Columns:

ts (Timestamp):

The timestamp when the track was played.

platform:

The platform or device used for streaming (e.g., Windows 10).

ms_played:

The duration in milliseconds of how long the track was played.

conn_country:

The country code indicating the user's location during streaming (e.g., LK for Sri Lanka).

master_metadata_track_name:

The name of the track played.

master_metadata_album_artist_name:

The artist of the album to which the track belongs.

master_metadata_album_album_name:

The name of the album containing the track.

spotify_track_uri:

The unique Spotify URI for the track.

reason_start:

The reason for starting the track (e.g., play button clicked).

reason_end:

The reason for ending the track (e.g., track done).

shuffle:

Indicates whether shuffle mode was enabled (True/False).

offline:

Indicates whether the track was played offline (True/False).

offline_timestamp:

Timestamp indicating when the track was played offline (if applicable).

incognito_mode:

Indicates whether incognito mode was enabled (True/False).

Purpose:

This dataset is suitable for performing detailed Exploratory Data Analysis (EDA) to uncover patterns, trends, and insights into the user's music-listening behaviour. Potential analyses could include the distribution of listening durations, favourite artists and tracks, exploration of geographic listening patterns, and examination of usage patterns across different platforms.

Visualization tools such as Matplotlib and Seaborn could be used to create visual representations of the findings for a more in-depth analysis. The dataset suits readers with an interest in data science, offering opportunities to apply analytical techniques to real-world streaming data.
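
One of the analyses mentioned above, finding favourite artists by listening time, could be sketched as follows. The sample records are made up and use only a subset of the documented columns; the function name is likewise hypothetical.

```python
from collections import defaultdict

# Made-up records using a subset of the documented column names.
SAMPLE_HISTORY = [
    {"ts": "2023-01-01T10:00:00Z", "ms_played": 180000,
     "master_metadata_album_artist_name": "Artist X"},
    {"ts": "2023-01-01T10:05:00Z", "ms_played": 60000,
     "master_metadata_album_artist_name": "Artist Y"},
    {"ts": "2023-01-02T09:00:00Z", "ms_played": 120000,
     "master_metadata_album_artist_name": "Artist X"},
]


def minutes_per_artist(records):
    """Sum ms_played per album artist and convert the totals to minutes."""
    totals = defaultdict(int)
    for row in records:
        artist = row["master_metadata_album_artist_name"]
        totals[artist] += int(row["ms_played"])
    # 60,000 milliseconds per minute
    return {artist: ms / 60000 for artist, ms in totals.items()}


listening = minutes_per_artist(SAMPLE_HISTORY)
print(listening)
```

With the real file, `records` could be produced by `csv.DictReader`, which yields one dictionary per row keyed by column name.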
Data from: Spotify Playlists Dataset
zenodo.org
data.niaid.nih.gov
zip
Updated Jan 24, 2020
Cite
Martin Pichl; Eva Zangerle (2020). Spotify Playlists Dataset [Dataset]. http://doi.org/10.5281/zenodo.2594557
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.2594557
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Martin Pichl; Eva Zangerle
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is based on the subset of users in the #nowplaying dataset who publish their #nowplaying tweets via Spotify. In principle, the dataset holds users, their playlists and the tracks contained in these playlists.

The csv-file holding the dataset contains the following columns: "user_id", "artistname", "trackname", "playlistname", where

user_id is a hash of the user's Spotify user name

artistname is the name of the artist

trackname is the title of the track and

playlistname is the name of the playlist that contains this track.

The separator is a comma (,), each entry is enclosed in double quotes, and the escape character is a backslash (\).
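
The format described above (comma separator, double-quoted entries, backslash escape) maps onto parameters of Python's standard `csv` module. The sketch below parses a made-up sample; it assumes the file starts with a header line naming the four columns, which the source does not state, and the function name is illustrative.

```python
import csv
import io

# Made-up sample in the documented format: comma separator, every field
# enclosed in double quotes, backslash as the escape character.
# The presence of a header line is an assumption.
SAMPLE = (
    '"user_id","artistname","trackname","playlistname"\n'
    '"9cc0cf","Some Artist","A \\"Quoted\\" Title","Road Trip, 2014"\n'
)


def read_playlist_rows(text_stream):
    """Parse rows into dicts keyed by the header line."""
    reader = csv.DictReader(
        text_stream,
        delimiter=",",
        quotechar='"',
        escapechar="\\",
        doublequote=False,  # quotes are escaped with a backslash, not doubled
    )
    return list(reader)


rows = read_playlist_rows(io.StringIO(SAMPLE))
print(rows[0]["trackname"])
```

Setting `doublequote=False` together with `escapechar` tells the parser that an embedded quote appears as `\"` rather than `""`.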

A description of the generation of the dataset and the dataset itself can be found in the following paper:

Pichl, Martin; Zangerle, Eva; Specht, Günther: "Towards a Context-Aware Music Recommendation Approach: What is Hidden in the Playlist Name?" in 15th IEEE International Conference on Data Mining Workshops (ICDM 2015), pp. 1360-1365, IEEE, Atlantic City, 2015.
Spotify Tracks Dataset
cubig.ai
zip
Updated May 20, 2025
Cite
CUBIG (2025). Spotify Tracks Dataset [Dataset]. https://cubig.ai/store/products/276/spotify-tracks-dataset
Explore at:
Available download formats: zip
Dataset updated
May 20, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
Description
1) Data Introduction
• The Spotify Tracks Dataset contains information on tracks from over 125 music genres, including both audio features (e.g., danceability, energy, valence) and metadata (e.g., title, artist, genre).

2) Data Utilization
(1) Characteristics of the Spotify Tracks Dataset:
• The data is structured in a tabular format at the track level, where each column represents numerical or categorical features based on musical properties. This makes it suitable for recommendation systems, genre classification, and emotion analysis.
• It includes multi-dimensional attributes grounded in music theory such as track duration, time signature, energy, loudness, tempo, and speechiness, enabling its use in music classification and clustering tasks.

(2) Applications of the Spotify Tracks Dataset:
• Design of Music Recommendation Systems: It can be used to build content-based filtering systems or hybrid recommendation algorithms based on user preferences.
spotify data
data-staging.niaid.nih.gov
data.niaid.nih.gov
Updated Jul 5, 2023
Cite
Ryan Hulke (2023). spotify data [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_8114617
Explore at:
Dataset updated
Jul 5, 2023
Dataset provided by
student
Authors
Ryan Hulke
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
from kaggle
World's Spotify TOP-50 playlist musicality data
kaggle.com
zip
Updated Nov 26, 2023
Cite
Miquel Neck (2023). World's Spotify TOP-50 playlist musicality data [Dataset]. https://www.kaggle.com/datasets/miquelneck/worlds-spotify-top-50-playlist-musicality-data
Explore at:
Available download formats: zip (175413 bytes)
Dataset updated
Nov 26, 2023
Authors
Miquel Neck
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
World
Description
Every week, Spotify updates its Top-50 playlists for each country. This dataset includes every country's list for the 45th week of 2023 (6th November - 12th November). There are 73 available countries.
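
The week-to-date mapping above can be checked with a short calculation. This sketch assumes the description uses ISO 8601 week numbering (weeks start on Monday), which the source does not state explicitly; the function name is illustrative.

```python
from datetime import date, timedelta


def iso_week_range(year, week):
    """Return (Monday, Sunday) of the given ISO 8601 week."""
    monday = date.fromisocalendar(year, week, 1)
    return monday, monday + timedelta(days=6)


start, end = iso_week_range(2023, 45)
print(start, end)
```

Under that assumption, the computed range agrees with the 6th to 12th November dates given in the description.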

The dataset has a column for every musical aspect of each song, and also the name, country, artist and publication date of the track.

Data extracted from the Spotify Official API.

Columns

These features are created by Spotify to analyze tracks. Here I copy the definition of each column, based on Spotify's API documentation.

Danceability: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.

Acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.

Duration_ms: The duration of the track in milliseconds.

Energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy.

Instrumentalness: Predicts whether a track contains no vocals. "Ooh" and "aah" sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly "vocal".

Key: The key the track is in. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1.

Liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.

Loudness: The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks.

Mode: Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.

Speechiness: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value.

Tempo: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.

Time_signature: An estimated time signature. The time signature (meter) is a notational convention to specify how many beats are in each bar (or measure). The time signature ranges from 3 to 7, indicating time signatures of "3/4" to "7/4".

Valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
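
The Key and Mode columns are stored as integers, so a small lookup makes them readable. The sketch below follows the pitch-class mapping and the major/minor encoding given in the definitions above; the function name `describe_key` and the error handling are assumptions added for illustration.

```python
# Pitch-class names for key values 0-11, written as in the definitions
# above (0 = C, 1 = C♯/D♭, 2 = D, and so on).
PITCH_CLASSES = [
    "C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
    "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B",
]


def describe_key(key, mode):
    """Turn the integer Key and Mode columns into a readable label."""
    if key == -1:
        # -1 means no key was detected
        return "no key detected"
    if not 0 <= key <= 11:
        raise ValueError(f"key out of range: {key}")
    if mode not in (0, 1):
        raise ValueError(f"mode must be 0 or 1, got {mode}")
    scale = "major" if mode == 1 else "minor"
    return f"{PITCH_CLASSES[key]} {scale}"


print(describe_key(0, 1))
print(describe_key(2, 0))
```
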
Data from: Spotify Playlists
zenodo.org
csv
Updated Jan 24, 2025
Cite
Francesco Cambria (2025). Spotify Playlists [Dataset]. http://doi.org/10.5281/zenodo.14728731
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.14728731
Dataset updated
Jan 24, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Francesco Cambria
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset was constructed from Spotify data found on Kaggle.

The files listed here can be used to build a property graph in Neo4j:

song.csv - contains all the data for the Song nodes.

artist.csv - contains the data for the Artist nodes.

playlist.csv - contains the data for the Playlist nodes.

user.csv - contains the data for the User nodes (those creating Playlists).

genre.csv - contains the data for the Genre nodes (a category for the Artists).

type.csv - contains the data for the Type nodes (a category for the Playlists).

sing.csv - contains the data for the SING relationship from Artist to Song nodes.

created.csv - contains the data for the CREATED relationship from User to Playlist nodes.

in.csv - contains the data for the IN relationship from Song to Playlist nodes.

of_type.csv - contains the data for the OFTYPE relationship from Playlist to Type nodes.

labelled.csv - contains the data for the LABELLED relationship from Artist to Genre nodes.

This data was used as a test dataset in the paper "MINE GRAPH RULE: A New GQL Operator for Mining Association Rules in Property Graph Databases".
Data from: MusicOSet: An Enhanced Open Dataset for Music Data Mining
zenodo.org
data.niaid.nih.gov
bin, zip
Updated Jun 7, 2021
Cite
Mariana O. Silva; Laís Mota; Mirella M. Moro (2021). MusicOSet: An Enhanced Open Dataset for Music Data Mining [Dataset]. http://doi.org/10.5281/zenodo.4904639
Explore at:
Available download formats: zip, bin
Unique identifier
https://doi.org/10.5281/zenodo.4904639
Dataset updated
Jun 7, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Mariana O. Silva; Laís Mota; Mirella M. Moro
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MusicOSet is an open and enhanced dataset of musical elements (artists, songs and albums) based on musical popularity classification. It provides a directly accessible collection of data suitable for numerous tasks in music data mining (e.g., data visualization, classification, clustering, similarity search, MIR, HSS and so forth). To create MusicOSet, the potential information sources were divided into three main categories: music popularity sources, metadata sources, and acoustic and lyrical features sources. Data from all three categories were initially collected between January and May 2019, and the data were then updated and enhanced in June 2019.

The attractive features of MusicOSet include:

Integration and centralization of different musical data sources

Calculation of popularity scores and classification of musical elements as hits or non-hits, covering 1962 to 2018

Enriched metadata for music, artists, and albums from the US popular music industry

Availability of acoustic and lyrical resources

Unrestricted access in two formats: SQL database and compressed .csv files

| Data              | # Records |
|:-----------------:|:---------:|
| Songs             | 20,405    |
| Artists           | 11,518    |
| Albums            | 26,522    |
| Lyrics            | 19,664    |
| Acoustic Features | 20,405    |
| Genres            | 1,561     |
DATA-spotify-data-analysis
kaggle.com
zip
Updated Feb 16, 2024
Cite
LIA GASPARIN (2024). DATA-spotify-data-analysis [Dataset]. https://www.kaggle.com/datasets/liagasparin/data-spotify-data-analysis
Explore at:
Available download formats: zip (88896013 bytes)
Dataset updated
Feb 16, 2024
Authors
LIA GASPARIN
Description
Dataset

This dataset was created by LIA GASPARIN

Contents
Spotify Playlist ORIGINS Dataset
cubig.ai
zip
Updated Jun 5, 2025
Cite
CUBIG (2025). Spotify Playlist ORIGINS Dataset [Dataset]. https://cubig.ai/store/products/402/spotify-playlist-origins-dataset
Explore at:
Available download formats: zip
Dataset updated
Jun 5, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
Description
1) Data Introduction
• The Spotify Playlist-ORIGINS Dataset is a collection of Spotify playlists called ORIGINS, which individuals have made with their favorite songs since 2014.

2) Data Utilization
(1) Characteristics of the Spotify Playlist-ORIGINS Dataset:
• This dataset contains detailed music information for each playlist, including song name, artist, album, genre, release year, and track ID, as well as structured metadata such as name, description, and song order for each playlist.

(2) Applications of the Spotify Playlist-ORIGINS Dataset:
• Playlist-based music recommendation and user preference analysis: It can be used to develop machine learning or deep learning-based music recommendation systems, or to study user preferences using playlist and song information.
• Music Trend and Genre Popularity Analysis: Release year, genre, and artist data can be analyzed to study the music industry and culture, including music trends by period and genre and changes in popular artists and songs.
160k Spotify songs from 1921 to 2020 (Sorted)
kaggle.com
Updated Sep 17, 2022
Cite
FCPercival (2022). 160k Spotify songs from 1921 to 2020 (Sorted) [Dataset]. https://www.kaggle.com/datasets/fcpercival/160k-spotify-songs-sorted
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Sep 17, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
FCPercival
Description
This is an analysis of the data on Spotify tracks from 1921-2020 with Jupyter Notebook and Python Data Science tools.

About the Dataset

The Spotify dataset (titled data.csv) consists of 160,000+ tracks sorted by name, from 1921-2020, found on Spotify as of June 2020. Collected by Kaggle user and Turkish data scientist Yamaç Eren Ay, the data was retrieved and tabulated from the Spotify Web API. Each row in the dataset corresponds to a track, with variables such as the title, artist, and year located in their respective columns. Aside from these fundamental variables, musical elements of each track, such as the tempo, danceability, and key, were likewise extracted; these values were generated by Spotify's algorithms based on a range of technical parameters.

Exploratory Data Analysis (EDA)

Studying the correlations between the variables in the Spotify data.

The evolution of different musical elements through the years.

The divide between explicit and non-explicit songs through the years.

Further Investigation and Inference (FII)

Determining if there is a significant difference in popularity between explicit and non-explicit songs.

Finding the most frequent emotions in Spotify tracks and analyzing their musical elements based on the track's mode and key.

Determining the classifications of the Spotify tracks through K-Means Clustering.
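The listing does not say which statistical test the project used to compare explicit and non-explicit songs. As a minimal sketch of one possible approach, the block below runs a two-sided permutation test on the difference in mean popularity. The function name `permutation_test` and the sample scores are hypothetical and are not taken from data.csv.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed_difference, estimated_p_value).
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (extreme + 1) / (n_permutations + 1)


# Made-up popularity scores, for illustration only.
explicit = [55, 62, 48, 70, 66, 59]
non_explicit = [40, 52, 38, 45, 50, 47]

diff, p_value = permutation_test(explicit, non_explicit, n_permutations=2_000)
print(f"difference in means: {diff:.2f}, estimated p-value: {p_value:.4f}")
```

With the real data, the two lists would be the popularity values of the rows flagged as explicit and non-explicit; the column names to read them from are not given in this listing.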

Project Directory Guide

Spotify Data.ipynb is the main notebook where the data is imported for EDA and FII.

data.csv is the dataset downloaded from Kaggle.

spotify_eda.html is the HTML file for the comprehensive EDA done using the Pandas Profiling module.

Project Notes

This is in partial fulfillment of the course Statistical Modelling and Simulation (CSMODEL).

Credits to gabminamedez for the original dataset.
Spotify and Youtube
data-staging.niaid.nih.gov
data.niaid.nih.gov
Updated Dec 4, 2023
Cite
Guarisco, Marco; Sallustio, Marco; Rastelli, Salvatore (2023). Spotify and Youtube [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_10253414
Dataset updated
Dec 4, 2023
Authors
Guarisco, Marco; Sallustio, Marco; Rastelli, Salvatore
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
YouTube
Description
These are the statistics for the top 10 songs of various Spotify artists and their YouTube videos. The creators listed above generated the data and uploaded it to Kaggle on February 6-7, 2023. The data is licensed as "CC0: Public Domain", which allows it to be copied, modified, distributed, and worked on without asking permission. The data is provided as a CSV file with numerical and textual columns. It contains the statistics and attributes of the top 10 songs of various artists around the world. As described by the creators, it includes 26 variables for each song collected from Spotify. These variables are briefly described next:

Track: name of the song, as visible on the Spotify platform.
Artist: name of the artist.
Url_spotify: the URL of the artist.
Album: the album in which the song is contained on Spotify.
Album_type: indicates whether the song is released on Spotify as a single or contained in an album.
Uri: a Spotify link used to find the song through the API.
Danceability: describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
Energy: a measure from 0.0 to 1.0 that represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
Key: the key the track is in. Integers map to pitches using standard Pitch Class notation, e.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1.
Loudness: the overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing the relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.
Speechiness: detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
Acousticness: a confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence that the track is acoustic.
Instrumentalness: predicts whether a track contains no vocals. "Ooh" and "aah" sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly "vocal". The closer the instrumentalness value is to 1.0, the greater the likelihood that the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.
Liveness: detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.
Valence: a measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
Tempo: the overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
Duration_ms: the duration of the track in milliseconds.
Stream: number of streams of the song on Spotify.
Url_youtube: URL of the video linked to the song on YouTube, if it has one.
Title: title of the video clip on YouTube.
Channel: name of the channel that published the video.
Views: number of views.
Likes: number of likes.
Comments: number of comments.
Description: description of the video on YouTube.
Licensed: indicates whether the video represents licensed content, which means that the content was uploaded to a channel linked to a YouTube content partner and then claimed by that partner.
official_video: boolean value that indicates whether the video found is the official video of the song.
The data was last updated on February 7, 2023.
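The Key and Speechiness definitions above can be turned into small lookup helpers. The following is a minimal sketch, not part of the dataset: the function names `key_name` and `speechiness_band` and the band labels are hypothetical, and assigning the exact boundary values 0.33 and 0.66 to the middle band is an assumption, since the description only says "above", "between", and "below".

```python
# Pitch Class notation: integers 0-11 map to the twelve pitch classes,
# starting at C (0 = C, 1 = C♯/D♭, 2 = D, and so on).
PITCH_CLASSES = [
    "C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
    "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B",
]


def key_name(key):
    """Translate the integer Key value into a pitch name."""
    if key == -1:
        return "no key detected"
    if 0 <= key <= 11:
        return PITCH_CLASSES[key]
    raise ValueError(f"key must be -1 or in 0..11, got {key}")


def speechiness_band(value):
    """Bucket a Speechiness value using the 0.33 and 0.66 thresholds.

    Assumption: values exactly equal to 0.33 or 0.66 fall in the middle band.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"speechiness must be in [0.0, 1.0], got {value}")
    if value > 0.66:
        return "probably entirely spoken words"
    if value >= 0.33:
        return "may contain both music and speech"
    return "most likely music or other non-speech audio"


print(key_name(1))             # C♯/D♭
print(speechiness_band(0.45))  # may contain both music and speech
```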
My Spotify Data
dataverse.harvard.edu
Updated Oct 7, 2022
Cite
Ty Mulholland (2022). My Spotify Data [Dataset]. http://doi.org/10.7910/DVN/FVCXKG
Unique identifier
https://doi.org/10.7910/DVN/FVCXKG
Dataset updated
Oct 7, 2022
Dataset provided by
Harvard Dataverse
Authors
Ty Mulholland
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
My Spotify Data
Data from: P4KxSpotify: A Dataset of Pitchfork Music Reviews and Spotify...
data-staging.niaid.nih.gov
data.niaid.nih.gov
+1 more
Updated Jan 24, 2020
Cite
Pinter, Anthony T.; Paul, Jacob M.; Jessie Smith; Brubaker, Jed R. (2020). P4KxSpotify: A Dataset of Pitchfork Music Reviews and Spotify Musical Features [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_3603329
Dataset updated
Jan 24, 2020
Dataset provided by
University of Colorado Boulder
Authors
Pinter, Anthony T.; Paul, Jacob M.; Jessie Smith; Brubaker, Jed R.
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
18,403 music reviews scraped from Pitchfork, including relevant metadata such as author, review date, record release year, score, and genre, along with each album's audio features pulled from Spotify's API.
