36 datasets found

My Spotify Data - Cleaned
kaggle.com
zip
Updated Jan 26, 2024
Malinga Rajapaksha (2024). My Spotify Data - Cleaned [Dataset]. https://www.kaggle.com/datasets/malingarajapaksha/my-spotify-data-cleaned
Available download formats: zip (2952139 bytes)
Dataset updated
Jan 26, 2024
Authors
Malinga Rajapaksha
Description
The dataset contains records of the user's Spotify streaming history, with each row representing a specific instance of a played track. The data includes various attributes providing insights into the user's music listening habits.

Columns:

ts (Timestamp):

The timestamp when the track was played.

platform:

The platform or device used for streaming (e.g., Windows 10).

ms_played:

The duration in milliseconds of how long the track was played.

conn_country:

The country code indicating the user's location during streaming (e.g., LK for Sri Lanka).

master_metadata_track_name:

The name of the track played.

master_metadata_album_artist_name:

The artist of the album to which the track belongs.

master_metadata_album_album_name:

The name of the album containing the track.

spotify_track_uri:

The unique Spotify URI for the track.

reason_start:

The reason for starting the track (e.g., play button clicked).

reason_end:

The reason for ending the track (e.g., track done).

shuffle:

Indicates whether shuffle mode was enabled (True/False).

offline:

Indicates whether the track was played offline (True/False).

offline_timestamp:

Timestamp indicating when the track was played offline (if applicable).

incognito_mode:

Indicates whether incognito mode was enabled (True/False).
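Before any EDA, each row has to be turned into typed values. The sketch below shows one way to do that with Python's standard library. It assumes the download unpacks to a CSV whose headers match the column names above, that `ts` is an ISO 8601 UTC string such as `2023-05-01T12:34:56Z`, and that the flags are stored as the text `True`/`False`. The `Play` class, the `parse_play` and `to_bool` helpers, and the sample row are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Play:
    """One streaming event with typed fields (hypothetical container)."""
    ts: datetime
    platform: str
    ms_played: int
    conn_country: str
    track: Optional[str]
    artist: Optional[str]
    shuffle: bool
    offline: bool
    incognito_mode: bool


def to_bool(value: str) -> bool:
    # Assumption: flags are stored as "True"/"False"; anything else is read as False.
    return value.strip().lower() == "true"


def parse_play(row: dict) -> Play:
    # Replace a trailing "Z" with "+00:00" so datetime.fromisoformat accepts it
    # on Python versions older than 3.11.
    ts = datetime.fromisoformat(row["ts"].replace("Z", "+00:00"))
    return Play(
        ts=ts,
        platform=row["platform"],
        ms_played=int(row["ms_played"]),
        conn_country=row["conn_country"],
        # Empty track/artist fields become None instead of "".
        track=row["master_metadata_track_name"] or None,
        artist=row["master_metadata_album_artist_name"] or None,
        shuffle=to_bool(row["shuffle"]),
        offline=to_bool(row["offline"]),
        incognito_mode=to_bool(row["incognito_mode"]),
    )


# Hypothetical one-row sample using a subset of the columns above.
SAMPLE = (
    "ts,platform,ms_played,conn_country,master_metadata_track_name,"
    "master_metadata_album_artist_name,shuffle,offline,incognito_mode\n"
    "2023-05-01T12:34:56Z,Windows 10,215000,LK,Example Track,Example Artist,True,False,False\n"
)

plays = [parse_play(row) for row in csv.DictReader(io.StringIO(SAMPLE))]
```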

Purpose:

This dataset is suitable for performing detailed Exploratory Data Analysis (EDA) to uncover patterns, trends, and insights into the user's music-listening behaviour. Potential analyses could include the distribution of listening durations, favourite artists and tracks, exploration of geographic listening patterns, and examination of usage patterns across different platforms.
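As a starting point for the favourite-artist and platform analyses mentioned above, the following sketch totals `ms_played` per artist (converted to minutes) and counts plays per platform. It assumes rows shaped like `csv.DictReader` output with the column names listed above; the helper names and sample rows are illustrative only.

```python
from collections import Counter, defaultdict


def minutes_by_artist(rows):
    """Total listening time per artist, in minutes (ms_played / 60,000)."""
    totals = defaultdict(int)
    for row in rows:
        artist = row.get("master_metadata_album_artist_name")
        if not artist:
            continue  # skip rows with no artist name
        totals[artist] += int(row["ms_played"])
    return {artist: ms / 60_000 for artist, ms in totals.items()}


def top_artists(rows, n=5):
    """The n artists with the most listening minutes, highest first."""
    minutes = minutes_by_artist(rows)
    return sorted(minutes.items(), key=lambda item: item[1], reverse=True)[:n]


def plays_by_platform(rows):
    """Number of plays per platform/device string."""
    return Counter(row["platform"] for row in rows)


# Hypothetical rows, shaped like csv.DictReader output (all values are strings).
rows = [
    {"master_metadata_album_artist_name": "Artist A", "ms_played": "180000", "platform": "Windows 10"},
    {"master_metadata_album_artist_name": "Artist B", "ms_played": "60000", "platform": "Android"},
    {"master_metadata_album_artist_name": "Artist A", "ms_played": "120000", "platform": "Android"},
    {"master_metadata_album_artist_name": "", "ms_played": "30000", "platform": "Android"},
]

print(top_artists(rows))  # [('Artist A', 5.0), ('Artist B', 1.0)]
print(plays_by_platform(rows).most_common())  # [('Android', 3), ('Windows 10', 1)]
```

Ranking by total minutes rather than by play count keeps frequently skipped tracks from inflating an artist's rank; which measure fits better depends on the question being asked.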

Visualization tools such as Matplotlib and Seaborn can be used to create visual representations of the findings. The dataset offers opportunities to apply analytical techniques to real-world streaming data.
Spotify Dataset
brightdata.com
.json, .csv, .xlsx
Updated Apr 10, 2024
Bright Data (2024). Spotify Dataset [Dataset]. https://brightdata.com/products/datasets/spotify
Available download formats: .json, .csv, .xlsx
Dataset updated
Apr 10, 2024
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
Gain valuable insights into music trends, artist popularity, and streaming analytics with our comprehensive Spotify Dataset. Designed for music analysts, marketers, and businesses, this dataset provides structured and reliable data from Spotify to enhance market research, content strategy, and audience engagement.

Dataset Features

- Track Information: Access detailed data on songs, including track name, artist, album, genre, and release date.
- Streaming Popularity: Extract track popularity scores, listener engagement metrics, and ranking trends.
- Artist & Album Insights: Analyze artist performance, album releases, and genre trends over time.
- Related Searches & Recommendations: Track related search terms and suggested content for deeper audience insights.
- Historical & Real-Time Data: Retrieve historical streaming data or access continuously updated records for real-time trend analysis.

Customizable Subsets for Specific Needs

Our Spotify Dataset is fully customizable, allowing you to filter data based on track popularity, artist, genre, release date, or listener engagement. Whether you need broad coverage for industry analysis or focused data for content optimization, we tailor the dataset to your needs.

Popular Use Cases

- Market Analysis & Trend Forecasting: Identify emerging music trends, genre popularity, and listener preferences.
- Artist & Label Performance Tracking: Monitor artist rankings, album success, and audience engagement.
- Competitive Intelligence: Analyze competitor music strategies, playlist placements, and streaming performance.
- AI & Machine Learning Applications: Use structured music data to train AI models for recommendation engines, playlist curation, and predictive analytics.
- Advertising & Sponsorship Insights: Identify high-performing tracks and artists for targeted advertising and sponsorship opportunities.

Whether you're optimizing music marketing, analyzing streaming trends, or enhancing content strategies, our Spotify Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.
My Spotify Data
kaggle.com
zip
Updated Dec 15, 2021
Nilay Gaitonde (2021). My Spotify Data [Dataset]. https://www.kaggle.com/nilaygaitonde/my-spotify-data
Available download formats: zip (92508 bytes)
Dataset updated
Dec 15, 2021
Authors
Nilay Gaitonde
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Nilay Gaitonde

Released under CC0: Public Domain

Contents
MY SPOTIFY WRAPPED EDA
kaggle.com
zip
Updated Jan 1, 2022
Imraan Virani (2022). MY SPOTIFY WRAPPED EDA [Dataset]. https://www.kaggle.com/imraanvirani/my-spotify-wrapped-eda
Available download formats: zip (60516 bytes)
Dataset updated
Jan 1, 2022
Authors
Imraan Virani
Description
Dataset

This dataset was created by Imraan Virani

Contents

Spotify Songs for ML & Analysis (8700+ tracks)

kaggle.com

zip

Updated Nov 6, 2025

AlyAhmedTS13 (2025). Spotify Songs for ML & Analysis (8700+ tracks) [Dataset]. https://www.kaggle.com/datasets/alyahmedts13/spotify-songs-for-ml-and-analysis-over-8700-tracks

Available download formats: zip (1289021 bytes)

Dataset updated

Nov 6, 2025

Authors

AlyAhmedTS13

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

🕹️ About Dataset

🎯 Context

What makes a song popular on Spotify?
Do artist popularity and follower count influence track success more than audio features?
How do album types and release dates shape listening trends?

These were the questions that inspired me to build this dataset.

Using Spotify’s API, I collected data on over 8,700 tracks, capturing detailed metadata about songs, artists, and albums. This dataset is ideal for exploring the intersection of music analytics, artist influence, and streaming behavior.

📦 Content

This dataset contains one CSV file with over 8,700 rows. Each row represents a unique track and includes metadata across three dimensions: track, artist, and album.

Column Name	Description
`track_id`	Unique identifier for the track
`track_number`	Track’s position on the album
`track_popularity`	Spotify popularity score (0–100)
`track_duration_ms`	Duration of the track in milliseconds
`explicit`	Whether the track contains explicit content
`artist_name`	Name of the performing artist
`artist_popularity`	Spotify popularity score for the artist
`artist_followers`	Number of Spotify followers for the artist
`album_id`	Unique identifier for the album
`album_name`	Name of the album
`album_release_date`	Original release date of the album
`artist_genres`	Genre tags associated with the artist
`album_total_tracks`	Total number of tracks on the album
`album_type`	Type of album (e.g., album, single, compilation)

🙏 Acknowledgements

All data was collected using the Spotify Web API.
This dataset is intended for educational and research purposes only.

💡 Inspiration

You can use this dataset to:

Analyze which artist traits correlate with track popularity
Explore genre trends across different album types and release years
Build machine learning models to predict song success
Visualize music trends using Power BI or Python
Compare artists, albums, or genres based on metadata
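For the first idea in the list, a Pearson correlation coefficient is one simple way to check how strongly a numeric artist trait moves with `track_popularity`. The sketch uses only the standard library; the `pearson` helper and the sample rows are made up for illustration, and a high coefficient shows association, not that the trait causes popularity.

```python
import csv
import io
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a column is constant")
    return cov / math.sqrt(var_x * var_y)


# Made-up sample rows using column names from the table above.
SAMPLE = """track_id,track_popularity,artist_popularity,artist_followers
t1,80,90,1000000
t2,40,50,20000
t3,60,70,300000
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
track_pop = [float(r["track_popularity"]) for r in rows]
for trait in ("artist_popularity", "artist_followers"):
    values = [float(r[trait]) for r in rows]
    print(trait, round(pearson(values, track_pop), 3))
```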

Cleaned Version

A cleaned version of the dataset (spotify_data_clean.csv) is now available. It includes:

Cleaning Process (SQL)

The cleaned dataset (spotify_data_clean.csv) was generated through a multi-step SQL pipeline designed to ensure consistency, completeness, and usability for analysis. Below is a summary of the transformations applied:

🔍 Null Handling & Imputation

Identified and removed rows with missing track_name.
Imputed missing artist_name, artist_popularity, artist_followers, and artist_genres using album-level joins (e.g., for albums like 1989).
Replaced remaining nulls with default values:
- 'N/A' for strings
- 0 for numeric fields
- '[]' for genre arrays (temporary placeholder)
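The author's pipeline was written in SQL, and the queries themselves are not included here. As a rough illustration only, the same three steps (drop rows without `track_name`, borrow artist fields from other rows on the same album, then fill placeholder defaults) can be sketched in Python. Keying the album-level lookup on `album_id`, treating empty strings as NULL, and the exact column groups are all assumptions.

```python
from typing import Dict, Iterable, List

# Column groups are assumptions drawn from the column table above.
ARTIST_COLUMNS = ("artist_name", "artist_popularity", "artist_followers", "artist_genres")
STRING_COLUMNS = ("artist_name", "album_name", "album_type")
NUMERIC_COLUMNS = ("artist_popularity", "artist_followers", "track_popularity")


def is_null(value) -> bool:
    # Treat None (e.g. from a missing key) and empty strings as NULL (assumption for CSV input).
    return value is None or value == ""


def clean_nulls(rows: Iterable[Dict]) -> List[Dict]:
    # Step 1: drop rows with no track_name (copies rows so the input is not mutated).
    rows = [dict(r) for r in rows if not is_null(r.get("track_name"))]

    # Step 2: album-level imputation. For each (album_id, column), remember the
    # first non-null value seen and use it to fill gaps on the same album.
    donors = {}
    for r in rows:
        for col in ARTIST_COLUMNS:
            if not is_null(r.get(col)):
                donors.setdefault((r.get("album_id"), col), r[col])
    for r in rows:
        for col in ARTIST_COLUMNS:
            if is_null(r.get(col)):
                r[col] = donors.get((r.get("album_id"), col))

    # Step 3: placeholder defaults for anything still missing.
    for r in rows:
        for col in STRING_COLUMNS:
            if is_null(r.get(col)):
                r[col] = "N/A"
        for col in NUMERIC_COLUMNS:
            if is_null(r.get(col)):
                r[col] = 0
        if is_null(r.get("artist_genres")):
            r["artist_genres"] = "[]"  # temporary placeholder, cleaned up later
    return rows
```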

✨ Standardization

Trimmed whitespace from key fields: track_name, artist_name, album_name, album_type, explicit.
Converted explicit values to uppercase (TRUE / FALSE).
Cleaned artist_genres using regex to remove brackets and quotes.
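A Python sketch of the standardization rules, assuming `artist_genres` arrives as a list-like string such as `['pop', 'dance pop']`. It illustrates the described rules rather than reproducing the author's SQL; the `standardize` helper is hypothetical.

```python
import re

TRIM_COLUMNS = ("track_name", "artist_name", "album_name", "album_type", "explicit")

# Matches square brackets and single or double quotes.
GENRE_JUNK = re.compile(r"[\[\]'\"]")


def standardize(row: dict) -> dict:
    """Trim key text fields, upper-case `explicit`, and strip list syntax from genres."""
    out = dict(row)
    for col in TRIM_COLUMNS:
        if isinstance(out.get(col), str):
            out[col] = out[col].strip()
    if isinstance(out.get("explicit"), str):
        out["explicit"] = out["explicit"].upper()  # "false" -> "FALSE"
    if isinstance(out.get("artist_genres"), str):
        # "['pop', 'dance pop']" -> "pop, dance pop"; "[]" -> ""
        out["artist_genres"] = GENRE_JUNK.sub("", out["artist_genres"]).strip()
    return out
```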

📅 Release Date Normalization

For year-only dates (e.g., 2020), appended -06-30 to estimate mid-year.
For year-month formats (e.g., 2020-07), appended -01 to complete the date.
Converted all dates to DATE format using STR_TO_DATE().
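The date rules above can be expressed as a small Python function. Here `datetime.strptime` with the `%Y-%m-%d` format plays the role that `STR_TO_DATE()` plays in the SQL version; the function name is hypothetical, and inputs are assumed to be ISO-style strings.

```python
import re
from datetime import date, datetime


def normalize_release_date(raw: str) -> date:
    """Complete partial release dates, then parse them as calendar dates."""
    value = raw.strip()
    if re.fullmatch(r"\d{4}", value):
        value += "-06-30"  # year only: estimate mid-year
    elif re.fullmatch(r"\d{4}-\d{2}", value):
        value += "-01"  # year and month: use the first day of the month
    # Plays the role of STR_TO_DATE(value, '%Y-%m-%d'); raises ValueError on malformed input.
    return datetime.strptime(value, "%Y-%m-%d").date()


print(normalize_release_date("2020"))     # 2020-06-30
print(normalize_release_date("2020-07"))  # 2020-07-01
```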

⏱ Duration Conversion

Added a new column track_duration_min by converting track_duration_ms to minutes.
Dropped the original track_duration_ms column after conversion.
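The conversion is a division by 60,000 (1,000 ms per second times 60 seconds per minute). A minimal sketch, assuming two-decimal rounding, which the description does not specify; the helper names are hypothetical.

```python
MS_PER_MINUTE = 60_000  # 1,000 ms per second * 60 seconds per minute


def ms_to_minutes(ms: int, ndigits: int = 2) -> float:
    """Convert milliseconds to minutes, rounded to `ndigits` decimals (an assumption)."""
    return round(ms / MS_PER_MINUTE, ndigits)


def add_duration_min(row: dict) -> dict:
    """Add track_duration_min and drop track_duration_ms, mirroring the described step."""
    out = dict(row)
    out["track_duration_min"] = ms_to_minutes(int(out.pop("track_duration_ms")))
    return out


print(ms_to_minutes(215_000))  # 3.58
```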

🎵 Genre Enrichment

Populated missing artist_genres for well-known artists using manual overrides:
- Taylor Swift: country, pop, indie, folk
- Olivia Rodrigo: pop rock, alternative pop, pop punk
- Billie Eilish: alternative pop, electropop, dark pop
- (...and more for 10+ artists)
Remaining empty genres were replaced with 'N/A'.

🧹 Deduplication

Used ROW_NUMBER() over track_name, artist_name, album_name, and album_release_date to identify duplicates.
Removed duplicate rows and dropped the helper row_num column.
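Keeping only the first row of each (`track_name`, `artist_name`, `album_name`, `album_release_date`) group is the Python analogue of filtering on `ROW_NUMBER() = 1`. In SQL, which row counts as "first" depends on the window's `ORDER BY`, which the description does not state; this sketch simply keeps the first row in input order. The `deduplicate` helper is hypothetical.

```python
from typing import Dict, Iterable, List, Sequence

KEY_COLUMNS = ("track_name", "artist_name", "album_name", "album_release_date")


def deduplicate(rows: Iterable[Dict], key_columns: Sequence[str] = KEY_COLUMNS) -> List[Dict]:
    """Keep the first row of each key group, like keeping ROW_NUMBER() = 1."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.get(col) for col in key_columns)
        if key in seen:
            continue  # a later duplicate (row_num > 1 in SQL terms)
        seen.add(key)
        unique.append(row)
    return unique
```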

This SQL workflow ensures the dataset is clean, consistent, and ready for exploratory data analysis, genre modeling, and public sharing. All transformations were verified using sample queries and profiling tools.

Example Analysis

Explore genre trends and usage patterns in this companion notebook:
👉 Top Genres Using Pandas

🤝 Contribute

Feel free to fork the dataset or share your analyses!
If you clean, enrich, or expand the dataset, contributions are always welcome.

Full Spotify Streaming History
kaggle.com
zip
Updated Feb 22, 2025
Anshul Raj Verma (2025). Full Spotify Streaming History [Dataset]. https://www.kaggle.com/datasets/arvanshul/full-spotify-streaming-history
Available download formats: zip (1953408 bytes)
Dataset updated
Feb 22, 2025
Authors
Anshul Raj Verma
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
This dataset contains my complete Spotify streaming history from June 2020 to February 2025.

I requested this data directly from Spotify.

Use cases of the dataset:

Analyse the user's listening behaviour and extract interesting insights from it.

You can practice SQL queries on this dataset extensively.

It has a datetime column, so you can also perform some time series analysis (to some extent).
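The column names of this export are not listed above. Assuming it uses the same `ts` / `ms_played` naming as Spotify's extended streaming history (as in the first dataset on this page), a daily listening series can be built as in the sketch below; the column names and the `minutes_per_day` helper are assumptions.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Iterable


def minutes_per_day(rows: Iterable[dict], ts_col: str = "ts", ms_col: str = "ms_played") -> Dict[date, float]:
    """Total listening minutes per calendar day, in date order."""
    totals = defaultdict(float)
    for row in rows:
        # Assumes ISO 8601 UTC timestamps such as "2021-01-01T10:00:00Z".
        ts = datetime.fromisoformat(row[ts_col].replace("Z", "+00:00"))
        totals[ts.date()] += int(row[ms_col]) / 60_000
    return dict(sorted(totals.items()))
```

Because the days come from UTC timestamps, listening near local midnight can land on the neighbouring date; convert to local time first if that matters for the analysis.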
Spotify top artists by monthly listeners
kaggle.com
zip
Updated Sep 10, 2023
meer atif magsi (2023). Spotify top artists by monthly listeners [Dataset]. https://www.kaggle.com/datasets/meeratif/spotify-top-artists-by-monthly-listeners
Available download formats: zip (59910 bytes)
Dataset updated
Sep 10, 2023
Authors
meer atif magsi
Description
Welcome to the "Spotify Top Artists by Monthly Listeners" dataset! Dive into the world of music streaming with this comprehensive collection of data, which offers valuable insights into the artists dominating the digital airwaves on Spotify, one of the world's leading music streaming platforms.

Columns:

Artist: The "Artist" column contains the names or unique identifiers of musical artists. Each row in this column represents a specific artist. This column serves as the primary identifier for the artists in the dataset.

Listeners: The "Listeners" column represents the number of listeners or fans for each artist. It quantifies the artist's fan base or the total number of people who have engaged with their music.

Daily Trend: The "Daily Trend" column contains data that reflects the daily trend or change in popularity of each artist. It may include metrics or values indicating whether an artist is gaining or losing listeners on a daily basis. This metric helps to track an artist's current momentum and popularity.

Peak: The "Peak" column signifies the highest point of popularity or the peak level of engagement that an artist has achieved within a specific timeframe. It provides insights into the artist's historical performance and when they were most widely appreciated.

PkListeners: The "PkListeners" column represents the number of listeners an artist had at the peak of their popularity. This metric offers a specific quantitative measure of an artist's highest level of engagement with their audience.
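A simple derived metric from these columns is how close each artist currently is to their peak, `Listeners / PkListeners`. The sketch assumes the counts may be stored as text with thousands separators, which the description does not state; the helper names and example values are hypothetical.

```python
def to_count(value) -> int:
    """Parse a listener count that may contain thousands separators, e.g. "1,234,567"."""
    return int(str(value).replace(",", "").strip())


def share_of_peak(listeners, peak_listeners) -> float:
    """Current listeners as a fraction of the artist's peak (1.0 means at peak)."""
    current, peak = to_count(listeners), to_count(peak_listeners)
    if peak <= 0:
        raise ValueError("PkListeners must be a positive number")
    return current / peak


# Hypothetical values for one artist.
print(f"{share_of_peak('80,000,000', '100,000,000'):.0%}")  # 80%
```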

This dataset is a goldmine for music enthusiasts, data analysts, and researchers eager to explore the dynamics of popularity and musical diversity on Spotify. It provides a rich source of information for tracking artist trends, analyzing genre preferences, and gaining a deeper understanding of the global music landscape.

Whether you're interested in uncovering emerging artists, studying the impact of genres, or simply exploring the musical tastes of Spotify's user base, this dataset offers a robust foundation for insightful analyses and engaging visualizations.

Join us on a data-driven journey through the Spotify music ecosystem as we uncover the artists captivating the ears of millions of listeners, and let the data guide your exploration of the vibrant and ever-evolving world of music. Happy analyzing!
Eminem Album Trends
kaggle.com
zip
Updated May 23, 2020
Kaivalya Powale (2020). Eminem Album Trends [Dataset]. https://www.kaggle.com/kaivalyapowale/eminem-album-trends
Available download formats: zip (15869 bytes)
Dataset updated
May 23, 2020
Authors
Kaivalya Powale
Description
Eminem is one of the most influential hip-hop artists of all time, and the Rap God. I acquired this data using Spotify APIs and supplemented it with other research to add to my own analysis. You can find my original analysis here: https://kaivalyapowale.com/2020/01/25/eminems-album-trends-and-music-to-be-murdered-by-2020/

My analysis was also published by top hip-hop websites:
- HipHop 24x7: Data analysis reveals M2BMB is the most negative album
- Eminem Pro: Album's data analysis
- Eminem Pro: Eminem's albums are getting shorter

You can also check out visualizations on Tableau Public for some ideas: https://public.tableau.com/profile/kaivalya.powale#!/

Content

I have primarily used data from Spotify’s API using multiple endpoints for albums and tracks. I supplemented the data with stats from Billboard and calculations from this post.

Here's the explanation for all the audio features provided by Spotify!

I have researched data about album sales from multiple sources online. They are cited in my original analysis.

Acknowledgements

Here are Spotify's album endpoints. Chart data is from Billboard. Swear data is from this source.

Inspiration

I'd love to see new visualizations using this data or using the sales, swear, or duration for an analysis. It would be wonderful if someone compares this with other hip-hop greats.
Spotify Song Attributes
kaggle.com
zip
Updated Aug 4, 2017
GeorgeMcIntire (2017). Spotify Song Attributes [Dataset]. https://www.kaggle.com/forums/f/5360/spotify-song-attributes
Available download formats: zip (100786 bytes)
Dataset updated
Aug 4, 2017
Authors
GeorgeMcIntire
Description
Context

A dataset of 2017 songs with attributes from Spotify's API. Each song is labeled "1" if I like it and "0" if I don't. I used this data to see if I could build a classifier that could predict whether or not I would like a song.

I wrote an article about the project I used this data for. It includes code on how to grab this data from the Spotipy API wrapper and the methods behind my modeling. https://opendatascience.com/blog/a-machine-learning-deep-dive-into-my-spotify-data/

Content

Each row represents a song.

There are 16 columns: 13 song attributes, one for the song name, one for the artist, and a column called "target", which is the label for the song.

Here are the 13 track attributes: acousticness, danceability, duration_ms, energy, instrumentalness, key, liveness, loudness, mode, speechiness, tempo, time_signature, valence.

Information on what those traits mean can be found here: https://developer.spotify.com/web-api/get-audio-features/
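The author's article describes the actual modelling. As a much simpler stand-in (not the author's method), a nearest-centroid classifier predicts `target` by comparing a song's standardized attributes with the average liked and the average disliked song. The three-attribute subset, the helper names and any sample values are assumptions for brevity.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A three-attribute subset for brevity; the dataset has 13 attributes.
FEATURES = ("danceability", "energy", "valence")

Stats = Dict[str, Tuple[float, float]]


def fit_scaler(rows: List[dict], features: Sequence[str] = FEATURES) -> Stats:
    """Per-feature mean and (population) standard deviation for z-scoring."""
    stats = {}
    for f in features:
        values = [float(r[f]) for r in rows]
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        stats[f] = (mean, std or 1.0)  # avoid dividing by zero for constant features
    return stats


def scale(row: dict, stats: Stats) -> List[float]:
    return [(float(row[f]) - mean) / std for f, (mean, std) in stats.items()]


def fit_centroids(rows: List[dict], stats: Stats) -> Dict[int, List[float]]:
    """Average scaled feature vector for each `target` label (1 = liked, 0 = not)."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for r in rows:
        vec = scale(r, stats)
        label = int(r["target"])
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(row: dict, stats: Stats, centroids: Dict[int, List[float]]) -> int:
    """Label of the nearest centroid by squared Euclidean distance."""
    vec = scale(row, stats)
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(vec, centroids[label])),
    )
```

Standardizing first keeps attributes with large ranges (such as `tempo` or `loudness`) from dominating the distance. To judge the model, hold out some labeled songs and compare predictions against their `target` values.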

Acknowledgements

I would like to thank Spotify for providing this readily accessible data.

Inspiration

I'm a music lover who's curious about why I love the music that I love.
Billie Eilish Spotify Analysis
kaggle.com
zip
Updated Jan 21, 2023
The Devastator (2023). Billie Eilish Spotify Analysis [Dataset]. https://www.kaggle.com/datasets/thedevastator/billie-eilish-spotify-analysis
Available download formats: zip (37509 bytes)
Dataset updated
Jan 21, 2023
Authors
The Devastator
Description
Billie Eilish Spotify Analysis

Investigating Popularity, Danceability, and Lyrics

By Priyanka Dobhal [source]

About this dataset

This dataset contains information about the music of Billie Eilish on Spotify, including track name, acousticness, danceability, energy, instrumentalness, liveness, loudness, speechiness and tempo. It also includes data about the popularity of each song and the artist behind it. Each song is also uniquely identified using its URI. This dataset gives us insight into what characteristics make up Billie Eilish's music and how popular her songs are. With this dataset we can analyse what factors influence a song's popularity to better understand why some songs become hits while others don't get as much attention. We can also compare the features of her music with other artists' songs to find similarities and differences, both in sound and style and in how much people listen to them.

How to use the dataset

This dataset contains information about Billie Eilish's music on Spotify, including track name, acousticness, danceability, energy, instrumentalness, liveness, loudness, speechiness

Research Ideas

Analyzing the effect of musical attributes (e.g. acousticness, danceability, energy, etc.) on listeners' engagement with a specific artist's music.

Exploring the relationship between lyrical content and popularity of an artist's songs to discover potential trends in songwriting approaches that increase or decrease a song's chances of success.

Finding correlations between lyrical and musical elements to gain insights into popular music trends over time or within Billie Eilish’s discography specifically

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: Dataset copyright by authors
- You are free to:
  - Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
  - Adapt: remix, transform, and build upon the material for any purpose, even commercially.
- You must:
  - Give appropriate credit: Provide a link to the license, and indicate if changes were made.
  - ShareAlike: You must distribute your contributions under the same license as the original.
  - Keep intact: all notices that refer to this license, including copyright notices.

Columns

File: Billie_Eilish_Spotify.csv

| Column name | Description |
|:-----------------|:--------------------------------------------------------------|
| album | The name of the album the song is from. (String) |
| track_number | The track number of the song on the album. (Integer) |
| uri | The unique identifier for the song. (String) |
| acousticness | A measure of how acoustic the song is. (Float) |
| danceability | A measure of how suitable a song is for dancing. (Float) |
| energy | A measure of the intensity and activity of a song. (Float) |
| instrumentalness | A measure of how much of the song is instrumental. (Float) |
| liveness | A measure of how much the song was performed live. (Float) |
| loudness | A measure of the volume of the song. (Float) |
| speechiness | A measure of how much the song contains spoken words. (Float) |
| tempo | The speed of the song. (Float) |
| valence | A measure of the positivity of the song. (Float) |
| popularity | A measure of how popular the song is. (Integer) |
| artist | The artist who produced and performs the song. (String) |

File: Billie_Eilish_Lyrics_to_words.csv

| Column name | Description |
|:-------------|:--------------------------------------------------------|
| album | The name of the album the song is from. (String) |
| track_number | The track number of the song on the album. (Integer) |
| uri | The unique identifier for the song. (String) |
| artist | The artist who produced and performs the song. (String) |

Spotify Dataset 1921-2020, 600k+ Tracks
kaggle.com
zip
Updated Mar 13, 2022
Yamac Eren Ay (2022). Spotify Dataset 1921-2020, 600k+ Tracks [Dataset]. https://www.kaggle.com/datasets/yamaerenay/spotify-dataset-19212020-600k-tracks
Available download formats: zip (201984462 bytes)
Dataset updated
Mar 13, 2022
Authors
Yamac Eren Ay
License
https://cdla.io/sharing-1-0/
Description
About

For more in-depth information about audio features provided by Spotify: https://developer.spotify.com/documentation/web-api/reference/#/operations/get-audio-features

I reposted my old dataset because many people requested it. I don't plan to update the dataset further.

Meta-information

Title: Spotify Dataset 1921-2020, 600k+ Tracks
Subtitle: Audio features of 600k+ tracks, popularity metrics of 1M+ artists
Source: Spotify Web API
Creator: Yamac Eren Ay
Release Date (of Last Version): April 2021
Link to this dataset: https://www.kaggle.com/yamaerenay/spotify-dataset-19212020-600k-tracks
Link to the old dataset: https://www.kaggle.com/yamaerenay/spotify-dataset-1921-2020-160k-tracks

Disclaimer

I am not reposting third-party Spotify data here for arbitrary reasons or to get upvotes.

The old dataset has been cited in dozens of scientific papers using the old link, which has not worked since July 2021, and most of the authors had problems proving the validity of the dataset. You can cite the same dataset under the new link. I'll be posting more information regarding the old dataset.

If you have inquiries or complaints, please don't hesitate to reach out to me on LinkedIn or you can send me an email.
Spotify Most Popular Songs Dataset
kaggle.com
zip
Updated Feb 21, 2025
RishabhPancholi1302 (2025). Spotify Most Popular Songs Dataset [Dataset]. https://www.kaggle.com/datasets/rishabhpancholi1302/spotify-most-popular-songs-dataset
Available download formats: zip (3707341 bytes)
Dataset updated
Feb 21, 2025
Authors
RishabhPancholi1302
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Spotify Most Popular Songs Dataset 🎵

Overview:

This dataset contains a collection of the most popular songs on Spotify, along with various attributes that can be used for music analysis and recommendation systems. It includes audio features, lyrical details, and general metadata about each track, making it an excellent resource for machine learning, data science, and music analytics projects.

Each song in the dataset includes the following features:

🎧 Audio Features (Extracted from Spotify API):

- Danceability – How suitable a track is for dancing (0.0 – 1.0).

- Energy – Intensity and activity level of a song (0.0 – 1.0).

- Loudness – Overall loudness in decibels (dB).

- Speechiness – Presence of spoken words in the track (0.0 – 1.0).

- Acousticness – Probability that a track is acoustic (0.0 – 1.0).

- Instrumentalness – Predicts if a track is instrumental (0.0 – 1.0).

- Liveness – Probability of a live audience (0.0 – 1.0).

- Valence – Musical positivity or happiness (0.0 – 1.0).

- Tempo – Beats per minute (BPM) of the track.

- Key & Mode – Musical key and mode (major/minor).

📝 Lyrics-Based Features:

- Lyrics Text – Full lyrics of the song (if available).

🎶 General Song Information:

- Track Name – Name of the song.

- Artist(s) – Performing artist(s).

- Album Name – Album the track belongs to.

- Release Year – Year when the song was released.

- Genre – Song’s primary genre classification.

- Popularity Score – Spotify popularity metric (0 – 1).

Use Cases 🚀:

This dataset is ideal for:

- Music Recommendation Systems – Build collaborative or content-based recommenders.

- Audio Feature Analysis – Discover trends in song characteristics.

- Sentiment Analysis – Study how song lyrics relate to emotions.

- Hit Song Prediction – Use machine learning to predict song popularity.

- Music Genre Classification – Train classifiers to categorize music.
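For the content-based recommender use case, one common approach (sketched here as an illustration, not something shipped with the dataset) ranks songs by the cosine similarity of their audio-feature vectors. The lowercase feature keys are assumptions about how the columns are named, and the `cosine` and `recommend` helpers are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Features on a 0.0-1.0 scale (see the list above). Tempo and loudness are left out
# because they use other units and would dominate the vectors without rescaling.
FEATURES = ("danceability", "energy", "speechiness", "acousticness",
            "instrumentalness", "liveness", "valence")


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(seed: str, songs: Dict[str, Dict[str, float]], k: int = 3) -> List[Tuple[str, float]]:
    """Return up to k songs whose feature vectors are most similar to the seed song."""
    seed_vec = [songs[seed][f] for f in FEATURES]
    scores = [
        (name, cosine(seed_vec, [feats[f] for f in FEATURES]))
        for name, feats in songs.items()
        if name != seed
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]
```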

Acknowledgments:

Data collected using the Spotify API and other sources. If you use this dataset, consider crediting it in your projects!
Spotify Playlist Analysis Datasets
kaggle.com
zip
Updated Jun 27, 2023
Mattia Girolami (2023). Spotify Playlist Analysis Datasets [Dataset]. https://www.kaggle.com/datasets/mattiagirolami/spotify-playlist-analisys-datasets
Available download formats: zip (748016 bytes)
Dataset updated
Jun 27, 2023
Authors
Mattia Girolami
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
The Spotify Playlist Analysis Dataset is a collection of 3,232 different songs obtained from a personal music library and has been uploaded to Kaggle for analysis and exploration. The dataset offers a comprehensive overview of various tracks, enabling researchers, music enthusiasts, and data scientists to gain insights into the musical preferences and characteristics present in the library.
Spotify Dataset 2023
kaggle.com
zip
Updated Dec 20, 2023
Tony Gordon Jr. (2023). Spotify Dataset 2023 [Dataset]. https://www.kaggle.com/datasets/tonygordonjr/spotify-dataset-2023/code
Available download formats: zip (101062584 bytes)
Dataset updated
Dec 20, 2023
Authors
Tony Gordon Jr.
Description
I've been diving into the vibrant world of data for a solid two years, and guess what? I'm finally cracking the code on what it takes to soar in this industry! Early in my data adventures, I was like a kid on Limewire when I found Kaggle, downloading everything that caught my eye. But then, I stumbled upon Spotify's data and... let's just say, it was a bit of a reality check.

I found myself wrestling with duplicate records, scratching my head over inconsistent schemas, and feeling lost in the sauce without any guides. That experience was a game-changer for me. I made a promise to my future self: “When you've got the skills, create a dataset that's not just good, but legendary.” That time has come!

Introducing my unique Spotify dataset – a crystal clear reflection of dedication and clarity. What makes this set stand out? You're not just getting data; you're getting a story. You can literally trace my steps, unraveling the magic behind each table through my script on GitHub. It's like having a backstage pass to a data concert! (Yes, Swifties will love this dataset too 😉)

I'm all about transparency, and I believe it's the key to trust. With this dataset, I'm laying it all out there – no smoke and mirrors, just pure, unadulterated, CLEAN data. I want you to feel the same excitement I do when data just clicks into place. I encourage you all to check out the GitHub repo I linked above to see how this dataset came to life!

If you have any questions, suggestions or simply want to network, reach out to me on LinkedIn

This dataset is created using data sourced from Spotify and adheres to their Terms of Use. The dataset is intended for non-commercial, academic purposes and does not infringe upon Spotify's intellectual property rights. For full details on Spotify's terms, please visit Spotify's Terms and Conditions of Use.

You can find documentation for Spotify's Web API here

As of 12/20/2023, this is V1 of my data and I'll most likely release a few more versions after working through kinks from former releases.

Other Datasets: - Zillow
Streaming Activity Dataset
kaggle.com
zip
Updated Dec 4, 2023
The Devastator (2023). Streaming Activity Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/streaming-activity-dataset/code
Available download formats: zip (3586470 bytes)
Dataset updated
Dec 4, 2023
Authors
The Devastator
Description
Streaming Activity Dataset

4 years of diverse music streaming data across platforms and ages

By Sean Miller [source]

About this dataset

The dataset consists of two main files: Scrobble_Features.csv and My Streaming Activity.csv. The Scrobble_Features.csv file contains detailed information about the music tracks, including genre, duration, popularity, and various audio features. On the other hand, the My Streaming Activity.csv file offers 4 years' worth of music streaming data from multiple platforms.

Key columns in these files include:
- Performer: The name of the performer or artist.
- Song: The title of the song.
- Album: The name of the album that each song belongs to.
- spotify_genre: The genre(s) assigned to each song according to Spotify's classification.
- spotify_track_preview_url: URLs providing previews for each song on Spotify.
- spotify_track_duration_ms: The duration of each song in milliseconds.
- spotify_track_popularity: A popularity score indicating how popular each track is on Spotify.
- spotify_track_explicit: A boolean value indicating whether or not a track contains explicit content.

Further musical attributes are also included:
- danceability: A measure determining how suitable a song is for dancing based on various musical elements.
- energy: An indicator measuring the intensity and activity level present in a song's composition.
- key: Identifies the key signature (e.g., C major) that each track is performed in.
- loudness: Reveals how loud or soft a given track is overall in decibels (dB).
- mode: Indicates whether a given track is composed in major or minor scale/mode.

These attributes aim to provide insights into different aspects of a song's overall composition and impact.

Additionally, this dataset offers information about the timestamps when streaming activities occurred in both Central Time Zone (TimeStamp_Central) and Coordinated Universal Time (UTC) (TimeStamp_UTC).
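Combining the two files usually means attaching track features to each streaming event. This sketch assumes both files carry `Performer` and `Song` columns and that the pair identifies a track; the sample rows and the timestamp format are invented, and the helper names are hypothetical. Matching ignores case and surrounding whitespace, and unmatched events are returned separately so coverage can be checked.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

# Hypothetical miniature versions of the two files; the real column sets are larger.
FEATURES_CSV = """Performer,Song,spotify_genre,danceability,energy
Artist A,Song 1,['pop'],0.8,0.7
Artist B,Song 2,['rock'],0.4,0.9
"""
ACTIVITY_CSV = """Performer,Song,TimeStamp_UTC
Artist A,Song 1,2019-03-01 12:00:00
Artist A,Song 1,2019-03-02 08:30:00
Artist C,Song 3,2019-03-02 09:00:00
"""

Key = Tuple[str, str]


def track_key(row: dict) -> Key:
    # Normalize case and whitespace so minor formatting differences still match.
    return row["Performer"].strip().lower(), row["Song"].strip().lower()


def index_features(rows: Iterable[dict]) -> Dict[Key, dict]:
    return {track_key(r): r for r in rows}


def join_plays(plays: Iterable[dict], features_by_key: Dict[Key, dict]) -> Tuple[List[dict], List[dict]]:
    """Inner-join streaming events to track features; also return unmatched events."""
    joined, unmatched = [], []
    for play in plays:
        feats = features_by_key.get(track_key(play))
        if feats is None:
            unmatched.append(play)
        else:
            joined.append({**feats, **play})
    return joined, unmatched


features = index_features(csv.DictReader(io.StringIO(FEATURES_CSV)))
plays = list(csv.DictReader(io.StringIO(ACTIVITY_CSV)))
joined, unmatched = join_plays(plays, features)
```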

How to use the dataset

In this guide, we will walk you through how to effectively use this dataset for your analysis or projects. Let's get started!

Understanding the Columns

Before diving into analyzing the data, let's understand the meaning of each column in the dataset:

Performer: The name of the performer or artist of the song.

Song: The title of the song.

spotify_genre: The genre(s) of the song according to Spotify.

spotify_track_preview_url: The URL of a preview of the song on Spotify.

spotify_track_duration_ms: The duration of the song in milliseconds.

spotify_track_popularity: The popularity score of the song on Spotify. (Numeric/Integer)

spotify_track_explicit: Indicates whether the song contains explicit content. (Boolean)

danceability: A measure of how suitable a song is for dancing based on a combination of musical elements. (Numeric/Float)

energy: A measure of the intensity and activity level present in a track. (Numeric/Float)

key: The key the song is performed in.

loudness: How loud or quiet the track is overall, in decibels (dB).

mode: The modality of the track: major is denoted by 1 and minor by 0.

speechiness: Detects the presence of spoken words in a track.

acousticness: The probability that the track is acoustic.

instrumentalness: How likely the track is to be instrumental, i.e. to contain no vocals.

liveness: The probability that the track was performed live.

valence: The musical positivity (cheerfulness) conveyed by a track, where 1 is the most positive and 0 the most negative (e.g., sad).

tempo: The rate at which beats recur, in beats per minute (BPM).
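The spotify_genre column can hold several genres per song, so counting plays per genre means splitting each cell first. This sketch assumes a cell is stored as a Python-style list literal such as "['pop', 'dance pop']" and falls back to comma splitting otherwise; both formats are assumptions to verify against the file. A song with several genres counts once toward each of them.

```python
import ast
from collections import Counter
from typing import Iterable, List


def parse_genres(raw: str) -> List[str]:
    """Split one spotify_genre cell into genre names (cell format is an assumption)."""
    raw = (raw or "").strip()
    if not raw:
        return []
    try:
        value = ast.literal_eval(raw)  # e.g. "['pop', 'dance pop']" -> ['pop', 'dance pop']
    except (ValueError, SyntaxError, TypeError):
        value = raw.split(",")  # fallback for plain text such as "rock, indie rock"
    if not isinstance(value, (list, tuple)):
        value = [value]
    return [str(g).strip() for g in value if str(g).strip()]


def genre_play_counts(cells: Iterable[str]) -> Counter:
    """Count plays per genre; a multi-genre song counts once toward each of its genres."""
    counts: Counter = Counter()
    for cell in cells:
        counts.update(set(parse_genres(cell)))
    return counts


plays = ["['pop', 'dance pop']", "['pop']", "", "rock, indie rock"]
print(genre_play_counts(plays).most_common(1))  # [('pop', 2)]
```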

Research Ideas

Music Recommendation System: This dataset can be used to develop a music recommendation system by analyzing the streaming activity and audio features of different songs. By understanding the preferences and listening habits of users, personalized music recommendations can be generated for individuals or households.

Genre Analysis and Trends: The dataset provides information about the performer, genre, and popularity of songs. This data can be utilized to analyze trends in music genres over the years, identify popular artists in different genres, and understand the ...
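One simple way to prototype the recommendation idea is content-based filtering: average the audio features of songs the listener already likes, then rank unplayed songs by cosine similarity to that profile. This is a generic technique, not the dataset author's method; the feature subset and the sample values are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

FEATURES = ("danceability", "energy", "valence")  # illustrative subset of the audio features


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(songs: Dict[str, Dict[str, float]], liked: List[str], k: int = 3) -> List[Tuple[str, float]]:
    """Rank songs not in `liked` by similarity to the average feature profile of `liked`."""
    profile = [sum(songs[s][f] for s in liked) / len(liked) for f in FEATURES]
    scored = [
        (title, cosine(profile, [feats[f] for f in FEATURES]))
        for title, feats in songs.items()
        if title not in liked
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


songs = {
    "Song A": {"danceability": 0.90, "energy": 0.80, "valence": 0.90},
    "Song B": {"danceability": 0.85, "energy": 0.75, "valence": 0.80},
    "Song C": {"danceability": 0.10, "energy": 0.20, "valence": 0.90},
}
print(recommend(songs, liked=["Song A"], k=1))
```

All three features used here lie between 0 and 1; adding tempo or loudness would call for rescaling first, otherwise their larger magnitudes would dominate the similarity.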
Spotify dataset - A.R.Rahman
kaggle.com
zip
Updated Oct 21, 2024
Cite
Tarun Sirpi (2024). Spotify dataset - A.R.Rahman [Dataset]. https://www.kaggle.com/datasets/tarunsirpi/spotify-dataset-a-r-rahman
Explore at:
zip(163610 bytes)Available download formats
Dataset updated
Oct 21, 2024
Authors
Tarun Sirpi
Description
This dataset was built by retrieving data from the Spotify API and analysing it. The inspiration was to analyse the tracks of my favourite artist, A.R.Rahman, using the features Spotify provides for each track.

The dataset was created in Python, using the spotipy module to retrieve data from the Spotify API and the pandas module to process it.

For more information and the source code, please refer to the following GitHub repository:

https://github.com/tarunsirpi/spotify-data-analysis
Spotify Customer Churn dataset
kaggle.com
zip
Updated Jul 2, 2025
Cite
Abdul Wadood (2025). Spotify Customer Churn dataset [Dataset]. https://www.kaggle.com/abdulwadood11220/spotify-customer-churn-dataset
Explore at:
zip(11039 bytes)Available download formats
Dataset updated
Jul 2, 2025
Authors
Abdul Wadood
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
🎧 Can you predict who’s about to hit “unsubscribe”?

This dynamic Spotify churn dataset gives you the chance to find out. Packed with real-world user behavior — from listening time, skips, and ads seen to premium usage, plan types, and login activity — this dataset is your backstage pass into how users interact with a top music streaming platform.

With key demographic info (age, gender, country) and a clear churn indicator, it’s perfect for building powerful machine learning models, uncovering retention trends, or exploring the secret sauce behind loyal listeners.
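A natural first look at churn is the churn rate per segment, for example per plan type. The description does not list exact column names, so subscription_type and is_churned in this sketch are hypothetical placeholders, and the churn flag is assumed to be stored as 1/0 or true/false.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO


def churn_rate_by(rows_file: TextIO, group_col: str, churn_col: str) -> Dict[str, float]:
    """Fraction of churned users within each value of group_col."""
    totals: Dict[str, int] = defaultdict(int)
    churned: Dict[str, int] = defaultdict(int)
    for row in csv.DictReader(rows_file):
        group = row[group_col]
        totals[group] += 1
        if row[churn_col].strip().lower() in ("1", "true", "yes"):
            churned[group] += 1
    return {group: churned[group] / totals[group] for group in totals}


# Hypothetical column names and made-up rows.
sample = io.StringIO(
    "subscription_type,is_churned\n"
    "Free,1\nFree,0\nPremium,0\nPremium,0\nFree,1\n"
)
rates = churn_rate_by(sample, "subscription_type", "is_churned")
print(rates)
```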

Whether you're aiming to reduce churn, boost engagement, or showcase your data science skills, this dataset hits all the right notes. 🎶 Ready to turn data into decisions? Hit play.
Spotify Tracks Genre
kaggle.com
zip
Updated Nov 30, 2023
Cite
The Devastator (2023). Spotify Tracks Genre [Dataset]. https://www.kaggle.com/thedevastator/spotify-tracks-genre-dataset
Explore at:
zip(8571539 bytes)Available download formats
Dataset updated
Nov 30, 2023
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Spotify Tracks Genre

Audio features of tracks across diverse genres

By maharshipandya (From Huggingface) [source]

About this dataset

This dataset provides comprehensive information about Spotify tracks encompassing a diverse collection of 125 genres. It has been compiled and cleaned using Spotify's Web API and Python. Presented in CSV format, this dataset is easily accessible and amenable to analysis. The dataset comprises multiple columns, each representing distinctive audio features associated with individual tracks.

The columns include: artists (the name of the artist or artists who performed the track), album_name (the title of the album to which the track belongs), track_name (the specific name of each track), popularity (a numerical score indicating the popularity of a song on Spotify ranging from 0 to 100), duration_ms (the duration of each track measured in milliseconds), explicit (a boolean value denoting whether a song contains explicit content or not).

Furthermore, there are various audio features that provide deep insights into the musical characteristics of each track. These features include danceability, energy, key, loudness, mode, speechiness (indicating whether spoken words are present in a song), acousticness (measuring how much a song leans towards acoustic sounds rather than electric ones), instrumentalness (indicating how likely it is for a song to be instrumental rather than vocal-oriented).

Additional audio attributes encompass liveness, reflecting the presence or absence of live audience elements within tracks; valence quantifying musical positiveness conveyed by a song; tempo denoting beats per minute; and time_signature revealing details about bar structures within tracks.

The dataset enables users to discern patterns across multiple genres while also facilitating genre prediction based on perceptible audio nuances derived through machine learning models.
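As a baseline for predicting genre from audio features, a nearest-centroid classifier averages each genre's feature vectors and assigns a new track to the closest average. This is a deliberately simple stand-in for the machine learning models the description mentions; the two features, the genre names, and the values in the example are made up.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def fit_centroids(samples: List[Tuple[Sequence[float], str]]) -> Dict[str, List[float]]:
    """Average feature vector (centroid) for each genre label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for features, genre in samples:
        total = sums.setdefault(genre, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[genre] += 1
    return {genre: [v / counts[genre] for v in total] for genre, total in sums.items()}


def predict(centroids: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Return the genre whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda genre: math.dist(centroids[genre], features))


# Made-up (danceability, energy) pairs with made-up genre labels.
train = [
    ([0.80, 0.90], "edm"),
    ([0.70, 0.95], "edm"),
    ([0.30, 0.10], "ambient"),
    ([0.20, 0.20], "ambient"),
]
centroids = fit_centroids(train)
print(predict(centroids, [0.75, 0.85]))  # edm
```

In practice the features should be put on comparable scales before computing distances, and the name of the dataset's genre column should be checked in the CSV.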

Aspiring audiophiles, music enthusiasts, and data scientists can use this repository for research, exploring genre dynamics and the nuanced relationships between the musical attributes of these Spotify tracks.

How to use the dataset

Introduction:

Download and Load the Dataset: Start by downloading the dataset from Kaggle in CSV format. Once downloaded, load the dataset into your preferred programming environment or tool such as Python, R, or Excel.
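When loading the file in Python without pandas, a small loader can convert the numeric columns and the True/False explicit flag as it reads. This sketch uses only the standard csv module. The file name is left as a parameter, and the set of numeric columns is taken from the column descriptions in this section, so adjust it if the CSV differs.

```python
import csv
import os
import tempfile
from typing import Dict, List, Union

# Columns described as numeric in this section; adjust if the CSV differs.
NUMERIC_COLUMNS = {
    "popularity", "duration_ms", "danceability", "energy", "key", "loudness", "mode",
    "speechiness", "acousticness", "instrumentalness", "liveness", "valence", "tempo",
    "time_signature",
}

Value = Union[str, float, bool]


def load_tracks(path: str) -> List[Dict[str, Value]]:
    """Read the tracks CSV, converting numeric columns to float and 'explicit' to bool."""
    tracks = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            parsed: Dict[str, Value] = {}
            for column, raw in row.items():
                if column in NUMERIC_COLUMNS and raw != "":
                    parsed[column] = float(raw)
                elif column == "explicit":
                    parsed[column] = raw.strip().lower() == "true"
                else:
                    parsed[column] = raw  # text columns (and empty numeric cells) stay as strings
            tracks.append(parsed)
    return tracks


# Demo with a throwaway file of made-up rows.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="", encoding="utf-8") as tmp:
    tmp.write("track_name,popularity,explicit,danceability\nSong X,73,False,0.68\n")
demo = load_tracks(tmp.name)
os.remove(tmp.name)
print(demo[0])  # {'track_name': 'Song X', 'popularity': 73.0, 'explicit': False, 'danceability': 0.68}
```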

Familiarize Yourself with the Columns: Take some time to understand the meaning of each column in the dataset:

artists: The name of the artist(s) who performed the track.

album_name: The name of the album that contains a given track.

track_name: The name of a specific track.

popularity: A score indicating how popular a track is on Spotify (ranging from 0 to 100).

duration_ms: The duration of a track in milliseconds.

explicit: Indicates whether a track contains explicit content (True or False).
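With popularity on a 0 to 100 scale and explicit as a True/False flag, one quick comparison is the average popularity of explicit versus non-explicit tracks. This sketch accepts either real booleans or the raw "True"/"False" strings a CSV reader produces; the sample rows are made up.

```python
from statistics import mean
from typing import Dict, List, Union


def _is_explicit(value: Union[bool, str]) -> bool:
    # Accept real booleans or the "True"/"False" strings a raw CSV read produces.
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


def mean_popularity_by_explicit(tracks: List[Dict]) -> Dict[bool, float]:
    """Average popularity (0-100 scale) for explicit vs. non-explicit tracks."""
    groups: Dict[bool, List[float]] = {True: [], False: []}
    for track in tracks:
        groups[_is_explicit(track["explicit"])].append(float(track["popularity"]))
    # Omit a group entirely if it has no tracks, rather than dividing by zero.
    return {flag: mean(values) for flag, values in groups.items() if values}


sample = [
    {"explicit": "True", "popularity": "80"},
    {"explicit": "False", "popularity": "40"},
    {"explicit": False, "popularity": 60},
]
result = mean_popularity_by_explicit(sample)
print(result)  # {True: 80.0, False: 50.0}
```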

Explore Audio Features: This dataset includes various audio features associated with each track. Here are some notable ones:

A. Danceability: Danceability measures how suitable a track is for dancing, ranging from 0 to 1, based on a combination of elements such as tempo, rhythm stability, and beat strength. Tracks with high danceability scores have a steady, regular rhythm that makes them well suited to dancing.

B. Energy: Energy represents intensity and activity within a song on a scale from 0 to 1. Tracks with high energy tend to be more fast-paced and intense.

C. Loudness: Loudness indicates how loud or quiet an entire song is, in decibels (dB). Values are typically negative (roughly -60 to 0 dB), and values closer to 0 indicate louder songs.

D. Key: The key is encoded as an integer from 0 to 11 using standard pitch class notation (0 = C, 1 = C♯/D♭, 2 = D, and so on). Knowing the key can provide insights into the mood and tone of a song.

E. Valence: Valence measures the musical positiveness conveyed by a track, ranging from 0 to 1. High valence values indicate more positive or happy tracks, while lower values suggest more negative or sad ones.

F. Tempo: Tempo is the speed or pace of a song in beats per minute (BPM). It gives an idea of how fast or slow a track is.
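Because key is stored as a pitch-class integer, translating it (together with mode, where 1 = major and 0 = minor) into a readable label makes summaries easier to read. The mapping follows standard pitch class notation; the -1 case for tracks where no key was detected comes from Spotify's Web API documentation and may not occur in this file.

```python
from typing import Union

PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]

Number = Union[int, float, str]


def key_name(key: Number, mode: Number) -> str:
    """Readable label for a track's key (pitch class 0-11) and mode (1 = major, 0 = minor)."""
    # CSV readers may hand these over as strings or floats such as "9.0".
    key, mode = int(float(key)), int(float(mode))
    if key == -1:
        return "no key detected"
    if not 0 <= key <= 11:
        raise ValueError(f"key must be -1 or in 0-11, got {key}")
    return f"{PITCH_CLASSES[key]} {'major' if mode == 1 else 'minor'}"


print(key_name(0, 1))      # C major
print(key_name(9.0, 0.0))  # A minor
```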

Data Analysis and Visualization: Utilize various data analysis techniques and visualization tools to gain insights into the

Research Ideas

Music Recommendation System: With multiple audio features such as danceability, energy, and valence, this dataset can be used to bu...
Spotify 10000 Songs Dataset
kaggle.com
zip
Updated Apr 7, 2024
Cite
Jeremy CTE (2024). Spotify 10000 Songs Dataset [Dataset]. https://www.kaggle.com/datasets/jeremycte/spotify-10000-songs-dataset/suggestions
Explore at:
zip(2714062 bytes)Available download formats
Dataset updated
Apr 7, 2024
Authors
Jeremy CTE
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context and Inspiration

The dataset was created to understand the evolving landscape of music genres through data analysis. By examining the characteristics of songs across various genres, the author aimed to investigate the boundaries that define these genres and to explore whether machine learning models can classify songs in a way that resembles human perception. This exploration sits at the intersection of musicology and data science, aiming to reveal insights about the music we enjoy and how it is structured and perceived on a technical level. Through sentiment analysis, the project also examines the emotional depth of music, connecting the sonic features of songs to the emotions conveyed in their lyrics, an ambitious endeavor that bridges the quantitative and qualitative aspects of music.

Spotify Dataset Columns Explanation

In data.csv and spotify_tracks_top50.csv:
- track_id: Unique identifier for each track on Spotify.
- playlist_id: Unique identifier for the playlist from which the track was collected.
- date_added: Date the track was added to the playlist.
- track_name: Name of the track.
- first_artist: Name of the primary artist of the track.
- artist_id: Unique identifier for the primary artist on Spotify.
- track_preview: URL for the 30-second preview mp3 of the track.
- album_name: Name of the album the track is from.
- danceability: A measure of how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity.
- energy: A perceptual measure of intensity and activity.
- key: The key the track is in. Integers map to pitches using standard Pitch Class notation.
- loudness: The overall loudness of a track in decibels (dB).
- mode: Modality of the track (major or minor).
- speechiness: Measures the presence of spoken words in a track.
- acousticness: A confidence measure of whether the track is acoustic.
- instrumentalness: Predicts whether a track contains no vocals.
- liveness: Detects the presence of an audience in the recording.
- valence: Measures the musical positiveness conveyed by a track.
- tempo: The overall estimated tempo of a track in beats per minute (BPM).
- type, id, uri, track_href, analysis_url: Various identifiers that provide detailed information about the track or facilitate accessing more data about the track through the Spotify Web API.
- duration_ms: The duration of the track in milliseconds.
- time_signature: An estimated overall time signature of a track.
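The track files identify artists by artist_id, and the artist CSV described later in this section uses id for the same Spotify artist identifier, so the two can be joined to bring artist genres and follower counts alongside each track. A minimal sketch, assuming those column names match the files. The track table has its own id column too, which is why the artist fields are copied under new, prefixed names instead of being merged wholesale. The sample rows are made up.

```python
import csv
import io
from typing import Dict, List, TextIO


def attach_artist_info(tracks: TextIO, artists: TextIO) -> List[Dict[str, str]]:
    """Copy each track's artist genres and followers from the artist CSV (artist_id -> id)."""
    by_id = {row["id"]: row for row in csv.DictReader(artists)}
    enriched = []
    for track in csv.DictReader(tracks):
        artist = by_id.get(track["artist_id"], {})
        # Prefixed names avoid clobbering the track file's own columns (it has an "id" too).
        track["artist_genres"] = artist.get("genres", "")
        track["artist_followers"] = artist.get("followers", "")
        enriched.append(track)
    return enriched


tracks_csv = io.StringIO("track_id,track_name,artist_id\nt1,Song X,a1\nt2,Song Y,a9\n")
artists_csv = io.StringIO("id,name,genres,followers\na1,Artist A,\"['pop']\",1200\n")
rows = attach_artist_info(tracks_csv, artists_csv)
print(rows[0]["artist_genres"], rows[0]["artist_followers"])  # ['pop'] 1200
print(repr(rows[1]["artist_genres"]))                         # ''
```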

Librosa Dataset Columns Explanation

MFCC1 - MFCC20: The first 20 Mel-frequency cepstral coefficients, which are coefficients that collectively make up an MFC. They are derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum-of-a-spectrum").

Spectral Contrast Freq Band1 - Band7: Measures of the difference in amplitude between peaks and valleys in the spectrum. These bands capture the texture of the sound by examining the spectral shape.
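To make the spectral-contrast idea concrete, this function computes the contrast for one frequency band of one frame: the mean of the strongest bins (peak) minus the mean of the weakest bins (valley), in decibels. It is a simplified sketch modelled loosely on librosa's spectral_contrast, which additionally splits the spectrum into octave-based bands and processes every frame; it is not a drop-in replacement.

```python
import math
from typing import Sequence


def band_contrast(power: Sequence[float], quantile: float = 0.02, amin: float = 1e-10) -> float:
    """Peak-minus-valley contrast, in dB, for the power values of one band in one frame."""

    def to_db(x: float) -> float:
        # Clamp tiny values so log10 never sees zero.
        return 10 * math.log10(max(amin, x))

    values = sorted(power)
    n_edge = max(1, round(quantile * len(values)))  # number of bins averaged at each end
    valley = sum(values[:n_edge]) / n_edge
    peak = sum(values[-n_edge:]) / n_edge
    return to_db(peak) - to_db(valley)


band = [1.0] * 49 + [100.0]  # made-up power spectrum: flat floor plus one strong bin
print(band_contrast(band))  # 20.0
```

A flat band gives a contrast of 0 dB, while a band with sharp peaks above a quiet floor gives a large value, which is why these columns capture the texture of the sound.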

Artist Data CSV Explanation

id: The Spotify ID for the artist.

name: The name of the artist.

genres: A list of genres the artist is known for.

popularity: A metric that ranks the artist's popularity with values from 0 to 100.

followers: The total number of followers the artist has on Spotify.

url: The Spotify URL for the artist's page.

Sources Link

Spotify Developer Link - https://developer.spotify.com/documentation/web-api/reference/get-an-artist

10000 Spotify Playlist - https://open.spotify.com/playlist/1YL4XoegERoragv0RK2RC9?si=569f0a0b1149489b

License

Refer to the Spotify License and Agreement Terms https://developer.spotify.com/terms
Data from: Most Streamed Spotify Songs 2023
kaggle.com
zip
Updated Aug 26, 2023
Cite
Nidula Elgiriyewithana ⚡ (2023). Most Streamed Spotify Songs 2023 [Dataset]. https://www.kaggle.com/datasets/nelgiriyewithana/top-spotify-songs-2023/discussion
Explore at:
zip(48187 bytes)Available download formats
Dataset updated
Aug 26, 2023
Authors
Nidula Elgiriyewithana ⚡
Description

This dataset contains a comprehensive list of the most famous songs of 2023 as listed on Spotify. The dataset offers a wealth of features beyond what is typically available in similar datasets. It provides insights into each song's attributes, popularity, and presence on various music platforms. The dataset includes information such as track name, artist(s) name, release date, Spotify playlists and charts, streaming statistics, Apple Music presence, Deezer presence, Shazam charts, and various audio features.

Here is the link for the 2024 data: Most Streamed Spotify Songs 2024 🟢 https://www.kaggle.com/datasets/nelgiriyewithana/most-streamed-spotify-songs-2024

Key Features:

track_name: Name of the song

artist(s)_name: Name of the artist(s) of the song

artist_count: Number of artists contributing to the song

released_year: Year when the song was released

released_month: Month when the song was released

released_day: Day of the month when the song was released

in_spotify_playlists: Number of Spotify playlists the song is included in

in_spotify_charts: Presence and rank of the song on Spotify charts

streams: Total number of streams on Spotify

in_apple_playlists: Number of Apple Music playlists the song is included in

in_apple_charts: Presence and rank of the song on Apple Music charts

in_deezer_playlists: Number of Deezer playlists the song is included in

in_deezer_charts: Presence and rank of the song on Deezer charts

in_shazam_charts: Presence and rank of the song on Shazam charts

bpm: Beats per minute, a measure of song tempo

key: Key of the song

mode: Mode of the song (major or minor)

danceability_%: Percentage indicating how suitable the song is for dancing

valence_%: Positivity of the song's musical content

energy_%: Perceived energy level of the song

acousticness_%: Amount of acoustic sound in the song

instrumentalness_%: Amount of instrumental content in the song

liveness_%: Presence of live performance elements

speechiness_%: Amount of spoken words in the song
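Two small preprocessing steps help when combining this file with the other datasets on this page: building a single release date from released_year, released_month and released_day, and rescaling the percentage columns to the 0 to 1 range that the Spotify-API-based datasets use. A sketch, assuming the values arrive as strings from a CSV reader; invalid dates (such as February 30) return None.

```python
from datetime import date
from typing import Dict, Optional

PERCENT_COLS = [
    "danceability_%", "valence_%", "energy_%", "acousticness_%",
    "instrumentalness_%", "liveness_%", "speechiness_%",
]


def release_date(row: Dict[str, str]) -> Optional[date]:
    """Combine released_year/month/day into one date, or None if any part is missing or invalid."""
    try:
        return date(int(row["released_year"]), int(row["released_month"]), int(row["released_day"]))
    except (KeyError, ValueError):
        return None


def to_unit_scale(row: Dict[str, str]) -> Dict[str, float]:
    """Rescale the 0-100 percentage features to 0-1 and drop the '_%' suffix from the names."""
    return {col[:-2]: float(row[col]) / 100 for col in PERCENT_COLS if row.get(col, "") != ""}


song = {"released_year": "2023", "released_month": "3", "released_day": "17",
        "danceability_%": "80", "energy_%": "50"}
print(release_date(song))   # 2023-03-17
print(to_unit_scale(song))  # {'danceability': 0.8, 'energy': 0.5}
```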

Potential Use Cases:

Music analysis: Explore patterns in audio features to understand trends and preferences in popular songs.

Platform comparison: Compare how popular songs are across different music platforms.

Artist impact: Analyze how artist involvement and attributes relate to a song's success.

Temporal trends: Identify any shifts in music attributes and preferences over time.

Cross-platform presence: Investigate how songs perform across different streaming services.
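For the platform comparison, one simple measure is the Pearson correlation between playlist counts on two platforms, for example in_spotify_playlists and in_apple_playlists. The sketch strips thousands separators defensively, since some count columns may be stored as text such as "1,234" (an assumption to check), and skips rows that cannot be parsed.

```python
import math
from typing import Dict, List, Sequence


def to_number(raw: str) -> float:
    """Parse a count that may contain thousands separators, e.g. '1,234'."""
    return float(raw.replace(",", "").strip())


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread = math.sqrt(sum((x - mean_x) ** 2 for x in xs)) * math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread == 0:
        raise ValueError("correlation is undefined when one column is constant")
    return cov / spread


def playlist_correlation(rows: List[Dict[str, str]], col_a: str, col_b: str) -> float:
    """Correlate two playlist-count columns, skipping rows that cannot be parsed."""
    pairs = []
    for row in rows:
        try:
            pairs.append((to_number(row[col_a]), to_number(row[col_b])))
        except (KeyError, ValueError):
            continue
    if len(pairs) < 2:
        raise ValueError("need at least two parseable rows")
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


songs = [
    {"in_spotify_playlists": "1,000", "in_apple_playlists": "10"},
    {"in_spotify_playlists": "2,000", "in_apple_playlists": "20"},
    {"in_spotify_playlists": "3,000", "in_apple_playlists": "30"},
    {"in_spotify_playlists": "n/a", "in_apple_playlists": "5"},
]
print(round(playlist_correlation(songs, "in_spotify_playlists", "in_apple_playlists"), 6))  # 1.0
```

Playlist counts are heavily skewed toward a few hits, so a rank-based correlation (Spearman) may be a more robust choice on the full file.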

If you find this dataset useful, your support through an upvote would be greatly appreciated ❤️🙂
Thank you

Cite

Malinga Rajapaksha (2024). My Spotify Data - Cleaned [Dataset]. https://www.kaggle.com/datasets/malingarajapaksha/my-spotify-data-cleaned

My Spotify Data - Cleaned

Cleaned version of Spotify streaming history

Explore at:

zip(2952139 bytes)Available download formats

Dataset updated

Jan 26, 2024

Authors

Malinga Rajapaksha

Description

The dataset contains records of the user's Spotify streaming history, with each row representing a specific instance of a played track. The data includes various attributes providing insights into the user's music listening habits.

Columns:

ts (Timestamp):
- The timestamp when the track was played.
platform:
- The platform or device used for streaming (e.g., Windows 10).
ms_played:
- The duration in milliseconds of how long the track was played.
conn_country:
- The country code indicating the user's location during streaming (e.g., LK for Sri Lanka).
master_metadata_track_name:
- The name of the track played.
master_metadata_album_artist_name:
- The artist of the album to which the track belongs.
master_metadata_album_album_name:
- The name of the album containing the track.
spotify_track_uri:
- The unique Spotify URI for the track.
reason_start:
- The reason for starting the track (e.g., play button clicked).
reason_end:
- The reason for ending the track (e.g., track done).
shuffle:
- Indicates whether shuffle mode was enabled (True/False).
offline:
- Indicates whether the track was played offline (True/False).
offline_timestamp:
- Timestamp indicating when the track was played offline (if applicable).
incognito_mode:
- Indicates whether incognito mode was enabled (True/False).
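A few of these columns combine into a quick listening summary: total hours from ms_played, the share of plays made with shuffle on, and the share of very short plays. The 30-second cut-off for a short play is an assumption chosen for illustration, not something the dataset defines, and shuffle is assumed to be stored as "True"/"False" text.

```python
from typing import Dict, Iterable

SHORT_PLAY_MS = 30_000  # assumed threshold for a "short" play (30 seconds); not defined by the dataset


def listening_summary(rows: Iterable[Dict[str, str]]) -> Dict[str, float]:
    """Total plays, hours listened, and the shares of short and shuffled plays."""
    plays = short = shuffled = 0
    total_ms = 0
    for row in rows:
        ms = int(float(row["ms_played"]))
        plays += 1
        total_ms += ms
        if ms < SHORT_PLAY_MS:
            short += 1
        if str(row["shuffle"]).strip().lower() == "true":
            shuffled += 1
    return {
        "plays": plays,
        "hours_played": total_ms / 3_600_000,  # 3,600,000 ms in an hour
        "short_play_share": short / plays if plays else 0.0,
        "shuffle_share": shuffled / plays if plays else 0.0,
    }


history = [
    {"ms_played": "3600000", "shuffle": "True"},
    {"ms_played": "5000", "shuffle": "False"},
]
summary = listening_summary(history)
print(summary)
```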

Purpose:

This dataset is suitable for performing detailed Exploratory Data Analysis (EDA) to uncover patterns, trends, and insights into the user's music-listening behaviour. Potential analyses could include the distribution of listening durations, favourite artists and tracks, exploration of geographic listening patterns, and examination of usage patterns across different platforms.
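For the favourite-artists analysis, summing ms_played per master_metadata_album_artist_name and ranking the totals gives listening time per artist. A minimal sketch using the column names listed above; rows with an empty artist name are skipped, and the sample rows are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def top_artists(rows: Iterable[Dict[str, str]], n: int = 5) -> List[Tuple[str, float]]:
    """Artists ranked by total minutes played (ms_played summed per artist)."""
    minutes: Dict[str, float] = defaultdict(float)
    for row in rows:
        artist = (row.get("master_metadata_album_artist_name") or "").strip()
        if not artist:
            continue  # no artist metadata on this row
        minutes[artist] += float(row["ms_played"]) / 60_000  # 60,000 ms per minute
    return sorted(minutes.items(), key=lambda item: item[1], reverse=True)[:n]


plays = [
    {"master_metadata_album_artist_name": "Artist A", "ms_played": "180000"},
    {"master_metadata_album_artist_name": "Artist B", "ms_played": "60000"},
    {"master_metadata_album_artist_name": "Artist A", "ms_played": "120000"},
    {"master_metadata_album_artist_name": "", "ms_played": "1000"},
]
print(top_artists(plays))  # [('Artist A', 5.0), ('Artist B', 1.0)]
```

Ranking by total time rather than play count keeps many short skips from inflating an artist's position; counting rows instead is a one-line change if play frequency is the quantity of interest.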

Visualization libraries such as Matplotlib and Seaborn could be used to create visual representations of the findings for a more in-depth analysis. The dataset offers opportunities to apply analytical techniques to real-world streaming data.
