36 datasets found
  1. My Spotify Data - Cleaned

    • kaggle.com
    zip
    Updated Jan 26, 2024
    Cite
    Malinga Rajapaksha (2024). My Spotify Data - Cleaned [Dataset]. https://www.kaggle.com/datasets/malingarajapaksha/my-spotify-data-cleaned
    Available download formats: zip (2952139 bytes)
    Dataset updated
    Jan 26, 2024
    Authors
    Malinga Rajapaksha
    Description

    The dataset contains records of the user's Spotify streaming history, with each row representing a specific instance of a played track. The data includes various attributes providing insights into the user's music listening habits.

    Columns:

    1. ts (Timestamp):

      • The timestamp when the track was played.
    2. platform:

      • The platform or device used for streaming (e.g., Windows 10).
    3. ms_played:

      • The duration in milliseconds of how long the track was played.
    4. conn_country:

      • The country code indicating the user's location during streaming (e.g., LK for Sri Lanka).
    5. master_metadata_track_name:

      • The name of the track played.
    6. master_metadata_album_artist_name:

      • The artist of the album to which the track belongs.
    7. master_metadata_album_album_name:

      • The name of the album containing the track.
    8. spotify_track_uri:

      • The unique Spotify URI for the track.
    9. reason_start:

      • The reason for starting the track (e.g., play button clicked).
    10. reason_end:

      • The reason for ending the track (e.g., track done).
    11. shuffle:

      • Indicates whether shuffle mode was enabled (True/False).
    12. offline:

      • Indicates whether the track was played offline (True/False).
    13. offline_timestamp:

      • Timestamp indicating when the track was played offline (if applicable).
    14. incognito_mode:

      • Indicates whether incognito mode was enabled (True/False).

    Purpose:

    This dataset is suitable for performing detailed Exploratory Data Analysis (EDA) to uncover patterns, trends, and insights into the user's music-listening behaviour. Potential analyses could include the distribution of listening durations, favourite artists and tracks, exploration of geographic listening patterns, and examination of usage patterns across different platforms.

    Visualization tools such as Matplotlib and Seaborn can be used to chart the findings. The dataset suits anyone interested in data science who wants to apply analytical techniques to real-world streaming data.
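
    As a rough illustration of the "favourite artists" analysis mentioned above, the sketch below totals listening time per artist using only the Python standard library. The column names come from the column list above; the three sample rows and the helper name minutes_by_artist are invented for the example and are not part of the dataset.

```python
import csv
import io
from collections import defaultdict

# Invented rows for illustration only; they reuse column names from the list above.
SAMPLE_CSV = """ts,platform,ms_played,master_metadata_album_artist_name
2023-05-01T08:00:00Z,Windows 10,180000,Artist A
2023-05-01T08:05:00Z,Windows 10,60000,Artist B
2023-05-02T21:15:00Z,Windows 10,240000,Artist A
"""


def minutes_by_artist(csv_text: str) -> list[tuple[str, float]]:
    """Return (artist, total minutes played) pairs, most-played first."""
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        artist = row["master_metadata_album_artist_name"]
        # ms_played is in milliseconds; 60,000 ms = 1 minute.
        totals[artist] += int(row["ms_played"]) / 60_000
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = minutes_by_artist(SAMPLE_CSV)
```

    With the real file, the same function could be fed the CSV text read from the downloaded archive; swapping the grouping column for platform or conn_country would give the platform and geographic breakdowns.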

  2. Spotify Dataset

    • brightdata.com
    .json, .csv, .xlsx
    Updated Apr 10, 2024
    Cite
    Bright Data (2024). Spotify Dataset [Dataset]. https://brightdata.com/products/datasets/spotify
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Apr 10, 2024
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Gain valuable insights into music trends, artist popularity, and streaming analytics with our comprehensive Spotify Dataset. Designed for music analysts, marketers, and businesses, this dataset provides structured and reliable data from Spotify to enhance market research, content strategy, and audience engagement.

    Dataset Features

    • Track Information: Access detailed data on songs, including track name, artist, album, genre, and release date.
    • Streaming Popularity: Extract track popularity scores, listener engagement metrics, and ranking trends.
    • Artist & Album Insights: Analyze artist performance, album releases, and genre trends over time.
    • Related Searches & Recommendations: Track related search terms and suggested content for deeper audience insights.
    • Historical & Real-Time Data: Retrieve historical streaming data or access continuously updated records for real-time trend analysis.

    Customizable Subsets for Specific Needs

    Our Spotify Dataset is fully customizable, allowing you to filter data based on track popularity, artist, genre, release date, or listener engagement. Whether you need broad coverage for industry analysis or focused data for content optimization, we tailor the dataset to your needs.

    Popular Use Cases

    • Market Analysis & Trend Forecasting: Identify emerging music trends, genre popularity, and listener preferences.
    • Artist & Label Performance Tracking: Monitor artist rankings, album success, and audience engagement.
    • Competitive Intelligence: Analyze competitor music strategies, playlist placements, and streaming performance.
    • AI & Machine Learning Applications: Use structured music data to train AI models for recommendation engines, playlist curation, and predictive analytics.
    • Advertising & Sponsorship Insights: Identify high-performing tracks and artists for targeted advertising and sponsorship opportunities.

    Whether you're optimizing music marketing, analyzing streaming trends, or enhancing content strategies, our Spotify Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.

  3. My Spotify Data

    • kaggle.com
    zip
    Updated Dec 15, 2021
    Cite
    Nilay Gaitonde (2021). My Spotify Data [Dataset]. https://www.kaggle.com/nilaygaitonde/my-spotify-data
    Available download formats: zip (92508 bytes)
    Dataset updated
    Dec 15, 2021
    Authors
    Nilay Gaitonde
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Nilay Gaitonde

    Released under CC0: Public Domain

    Contents

  4. MY SPOTIFY WRAPPED EDA

    • kaggle.com
    zip
    Updated Jan 1, 2022
    Cite
    Imraan Virani (2022). MY SPOTIFY WRAPPED EDA [Dataset]. https://www.kaggle.com/imraanvirani/my-spotify-wrapped-eda
    Available download formats: zip (60516 bytes)
    Dataset updated
    Jan 1, 2022
    Authors
    Imraan Virani
    Description

    Dataset

    This dataset was created by Imraan Virani

    Contents

  5. Spotify Songs for ML & Analysis (8700+ tracks)

    • kaggle.com
    zip
    Updated Nov 6, 2025
    Cite
    AlyAhmedTS13 (2025). Spotify Songs for ML & Analysis (8700+ tracks) [Dataset]. https://www.kaggle.com/datasets/alyahmedts13/spotify-songs-for-ml-and-analysis-over-8700-tracks
    Available download formats: zip (1289021 bytes)
    Dataset updated
    Nov 6, 2025
    Authors
    AlyAhmedTS13
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    🕹️ About Dataset

    🎯 Context

    What makes a song popular on Spotify?
    Do artist popularity and follower count influence track success more than audio features?
    How do album types and release dates shape listening trends?

    These were the questions that inspired me to build this dataset.

    Using Spotify’s API, I collected data on over 8,700 tracks, capturing detailed metadata about songs, artists, and albums. This dataset is ideal for exploring the intersection of music analytics, artist influence, and streaming behavior.

    📦 Content

    This dataset contains one CSV file with over 8,700 rows. Each row represents a unique track and includes metadata across three dimensions: track, artist, and album.

    | Column Name | Description |
    |:--------------------|:-------------------------------------------------|
    | track_id | Unique identifier for the track |
    | track_number | Track’s position on the album |
    | track_popularity | Spotify popularity score (0–100) |
    | track_duration_ms | Duration of the track in milliseconds |
    | explicit | Whether the track contains explicit content |
    | artist_name | Name of the performing artist |
    | artist_popularity | Spotify popularity score for the artist |
    | artist_followers | Number of Spotify followers for the artist |
    | album_id | Unique identifier for the album |
    | album_name | Name of the album |
    | album_release_date | Original release date of the album |
    | artist_genres | Genre tags associated with the artist |
    | album_total_tracks | Total number of tracks on the album |
    | album_type | Type of album (e.g., album, single, compilation) |

    🙏 Acknowledgements

    All data was collected using the Spotify Web API.
    This dataset is intended for educational and research purposes only.

    💡 Inspiration

    You can use this dataset to:

    • Analyze which artist traits correlate with track popularity
    • Explore genre trends across different album types and release years
    • Build machine learning models to predict song success
    • Visualize music trends using Power BI or Python
    • Compare artists, albums, or genres based on metadata

    Cleaned Version

    A cleaned version of the dataset (spotify_data_clean.csv) is now available. It includes:

    Cleaning Process (SQL)

    The cleaned dataset (spotify_data_clean.csv) was generated through a multi-step SQL pipeline designed to ensure consistency, completeness, and usability for analysis. Below is a summary of the transformations applied:

    🔍 Null Handling & Imputation

    • Identified and removed rows with missing track_name.
    • Imputed missing artist_name, artist_popularity, artist_followers, and artist_genres using album-level joins (e.g., for albums like 1989).
    • Replaced remaining nulls with default values:
      • 'N/A' for strings
      • 0 for numeric fields
      • '[]' for genre arrays (temporary placeholder)
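
    The default-value step can be sketched in Python as follows. This is not the author's SQL, only an analogue of the rule described in the list above; which columns count as strings or numbers is passed in by the caller, and the helper name fill_defaults is hypothetical.

```python
def fill_defaults(row, string_fields, numeric_fields, genre_field="artist_genres"):
    """Replace None values with the defaults named in the list above."""
    filled = dict(row)
    for name, value in row.items():
        if value is not None:
            continue
        if name == genre_field:
            filled[name] = "[]"    # temporary placeholder for genre arrays
        elif name in numeric_fields:
            filled[name] = 0       # default for numeric fields
        elif name in string_fields:
            filled[name] = "N/A"   # default for strings
    return filled


# Assumed typing of two columns, for illustration only.
example = fill_defaults(
    {"track_id": "abc", "album_name": None, "artist_followers": None, "artist_genres": None},
    string_fields={"album_name"},
    numeric_fields={"artist_followers"},
)
```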

    ✨ Standardization

    • Trimmed whitespace from key fields: track_name, artist_name, album_name, album_type, explicit.
    • Converted explicit values to uppercase (TRUE / FALSE).
    • Cleaned artist_genres using regex to remove brackets and quotes.
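
    A minimal Python sketch of the standardization rules above. It assumes the raw artist_genres value is a bracketed, quoted string such as "['pop', 'dance pop']"; the source does not show the raw format or the exact regex used, so both are assumptions, as are the function names.

```python
import re


def clean_genres(raw: str) -> str:
    # Remove square brackets and single/double quotes, then trim whitespace.
    return re.sub(r"[\[\]'\"]", "", raw).strip()


def standardize_explicit(raw: str) -> str:
    # Trim whitespace and convert to uppercase (TRUE / FALSE).
    return raw.strip().upper()


cleaned = clean_genres("['pop', 'dance pop']")
flag = standardize_explicit(" true ")
```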

    📅 Release Date Normalization

    • For year-only dates (e.g., 2020), appended -06-30 to estimate mid-year.
    • For year-month formats (e.g., 2020-07), appended -01 to complete the date.
    • Converted all dates to DATE format using STR_TO_DATE().
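
    The three date rules above can be expressed in a few lines of Python. This is an analogue of the described logic, not the author's SQL (which uses STR_TO_DATE()); the function name is hypothetical.

```python
import re
from datetime import date


def normalize_release_date(raw: str) -> date:
    """Complete year-only and year-month values, then parse as a date."""
    raw = raw.strip()
    if re.fullmatch(r"\d{4}", raw):
        raw += "-06-30"   # year only: estimate mid-year
    elif re.fullmatch(r"\d{4}-\d{2}", raw):
        raw += "-01"      # year-month: first day of the month
    return date.fromisoformat(raw)
```

    Note that padded dates are estimates, so analyses at day or month granularity may want to track which rows were padded.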

    ⏱ Duration Conversion

    • Added a new column track_duration_min by converting track_duration_ms to minutes.
    • Dropped the original track_duration_ms column after conversion.
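
    The conversion is a division by 60,000 (milliseconds per minute). A small Python sketch, assuming no rounding is applied (the source does not say whether the SQL rounds the result):

```python
def add_duration_minutes(row: dict) -> dict:
    """Add track_duration_min and drop track_duration_ms."""
    converted = {k: v for k, v in row.items() if k != "track_duration_ms"}
    converted["track_duration_min"] = row["track_duration_ms"] / 60_000
    return converted


converted_row = add_duration_minutes({"track_id": "abc", "track_duration_ms": 210000})
```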

    🎵 Genre Enrichment

    • Populated missing artist_genres for well-known artists using manual overrides:
      • Taylor Swift: country, pop, indie, folk
      • Olivia Rodrigo: pop rock, alternative pop, pop punk
      • Billie Eilish: alternative pop, electropop, dark pop
      • (...and more for 10+ artists)
    • Remaining empty genres were replaced with 'N/A'.

    🧹 Deduplication

    • Used ROW_NUMBER() over track_name, artist_name, album_name, and album_release_date to identify duplicates.
    • Removed duplicate rows and dropped the helper row_num column.
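
    In Python, the same idea as the ROW_NUMBER() step can be sketched by remembering which key combinations have already been seen. The source does not say how rows are ordered within a group of duplicates, so this sketch simply keeps the first occurrence; the function name is hypothetical.

```python
KEY_FIELDS = ("track_name", "artist_name", "album_name", "album_release_date")


def drop_duplicate_tracks(rows: list[dict]) -> list[dict]:
    """Keep the first row for each combination of the four key fields."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[field] for field in KEY_FIELDS)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```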

    This SQL workflow ensures the dataset is clean, consistent, and ready for exploratory data analysis, genre modeling, and public sharing. All transformations were verified using sample queries and profiling tools.

    Example Analysis

    Explore genre trends and usage patterns in this companion notebook:
    👉 Top Genres Using Pandas

    🤝 Contribute

    Feel free to fork the dataset or share your analyses!
    If you clean, enrich, or expand the dataset, contributions are always welcome.

  6. Full Spotify Streaming History

    • kaggle.com
    zip
    Updated Feb 22, 2025
    Cite
    Anshul Raj Verma (2025). Full Spotify Streaming History [Dataset]. https://www.kaggle.com/datasets/arvanshul/full-spotify-streaming-history
    Available download formats: zip (1953408 bytes)
    Dataset updated
    Feb 22, 2025
    Authors
    Anshul Raj Verma
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset contains my complete Spotify streaming history from June 2020 to February 2025.

    I requested the data from Spotify directly.

    Use case of the Dataset:

    • Analyse the user's listening behaviour and extract interesting insights from it.
    • You can practice SQL queries on this dataset extensively.
    • It has a datetime column, so you can also perform some time series analysis (to some extent).
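
    As one way to combine the SQL and time-series use cases, the sketch below loads a few rows into an in-memory SQLite database from Python and totals listening minutes per month. The table name plays, the column names ts, track_name and ms_played, and the sample rows are all assumptions made for illustration; the listing does not give this dataset's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plays (ts TEXT, track_name TEXT, ms_played INTEGER)")
conn.executemany(
    "INSERT INTO plays VALUES (?, ?, ?)",
    [
        # Invented rows for illustration only.
        ("2020-06-01 10:00:00", "Track A", 120000),
        ("2020-06-15 21:30:00", "Track B", 60000),
        ("2020-07-01 08:00:00", "Track A", 30000),
    ],
)

# Group plays by calendar month and convert milliseconds to minutes.
monthly = conn.execute(
    """
    SELECT strftime('%Y-%m', ts) AS month,
           SUM(ms_played) / 60000.0 AS minutes
    FROM plays
    GROUP BY month
    ORDER BY month
    """
).fetchall()
```
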
  7. Spotify top artists by monthly listeners

    • kaggle.com
    zip
    Updated Sep 10, 2023
    Cite
    meer atif magsi (2023). Spotify top artists by monthly listeners [Dataset]. https://www.kaggle.com/datasets/meeratif/spotify-top-artists-by-monthly-listeners
    Available download formats: zip (59910 bytes)
    Dataset updated
    Sep 10, 2023
    Authors
    meer atif magsi
    Description

    Welcome to the "Spotify Top Artists by Monthly Listeners" dataset! Dive into the world of music streaming with this comprehensive collection of data, which offers valuable insights into the artists dominating the digital airwaves on Spotify, one of the world's leading music streaming platforms.

    Columns:

    Artist: The "Artist" column contains the names or unique identifiers of musical artists. Each row in this column represents a specific artist. This column serves as the primary identifier for the artists in the dataset.

    Listeners: The "Listeners" column represents the number of listeners or fans for each artist. It quantifies the artist's fan base or the total number of people who have engaged with their music.

    Daily Trend: The "Daily Trend" column contains data that reflects the daily trend or change in popularity of each artist. It may include metrics or values indicating whether an artist is gaining or losing listeners on a daily basis. This metric helps to track an artist's current momentum and popularity.

    Peak: The "Peak" column signifies the highest point of popularity or the peak level of engagement that an artist has achieved within a specific timeframe. It provides insights into the artist's historical performance and when they were most widely appreciated.

    PkListeners: The "PkListeners" column represents the number of listeners an artist had at the peak of their popularity. This metric offers a specific quantitative measure of an artist's highest level of engagement with their audience.

    This dataset is a goldmine for music enthusiasts, data analysts, and researchers eager to explore the dynamics of popularity and musical diversity on Spotify. It provides a rich source of information for tracking artist trends, analyzing genre preferences, and gaining a deeper understanding of the global music landscape.

    Whether you're interested in uncovering emerging artists, studying the impact of genres, or simply exploring the musical tastes of Spotify's user base, this dataset offers a robust foundation for insightful analyses and engaging visualizations.

    Join us on a data-driven journey through the Spotify music ecosystem as we uncover the artists captivating the ears of millions of listeners, and let the data guide your exploration of the vibrant and ever-evolving world of music. Happy analyzing!

  8. Eminem Album Trends

    • kaggle.com
    zip
    Updated May 23, 2020
    Cite
    Kaivalya Powale (2020). Eminem Album Trends [Dataset]. https://www.kaggle.com/kaivalyapowale/eminem-album-trends
    Available download formats: zip (15869 bytes)
    Dataset updated
    May 23, 2020
    Authors
    Kaivalya Powale
    Description

    Eminem is one of the most influential hip-hop artists of all time, and the Rap God. I acquired this data using Spotify APIs and supplemented it with other research to add to my own analysis. You can find my original analysis here: https://kaivalyapowale.com/2020/01/25/eminems-album-trends-and-music-to-be-murdered-by-2020/

    My analysis was also published by top hip-hop websites:

    • HipHop 24x7 - Data analysis reveals M2BMB is the most negative album
    • Eminem Pro - Album's data analysis
    • Eminem Pro - Eminem's albums are getting shorter

    You can also check out visualizations on Tableau Public for some ideas: https://public.tableau.com/profile/kaivalya.powale#!/

    Content

    I have primarily used data from Spotify’s API using multiple endpoints for albums and tracks. I supplemented the data with stats from Billboard and calculations from this post.

    Here's the explanation for all the audio features provided by Spotify!

    I have researched data about album sales from multiple sources online. They are cited in my original analysis.

    Acknowledgements

    Here are Spotify's Album endpoints. Chart data comes from Billboard. Swear data comes from this source.

    Inspiration

    I'd love to see new visualizations using this data or using the sales, swear, or duration for an analysis. It would be wonderful if someone compares this with other hip-hop greats.

  9. Spotify Song Attributes

    • kaggle.com
    zip
    Updated Aug 4, 2017
    Cite
    GeorgeMcIntire (2017). Spotify Song Attributes [Dataset]. https://www.kaggle.com/forums/f/5360/spotify-song-attributes
    Available download formats: zip (100786 bytes)
    Dataset updated
    Aug 4, 2017
    Authors
    GeorgeMcIntire
    Description

    Context

    A dataset of 2017 songs with attributes from Spotify's API. Each song is labeled "1" if I like it and "0" if I don't. I used this data to see if I could build a classifier that predicts whether or not I would like a song.

    I wrote an article about the project I used this data for. It includes code on how to grab this data from the Spotipy API wrapper and the methods behind my modeling. https://opendatascience.com/blog/a-machine-learning-deep-dive-into-my-spotify-data/

    Content

    Each row represents a song.

    There are 16 columns: 13 are song attributes, one is the song name, one is the artist, and one, called "target", is the label for the song.

    Here are the 13 track attributes: acousticness, danceability, duration_ms, energy, instrumentalness, key, liveness, loudness, mode, speechiness, tempo, time_signature, valence.

    Information on what those traits mean can be found here: https://developer.spotify.com/web-api/get-audio-features/

    Acknowledgements

    I would like to thank Spotify for providing this readily accessible data.

    Inspiration

    I'm a music lover who's curious about why I love the music that I love.

  10. Billie Eilish Spotify Analysis

    • kaggle.com
    zip
    Updated Jan 21, 2023
    Cite
    The Devastator (2023). Billie Eilish Spotify Analysis [Dataset]. https://www.kaggle.com/datasets/thedevastator/billie-eilish-spotify-analysis
    Available download formats: zip (37509 bytes)
    Dataset updated
    Jan 21, 2023
    Authors
    The Devastator
    Description

    Billie Eilish Spotify Analysis

    Investigating Popularity, Danceability, and Lyrics

    By Priyanka Dobhal [source]

    About this dataset

    This dataset contains information about the music of Billie Eilish on Spotify, including track name, acousticness, danceability, energy, instrumentalness, liveness, loudness, speechiness and tempo. It also includes data about the popularity of each song and the artist behind it. Each song is uniquely identified by its URI. The dataset gives insight into the characteristics of Billie Eilish's music and how popular her songs are. With it we can analyse which factors influence a song's popularity, to better understand why some songs become hits while others get less attention. We can also compare the features of her music with other artists' songs to find similarities and differences, both in sound and in how much people listen to them.


    How to use the dataset

    This dataset contains information about Billie Eilish's music on Spotify, including track name, acousticness, danceability, energy, instrumentalness, liveness, loudness, speechiness

    Research Ideas

    • Analyzing the effect of musical attributes (e.g. acousticness, danceability, energy, etc.) on listeners' engagement with a specific artist's music.
    • Exploring the relationship between lyrical content and popularity of an artist's songs to discover potential trends in songwriting approaches that increase or decrease a song's chances of success.
    • Finding correlations between lyrical and musical elements to gain insights into popular music trends over time or within Billie Eilish’s discography specifically
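
    A first pass at the attribute-versus-popularity idea could be a Pearson correlation between one audio feature and the popularity column. The sketch below uses only the standard library; the sample numbers are invented purely to exercise the function and say nothing about the real data.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Invented values standing in for the danceability and popularity columns.
danceability = [0.2, 0.4, 0.6]
popularity = [10, 20, 30]
r = pearson(danceability, popularity)
```

    A coefficient near +1 or -1 indicates a strong linear relationship and a value near 0 indicates none; with a discography-sized sample, any such figure should be read cautiously.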

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: Dataset copyright by authors

    You are free to:

    • Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
    • Adapt - remix, transform, and build upon the material for any purpose, even commercially.

    You must:

    • Give appropriate credit - Provide a link to the license, and indicate if changes were made.
    • ShareAlike - You must distribute your contributions under the same license as the original.
    • Keep intact - all notices that refer to this license, including copyright notices.

    Columns

    File: Billie_Eilish_Spotify.csv

    | Column name | Description |
    |:---------------------|:--------------------------------------------------------------|
    | album | The name of the album the song is from. (String) |
    | track_number | The track number of the song on the album. (Integer) |
    | uri | The unique identifier for the song. (String) |
    | acousticness | A measure of how acoustic the song is. (Float) |
    | danceability | A measure of how suitable a song is for dancing. (Float) |
    | energy | A measure of the intensity and activity of a song. (Float) |
    | instrumentalness | A measure of how much of the song is instrumental. (Float) |
    | liveness | A measure of how much the song was performed live. (Float) |
    | loudness | A measure of the volume of the song. (Float) |
    | speechiness | A measure of how much the song contains spoken words. (Float) |
    | tempo | The speed of the song. (Float) |
    | valence | A measure of the positivity of the song. (Float) |
    | popularity | A measure of how popular the song is. (Integer) |
    | artist | The artist who produced and performs the song. (String) |

    File: Billie_Eilish_Lyrics_to_words.csv

    | Column name | Description |
    |:-----------------|:--------------------------------------------------------|
    | album | The name of the album the song is from. (String) |
    | track_number | The track number of the song on the album. (Integer) |
    | uri | The unique identifier for the song. (String) |
    | artist | The artist who produced and performs the song. (String) |


  11. Spotify Dataset 1921-2020, 600k+ Tracks

    • kaggle.com
    zip
    Updated Mar 13, 2022
    Cite
    Yamac Eren Ay (2022). Spotify Dataset 1921-2020, 600k+ Tracks [Dataset]. https://www.kaggle.com/datasets/yamaerenay/spotify-dataset-19212020-600k-tracks
    Available download formats: zip (201984462 bytes)
    Dataset updated
    Mar 13, 2022
    Authors
    Yamac Eren Ay
    License

    https://cdla.io/sharing-1-0/

    Description

    About

    For more in-depth information about audio features provided by Spotify: https://developer.spotify.com/documentation/web-api/reference/#/operations/get-audio-features

    I reposted my old dataset as many people requested. I don't plan to update the dataset further.

    Meta-information

    Title: Spotify Dataset 1921-2020, 600k+ Tracks
    Subtitle: Audio features of 600k+ tracks, popularity metrics of 1M+ artists
    Source: Spotify Web API
    Creator: Yamac Eren Ay
    Release Date (of Last Version): April 2021
    Link to this dataset: https://www.kaggle.com/yamaerenay/spotify-dataset-19212020-600k-tracks
    Link to the old dataset: https://www.kaggle.com/yamaerenay/spotify-dataset-1921-2020-160k-tracks

    Disclaimer

    I am not posting third-party Spotify data here for arbitrary reasons or to get upvotes.

    The old dataset has been cited in dozens of scientific papers under the old link, which has not worked since July 2021, and most of the authors had trouble proving the validity of the dataset. You can cite the same dataset under the new link. I'll be posting more information regarding the old dataset.

    If you have inquiries or complaints, please don't hesitate to reach out to me on LinkedIn or you can send me an email.

  12. Spotify Most Popular Songs Dataset

    • kaggle.com
    zip
    Updated Feb 21, 2025
    Cite
    RishabhPancholi1302 (2025). Spotify Most Popular Songs Dataset [Dataset]. https://www.kaggle.com/datasets/rishabhpancholi1302/spotify-most-popular-songs-dataset
    Available download formats: zip (3707341 bytes)
    Dataset updated
    Feb 21, 2025
    Authors
    RishabhPancholi1302
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Spotify Most Popular Songs Dataset 🎵

    Overview:

    This dataset contains a collection of the most popular songs on Spotify, along with various attributes that can be used for music analysis and recommendation systems. It includes audio features, lyrical details, and general metadata about each track, making it an excellent resource for machine learning, data science, and music analytics projects.

    Each song in the dataset includes the following features:

    🎧 Audio Features (Extracted from Spotify API):

    1. Danceability – How suitable a track is for dancing (0.0 – 1.0).
    2. Energy – Intensity and activity level of a song (0.0 – 1.0).
    3. Loudness – Overall loudness in decibels (dB).
    4. Speechiness – Presence of spoken words in the track (0.0 – 1.0).
    5. Acousticness – Probability that a track is acoustic (0.0 – 1.0).
    6. Instrumentalness – Predicts if a track is instrumental (0.0 – 1.0).
    7. Liveness – Probability of a live audience (0.0 – 1.0).
    8. Valence – Musical positivity or happiness (0.0 – 1.0).
    9. Tempo – Beats per minute (BPM) of the track.
    10. Key & Mode – Musical key and mode (major/minor).

    📝 Lyrics-Based Features:

    1. Lyrics Text – Full lyrics of the song (if available).

    🎶 General Song Information:

    1. Track Name – Name of the song.
    2. Artist(s) – Performing artist(s).
    3. Album Name – Album the track belongs to.
    4. Release Year – Year when the song was released.
    5. Genre – Song’s primary genre classification.
    6. Popularity Score – Spotify popularity metric (0 – 1).

    Use Cases 🚀:

    This dataset is ideal for:

    1. Music Recommendation Systems – Build collaborative or content-based recommenders.
    2. Audio Feature Analysis – Discover trends in song characteristics.
    3. Sentiment Analysis – Study how song lyrics relate to emotions.
    4. Hit Song Prediction – Use machine learning to predict song popularity.
    5. Music Genre Classification – Train classifiers to categorize music.

    Acknowledgments:

    Data collected using the Spotify API and other sources. If you use this dataset, consider crediting it in your projects!

  13. Spotify Playlist Analysis Datasets

    • kaggle.com
    zip
    Updated Jun 27, 2023
    Cite
    Mattia Girolami (2023). Spotify Playlist Analysis Datasets [Dataset]. https://www.kaggle.com/datasets/mattiagirolami/spotify-playlist-analisys-datasets
    Available download formats: zip (748016 bytes)
    Dataset updated
    Jun 27, 2023
    Authors
    Mattia Girolami
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    The Spotify Playlist Analysis Dataset is a collection of 3,232 different songs obtained from a personal music library and has been uploaded to Kaggle for analysis and exploration. The dataset offers a comprehensive overview of various tracks, enabling researchers, music enthusiasts, and data scientists to gain insights into the musical preferences and characteristics present in the library.

  14. Spotify Dataset 2023

    • kaggle.com
    zip
    Updated Dec 20, 2023
    Cite
    Tony Gordon Jr. (2023). Spotify Dataset 2023 [Dataset]. https://www.kaggle.com/datasets/tonygordonjr/spotify-dataset-2023/code
    Available download formats: zip (101062584 bytes)
    Dataset updated
    Dec 20, 2023
    Authors
    Tony Gordon Jr.
    Description

    I've been diving into the vibrant world of data for a solid two years, and guess what? I'm finally cracking the code on what it takes to soar in this industry! Early in my data adventures, I was like a kid on Limewire when I found Kaggle, downloading everything that caught my eye. But then, I stumbled upon Spotify's data and... let's just say, it was a bit of a reality check.

    I found myself wrestling with duplicate records, scratching my head over inconsistent schemas, and feeling lost in the sauce without any guides. That experience was a game-changer for me. I made a promise to my future self: “When you've got the skills, create a dataset that's not just good, but legendary.” That time has come!

    Introducing my unique Spotify dataset – a crystal clear reflection of dedication and clarity. What makes this set stand out? You're not just getting data; you're getting a story. You can literally trace my steps, unraveling the magic behind each table through my script on Github. It's like having a backstage pass to a data concert! (Yes, Swifties will love this dataset too 😉)

    I'm all about transparency, and I believe it's the key to trust. With this dataset, I'm laying it all out there – no smoke and mirrors, just pure, unadulterated, CLEAN data. I want you to feel the same excitement I do when data just clicks into place. I encourage you all to check out the GitHub repo I linked above to see how this dataset came to life!

    If you have any questions, suggestions or simply want to network, reach out to me on LinkedIn

    This dataset is created using data sourced from Spotify and adheres to their Terms of Use. The dataset is intended for non-commercial, academic purposes and does not infringe upon Spotify's intellectual property rights. For full details on Spotify's terms, please visit Spotify's Terms and Conditions of Use.

    You can find documentation for Spotify's Web API here

    As of 12/20/2023, this is V1 of my data and I'll most likely release a few more versions after working through kinks from former releases.

    Other Datasets: - Zillow

  15. Streaming Activity Dataset

    • kaggle.com
    zip
    Updated Dec 4, 2023
    Cite
    The Devastator (2023). Streaming Activity Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/streaming-activity-dataset/code
    Explore at:
    Available download formats: zip (3586470 bytes)
    Dataset updated
    Dec 4, 2023
    Authors
    The Devastator
    Description

    Streaming Activity Dataset

    4 years of diverse music streaming data across platforms and ages

    By Sean Miller [source]

    About this dataset

    The dataset consists of two main files: Scrobble_Features.csv and My Streaming Activity.csv. The Scrobble_Features.csv file contains detailed information about the music tracks, including genre, duration, popularity, and various audio features. On the other hand, the My Streaming Activity.csv file offers 4 years' worth of music streaming data from multiple platforms.

    Key columns in these files include:

    • Performer: The name of the performer or artist.
    • Song: The title of the song.
    • Album: The name of the album that each song belongs to.
    • spotify_genre: The genre(s) assigned to each song according to Spotify's classification.
    • spotify_track_preview_url: URLs providing previews for each song on Spotify.
    • spotify_track_duration_ms: The duration of each song in milliseconds.
    • spotify_track_popularity: A popularity score indicating how popular each track is on Spotify.
    • spotify_track_explicit: A boolean value indicating whether or not a track contains explicit content.

    Further musical attributes are also included:

    • danceability: A measure determining how suitable a song is for dancing based on various musical elements.
    • energy: An indicator measuring the intensity and activity level present in a song's composition.
    • key: Identifies the key signature (e.g., C major) that each track is performed in.
    • loudness: Reveals how loud or soft a given track is overall in decibels (dB).
    • mode: Indicates whether a given track is composed in major or minor scale/mode.

    These attributes aim to provide insights into different aspects of a song's overall composition and impact.

    Additionally, this dataset offers information about the timestamps when streaming activities occurred in both Central Time Zone (TimeStamp_Central) and Coordinated Universal Time (UTC) (TimeStamp_UTC).
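
    The relationship between the two timestamp columns can be checked with a short pandas sketch. The sample values, the string format, and the use of the "America/Chicago" zone for Central Time are assumptions for illustration; the real values live in My Streaming Activity.csv.

```python
import pandas as pd

# Hypothetical sample rows; the timestamp format is an assumption.
activity = pd.DataFrame({
    "TimeStamp_UTC": ["2019-01-15 18:30:00", "2019-07-15 18:30:00"],
})

# Parse as timezone-aware UTC, then convert to US Central time.
utc = pd.to_datetime(activity["TimeStamp_UTC"], utc=True)
activity["central_from_utc"] = utc.dt.tz_convert("America/Chicago")

# Central time is UTC-6 in winter (CST) and UTC-5 in summer (CDT),
# so the same UTC hour maps to different local hours across the year.
hours = activity["central_from_utc"].dt.hour.tolist()
print(hours)
```

    Comparing a column derived this way against TimeStamp_Central is one way to confirm which zone the dataset actually used.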

    How to use the dataset

    In this guide, we will walk you through how to effectively use this dataset for your analysis or projects. Let's get started!

    Understanding the Columns

    Before diving into analyzing the data, let's understand the meaning of each column in the dataset:

    • Performer: The name of the performer or artist of the song.
    • Song: The title of the song.
    • spotify_genre: The genre(s) of the song according to Spotify.
    • spotify_track_preview_url: The URL of a preview of the song on Spotify.
    • spotify_track_duration_ms: The duration of the song in milliseconds.
    • spotify_track_popularity: The popularity score of the song on Spotify. (Numeric/Integer)
    • spotify_track_explicit: Indicates whether the song contains explicit content. (Boolean)
    • danceability: A measure of how suitable a song is for dancing based on a combination of musical elements. (Numeric/Float)
    • energy: A measure of the intensity and activity level present in a track. (Numeric/Float)
    • key: The musical key of the track, which can be used to group songs by key.
    • loudness: How loud or quiet the track is overall, in decibels (dB).
    • mode: The modality of the track; major is denoted by 1 and minor by 0.
    • speechiness: Detects the presence of spoken words in a track.
    • acousticness: The probability that the track is acoustic.
    • instrumentalness: The likelihood that the track is instrumental (contains no vocals).
    • liveness: The probability that the track was performed live.
    • valence: The musical positivity or cheerfulness conveyed by a track; 1 is the most positive and 0 the most negative (e.g., sad).
    • tempo: The rate at which beats recur, in beats per minute (BPM).

    Research Ideas

    • Music Recommendation System: This dataset can be used to develop a music recommendation system by analyzing the streaming activity and audio features of different songs. By understanding the preferences and listening habits of users, personalized music recommendations can be generated for individuals or households.
    • Genre Analysis and Trends: The dataset provides information about the performer, genre, and popularity of songs. This data can be utilized to analyze trends in music genres over the years, identify popular artists in different genres, and understand the ...
  16. Spotify dataset - A.R.Rahman

    • kaggle.com
    zip
    Updated Oct 21, 2024
    Cite
    Tarun Sirpi (2024). Spotify dataset - A.R.Rahman [Dataset]. https://www.kaggle.com/datasets/tarunsirpi/spotify-dataset-a-r-rahman
    Explore at:
    Available download formats: zip (163610 bytes)
    Dataset updated
    Oct 21, 2024
    Authors
    Tarun Sirpi
    Description

    This dataset was built by pulling data from the Spotify API and analysing it. The inspiration is to analyse the tracks of my favourite artist (A.R.Rahman), using the features Spotify provides for each track.

    The dataset was created in Python, using the spotipy module to get data from the Spotify API and the pandas module to process it.

    For more information and source code, please refer to the following Github repository:

    https://github.com/tarunsirpi/spotify-data-analysis

  17. Spotify Customer Churn dataset

    • kaggle.com
    zip
    Updated Jul 2, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Abdul Wadood (2025). Spotify Customer Churn dataset [Dataset]. https://www.kaggle.com/abdulwadood11220/spotify-customer-churn-dataset
    Explore at:
    Available download formats: zip (11039 bytes)
    Dataset updated
    Jul 2, 2025
    Authors
    Abdul Wadood
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    🎧 Can you predict who’s about to hit “unsubscribe”?

    This dynamic Spotify churn dataset gives you the chance to find out. Packed with real-world user behavior — from listening time, skips, and ads seen to premium usage, plan types, and login activity — this dataset is your backstage pass into how users interact with a top music streaming platform.

    With key demographic info (age, gender, country) and a clear churn indicator, it’s perfect for building powerful machine learning models, uncovering retention trends, or exploring the secret sauce behind loyal listeners.

    Whether you're aiming to reduce churn, boost engagement, or showcase your data science skills, this dataset hits all the right notes. 🎶 Ready to turn data into decisions? Hit play.

  18. Spotify Tracks Genre

    • kaggle.com
    zip
    Updated Nov 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Devastator (2023). Spotify Tracks Genre [Dataset]. https://www.kaggle.com/thedevastator/spotify-tracks-genre-dataset
    Explore at:
    Available download formats: zip (8571539 bytes)
    Dataset updated
    Nov 30, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Spotify Tracks Genre

    Audio features of tracks across diverse genres

    By maharshipandya (From Huggingface) [source]

    About this dataset

    This dataset provides comprehensive information about Spotify tracks encompassing a diverse collection of 125 genres. It has been compiled and cleaned using Spotify's Web API and Python. Presented in CSV format, this dataset is easily accessible and amenable to analysis. The dataset comprises multiple columns, each representing distinctive audio features associated with individual tracks.

    The columns include: artists (the name of the artist or artists who performed the track), album_name (the title of the album to which the track belongs), track_name (the specific name of each track), popularity (a numerical score indicating the popularity of a song on Spotify ranging from 0 to 100), duration_ms (the duration of each track measured in milliseconds), explicit (a boolean value denoting whether a song contains explicit content or not).

    Furthermore, there are various audio features that provide deep insights into the musical characteristics of each track. These features include danceability, energy, key, loudness, mode, speechiness (indicating whether spoken words are present in a song), acousticness (measuring how much a song leans towards acoustic sounds rather than electric ones), instrumentalness (indicating how likely it is for a song to be instrumental rather than vocal-oriented).

    Additional audio attributes encompass liveness, reflecting the presence or absence of live audience elements within tracks; valence quantifying musical positiveness conveyed by a song; tempo denoting beats per minute; and time_signature revealing details about bar structures within tracks.

    The dataset enables users to discern patterns across multiple genres while also facilitating genre prediction based on perceptible audio nuances derived through machine learning models.

    Aspiring audiophiles, music enthusiasts, and data scientists can use this repository for research, exploring genre dynamics and the relationships between the musical attributes of these Spotify tracks.

    How to use the dataset

    • Introduction:

    • Download and Load the Dataset: Start by downloading the dataset from Kaggle in CSV format. Once downloaded, load the dataset into your preferred programming environment or tool such as Python, R, or Excel.

    • Familiarize Yourself with the Columns: Take some time to understand the meaning of each column in the dataset:

      • artists: The name of the artist(s) who performed the track.
      • album_name: The name of the album that contains a given track.
      • track_name: The name of a specific track.
      • popularity: A score indicating how popular a track is on Spotify (ranging from 0 to 100).
      • duration_ms: The duration of a track in milliseconds.
      • explicit: Indicates whether a track contains explicit content (True or False).
    • Explore Audio Features: This dataset includes various audio features associated with each track. Here are some notable ones:

      A. Danceability: Danceability measures how suitable a track is for dancing, ranging from 0 to 1. Tracks with high danceability scores are more energetic and rhythmic, making them ideal for dancing.

      B. Energy: Energy represents intensity and activity within a song on a scale from 0 to 1. Tracks with high energy tend to be more fast-paced and intense.

      C. Loudness: Loudness indicates how loud or quiet an entire song is in decibels (dB). Values are typically negative, and values closer to 0 indicate louder songs.

      D. Key: Key refers to different musical keys assigned integers ranging from 0-11, with each number representing a different key. Knowing the key can provide insights into the mood and tone of a song.

      E. Valence: Valence measures the musical positiveness conveyed by a track, ranging from 0 to 1. High valence values indicate more positive or happy tracks, while lower values suggest more negative or sad ones.

      F. Tempo: Tempo is the speed or pace of a song in beats per minute (BPM). It gives an idea about how fast or slow a track is.

    • Data Analysis and Visualization: Utilize various data analysis techniques and visualization tools to gain insights into the
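
    The loading and column-familiarization steps above can be sketched with pandas. This is a minimal illustration under stated assumptions: the two rows are made up, and the in-memory text stands in for the CSV downloaded from Kaggle, whose file name is not given here.

```python
import io
import pandas as pd

# Two made-up rows standing in for the downloaded CSV (hypothetical values).
csv_text = """artists,album_name,track_name,popularity,duration_ms,explicit
Artist A,Album A,Track A,73,210000,False
Artist B,Album B,Track B,12,185500,True
"""

# With the real file, pass its path to pd.read_csv instead of the StringIO.
tracks = pd.read_csv(io.StringIO(csv_text))

# Convert milliseconds to minutes for readability.
tracks["duration_min"] = tracks["duration_ms"] / 60_000

# Keep tracks scoring at least 50 on the 0-100 popularity scale.
popular = tracks[tracks["popularity"] >= 50]
print(popular[["track_name", "popularity", "duration_min"]])
```

    The threshold of 50 is arbitrary; any cut on the 0 to 100 popularity scale works the same way.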

    Research Ideas

    • Music Recommendation System: With multiple audio features such as danceability, energy, and valence, this dataset can be used to bu...
  19. Spotify 10000 Songs Dataset

    • kaggle.com
    zip
    Updated Apr 7, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jeremy CTE (2024). Spotify 10000 Songs Dataset [Dataset]. https://www.kaggle.com/datasets/jeremycte/spotify-10000-songs-dataset/suggestions
    Explore at:
    Available download formats: zip (2714062 bytes)
    Dataset updated
    Apr 7, 2024
    Authors
    Jeremy CTE
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context and Inspiration

    Your dataset creation is driven by the desire to understand the evolving landscape of music genres through the lens of data analysis. By examining the characteristics of songs across various genres, you aimed to investigate the boundaries that define these genres and explore the potential of machine learning models to classify songs in a manner akin to human perception. This exploration touches on the intersection of musicology and data science, aiming to reveal insights about the music we enjoy and how it's structured and perceived on a technical level. Through sentiment analysis, you're also diving into the emotional depth of music, connecting the sonic features of songs to the emotions conveyed in their lyrics, an ambitious and fascinating endeavor that bridges the gap between the quantitative and the qualitative aspects of music.

    Spotify Dataset Columns Explanation

    In data.csv and spotify_tracks_top50.csv:

    • track_id: Unique identifier for each track on Spotify.
    • playlist_id: Unique identifier for the playlist from which the track was collected.
    • date_added: Date the track was added to the playlist.
    • track_name: Name of the track.
    • first_artist: Name of the primary artist of the track.
    • artist_id: Unique identifier for the primary artist on Spotify.
    • track_preview: URL for the 30-second preview mp3 of the track.
    • album_name: Name of the album the track is from.
    • danceability: A measure of how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity.
    • energy: A perceptual measure of intensity and activity.
    • key: The key the track is in. Integers map to pitches using standard Pitch Class notation.
    • loudness: The overall loudness of a track in decibels (dB).
    • mode: Modality of the track (major or minor).
    • speechiness: Measures the presence of spoken words in a track.
    • acousticness: A confidence measure of whether the track is acoustic.
    • instrumentalness: Predicts whether a track contains no vocals.
    • liveness: Detects the presence of an audience in the recording.
    • valence: Measures the musical positiveness conveyed by a track.
    • tempo: The overall estimated tempo of a track in beats per minute (BPM).
    • type, id, uri, track_href, analysis_url: Various identifiers that provide detailed information about the track or facilitate accessing more data about the track through the Spotify Web API.
    • duration_ms: The duration of the track in milliseconds.
    • time_signature: An estimated overall time signature of a track.
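
    Because key is stored as a Pitch Class integer, a small lookup makes it readable. The sketch below is illustrative: describe_key is a hypothetical helper, and it assumes mode is encoded as 1 for major and 0 for minor (the Spotify Web API convention) and that values outside 0-11 mean no key was detected.

```python
# Standard pitch class notation: 0 = C, 1 = C#/Db, ..., 11 = B.
PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]


def describe_key(key: int, mode: int) -> str:
    """Return a label such as 'A minor' for integer key and mode values."""
    if not 0 <= key <= 11:
        # Assumption: out-of-range values (e.g. -1) mean no key was detected.
        return "unknown"
    quality = "major" if mode == 1 else "minor"
    return f"{PITCH_CLASSES[key]} {quality}"


print(describe_key(9, 0))
print(describe_key(0, 1))
```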

    Librosa Dataset Columns Explanation

    • MFCC1 - MFCC20: The first 20 Mel-frequency cepstral coefficients, which are coefficients that collectively make up an MFC. They are derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum-of-a-spectrum").
    • Spectral Contrast Freq Band1 - Band7: Measures of the difference in amplitude between peaks and valleys in the spectrum. These bands capture the texture of the sound by examining the spectral shape.

    Artist Data CSV Explanation

    • id: The Spotify ID for the artist.
    • name: The name of the artist.
    • genres: A list of genres the artist is known for.
    • popularity: A metric that ranks the artist's popularity with values from 0 to 100.
    • followers: The total number of followers the artist has on Spotify.
    • url: The Spotify URL for the artist's page.

    Sources Link

    Spotify Developer Link - https://developer.spotify.com/documentation/web-api/reference/get-an-artist

    10000 Spotify Playlist - https://open.spotify.com/playlist/1YL4XoegERoragv0RK2RC9?si=569f0a0b1149489b

    License

    Refer to the Spotify License and Agreement Terms https://developer.spotify.com/terms

  20. Data from: Most Streamed Spotify Songs 2023

    • kaggle.com
    zip
    Updated Aug 26, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nidula Elgiriyewithana ⚡ (2023). Most Streamed Spotify Songs 2023 [Dataset]. https://www.kaggle.com/datasets/nelgiriyewithana/top-spotify-songs-2023/discussion
    Explore at:
    Available download formats: zip (48187 bytes)
    Dataset updated
    Aug 26, 2023
    Authors
    Nidula Elgiriyewithana ⚡
    Description

    Description :

    This dataset contains a comprehensive list of the most famous songs of 2023 as listed on Spotify. The dataset offers a wealth of features beyond what is typically available in similar datasets. It provides insights into each song's attributes, popularity, and presence on various music platforms. The dataset includes information such as track name, artist(s) name, release date, Spotify playlists and charts, streaming statistics, Apple Music presence, Deezer presence, Shazam charts, and various audio features.

    Here is the link for the 2024 data: Most Streamed Spotify Songs 2024 🟢 (https://www.kaggle.com/datasets/nelgiriyewithana/most-streamed-spotify-songs-2024)

    Key Features:

    • track_name: Name of the song
    • artist(s)_name: Name of the artist(s) of the song
    • artist_count: Number of artists contributing to the song
    • released_year: Year when the song was released
    • released_month: Month when the song was released
    • released_day: Day of the month when the song was released
    • in_spotify_playlists: Number of Spotify playlists the song is included in
    • in_spotify_charts: Presence and rank of the song on Spotify charts
    • streams: Total number of streams on Spotify
    • in_apple_playlists: Number of Apple Music playlists the song is included in
    • in_apple_charts: Presence and rank of the song on Apple Music charts
    • in_deezer_playlists: Number of Deezer playlists the song is included in
    • in_deezer_charts: Presence and rank of the song on Deezer charts
    • in_shazam_charts: Presence and rank of the song on Shazam charts
    • bpm: Beats per minute, a measure of song tempo
    • key: Key of the song
    • mode: Mode of the song (major or minor)
    • danceability_%: Percentage indicating how suitable the song is for dancing
    • valence_%: Positivity of the song's musical content
    • energy_%: Perceived energy level of the song
    • acousticness_%: Amount of acoustic sound in the song
    • instrumentalness_%: Amount of instrumental content in the song
    • liveness_%: Presence of live performance elements
    • speechiness_%: Amount of spoken words in the song
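
    Two of the features listed above usually need light preparation before analysis: the release date is split across three columns, and the audio features are percentages. A possible pandas sketch follows; the rows are made up, and the new column names release_date and danceability are hypothetical.

```python
import pandas as pd

# Made-up rows using the column names listed above (values are hypothetical).
songs = pd.DataFrame({
    "track_name": ["Song A", "Song B"],
    "released_year": [2023, 2019],
    "released_month": [7, 11],
    "released_day": [14, 29],
    "danceability_%": [80, 55],
})

# pandas can assemble dates from columns named year, month and day.
parts = songs[["released_year", "released_month", "released_day"]].rename(
    columns={"released_year": "year", "released_month": "month", "released_day": "day"}
)
songs["release_date"] = pd.to_datetime(parts)

# Rescale a percentage column to the 0-1 range used by other Spotify datasets.
songs["danceability"] = songs["danceability_%"] / 100

print(songs[["track_name", "release_date", "danceability"]])
```

    A single release_date column makes the temporal-trend use case easier, since rows can then be sorted or resampled by date.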

    Potential Use Cases:

    • Music analysis: Explore patterns in audio features to understand trends and preferences in popular songs.
    • Platform comparison: Compare the song's popularity across different music platforms.
    • Artist impact: Analyze how artist involvement and attributes relate to a song's success.
    • Temporal trends: Identify any shifts in music attributes and preferences over time.
    • Cross-platform presence: Investigate how songs perform across different streaming services.

    If you find this dataset useful, your support through an upvote would be greatly appreciated ❤️🙂
    Thank you

Cite
Malinga Rajapaksha (2024). My Spotify Data - Cleaned [Dataset]. https://www.kaggle.com/datasets/malingarajapaksha/my-spotify-data-cleaned

My Spotify Data - Cleaned

cleaned version of spotify streaming history

Explore at:
Available download formats: zip (2952139 bytes)
Dataset updated
Jan 26, 2024
Authors
Malinga Rajapaksha
Description

The dataset contains records of the user's Spotify streaming history, with each row representing a specific instance of a played track. The data includes various attributes providing insights into the user's music listening habits.

Columns:

  1. ts (Timestamp):

    • The timestamp when the track was played.
  2. platform:

    • The platform or device used for streaming (e.g., Windows 10).
  3. ms_played:

    • The duration in milliseconds of how long the track was played.
  4. conn_country:

    • The country code indicating the user's location during streaming (e.g., LK for Sri Lanka).
  5. master_metadata_track_name:

    • The name of the track played.
  6. master_metadata_album_artist_name:

    • The artist of the album to which the track belongs.
  7. master_metadata_album_album_name:

    • The name of the album containing the track.
  8. spotify_track_uri:

    • The unique Spotify URI for the track.
  9. reason_start:

    • The reason for starting the track (e.g., play button clicked).
  10. reason_end:

    • The reason for ending the track (e.g., track done).
  11. shuffle:

    • Indicates whether shuffle mode was enabled (True/False).
  12. offline:

    • Indicates whether the track was played offline (True/False).
  13. offline_timestamp:

    • Timestamp indicating when the track was played offline (if applicable).
  14. incognito_mode:

    • Indicates whether incognito mode was enabled (True/False).

Purpose:

This dataset is suitable for performing detailed Exploratory Data Analysis (EDA) to uncover patterns, trends, and insights into the user's music-listening behaviour. Potential analyses could include the distribution of listening durations, favourite artists and tracks, exploration of geographic listening patterns, and examination of usage patterns across different platforms.

Visualization tools such as Matplotlib and Seaborn could be utilized for a more in-depth analysis to create visual representations of the findings. This dataset aligns well with your interest in data science, offering opportunities to apply analytical techniques to real-world streaming data.
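
As a starting point for the analyses mentioned above, the sketch below totals listening time per artist and counts plays per day. It is only an illustration: the rows are made up, and the ISO 8601 format assumed for ts may differ from the actual file.

```python
import pandas as pd

# Made-up rows with a subset of the documented columns (values are hypothetical).
history = pd.DataFrame({
    "ts": ["2023-05-01T08:00:00Z", "2023-05-01T08:04:00Z", "2023-05-02T21:15:00Z"],
    "ms_played": [180_000, 30_000, 240_000],
    "master_metadata_album_artist_name": ["Artist A", "Artist B", "Artist A"],
})

# Assumption: ts is an ISO 8601 string in UTC.
history["ts"] = pd.to_datetime(history["ts"], utc=True)

# Total minutes played per artist, most-played first.
minutes_by_artist = (
    history.groupby("master_metadata_album_artist_name")["ms_played"]
    .sum()
    .div(60_000)
    .sort_values(ascending=False)
)

# Number of plays per calendar day.
plays_per_day = history.groupby(history["ts"].dt.date).size()

print(minutes_by_artist)
print(plays_per_day)
```

Either result can be passed to Matplotlib or Seaborn for plotting, for example as a bar chart of the most-played artists.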
