28 datasets found

spotify-tracks-dataset
huggingface.co
Updated Jun 30, 2023
Cite
maharshipandya (2023). spotify-tracks-dataset [Dataset]. https://huggingface.co/datasets/maharshipandya/spotify-tracks-dataset
Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jun 30, 2023
Authors
maharshipandya
License
https://choosealicense.com/licenses/bsd/
Description
Content

This is a dataset of Spotify tracks over a range of 125 different genres. Each track has some audio features associated with it. The data is in CSV format which is tabular and can be loaded quickly.

Usage

The dataset can be used for:

Building a Recommendation System based on some user input or preference

Classification purposes based on audio features and available genres

Any other application that you can think of. Feel free to discuss!

Column… See the full description on the dataset page: https://huggingface.co/datasets/maharshipandya/spotify-tracks-dataset.
Data from: Spotify Playlists Dataset
zenodo.org
data.niaid.nih.gov
zip
Updated Jan 24, 2020
Cite
Martin Pichl; Eva Zangerle (2020). Spotify Playlists Dataset [Dataset]. http://doi.org/10.5281/zenodo.2594557
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.2594557
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Martin Pichl; Eva Zangerle
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is based on the subset of users in the #nowplaying dataset who publish their #nowplaying tweets via Spotify. In principle, the dataset holds users, their playlists and the tracks contained in these playlists.

The csv-file holding the dataset contains the following columns: "user_id", "artistname", "trackname", "playlistname", where

user_id is a hash of the user's Spotify user name

artistname is the name of the artist

trackname is the title of the track and

playlistname is the name of the playlist that contains this track.

The separator is a comma (,), each entry is enclosed in double quotes, and the escape character is a backslash (\).
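The format described above (comma separator, double-quoted entries, backslash escape) could be read with Python's standard csv module roughly as follows. This is a minimal sketch, not the authors' loader: the sample rows and the read_playlists helper are hypothetical, and it assumes the file starts with a header row naming the four columns.

```python
import csv
import io

# Hypothetical sample in the described format; the real file is much larger.
# Assumption: the first row is a header with the four column names.
SAMPLE = (
    '"user_id","artistname","trackname","playlistname"\n'
    '"u1","Some Artist","Plain Title","Road Trip"\n'
    '"u1","Other Artist","A \\"Quoted\\" Title","Road Trip"\n'
)


def read_playlists(handle):
    """Return one dict per row, keyed by the header names."""
    reader = csv.DictReader(
        handle,
        delimiter=",",
        quotechar='"',
        escapechar="\\",
        doublequote=False,  # embedded quotes are escaped with \, not doubled
    )
    return list(reader)


rows = read_playlists(io.StringIO(SAMPLE))
for row in rows:
    print(row["playlistname"], "|", row["artistname"], "|", row["trackname"])
```

For the actual download, the same function could be given a file opened with newline="" and an explicit encoding, which the dataset description does not state.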

A description of the generation of the dataset and the dataset itself can be found in the following paper:

Pichl, Martin; Zangerle, Eva; Specht, Günther: "Towards a Context-Aware Music Recommendation Approach: What is Hidden in the Playlist Name?" in 15th IEEE International Conference on Data Mining Workshops (ICDM 2015), pp. 1360-1365, IEEE, Atlantic City, 2015.
Spotify's most streamed albums 2024
statista.com
Updated Mar 31, 2025
Cite
Statista Research Department (2025). Spotify's most streamed albums 2024 [Dataset]. https://www.statista.com/topics/2075/spotify/
Dataset updated
Mar 31, 2025
Dataset provided by
Statista (http://statista.com/)
Authors
Statista Research Department
Description
In 2024, Taylor Swift was the artist with the most streamed album on Spotify. Her album "THE TORTURED POETS DEPARTMENT" was streamed over 6.6 billion times in 2024. She also entered the top 10 with her album "Lover" in sixth position, having registered nearly 3.3 billion streams on Spotify.
Spotify Million Playlist: Recsys Challenge 2018 Dataset
data.niaid.nih.gov
zenodo.org
Updated Apr 9, 2022
Cite
AIcrowd (2022). Spotify Million Playlist: Recsys Challenge 2018 Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6425592
Dataset updated
Apr 9, 2022
Dataset authored and provided by
AIcrowd
Description
Spotify Million Playlist Dataset Challenge

Summary

The Spotify Million Playlist Dataset Challenge consists of a dataset and evaluation to enable research in music recommendations. It is a continuation of the RecSys Challenge 2018, which ran from January to July 2018. The dataset contains 1,000,000 playlists, including playlist titles and track titles, created by users on the Spotify platform between January 2010 and October 2017. The evaluation task is automatic playlist continuation: given a seed playlist title and/or initial set of tracks in a playlist, to predict the subsequent tracks in that playlist. This is an open-ended challenge intended to encourage research in music recommendations, and no prizes will be awarded (other than bragging rights).

Background

Playlists like Today’s Top Hits and RapCaviar have millions of loyal followers, while Discover Weekly and Daily Mix are just a couple of our personalized playlists made especially to match your unique musical tastes.

Our users love playlists too. In fact, the Digital Music Alliance, in their 2018 Annual Music Report, state that 54% of consumers say that playlists are replacing albums in their listening habits.

But our users don’t love just listening to playlists, they also love creating them. To date, over 4 billion playlists have been created and shared by Spotify users. People create playlists for all sorts of reasons: some playlists group together music categorically (e.g., by genre, artist, year, or city), by mood, theme, or occasion (e.g., romantic, sad, holiday), or for a particular purpose (e.g., focus, workout). Some playlists are even made to land a dream job, or to send a message to someone special.

The other thing we love here at Spotify is playlist research. By learning from the playlists that people create, we can learn all sorts of things about the deep relationship between people and music. Why do certain songs go together? What is the difference between “Beach Vibes” and “Forest Vibes”? And what words do people use to describe which playlists?

By learning more about the nature of playlists, we may also be able to suggest other tracks that a listener would enjoy in the context of a given playlist. This can make playlist creation easier, and ultimately help people find more of the music they love.

Dataset

To enable this type of research at scale, in 2018 we sponsored the RecSys Challenge 2018, which introduced the Million Playlist Dataset (MPD) to the research community. Sampled from the over 4 billion public playlists on Spotify, this dataset of 1 million playlists consists of over 2 million unique tracks by nearly 300,000 artists, and represents the largest public dataset of music playlists in the world. The dataset includes public playlists created by US Spotify users between January 2010 and November 2017. The challenge ran from January to July 2018, and received 1,467 submissions from 410 teams. A summary of the challenge and the top scoring submissions was published in the ACM Transactions on Intelligent Systems and Technology.

In September 2020, we re-released the dataset as an open-ended challenge on AIcrowd.com. The dataset can now be downloaded by registered participants from the Resources page.

Each playlist in the MPD contains a playlist title, the track list (including track IDs and metadata), and other metadata fields (last edit time, number of playlist edits, and more). All data is anonymized to protect user privacy. Playlists are sampled with some randomization, are manually filtered for playlist quality and to remove offensive content, and have some dithering and fictitious tracks added to them. As such, the dataset is not representative of the true distribution of playlists on the Spotify platform, and must not be interpreted as such in any research or analysis performed on the dataset.

Dataset Contains

1000 examples of each scenario:

Title only (no tracks)

Title and first track

Title and first 5 tracks

First 5 tracks only

Title and first 10 tracks

First 10 tracks only

Title and first 25 tracks

Title and 25 random tracks

Title and first 100 tracks

Title and 100 random tracks

Download Link

Full Details: https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge

Download Link: https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge/dataset_files
Spotify - Beyoncé's Track Data
kaggle.com
Updated Mar 15, 2024
Cite
yuka_with_data (2024). Spotify - Beyoncé's Track Data [Dataset]. https://www.kaggle.com/datasets/yukawithdata/beyonce-track-attribute-data
Dataset updated
Mar 15, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
yuka_with_data
Description
💁‍♀️Please take a moment to carefully read through this description and metadata to better understand the dataset and its nuances before proceeding to the Suggestions and Discussions section.

Dataset Description:

This dataset compiles the tracks from all of Beyoncé's albums available on Spotify, showcasing the evolution of one of the most influential artists in the music industry. It represents a comprehensive array of genres, influences, and musical styles that Beyoncé has explored throughout her career. Each track in the dataset is detailed with a variety of features, popularity, and metadata. This dataset serves as an excellent resource for music enthusiasts, data analysts, and researchers aiming to explore the impact of Beyoncé's music, identify trends in her musical evolution, or develop music recommendation systems based on empirical data.

Scope of the Data:

The focus of this dataset is on providing a comprehensive view of Beyoncé's musical releases on Spotify, specifically tailored to showcase her creative output. To this end, the dataset includes tracks from the following album types:

Albums: Full-length albums released by Beyoncé, encapsulating a range of her musical styles and eras.

Singles: Standalone single releases, highlighting key songs that have been released independently of her full albums.

It's important to note that this dataset deliberately excludes compilation albums. Compilations, which often contain a mixture of tracks from various artists or previously released tracks by Beyoncé, are not included to maintain a focus on her original releases and to provide a clearer picture of her artistic evolution.

Data Collection and Processing:

Obtaining the Data: The data was obtained directly from the Spotify Web API, specifically focusing on albums and tracks by Beyoncé. The Spotify API provides detailed information about tracks, artists, and albums through various endpoints.

Data Processing: To process and structure the data, Python scripts were developed using data science libraries such as pandas for data manipulation and spotipy for API interactions, specifically for Spotify data retrieval.

Workflow:

Authentication

API Requests

Data Cleaning and Transformation

Saving the Data

Attribute Descriptions:

artist_name: the name of the artist (Beyoncé and collaborators)

track_name: the title of the track

is_explicit: Indicates whether the track contains explicit content

album_release_date: The date when the track was released

genres: A list of genres associated with Beyoncé

danceability: A measure from 0.0 to 1.0 indicating how suitable a track is for dancing based on a combination of musical elements

valence: A measure from 0.0 to 1.0 indicating the musical positiveness conveyed by a track

energy: A measure from 0.0 to 1.0 representing a perceptual measure of intensity and activity

loudness: The overall loudness of a track in decibels (dB)

acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic

instrumentalness: Predicts whether a track contains no vocals

liveness: Detects the presence of an audience in the recordings

speechiness: Detects the presence of spoken words in a track

key: The key the track is in. Integers map to pitches using standard Pitch Class notation

tempo: The overall estimated tempo of a track in beats per minute (BPM)

mode: Modality of the track

duration_ms: The length of the track in milliseconds

time_signature: An estimated overall time signature of a track

popularity: A score between 0 and 100, with 100 being the most popular

Possible Data Projects:

Trend Analysis in Beyoncé's Musical Evolution

Mood and Musical Elements in Beyoncé's Tracks

Beyoncé's Influence on the Music Industry Analysis

Disclaimer and Responsible Use:

This dataset, derived from Spotify focusing on Beyoncé's albums and tracks, is intended for educational, research, and analysis purposes only. Users are urged to use this data responsibly, ethically, and within the bounds of legal stipulations.

Compliance with Terms of Service: Users should adhere to Spotify's Terms of Service and Developer Policies when utilizing this dataset.

Copyright Notice: The dataset presents music track information including names and artist details for analytical purposes and does not convey any rights to the music itself. Users must ensure that their use does not infringe on the copyright holders' rights. Any analysis, distribution, or derivative work should respect the intellectual property rights of all involved parties and comply with applicable laws.

No Warranty Disclaimer: The dataset is provided "as is," without warranty, and the creator disclaims any legal liability for its use by others.

Ethical Use: Users are encouraged to consider the ethical implications of their analyses and the potential impact...
Spotify's most streamed artists as of 2024
statista.com
Updated Mar 31, 2025
Cite
Statista Research Department (2025). Spotify's most streamed artists as of 2024 [Dataset]. https://www.statista.com/topics/2075/spotify/
Dataset updated
Mar 31, 2025
Dataset provided by
Statista (http://statista.com/)
Authors
Statista Research Department
Description
In 2024, Taylor Swift was the most streamed artist on Spotify. Her songs were streamed over 28 billion times within the year. The second most streamed artist was The Weeknd with more than 13 billion streams in 2023.
Song Features Dataset - Regressing Popularity
kaggle.com
Updated Jan 19, 2023
Cite
Ayush Oturkar (2023). Song Features Dataset - Regressing Popularity [Dataset]. https://www.kaggle.com/datasets/ayushnitb/song-features-dataset-regressing-popularity
Dataset updated
Jan 19, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ayush Oturkar
Description
Introduction

Spotify for Developers offers a wide range of possibilities to utilize the extensive catalog of Spotify data. One of them is the set of audio features calculated for each song and made available via the official Spotify Web API.

This is an attempt to retrieve the Spotify data published after the last extraction. It has not been fully tested whether Spotify allowed any other full API requests after 2019.

About

Each song (row) has values for artist name, track name, track id and the audio features themselves (for more information about the audio features, check out this doc from Spotify).

Additionally, a popularity feature is included in this dataset. Please note that Spotify recalculates this value based on the number of plays the track receives, so it may no longer be the correct value when you access the data.

Key Questions/Hypotheses that can be Answered

1. Are songs in a major mode more popular than songs in a minor mode?

2. Are louder songs more popular?

3. Do most people prefer listening to shorter songs?

In addition, more detailed analysis can be done to see what makes a song popular.
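As a first pass at the major-versus-minor question, one could compare average popularity per mode. The sketch below is illustrative only: the rows are made up, the column names mode and popularity are assumptions about this dataset's CSV, and it assumes the usual Spotify encoding of 1 for major and 0 for minor.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows; real values would be loaded from the dataset's CSV.
# Assumed columns: "mode" (1 = major, 0 = minor) and "popularity" (0-100).
tracks = [
    {"mode": 1, "popularity": 60},
    {"mode": 1, "popularity": 40},
    {"mode": 0, "popularity": 55},
    {"mode": 0, "popularity": 35},
]


def mean_popularity_by_mode(rows):
    """Group popularity scores by mode and average each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["mode"]].append(row["popularity"])
    return {mode: mean(values) for mode, values in groups.items()}


result = mean_popularity_by_mode(tracks)
print("major:", result[1], "minor:", result[0])
```

A difference in group means is only a starting point; whether it is meaningful would need a significance test and a look at confounders such as genre.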

Credit

Entire credit goes to Spotify for providing this data via their Web API.

https://developer.spotify.com/documentation/web-api/reference/tracks/get-track/
Spotify's premium subscribers 2015-2025
statista.com
abripper.com
Updated Oct 6, 2025
Cite
Statista (2025). Spotify's premium subscribers 2015-2025 [Dataset]. https://www.statista.com/statistics/244995/number-of-paying-spotify-subscribers/
Dataset updated
Oct 6, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
How many paid subscribers does Spotify have? As of the second quarter of 2025, Spotify had 276 million premium subscribers worldwide, up from 246 million in the corresponding quarter of 2024. Spotify’s subscriber base has increased dramatically in the last few years and has more than doubled since early 2019.

Spotify and competitors

Spotify is a music streaming service originally founded in 2006 in Sweden. The platform can be used from various devices and allows users to browse through a catalog of music licensed through multiple record labels, as well as create and share playlists with other users. Additionally, listeners are able to enjoy music for free with advertisements or are also given the option to purchase a subscription to allow for unlimited ad-free music streaming. Spotify’s largest competitors are Pandora, a company that offers a similar service and remains popular in the United States, and Apple Music, which was launched in 2015. While Pandora was once among the highest-grossing music apps in the Apple App Store, recent rankings show that global services like QQ Music, NetEase Cloud Music, and YouTube Music now generate higher monthly revenues.

Users can also register Spotify accounts using Facebook directly through the website using an app. This enables them to connect with other Facebook friends and explore their music tastes and playlists. Spotify is a popular source for keeping up-to-date with music, and the ability to enjoy Spotify anywhere at any time allows consumers to shape their music consumption around their lifestyles and preferences.
🎹 Spotify Tracks Dataset
kaggle.com
Updated Oct 22, 2022
Cite
MaharshiPandya (2022). 🎹 Spotify Tracks Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/4372070
Unique identifier
https://doi.org/10.34740/kaggle/dsv/4372070
Dataset updated
Oct 22, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
MaharshiPandya
License
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
Content

This is a dataset of Spotify tracks over a range of 125 different genres. Each track has some audio features associated with it. The data is in CSV format which is tabular and can be loaded quickly.

Usage

The dataset can be used for:

Building a Recommendation System based on some user input or preference

Classification purposes based on audio features and available genres

Any other application that you can think of. Feel free to discuss!

Column Description

track_id: The Spotify ID for the track

artists: The artists' names who performed the track. If there is more than one artist, they are separated by a ;

album_name: The album name in which the track appears

track_name: Name of the track

popularity: The popularity of a track is a value between 0 and 100, with 100 being the most popular. The popularity is calculated by an algorithm and is based, for the most part, on the total number of plays the track has had and how recent those plays are. Generally speaking, songs that are being played a lot now will have a higher popularity than songs that were played a lot in the past. Duplicate tracks (e.g. the same track from a single and an album) are rated independently. Artist and album popularity is derived mathematically from track popularity.

duration_ms: The track length in milliseconds

explicit: Whether or not the track has explicit lyrics (true = yes it does; false = no it does not OR unknown)

danceability: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable

energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale

key: The key the track is in. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1

loudness: The overall loudness of a track in decibels (dB)

mode: Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0

speechiness: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks

acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic

instrumentalness: Predicts whether a track contains no vocals. "Ooh" and "aah" sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly "vocal". The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content

liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live

valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry)

tempo: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration

time_signature: An estimated time signature. The time signature (meter) is a notational convention to specify how many beats are in each bar (or measure). The time signature ranges from 3 to 7, indicating time signatures of 3/4 to 7/4.

track_genre: The genre in which the track belongs
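Several of the columns above are encoded values that need decoding before they are readable. The helpers below are a minimal sketch of how that might look, following the encodings stated in the column descriptions; the function names and band labels are hypothetical and not part of the dataset.

```python
# Pitch Class notation: 0 = C, 1 = C♯/D♭, 2 = D, and so on up to 11 = B.
PITCH_CLASSES = [
    "C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
    "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B",
]


def key_name(key):
    """Return the note name for a key value, or None if no key was detected (-1)."""
    if not 0 <= key <= 11:
        return None
    return PITCH_CLASSES[key]


def mode_name(mode):
    """Major is represented by 1 and minor by 0."""
    return "major" if mode == 1 else "minor"


def speechiness_band(value):
    """Bucket speechiness using the 0.33 and 0.66 thresholds from the description."""
    if value > 0.66:
        return "spoken word"
    if value >= 0.33:
        return "music and speech"
    return "music"


def split_artists(artists):
    """Multiple artists are separated by a semicolon."""
    return [name.strip() for name in artists.split(";")]


print(key_name(1), mode_name(1), speechiness_band(0.5), split_artists("A;B"))
```

How to treat values exactly at 0.33 or 0.66 is a choice made here for illustration; the description does not say which band they fall into.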

Acknowledgement

Image credits: BPR world
Weekly time spent listening to music worldwide 2022-2023
statista.com
Updated Mar 31, 2025
Cite
Statista Research Department (2025). Weekly time spent listening to music worldwide 2022-2023 [Dataset]. https://www.statista.com/topics/2075/spotify/
Dataset updated
Mar 31, 2025
Dataset provided by
Statista (http://statista.com/)
Authors
Statista Research Department
Description
According to a study from 2023, music listeners worldwide spent 20.7 hours per week on average listening to music. This marked a slight increase from the previous year, when the weekly average time stood at 20.1 hours.
Consumers enjoying music or podcast streaming in the U.S. 2023, by age
statista.com
Updated Mar 31, 2025
Cite
Statista Research Department (2025). Consumers enjoying music or podcast streaming in the U.S. 2023, by age [Dataset]. https://www.statista.com/topics/2075/spotify/
Dataset updated
Mar 31, 2025
Dataset provided by
Statista (http://statista.com/)
Authors
Statista Research Department
Description
According to a 2023 survey, 52 percent of 18 to 24 year-olds and 40 percent of 25 to 34 year-olds in the United States said they enjoyed streaming music and podcast content. The age group which enjoyed the audio content the least was 55 years and older, with 31 percent of U.S. respondents saying they enjoyed listening to online music or podcasts.
spotify-tracks-lite
huggingface.co
Updated May 14, 2024
Cite
Anton Blu (2024). spotify-tracks-lite [Dataset]. https://huggingface.co/datasets/engels/spotify-tracks-lite
Dataset updated
May 14, 2024
Authors
Anton Blu
License
https://choosealicense.com/licenses/bsd/
Description
Context

This dataset consists of 24000 tracks from 30 genres, and is a reduced version of the maharshipandya/spotify-tracks-dataset dataset. All non-heuristic data is cut and cleaned for better usability and performance. All data is taken from the Spotify API and is open source. This dataset can be used to train prediction models based on user preferences, or to categorise tracks by the corresponding heuristic.

Column Description

danceability: Danceability describes how suitable a track is… See the full description on the dataset page: https://huggingface.co/datasets/engels/spotify-tracks-lite.
Eminem Dataset
kaggle.com
Updated Jan 3, 2025
Cite
Chik0di (2025). Eminem Dataset [Dataset]. https://www.kaggle.com/datasets/chik0di/eminem-dataset
Dataset updated
Jan 3, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Chik0di
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Eminem Spotify Dataset 🎶

This dataset features detailed insights into the discography of Eminem, one of the most iconic artists in hip-hop history, scraped directly from Spotify using Python. The data spans Eminem's career, covering the characteristics of his most-streamed tracks and albums.

🎯 Potential Use Cases

Music Analysis: Analyze how audio features like tempo, energy, and danceability vary across Eminem’s albums and see if they correlate with popularity.

Comparative Analysis: Compare Eminem’s musical style to other artists or genres by using similar datasets.

Predictive Modeling: Use the audio features to train a model that predicts a track's popularity based on its characteristics.

Time-Series Analysis: Track the evolution of Eminem’s musical style over time by analyzing changes in audio features across his discography.

With data-driven insights, this dataset is perfect for anyone interested in analyzing Eminem’s musical style, exploring the qualities of popular vs. lesser-known tracks, or even using the data for machine learning models to predict a song's success.

Other Artists' Datasets:

Tyler, The Creator Dataset
MGD: Music Genre Dataset
data.niaid.nih.gov
Updated May 28, 2021
Cite
Mirella M. Moro (2021). MGD: Music Genre Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4778562
Dataset updated
May 28, 2021
Dataset provided by
Mirella M. Moro
Gabriel P. Oliveira
Anisio Lacerda
Danilo B. Seufitelli
Mariana O. Silva
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MGD: Music Genre Dataset

Over recent years, the world has seen a dramatic change in the way people consume music, moving from physical records to streaming services. Since 2017, such services have become the main source of revenue within the global recorded music market. Therefore, this dataset is built by using data from Spotify. It provides a weekly chart of the 200 most streamed songs for each country and territory in which it is present, as well as an aggregated global chart.

Considering that countries behave differently when it comes to musical tastes, we use chart data from global and regional markets from January 2017 to December 2019, considering eight of the top 10 music markets according to IFPI: United States (1st), Japan (2nd), United Kingdom (3rd), Germany (4th), France (5th), Canada (8th), Australia (9th), and Brazil (10th).

We also provide information about the hit songs and artists present in the charts, such as all collaborating artists within a song (since the charts only provide the main ones) and their respective genres, which is the core of this work. MGD also provides data about musical collaboration, as we build collaboration networks based on artist partnerships in hit songs. Therefore, this dataset contains:

Genre Networks: Success-based genre collaboration networks

Genre Mapping: Genre mapping from Spotify genres to super-genres

Artist Networks: Success-based artist collaboration networks

Artists: Some artist data

Hit Songs: Hit Song data and features

Charts: Enhanced data from Spotify Weekly Top 200 Charts

This dataset was originally built for a conference paper at ISMIR 2020. If you make use of the dataset, please also cite the following paper:

Gabriel P. Oliveira, Mariana O. Silva, Danilo B. Seufitelli, Anisio Lacerda, and Mirella M. Moro. Detecting Collaboration Profiles in Success-based Music Genre Networks. In Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR 2020), 2020.

@inproceedings{ismir/OliveiraSSLM20,
  title = {Detecting Collaboration Profiles in Success-based Music Genre Networks},
  author = {Gabriel P. Oliveira and Mariana O. Silva and Danilo B. Seufitelli and Anisio Lacerda and Mirella M. Moro},
  booktitle = {21st International Society for Music Information Retrieval Conference},
  pages = {726--732},
  year = {2020}
}
Spotify AI Playlist Prompt Dataset
spotmod.online
Updated Jul 19, 2025
Cite
Spotmod (2025). Spotify AI Playlist Prompt Dataset [Dataset]. https://spotmod.online/spotify-ai-playlists/
Dataset updated
Jul 19, 2025
Dataset authored and provided by
Spotmod
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A breakdown of how Spotify’s AI playlists are generated using user prompts, NLP, and music metadata.
Spotify's Daily Song Ranking - music released date
kaggle.com
Updated May 6, 2018
Cite
Kmmd (2018). Spotify's Daily Song Ranking - music released date [Dataset]. https://www.kaggle.com/nnqkfdjq/spotifys-daily-song-ranking-music-released-date/code
Dataset updated
May 6, 2018
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Kmmd
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
These are the publication dates of the music videos for every song in

https://www.kaggle.com/edumucelli/spotifys-worldwide-daily-song-ranking

Most of the time, a music video's publication date is the same as the release date of the song itself.

It is therefore reasonable to use these dates as release dates.

No other source covers as many songs as YouTube.

The file contains no header.

20 songs remain NaN (no related video could be found).

This data was retrieved using the YouTube API.
MuSe: The Musical Sentiment Dataset
data.niaid.nih.gov
Updated Mar 1, 2021
Cite
Akiki, Christopher; Burghardt, Manuel (2021). MuSe: The Musical Sentiment Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4281164
Dataset updated
Mar 1, 2021
Dataset provided by
Leipzig University
Authors
Akiki, Christopher; Burghardt, Manuel
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The MuSe (Music Sentiment) dataset contains sentiment information for 90,408 songs. We computed scores for the affective dimensions of valence, dominance and arousal, based on the user-generated tags that are available for each song via Last.fm. In addition, we provide artist and title metadata as well as a Spotify ID and a MusicBrainz ID, which allow researchers to extend the dataset with further metadata, such as genre or year.

Though the tags themselves cannot be included in the dataset, we include a Jupyter notebook in our accompanying GitHub repository that demonstrates how to fetch the tags of a given song from the Last.fm API (Last.fm_API.ipynb).

We further include a Jupyter notebook in the same repository that demonstrates how one might enrich the dataset with audio features from different endpoints of the Spotify API, using the included Spotify IDs (spotify_API.ipynb). Please note that in its current form, the dataset only contains tentative Spotify IDs for a subset (around 68%) of the songs.
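
Since only a subset of songs carries a Spotify ID, any enrichment step first has to separate the rows that have one. The sketch below illustrates that filtering in Python under stated assumptions: the column names (`artist`, `track`, `valence`, `arousal`, `dominance`, `spotify_id`), the sample values, and the helper `split_by_spotify_id` are hypothetical and not taken from the dataset or its notebooks.

```python
import csv
import io

def split_by_spotify_id(rows, id_field="spotify_id"):
    """Return the rows that have a non-empty ID and their share of all rows."""
    with_id = [r for r in rows if (r.get(id_field) or "").strip()]
    share = len(with_id) / len(rows) if rows else 0.0
    return with_id, share

# Made-up sample; column names and values are placeholders, not real MuSe data.
sample = io.StringIO(
    "artist,track,valence,arousal,dominance,spotify_id\n"
    "Artist A,Song A,5.1,4.2,5.0,id_a\n"
    "Artist B,Song B,3.3,6.1,4.4,\n"
    "Artist C,Song C,6.8,3.9,5.7,id_c\n"
)
rows = list(csv.DictReader(sample))
with_id, share = split_by_spotify_id(rows)
print(len(with_id), round(share, 2))
```

Only the rows in `with_id` could then be passed on to a Spotify API lookup; the remaining songs would need to be matched by artist and title instead.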
Most popular music streaming services in the U.S. 2018-2019, by audience
statista.com
Updated May 20, 2025
Statista (2025). Most popular music streaming services in the U.S. 2018-2019, by audience [Dataset]. https://www.statista.com/statistics/798125/most-popular-us-music-streaming-services-ranked-by-audience/
Dataset updated
May 20, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Mar 2018 - Sep 2019
Area covered
United States
Description
As of September 2019, the most successful music streaming service in the United States was Apple Music, with the most up-to-date information showing that 49.5 million users accessed the platform each month. Spotify followed closely, with 47.7 million monthly users.

What is a music streaming service?

Music streaming services provide their users with a database of songs, playlists, albums and videos, where content can be accessed online, downloaded, shared, bookmarked and organized.

The music streaming business is huge, and has sometimes been lauded as the savior of the music industry. The two biggest services are in constant competition for market dominance. Apple Music was launched in 2015, whereas Spotify has been around since 2008. Other popular streaming services include Deezer, SoundCloud and iHeartRadio.

Do artists make a lot of money from streaming services? 

In short, unfortunately not. Both Apple Music and Spotify have been frequently criticized for the tiny royalty payments they offer artists. Particularly for emerging talent, streaming services are far from a lucrative source of income. Bigger, established stars like Taylor Swift are more likely to regularly make a good amount of money this way. But either way, a track needs to go viral or be streamed several million times before it earns any real cash.
Data from: Malware Finances and Operations: a Data-Driven Study of the Value...
zenodo.org
data.niaid.nih.gov
zip
Updated Jun 20, 2023
Juha Nurmi; Mikko Niemelä; Billy Brumley (2023). Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access [Dataset]. http://doi.org/10.5281/zenodo.8047205
Unique identifier
https://doi.org/10.5281/zenodo.8047205
Dataset updated
Jun 20, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Juha Nurmi; Mikko Niemelä; Billy Brumley
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The datasets demonstrate the malware economy and the value chain published in our paper, Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access, at the 12th International Workshop on Cyber Crime (IWCC 2023), part of the ARES Conference, published by the International Conference Proceedings Series of the ACM ICPS.

Using the well-documented scripts, it is straightforward to reproduce our findings. It takes an estimated 1 hour of human time and 3 hours of computing time to duplicate our key findings from MalwareInfectionSet; around one hour with VictimAccessSet; and minutes to replicate the price calculations using AccountAccessSet. See the included README.md files and Python scripts.

We choose to represent each victim by a single JavaScript Object Notation (JSON) data file. Data sources provide sets of victim JSON data files from which we've extracted the essential information and omitted Personally Identifiable Information (PII). We collected, curated, and modelled three datasets, which we publish under the Creative Commons Attribution 4.0 International License.

1. MalwareInfectionSet
We discover (and, to the best of our knowledge, document scientifically for the first time) that malware networks appear to dump their data collections online. We collected these infostealer malware logs available for free. We utilise 245 malware log dumps from 2019 and 2020 originating from 14 malware networks. The dataset contains 1.8 million victim files, with a dataset size of 15 GB.

2. VictimAccessSet
We demonstrate how Infostealer malware networks sell access to infected victims. Genesis Market focuses on user-friendliness and continuous supply of compromised data. Marketplace listings include everything necessary to gain access to the victim's online accounts, including passwords and usernames, but also detailed collection of information which provides a clone of the victim's browser session. Indeed, Genesis Market simplifies the import of compromised victim authentication data into a web browser session. We measure the prices on Genesis Market and how compromised device prices are determined. We crawled the website between April 2019 and May 2022, collecting the web pages offering the resources for sale. The dataset contains 0.5 million victim files, with a dataset size of 3.5 GB.

3. AccountAccessSet
The Database marketplace operates inside the anonymous Tor network. Vendors offer their goods for sale, and customers can purchase them with Bitcoins. The marketplace sells online accounts, such as PayPal and Spotify, as well as private datasets, such as driver's licence photographs and tax forms. We then collect data from Database Market, where vendors sell online credentials, and investigate similarly. To build our dataset, we crawled the website between November 2021 and June 2022, collecting the web pages offering the credentials for sale. The dataset contains 33,896 victim files, with a dataset size of 400 MB.
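
Because each victim is represented by a single JSON data file, working with any of the three datasets starts with reading a directory of JSON files. The following Python sketch shows that pattern only; it is not one of the authors' scripts (see their README.md files for those). The helper `load_records`, the file names, and the fields `id` and `country` are hypothetical placeholders.

```python
import json
import tempfile
from pathlib import Path

def load_records(directory):
    """Load every *.json file in `directory` into a list, sorted by file name."""
    records = []
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            records.append(json.load(fh))
    return records

# Demo with placeholder files; the field names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    for i, country in enumerate(["FI", "SG"]):
        payload = json.dumps({"id": i, "country": country})
        Path(tmp, f"record_{i}.json").write_text(payload, encoding="utf-8")
    records = load_records(tmp)

print(len(records))
```

For datasets with millions of files, yielding records one at a time from a generator instead of building a list would keep memory use bounded.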

Credits

Authors

Billy Bob Brumley (Tampere University, Tampere, Finland)

Juha Nurmi (Tampere University, Tampere, Finland)

Mikko Niemelä (Cyber Intelligence House, Singapore)

Funding

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under project numbers 804476 (SCARE) and 952622 (SPIRS).

Alternative links to download: AccountAccessSet, MalwareInfectionSet, and VictimAccessSet.
Wiki-MID Dataset (LOD + TSV)
figshare.com
zip
Updated May 31, 2023
Giovanni Stilo (2023). Wiki-MID Dataset (LOD + TSV) [Dataset]. http://doi.org/10.6084/m9.figshare.6231326.v2
Unique identifier
https://doi.org/10.6084/m9.figshare.6231326.v2
Dataset updated
May 31, 2023
Dataset provided by
figshare
Authors
Giovanni Stilo
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Wiki-MID Dataset

Wiki-MID is a LOD-compliant multi-domain interests dataset to train and test Recommender Systems. Our English dataset includes an average of 90 multi-domain preferences per user on music, books, movies, celebrities, sport, politics and much more, for about half a million Twitter users traced during six months in 2017. Preferences are either extracted from messages of users who use Spotify, Goodreads and other similar content sharing platforms, or induced from their "topical" friends, i.e., followees representing an interest rather than a social relation between peers. In addition, preferred items are matched with the Wikipedia articles describing them. This unique feature of our dataset provides a means to categorize preferred items, exploiting available semantic resources linked to Wikipedia such as the Wikipedia Category Graph, DBpedia, BabelNet and others.

Data model: Our resource is designed on top of the Semantically-Interlinked Online Communities (SIOC) core ontology. The SIOC ontology favors the inclusion of data mined from social network communities into the Linked Open Data (LOD) cloud. We represent Twitter users as instances of the SIOC UserAccount class. Topical users and message-based user interests are then associated, through the Simple Knowledge Organization System Namespace Document (SKOS) predicate relatedMatch, with a corresponding Wikipedia page as a result of our automated mapping methodology.
