MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset offers a comprehensive glimpse into the evolution of contemporary music, featuring 620 tracks from 87 artists who have dominated the charts between 2000 and 2023. Representing the pulse of modern pop and R&B, this collection captures the diversity and dynamism of the Hot 100 hits over the past two decades. Each track is meticulously annotated with Spotify's audio features, providing a rich, data-driven perspective on the sonic characteristics that have shaped the soundscape of the 21st century. From tempo to energy levels, and from danceability to valence, this dataset is a treasure trove for anyone looking to explore the trends and transformations in popular music.
The data this week comes from Data.World by way of Sean Miller, Billboard.com and Spotify.
Billboard Top 100 - Wikipedia
The Billboard Hot 100 is the music industry standard record chart in the United States for songs, published weekly by Billboard magazine. Chart rankings are based on sales (physical and digital), radio play, and online streaming in the United States.
Billboard Top 100 Article
Drake rewrites the record for the most entries ever on the Billboard Hot 100, as he lands his 208th career title on the latest list, dated March 21
billboard.csv

| variable | class | description |
|---|---|---|
| url | character | Billboard Chart URL |
| week_id | character | Week ID |
| week_position | double | Week position (1 to 100) |
| song | character | Song name |
| performer | character | Performer name |
| song_id | character | Song ID, combo of song/singer |
| instance | double | Instance (used to separate breaks on the chart for a given song; for example, an instance of 6 means this is the sixth time the song has appeared on the chart) |
| previous_week_position | double | Previous week position |
| peak_position | double | Peak position as of that week |
| weeks_on_chart | double | Weeks on chart as of that week |
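
The `instance` column lends itself to splitting a song's chart history into separate runs. The following is a minimal sketch of that idea, assuming the column names from the table above; the sample rows and the `summarize_runs` helper are made up for illustration and are not part of the dataset.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows that only reuse the billboard.csv column names.
SAMPLE = """song_id,instance,week_position,weeks_on_chart
Song AArtist X,1,40,1
Song AArtist X,1,12,2
Song AArtist X,2,55,3
Song BArtist Y,1,3,1
"""


def summarize_runs(csv_text):
    """Return {(song_id, instance): {"weeks": rows seen, "best": lowest position}}."""
    runs = defaultdict(lambda: {"weeks": 0, "best": None})
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The table lists these columns as doubles, so parse via float first.
        key = (row["song_id"], int(float(row["instance"])))
        position = int(float(row["week_position"]))
        run = runs[key]
        run["weeks"] += 1
        run["best"] = position if run["best"] is None else min(run["best"], position)
    return dict(runs)


runs = summarize_runs(SAMPLE)
```

Each key pairs a song with one uninterrupted stay on the chart, so a song that re-enters after a break shows up under a new `instance` value.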
audio_features.csv

| variable | class | description |
|---|---|---|
| song_id | character | Song ID |
| performer | character | Performer name |
| song | character | Song |
| spotify_genre | character | Genre |
| spotify_track_id | character | Track ID |
| spotify_track_preview_url | character | Spotify URL |
| spotify_track_duration_ms | double | Duration in ms |
| spotify_track_explicit | logical | Is explicit |
| spotify_track_album | character | Album name |
| danceability | double | Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable. |
| energy | double | Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy. |
| key | double | The estimated overall key of the track. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1. |
| loudness | double | The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB. |
| mode | double | Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0. |
| speechiness | double | Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks. |
| acousticness | double | A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic. |
| instrumentalness | double | Predicts whether a track contains no vocals. "Ooh" and "aah" sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly "vocal". The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0. |
| liveness | double | Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that t... |
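
The `key`, `mode`, and `speechiness` descriptions above can be turned into readable labels. This is a rough sketch rather than anything shipped with the dataset: the helper names are hypothetical, and the handling of the exact boundary values 0.33 and 0.66 is an assumption, since the description does not say which band they fall in.

```python
# Standard pitch-class names, indexed 0..11 as in the `key` description.
PITCH_CLASSES = [
    "C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
    "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B",
]


def describe_key(key, mode):
    """Combine `key` (-1 or 0..11) and `mode` (1 = major, 0 = minor) into a label."""
    key = int(key)
    if key == -1:
        return "no key detected"
    if not 0 <= key <= 11:
        raise ValueError(f"unexpected key value: {key}")
    quality = "major" if int(mode) == 1 else "minor"
    return f"{PITCH_CLASSES[key]} {quality}"


def speechiness_band(value):
    """Map a speechiness score to the three bands described in the table."""
    if value > 0.66:
        return "probably entirely spoken words"
    if value >= 0.33:  # assumption: boundary values count as the middle band
        return "may contain both music and speech"
    return "most likely music or other non-speech audio"
```

For instance, `describe_key(0, 1)` gives `"C major"` and `speechiness_band(0.5)` falls in the mixed band.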
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MusicOSet is an open and enhanced dataset of musical elements (artists, songs and albums) based on musical popularity classification. It provides a directly accessible collection of data suitable for numerous tasks in music data mining (e.g., data visualization, classification, clustering, similarity search, MIR, HSS and so forth). To create MusicOSet, the potential information sources were divided into three main categories: music popularity sources, metadata sources, and acoustic and lyrical features sources. Data from all three categories were initially collected between January and May 2019, and were then updated and enhanced in June 2019.
The attractive features of MusicOSet include:
| Data | # Records |
|:-----------------:|:---------:|
| Songs | 20,405 |
| Artists | 11,518 |
| Albums | 26,522 |
| Lyrics | 19,664 |
| Acoustic Features | 20,405 |
| Genres | 1,561 |
By Sean Miller [source]
The Billboard Hot 100 Weekly Charts with Audio dataset is a comprehensive collection that combines the historical data of the Billboard Hot 100 weekly singles charts with detailed audio features extracted from Spotify. The dataset provides valuable insights into the popularity and musical attributes of songs that have appeared on the Billboard charts.
The primary dataset, Hot Stuff.csv, includes information about each song's position on the weekly charts. It contains columns such as the Billboard Chart URL, WeekID, Song name, Performer name, unique SongID (concatenation of song and performer), Current week on chart, Instance (indicating breaks in chart appearances), Previous week position, Peak Position (highest chart position reached), and Weeks on Chart.
The second dataset, Hot 100 Audio Features.csv, provides in-depth audio features of each song sourced from Spotify's Web API. This includes various metrics such as danceability (suitability for dancing based on musical elements), energy level (intensity and activity), key (musical key signature), loudness (overall volume level in decibels, dB), mode (major or minor key), speechiness rating (presence of spoken words in songs), acousticness rating (acoustic quality measure), instrumentalness rating (likelihood of a song being instrumental), liveness rating (presence of a live audience during recording/performance), and valence rating (musical positiveness conveyed by a song). Additionally, it provides tempo in BPM and time signature (the rhythm pattern, e.g., 4/4).
Furthermore, this dataset encompasses Spotify-related features: track preview URL for audio samples before full streaming or purchase decisions; total duration measured in milliseconds; explicit content indication; album details for songs; and genre details provided by Spotify.
With this combined data set, researchers can analyze trends and patterns over time regarding how different audio features relate to a song's popularity and performance on the Billboard Hot 100. It offers endless possibilities for studying the influence of specific music attributes on commercial success and understanding the preferences of popular music audiences.
Whether you are interested in exploring genre-based trends, discovering correlations between chart positions and audio features, or investigating how certain attributes contribute to a song's longevity on the charts, this dataset serves as a valuable resource for deep analysis and insights into Billboard Hot 100 songs.
Understanding the Datasets:
- The dataset consists of two files: Hot Stuff.csv and Hot 100 Audio Features.csv.
- The Hot Stuff.csv file contains the weekly Hot 100 singles chart data, including song names, performer names, chart positions, and other relevant information.
- The Hot 100 Audio Features.csv file contains detailed audio features for each song extracted from Spotify, such as danceability, energy, instrumentalness, etc.
- Both files can be merged using common attributes like Performer and Song to get a combined view of both datasets.
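
The merge described in the last bullet can be sketched with the standard library alone. This is an illustration under assumptions: it presumes both files expose `Performer` and `Song` headers spelled exactly that way, the sample rows are invented, and `left_join` is a hypothetical helper rather than part of the dataset.

```python
import csv
import io

# Invented rows standing in for Hot Stuff.csv and Hot 100 Audio Features.csv.
CHART_CSV = """Performer,Song,WeekID,Peak Position
Artist X,Song A,week-1,12
Artist Y,Song B,week-1,3
"""

FEATURES_CSV = """Performer,Song,danceability,energy
Artist X,Song A,0.71,0.64
"""


def left_join(chart_text, features_text, feature_columns=("danceability", "energy")):
    """Attach audio features to every chart row, matching on (Performer, Song)."""
    features = {
        (row["Performer"], row["Song"]): row
        for row in csv.DictReader(io.StringIO(features_text))
    }
    merged = []
    for row in csv.DictReader(io.StringIO(chart_text)):
        match = features.get((row["Performer"], row["Song"]), {})
        combined = dict(row)
        for name in feature_columns:
            combined[name] = match.get(name)  # None when no features were found
        merged.append(combined)
    return merged


merged = left_join(CHART_CSV, FEATURES_CSV)
```

A left join keeps every chart row, which makes songs without audio features easy to spot afterwards because their feature values are `None`.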
Exploring the Hot Stuff.csv File:
- This file provides information about each song's position on that week's Hot 100 singles chart.
- Important columns in this file are:
  - WeekID: The week identifier.
  - Song name: The name of the song.
  - Performer name: The name of the performer or artist.
  - Current week on chart: The song's position on the chart for that week.
  - Instance: Indicates a separate chart run for an already listed song (for example, an instance value of 6 means the song appeared on the chart for the sixth time).
  - Previous week position: The position of the song on the previous week's chart.
  - Peak Position: The highest position the song has reached as of that week.
  - Weeks on Chart: How many weeks the song has spent on the chart as of that week.
Exploring the Hot 100 Audio Features.csv File:
- This file provides detailed audio features for each song extracted from Spotify using the Spotify Web API.
- It contains attributes like danceability, energy, instrumentalness, tempo, etc., which help capture different aspects of the song's musical characteristics.
- Important columns in this file are:
  - Performer: The name of the performer or artist of the song.
  - Song: The name of the song.
  - spotify_genre: The genre(s) of the song according to Spotify....
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
I downloaded this dataset from a UT Austin GitHub repository you can find here
It contains the Billboard Hot 100 charts from 1958 to 2024.
https://creativecommons.org/publicdomain/zero/1.0/
Billboard Hot 100 chart data for about 67 weeks, collected using a Scrapy spider and BeautifulSoup.
To get more data use the following spider: https://github.com/ssen7/billboards-crawler
Requirements: Python3, scrapy, bs4
Each file contains the Billboard Chart Hot 100 songs of that week including the artist's name, previous week's rank, change in rank (default) and peak rank.
Billboard Charts: https://www.billboard.com/charts/hot-100/ scrapy team and BeautifulSoup
To build a tool to gather chart data for pop songs that can crawl data according to user specifications.
A secondary goal was to have a dataset that can track an artist's performance (to the extent a song on the Billboard Hot 100 can) across years.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a large file (~20MB) called EvolutionPopUSA_MainData.csv, in comma-separated data format with column headers. Each row corresponds to a recording. The file is viewable in any text editor, and can also be opened in Excel or imported to other data processing programs. Below is a list of the column headers, with annotations.
- public_id: unique ID of the recording
- artist_name: name of the recording artist
- artist_name_clean: artist name all upper case, no spaces, with secondary artists ("featuring") removed
- track_name: name of the track, i.e. usually name of the song
- first_entry: date of the first entry into the Billboard Hot 100
- quarter, year, fiveyear, decade: transformations of first_entry to coarser time periods
- era: era the track belongs to (1,...,4), as determined by Foote segmentation on the PC data (see below)
- cluster: cluster membership of the track, as derived by k-means clustering on the PC data (see below)
- hTopic_01, ... , hTopic_08: harmonic Topic weights, see description in the paper
- tTopic_01, ... , tTopic_08: timbral Topic weights, see description in the paper
- PC1, ... , PC14: principal components of the harmonic and timbral Topics
- harm_…: 193 columns of chord change counts; the chord change is indicated in the column label (e.g. harm_M.2.M means major chord followed by another major chord 2 semitones up)
- timb_01, ... , timb_35: 35 columns of timbre class counts (see description in supplementary information)
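
With well over two hundred columns, it can help to bucket the headers by their prefixes before analysis. The sketch below assumes the prefixes appear exactly as written in the annotations above; the group names, the `group_columns` helper, and the shortened sample header are illustrative choices, not part of the file.

```python
# Prefixes taken from the column annotations; group names are made up here.
GROUP_PREFIXES = {
    "harmonic_topics": "hTopic_",
    "timbral_topics": "tTopic_",
    "principal_components": "PC",
    "chord_changes": "harm_",
    "timbre_classes": "timb_",
}


def group_columns(header):
    """Bucket column names by prefix; anything unmatched is treated as metadata."""
    groups = {name: [] for name in GROUP_PREFIXES}
    groups["metadata"] = []
    for column in header:
        for name, prefix in GROUP_PREFIXES.items():
            if column.startswith(prefix):
                groups[name].append(column)
                break
        else:
            # No prefix matched, e.g. public_id or first_entry.
            groups["metadata"].append(column)
    return groups


# A shortened, hand-written header for demonstration only.
sample_header = [
    "public_id", "artist_name", "first_entry", "era", "cluster",
    "hTopic_01", "tTopic_08", "PC1", "PC14", "harm_M.2.M", "timb_35",
]
groups = group_columns(sample_header)
```

In practice the header would come from the first row of the CSV, for example via `csv.reader`.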
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
A curated, long-horizon collection of major Billboard music charts — from their beginnings (as far back as publicly accessible) up to the present. Data is refreshed every Wednesday at 02:00 (server time) so you always have up-to-date rankings for analysis, trending, and music data science projects.
Each file is a time series of weekly chart positions:
| File | Chart | Type |
|---|---|---|
| billboard200.csv | Billboard 200 | Albums |
| hot100.csv | Hot 100 | Songs (flagship singles chart) |
| radio.csv | Radio Songs | Airplay-driven ranking |
| streaming_songs.csv | Streaming Songs | Streaming activity |
| digital_songs.csv | Digital Song Sales | Download activity |
All chart files share a consistent schema:
| Column | Description |
|---|---|
| date | Chart week (YYYY-MM-DD; represents the chart issue date) |
| title | Song (or album) title |
| artist | Primary credited artist(s) |
| rank | Current chart position for that week |
| last_week | Position in the previous published week (may be blank if new) |
| peak_pos | Best (lowest number) rank achieved to date |
| weeks_on_chart | Total number of charting weeks up to and including this row |
| image_url | Artwork URL when available (see Notes) |
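
Because `last_week` may be blank for new entries, week-over-week movement needs a small guard. Here is a minimal sketch using the shared schema above; the two sample rows (including the date and the placeholder `image_url` values) are invented, and `movement` is a hypothetical helper.

```python
import csv
import io

# Invented rows that follow the shared chart schema.
SAMPLE = """date,title,artist,rank,last_week,peak_pos,weeks_on_chart,image_url
2024-01-06,Song A,Artist X,5,9,5,3,#
2024-01-06,Song B,Artist Y,20,,20,1,#
"""


def movement(row):
    """Positions gained since last week (positive = moved up), or None if new."""
    if not row["last_week"].strip():
        return None  # blank last_week: no previous position to compare against
    return int(row["last_week"]) - int(row["rank"])


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
moves = {row["title"]: movement(row) for row in rows}
```

Subtracting `rank` from `last_week` makes climbing the chart a positive number, which tends to read more naturally in plots.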
New data is added weekly:
⏰ Every Wednesday at 02:00 (automated scraping + ingestion pipeline).
Missed a week? Older weeks are retained, so you can still build complete time series.
image_url field is set to #.
This dataset is maintained by an automated pipeline available as open-source here:
🔗 Billboard Scraper GitHub Repository
The project includes:
- A weekly Airflow DAG to scrape and upload fresh data
- Backup manual scraping scripts
- Configurable settings
- Code to push updates directly to Kaggle
Feel free to explore it, fork it, or contribute improvements!
If you have suggestions for additional charts or improvements, feel free to reach out or share your ideas in the Kaggle discussion section.
The Billboard Hot 100 is a chart that ranks the best-performing singles of the United States. Its data, published by Billboard magazine and compiled by Nielsen SoundScan, is based collectively on each single's weekly physical and digital sales, as well as airplay and streaming. At the end of a year, Billboard publishes an annual list of the 100 most successful songs of that year on the Hot 100 chart, based on this information.
How the Billboard year-end chart works
These charts are a cumulative measure of a single or album's performance in the United States, based upon the Billboard magazine charts during any given chart year. Other factors including the total weeks a song spent on the chart and at its peak position were calculated into its year-end total.
How the Billboard Hot 100 is determined
The Hot 100 is ranked by radio airplay audience impressions as measured by Nielsen BDS, sales data compiled by Nielsen SoundScan (both at retail and digitally) and streaming activity provided by online music sources. There are several component charts that contribute to the overall calculation of the Hot 100.
The Billboard Global 200 is a weekly record chart published by Billboard magazine. The chart ranks the top songs globally and is based on digital sales and online streaming from over 200 territories worldwide.
Stories about the Billboard 200 albums chart generally post on Sunday afternoons, while stories about the Billboard Hot 100 generally post each Monday afternoon. Other stories, podcasts, videos and more covering our full menu of charts post throughout the week.
In the US, Billboard represents the cream of all the objective data. And their efforts to collect all the data from all these various sources to create an objective, final tally of each artist's popularity in a given week, still has merit. ... This is why the billboard chart is important.
Here we have collected the list of Billboard Hot 100 singles from 1992 to 2014.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Combines two data tables and a PDF with extensive information on which artists and tags strongly feature the T-Topics and H-Topics from the paper.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Music is a volatile industry, where its dynamic nature can directly influence artist career behavior. That is, musical careers can suffer ups and downs depending on the current market moment. This dataset provides data about hot streak periods in musical careers, which are defined by high-impact bursts occurring in sequence.
Success in the music industry has a temporal structure, as the audience tastes change over time. Here, we use the Billboard Hot 100 charts with Spotify data to represent success over time. For musical careers, we build their time series from the debut date (i.e., date of the first release obtained from Spotify) to the last chart collected. Thus, each point in the time series represents the success of such an artist in a given week, according to the Hot 100 chart.
Therefore, we present MUHSIC (Music-oriented Hot Streak Information Collection), which contains:
- Charts: enhanced data on all weekly Hot 100 Charts
- Artists: artist success time series with hot streak information
- Genres: genre success time series with hot streak information (a genre is the aggregate of all its artists)
- Hot Streaks: summarized hot streak information
Drake reigns supreme as the best-selling artist of all time worldwide, with an impressive 298.5 million certified units sold. This Canadian rapper has dominated the music industry, surpassing legendary acts like The Beatles and Elvis Presley. Drake's success reflects the changing landscape of popular music, with hip-hop and contemporary R&B artists now occupying top spots alongside rock and pop icons.
Hip-hop's growing influence
The rise of hip-hop is evident in the list of best-selling artists, with Drake, Eminem, and Kanye West all ranking in the top 10. This trend is further supported by recent Billboard chart data, which shows Drake as the top songwriter from 2012 to 2023, with 52 songs in the Billboard Top 100. The rapper's dominance extends to his performance as a solo artist, where he also leads with 52 songs in the Top 100 during the same period.
Diversity and representation in music
While male artists still dominate the best-selling list, female artists like Rihanna, Beyoncé, and Taylor Swift have secured high positions. This aligns with recent trends showing increased representation of women in popular music. A study found that 35 percent of artists featuring on songs in the top 100 charts between 2012 and 2023 were women, up from 30 percent in the previous year. The industry's evolving landscape is further exemplified by the 2024 Grammy nominations, where artists like Taylor Swift, Billie Eilish, and SZA received multiple nods, highlighting the growing recognition of diverse talent.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The Billboard Hot 100 is the music industry standard record chart in the United States for songs, published weekly by Billboard magazine. Chart rankings are based on sales, radio play, and online streaming in the United States.
Every week, Billboard releases "The Hot 100" chart of songs that were trending on sales and airplay for that week. This dataset is a collection of all "The Hot 100" charts released since its inception in 1958.
Image credits: Photo by Stas Knop from Pexels
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We also provide this additional file with genre tags for every song, which we used for validation. Here, too, the rows correspond to recordings. The data sets can be joined via the recording_id field.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comusic is an ongoing project that seeks to study the impact of collaboration networks' topological features on musical success. To that end, we analyze and identify such characterizations in a musical success-based network; that is, a network composed only of successful artists. Our findings offer a new perspective on success in the music industry, unraveling how collaboration profiles can contribute to an artist's popularity.
Our Methodology
Initially, using data from Billboard and the Spotify platform, we model a "successful" collaborative network and apply tools of network science to study its structure. By means of topological metrics, we defined four categories of collaboration profiles and, applying a clustering algorithm, we identified three communities with different collaboration patterns and notable discrepancies in musical success levels. Then, we conduct a statistical correlation analysis to evaluate the correlation between collaboration profiles and the artist's success.
Our Findings
By detecting clusters and their respective patterns of network collaboration, we focus on analyzing the impact of these profiles on successful musical artists. Considering topological metrics, we define four main categories of collaboration profiles: Interaction, Distance, Influence and Similarity. Among them, we find that the first three affect musical success more intensely than Similarity.
Our Contributions
Our findings provide evidence that:
there are indeed distinct success factors for music collaboration profiles that are socially measurable, and
there exist common factors to successful collaboration in the music market.
Furthermore, our exploratory approach based on collaborative networks can easily be extended to other areas of knowledge (e.g., arts and science).
Files
- Successful Network: The successful musical collaboration network. (8.88 MB)
- Billboard Charts: Some Billboard Charts data. (3.04 MB)
- Ego Networks: All the 30 ego networks. (38 KB)
- Time Series: All the time series. (807 KB)
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a csv that contains the Billboard top 100 year-end songs in the US. Not every year (particularly the earlier years) had 100 top songs. Some years include ties. Data goes from 1946-2021. Songs can appear in multiple years. Source: Wikipedia.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 5,000 songs, blending real-world data from Spotify charts with synthetic entries to create a diverse mix of genres, artists, and attributes. Designed for machine learning models, it helps predict a song’s Peak Chart Position based on key musical and popularity metrics.
| Feature | Description |
|---|---|
| Song | Title of the track |
| Artist | Name of the performer/band |
| Streams | Total number of streams (lifetime) |
| Daily Streams | Streams per day |
| Genre | Music genre (Pop, Hip-Hop, Rock, etc.) |
| Release Year | Year the song was released |
| Peak Position | Highest Billboard/Spotify chart rank achieved |
| Weeks on Chart | Total weeks spent on the chart |
| Lyrics Sentiment | Sentiment analysis of lyrics (-1 to +1) |
| TikTok Virality | Popularity score based on TikTok trends (0-100) |
| Danceability | How danceable the song is (0-1) |
| Acousticness | Level of acoustic elements (0-1) |
| Energy | Overall energy level of the song (0-1) |
🔗 Optimized for machine learning & data visualization! 🚀
The dataset is first introduced in the following paper: Siqi Wu, Marian-Andrei Rizoiu, and Lexing Xie. Beyond Views: Measuring and Predicting Engagement in Online Videos. In AAAI International Conference on Weblogs and Social Media (ICWSM), 2018.
Tweeted videos dataset
This dataset contains YouTube videos published between July 1st and August 31st, 2016. To be collected, a video needs to (a) be mentioned on Twitter during the aforementioned collection period; (b) have insight statistics available; (c) have at least 100 views within the first 30 days after upload.
Quality videos datasets
These datasets contain videos deemed of high quality by domain experts.
- Vevo videos: Videos of verified Vevo artists, as of August 31st, 2016.
- Billboard16 videos: Videos of 2016 Billboard Hot 100 chart.
- Top news videos: Videos of top 100 most viewed News channels.
freebase_mid_type_name.csv
It maps a freebase mid to a real-world entity. See more details in this data description.
Attribution-NonCommercial-NoDerivs 3.0 (CC BY-NC-ND 3.0): https://creativecommons.org/licenses/by-nc-nd/3.0/
License information was derived automatically
This dataset contains Shazam query timings ('offsets') and query dates corresponding to 20 hit songs from the Billboard Year End Hot 100 2015 chart. Queries were aggregated from 1 January 2014 to 31 May 2016, inclusive. The number of queries per song ranges from 3,020,785 to 19,974,795, with a total of 188,271,243 queries across the 20 songs. Data are stored in .csv files (one file per song) ranging in size from 62.9MB to 416.1MB. The total size of the dataset is around 4GB.
According to a study on representation and equality in the music industry, only *** percent of producers were female while approximately **** percent were male. The share of female music producers has been increasing since 2017, despite the setback in 2020 and still leaving a significant gap in terms of proportionate representation.
Gender inequality in the music industry
Even though music audiences are as diverse as ever, and recent data has also indicated that male and female listeners account for similar shares of digital music users in the United States, there are still significant gaps when it comes to the representation of different groups. The share of female songwriters across the top 100 songs in 2020 stood at below ** percent - a figure that has pretty much remained unchanged in the past decade. But this disparity not only unfolds behind the scenes: In 2020, just over ** percent of artists on Billboard’s top 100 charts were female, and in genres like hip-hop or alternative, this share was even lower.
Grammy Awards
The fact that the music industry remains a male-dominated landscape is also reflected in the Grammy Awards. While the show made headlines by merging male and female categories back in 2012, the imbalances have remained. Data on the gender distribution of Grammy nominees collected between 2013 and 2021 shows that less than ** percent of nominees for awards like Record of the Year, Album of the Year, and Producer of the Year were female. And even though the playing field was much more balanced in the Best New Artist category, many artists still fail to get the spotlight they deserve.