This is an analysis of the data on Spotify tracks from 1921-2020 with Jupyter Notebook and Python Data Science tools.
The Spotify dataset (titled data.csv) consists of 160,000+ tracks released between 1921 and 2020, sorted by name, that were available on Spotify as of June 2020. Collected by Kaggle user and Turkish data scientist Yamaç Eren Ay, the data was retrieved and tabulated from the Spotify Web API. Each row in the dataset corresponds to a track, with variables such as the title, artist, and year located in their respective columns. Aside from these fundamental variables, musical attributes of each track, such as tempo, danceability, and key, were likewise extracted; these values are generated by Spotify's own algorithm from a range of technical parameters.
Spotify Data.ipynb is the main notebook where the data is imported for EDA and FII.
data.csv is the dataset downloaded from Kaggle.
spotify_eda.html is the HTML file for the comprehensive EDA done using the Pandas Profiling module.
Credits to gabminamedez for the original dataset.
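A hedged sketch of the workflow described above: load data.csv with pandas and generate the profiling report. The ydata-profiling package (the current distribution of the Pandas Profiling module) and the column names used below are assumptions about the Kaggle file.

```python
# Hedged sketch of the EDA setup described above; the file location, column names,
# and the ydata-profiling package are assumptions, not the author's exact code.
import pandas as pd
from ydata_profiling import ProfileReport

tracks = pd.read_csv("data.csv")                      # ~160,000 Spotify tracks, 1921-2020
print(tracks.shape)
print(tracks[["name", "artists", "year"]].head())     # assumed column names

profile = ProfileReport(tracks, title="Spotify Tracks EDA", minimal=True)
profile.to_file("spotify_eda.html")                   # comparable to the report mentioned above
```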
https://choosealicense.com/licenses/bsd/
Content
This is a dataset of Spotify tracks spanning 125 different genres. Each track has audio features associated with it. The data is in tabular CSV format and can be loaded quickly.
Usage
The dataset can be used for:
Building a Recommendation System based on some user input or preference
Classification purposes based on audio features and available genres
Any other application that you can think of. Feel free to discuss!
Column… See the full description on the dataset page: https://huggingface.co/datasets/maharshipandya/spotify-tracks-dataset.
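As a quick way to explore the columns mentioned above, the dataset can be pulled with the Hugging Face datasets library; a minimal sketch, assuming a 'train' split:

```python
from datasets import load_dataset

# Load the Spotify tracks dataset from the Hugging Face Hub ('train' split assumed).
ds = load_dataset("maharshipandya/spotify-tracks-dataset", split="train")
print(ds.column_names)   # genre and audio-feature columns
print(ds[0])             # one example track
```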
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides a list of lyrics from 1950 to 2019, along with music metadata such as sadness, danceability, loudness, acousticness, etc. We also provide some information, such as the lyrics, which can be used for natural language processing.
The audio data was scraped using the Echo Nest® API integrated engine with the spotipy Python package. The spotipy API permits the user to search for specific genres, artists, songs, release dates, etc. To obtain the lyrics, we used the Lyrics Genius® API as the base URL for requesting data based on the song title and artist name.
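A hedged sketch of the two collection steps described above, using the spotipy and lyricsgenius packages; the credentials, query strings, and song title are placeholders, and this illustrates the general approach rather than the authors' exact pipeline.

```python
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import lyricsgenius

# Search Spotify for tracks by genre and release period (placeholder credentials).
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
    client_id="YOUR_SPOTIFY_CLIENT_ID", client_secret="YOUR_SPOTIFY_CLIENT_SECRET"))
results = sp.search(q="genre:pop year:1950-2019", type="track", limit=5)
for item in results["tracks"]["items"]:
    print(item["name"], "-", item["artists"][0]["name"])

# Request lyrics from Genius based on song title and artist name (placeholder values).
genius = lyricsgenius.Genius("YOUR_GENIUS_API_TOKEN")
song = genius.search_song("Some Song Title", "Some Artist")
print(song.lyrics[:200] if song else "lyrics not found")
```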
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MusicOSet is an open and enhanced dataset of musical elements (artists, songs and albums) based on musical popularity classification. It provides a directly accessible collection of data suitable for numerous tasks in music data mining (e.g., data visualization, classification, clustering, similarity search, MIR, HSS and so forth). To create MusicOSet, the potential information sources were divided into three main categories: music popularity sources, metadata sources, and acoustic and lyrical features sources. Data from all three categories were initially collected between January and May 2019; the data was then updated and enhanced in June 2019.
The attractive features of MusicOSet include:
| Data | # Records |
|:-----------------:|:---------:|
| Songs | 20,405 |
| Artists | 11,518 |
| Albums | 26,522 |
| Lyrics | 19,664 |
| Acoustic Features | 20,405 |
| Genres | 1,561 |
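A hedged sketch of joining two of the MusicOSet tables listed above with pandas; the file names, the tab separator, and the 'song_id' join key are all assumptions about the distribution and may need adjusting.

```python
import pandas as pd

# Assumed file names and separator for the MusicOSet distribution.
songs = pd.read_csv("songs.csv", sep="\t")
features = pd.read_csv("acoustic_features.csv", sep="\t")

# Join song metadata with acoustic features on an assumed 'song_id' key.
merged = songs.merge(features, on="song_id", how="inner")
print(merged.shape)   # should be close to the 20,405 songs reported above
```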
This dataset was created by Noor Saeed
https://crawlfeeds.com/privacy_policy
This curated dataset contains detailed music reviews scraped from Metacritic, covering hundreds of albums across genres. Delivered in CSV format, it includes critic scores, review excerpts, album names, artists, release years, and publication sources. Ideal for machine learning, sentiment analysis, data journalism, and research in music trends or media bias.
Whether you're building a recommender system, training a model on music-related sentiment, or visualizing the evolution of critical reception in music — this dataset gives you a powerful head start.
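A hedged first pass over the review CSV with pandas; the file name and the 'year' and 'score' column names are assumptions based on the fields listed above.

```python
import pandas as pd

reviews = pd.read_csv("metacritic_music_reviews.csv")   # assumed file name
print(reviews.columns.tolist())

# Average critic score per release year, as a simple view of critical reception over time.
by_year = reviews.groupby("year")["score"].mean().sort_index()   # assumed column names
print(by_year.tail(10))
```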
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This distribution includes details of the SLAC multimodal music dataset as well as features extracted from it. This dataset is intended to facilitate research comparing the relative musical influence of four different musical modalities: symbolic, lyrical, audio and cultural. SLAC was assembled by independently collecting, for each of its component musical pieces, a symbolic MIDI encoding, a lyrical text transcription, an audio MP3 recording and cultural information mined from the internet. It is important to emphasize the independence of how each of these components was collected; for example, the MIDI and MP3 encodings of each piece were collected entirely separately, and neither was generated from the other.
Features have been extracted from each of the musical pieces in SLAC using the jMIR (http://jmir.sourceforge.net) feature extractor corresponding to each of the modalities: jSymbolic for symbolic, jLyrics for lyrics, jAudio for audio and jWebMiner2 for mining cultural data from search engines and Last.fm (https://www.last.fm).
SLAC is quite small, consisting of only 250 pieces, due to the difficulty of independently finding matching information in all four modalities. Although this small size does impose certain constraints, SLAC is nonetheless the largest (and only) known dataset that includes all four independently collected modalities.
The dataset is divided into ten genres, with 25 pieces belonging to each genre: Modern Blues, Traditional Blues, Baroque, Romantic, Bop, Swing, Hardcore Rap, Pop Rap, Alternative Rock and Metal. These can be collapsed into a 5-genre taxonomy, with 50 pieces per genre: Blues, Classical, Jazz, Rap and Rock. This facilitates experiments with both coarser and finer classes.
SLAC was published at the ISMIR 2010 conference, and was itself an expansion of the SAC dataset (published at the ISMIR 2008 conference), which is identical except that it excludes the lyrics and lyrical features found in SLAC. Both ISMIR papers are included in this distribution.
Due to copyright limitations, this distribution does not include the actual music or lyrics of the pieces comprising SLAC. It does, however, include details of the contents of the dataset as well as features extracted from each of its modalities using the jMIR software. These include the original features extracted for the 2010 ISMIR paper, as well as an updated set of symbolic features extracted in 2021 using the newer jSymbolic 2.2 feature extractor (published at ISMIR 2018). These jSymbolic 2.2 features include both the full MIDI feature set and a “conservative” feature set meant to limit potential biases due to encoding practice. Feature values are distributed as CSV files, Weka ARFF (https://www.cs.waikato.ac.nz/ml/weka/) files and ACE XML (http://jmir.sourceforge.net) files.
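A hedged sketch of a 10-genre classification baseline on one of the distributed feature CSVs with scikit-learn; the file name, the presence of a 'genre' label column, and an all-numeric feature matrix are assumptions about the CSV layout.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("slac_jsymbolic_features.csv")            # assumed file name
X = df.drop(columns=["genre"]).select_dtypes("number")     # assumed label column; keep numeric features
y = df["genre"]

clf = RandomForestClassifier(n_estimators=300, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"5-fold accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```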
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Jeremy Ray Acevedo
Released under CC0: Public Domain
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset for: Leipold, B. & Loepthien, T. (2021). Attentive and emotional listening to music: The role of positive and negative affect. Jahrbuch Musikpsychologie, 30. https://doi.org/10.5964/jbdgm.78
In a cross-sectional study, associations of global affect with two ways of listening to music – attentive-analytical listening (AL) and emotional listening (EL) – were examined. More specifically, the degrees to which AL and EL are differentially correlated with positive and negative affect were examined. In Study 1, a sample of 1,291 individuals responded to questionnaires on listening to music, positive affect (PA), and negative affect (NA). We used the PANAS, which measures PA and NA as high-arousal dimensions. AL was positively correlated with PA, EL with NA. Moderation analyses showed stronger associations between PA and AL when NA was low. Study 2 (499 participants) differentiated between three facets of affect and focused, in addition to PA and NA, on the role of relaxation. Similar to the findings of Study 1, AL was correlated with PA, EL with NA and PA. Moderation analyses indicated that the degree to which PA is associated with an individual's tendency to listen to music attentively depends on their degree of relaxation. In addition, the correlation between pleasant activation and EL was stronger for individuals who were more relaxed; for individuals who were less relaxed, the correlation between unpleasant activation and EL was stronger. In sum, the results demonstrate not only simple bivariate correlations, but also that the expected associations vary depending on the different affective states. We argue that the results reflect a dual function of listening to music, which includes emotional regulation and information processing. (Dataset for Study 1.)
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
This upload contains the supplementary material for our paper presented at the MMM2024 conference.
The dataset contains rich text descriptions for music audio files collected from Wikipedia articles.
The audio files are freely accessible and available for download through the URLs provided in the dataset.
A few hand-picked, simplified examples of the dataset.
| file | aspects | sentences |
|:---|:---|:---|
| | ['bongoes', 'percussion instrument', 'cumbia', 'drums'] | ['a loop of bongoes playing a cumbia beat at 99 bpm'] |
| 🔈 Example of double tracking in a pop-rock song (3 guitar tracks).ogg | ['bass', 'rock', 'guitar music', 'guitar', 'pop', 'drums'] | ['a pop-rock song'] |
| | ['jazz standard', 'instrumental', 'jazz music', 'jazz'] | ['Considered to be a jazz standard', 'is an jazz composition'] |
| | ['chirping birds', 'ambient percussion', 'new-age', 'flute', 'recorder', 'single instrument', 'woodwind'] | ['features a single instrument with delayed echo, as well as ambient percussion and chirping birds', 'a new-age composition for recorder'] |
| | ['instrumental', 'brass band'] | ['an instrumental brass band performance'] |
| ... | ... | ... |
We provide three variants of the dataset in the data folder. All are described in the paper.
all.csv contains all the data we collected, without any filtering.
filtered_sf.csv contains the data obtained using the self-filtering method.
filtered_mc.csv contains the data obtained using the MusicCaps dataset method.
Each CSV file contains the following columns:
file: the name of the audio file
pageid: the ID of the Wikipedia article where the text was collected from
aspects: the short-form (tag) description texts collected from the Wikipedia articles
sentences: the long-form (caption) description texts collected from the Wikipedia articles
audio_url: the URL of the audio file
url: the URL of the Wikipedia article where the text was collected from
If you use this dataset in your research, please cite the following paper:
@inproceedings{wikimute,
title = {WikiMuTe: {A} Web-Sourced Dataset of Semantic Descriptions for Music Audio},
author = {Weck, Benno and Kirchhoff, Holger and Grosche, Peter and Serra, Xavier},
booktitle = "MultiMedia Modeling",
year = "2024",
publisher = "Springer Nature Switzerland",
address = "Cham",
pages = "42--56",
doi = {10.1007/978-3-031-56435-2_4},
url = {https://doi.org/10.1007/978-3-031-56435-2_4},
}
The data is available under the Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license.
Each entry in the dataset contains a URL linking to the article, where the text data was collected from.
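A hedged sketch of reading one of the CSV variants and parsing the list-valued columns; it assumes the 'aspects' and 'sentences' fields are stored as Python-style list strings, as in the examples shown above.

```python
import ast
import pandas as pd

df = pd.read_csv("data/filtered_sf.csv")
for col in ("aspects", "sentences"):
    df[col] = df[col].apply(ast.literal_eval)   # assumes list-like string encoding

row = df.iloc[0]
print(row["file"], row["audio_url"])
print("tags:", row["aspects"])
print("captions:", row["sentences"])
```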
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Music emotion recognition delineates and categorises the spectrum of emotions expressed within musical compositions by conducting a comprehensive analysis of fundamental attributes, including melody, rhythm, and timbre. This task is pivotal for the tailoring of music recommendations, the enhancement of music production, the facilitation of psychotherapeutic interventions, and the execution of market analyses, among other applications. The cornerstone is the establishment of a music emotion recognition dataset annotated with reliable emotional labels, furnishing machine learning algorithms with essential training and validation tools, thereby underpinning the precision and dependability of emotion detection. The Music Emotion Dataset with 2496 Songs (Memo2496), comprising 2496 instrumental musical pieces annotated with valence-arousal (VA) labels and acoustic features, is introduced to advance music emotion recognition and affective computing. The dataset is meticulously annotated by 30 music experts proficient in music theory and devoid of cognitive impairments, ensuring an unbiased perspective. The annotation methodology and experimental paradigm are grounded in previously validated studies, guaranteeing the integrity and high calibre of the data annotations.
Memo2496 R1, updated by Qilin Li @ 12 Feb 2025:
1. Removed some unannotated music raw data; the music contained in the MusicRawData.zip file is now all annotated music.
2. The ‘Music Raw Data.zip’ file on FigShare has been updated to contain 2496 songs, consistent with the corpus described in the manuscript. The metadata fields “Title”, “Contributing Artists”, “Genre”, and/or “Album” have been removed to ensure the songs remain anonymous.
3. Adjusted the file structure; the files on FigShare are now placed in folders named ‘Music Raw Data’, ‘Annotations’, ‘Features’, and ‘Data Processing Utilities’ to reflect the format of the Data Records section in the manuscript.
Memo2496 R2, updated by Qilin Li @ 14 Feb 2025:
The source of each song's download platform has been added in ‘songs_info_all.csv’ to enable users to search within the platform itself if necessary. This approach aims to balance the privacy requirements of the data with the potential needs of the dataset's users.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This upload includes the ESSENTIA analysis output of (a subset of) song snippets from the Million Song Dataset, namely those included in the Taste Profile subset. The audio snippets were collected from 7digital.com and were subsequently analyzed with ESSENTIA 2.1-beta3. Pre-trained SVM models provided by the ESSENTIA authors on their website were applied.
The file msd_song_jsons.rar contains the ESSENTIA analysis output after applying the SVM models for high-level feature extraction. Please note that these are 204,317 files.
The file msd_played_songs_essentia.csv.gz contains all one-dimensional real-valued fields of the JSON files merged into one CSV file with 204,317 rows.
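A minimal sketch of loading the merged high-level feature table described above; no specific column names are assumed beyond the 204,317-row CSV.

```python
import pandas as pd

essentia = pd.read_csv("msd_played_songs_essentia.csv.gz", compression="gzip")
print(essentia.shape)                     # expected: 204,317 rows
print(essentia.columns[:20].tolist())     # inspect the flattened JSON field names
```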
The full procedure and subsequent analysis are described in
Fricke, K. R., Greenberg, D. M., Rentfrow, P. J., & Herzberg, P. Y. (2019). Measuring musical preferences from listening behavior: Data from one million people and 200,000 songs. Psychology of Music, 0305735619868280.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
FANTASIA
This repository contains the data related to image descriptors and sound associated with a selection of frames of the films Fantasia and Fantasia 2000, produced by Disney.
About
This repository contains the data used in the article "Automatic composition of descriptive music: A case study of the relationship between image and sound", published in the 6th International Workshop on Computational Creativity, Concept Invention, and General Intelligence (C3GI). The data structure is explained in detail in the article.
Abstract
Human beings establish relationships with the environment mainly through sight and hearing. This work focuses on the concept of descriptive music, which makes use of sound resources to narrate a story. The Fantasia film, produced by Walt Disney, was used in the case study. One of its musical pieces is analyzed in order to obtain the relationship between image and music. This connection is subsequently used to create a descriptive musical composition from a new video. Naive Bayes, Support Vector Machine and Random Forest are the three classifiers studied for the model induction process. After an analysis of their performance, it was concluded that Random Forest provided the best solution; the produced musical composition had a considerably high descriptive quality.
Data
Nutcracker_data.arff: Image descriptors and the most important sound of each frame from the fragment "The Nutcracker Suite" in the film Fantasia. Data stored in ARFF format.
Firebird_data.arff: Image descriptors of each frame from the fragment "The Firebird" in the film Fantasia 2000. Data stored in ARFF format.
Firebird_midi_prediction.csv: Frame number of the fragment "The Firebird" in the film Fantasia 2000 and the sound predicted by the system, encoded in MIDI. Data stored in CSV format.
Firebird_prediction.mp3: Audio file with the synthesis of the prediction data for the fragment "The Firebird" of the film Fantasia 2000.
License
Data is available under the MIT License. To make use of the data, the article must be cited.
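A hedged sketch of the classification setup described in the abstract: load the ARFF training data with SciPy and fit a Random Forest with scikit-learn. The name of the class attribute ('sound') is an assumption; check the ARFF header for the actual attribute names.

```python
import pandas as pd
from scipy.io import arff
from sklearn.ensemble import RandomForestClassifier

data, meta = arff.loadarff("Nutcracker_data.arff")
df = pd.DataFrame(data)

X = df.drop(columns=["sound"]).select_dtypes("number")   # image descriptors per frame
y = df["sound"].str.decode("utf-8")                      # assumed class attribute; nominal ARFF values load as bytes

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(clf.score(X, y))   # training accuracy only; use cross-validation for a fair estimate
```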
Children's Song Dataset is an open-source dataset for singing voice research. It contains 50 Korean and 50 English songs sung by one Korean female professional pop singer. Each song is recorded in two separate keys, resulting in a total of 200 audio recordings. Each audio recording is paired with a MIDI transcription and lyric annotations at both the grapheme and phoneme level.
Dataset Structure
The data is split into Korean and English, and each language is split into 'wav', 'mid', 'lyric', 'txt' and 'csv' folders. Each song has an identical file name in each format. Each format contains the following information. Additional information, such as the original song name, tempo, and time signature for each song, can be found in 'metadata.json'.
Vocal Recording
While recording vocals, the singer sang along with the background music tracks. She deliberately rendered the singing in a “plain” style, refraining from expressive singing techniques. The recording took place in a dedicated soundproof room. The singer recorded three to four takes for each song, and the best parts were combined into a single audio track. The two versions of a song recorded in different keys are distinguished by the characters 'a' and 'b' at the end of the filename.
MIDI Transcription
The MIDI data consists of monophonic notes. Each note contains onset and offset times that were manually fine-tuned along with the corresponding syllable. MIDI notes do not include any expression data or control change messages because those parameters can be ambiguous to define for singing voice. The singing voice is a highly expressive sound, and it is hard to define precise onset timings and pitches. We assumed that one syllable matches one MIDI note and established the following criteria to represent various expressions in the singing voice.
Lyric Annotation
Text files in the 'lyric' folder contain the raw text for the corresponding audio, and the 'txt' folder contains the phoneme-level lyric representation. The phoneme-level lyric representation is annotated in a special text format: phonemes in a syllable are tied with an underbar ('_') and syllables are separated with a space (' '). Each phoneme is annotated based on the International Phonetic Alphabet (IPA), and romanized symbols are used to annotate the IPA symbols. You can find the romanized IPA symbols and more detailed information in this repository.
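A minimal sketch of parsing the phoneme-level lyric format described above, where syllables are separated by spaces and phonemes within a syllable are tied with an underbar; the example line is invented for illustration.

```python
def parse_phoneme_line(line: str) -> list[list[str]]:
    """Split a phoneme-level lyric line into syllables, each a list of phoneme symbols."""
    return [syllable.split("_") for syllable in line.strip().split(" ")]

# Invented example line (not taken from the dataset):
print(parse_phoneme_line("k_a n_a_n"))   # -> [['k', 'a'], ['n', 'a', 'n']]
```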
License
This dataset was created by the KAIST Music and Audio Computing Lab under the Industrial Technology Innovation Program (No. 10080667, Development of conversational speech synthesis technology to express emotion and personality of robots through sound source diversification) supported by the Ministry of Trade, Industry & Energy (MOTIE, Korea).
CSD is released under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0). It is provided primarily for research purposes, and its use for commercial purposes is prohibited. When sharing results based on CSD, any act that defames the original singer is strictly prohibited.
For more details, we refer to the following publication. We would highly appreciate it if publications partly based on CSD cited the following publication:
Choi, S., Kim, W., Park, S., Yong, S., & Nam, J. (2020). Children's Song Dataset for Singing Voice Research. 21st International Society for Music Information Retrieval Conference (ISMIR).
We are interested in knowing if you find...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the ground truth data used to evaluate the musical pitch, tempo and key estimation algorithms developed during the AudioCommons H2020 EU project and which are part of the Audio Commons Audio Extractor tool. It also includes ground truth information for the single-eventness audio descriptor also developed for the same tool.
This ground truth data has been used to generate the following documents:
All these documents are available in the materials section of the AudioCommons website.
All ground truth data in this repository is provided in the form of CSV files. Each CSV file corresponds to one of the individual datasets used in one or more evaluation tasks of the aforementioned deliverables. This repository does not include the audio files of each individual dataset, but includes references to the audio files. The following paragraphs describe the structure of the CSV files and give some notes about how to obtain the audio files in case these would be needed.
Structure of the CSV files
All CSV files in this repository (with the sole exception of SINGLE EVENT - Ground Truth.csv) feature the following 5 columns:
The remaining CSV file, SINGLE EVENT - Ground Truth.csv, has only the following 2 columns:
How to get the audio data
In this section we provide some notes about how to obtain the audio files corresponding to the ground truth annotations provided here. Note that due to licensing restrictions we are not allowed to re-distribute the audio data corresponding to most of these ground truth annotations.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
30 random song files from a single day of recording for each of 36 adult male zebra finches. Each bird's experimental conditions and song tutors can be found in Bird_list.csv. Each bird's .zip contains 30 .wav files of song, 30 .not.mat files with song annotations generated using the evsonganaly GUI in MATLAB, and a .csv file called '[Bird_ID]_syll_df_evsonganaly.csv' with all annotations from all files in the folder. Each row in this .csv represents a single syllable. The 'files' column has the name of the .wav containing that syllable, the 'onsets' and 'offsets' columns have the timestamps of the syllable onset and offset within the .wav file in milliseconds, and the 'labels' column contains the manual annotation of that syllable type. Syllables labeled 'i' represent introductory notes. Otherwise, the labeling characters are arbitrary.
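A sketch of working with one bird's syllable table as described above; the Bird_ID in the file name is hypothetical.

```python
import pandas as pd

sylls = pd.read_csv("Bird01_syll_df_evsonganaly.csv")      # hypothetical Bird_ID
sylls["duration_ms"] = sylls["offsets"] - sylls["onsets"]  # onsets/offsets are in milliseconds

print(sylls["labels"].value_counts())                      # syllable-type inventory
print(sylls.loc[sylls["labels"] != "i", "duration_ms"].describe())  # exclude introductory notes
```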
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is built using data from Spotify. It provides a daily chart of the 200 most streamed songs for each country and territory in which Spotify is present, as well as an aggregated global chart.
Considering that countries behave differently when it comes to musical tastes, we use chart data from global and regional markets from January 2017 to March 2022 (downloaded as CSV files), covering 68 distinct markets.
We also provide information about the hit songs and artists present in the charts, such as all collaborating artists within a song (since the charts only provide the main ones) and their respective genres, which is the core of this work. MGD+ also provides data about musical collaboration, as we build collaboration networks based on artist partnerships in hit songs. Therefore, this dataset contains:
Genre Networks: Success-based genre collaboration networks
Artist Networks: Success-based artist collaboration networks
Artists: Some artist data
Hit Songs: Hit Song data and features
Charts: Enhanced data from Spotify Daily Top 200 Charts
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was studied in the paper Temporal Analysis and Visualisation of Music, available at the following link:
https://sol.sbc.org.br/index.php/eniac/article/view/12155
This dataset provides a list of lyrics from 1950 to 2019, along with music metadata such as sadness, danceability, loudness, acousticness, etc. We also provide some information, such as the lyrics, which can be used for natural language processing.
The audio data was scraped using the Echo Nest® API integrated engine with the spotipy Python package. The spotipy API permits the user to search for specific genres, artists, songs, release dates, etc. To obtain the lyrics, we used the Lyrics Genius® API as the base URL for requesting data based on the song title and artist name.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MusicNet is a collection of 330 freely-licensed classical music recordings, together with over 1 million annotated labels indicating the precise time of each note in every recording, the instrument that plays each note, and the note's position in the metrical structure of the composition. The labels are acquired from musical scores aligned to recordings by dynamic time warping. The labels are verified by trained musicians; we estimate a labeling error rate of 4%. We offer the MusicNet labels to the machine learning and music communities as a resource for training models and a common benchmark for comparing results. This dataset was introduced in the paper "Learning Features of Music from Scratch." [1]
This repository consists of 3 top-level files:
musicnet.tar.gz - This file contains the MusicNet dataset itself, consisting of PCM-encoded audio wave files (.wav) and corresponding CSV-encoded note label files (.csv). The data is organized according to the train/test split described and used in "Invariances and Data Augmentation for Supervised Music Transcription". [2]
musicnet_metadata.csv - This file contains track-level information about recordings contained in MusicNet. The data and label files are named with MusicNet ids, which you can use to cross-index the data and labels with this metadata file.
musicnet_midis.tar.gz - This file contains the reference MIDI files used to construct the MusicNet labels.
A PyTorch interface for accessing the MusicNet dataset is available on GitHub. For an audio/visual introduction and summary of this dataset, see the MusicNet inspector, created by Jong Wook Kim. The audio recordings in MusicNet consist of Creative Commons licensed and Public Domain performances, sourced from the Isabella Stewart Gardner Museum, the European Archive Foundation, and Musopen. The provenance of specific recordings and midis are described in the metadata file.
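A hedged sketch of reading one note-label CSV extracted from musicnet.tar.gz; the directory layout, the example id, the column names (start_time, end_time, instrument, note), and the 44.1 kHz sample-based time unit are assumptions to verify against the actual files.

```python
import pandas as pd

labels = pd.read_csv("musicnet/train_labels/1727.csv")    # hypothetical id and path
# start_time/end_time are assumed to be sample indices at 44.1 kHz.
labels["duration_s"] = (labels["end_time"] - labels["start_time"]) / 44100.0
print(labels[["instrument", "note", "duration_s"]].head())
```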
[1] Learning Features of Music from Scratch. John Thickstun, Zaid Harchaoui, and Sham M. Kakade. In International Conference on Learning Representations (ICLR), 2017. ArXiv Report.
@inproceedings{thickstun2017learning, title={Learning Features of Music from Scratch}, author = {John Thickstun and Zaid Harchaoui and Sham M. Kakade}, year={2017}, booktitle = {International Conference on Learning Representations (ICLR)} }
[2] Invariances and Data Augmentation for Supervised Music Transcription. John Thickstun, Zaid Harchaoui, Dean P. Foster, and Sham M. Kakade. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018. ArXiv Report.
@inproceedings{thickstun2018invariances, title={Invariances and Data Augmentation for Supervised Music Transcription}, author = {John Thickstun and Zaid Harchaoui and Dean P. Foster and Sham M. Kakade}, year={2018}, booktitle = {International Conference on Acoustics, Speech, and Signal Processing (ICASSP)} }
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is based on the subset of users in the #nowplaying dataset who publish their #nowplaying tweets via Spotify. In principle, the dataset holds users, their playlists and the tracks contained in these playlists.
The csv-file holding the dataset contains the following columns: "user_id", "artistname", "trackname", "playlistname", where
user_id is a hash of the user's Spotify user name
artistname is the name of the artist
trackname is the title of the track and
playlistname is the name of the playlist that contains this track.
The separator used is , each entry is enclosed by double quotes and the escape character used is .
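A hedged sketch of reading the CSV with pandas; the file name is hypothetical, and the backslash escape character is an assumption, since the escape character is not shown in the description above.

```python
import pandas as pd

playlists = pd.read_csv(
    "spotify_playlists.csv",   # hypothetical file name
    quotechar='"',
    escapechar="\\",           # assumption; the description above leaves the escape character unspecified
)
print(playlists.columns.tolist())   # expected: user_id, artistname, trackname, playlistname
print(playlists.groupby("playlistname")["trackname"].count().nlargest(5))
```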
A description of the generation of the dataset and the dataset itself can be found in the following paper:
Pichl, Martin; Zangerle, Eva; Specht, Günther: "Towards a Context-Aware Music Recommendation Approach: What is Hidden in the Playlist Name?" in 15th IEEE International Conference on Data Mining Workshops (ICDM 2015), pp. 1360-1365, IEEE, Atlantic City, 2015.