Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The dataset contains over 1.4 million musical artists from the MusicBrainz database: their names, tags, and popularity (listeners/scrobbles), based on data scraped from last.fm. For the code used to obtain it, see here.
Last.fm suffers from the problem of multiple artists sharing the same profile page because they have the same name. For artists with a non-unique name, it is therefore not possible to establish how many of the listeners/scrobbles should be attributed to whom. In many cases this problem is negligible: for instance, there are at least 6 artists called Nirvana in the MusicBrainz database, but their popularity is distributed such that almost all of the scrobbles can be safely assumed to come from “this” Nirvana. For some artists, however, tracking the number of listeners/scrobbles is completely broken; Arkona and Ikon are examples of this issue. In these cases, each artist with a duplicated name is assigned the same aggregated number of listeners/scrobbles. In addition, some artists have multiple profiles due to different spellings or transliterations between alphabets (see, e.g., here and here).
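One way to guard against the shared-profile problem is to flag every artist whose name occurs more than once before using the popularity figures. The sketch below uses only the Python standard library. The field names `artist_name` and `listeners`, the helper `flag_shared_names`, and the sample rows are assumptions made for illustration, not the dataset's actual schema or values.

```python
from collections import Counter


def flag_shared_names(artists, name_key="artist_name"):
    """Return copies of the records with an added 'ambiguous_name' flag.

    A name counts as shared when it appears more than once, compared
    case-insensitively. `name_key` is a hypothetical field name.
    """
    counts = Counter(a[name_key].casefold() for a in artists)
    return [
        {**a, "ambiguous_name": counts[a[name_key].casefold()] > 1}
        for a in artists
    ]


# Placeholder rows: the listener numbers are invented, not real data.
rows = [
    {"artist_name": "Nirvana", "listeners": 1000},
    {"artist_name": "nirvana", "listeners": 1000},
    {"artist_name": "Some Unique Artist", "listeners": 5},
]

flagged = flag_shared_names(rows)
for row in flagged:
    print(row["artist_name"], row["ambiguous_name"])
```

Flagged rows can then be excluded from popularity rankings or inspected by hand. Note that this check does not catch the opposite case, where one artist has several profiles under different spellings.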
Last.fm tags are generated by the website’s users, so expect many entries in the tags_lastfm column to be factually incorrect, non-serious, or vulgar (e.g., see the tags for Justin Bieber).
The country_lastfm column is derived by matching country names and their adjectival forms against the tags received from last.fm. This approach is prone to error (see the point above). In addition, tags such as spanish or german are unfortunately ambiguous: they are used to indicate both where an artist comes from and the language in which the lyrics are written. Because of that, many Latin American, Austrian, and Swiss artists have an incorrect country assigned to them.
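The tag-matching idea, and the way it can go wrong, can be illustrated with a minimal sketch. This is not the code used to build the dataset: the lookup table `COUNTRY_TERMS`, the function `guess_country`, the `;` tag separator, and the first-match rule are all assumptions chosen to keep the example short.

```python
# Hypothetical, deliberately tiny lookup of country names and adjectival forms.
COUNTRY_TERMS = {
    "spain": "Spain",
    "spanish": "Spain",
    "germany": "Germany",
    "german": "Germany",
    "austria": "Austria",
    "austrian": "Austria",
}


def guess_country(tags_lastfm, sep=";"):
    """Return the country for the first tag found in COUNTRY_TERMS, else None.

    `sep` is an assumed tag separator; adjust it to the actual file format.
    """
    for tag in tags_lastfm.split(sep):
        country = COUNTRY_TERMS.get(tag.strip().casefold())
        if country:
            return country
    return None


# A language tag is indistinguishable from an origin tag under this scheme.
latin_artist = guess_country("rock; spanish; latin")
austrian_artist = guess_country("pop; german; austrian")
untagged_artist = guess_country("jazz; instrumental")
print(latin_artist, austrian_artist, untagged_artist)
```

In this sketch an artist tagged `spanish` only because of the lyrics is assigned Spain, and an artist tagged `german` before `austrian` is assigned Germany, which mirrors the misassignments described above.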