100+ datasets found
  1. Spotify Lyrics Dataset

    • kaggle.com
    Updated Feb 7, 2023
    Cite
    Eva Bot (2023). Spotify Lyrics Dataset [Dataset]. https://www.kaggle.com/datasets/evabot/spotify-lyrics-dataset
    Dataset updated
    Feb 7, 2023
    Dataset provided by
    Kaggle
    Authors
    Eva Bot
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset of 10k (and, coming soon, 100k) lyrics from Spotify tracks, crawled from both Spotify and lyrics web pages.

    Some lyrics contain noise because the content was heavily scraped. The dataset contains information about:

    • Name of the song
    • Implied Artists: For some artists, no exploration was performed and, therefore, all we have is the artist ID. JSON format.
    • Explicit: Flag indicating whether the lyrics are marked as explicit by Spotify.
    • Genres: Genres of the song, obtained by merging the genres of the collaborating artists with a semicolon separator.
    • Lyrics.
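
    As a minimal sketch of how the semicolon-separated Genres field described above might be split back into individual genres. The record layout, the key name `genres`, and the sample values are hypothetical and are not taken from the dataset itself:

```python
def split_genres(raw: str) -> list[str]:
    """Split a semicolon-separated genre string into clean genre names."""
    # Drop empty pieces so a trailing or doubled semicolon does no harm.
    return [g.strip() for g in raw.split(";") if g.strip()]


# Hypothetical record; the real column names may differ.
record = {"name": "Example Song", "genres": "indie pop; dream pop;"}
genres = split_genres(record["genres"])
print(genres)
```

    Stripping whitespace and skipping empty pieces is a defensive choice, since the description warns that the scraped content is noisy.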

    Feel free to use it however you wish under non-commercial terms. This dataset does not aim to represent the opinions or intervention of Spotify or any other platform and should only be considered for educational or personal use.

    For any concerns regarding the dataset, feel free to contact me via GitHub or email.

  2. WASABI Dataset

    • paperswithcode.com
    Updated Feb 20, 2021
    Cite
    WASABI Dataset [Dataset]. https://paperswithcode.com/dataset/wasabi
    Dataset updated
    Feb 20, 2021
    Description

    The WASABI Song Corpus is a large corpus of songs enriched with metadata extracted from music databases on the Web, and resulting from the processing of song lyrics and from audio analysis. More specifically, given that lyrics encode an important part of the semantics of a song, the authors focus on the description of the methods they proposed to extract relevant information from the lyrics, such as their structure segmentation, their topics, the explicitness of the lyrics content, the salient passages of a song and the emotions conveyed. The corpus contains 1.73M songs with lyrics (1.41M unique lyrics) annotated at different levels with the output of the above mentioned methods. Such corpus labels and the provided methods can be exploited by music search engines and music professionals (e.g. journalists, radio presenters) to better handle large collections of lyrics, allowing an intelligent browsing, categorization and segmentation recommendation of songs.

  3. English Songs Lyrics

    • kaggle.com
    Updated Apr 24, 2023
    Cite
    raza (2023). English Songs Lyrics [Dataset]. https://www.kaggle.com/datasets/razauhaq/english-songs-lyrics
    Dataset updated
    Apr 24, 2023
    Dataset provided by
    Kaggle
    Authors
    raza
    Description

    This dataset is a preprocessed version of the 5 Million Song Lyrics Dataset, containing lyrics of English songs only, extracted by CARLOSGDCJ.

  4. Data from: Song Interpretation Dataset

    • zenodo.org
    bin
    Updated Jan 9, 2023
    Cite
    Yixiao Zhang; Junyan Jiang; Gus Xia; Simon Dixon (2023). Song Interpretation Dataset [Dataset]. http://doi.org/10.5281/zenodo.7019124
    Available download formats: bin
    Dataset updated
    Jan 9, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yixiao Zhang; Junyan Jiang; Gus Xia; Simon Dixon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Song Interpretation Dataset combines data from two sources: (1) music and metadata from the Music4All Dataset and (2) lyrics and user interpretations from SongMeanings.com. We design a music metadata-based matching algorithm that aligns matching items in the two datasets with each other. In the end, we successfully match 25.47% of the tracks in the Music4All Dataset.

    The dataset contains audio excerpts from 27,834 songs (30 seconds each, recorded at 44.1 kHz), the corresponding music metadata, about 490,000 user interpretations of the lyric text, and the number of votes given for each of these user interpretations. The average length of the interpretations is 97 words. Music in the dataset covers various genres, of which the top 5 are: Rock (11,626), Pop (6,071), Metal (2,516), Electronic (2,213) and Folk (1,760).

    For more details, please refer to our paper "Interpreting Song Lyrics with an Audio-Informed Pre-trained Language Model".

  5. Data from: LFM2b Lyrics Descriptor Analyses

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 2, 2024
    Cite
    Elisabeth Lex (2024). LFM2b Lyrics Descriptor Analyses [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7740044
    Dataset updated
    Oct 2, 2024
    Dataset provided by
    Stefan Brandl
    Emilia Parada-Cabaleiro
    Marcin Skowron
    Markus Schedl
    Maximilian Mayerl
    Eva Zangerle
    Elisabeth Lex
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LFM2b Lyrics Descriptor Analyses

    This dataset provides lyrics descriptors for 580,000 songs, including lexical, diversity-related, readability, rhyme, structural, and emotional descriptors. This dataset was the basis of an analysis of the evolution of song lyrics over the course of five decades and five genres (pop, rock, rap, country, and R&B).

    Dataset Generation

    As a basis for the dataset, we relied on the LFM-2b dataset (http://www.cp.jku.at/datasets/LFM-2b) of listening events by Last.fm. It contains more than two billion listening records, and more than fifty million songs by more than five million artists. We enrich the dataset with information about songs' release year, genre, lyrics, and popularity. For quantifying the popularity of tracks and lyrics, we distinguish between the listening count, i.e., the number of listening events in the LFM-2b dataset, and the lyrics view count, i.e., the number of views of lyrics on the Genius platform (https://genius.com). Release years, genre information, and lyrics are obtained from the Genius platform. Genres are expressed by one primary genre. We used https://polyglot.readthedocs.io/ to automatically infer the language of the lyrics and considered only English lyrics. Adopting this procedure, we ultimately obtain complete information for 582,759 songs.

    Data and Features

    We provide the full dataset, containing features for 582,759 songs (full_dataset.json.gz). For each song, the dataset contains track title and artist information, genre, popularity, and release date information, and a wide variety of lexical, diversity-related, readability, rhyme, structural, and emotional descriptors.

    For further information on the semantics of the features, we provide a short overview below. Please check the implementation of the feature extractor at https://github.com/MaximilianMayerl/CorrelatesOfSongLyrics/ for further details.

    • Track and artist
    • Genre
    • Popularity descriptors:
      • Lyrics view count
      • Last.fm playcount
    • Lexical descriptors:
      • Line counts: Total number of lines, blank lines, unique lines, ratio of blank and repeated lines
      • Token counts: Number of tokens, characters, repeated token ratio, unique tokens per line, and avg. tokens per line
      • Character counts: Number of [!?.,:;"-()] characters (total amount of these characters and individual counts per character) and digits, ratio of punctuation and digits
      • Token length: Average length of tokens
      • n-gram ratios: Ratio of unique bigrams and trigrams
      • Legomenon ratios: Ratio of hapax legomena, dis legomena and tris legomena
      • Parts of speech: Frequency of adjectives, adverbs, nouns, pronouns, verbs
      • Past tense: Ratio of verbs in past tense to other verbs
      • Stop words: Number and ratio of stop words, stop words per line
      • Uncommon words: Number of uncommon words (i.e., words not contained in WordNet)
    • Diversity descriptors
      • Compression ratio: Ratio of the size of zlib compressed lyrics vs the original lyrics
      • Diversity measures: Measure of Textual Lexical Diversity (MTLD), Herdan's C, Summer's S, Dugast's U^2, and Maas' a^2
    • Readability Descriptors
      • Readability formulas: Flesch Reading Ease, Flesch Kincaid Grade, SMOG (Simple Measure of Gobbledygook), Automated Readability Index, Coleman Liau Index, Dale Chall Readability Score, Linsear Write Formula, Gunning Fog, Fernandez Huerta, Szigriszt Pazos and Gutierrez Polini
      • Difficult words: Number of difficult words (three or more syllables)
    • Rhyme Descriptors
      • Rhyme structures: Numbers of couplets, clerihews, alternating rhymes and nested rhymes
      • Rhyme words: Number of unique rhyming words, percentage of rhyming lines in the lyrics
      • Alliterations: Number of alliterations of length two, three, and four or more
    • Structural Descriptors
      • Element counts: Number of sections and verses
      • Distribution: Relation between the number of verses vs. sections and number of choruses vs sections
      • Title occurrences: Number of times the song's title appears
      • Pattern: Verse and chorus alternating, two verses and at least one chorus, two choruses and at least one verse
      • Start: Starts with chorus (binary attribute)
      • Ending: Ends with two chorus repetitions (binary attribute)
    • Emotional/Affective Descriptors
      • Sentiment scores: Positivity and negativity scores via AFINN, the sentiment lexicon by Bing Liu et al., the MPQA opinion corpus, the sentiment140 dataset, and the SentiWordNet lexicon
      • NRC: Emotion scores according to the NRC affect intensity lexicon
      • LIWC: Descriptors provided by LIWC
      • Happiness: Happiness score according to labMT
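
    To make two of the descriptors listed above more concrete, here is a minimal sketch of a compression ratio and a hapax legomenon ratio. It is not the authors' feature extractor (see the linked repository for that). Whitespace tokenization, lowercasing, UTF-8 encoding, and dividing the hapax count by the number of distinct tokens are all assumptions made for illustration:

```python
import zlib
from collections import Counter


def compression_ratio(lyrics: str) -> float:
    """Size of the zlib-compressed lyrics divided by the size of the original lyrics."""
    raw = lyrics.encode("utf-8")
    if not raw:
        return 0.0
    return len(zlib.compress(raw)) / len(raw)


def hapax_ratio(lyrics: str) -> float:
    """Share of distinct tokens that occur exactly once (hapax legomena)."""
    counts = Counter(lyrics.lower().split())  # assumed tokenization
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / len(counts)


# Made-up inputs: a highly repetitive text and a short varied one.
repetitive = "la la la la\n" * 50
varied = "the quick brown fox jumps over the lazy dog"
print(compression_ratio(repetitive), compression_ratio(varied))
print(hapax_ratio("a b b c"))
```

    Repetitive lyrics compress well and therefore yield a lower ratio, which is why compression can serve as a rough proxy for textual diversity.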
  6. artist-lyrics-dataset

    • huggingface.co
    Updated Apr 22, 2024
    Cite
    artist-lyrics-dataset [Dataset]. https://huggingface.co/datasets/SpartanCinder/artist-lyrics-dataset
    Dataset updated
    Apr 22, 2024
    Authors
    Connor Homayouni
    License

    https://choosealicense.com/licenses/cc/

    Description

    SpartanCinder/artist-lyrics-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. 🎹960K Spotify Songs With Lyrics data🎵

    • kaggle.com
    Updated Aug 17, 2024
    Cite
    BwandoWando (2024). 🎹960K Spotify Songs With Lyrics data🎵 [Dataset]. http://doi.org/10.34740/kaggle/ds/5458328
    Dataset updated
    Aug 17, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    BwandoWando
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Context

    The Spotify datasets that I usually see here in Kaggle are almost always Songs and Attributes only, with no lyric data. So I downloaded some of the largest Spotify datasets here in Kaggle and fed all the songs to the Spotify lyrics API.

    For all those songs with returned lyrics, I compiled them as one dataset.

    Source Datasets

    Full credits and citations to all the creators of the awesome datasets above, this dataset is a supplementary and enriched version of these datasets.

    Important Notes

    • Not all songs have available lyrics
    • Not all source datasets have Album data
    • Not all songs that have lyrics have properly annotated startTimeMs values
    • I submitted 3.3M unique songs to the Lyrics API and only 960K songs have lyrics

    Upvoting

    If you'd upvote and utilize this dataset, PLEASE upvote the datasets above as well!

    Image

    Created with Bing Image Creator

  8. Song lyrics dataset

    • kaggle.com
    Updated Apr 26, 2021
    Cite
    Mehedi Hasan9021 (2021). Song lyrics dataset [Dataset]. https://kaggle.com/mehedihasan9021/movie-script-dataset
    Dataset updated
    Apr 26, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mehedi Hasan9021
    Description

    Dataset

    This dataset was created by Mehedi Hasan9021

  9. Beatles Lyrics

    • kaggle.com
    zip
    Updated Jul 21, 2019
    Cite
    Jen Looper (2019). Beatles Lyrics [Dataset]. https://www.kaggle.com/datasets/jenlooper/beatles-lyrics
    Available download formats: zip (48719 bytes)
    Dataset updated
    Jul 21, 2019
    Authors
    Jen Looper
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Jen Looper

    Released under CC0: Public Domain

  10. ‘Lyrics Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jun 3, 2020
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘Lyrics Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-lyrics-dataset-01f6/latest
    Dataset updated
    Jun 3, 2020
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Lyrics Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/pratiksaha198/lyrics-generation on 14 February 2022.

    --- Dataset description provided by original source is as follows ---

    PLEASE UPVOTE IF YOU LIKE IT !

    Context

    Being an indie music lover, I am drawn to mellow, soulful, and kind of sad songs. So why not train a deep learning model to generate brand-new lyrics, which did not exist before, for my favorite artists?

    Content

    This project generates new, previously nonexistent lyrics for artists using recurrent neural networks (RNNs), based on a dataset created by scraping the Genius website through its API.

    Challenges Faced

    Generating meaningful, coherent lyrics was the main challenge, which was overcome by using a model containing:

    1. Embedding Layer
    2. GRU (Gated Recurrent Unit) Layer
    3. Dense Layer with a Dropout
    

    The final prediction loop used (image: "The Final Prediction Loop"):

    https://camo.githubusercontent.com/210fbba1863976851c1be38f61b37507a869de4d/68747470733a2f2f7777772e74656e736f72666c6f772e6f72672f7475746f7269616c732f746578742f696d616765732f746578745f67656e65726174696f6e5f73616d706c696e672e706e67

    Some generated lyrics that did not exist before:

    Love in the world Happens each time to keep me warm Love in the rain And we'll stop eating food and I'll say the words that I want in this cold I don't have to think that I want to do I can't think of anyone, anyone else And we gotta save ourselves If we want it to be I'll be taking my time As she looks lovely in the wood

    Scope for Improvement :

    1. Using an LSTM-GAN-based model, which has the best potential for the most coherent output.
    2. Adding another Bidirectional-LSTM layer and lowering the number of epochs while keeping the loss between 1.5 and 1.1, for the most meaningful generated text.
    

    --- Original source retains full ownership of the source dataset ---

  11. MUSDB18 lyrics extension

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1more
    Updated Jun 25, 2021
    Cite
    Clement S. J. Doire (2021). MUSDB18 lyrics extension [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3989266
    Dataset updated
    Jun 25, 2021
    Dataset provided by
    Roland Badeau
    Gaël Richard
    Kilian Schulze-Forster
    Clement S. J. Doire
    Description

    This is a set of annotated lyrics transcripts for songs belonging to the MUSDB18 dataset. The set comprises lyrics of all songs which have English lyrics, i.e. 96 out of 100 songs for the training set and 45 out of 50 songs for the test set. MUSDB18 is a dataset for music source separation and provides the following separated tracks for each song: vocals, bass, drums, other (rest of the accompaniment), mixture.

    The lyrics transcripts, together with the audio files of MUSDB18, are a valuable resource for research on tasks such as text-informed singing voice separation, automatic lyrics alignment, automatic lyrics transcription, and singing voice synthesis and analysis. The provided data should be used for research purposes only.

    Disclaimer

    The lyrics were transcribed manually by the authors who are not native English speakers. It is likely that the transcriptions are not 100% correct. The composers of the songs are the copyright holders of the original lyrics.

    The songs were divided into sections of lengths between 3 and 12 seconds. The priority when choosing the section boundaries was that they correspond to natural pauses and do not cut vocal sounds. The sections do not necessarily correspond to lyrically meaningful lines. Most of the sections do not overlap, some have an overlap of 1 second. In some difficult cases, e.g. shouting in metal songs or mumbled words, where the words are barely intelligible, we made an effort to make the transcriptions as accurate as possible phonetically and did not prioritize semantically meaningful phrases.

    Citation

    The dataset was built for the paper

    Schulze-Forster, K., Doire, C., Richard, G., & Badeau, R. "Phoneme Level Lyrics Alignment and Text-Informed Singing Voice Separation." IEEE/ACM Transactions on Audio, Speech and Language Processing (2021).

    If you use the data for your research, please cite the corresponding paper:

    @article{schulze2021phoneme,
      title={Phoneme Level Lyrics Alignment and Text-Informed Singing Voice Separation},
      author={Schulze-Forster, Kilian and Doire, Clement and Richard, Ga{\"e}l and Badeau, Roland},
      journal={IEEE/ACM Transactions on Audio, Speech and Language Processing},
      year={2021},
      publisher={IEEE}
    }

    Annotations

    For each section, the annotations comprise: the start and end time, the corresponding lyrics, and a label indicating one of the following four properties:

    (a) only one person is singing
    (b) several singers are pronouncing the same phonemes at the same time (possibly singing different notes)
    (c) several singers are pronouncing different phonemes simultaneously (possibly singing different notes)
    (d) no singing

    Segments that are labelled with property (b) or (c) do not necessarily have this property over the whole segment duration. As soon as several singers are present somewhere in a segment, label (b) was assigned; as soon as they sang different phonemes at the same time somewhere, label (c) was assigned. Properties (a) and (d) are valid for the entire segment. Furthermore, segments with property (c) can contain either some (lead) singer(s) singing some words in the presence of background singers singing long vowels such as 'ah' or 'oh', or they can contain multiple singers who sing different words at the same time. In the latter case, it was very difficult to recognise the sung words and to decide in which order to transcribe words or phrases sung simultaneously. These segments are marked with a '*' and it is recommended to reject them for most use cases.

    The annotations have the following format:

    Example: 00:18 00:23 a i know the reasons why --> starts at 18 sec., ends at 23 sec., vocals type (a), lyrics: i know the reasons why
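
    A minimal sketch of how one annotation line in the format of the example above might be parsed. It assumes whitespace-separated fields and mm:ss timestamps, as in the example line. The `Segment` class and the function names are hypothetical and are not part of the provided musdb_lyrics_cut_audio.py script. Handling of the '*' marker is left out because its position in the line is not specified here:

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start_sec: int
    end_sec: int
    vocals_type: str  # one of "a", "b", "c", "d"
    lyrics: str


def to_seconds(stamp: str) -> int:
    """Convert an mm:ss timestamp to a number of seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def parse_annotation(line: str) -> Segment:
    """Parse 'start end type lyrics...' into a Segment."""
    parts = line.strip().split(maxsplit=3)
    start, end, vocals_type = parts[:3]
    # Segments of type (d), no singing, may carry no lyrics at all.
    lyrics = parts[3] if len(parts) > 3 else ""
    return Segment(to_seconds(start), to_seconds(end), vocals_type, lyrics)


segment = parse_annotation("00:18 00:23 a i know the reasons why")
print(segment)
```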

    The Python script musdb_lyrics_cut_audio.py is provided to automatically cut the MUSDB songs into the annotated segments. The script requires the musdb and soundfile packages. The user needs to update the paths and select the desired sources and vocals types in lines 19-26. For each annotated segment, the script saves a WAV file for each selected source as well as the corresponding lyrics as a TXT file. The MUSDB training partition is divided into a training and a validation set. The tracks for the validation set can be changed below line 29.

    The file words_and_phonemes.txt contains a list of all words and their decomposition into phonemes. The phonemes are written in 2-letter ARPABET style and obtained with the LOGIOS Lexicon Tool.

    License

    The data is licensed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, read the provided LICENSE.txt file, visit https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.

    The creators of MUSDB18 lyrics extension and their corresponding affiliation institutes are not liable for, and expressly exclude, all liability for loss or damage however and whenever caused to anyone by any use of MUSDB18 lyrics extension or any part of it.

    Acknowledgment

    The authors would like to thank Olumide Okubadejo and Sinead Namur for their help with transcribing and correcting part of the lyrics.

  12. lyrics-dataset

    • huggingface.co
    Updated Sep 12, 2024
    Cite
    Younes Matrab (2024). lyrics-dataset [Dataset]. https://huggingface.co/datasets/mrYou/lyrics-dataset
    Dataset updated
    Sep 12, 2024
    Authors
    Younes Matrab
    Description

    mrYou/lyrics-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  13. Arab-Andalusian music lyrics dataset

    • live.european-language-grid.eu
    • zenodo.org
    json
    Updated Oct 28, 2023
    Cite
    (2023). Arab-Andalusian music lyrics dataset [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7554
    Available download formats: json
    Dataset updated
    Oct 28, 2023
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The dataset contains lyrics for the songs in the Arab-Andalusian music collection curated within the CompMusic project that belong to the nawbas "Isbahan", "Maya", "Raml Maya", "Gharibat al-Husayn", "Hijaz Kabir", "Hijaz Msharqi", "Istihlal", "Rasd", and "Rasd Dayl".

    Lyrics are stored in two formats: as Tab Separated Values (TSV) files and as JSON files.

    Each file is identified by its MusicBrainz recording ID (MBID).

    The lyrics are stored both in their original Arabic script (folder 'original') and in a romanized/transliterated version (folder 'transliterated') that follows the American Library Association-Library of Congress (ALA-LC) standard.

    Corresponding audio files are available from the Arab-Andalusian music corpus, as well as the Internet Archive URL included in the metadata file ('metadata.csv').

    For more information about the exact format and contents of the dataset, please consult the README provided in the archive.

    For more information, please refer to http://compmusic.upf.edu/corpora.

  14. Taylor-Swift-lyrics

    • kaggle.com
    zip
    Updated Aug 2, 2020
    Cite
    Manva Pradhan (2020). Taylor-Swift-lyrics [Dataset]. https://www.kaggle.com/pradhanmanva/taylorswiftlyrics
    Available download formats: zip (267064 bytes)
    Dataset updated
    Aug 2, 2020
    Authors
    Manva Pradhan
    Description

    Content

    This dataset contains all songs by Taylor Swift, along with album name, release date, song names, and the lyrics line by line. It includes the latest album, "Folklore", but not the lyrics of "Lakes" from that album. Will keep updating.

    Acknowledgements

    First and foremost, big thanks to Taylor Swift for the lyrics. Then to the Genius API, for storing those songs, albums, and lyrics in a neat bow-wrapped API. Thanks to Reddit user costryme for the banner image!

    Inspiration

    I like (more love than like) Taylor Swift so I wanted to create a lyric-generator for a couple of artists. This dataset is part of that.

  15. Jam-ALT Dataset

    • paperswithcode.com
    Updated Nov 22, 2023
    Cite
    Ondřej Cífka; Constantinos Dimitriou; Cheng-i Wang; Hendrik Schreiber; Luke Miner; Fabian-Robert Stöter (2023). Jam-ALT Dataset [Dataset]. https://paperswithcode.com/dataset/jam-alt
    Dataset updated
    Nov 22, 2023
    Authors
    Ondřej Cífka; Constantinos Dimitriou; Cheng-i Wang; Hendrik Schreiber; Luke Miner; Fabian-Robert Stöter
    Description

    JamALT is a revision of the JamendoLyrics dataset (80 songs in 4 languages), adapted for use as an automatic lyrics transcription (ALT) benchmark.

    The lyrics have been revised according to the newly compiled annotation guide, which includes rules about spelling, punctuation, and formatting. The audio is identical to the JamendoLyrics dataset. However, only 79 songs are included, as one of the 20 French songs has been removed due to concerns about potentially harmful content.

  16. Music Dataset: Lyrics and Metadata from 1950 to 2019

    • narcis.nl
    • data.mendeley.com
    Updated Oct 23, 2020
    Cite
    Moura, L (via Mendeley Data) (2020). Music Dataset: Lyrics and Metadata from 1950 to 2019 [Dataset]. http://doi.org/10.17632/3t9vbwxgr5.3
    Dataset updated
    Oct 23, 2020
    Dataset provided by
    Data Archiving and Networked Services (DANS)
    Authors
    Moura, L (via Mendeley Data)
    Description

    This dataset was studied in the paper "Temporal Analysis and Visualisation of Music", available at the following link:

    https://sol.sbc.org.br/index.php/eniac/article/view/12155

    This dataset provides a list of lyrics from 1950 to 2019, together with music metadata such as sadness, danceability, loudness, and acousticness. We also provide some information, such as lyrics, which can be used for natural language processing.

    The audio data was scraped using the Echo Nest® API integrated engine with the spotipy Python package. The spotipy API permits the user to search for specific genres, artists, songs, release dates, etc. To obtain the lyrics, we used the Lyrics Genius® API as the base URL for requesting data based on the song title and artist name.

  17. Lyrics Dataset

    • kaggle.com
    zip
    Updated Apr 25, 2024
    Cite
    Sanika Dhayabar (2024). Lyrics Dataset [Dataset]. https://www.kaggle.com/datasets/sanikadhayabar/lyrics-dataset/data
    Available download formats: zip (315018 bytes)
    Dataset updated
    Apr 25, 2024
    Authors
    Sanika Dhayabar
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Sanika Dhayabar

    Released under MIT

  18. English Music Lyrics 5 Genres (500k)

    • kaggle.com
    Updated Apr 7, 2024
    Cite
    Anurag Ghosh (2024). English Music Lyrics 5 Genres (500k) [Dataset]. https://www.kaggle.com/datasets/d3stron/english-music-lyrics-5-genres-500k/suggestions?status=pending&yourSuggestions=true
    Dataset updated
    Apr 7, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Anurag Ghosh
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Description:

    The English Lyrics Dataset from Five Music Genres is a comprehensive collection of over 500,000 samples of song lyrics spanning five diverse genres of music: country, rap, metal, rock, and pop. Each sample in the dataset includes the genre label and the corresponding lyrics in English. This dataset provides a rich resource for exploring and analyzing lyrical content across different music genres.

    Columns:

    Genre: Categorical variable representing the genre of the song. Possible values include:

    • Country
    • Rap
    • Metal
    • Rock
    • Pop

    Lyrics: Text data containing the lyrics of the song in English.

    Potential Use Cases:

    • Music Genre Classification: The dataset can be utilized to train machine learning models for accurately classifying songs into their respective genres based on lyrical content.
    • Lyrical Theme Exploration: By analyzing word frequencies and themes within each genre, researchers can gain insights into the common topics and motifs present in songs across different genres.
    • Music Generation: Using techniques such as natural language processing (NLP) and deep learning, the dataset can serve as a valuable resource for training models to generate new song lyrics. By learning the patterns and structures of lyrics within each genre, these models can produce original compositions that emulate the style and themes characteristic of country, rap, metal, rock, and pop music. This can facilitate the development of automated songwriting tools and creative AI applications in the field of music composition.
    • Music Recommendation Systems: Incorporating lyrical features from this dataset can enhance the effectiveness of music recommendation systems by considering not only musical styles but also lyrical content preferences.
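
The word-frequency analysis mentioned under Lyrical Theme Exploration could look roughly like the sketch below. The sample pairs, the tiny stop-word list, and the function name `top_words_by_genre` are all hypothetical and not taken from the dataset; a real analysis would likely use a fuller stop list and tokenizer.

```python
import re
from collections import Counter, defaultdict

# Hypothetical (genre, lyrics) pairs; not taken from the dataset.
SAMPLES = [
    ("Country", "Home on the road, home in my heart"),
    ("Rap", "Beat on beat, the street is my beat"),
    ("Country", "The road goes on"),
]

STOPWORDS = {"the", "on", "in", "my", "is"}  # tiny illustrative stop list

def top_words_by_genre(samples, n=2):
    """Return {genre: [(word, count), ...]} with the n most frequent words."""
    counts = defaultdict(Counter)
    for genre, lyrics in samples:
        # Lowercase and keep runs of letters and apostrophes as words.
        words = re.findall(r"[a-z']+", lyrics.lower())
        counts[genre].update(w for w in words if w not in STOPWORDS)
    return {genre: c.most_common(n) for genre, c in counts.items()}

top_words = top_words_by_genre(SAMPLES)
print(top_words)
```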

    License:

    The dataset is provided under an open license, allowing for free distribution, modification, and commercial use, with attribution to the original sources encouraged.

    Ethical Considerations:

    • Privacy: All efforts were made to anonymize and aggregate the data to protect the privacy of individual artists and songwriters.
    • Bias: While efforts were made to include a diverse range of artists and songs within each genre, inherent biases in the selection of songs and lyrics may exist, which should be taken into account during analysis and interpretation.
    • Dataset Availability: The English Lyrics Dataset from Five Music Genres is available for download in a structured format
  19. h

    autonlp-data-song-lyrics-demo

    • huggingface.co
    Updated Nov 3, 2021
    + more versions
    Cite
    Julien Simon (2021). autonlp-data-song-lyrics-demo [Dataset]. https://huggingface.co/datasets/juliensimon/autonlp-data-song-lyrics-demo
    Dataset updated
    Nov 3, 2021
    Authors
    Julien Simon
    Description

    AutoNLP Dataset for project: song-lyrics-demo

      Table of contents
    

    • Dataset Description
      • Languages
    • Dataset Structure
      • Data Instances
      • Data Fields
      • Data Splits

      Dataset Description
    

    This dataset has been automatically processed by AutoNLP for project song-lyrics-demo.

      Languages
    

    The BCP-47 code for the dataset's language is en.

      Dataset Structure
    
      Data Instances
    

    A sample from this dataset looks as follows: [ {… See the full description on the dataset page: https://huggingface.co/datasets/juliensimon/autonlp-data-song-lyrics-demo.

  20. h

    rap-lyrics-v2

    • huggingface.co
    Updated Nov 26, 2023
    + more versions
    Cite
    Nate Raw (2023). rap-lyrics-v2 [Dataset]. https://huggingface.co/datasets/nateraw/rap-lyrics-v2
    Dataset updated
    Nov 26, 2023
    Authors
    Nate Raw
    Description

    Dataset Card for "rap-lyrics-v2"

    More Information needed

Cite
Eva Bot (2023). Spotify Lyrics Dataset [Dataset]. https://www.kaggle.com/datasets/evabot/spotify-lyrics-dataset
Spotify Lyrics Dataset

Lyrics for 10k and 100k (coming soon) tracks from Spotify.

17 scholarly articles cite this dataset
Dataset updated
Feb 7, 2023
Dataset provided by
Kaggle
Authors
Eva Bot
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Dataset of lyrics for 10k and 100k (coming soon) Spotify tracks, crawled from both Spotify and lyrics web pages.

Some lyrics contain noise because the content was heavily scraped. The dataset contains information about:

  • Name of the song
  • Implied Artists: For some artists no further exploration was performed, so only the artist ID is available. Stored in JSON format.
  • Explicit: Flag indicating whether the lyrics are marked as explicit by Spotify.
  • Genres: Genres of the song, obtained by merging the genres of the collaborating artists, separated by semicolons.
  • Lyrics.
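
A rough sketch of how the semicolon-separated genres and the JSON artists field described above might be unpacked. The record below is hypothetical: the key names, the JSON shape of the artists field, and the choice to deduplicate genres are assumptions for illustration, not details confirmed by the dataset page.

```python
import json

# Hypothetical record; key names and the artists JSON shape are assumed.
record = {
    "name": "Example Song",
    "artists": '[{"id": "a1", "name": "Artist One"}, {"id": "a2"}]',
    "explicit": False,
    "genres": "pop; dance pop;pop",
}

def split_genres(raw):
    """Split a semicolon-separated genre string into unique, trimmed labels."""
    seen = []
    for part in raw.split(";"):
        label = part.strip()
        if label and label not in seen:
            seen.append(label)
    return seen

def artist_labels(raw_json):
    """Return each artist's name, falling back to the id when no name is present."""
    return [a.get("name", a["id"]) for a in json.loads(raw_json)]

genres = split_genres(record["genres"])
artists = artist_labels(record["artists"])
print(genres, artists)
```

The fallback to the artist ID mirrors the note that some artists were not explored further and are known only by ID.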

Feel free to use it however you wish for non-commercial purposes. This dataset does not aim to represent the opinions or involvement of Spotify or any other platform and should only be used for educational or personal purposes.

For any concerns regarding the dataset, feel free to contact me via GitHub or email.