100+ datasets found
  1. genius-lyrics

    • huggingface.co
    Updated Aug 25, 2017
    Cite
    Bruno Kreiner (2017). genius-lyrics [Dataset]. https://huggingface.co/datasets/brunokreiner/genius-lyrics
    Dataset updated
    Aug 25, 2017
    Authors
    Bruno Kreiner
    Description

    Dataset Card for Dataset Name

    Dataset Description

    Dataset Summary

    This dataset consists of roughly 480k English lyrics (language identified with the NLTK language classifier) together with some metadata. The id corresponds to the Spotify id. The metadata was taken from the Million Playlist challenge on AICrowd. The lyrics were crawled with the lyricsgenius Python package, which uses the genius.com search function, using "[song name] [artist name]" as the search string. There is no… See the full description on the dataset page: https://huggingface.co/datasets/brunokreiner/genius-lyrics.

  2. Music Dataset: Lyrics and Metadata from 1950 to 2019

    • data.mendeley.com
    Updated Aug 24, 2020
    Cite
    Luan Moura (2020). Music Dataset: Lyrics and Metadata from 1950 to 2019 [Dataset]. http://doi.org/10.17632/3t9vbwxgr5.2
    Dataset updated
    Aug 24, 2020
    Authors
    Luan Moura
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset provides lyrics from 1950 to 2019 together with music metadata such as sadness, danceability, loudness, and acousticness. The lyrics themselves are also included and can be used for natural language processing.

    The audio data was scraped using the Echo Nest® API-integrated engine through the spotipy Python package. The spotipy API lets the user search for specific genres, artists, songs, release dates, etc. To obtain the lyrics, we used the Lyrics Genius® API as the base URL, requesting data by song title and artist name.

  3. autonlp-data-song-lyrics

    • huggingface.co
    Updated Mar 1, 2022
    Cite
    Julien Simon (2022). autonlp-data-song-lyrics [Dataset]. https://huggingface.co/datasets/juliensimon/autonlp-data-song-lyrics
    Dataset updated
    Mar 1, 2022
    Authors
    Julien Simon
    Description

    AutoNLP Dataset for project: song-lyrics

    Table of contents

    • Dataset Description
      • Languages
    • Dataset Structure
      • Data Instances
      • Data Fields
      • Data Splits

    Dataset Description

    This dataset has been automatically processed by AutoNLP for project song-lyrics.

    Languages

    The BCP-47 code for the dataset's language is en.

    Dataset Structure

    Data Instances

    A sample from this dataset looks as follows: [ {… See the full description on the dataset page: https://huggingface.co/datasets/juliensimon/autonlp-data-song-lyrics.

  4. Data from: Song Interpretation Dataset

    • zenodo.org
    • data.niaid.nih.gov
    Updated Jan 9, 2023
    Cite
    Yixiao Zhang; Junyan Jiang; Gus Xia; Simon Dixon (2023). Song Interpretation Dataset [Dataset]. http://doi.org/10.5281/zenodo.7019124
    Available download formats: bin
    Dataset updated
    Jan 9, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yixiao Zhang; Junyan Jiang; Gus Xia; Simon Dixon
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Song Interpretation Dataset combines data from two sources: (1) music and metadata from the Music4All Dataset and (2) lyrics and user interpretations from SongMeanings.com. We design a music metadata-based matching algorithm that aligns matching items in the two datasets with each other. In the end, we successfully match 25.47% of the tracks in the Music4All Dataset.
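    The description does not spell out how the metadata-based matching works. As a rough illustration only, one simple way to align two catalogues is to normalize artist and title strings and join on the normalized pair. The sketch below is not the authors' algorithm; the record keys ('id', 'artist', 'title') and the normalization rules are assumptions.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents, drop bracketed parts and punctuation."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[\(\[].*?[\)\]]", " ", text.lower())  # e.g. "(Remastered)"
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return " ".join(text.split())


def match_tracks(music4all, songmeanings):
    """Pair records whose normalized (artist, title) keys are identical.

    Both inputs are lists of dicts with hypothetical keys 'id', 'artist'
    and 'title'. Returns (music4all_id, songmeanings_id) tuples.
    """
    index = {}
    for song in songmeanings:
        key = (normalize(song["artist"]), normalize(song["title"]))
        index.setdefault(key, song["id"])  # keep the first candidate per key
    pairs = []
    for track in music4all:
        key = (normalize(track["artist"]), normalize(track["title"]))
        if key in index:
            pairs.append((track["id"], index[key]))
    return pairs


def match_rate(pairs, music4all):
    """Share of Music4All tracks that found a partner."""
    return len(pairs) / len(music4all) if music4all else 0.0
```

    Exact matching on normalized keys is conservative: it favours precision over recall. Note that titles in non-Latin scripts collapse to empty strings under this normalization, so a real pipeline would need different rules for them.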

    The dataset contains audio excerpts from 27,834 songs (30 seconds each, recorded at 44.1 kHz), the corresponding music metadata, about 490,000 user interpretations of the lyric text, and the number of votes given for each of these user interpretations. The average length of the interpretations is 97 words. Music in the dataset covers various genres, of which the top 5 are: Rock (11,626), Pop (6,071), Metal (2,516), Electronic (2,213) and Folk (1,760).

    For more details, please refer to our paper "Interpreting Song Lyrics with an Audio-Informed Pre-trained Language Model".

  5. lyrics-dataset

    • huggingface.co
    Updated Sep 12, 2024
    Cite
    Younes Matrab (2024). lyrics-dataset [Dataset]. https://huggingface.co/datasets/mrYou/lyrics-dataset
    Dataset updated
    Sep 12, 2024
    Authors
    Younes Matrab
    Description

    mrYou/lyrics-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. Drake's Song Lyrics Collection Dataset

    • opendatabay.com
    Updated Jul 2, 2025
    Cite
    Datasimple (2025). Drake's Song Lyrics Collection Dataset [Dataset]. https://www.opendatabay.com/data/web-social/c2e0fbed-6b28-4339-8d55-25a228943044
    Dataset updated
    Jul 2, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Art & Digital Creations
    Description

    The Drake Lyrics Dataset is a collection of song lyrics and associated metadata for the artist Drake. It was originally compiled from Genius.com as part of a larger data science project titled "Ye-Spirations", a motivational poster generator that uses hip-hop lyrics. That project began with a prototype built on Kanye West's lyrics; this dataset extends it and is intended for similar projects or for future reference. It is suited to Natural Language Processing (NLP) projects, text analysis, and creative applications involving music and popular culture.

    Columns

    The dataset is structured across three files; the .json and .csv files share identical column structures:
    • lyrics: Contains the actual lyrical content of the songs.
    • song title: The title of each song.
    • album title: The title of the album to which each song belongs.
    • url: The direct URL linking to the song's page on Genius.com.
    • view count: The view count of each song at the time of data collection.
    The .txt file contains only the lyrical content.

    Distribution

    This dataset comprises 3 files: a .txt file (lyrics only), a .json file, and a .csv file; the latter two contain lyrics, song title, album title, URL, and view count. The listing rates data quality at 5 out of 5. The current version is 1.0. The number of rows or records is not stated in the sources; because the data is provided as CSV and JSON, it can be loaded by most data analysis tools.

    Usage

    This dataset is suited to a range of applications:
    • Building machine learning models focused on Natural Language Processing (NLP) for in-depth text analysis of song lyrics.
    • Developing creative applications, such as the aforementioned motivational poster generator using hip-hop lyrics.
    • Exploring patterns and insights within music lyrics and broader popular culture.
    • Practising and refining data extraction and processing techniques on web-based sources like Genius.com.
    • Supporting general data science projects that involve the analysis of textual data.

    Coverage

    This dataset focuses exclusively on Drake's song lyrics and associated information. The data was gathered from Genius.com, a leading source for lyrics and annotations. The listing gives its geographic coverage as global. No precise time range is given for the lyrics or view counts beyond "at this time", which suggests the view counts are a snapshot taken at the point of data collection.

    License

    This is identified as a Free Dataset and is available through the Opendatabay platform's "FREE DATASET LIBRARY". A specific license URL is not detailed within the provided source material for this dataset.

    Who Can Use It

    This dataset can serve a broad range of users:
    • Data Scientists and NLP Practitioners: developing, training, and testing text analysis models and algorithms.
    • Music Enthusiasts and Researchers: those with a keen interest in the lyrical content, evolution, and patterns of popular music.
    • Students and Developers: a practical dataset for building personal projects, such as text generators or analytical tools.
    • Creative Coders: applications that generate new content from existing text corpora.

    Dataset Name Suggestions

    • Drake Lyrics Dataset
    • Drake's Song Lyrics Collection
    • Hip Hop Lyrics by Drake
    • Genius.com Drake Lyrics Data
    • Drake's Lyrical Archive

    Attributes

    Original Data Source: Drake Lyrics

  7. Indian Hindi songs lyrics dataset

    • kaggle.com
    Updated Aug 24, 2020
    Cite
    MakarandVelankar (2020). Indian Hindi songs lyrics dataset [Dataset]. https://www.kaggle.com/datasets/makvel/indian-hindi-songs-lyrics-dataset/versions/2
    Dataset updated
    Aug 24, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    MakarandVelankar
    Description

    Context

    Contextual analysis of Hindi lyrics using Natural Language Processing techniques for the Hindi language (Devanagari script). The algorithms developed will be useful for summarizing Hindi literary work and for context-based classification.

    Content

    People who want to work on projects involving the Devanagari script find it difficult to obtain a suitable data set. After an extensive search, we observed that little work has been done with the Devanagari script in the field of natural language processing.

    Acknowledgements

    Rachita Kotian, Chaitrali Mote and Anuja Patil were instrumental in preparing the data set.

    Inspiration

    People have always found songs and music significant in their lives. Lyrics are a source of information for understanding music, as they provide high-level information about a song. The aim is to perform contextual analysis on Hindi lyrics and to automate this process.

  8. Data from: LFM2b Lyrics Descriptor Analyses

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 15, 2024
    Cite
    Stefan Brandl (2024). LFM2b Lyrics Descriptor Analyses [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7740044
    Dataset updated
    Apr 15, 2024
    Dataset provided by
    Markus Schedl
    Emilia Parada-Cabaleiro
    Stefan Brandl
    Marcin Skowron
    Eva Zangerle
    Elisabeth Lex
    Maximilian Mayerl
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LFM2b Lyrics Descriptor Analyses

    This dataset provides lyrics descriptors for about 580,000 songs, including lexical, structural, diversity-related, readability, rhyme, and emotional descriptors. It was the basis of an analysis of the evolution of song lyrics over five decades and five genres (pop, rock, rap, country, and R&B).

    Dataset Generation

    As a basis for the dataset, we relied on the LFM-2b dataset (http://www.cp.jku.at/datasets/LFM-2b) of Last.fm listening events. It contains more than two billion listening records and more than fifty million songs by more than five million artists. We enrich the dataset with the songs' release year, genre, lyrics, and popularity. To quantify the popularity of tracks and lyrics, we distinguish between the listening count, i.e., the number of listening events in the LFM-2b dataset, and the lyrics view count, i.e., the number of views of the lyrics on the Genius platform (https://genius.com). Release years, genre information, and lyrics are obtained from the Genius platform. Each song is assigned one primary genre. We used polyglot (https://polyglot.readthedocs.io/) to automatically infer the language of the lyrics and considered only English lyrics. With this procedure, we obtain complete information for 582,759 songs.

    Data and Features

    We provide the full dataset, containing features for 582,759 songs (full_dataset.json.gz). For each song, the dataset contains the track title and artist, genre, popularity, and release date, along with a wide variety of lexical, structural, diversity-related, readability, rhyme, and emotional descriptors.

    A short overview of the semantics of the features follows. Please see the implementation of the feature extractor at https://github.com/MaximilianMayerl/CorrelatesOfSongLyrics/ for further details.

    • Track and artist
    • Genre
    • Popularity descriptors:
      • Lyrics view count
      • Last.fm playcount
    • Lexical descriptors:
      • Line counts: Total number of lines, blank lines, unique lines, ratio of blank and repeated lines
      • Token counts: Number of tokens, characters, repeated token ratio, unique tokens per line, and avg. tokens per line
      • Character counts: Number of the characters ! ? . , : ; " - ( ) (total count of these characters and individual counts per character) and of digits; ratio of punctuation and digits
      • Token length: Average length of tokens
      • n-gram ratios: Ratio of unique bigrams and trigrams
      • Legomenon ratios: Ratio of hapax legomena, dis legomena and tris legomena
      • Parts of speech: Frequency of adjectives, adverbs, nouns, pronouns, verbs
      • Past tense: Ratio of verbs in past tense to other verbs
      • Stop words: Number and ratio of stop words, stop words per line
      • Uncommon words: Number of uncommon words (i.e., words not contained in WordNet)
    • Diversity descriptors
      • Compression ratio: Ratio of the size of zlib compressed lyrics vs the original lyrics
      • Diversity measures: Measure of Textual Lexical Diversity (MTLD), Herdan's C, Summer's S, Dugast's U^2, and Maas' a^2
    • Readability Descriptors
      • Readability formulas: Flesch Reading Ease, Flesch Kincaid Grade, SMOG (Simple Measure of Gobbledygook), Automated Readability Index, Coleman Liau Index, Dale Chall Readability Score, Linsear Write Formula, Gunning Fog, Fernandez Huerta, Szigriszt Pazos and Gutierrez Polini
      • Difficult words: Number of difficult words (three or more syllables)
    • Rhyme Descriptors
      • Rhyme structures: Numbers of couplets, clerihews, alternating rhymes and nested rhymes
      • Rhyme words: Number of unique rhyming words, percentage of rhyming lines in the lyrics
      • Alliterations: Number of alliterations of length two, three, and four or more
    • Structural Descriptors
      • Element counts: Number of sections and verses
      • Distribution: Relation between the number of verses vs. sections and number of choruses vs sections
      • Title occurrences: Number of times the song's title appears
      • Pattern: Verse and chorus alternating, two verses and at least one chorus, two choruses and at least one verse
      • Start: Starts with chorus (binary attribute)
      • Ending: Ends with two chorus repetitions (binary attribute)
    • Emotional/Affective Descriptors
      • Sentiment scores: Positivity and negativity scores via AFINN, the sentiment lexicon by Bing Liu et al., the MPQA opinion corpus, the sentiment140 dataset, and the SentiWordNet lexicon
      • NRC: Emotion scores according to the NRC affect intensity lexicon
      • LIWC: Descriptors provided by LIWC
      • Happiness: Happiness score according to labMT
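
    The linked feature extractor is the reference implementation of these descriptors. To illustrate how a few of the simpler ones (line counts, token counts, a bigram ratio, a hapax legomenon ratio, and the zlib compression ratio) can be computed, here is a standard-library sketch. The whitespace tokenizer and the exact normalizations (e.g. hapax legomena counted relative to distinct tokens) are assumptions and may differ from the repository's code.

```python
import zlib
from collections import Counter


def lyrics_descriptors(lyrics: str) -> dict:
    """Compute a handful of simple lexical and diversity descriptors."""
    lines = lyrics.split("\n")
    non_blank = [ln for ln in lines if ln.strip()]
    tokens = lyrics.lower().split()          # naive whitespace tokenizer
    counts = Counter(tokens)
    bigrams = list(zip(tokens, tokens[1:]))
    raw = lyrics.encode("utf-8")
    return {
        # Line counts
        "lines": len(lines),
        "blank_lines": len(lines) - len(non_blank),
        "unique_lines": len(set(non_blank)),
        # Token counts
        "tokens": len(tokens),
        # n-gram ratio: distinct bigrams / all bigrams
        "unique_bigram_ratio": len(set(bigrams)) / len(bigrams) if bigrams else 0.0,
        # Hapax legomena as a share of distinct tokens (assumed normalization)
        "hapax_ratio": (sum(1 for c in counts.values() if c == 1) / len(counts)
                        if counts else 0.0),
        # Compression ratio: zlib-compressed size / original size in bytes
        "compression_ratio": len(zlib.compress(raw)) / len(raw) if raw else 0.0,
    }
```

    Repetitive lyrics compress well, so a low compression ratio signals repetition; for very short texts, zlib's fixed overhead can push the ratio above 1.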

  9. MUSDB18 lyrics extension

    • zenodo.org
    • data.niaid.nih.gov
    Updated Jun 25, 2021
    Cite
    Kilian Schulze-Forster; Clement S. J. Doire; Gaël Richard; Roland Badeau (2021). MUSDB18 lyrics extension [Dataset]. http://doi.org/10.5281/zenodo.3989267
    Available download formats: zip, txt, text/x-python
    Dataset updated
    Jun 25, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Kilian Schulze-Forster; Clement S. J. Doire; Gaël Richard; Roland Badeau
    Description

    This is a set of annotated lyrics transcripts for songs belonging to the MUSDB18 dataset. The set comprises lyrics of all songs which have English lyrics, i.e. 96 out of 100 songs for the training set and 45 out of 50 songs for the test set. MUSDB18 is a dataset for music source separation and provides the following separated tracks for each song: vocals, bass, drums, other (rest of the accompaniment), mixture.

    The lyrics transcripts, together with the audio files of MUSDB18, are a valuable resource for research on tasks such as text-informed singing voice separation, automatic lyrics alignment, automatic lyrics transcription, and singing voice synthesis and analysis. The provided data should be used for research purposes only.

    Disclaimer

    The lyrics were transcribed manually by the authors, who are not native English speakers, so the transcriptions are likely not 100% correct. The composers of the songs are the copyright holders of the original lyrics.

    The songs were divided into sections between 3 and 12 seconds long. When choosing section boundaries, the priority was that they correspond to natural pauses and do not cut vocal sounds. The sections do not necessarily correspond to lyrically meaningful lines. Most sections do not overlap; some overlap by 1 second. In some difficult cases where the words are barely intelligible, e.g. shouting in metal songs or mumbled words, we tried to make the transcriptions as phonetically accurate as possible and did not prioritize semantically meaningful phrases.

    Citation

    The dataset was built for the paper

    Schulze-Forster, K., Doire, C., Richard, G., & Badeau, R. "Phoneme Level Lyrics Alignment and Text-Informed Singing Voice Separation." IEEE/ACM Transactions on Audio, Speech and Language Processing (2021).

    If you use the data for your research, please cite the corresponding paper:

    @article{schulze2021phoneme,
     title={Phoneme Level Lyrics Alignment and Text-Informed Singing Voice Separation},
     author={Schulze-Forster, Kilian and Doire, Clement and Richard, Ga{\"e}l and Badeau, Roland},
     journal={IEEE/ACM Transactions on Audio, Speech and Language Processing},
     year={2021},
     publisher={IEEE}
    }

    Annotations

    For each section, the annotations comprise: the start and end time, the corresponding lyrics, and a label indicating one of the following four properties:

    (a) only one person is singing
    (b) several singers are pronouncing the same phonemes at the same time (possibly singing different notes)
    (c) several singers are pronouncing different phonemes simultaneously (possibly singing different notes)
    (d) no singing

    Segments labelled with property (b) or (c) do not necessarily have this property over the whole segment duration. As soon as several singers were present somewhere in a segment, label (b) was assigned; as soon as they sang different phonemes at the same time somewhere in the segment, label (c) was assigned. Properties (a) and (d) hold for the entire segment. Furthermore, segments with property (c) can contain either one or more (lead) singers singing words while background singers sing long vowels such as 'ah' or 'oh', or multiple singers singing different words at the same time. In the latter case, it was very difficult to recognise the sung words and to decide in which order to transcribe words or phrases sung simultaneously. These segments are marked with a '*', and it is recommended to reject them for most use cases.

    The annotations have the following format:

    Example:
    00:18 00:23 a i know the reasons why --> starts at 18 sec., ends at 23 sec., vocals type (a), lyrics: i know the reasons why
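
    For use outside the provided script, the annotation format above can be parsed directly. This is a minimal sketch, assuming one segment per line, times in MM:SS, and that the '*' marker appears somewhere on a flagged line (the source does not say exactly where it is placed). It drops flagged lines as recommended above, which would also drop any line whose lyrics happen to contain an asterisk. Sample indices assume MUSDB18's 44.1 kHz sample rate. It is not a replacement for musdb_lyrics_cut_audio.py.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

SAMPLE_RATE = 44_100  # MUSDB18 audio sample rate in Hz
VOCALS_TYPES = {"a", "b", "c", "d"}


@dataclass
class Segment:
    start: float       # seconds
    end: float         # seconds
    vocals_type: str   # 'a', 'b', 'c' or 'd'
    lyrics: str


def mmss_to_seconds(stamp: str) -> float:
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def parse_line(line: str) -> Optional[Segment]:
    """Parse 'MM:SS MM:SS <type> <lyrics>'. Blank lines and lines that
    contain a '*' (wherever it appears) are skipped by returning None."""
    line = line.strip()
    if not line or "*" in line:
        return None
    parts = line.split(maxsplit=3)
    if len(parts) < 3:
        raise ValueError(f"malformed annotation line: {line!r}")
    start, end, vocals_type = parts[:3]
    if vocals_type not in VOCALS_TYPES:
        raise ValueError(f"unknown vocals type {vocals_type!r} in {line!r}")
    lyrics = parts[3] if len(parts) == 4 else ""  # e.g. type (d), no singing
    return Segment(mmss_to_seconds(start), mmss_to_seconds(end), vocals_type, lyrics)


def parse_annotations(text: str, keep=("a",)) -> List[Segment]:
    """Parse a whole annotation file, keeping only the requested vocals types."""
    segments = (parse_line(ln) for ln in text.splitlines())
    return [s for s in segments if s is not None and s.vocals_type in keep]


def to_sample_range(seg: Segment, sr: int = SAMPLE_RATE) -> Tuple[int, int]:
    """Start/end sample indices, e.g. for slicing a (samples, channels) array."""
    return round(seg.start * sr), round(seg.end * sr)
```

    Keeping only type (a) by default mirrors the common case of wanting single-singer segments; pass keep=("a", "b") or similar to include more.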

    The Python script musdb_lyrics_cut_audio.py is provided to automatically cut the MUSDB songs into the annotated segments. The script requires the musdb and soundfile packages. The user needs to update the paths and select the desired sources and vocals types in lines 19-26. For each annotated segment, the script saves a wav file for each selected source, as well as the corresponding lyrics as a txt file. The MUSDB training partition is divided into a training and a validation set; the tracks for the validation set can be changed below line 29.

    The file words_and_phonemes.txt contains a list of all words and their decomposition into phonemes. The phonemes are written in 2-letter ARPABET style and obtained with the LOGIOS Lexicon Tool.

    License

    The data is licensed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, read the provided LICENSE.txt file, visit https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.

    The creators of MUSDB18 lyrics extension and their corresponding affiliation institutes are not liable for, and expressly exclude, all liability for loss or damage however and whenever caused to anyone by any use of MUSDB18 lyrics extension or any part of it.

    Acknowledgment

    The authors would like to thank Olumide Okubadejo and Sinead Namur for their help with transcribing and correcting part of the lyrics.

  10. Rap Lyrics Dataset

    • opendatabay.com
    Updated Jun 17, 2025
    Cite
    Datasimple (2025). Rap Lyrics Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/ea749641-0e75-4202-8c97-65dc7552e51a
    Dataset updated
    Jun 17, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Entertainment & Media Consumption
    Description

    This dataset was compiled by me for a personal project. It contains lyrics from 11 different artists including: Drake, J. Cole, Kendrick Lamar, Eminem, Nas, Skepta, Rapsody, Nicki Minaj, Dave, 2Pac, and Future.

    All data was compiled using Spotify's API and Genius' API.

    FEATURES
    • track_name: the name of each track
    • artist: the name of each artist
    • raw_lyrics: raw text of lyrics scraped from the Genius website
    • artist_verses: text extracted from raw_lyrics, containing only the verses performed by each artist

    NOTE: Some entries in raw_lyrics may use a different formatting structure from others, so text consistency will vary.
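
    The source does not say how artist_verses was derived from raw_lyrics. One plausible approach, assuming the raw text keeps Genius-style section headers such as [Verse 1: Drake], is to keep only the sections whose header credits the artist. This is a hypothetical sketch, not the dataset author's code: the header format is an assumption (the note above warns that formatting varies), and sections without an explicit credit, such as [Chorus], are dropped.

```python
import re

# Genius-style section header, e.g. "[Verse 1: Drake]" (assumed format).
HEADER = re.compile(r"^\[(?P<section>[^\]:]+)(?::\s*(?P<credits>[^\]]+))?\]$")


def credited_names(credits: str) -> list:
    """Split 'Drake & Future' or 'A, B and C' into lowercase names."""
    return [n.strip().lower() for n in re.split(r"&|,| and ", credits) if n.strip()]


def extract_artist_verses(raw_lyrics: str, artist: str) -> str:
    """Keep the lines of sections whose header credits `artist`.
    Sections without an explicit credit, e.g. "[Chorus]", are dropped."""
    kept, in_artist_section = [], False
    for line in raw_lyrics.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            credits = header.group("credits") or ""
            in_artist_section = artist.lower() in credited_names(credits)
            continue
        if in_artist_section and line:
            kept.append(line)
    return "\n".join(kept)
```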

    What can this dataset be used for?
    • Text analysis
    • Text pre-processing
    • Text EDA
    • Text classification

    License

    CC0

    Original Data Source: Rap Lyrics Dataset

  11. artist-lyrics-dataset

    • huggingface.co
    Updated Apr 22, 2024
    Cite
    Connor Homayouni (2024). artist-lyrics-dataset [Dataset]. https://huggingface.co/datasets/SpartanCinder/artist-lyrics-dataset
    Dataset updated
    Apr 22, 2024
    Authors
    Connor Homayouni
    License

    https://choosealicense.com/licenses/cc/

    Description

    SpartanCinder/artist-lyrics-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. Arab-Andalusian music lyrics dataset

    • live.european-language-grid.eu
    • data.niaid.nih.gov
    Updated Oct 28, 2023
    Cite
    (2023). Arab-Andalusian music lyrics dataset [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7554
    Available download formats: json
    Dataset updated
    Oct 28, 2023
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The dataset contains lyrics for the songs in the Arab-Andalusian music collection curated within the CompMusic project that belong to the nawbas "Isbahan", "Maya", "Raml Maya", "Gharibat al-Husayn", "Hijaz Kabir", "Hijaz Msharqi", "Istihlal", "Rasd", and "Rasd Dayl".

    Lyrics are stored in two formats: as Tab Separated Values (TSV) files and as JSON files.

    Each file is identified by its MusicBrainz recording ID (MBID).

    The lyrics are stored both in their original Arabic script (folder 'original') and in a romanized/transliterated version (folder 'transliterated') following the ALA-LC (American Library Association-Library of Congress) romanization standard.

    Corresponding audio files are available from the Arab-Andalusian music corpus, as well as the Internet Archive URL included in the metadata file ('metadata.csv').

    For more information about the exact format and contents of the dataset, please consult the README provided in the archive.

    For more information, please refer to http://compmusic.upf.edu/corpora.

  13. Kanye West Lyrics Dataset

    • kaggle.com
    Updated Apr 27, 2020
    Cite
    David Gottlieb (2020). Kanye West Lyrics Dataset [Dataset]. https://www.kaggle.com/convolutionalnn/kanye-west-lyrics-dataset/code
    Dataset updated
    Apr 27, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    David Gottlieb
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This is a text file of Kanye West's lyrics from 9 of his studio albums (The College Dropout, Late Registration, Graduation, 808s & Heartbreak, My Beautiful Dark Twisted Fantasy, Ye, Yeezus, The Life of Pablo, and Jesus is King). It was made for a text generation project, so the file can be used directly as input data without parsing a CSV file.

    Content

    Each song is separated by two spaces; otherwise, the file is one uninterrupted document of lyrics.
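
    Because the file is a single plain-text document, it can be fed to a text generation model without any CSV parsing. The source does not specify a model; assuming a character-level approach, a minimal preparation step builds a character vocabulary and next-character training pairs. The function names are illustrative, and the path passed to load_corpus is whatever the downloaded file is called.

```python
from typing import Dict, List, Tuple


def load_corpus(path: str) -> str:
    """Read the whole lyrics file as a single string."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def build_vocab(text: str) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map each distinct character to an integer id and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text: str, stoi: Dict[str, int]) -> List[int]:
    return [stoi[ch] for ch in text]


def decode(ids: List[int], itos: Dict[int, str]) -> str:
    return "".join(itos[i] for i in ids)


def training_windows(ids: List[int], context: int) -> List[Tuple[List[int], List[int]]]:
    """(input, target) pairs for next-character prediction: the target is
    the input window shifted one character to the right."""
    return [(ids[i:i + context], ids[i + 1:i + context + 1])
            for i in range(len(ids) - context)]
```

    For a large corpus, a training loop would normally sample windows on the fly rather than materialise every pair in memory as this sketch does.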

    Acknowledgements

    Lyrics found on Genius: https://genius.com/artists/Kanye-west

    Inspiration

    I wanted to generate my own Kanye songs :)

  14. Music Dataset: Lyrics and Metadata from 1950 to 2019

    • narcis.nl
    • data.mendeley.com
    Updated Oct 23, 2020
    Cite
    Moura, L (via Mendeley Data) (2020). Music Dataset: Lyrics and Metadata from 1950 to 2019 [Dataset]. http://doi.org/10.17632/3t9vbwxgr5.3
    Dataset updated
    Oct 23, 2020
    Dataset provided by
    Data Archiving and Networked Services (DANS)
    Authors
    Moura, L (via Mendeley Data)
    Description

    This dataset was studied in the paper "Temporal Analysis and Visualisation of Music", available at https://sol.sbc.org.br/index.php/eniac/article/view/12155

    This dataset provides lyrics from 1950 to 2019 together with music metadata such as sadness, danceability, loudness, and acousticness. The lyrics themselves are also included and can be used for natural language processing.

    The audio data was scraped using the Echo Nest® API-integrated engine through the spotipy Python package. The spotipy API lets the user search for specific genres, artists, songs, release dates, etc. To obtain the lyrics, we used the Lyrics Genius® API as the base URL, requesting data by song title and artist name.

  15. Jamendo Lyrics Dataset

    • paperswithcode.com
    Updated Oct 18, 2019
    Cite
    Daniel Stoller; Simon Durand; Sebastian Ewert (2019). Jamendo Lyrics Dataset [Dataset]. https://paperswithcode.com/dataset/jamendo-lyrics
    Dataset updated
    Oct 18, 2019
    Authors
    Daniel Stoller; Simon Durand; Sebastian Ewert
    Description

    Dataset for lyrics alignment and transcription evaluation. It contains 20 music pieces under CC license from the Jamendo website along with their lyrics, with:

    • Manual annotations indicating the start time of each word in the audio file
    • Predictions of start and end times for each word from both of the models presented in the paper
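
    Since the dataset pairs manual word start times with model predictions, alignment quality can be summarized by comparing the two sequences. The sketch below computes the per-word absolute onset error, its mean and median, and the percentage of words within a tolerance. The 0.3 s default tolerance and the assumption that both lists cover the same words in the same order are ours; the paper's exact evaluation protocol may differ.

```python
from statistics import mean, median
from typing import Dict, Sequence


def alignment_metrics(reference: Sequence[float], predicted: Sequence[float],
                      tolerance: float = 0.3) -> Dict[str, float]:
    """Compare predicted word start times with manual annotations (seconds).

    Assumes both sequences list the same words in the same order.
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need equal-length, non-empty onset lists")
    errors = [abs(p - r) for r, p in zip(reference, predicted)]
    return {
        "mean_abs_error": mean(errors),
        "median_abs_error": median(errors),
        # Percentage of words whose onset is within `tolerance` seconds
        "pct_within_tolerance": 100.0 * sum(e <= tolerance for e in errors) / len(errors),
    }
```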

  16. Jam-ALT Dataset

    • paperswithcode.com
    Updated Nov 22, 2023
    Cite
    Ondřej Cífka; Constantinos Dimitriou; Cheng-i Wang; Hendrik Schreiber; Luke Miner; Fabian-Robert Stöter (2023). Jam-ALT Dataset [Dataset]. https://paperswithcode.com/dataset/jam-alt
    Dataset updated
    Nov 22, 2023
    Authors
    Ondřej Cífka; Constantinos Dimitriou; Cheng-i Wang; Hendrik Schreiber; Luke Miner; Fabian-Robert Stöter
    Description

    Jam-ALT is a revision of the JamendoLyrics dataset (80 songs in 4 languages), adapted for use as an automatic lyrics transcription (ALT) benchmark.

    The lyrics have been revised according to a newly compiled annotation guide, which includes rules about spelling, punctuation, and formatting. The audio is identical to that of the JamendoLyrics dataset. However, only 79 songs are included, as one of the 20 French songs was removed due to concerns about potentially harmful content.
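    Since Jam-ALT is meant as a transcription benchmark, a word error rate (WER) computation is the natural companion. The sketch below is a textbook word-level Levenshtein WER under simple assumptions (lowercasing and whitespace tokenization); it does not reproduce the spelling, punctuation, and formatting rules of the Jam-ALT annotation guide or the benchmark's official scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the previous reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)

print(word_error_rate("the sun is shining today", "the son is shining"))  # 0.4
```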

  17. rap-lyrics-v2

    • huggingface.co
    Updated Nov 26, 2023
    Cite
    Nate Raw (2023). rap-lyrics-v2 [Dataset]. https://huggingface.co/datasets/nateraw/rap-lyrics-v2
    Dataset updated
    Nov 26, 2023
    Authors
    Nate Raw
    Description

    Dataset Card for "rap-lyrics-v2"

    More information needed.

  18. Reasons for getting lyrics to songs worldwide 2017

    • statista.com
    Updated May 29, 2024
    Cite
    Statista (2024). Reasons for getting lyrics to songs worldwide 2017 [Dataset]. https://www.statista.com/statistics/799899/music-song-lyrics-reasons/
    Dataset updated
    May 29, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Nov 2017
    Area covered
    Worldwide
    Description

    The statistic shows the most common reasons why music consumers get lyrics to songs worldwide as of November 2017. During the survey, 72 percent of respondents stated that they got the lyrics to songs in order to be able to sing along.

  19. Collection of Diverse Songs with Lyrics

    • opendatabay.com
    .csv
    Updated Jun 20, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Datasimple (2025). Collection of Diverse Songs with Lyrics [Dataset]. https://www.opendatabay.com/data/ai-ml/55b601f8-0e16-4583-84b2-fe195ec79ae9
    Available download formats: .csv
    Dataset updated
    Jun 20, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Entertainment & Media Consumption
    Description

    This dataset comprises a collection of unique and popular songs from various genres released over the past decades. With a total of [insert number] entries, each song is accompanied by details such as its title, artist, album, release year, genre, and a snippet of its lyrics.

    The dataset offers a rich variety of music spanning multiple genres including pop, rock, hip hop, R&B, indie, EDM, and more. From timeless classics to contemporary hits, it provides a comprehensive selection of songs that have captivated audiences around the world.

    Researchers, music enthusiasts, and data analysts can use this dataset for many purposes, including:

    • exploratory data analysis to uncover trends and patterns in music genres over time;
    • building recommendation systems for personalized music suggestions based on user preferences;
    • sentiment analysis of song lyrics to understand the emotional themes associated with different genres;
    • studying the evolution of popular music and its cultural impact over the years;
    • training machine learning models for music classification, genre prediction, and more.

    Whether you're a data scientist, a music aficionado, or simply curious about the world of music, this dataset provides a valuable resource for exploring the diverse landscape of popular songs. Download it now and embark on a journey through the melodies that have shaped our musical landscape.
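    For the first use case listed (trends in genres over time), a minimal standard-library sketch might count songs per genre and decade. The column names `release_year` and `genre` and the sample rows are hypothetical stand-ins for the real CSV, whose exact header is not given here.

```python
import csv
import io
from collections import Counter

# Toy CSV excerpt for illustration; the real file's columns may differ.
SAMPLE = """title,artist,release_year,genre
Song A,Artist 1,1975,rock
Song B,Artist 2,1978,rock
Song C,Artist 3,1995,pop
Song D,Artist 4,2012,pop
Song E,Artist 5,2016,hip hop
"""

def genre_counts_by_decade(rows):
    """Count songs per (decade, genre) pair, e.g. (1970, 'rock')."""
    counts = Counter()
    for row in rows:
        decade = int(row["release_year"]) // 10 * 10  # 1975 -> 1970
        counts[(decade, row["genre"])] += 1
    return counts

counts = genre_counts_by_decade(csv.DictReader(io.StringIO(SAMPLE)))
for (decade, genre), n in sorted(counts.items()):
    print(f"{decade}s  {genre:<8} {n}")
```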

    License

    CC0

    Original Data Source: Collection of Diverse Songs with Lyrics

  20. Rap Lyrics

    • kaggle.com
    Updated Sep 21, 2023
    Cite
    Jamie Welsh (2023). Rap Lyrics [Dataset]. https://www.kaggle.com/datasets/jamiewelsh2/rap-lyrics/discussion?sort=undefined
    Dataset updated
    Sep 21, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jamie Welsh
    Description

    Rap lyrics were obtained via web scraping for 100 of the most influential rappers of all time (see https://beats-rhymes-lists.com/lists/best-rappers-of-all-time/). The data was then organized into an easy-to-understand format using pandas. Each row corresponds to an individual lyric in a song, and the song name and artist name appear as columns as well.
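    Because each row holds a single lyric, song-level analyses first need the rows grouped back together. A minimal standard-library sketch, assuming hypothetical column names `artist`, `song`, and `lyric` and that rows for a song appear in their original order:

```python
import csv
import io
from collections import defaultdict

# Toy excerpt in the assumed one-lyric-per-row layout.
SAMPLE = """artist,song,lyric
Artist 1,Track X,first line
Artist 1,Track X,second line
Artist 2,Track Y,only line
"""

def songs_from_rows(rows):
    """Rebuild full song texts, keyed by (artist, song), preserving row order."""
    songs = defaultdict(list)
    for row in rows:
        songs[(row["artist"], row["song"])].append(row["lyric"])
    return {key: "\n".join(lines) for key, lines in songs.items()}

songs = songs_from_rows(csv.DictReader(io.StringIO(SAMPLE)))
print(songs[("Artist 1", "Track X")])
```

    With pandas, an equivalent grouping would be `df.groupby(["artist", "song"])["lyric"].agg("\n".join)`, under the same column-name assumptions.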

Cite
Bruno Kreiner (2017). genius-lyrics [Dataset]. https://huggingface.co/datasets/brunokreiner/genius-lyrics

genius-lyrics

brunokreiner/genius-lyrics

207 scholarly articles cite this dataset
Dataset updated
Aug 25, 2017
Authors
Bruno Kreiner
Description

Dataset Card for Dataset Name

  Dataset Description

  Dataset Summary

This dataset consists of roughly 480k English-language lyrics (language classified using an nltk language classifier) with some additional metadata. The id corresponds to the Spotify ID. The metadata was taken from the Million Playlist Challenge at AIcrowd. The lyrics were crawled with the lyricsgenius Python package, which uses the genius.com search function, using "[song name] [artist name]" as the search string. There is no… See the full description on the dataset page: https://huggingface.co/datasets/brunokreiner/genius-lyrics.
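Because the lyrics were fetched with a free-text "[song name] [artist name]" search, a hit may not be the requested song. One possible sanity check, sketched here as an assumption rather than anything the dataset authors describe, compares the requested title with the title of the returned hit using `difflib.SequenceMatcher`. The helper names, the normalization rule, and the 0.8 threshold are illustrative choices.

```python
import re
from difflib import SequenceMatcher

def normalize(text: str) -> str:
    """Lowercase and keep only ASCII letters, digits, and spaces.

    Non-ASCII letters are dropped too, a known limitation of this simple rule.
    """
    return re.sub(r"[^a-z0-9 ]+", "", text.lower()).strip()

def search_string(song: str, artist: str) -> str:
    """Free-text query in the '[song name] [artist name]' form (hypothetical helper)."""
    return f"{song} {artist}"

def looks_like_match(requested_title: str, returned_title: str, threshold: float = 0.8) -> bool:
    """Heuristic: treat the hit as correct if the normalized titles are similar enough."""
    ratio = SequenceMatcher(None, normalize(requested_title), normalize(returned_title)).ratio()
    return ratio >= threshold

print(search_string("Imagine", "John Lennon"))                         # Imagine John Lennon
print(looks_like_match("Don't Stop Believin'", "Dont Stop Believin"))  # True
```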
