100+ datasets found

genius-lyrics
huggingface.co
Updated Aug 25, 2017
Bruno Kreiner (2017). genius-lyrics [Dataset]. https://huggingface.co/datasets/brunokreiner/genius-lyrics
Dataset updated
Aug 25, 2017
Authors
Bruno Kreiner
Description
Dataset Card for Dataset Name

Dataset Description Dataset Summary

This dataset consists of roughly 480k English lyrics (language classified using the nltk language classifier) along with some additional metadata. The id corresponds to the Spotify id. The metadata was taken from the Million Playlist Challenge @ AICrowd. The lyrics were crawled with the lyricsgenius Python package, which uses the genius.com search function, with "[song name] [artist name]" as the query string. There is no… See the full description on the dataset page: https://huggingface.co/datasets/brunokreiner/genius-lyrics.
Music Dataset: Lyrics and Metadata from 1950 to 2019
data.mendeley.com
Updated Aug 24, 2020
Luan Moura (2020). Music Dataset: Lyrics and Metadata from 1950 to 2019 [Dataset]. http://doi.org/10.17632/3t9vbwxgr5.2
Unique identifier
https://doi.org/10.17632/3t9vbwxgr5.2
Dataset updated
Aug 24, 2020
Authors
Luan Moura
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset provides a list of lyrics from 1950 to 2019 together with music metadata such as sadness, danceability, loudness, and acousticness. The lyrics themselves are also included and can be used for natural language processing.

The audio data was scraped using the Echo Nest® API engine through the spotipy Python package. spotipy lets the user search for specific genres, artists, songs, release dates, etc. To obtain the lyrics, we used the Lyrics Genius® API as the base URL for requesting data based on the song title and artist name.
autonlp-data-song-lyrics
huggingface.co
Updated Mar 1, 2022
Julien Simon (2022). autonlp-data-song-lyrics [Dataset]. https://huggingface.co/datasets/juliensimon/autonlp-data-song-lyrics
Dataset updated
Mar 1, 2022
Authors
Julien Simon
Description
AutoNLP Dataset for project: song-lyrics

Table of contents

Dataset Description Languages

Dataset Structure Data Instances Data Fields Data Splits

Dataset Description

This dataset has been automatically processed by AutoNLP for project song-lyrics.

Languages

The BCP-47 code for the dataset's language is en.

Dataset Structure Data Instances

A sample from this dataset looks as follows: [ {… See the full description on the dataset page: https://huggingface.co/datasets/juliensimon/autonlp-data-song-lyrics.
Data from: Song Interpretation Dataset
zenodo.org
data.niaid.nih.gov
bin
Updated Jan 9, 2023
Yixiao Zhang; Junyan Jiang; Gus Xia; Simon Dixon (2023). Song Interpretation Dataset [Dataset]. http://doi.org/10.5281/zenodo.7019124
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.7019124
Dataset updated
Jan 9, 2023
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Yixiao Zhang; Junyan Jiang; Gus Xia; Simon Dixon
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Song Interpretation Dataset combines data from two sources: (1) music and metadata from the Music4All Dataset and (2) lyrics and user interpretations from SongMeanings.com. We design a music metadata-based matching algorithm that aligns matching items in the two datasets with each other. In the end, we successfully match 25.47% of the tracks in the Music4All Dataset.

The dataset contains audio excerpts from 27,834 songs (30 seconds each, recorded at 44.1 kHz), the corresponding music metadata, about 490,000 user interpretations of the lyric text, and the number of votes given for each of these user interpretations. The average length of the interpretations is 97 words. Music in the dataset covers various genres, of which the top 5 are: Rock (11,626), Pop (6,071), Metal (2,516), Electronic (2,213) and Folk (1,760).

For more details, please refer to our paper "Interpreting Song Lyrics with an Audio-Informed Pre-trained Language Model".
turkish-lyric-to-genre
huggingface.co
Updated Aug 10, 2023
Efe (2023). turkish-lyric-to-genre [Dataset]. https://huggingface.co/datasets/Veucci/turkish-lyric-to-genre
Dataset updated
Aug 10, 2023
Authors
Efe
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Song Lyrics Dataset

Description

This dataset contains a collection of song lyrics from various artists and genres in Turkish. It is intended to be used for research, analysis, and other non-commercial purposes.

Dataset Details

The dataset is organized in a tabular format with the following columns:

Genre (int): Genre of the lyrics

Lyrics (str): The lyrics of the song.

Pop: 1085 rows

Rock: 765 rows

Hip-Hop: 969 rows

Arabesk: 353 rows

Usage… See the full description on the dataset page: https://huggingface.co/datasets/Veucci/turkish-lyric-to-genre.
LFM2b Lyrics Descriptor Analyses
data.niaid.nih.gov
zenodo.org
Updated Apr 15, 2024
Stefan Brandl (2024). LFM2b Lyrics Descriptor Analyses [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7740044
Dataset updated
Apr 15, 2024
Dataset provided by
Elisabeth Lex
Emilia Parada-Cabaleiro
Markus Schedl
Marcin Skowron
Maximilian Mayerl
Eva Zangerle
Stefan Brandl
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
LFM2b Lyrics Descriptor Analyses

This dataset provides lyrics descriptors for 580,000 songs, including lexical, diversity-related, readability, rhyme, structural, and emotional descriptors. This dataset was the basis of an analysis of the evolution of song lyrics over the course of five decades and five genres (pop, rock, rap, country, and R&B).

Dataset Generation

As a basis for the dataset, we relied on the LFM-2b dataset (http://www.cp.jku.at/datasets/LFM-2b) of listening events by Last.fm. It contains more than two billion listening records, and more than fifty million songs by more than five million artists. We enrich the dataset with information about songs' release year, genre, lyrics, and popularity information. For quantifying the popularity of tracks and lyrics, we distinguish between the listening count, i.e., the number of listening events in the LFM-2b dataset, and lyrics view count, i.e., the number of views of lyrics on the Genius platform (https://genius.com). Release years, genre information, and lyrics are obtained from the Genius platform. Genres are expressed by one primary genre. We used https://polyglot.readthedocs.io/ to automatically infer the language of the lyrics and considered only English lyrics. Adopting this procedure, we ultimately obtain complete information for 582,759 songs.

Data and Features

We provide the full dataset, containing features for 582,759 songs (full_dataset.json.gz). For each song, the dataset contains track title and artist information, genre, popularity, and release date information, and a wide variety of lexical, diversity-related, readability, rhyme, structural, and emotional descriptors.

For further information on the semantics of the features, we provide a short overview below. Please check the implementation of the feature extractor at https://github.com/MaximilianMayerl/CorrelatesOfSongLyrics/ for further details.

Track and artist

Genre

Popularity descriptors:

Lyrics view count

Last.fm playcount

Lexical descriptors:

Line counts: Total number of lines, blank lines, unique lines, ratio of blank and repeated lines

Token counts: Number of tokens, characters, repeated token ratio, unique tokens per line, and avg. tokens per line

Character counts: Number of [!?.,:;"-()] characters (total amount of these characters and individual counts per character) and digits, ratio of punctuation and digits

Token length: Average length of tokens

n-gram ratios: Ratio of unique bigrams and trigrams

Legomenon ratios: Ratio of hapax legomena, dis legomena and tris legomena

Parts of speech: Frequency of adjectives, adverbs, nouns, pronouns, verbs

Past tense: Ratio of verbs in past tense to other verbs

Stop words: Number and ratio of stop words, stop words per line

Uncommon words: Number of uncommon words (i.e., words not contained in WordNet)
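
To make two of the lexical descriptors above concrete, here is a minimal sketch of legomenon ratios and unique n-gram ratios. It is not taken from the authors' feature extractor: the function names, the whitespace tokenisation in the usage lines, and the choice to normalise legomenon counts by vocabulary size are all assumptions made for illustration.

```python
from collections import Counter


def legomenon_ratios(tokens):
    """Share of distinct tokens that occur exactly once, twice, and three times.

    Assumption: ratios are normalised by vocabulary size (distinct tokens);
    the original extractor may normalise differently.
    """
    counts = Counter(tokens)
    vocab = len(counts)
    if vocab == 0:
        return {"hapax": 0.0, "dis": 0.0, "tris": 0.0}
    # Map "number of occurrences" -> "how many distinct tokens occur that often".
    freq_of_freq = Counter(counts.values())
    return {
        "hapax": freq_of_freq[1] / vocab,
        "dis": freq_of_freq[2] / vocab,
        "tris": freq_of_freq[3] / vocab,
    }


def unique_ngram_ratio(tokens, n):
    """Number of distinct n-grams divided by the total number of n-grams."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


# Hypothetical toy lyric, tokenised on whitespace.
tokens = "la la la baby baby oh".split()
print(legomenon_ratios(tokens))
print(unique_ngram_ratio(tokens, 2))
```

Highly repetitive lyrics push the unique n-gram ratio down, which is why these ratios are useful alongside raw token counts.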

Diversity descriptors

Compression ratio: Ratio of the size of zlib compressed lyrics vs the original lyrics

Diversity measures: Measure of Textual Lexical Diversity (MTLD), Herdan's C, Summer's S, Dugast's U^2, and Maas' a^2
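
As an illustration of the diversity descriptors, the sketch below computes a zlib compression ratio and Herdan's C (log of vocabulary size over log of token count). The function names, the UTF-8 encoding, and the default zlib compression level are assumptions; the original feature extractor may differ in these details.

```python
import math
import zlib


def compression_ratio(lyrics: str) -> float:
    """Size of the zlib-compressed lyrics divided by the original size in bytes."""
    raw = lyrics.encode("utf-8")  # assumption: lyrics are measured as UTF-8 bytes
    if not raw:
        return 0.0
    return len(zlib.compress(raw)) / len(raw)


def herdans_c(tokens) -> float:
    """Herdan's C = log(V) / log(N), with V distinct tokens and N total tokens."""
    n = len(tokens)
    v = len(set(tokens))
    if n < 2:
        return 0.0  # log(N) would be zero or undefined
    return math.log(v) / math.log(n)


# Hypothetical inputs: a very repetitive lyric compresses far better.
print(compression_ratio("na " * 200))
print(herdans_c("we all live in a yellow submarine yellow submarine".split()))
```

A lower compression ratio indicates more repetition, so it acts as a rough inverse measure of lexical diversity.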

Readability Descriptors

Readability formulas: Flesch Reading Ease, Flesch Kincaid Grade, SMOG (Simple Measure of Gobbledygook), Automated Readability Index, Coleman Liau Index, Dale Chall Readability Score, Linsear Write Formula, Gunning Fog, Fernandez Huerta, Szigriszt Pazos and Gutierrez Polini

Difficult words: Number of difficult words (three or more syllables)

Rhyme Descriptors

Rhyme structures: Numbers of couplets, clerihews, alternating rhymes and nested rhymes

Rhyme words: Number of unique rhyming words, percentage of rhyming lines in the lyrics

Alliterations: Number of alliterations of length two, three, and four or more

Structural Descriptors

Element counts: Number of sections and verses

Distribution: Relation between the number of verses vs. sections and number of choruses vs sections

Title occurrences: Number of times the song's title appears

Pattern: Verse and chorus alternating, two verses and at least one chorus, two choruses and at least one verse

Start: Starts with chorus (binary attribute)

Ending: Ends with two chorus repetitions (binary attribute)

Emotional/Affective Descriptors

Sentiment scores: Positivity and negativity scores via AFINN, the sentiment lexicon by Bing Liu et al., the MPQA opinion corpus, the sentiment140 dataset, and the SentiWordNet lexicon

NRC: Emotion scores according to the NRC affect intensity lexicon

LIWC: Descriptors provided by LIWC

Happiness: Happiness score according to labMT
Lyrics for Billboards Top 100 Songs 1946 - 2022
kaggle.com
Updated Mar 21, 2024
Rhaam Rozenberg (2024). Lyrics for Billboards Top 100 Songs 1946 - 2022 [Dataset]. https://www.kaggle.com/datasets/rhaamrozenberg/billboards-top-100-song-1946-to-2022-lyrics
Dataset updated
Mar 21, 2024
Dataset provided by
Kaggle
Authors
Rhaam Rozenberg
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
A dataset with the lyrics of Billboard's top 100 songs from 1946 to 2022. The dataset is about 85% complete. It contains information such as song name, artist name and, of course, the song lyrics. Note: not every year has 100 songs.
lyrics-dataset
huggingface.co
Updated Sep 12, 2024
Younes Matrab (2024). lyrics-dataset [Dataset]. https://huggingface.co/datasets/mrYou/lyrics-dataset
Dataset updated
Sep 12, 2024
Authors
Younes Matrab
Description
mrYou/lyrics-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
MUSDB18 lyrics extension
zenodo.org
data.niaid.nih.gov
text/x-python, txt +1
Updated Jun 25, 2021
Kilian Schulze-Forster; Clement S. J. Doire; Gaël Richard; Roland Badeau (2021). MUSDB18 lyrics extension [Dataset]. http://doi.org/10.5281/zenodo.3989267
Available download formats: zip, txt, text/x-python
Unique identifier
https://doi.org/10.5281/zenodo.3989267
Dataset updated
Jun 25, 2021
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Kilian Schulze-Forster; Clement S. J. Doire; Gaël Richard; Roland Badeau
Description
This is a set of annotated lyrics transcripts for songs belonging to the MUSDB18 dataset. The set comprises lyrics of all songs which have English lyrics, i.e. 96 out of 100 songs for the training set and 45 out of 50 songs for the test set. MUSDB18 is a dataset for music source separation and provides the following separated tracks for each song: vocals, bass, drums, other (rest of the accompaniment), mixture.

The lyrics transcripts, together with the audio files of MUSDB18, are a valuable resource for research on tasks such as text-informed singing voice separation, automatic lyrics alignment, automatic lyrics transcription, and singing voice synthesis and analysis. The provided data should be used for research purposes only.

Disclaimer

The lyrics were transcribed manually by the authors who are not native English speakers. It is likely that the transcriptions are not 100% correct. The composers of the songs are the copyright holders of the original lyrics.

The songs were divided into sections of lengths between 3 and 12 seconds. The priority when choosing the section boundaries was that they correspond to natural pauses and do not cut vocal sounds. The sections do not necessarily correspond to lyrically meaningful lines. Most of the sections do not overlap, some have an overlap of 1 second. In some difficult cases, e.g. shouting in metal songs or mumbled words, where the words are barely intelligible, we made an effort to make the transcriptions as accurate as possible phonetically and did not prioritize semantically meaningful phrases.

Citation

The dataset was built for the paper

Schulze-Forster, K., Doire, C., Richard, G., & Badeau, R. "Phoneme Level Lyrics Alignment and Text-Informed Singing Voice Separation." IEEE/ACM Transactions on Audio, Speech and Language Processing (2021).

If you use the data for your research, please cite the corresponding paper:

@article{schulze2021phoneme, title={Phoneme Level Lyrics Alignment and Text-Informed Singing Voice Separation}, author={Schulze-Forster, Kilian and Doire, Clement and Richard, Ga{\"e}l and Badeau, Roland}, journal={IEEE/ACM Transactions on Audio, Speech and Language Processing}, year={2021}, publisher={IEEE} }

Annotations

For each section, the annotations comprise: the start and end time, the corresponding lyrics, and a label indicating one of the following four properties:

(a) only one person is singing
(b) several singers are pronouncing the same phonemes at the same time (possibly singing different notes)
(c) several singers are pronouncing different phonemes simultaneously (possibly singing different notes)
(d) no singing

Segments that are labelled with the property (b) or (c) do not necessarily have this property over the whole segment duration. As soon as several singers are present somewhere in a segment, label (b) was assigned; as soon as they sang different phonemes at the same time somewhere in the segment, label (c) was assigned. Properties (a) and (d) are valid for the entire segment. Furthermore, segments with property (c) can contain either some (lead) singer(s) singing some words in the presence of background singers singing long vowels such as 'ah' or 'oh', or multiple singers who sing different words at the same time. In the latter case, it was very difficult to recognise the sung words and to decide in which order to transcribe words or phrases sung simultaneously. These segments are marked with a '*' and it is recommended to reject them for most use cases.

The annotations have the following format:

Example:
00:18 00:23 a i know the reasons why --> starts at 18 sec., ends at 23 sec., vocals type (a), lyrics: i know the reasons why
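
A minimal parsing sketch for lines like the example above. It assumes each annotation line is whitespace-separated into a start time (mm:ss), an end time (mm:ss), a one-letter vocals type, and the remaining text as lyrics; the `Segment` type and `parse_annotation` name are hypothetical, and the '*' marker for difficult segments is not handled because its exact position in a line is not shown here.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start_s: int       # segment start in seconds
    end_s: int         # segment end in seconds
    vocals_type: str   # one of "a", "b", "c", "d"
    lyrics: str        # empty when the line carries no lyrics


def _to_seconds(stamp: str) -> int:
    """Convert an mm:ss time stamp to whole seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def parse_annotation(line: str) -> Segment:
    """Parse one annotation line under the assumed 'start end type lyrics' layout."""
    parts = line.strip().split(maxsplit=3)
    if len(parts) < 3:
        raise ValueError(f"unexpected annotation line: {line!r}")
    start, end, label = parts[:3]
    lyrics = parts[3] if len(parts) == 4 else ""
    return Segment(_to_seconds(start), _to_seconds(end), label, lyrics)


seg = parse_annotation("00:18 00:23 a i know the reasons why")
print(seg.start_s, seg.end_s, seg.vocals_type, seg.lyrics)
```

Keeping the lyrics as the unsplit remainder of the line means words are never lost, however many spaces they contain.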

The Python script musdb_lyrics_cut_audio.py is provided to automatically cut the MUSDB songs into the annotated segments. The script requires the musdb and soundfile packages. The user needs to update the paths and select the desired sources and vocals types in lines 19-26. The script saves wav files for each selected source for each annotated segment, as well as the corresponding lyrics as a txt file. The MUSDB training partition is divided into a training and a validation set. The tracks for the validation set can be changed below line 29.

The file words_and_phonemes.txt contains a list of all words and their decomposition into phonemes. The phonemes are written in 2-letter ARPABET style and obtained with the LOGIOS Lexicon Tool.

License

The data is licensed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, read the provided LICENSE.txt file, visit https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.

The creators of MUSDB18 lyrics extension and their corresponding affiliation institutes are not liable for, and expressly exclude, all liability for loss or damage however and whenever caused to anyone by any use of MUSDB18 lyrics extension or any part of it.

Acknowledgment

The authors would like to thank Olumide Okubadejo and Sinead Namur for their help with transcribing and correcting part of the lyrics.
Lyrics Dataset of French Rapper
kaggle.com
Updated Jan 6, 2025
Carl-Dimitri FOGUE (2025). Lyrics Dataset of French Rapper [Dataset]. https://www.kaggle.com/datasets/carldimitrifogue/rap-lyrics-dataset/versions/2
Dataset updated
Jan 6, 2025
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Carl-Dimitri FOGUE
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Area covered
French
Description
Description:

This dataset was created as part of my master's degree project focusing on programming. I used the Genius API and Kedro to create this data frame. One of the objectives was to fine-tune a large language model (LLM) to mimic the unique styles and linguistic patterns of French rappers.

Contents: The dataset includes song lyrics from various French rap artists.

Use Case: This dataset is particularly valuable for those who are interested in:

Analyzing linguistic characteristics of French rap.

Generating rap lyrics using fine-tuned language models.

Understanding stylistic elements and vocabulary unique to French rap culture.

Key Features:

Language Focused: mostly in French, though some verses may be in English, Arabic, or Creole (Martinique, Guadeloupe); it offers a rich source of non-English text for NLP tasks.

Artist Diversity: represents a wide range of French rappers, from mainstream artists to underground talents, and from the 90's to today.

Structure: the data is already cleaned and ready to use, but there could be some problems here and there, so feel free to tell me!

Disclaimer: This dataset is intended for educational and research purposes only. Users are responsible for ensuring their use complies with applicable copyright laws.
artist-lyrics-dataset
huggingface.co
Updated Apr 22, 2024
Connor Homayouni (2024). artist-lyrics-dataset [Dataset]. https://huggingface.co/datasets/SpartanCinder/artist-lyrics-dataset
Dataset updated
Apr 22, 2024
Authors
Connor Homayouni
License
https://choosealicense.com/licenses/cc/
Description
SpartanCinder/artist-lyrics-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Music Dataset: Lyrics and Metadata from 1950 to 2019
data.mendeley.com
narcis.nl
Updated Oct 23, 2020
Luan Moura (2020). Music Dataset: Lyrics and Metadata from 1950 to 2019 [Dataset]. http://doi.org/10.17632/3t9vbwxgr5.3
Unique identifier
https://doi.org/10.17632/3t9vbwxgr5.3
Dataset updated
Oct 23, 2020
Authors
Luan Moura
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset was studied in the paper "Temporal Analysis and Visualisation of Music", available at the following link:

https://sol.sbc.org.br/index.php/eniac/article/view/12155

This dataset provides a list of lyrics from 1950 to 2019 together with music metadata such as sadness, danceability, loudness, and acousticness. The lyrics themselves are also included and can be used for natural language processing.

The audio data was scraped using the Echo Nest® API engine through the spotipy Python package. spotipy lets the user search for specific genres, artists, songs, release dates, etc. To obtain the lyrics, we used the Lyrics Genius® API as the base URL for requesting data based on the song title and artist name.
[2021] 1000+ GENIUS Artist Lyrics (Updated 4/27)
kaggle.com
Updated Apr 27, 2021
KingOfSpades (2021). [2021] 1000+ GENIUS Artist Lyrics (Updated 4/27) [Dataset]. https://www.kaggle.com/datasets/kingofspades/2021-1-genius-artists-lyrics-updated/discussion
Dataset updated
Apr 27, 2021
Dataset provided by
Kaggle
Authors
KingOfSpades
Description
DISCLAIMER: explicit song lyrics.

Only released song lyrics were scraped.

Duplicate albums (those with Deluxe versions) were ignored.

Context

I wanted a high-quality lyrics dataset for the most popular artists today to use for text generation. I saw no better place to get this data than from genius.com, where song lyrics are hosted along with tags indicating Intros, Outros, Choruses, etc. Dataset was started 4/27/2021, and will be updated.

Content

Each .txt file is named appropriately to match the artist that it corresponds to. Inside each file there is a continuous stream of song lyrics, along with tags in the format [*section name*]. Each song is separated by a single .
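
The tagged layout described above can be split into sections with a short regular expression. This is only a sketch under stated assumptions: it treats a tag as a line consisting solely of a bracketed name such as `[Chorus]`, the `split_sections` name and the sample text are hypothetical, and it does not attempt to detect song boundaries.

```python
import re

# Assumption: a section tag sits alone on its line, e.g. "[Intro]" or "[Chorus]".
TAG_RE = re.compile(r"^\[([^\]]+)\][ \t]*$", re.MULTILINE)


def split_sections(text):
    """Return a list of (section_name, lyrics_block) pairs in file order."""
    matches = list(TAG_RE.finditer(text))
    sections = []
    for i, match in enumerate(matches):
        # A section's body runs until the next tag or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        sections.append((match.group(1).strip(), body))
    return sections


# Hypothetical sample in the assumed layout.
sample = "[Intro]\nyeah\n[Chorus]\nla la\nla la\n"
for name, body in split_sections(sample):
    print(name, "->", repr(body))
```

Grouping lines by tag like this makes it straightforward to compare, for example, chorus text with verse text for a given artist.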

Acknowledgements

genius.com for the lyrics

Inspiration

What patterns do you see in the text generation for different artists? Is there repetition, common phrases, etc.?
Jamendo Lyrics Dataset
paperswithcode.com
Updated Oct 18, 2019
Daniel Stoller; Simon Durand; Sebastian Ewert (2019). Jamendo Lyrics Dataset [Dataset]. https://paperswithcode.com/dataset/jamendo-lyrics
Dataset updated
Oct 18, 2019
Authors
Daniel Stoller; Simon Durand; Sebastian Ewert
Description
Dataset for lyrics alignment and transcription evaluation. It contains 20 music pieces under CC license from the Jamendo website along with their lyrics, with:

Manual annotations indicating the start time of each word in the audio file

Predictions of start and end times for each word from both of the models presented in the paper
Reasons for getting lyrics to songs worldwide 2017
statista.com
Updated Jul 9, 2025
Statista (2025). Reasons for getting lyrics to songs worldwide 2017 [Dataset]. https://www.statista.com/statistics/799899/music-song-lyrics-reasons/
Dataset updated
Jul 9, 2025
Dataset authored and provided by
Statista: http://statista.com/
Time period covered
Nov 2017
Area covered
Worldwide
Description
The statistic shows the most common reasons why music consumers get lyrics to songs worldwide as of *************. During the survey, ** percent of respondents stated that they got the lyrics to songs in order to be able to sing along.
Arab-Andalusian music lyrics dataset
live.european-language-grid.eu
data.niaid.nih.gov
json
Updated Oct 28, 2023
(2023). Arab-Andalusian music lyrics dataset [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7554
Available download formats: json
Dataset updated
Oct 28, 2023
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
The dataset contains lyrics for the songs in the Arab-Andalusian music collection curated within the CompMusic project that belong to the nawbas "Isbahan", "Maya", "Raml Maya", "Gharibat al-Husayn", "Hijaz Kabir", "Hijaz Msharqi", "Istihlal", "Rasd", and "Rasd Dayl".
Lyrics are stored in two formats: as Tab Separated Values (TSV) files and as JSON files.
Each file is identified by its MusicBrainz recording ID (MBID).
The lyrics are stored both in their original Arabic script (folder 'original') and in a romanized/transliterated version (folder 'transliterated') following the American Library Association and Library of Congress (ALA-LC) standard.
Corresponding audio files are available from the Arab-Andalusian music corpus, as well as via the Internet Archive URL included in the metadata file ('metadata.csv').
For more information about the exact format and contents of the dataset, please consult the README provided in the archive.
For more information, please refer to http://compmusic.upf.edu/corpora.
Jam-ALT Dataset
paperswithcode.com
Updated Nov 22, 2023
Ondřej Cífka; Constantinos Dimitriou; Cheng-i Wang; Hendrik Schreiber; Luke Miner; Fabian-Robert Stöter (2023). Jam-ALT Dataset [Dataset]. https://paperswithcode.com/dataset/jam-alt
Dataset updated
Nov 22, 2023
Authors
Ondřej Cífka; Constantinos Dimitriou; Cheng-i Wang; Hendrik Schreiber; Luke Miner; Fabian-Robert Stöter
Description
JamALT is a revision of the JamendoLyrics dataset (80 songs in 4 languages), adapted for use as an automatic lyrics transcription (ALT) benchmark.

The lyrics have been revised according to the newly compiled annotation guide, which includes rules about spelling, punctuation, and formatting. The audio is identical to the JamendoLyrics dataset. However, only 79 songs are included, as one of the 20 French songs has been removed due to concerns about potentially harmful content.
Lyrics
huggingface.co
Updated Oct 20, 2023
Finch Research (2023). Lyrics [Dataset]. https://huggingface.co/datasets/FinchResearch/Lyrics
Dataset updated
Oct 20, 2023
Dataset authored and provided by
Finch Research
Description
FinchResearch/Lyrics dataset hosted on Hugging Face and contributed by the HF Datasets community
150K Lyrics Labeled with Spotify Valence
kaggle.com
Updated Apr 28, 2020
Edenbd (2020). 150K Lyrics Labeled with Spotify Valence [Dataset]. https://www.kaggle.com/edenbd/150k-lyrics-labeled-with-spotify-valence/kernels
Dataset updated
Apr 28, 2020
Dataset provided by
Kaggle
Authors
Edenbd
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Context

Based on a 250K lyrics database. Created to perform a supervised NLP sentiment analysis task using the Spotify valence audio feature, a measure of the positiveness of a song.

Content

Preparation of the dataset is explained in this notebook.

Acknowledgements

Thank you Nikita Detkov and Ilya for making the great 250K Lyrics1.csv file that I used for this data set. Thank you Madeline Zhang for the commented Spotify access example code and Spotify for the detailed Developers Spotify API.

Inspiration

Improve a song's positiveness measure by combining lyrics and audio mood measures.
lyric-to-3genre
huggingface.co
Updated Aug 10, 2023
Efe (2023). lyric-to-3genre [Dataset]. https://huggingface.co/datasets/Veucci/lyric-to-3genre
Dataset updated
Aug 10, 2023
Authors
Efe
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Song Lyrics Dataset

Description

This dataset contains a collection of song lyrics from various artists and genres in English. It is intended to be used for research, analysis, and other non-commercial purposes.

Dataset Details

The dataset is organized in a tabular format with the following columns:

Genre (int): Genre of the lyrics

Lyrics (str): The lyrics of the song.

Pop: 979 rows

Rock: 995 rows

Hip-Hop: 1040 rows

Usage

Feel free to use this… See the full description on the dataset page: https://huggingface.co/datasets/Veucci/lyric-to-3genre.

genius-lyrics (brunokreiner/genius-lyrics): 207 scholarly articles cite this dataset, according to Google Scholar.