This statistic shows the results of a survey about the emotions respondents experienced when listening to sad music in the United Kingdom (UK) in 2016, broken down by frequency. The survey found that ** percent of respondents stated that they frequently felt melancholic when listening to sad music.
Full datasets containing three samples (N = 1577, N = 414, N = 445) whose respondents answered questions about memorable experiences related to sad music. Includes 131 variables covering reasons, emotions, physical reactions, attitudes, mechanisms, etc. The study was published as Eerola, T. & Peltola, H.-R. (2016). Memorable Experiences with Sad Music – Reasons, Reactions and Mechanisms of Three Types of Experiences, PLOS ONE, http://dx.doi.org/10.1371/journal.pone.0157444
This statistic shows the results of a survey that asked respondents why they chose to listen to sad music in the United Kingdom (UK) in 2016. The survey found that **** percent of respondents stated that they listened to sad music to reminisce about past events, places or people.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This dataset provides a list of lyrics from 1950 to 2019, together with music metadata such as sadness, danceability, loudness, acousticness, etc. The lyrics themselves are also provided and can be used for natural language processing.
The audio data was scraped from the Echo Nest® API engine integrated with the spotipy Python package. The spotipy API lets the user search for specific genres, artists, songs, release dates, etc. To obtain the lyrics, we used the Lyrics Genius® API as the base URL for requesting data based on the song title and artist name.
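As a rough, non-authoritative illustration of such a pipeline (not the authors' actual scraping code), the sketch below pulls audio features with spotipy and lyrics with the lyricsgenius client; the client ID, secret, Genius token, and the example query are placeholders.

```python
# Sketch of a Spotify-features + Genius-lyrics scraping step.
# Credentials and the example artist/track are placeholders.
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import lyricsgenius

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
    client_id="SPOTIFY_CLIENT_ID", client_secret="SPOTIFY_CLIENT_SECRET"))
genius = lyricsgenius.Genius("GENIUS_ACCESS_TOKEN")

# Look up a track and its audio features (danceability, acousticness, loudness, ...).
results = sp.search(q="artist:Radiohead track:Creep", type="track", limit=1)
track = results["tracks"]["items"][0]
features = sp.audio_features([track["id"]])[0]

# Request the lyrics by song title and artist name.
song = genius.search_song(track["name"], track["artists"][0]["name"])
lyrics = song.lyrics if song else None

print(track["name"], features["danceability"], features["acousticness"])
```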
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
Description
The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 7356 files (total size: 24.8 GB). The dataset contains 24 professional actors (12 female, 12 male), vocalizing two lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression. All conditions are available in three modality formats: Audio-only (16bit, 48kHz .wav), Audio-Video (720p H.264, AAC 48kHz, .mp4), and Video-only (no sound). Note, there are no song files for Actor_18.
The RAVDESS was developed by Dr Steven R. Livingstone, who now leads the Affective Data Science Lab, and Dr Frank A. Russo who leads the SMART Lab.
Citing the RAVDESS
The RAVDESS is released under a Creative Commons Attribution license, so please cite the RAVDESS if it is used in your work in any form. Published academic papers should use the academic paper citation for our PLoS ONE paper. Personal works, such as machine learning projects/blog posts, should provide a URL to this Zenodo page, though a reference to our PLoS ONE paper would also be appreciated.
Academic paper citation
Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.
Personal use citation
Include a link to this Zenodo page - https://zenodo.org/record/1188976
Commercial Licenses
Commercial licenses for the RAVDESS can be purchased. For more information, please visit our license fee page, or contact us at ravdess@gmail.com.
Contact Information
If you would like further information about the RAVDESS, to purchase a commercial license, or if you experience any issues downloading files, please contact us at ravdess@gmail.com.
Example Videos
Watch a sample of the RAVDESS speech and song videos.
Emotion Classification Users
If you're interested in using machine learning to classify emotional expressions with the RAVDESS, please see our new RAVDESS Facial Landmark Tracking data set [Zenodo project page].
Construction and Validation
Full details on the construction and perceptual validation of the RAVDESS are described in our PLoS ONE paper - https://doi.org/10.1371/journal.pone.0196391.
The RAVDESS contains 7356 files. Each file was rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained adult research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity, interrater reliability, and test-retest intrarater reliability were reported. Validation data is open-access, and can be downloaded along with our paper from PLoS ONE.
Contents
Audio-only files
Audio-only files of all actors (01-24) are available as two separate zip files (~200 MB each):
Speech file (Audio_Speech_Actors_01-24.zip, 215 MB) contains 1440 files: 60 trials per actor x 24 actors = 1440.
Song file (Audio_Song_Actors_01-24.zip, 198 MB) contains 1012 files: 44 trials per actor x 23 actors = 1012.
Audio-Visual and Video-only files
Video files are provided as separate zip downloads for each actor (01-24, ~500 MB each), and are split into separate speech and song downloads:
Speech files (Video_Speech_Actor_01.zip to Video_Speech_Actor_24.zip) collectively contain 2880 files: 60 trials per actor x 2 modalities (AV, VO) x 24 actors = 2880.
Song files (Video_Song_Actor_01.zip to Video_Song_Actor_24.zip) collectively contain 2024 files: 44 trials per actor x 2 modalities (AV, VO) x 23 actors = 2024.
File Summary
In total, the RAVDESS collection includes 7356 files (2880+2024+1440+1012 files).
File naming convention
Each of the 7356 RAVDESS files has a unique filename. The filename consists of a 7-part numerical identifier (e.g., 02-01-06-01-02-01-12.mp4). These identifiers define the stimulus characteristics.
Filename identifiers
Modality (01 = full-AV, 02 = video-only, 03 = audio-only).
Vocal channel (01 = speech, 02 = song).
Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).
Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the 'neutral' emotion.
Statement (01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door").
Repetition (01 = 1st repetition, 02 = 2nd repetition).
Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).
Filename example: 02-01-06-01-02-01-12.mp4
Video-only (02)
Speech (01)
Fearful (06)
Normal intensity (01)
Statement "dogs" (02)
1st Repetition (01)
12th Actor (12)
Female, as the actor ID number is even.
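To make the convention concrete, here is a minimal Python sketch that decodes a RAVDESS filename into its seven fields; the function name and the lookup dictionaries are our own and simply restate the identifier list above.

```python
# Minimal sketch: decode a RAVDESS filename into its 7 identifier fields.
import os

MODALITY = {"01": "full-AV", "02": "video-only", "03": "audio-only"}
VOCAL_CHANNEL = {"01": "speech", "02": "song"}
EMOTION = {"01": "neutral", "02": "calm", "03": "happy", "04": "sad",
           "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised"}
INTENSITY = {"01": "normal", "02": "strong"}
STATEMENT = {"01": "Kids are talking by the door", "02": "Dogs are sitting by the door"}

def parse_ravdess_filename(path):
    stem = os.path.splitext(os.path.basename(path))[0]
    parts = stem.split("-")
    if len(parts) != 7:
        raise ValueError(f"Expected a 7-part identifier, got: {stem}")
    modality, channel, emotion, intensity, statement, repetition, actor = parts
    return {
        "modality": MODALITY[modality],
        "vocal_channel": VOCAL_CHANNEL[channel],
        "emotion": EMOTION[emotion],
        "intensity": INTENSITY[intensity],
        "statement": STATEMENT[statement],
        "repetition": int(repetition),
        "actor": int(actor),
        "actor_sex": "male" if int(actor) % 2 == 1 else "female",
    }

print(parse_ravdess_filename("02-01-06-01-02-01-12.mp4"))
# -> video-only, speech, fearful, normal intensity, "Dogs...", 1st repetition, actor 12 (female)
```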
License information
The RAVDESS is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, CC BY-NC-SA 4.0
Commercial licenses for the RAVDESS can also be purchased. For more information, please visit our license fee page, or contact us at ravdess@gmail.com.
Related Data sets
RAVDESS Facial Landmark Tracking data set [Zenodo project page].
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This dataset was studied in the paper "Temporal Analysis and Visualisation of Music", available at the following link:
https://sol.sbc.org.br/index.php/eniac/article/view/12155
This dataset provides a list of lyrics from 1950 to 2019, together with music metadata such as sadness, danceability, loudness, acousticness, etc. The lyrics themselves are also provided and can be used for natural language processing.
The audio data was scraped from the Echo Nest® API engine integrated with the spotipy Python package. The spotipy API lets the user search for specific genres, artists, songs, release dates, etc. To obtain the lyrics, we used the Lyrics Genius® API as the base URL for requesting data based on the song title and artist name.
During the difficult period of the coronavirus pandemic, which has been spreading all over the world since the beginning of 2020, many people turned to music to lift their spirits. Still, rap topped the ranking as the music genre that most negatively impacted listeners' mood in France.
Speech is the most natural way of expressing ourselves as humans, so it is only natural to extend this communication medium to computer applications. We define speech emotion recognition (SER) systems as a collection of methodologies that process and classify speech signals to detect the embedded emotions. SER is not a new field; it has been around for over two decades and has regained attention thanks to recent advancements. These novel studies make use of advances in all fields of computing and technology, making it necessary to have an update on the current methodologies and techniques that make SER possible. We have identified and discussed distinct areas of SER, provided a detailed survey of the current literature in each, and listed the current challenges.
Here are the 4 most popular English-language datasets: Crema, Ravdess, Savee and Tess. Each of them contains audio in .wav format with the main emotion labels.
Ravdess:
Here are the filename identifiers, as per the official RAVDESS website:
So, here's an example of an audio filename: 02-01-06-01-02-01-12.wav. The metadata for this audio file is:
Crema:
The third component is responsible for the emotion label:
SAD = sadness
ANG = angry
DIS = disgust
FEA = fear
HAP = happy
NEU = neutral
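As a minimal sketch (assuming the usual underscore-separated CREMA-D naming, e.g. 1001_DFA_ANG_XX.wav; the example filename and helper name are illustrative), the emotion label can be read off the third component like this:

```python
# Minimal sketch: map the third underscore-separated component of a Crema
# filename to its emotion label. The example filename is illustrative.
CREMA_EMOTIONS = {"SAD": "sadness", "ANG": "angry", "DIS": "disgust",
                  "FEA": "fear", "HAP": "happy", "NEU": "neutral"}

def crema_emotion(filename):
    code = filename.split("_")[2]  # third component carries the emotion code
    return CREMA_EMOTIONS.get(code, "unknown")

print(crema_emotion("1001_DFA_ANG_XX.wav"))  # -> "angry"
```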
Tess:
Very similar to Crema: the emotion label is contained in the file name.
Savee:
The audio files in this dataset are named in such a way that the prefix letters describe the emotion classes as follows:
It is my pleasure to point you to the notebook by this author, which inspired me to make this dataset publicly available.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Likelihood ratio test statistics and post-hoc comparisons assessing the effects of stimulus themes and trait empathy on warm, cold, and moving chills (*** = p < .001, ** = p < .01, * = p < .05).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Data for an experiment demonstrating the usefulness of covariates in increasing the power of experimental studies, as reported in a tutorial in the Journal of Consumer Research. The study manipulates mood (happy/sad) through autobiographical recall and measures its effect on the enjoyment of a piece of classical music, while controlling for self-reported enjoyment of another piece of classical music (measured prior to the mood manipulation).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Numerous studies have explored crossmodal correspondences, yet have so far lacked insight into how crossmodal correspondences influence audiovisual emotional integration and aesthetic beauty. Our study investigated the behavioral and neural underpinnings of audiovisual emotional congruency in art perception. Participants viewed ‘happy’ or ‘sad’ paintings in an unimodal (visual) condition or paired with congruent or incongruent music (crossmodal condition). In the crossmodal condition, the music could be emotionally congruent (e.g., happy painting, happy music) or incongruent with the painting (e.g., happy painting, sad music). We also created Fourier Scrambled versions of each painting to test for the influence of semantics. We tested 21 participants with fMRI while they rated the presentations. Beauty ratings did not differ for unimodal and crossmodal presentations (when aggregating across incongruent and congruent crossmodal presentations). We found that crossmodal conditions activated sensory and emotion-processing areas. When zooming in on the crossmodal conditions, the results revealed that emotional congruency between the visual and auditory information resulted in higher beauty ratings than incongruent pairs. Furthermore, semantic information enhanced beauty ratings in congruent trials, which elicited distinct activations in related sensory areas, emotion-processing areas, and frontal areas for cognitive processing. The significant interaction effect for Congruency × Semantics, controlling for low-level features like color and brightness, observed in the behavioral results was further revealed in the fMRI findings, which showed heightened activation in the ventral stream and emotion-related areas for the congruent conditions. This demonstrates that emotional congruency not only increased beauty ratings but also increased the in-depth processing of the paintings. For incongruent versus congruent comparisons, the results suggest that a frontoparietal network and caudate may be involved in emotional incongruency. Our study reveals specific neural mechanisms, like ventral stream activation, that connect emotional congruency with aesthetic judgments in crossmodal experiences. This study contributes to the fields of art perception, neuroaesthetics, and audiovisual affective integration by using naturalistic art stimuli in combination with behavioral and fMRI analyses.
Film clips, music, and self-referential statements (termed Velten, after their originator) have been successfully used to temporarily induce sadness and happiness. However, there is little research on the effectiveness of these procedures combined, particularly in internet-based settings, and on whether Velten statements contribute to altering mood beyond the effect of simple instructions to close one's eyes and enter the targeted mood. In Study 1 (N = 106) we examined the effectiveness of 80 Velten statements (positive, negative, neutral-self, neutral-facts) to create brief and effective sets that might be used in future research. In Study 2 (N = 445) we examined the effect size of 8-min combined mood induction procedures, which presented video clips in the first half and music excerpts with Velten statements or closed-eyes instructions in the second half. Participants answered questionnaires on social desirability, joviality, and sadness before being randomly assigned to 1 of 7 groups varying in Valence (positive, negative, neutral) and Velten (closed-eyes control, self-referential Velten, and, in the case of the neutral condition, factual statements). Subsequently, participants completed the joviality and sadness scales a second time. Compared to the neutral conditions, the positive mood inductions increased joviality (Hedges G = 1.35, 95% CI [1.07, 1.63]), whereas the negative mood inductions increased sadness (Hedges G = 1.28, 95% CI [1.01, 1.55]). We did not observe any significant difference between Velten and closed-eyes instructions in inducing joviality or sadness, nor did we observe any significant difference between neutral Velten statements referring to self and facts. Although social desirability bias was associated with reports of greater joviality and lower sadness, it could not account for the effects of the positive and negative mood induction procedures. We conclude that these combined mood induction procedures can be used in online research to study happy and sad mood.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The 15 stimuli used in the study; the thematic category labels are derived from a data-driven, agglomerative hierarchical cluster analysis (see Data Analysis and Results sections).