43 datasets found
  1. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Oct 19, 2024
    + more versions
    Cite
    Steven R. Livingstone; Frank A. Russo (2024). The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) [Dataset]. http://doi.org/10.5281/zenodo.1188976
    Explore at:
    zip - Available download formats
    Dataset updated
    Oct 19, 2024
    Dataset provided by
    Zenodo - http://zenodo.org/
    Authors
    Steven R. Livingstone; Frank A. Russo
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) - https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description


    The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 7356 files (total size: 24.8 GB). The dataset contains 24 professional actors (12 female, 12 male), vocalizing two lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression. All conditions are available in three modality formats: Audio-only (16bit, 48kHz .wav), Audio-Video (720p H.264, AAC 48kHz, .mp4), and Video-only (no sound). Note, there are no song files for Actor_18.

    The RAVDESS was developed by Dr Steven R. Livingstone, who now leads the Affective Data Science Lab, and Dr Frank A. Russo, who leads the SMART Lab.

    Citing the RAVDESS

    The RAVDESS is released under a Creative Commons Attribution-NonCommercial-ShareAlike license, so please cite the RAVDESS if it is used in your work in any form. Published academic papers should use the academic paper citation for our PLoS ONE paper. Personal works, such as machine learning projects/blog posts, should provide a URL to this Zenodo page, though a reference to our PLoS ONE paper would also be appreciated.

    Academic paper citation

    Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.

    Personal use citation

    Include a link to this Zenodo page - https://zenodo.org/record/1188976

    Commercial Licenses

    Commercial licenses for the RAVDESS can be purchased. For more information, please visit our license fee page, or contact us at ravdess@gmail.com.

    Contact Information

    If you would like further information about the RAVDESS, to purchase a commercial license, or if you experience any issues downloading files, please contact us at ravdess@gmail.com.

    Example Videos

    Watch a sample of the RAVDESS speech and song videos.

    Emotion Classification Users

    If you're interested in using machine learning to classify emotional expressions with the RAVDESS, please see our new RAVDESS Facial Landmark Tracking data set [Zenodo project page].

    Construction and Validation

    Full details on the construction and perceptual validation of the RAVDESS are described in our PLoS ONE paper - https://doi.org/10.1371/journal.pone.0196391.

    The RAVDESS contains 7356 files. Each file was rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained adult research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity, interrater reliability, and test-retest intrarater reliability were reported. Validation data is open-access, and can be downloaded along with our paper from PLoS ONE.

    Contents

    Audio-only files

    Audio-only files of all actors (01-24) are available as two separate zip files (~200 MB each):

    • Speech file (Audio_Speech_Actors_01-24.zip, 215 MB) contains 1440 files: 60 trials per actor x 24 actors = 1440.
    • Song file (Audio_Song_Actors_01-24.zip, 198 MB) contains 1012 files: 44 trials per actor x 23 actors = 1012.

    Audio-Visual and Video-only files

    Video files are provided as separate zip downloads for each actor (01-24, ~500 MB each), and are split into separate speech and song downloads:

    • Speech files (Video_Speech_Actor_01.zip to Video_Speech_Actor_24.zip) collectively contain 2880 files: 60 trials per actor x 2 modalities (AV, VO) x 24 actors = 2880.
    • Song files (Video_Song_Actor_01.zip to Video_Song_Actor_24.zip) collectively contain 2024 files: 44 trials per actor x 2 modalities (AV, VO) x 23 actors = 2024.

    File Summary

    In total, the RAVDESS collection includes 7356 files (2880+2024+1440+1012 files).
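    The arithmetic behind these totals can be checked in a few lines (a quick sketch; trial counts are taken from the Contents section above):

```python
# Verify the RAVDESS file counts stated above.
speech_audio = 60 * 24        # 60 speech trials x 24 actors (audio-only)
song_audio = 44 * 23          # 44 song trials x 23 actors (no song for Actor 18)
speech_video = 60 * 2 * 24    # speech trials x 2 modalities (AV, VO) x 24 actors
song_video = 44 * 2 * 23      # song trials x 2 modalities (AV, VO) x 23 actors

total = speech_video + song_video + speech_audio + song_audio
print(speech_audio, song_audio, speech_video, song_video, total)
# prints: 1440 1012 2880 2024 7356
```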

    File naming convention

    Each of the 7356 RAVDESS files has a unique filename. The filename consists of a 7-part numerical identifier (e.g., 02-01-06-01-02-01-12.mp4). These identifiers define the stimulus characteristics:

    Filename identifiers

    • Modality (01 = full-AV, 02 = video-only, 03 = audio-only).
    • Vocal channel (01 = speech, 02 = song).
    • Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).
    • Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the 'neutral' emotion.
    • Statement (01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door").
    • Repetition (01 = 1st repetition, 02 = 2nd repetition).
    • Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).


    Filename example: 02-01-06-01-02-01-12.mp4

    1. Video-only (02)
    2. Speech (01)
    3. Fearful (06)
    4. Normal intensity (01)
    5. Statement "dogs" (02)
    6. 1st Repetition (01)
    7. 12th Actor (12)
    8. Female, as the actor ID number is even.
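    The identifier scheme above can be decoded mechanically; a minimal sketch (the parse_ravdess function and its label tables are our own illustration, not part of the dataset):

```python
# Decode a RAVDESS filename into its 7 stimulus fields.
EMOTIONS = {1: "neutral", 2: "calm", 3: "happy", 4: "sad",
            5: "angry", 6: "fearful", 7: "disgust", 8: "surprised"}

def parse_ravdess(filename):
    stem = filename.rsplit(".", 1)[0]
    modality, channel, emotion, intensity, statement, repetition, actor = \
        (int(p) for p in stem.split("-"))
    return {
        "modality": {1: "full-AV", 2: "video-only", 3: "audio-only"}[modality],
        "vocal_channel": {1: "speech", 2: "song"}[channel],
        "emotion": EMOTIONS[emotion],
        "intensity": {1: "normal", 2: "strong"}[intensity],
        "statement": {1: "Kids are talking by the door",
                      2: "Dogs are sitting by the door"}[statement],
        "repetition": repetition,
        "actor": actor,
        # Odd-numbered actors are male, even-numbered actors are female.
        "sex": "female" if actor % 2 == 0 else "male",
    }

print(parse_ravdess("02-01-06-01-02-01-12.mp4"))
```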

    License information

    The RAVDESS is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, CC BY-NC-SA 4.0

    Commercial licenses for the RAVDESS can also be purchased. For more information, please visit our license fee page, or contact us at ravdess@gmail.com.

    Related Data sets

  2. RAVDESS

    • huggingface.co
    Updated Oct 12, 2024
    Cite
    Maha Tufail Agro (2024). RAVDESS [Dataset]. https://huggingface.co/datasets/MahiA/RAVDESS
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 12, 2024
    Authors
    Maha Tufail Agro
    License

    MIT License - https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    RAVDESS

    This is an audio classification dataset for Emotion Recognition. Classes = 8, Split = Train-Test.

      Structure
    

    The audios folder contains the audio files; train.csv is the training split and test.csv the testing split.

      Download
    

    import os
    import huggingface_hub

    audio_datasets_path = "DATASET_PATH/Audio-Datasets"
    if not os.path.exists(audio_datasets_path):
        print(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with…

    See the full description on the dataset page: https://huggingface.co/datasets/MahiA/RAVDESS.

  3. Enhanced RAVDESS Speech Dataset

    • data.niaid.nih.gov
    Updated Oct 2, 2021
    Cite
    Pardo, Bryan (2021). Enhanced RAVDESS Speech Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4783520
    Explore at:
    Dataset updated
    Oct 2, 2021
    Dataset provided by
    Jin, Zeyu
    Bryan, Nicholas J.
    Caceres, Juan-Pablo
    Pardo, Bryan
    Morrison, Max
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) - https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This is a modified version of the speech audio contained within the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset. The original dataset can be found here. The unmodified version of just the speech audio used as source material for this dataset can be found here. This dataset was produced by applying speech enhancement and bandwidth extension to the original speech using HiFi-GAN, which produces high-quality speech at 48 kHz containing significantly less noise and reverb than the original recordings.

    If you use this work as part of an academic publication, please cite the papers corresponding to both the original dataset as well as HiFi-GAN:

    Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.

    Su, Jiaqi, Zeyu Jin, and Adam Finkelstein. "HiFi-GAN: High-fidelity denoising and dereverberation based on speech deep features in adversarial networks." Proc. Interspeech. October 2020.

    Note that there are two recent papers with the name "HiFi-GAN". Please be sure to cite the correct paper as listed here.

  4. Facial Expression and Landmark Tracking (FELT) dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 19, 2024
    Cite
    Liao, Zhenghao (2024). Facial Expression and Landmark Tracking (FELT) dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13243599
    Explore at:
    Dataset updated
    Oct 19, 2024
    Dataset provided by
    Russo, Frank A.
    Livingstone, Steven
    Liao, Zhenghao
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) - https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Contact Information

    If you would like further information about the Facial expression and landmark tracking data set, or if you experience any issues downloading files, please contact us at ravdess@gmail.com.

    Facial Expression examples

    Watch a sample of the facial expression tracking results.

    Commercial Licenses

    Commercial licenses for this dataset can be purchased. For more information, please contact us at ravdess@gmail.com.

    Description

    The Facial Expression and Landmark Tracking (FELT) dataset contains tracked facial expression movements and animated videos from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) [RAVDESS Zenodo page]. Tracking data and videos were produced by Py-Feat 0.6.2 (2024-03-29 release) (Cheong, J.H., Jolly, E., Xie, T. et al. Py-Feat: Python Facial Expression Analysis Toolbox. Affect Sci 4, 781–796 (2023). https://doi.org/10.1007/s42761-023-00191-4) and custom code (github repo). Tracked information includes: facial emotion classification estimates, facial landmark detection (68 points), head pose estimation (yaw, pitch, roll, x, y), and facial Action Unit (AU) recognition. Videos include: landmark overlay videos, AU activation animations, and landmark plot animations.

    The FELT dataset was created at the Affective Data Science Lab.

    This dataset contains tracking data and videos for all 2452 RAVDESS trials. Raw and smoothed tracking data are provided. All tracking movement data are contained in the following archives: raw_motion_speech.zip, smoothed_motion_speech.zip, raw_motion_song.zip, and smoothed_motion_song.zip. Each actor has 104 tracked trials (60 speech, 44 song). Note, there are no song files for Actor 18.

    Total Tracked Files = (24 Actors x 60 Speech trials) + (23 Actors x 44 Song trials) = 2452 CSV files.

    Tracking results for each trial are provided as individual comma separated value files (CSV format). File naming convention of raw and smoothed tracked files is identical to that of the RAVDESS. For example, smoothed tracked file "01-01-01-01-01-01-01.csv" corresponds to RAVDESS audio-video file "01-01-01-01-01-01-01.mp4". For a complete description of the RAVDESS file naming convention and experimental manipulations, please see the RAVDESS Zenodo page.

    Landmark overlays, AU activation, and landmark plot videos for all trials are also provided (720p h264, .mp4). Landmark overlays present tracked landmarks and head pose overlaid on the original RAVDESS actor video. As the RAVDESS does not contain "ground truth" facial landmark locations, the overlay videos provide a visual 'sanity check' for researchers to confirm the general accuracy of the tracking results. Landmark plot animations present landmarks only, anchored to the top left corner of the head bounding box with translational head motion removed. AU activation animations visualize intensity of AU activations (0-1 normalized) as a heatmap over time. The file naming convention of all videos also matches that of the RAVDESS. For example, "Landmark_Overlay/01-01-01-01-01-01-01.mp4", "Landmark_Plot/01-01-01-01-01-01-01.mp4", "ActionUnit_Animation/01-01-01-01-01-01-01.mp4", all correspond to RAVDESS audio-video file "01-01-01-01-01-01-01.mp4".

    Smoothing procedure

    Raw tracking data were first low-pass filtered with a 5th-order Butterworth filter (cutoff_freq = 6, sampling_freq = 29.97, order = 5) to remove high-frequency noise. Data were then smoothed with a Savitzky-Golay filter (window_length = 11, poly_order = 5). scipy.signal (v1.13.1) was used for both procedures.
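    The two-stage procedure above can be sketched with scipy.signal. The filter parameters are those stated in the text; the zero-phase filtfilt application and the synthetic input signal are assumptions for illustration:

```python
import numpy as np
from scipy.signal import butter, filtfilt, savgol_filter

fs = 29.97                        # source video frame rate (Hz)
t = np.arange(0, 5, 1 / fs)       # 5 s of synthetic landmark motion
raw = np.sin(2 * np.pi * 0.5 * t) + \
    0.2 * np.random.default_rng(0).normal(size=t.size)

# Stage 1: 5th-order Butterworth low-pass, 6 Hz cutoff (zero-phase assumed).
b, a = butter(N=5, Wn=6, btype="low", fs=fs)
lowpassed = filtfilt(b, a, raw)

# Stage 2: Savitzky-Golay smoothing (window_length=11, poly_order=5).
smoothed = savgol_filter(lowpassed, window_length=11, polyorder=5)
print(smoothed.shape)
```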

    Landmark Tracking models

    Six separate machine learning models were used by Py-Feat to perform various aspects of tracking and classification. Video outputs generated by different combinations of ML models were visually compared, with final model choice determined by voting of first and second authors. Models were specified in the call to Detector class (described here). Exact function call as follows:

    Detector(face_model='img2pose',
             landmark_model='mobilenet',
             au_model='xgb',
             emotion_model='resmasknet',
             facepose_model='img2pose-c',
             identity_model='facenet',
             device='cuda',
             n_jobs=1,
             verbose=False)

    Default Py-Feat parameters for each model were used in most cases. Non-default values were specified in the call to the detect_video function (described here). Exact function call as follows: detect_video(video_path, skip_frames=None, output_size=(720, 1280), batch_size=5, num_workers=0, pin_memory=False, face_detection_threshold=0.83, face_identity_threshold=0.8)

    Tracking File Output Format

    This data set retained Py-Feat's data output format. The resolution of all input videos was 1280x720. Tracking output units are in pixels; their range of values is (0,0) (top left corner) to (1280,720) (bottom right corner).

    Column 1 = Timing information

    1. frame - The number of the frame (source videos 29.97 fps), range = 1 to n

    Columns 2-5 = Head bounding box

    2-3. FaceRectX, FaceRectY - X and Y coordinates of top-left corner of head bounding box (pixels)

    4-5. FaceRectWidth, FaceRectHeight - Width and Height of head bounding box (pixels)

    Column 6 = Face detection confidence

    FaceScore - Confidence level that a human face was detected, range = 0 to 1

    Columns 7-142 = Facial landmark locations in 2D

    7-142. x_0, ..., x_67, y_0,...y_67 - Location of 2D landmarks in pixels. A figure describing the landmark index can be found here.

    Columns 143-145 = Head pose

    143-145. Pitch, Roll, Yaw - Rotation of the head in degrees (described here). The rotation is in world coordinates with the camera being located at the origin.

    Columns 146-165 = Facial Action Units

    Facial Action Units (AUs) are a way to describe human facial movements (Ekman, Friesen, and Hager, 2002) [wiki link]. More information on Py-Feat's implementation of AUs can be found here.

    146-150, 152-153, 155-158, 160-165. AU01, AU02, AU04, AU05, AU06, AU09, AU10, AU12, AU14, AU15, AU17, AU23, AU24, AU25, AU26, AU28, AU43 - Intensity of AU movement, range from 0 (no muscle contraction) to 1 (maximal muscle contraction).

    151, 154, 159. AU07, AU11, AU20 - Presence or absence of AUs, range 0 (absent, not detected) to 1 (present, detected).

    Columns 166-172 = Emotion classification confidence

    166-172. anger, disgust, fear, happiness, sadness, surprise, neutral - Confidence of classified emotion category, range 0 (0%) to 1 (100%) confidence.

    Columns 173-685 = Face identity score

    Identity of faces contained in the video were classified using the FaceNet model (described here). This procedure generates a 512-dimensional Euclidean embedding space.

    173. Identity - Predicted individual identified in the RAVDESS video. Note, value is always Person_0, as each video contains only a single actor at all times (categorical).

    174-685. Identity_1, ..., Identity_512 - Face embedding vector used by FaceNet to perform facial identity matching.

    Column 686 = Input video

    686. input - The source video processed in this trial.

    Columns 687-688 = Timing information

    687. frame.1 - The number of the frame (source videos 29.97 fps), duplicated column, range = 1 to n

    688. approx_time - Approximate time of the current frame (0.0 to x.x, in seconds)
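    Because the column layout above is regular, the landmark, emotion, and identity columns can be generated by name rather than counted by index; a small sketch (the list names are our own, not Py-Feat's):

```python
# Column-name helpers for the Py-Feat tracking CSVs described above.
landmark_cols = [f"x_{i}" for i in range(68)] + [f"y_{i}" for i in range(68)]
pose_cols = ["Pitch", "Roll", "Yaw"]
emotion_cols = ["anger", "disgust", "fear", "happiness",
                "sadness", "surprise", "neutral"]
identity_cols = ["Identity"] + [f"Identity_{i}" for i in range(1, 513)]

# 136 landmark columns (7-142), 7 emotion columns (166-172),
# 513 identity columns (173-685).
print(len(landmark_cols), len(emotion_cols), len(identity_cols))
# prints: 136 7 513
```

    With pandas, df[landmark_cols] would then slice the 2D landmarks out of a tracking CSV such as "01-01-01-01-01-01-01.csv".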

    Tracking videos

    Landmark Overlay and Landmark Plot videos were produced with the plot_detections function (described here). This function generated individual images for each frame, which were then compiled into a video using the imageio library (described here).

    AU Activation videos were produced with the plot_face function (described here). This function also generated individual images for each frame, which were then compiled into a video using the imageio library. Some frames could not be correctly generated by Py-Feat, producing only the AU heatmap but failing to plot/locate facial landmarks. These frames were dropped prior to compositing the output video. The drop rate was approximately 10% of all frames in each video. Dropped frames were distributed evenly across the video timeline (i.e. no apparent clustering).

    License information

    The RAVDESS Facial expression and landmark tracking data set is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, CC BY-NC-SA 4.0.

    How to cite the RAVDESS Facial Tracking data set

    Academic citation If you use the RAVDESS Facial Tracking data set in an academic publication, please cite both references:

    Liao, Z., Livingstone, SR., & Russo, FA. (2024). RAVDESS Facial expression and landmark tracking (Version 1.0.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.13243600

    Livingstone SR, Russo FA (2018). The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.

    All other attributions If you use the RAVDESS Facial expression and landmark tracking dataset in a form other than an academic publication, such as in a blog post, data science project or competition, school project, or non-commercial product, please use the following attribution: "RAVDESS Facial expression and landmark tracking" by Liao, Livingstone, & Russo is licensed under CC BY-NC-SA 4.0.

    Related Data sets

    The Ryerson Audio-Visual Database of Emotional Speech and Song [Zenodo project page].

  5. RAVDESS Facial Landmark Tracking

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Apr 24, 2025
    Cite
    Riley Swanson; Steven R. Livingstone; Frank A. Russo (2025). RAVDESS Facial Landmark Tracking [Dataset]. http://doi.org/10.5281/zenodo.3255102
    Explore at:
    zip - Available download formats
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo - http://zenodo.org/
    Authors
    Riley Swanson; Steven R. Livingstone; Frank A. Russo
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) - https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Contact Information

    If you would like further information about the RAVDESS Facial Landmark Tracking data set, or if you experience any issues downloading files, please contact us at ravdess@gmail.com.

    Tracking Examples

    Watch a sample of the facial tracking results.

    Description

    The RAVDESS Facial Landmark Tracking dataset contains tracked facial landmark movements from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) [RAVDESS Zenodo page]. Motion tracking of actors' faces was produced by OpenFace 2.1.0 (Baltrusaitis, T., Zadeh, A., Lim, Y. C., & Morency, L. P., 2018). Tracked information includes: facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.

    The Facial Landmark Tracking dataset was created in the Affective Data Science Lab.

    This data set contains tracking for all 2452 RAVDESS trials. All tracking movement data are contained in "FacialTracking_Actors_01-24.zip", which contains 2452 .CSV files. Each actor has 104 tracked trials (60 speech, 44 song). Note, there are no song files for Actor 18.

    Total Tracked Files = (24 Actors x 60 Speech trials) + (23 Actors x 44 Song trials) = 2452 files.

    Tracking results for each trial are provided as individual comma separated value files (CSV format). File naming convention of tracked files is identical to that of the RAVDESS. For example, tracked file "01-01-01-01-01-01-01.csv" corresponds to RAVDESS audio-video file "01-01-01-01-01-01-01.mp4". For a complete description of the RAVDESS file naming convention and experimental manipulations, please see the RAVDESS Zenodo page.

    Tracking overlay videos for all trials are also provided (720p Xvid, .avi), one zip file per Actor. As the RAVDESS does not contain "ground truth" facial landmark locations, the overlay videos provide a visual 'sanity check' for researchers to confirm the general accuracy of the tracking results. The file naming convention of tracking overlay videos also matches that of the RAVDESS. For example, tracking video "01-01-01-01-01-01-01.avi" corresponds to RAVDESS audio-video file "01-01-01-01-01-01-01.mp4".

    Tracking File Output Format

    This data set retained OpenFace's data output format, described here in detail. The resolution of all input videos was 1280x720. When tracking output units are in pixels, their range of values is (0,0) (top left corner) to (1280,720) (bottom right corner).

    Columns 1-3 = Timing and Detection Confidence

    • 1. Frame - The number of the frame (source videos 30 fps), range = 1 to n
    • 2. Timestamp - Time of frame, range = 0 to m
    • 3. Confidence - Tracker confidence level in current landmark detection estimate, range = 0 to 1

    Columns 4-291 = Eye Gaze Detection

    • 4-6. gaze_0_x, gaze_0_y, gaze_0_z - Eye gaze direction vector in world coordinates for eye 0 (normalized), eye 0 is the leftmost eye in the image (think of it as a ray going from the left eye in the image in the direction of the eye gaze).
    • 7-9. gaze_1_x, gaze_1_y, gaze_1_z - Eye gaze direction vector in world coordinates for eye 1 (normalized), eye 1 is the rightmost eye in the image (think of it as a ray going from the right eye in the image in the direction of the eye gaze).
    • 10-11. gaze_angle_x, gaze_angle_y - Eye gaze direction in radians in world coordinates, averaged for both eyes. If a person is looking left-right this will result in a change of gaze_angle_x (from positive to negative); if a person is looking up-down this will result in a change of gaze_angle_y (from negative to positive); if a person is looking straight ahead both angles will be close to 0 (within measurement error).
    • 12-123. eye_lmk_x_0, ..., eye_lmk_x_55, eye_lmk_y_0, ..., eye_lmk_y_55 - Location of 2D eye region landmarks in pixels. A figure describing the landmark index can be found here.
    • 124-291. eye_lmk_X_0, ..., eye_lmk_X_55, eye_lmk_Y_0, ..., eye_lmk_Y_55, eye_lmk_Z_0, ..., eye_lmk_Z_55 - Location of 3D eye region landmarks in millimeters. A figure describing the landmark index can be found here.

    Columns 292-297 = Head pose

    • 292-294. pose_Tx, pose_Ty, pose_Tz - Location of the head with respect to camera in millimeters (positive Z is away from the camera).
    • 295-297. pose_Rx, pose_Ry, pose_Rz - Rotation of the head in radians around X,Y,Z axes with the convention R = Rx * Ry * Rz, left-handed positive sign. This can be seen as pitch (Rx), yaw (Ry), and roll (Rz). The rotation is in world coordinates with the camera being located at the origin.

    Columns 298-433 = Facial Landmarks locations in 2D

    • 298-433. x_0, ..., x_67, y_0,...y_67 - Location of 2D landmarks in pixels. A figure describing the landmark index can be found here.

    Columns 434-637 = Facial Landmarks locations in 3D

    • 434-637. X_0, ..., X_67, Y_0,..., Y_67, Z_0,..., Z_67 - Location of 3D landmarks in millimetres. A figure describing the landmark index can be found here. For these values to be accurate, OpenFace needs to have good estimates for fx,fy,cx,cy.

    Columns 638-677 = Rigid and non-rigid shape parameters

    Parameters of a point distribution model (PDM) that describe the rigid face shape (location, scale and rotation) and non-rigid face shape (deformation due to expression and identity). For more details, please refer to chapter 4.2 of Tadas Baltrusaitis's PhD thesis [download link].

    • 638-643. p_scale, p_rx, p_ry, p_rz, p_tx, p_ty - Scale, rotation, and translation terms of the PDM.
    • 644-677. p_0, ..., p_33 - Non-rigid shape parameters.

    Columns 678-712 = Facial Action Units

    Facial Action Units (AUs) are a way to describe human facial movements (Ekman, Friesen, and Hager, 2002) [wiki link]. More information on OpenFace's implementation of AUs can be found here.

    • 678-694. AU01_r, AU02_r, AU04_r, AU05_r, AU06_r, AU07_r, AU09_r, AU10_r, AU12_r, AU14_r, AU15_r, AU17_r, AU20_r, AU23_r, AU25_r, AU26_r, AU45_r - Intensity of AU movement, range from 0 (no muscle contraction) to 5 (maximal muscle contraction).
    • 695-712. AU01_c, AU02_c, AU04_c, AU05_c, AU06_c, AU07_c, AU09_c, AU10_c, AU12_c, AU14_c, AU15_c, AU17_c, AU20_c, AU23_c, AU25_c, AU26_c, AU28_c, AU45_c - Presence or absence of 18 AUs, range 0 (absent, not detected) to 1 (present, detected).

    Note, OpenFace's columns 2 and 5 (face_id and success, respectively) were not included in this data set. These values were redundant as a single face was detected in all frames, in all 2452 trials.
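    The AU column names above follow a regular pattern and can be generated programmatically; a short sketch (the two AU number lists are transcribed from the column descriptions above):

```python
# OpenFace AU columns: 17 intensity columns (_r, range 0-5) and
# 18 presence columns (_c, 0 absent / 1 present).
AU_R = [1, 2, 4, 5, 6, 7, 9, 10, 12, 14, 15, 17, 20, 23, 25, 26, 45]
AU_C = [1, 2, 4, 5, 6, 7, 9, 10, 12, 14, 15, 17, 20, 23, 25, 26, 28, 45]

au_intensity_cols = [f"AU{n:02d}_r" for n in AU_R]
au_presence_cols = [f"AU{n:02d}_c" for n in AU_C]
print(len(au_intensity_cols), len(au_presence_cols))
# prints: 17 18
```

    Note that AU28 appears only as a presence column, which is why the two lists differ in length.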

    Tracking Overlay Videos

    Tracking overlay videos visualize most aspects of the tracking output described above.

    • Frame - Column 1, Top left corner of video
    • Eye Gaze - Columns 4-11. Indicated by green ray emanating from left and right eyes.
    • Eye region landmarks 2D - Columns 12-123. Red landmarks around left and right eyes, and black circles surrounding left and right irises.
    • Head pose - Columns 292-297. Blue bounding box surrounding the actor's head.
    • Facial landmarks 2D - Columns 298-433. Red landmarks on the participant's left and right eyebrows, nose, lips, and jaw.
    • Facial Action Unit Intensity - Columns 678-694. All 17 AUs are listed on the left side of the video in black text. Intensity level (0-5) of each AU is indicated by the numeric value and blue bar.
    • Facial Action Unit Presence - Columns 695-712. All 18 AUs are listed on the right side of the video in black & green text. Absence of an AU (0) is in black text with the numeric value 0.0. Presence of an AU (1) is in green text with the numeric value 1.0.

    Camera Parameters and 3D Calibration Procedure

    This data set contains accurate estimates of actors' 3D head poses. To produce these, camera parameters at the time of recording were required (distance from camera to actor, and camera field of view). These values were used with OpenCV's camera calibration procedure, described here, to produce estimates of the camera's focal length and optical center at the time of actor recordings. The four values produced by the calibration procedure (fx,fy,cx,cy) were input to OpenFace as command line arguments during facial tracking, described here, to produce accurate estimates of 3D head pose.

    Camera

  6. RAVDESS

    • huggingface.co
    Updated Mar 4, 2025
    + more versions
    Cite
    Hoang Phuc (2025). RAVDESS [Dataset]. https://huggingface.co/datasets/HoangPhuc7679/RAVDESS
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 4, 2025
    Authors
    Hoang Phuc
    License

    Attribution-NoDerivs 4.0 (CC BY-ND 4.0) - https://creativecommons.org/licenses/by-nd/4.0/
    License information was derived automatically

    Description

    The HoangPhuc7679/RAVDESS dataset is hosted on Hugging Face and was contributed by the HF Datasets community.

  7. RAVDESS as .csv

    • kaggle.com
    zip
    Updated Sep 15, 2021
    Cite
    Kartik Khandelwal (2021). RAVDESS as .csv [Dataset]. https://www.kaggle.com/kartik2khandelwal/speech-emotion-dataset
    Explore at:
    zip (781110 bytes) - Available download formats
    Dataset updated
    Sep 15, 2021
    Authors
    Kartik Khandelwal
    Description

    After extracting the features of the RAVDESS dataset for Speech Emotion Recognition, this CSV file was prepared so you don't need to invest a lot of time and effort again.

  8. RAVDESS

    • huggingface.co
    Updated Apr 13, 2025
    + more versions
    Cite
    Yuka Tatsumi (2025). RAVDESS [Dataset]. https://huggingface.co/datasets/yukat237/RAVDESS
    Explore at:
    Dataset updated
    Apr 13, 2025
    Authors
    Yuka Tatsumi
    Description

    The yukat237/RAVDESS dataset is hosted on Hugging Face and was contributed by the HF Datasets community.

  9. RAVDESS (ZENODO) DATASET PRE-PROCESSED

    • kaggle.com
    Updated Mar 19, 2025
    Cite
    Manish Prajapati (2025). RAVDESS (ZENODO) DATASET PRE-PROCESSED [Dataset]. https://www.kaggle.com/datasets/manishprajapati24/ravdess-zenodo-dataset-pre-processed/suggestions
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 19, 2025
    Dataset provided by
    Kaggle - http://kaggle.com/
    Authors
    Manish Prajapati
    License

    MIT License - https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Filename identifiers

    Example filename: 02-01-06-01-02-01-12.mp4

    • Modality (01 = full-AV, 02 = video-only, 03 = audio-only).
    • Vocal channel (01 = speech, 02 = song).
    • Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).
    • Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the 'neutral' emotion.
    • Statement (01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door").
    • Repetition (01 = 1st repetition, 02 = 2nd repetition).
    • Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).

    Filename example: 02-01-06-01-02-01-12.mp4
    1. Video-only (02)
    2. Speech (01)
    3. Fearful (06)
    4. Normal intensity (01)
    5. Statement "dogs" (02)
    6. 1st repetition (01)
    7. 12th actor (12)
    8. Female, as the actor ID number is even

  10. ravdess dataset

    • kaggle.com
    zip
    Updated Feb 14, 2021
    Addy (2021). ravdess dataset [Dataset]. https://www.kaggle.com/addy02/ravdess-dataset
    Explore at:
    Available download formats: zip (225014016 bytes)
    Dataset updated
    Feb 14, 2021
    Authors
    Addy
    Description

    Dataset

    This dataset was created by Addy

    Contents

    It contains the following files:

  11. ravdess in WebDataset Format

    • zenodo.org
    tar
    Updated Jan 23, 2025
    Niu Yadong; Niu Yadong (2025). ravdess in WebDataset Format [Dataset]. http://doi.org/10.5281/zenodo.14722524
    Explore at:
    Available download formats: tar
    Dataset updated
    Jan 23, 2025
    Dataset provided by
    xiaomi
    Authors
    Niu Yadong; Niu Yadong
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is the ravdess dataset, formatted in the WebDataset format. WebDataset files are essentially tar archives, where each example in the dataset is represented by a pair of files: a WAV audio file and a corresponding JSON metadata file. The JSON file contains the class label and other relevant information for that particular audio sample.

    $ tar tvf ravdess_fold_0_0000000.tar |head
    -r--r--r-- bigdata/bigdata 24 2025-01-10 15:44 03-01-08-01-01-01-11.json
    -r--r--r-- bigdata/bigdata 341912 2025-01-10 15:44 03-01-08-01-01-01-11.wav
    -r--r--r-- bigdata/bigdata   22 2025-01-10 15:44 03-01-07-02-01-02-05.json
    -r--r--r-- bigdata/bigdata 424184 2025-01-10 15:44 03-01-07-02-01-02-05.wav
    -r--r--r-- bigdata/bigdata   22 2025-01-10 15:44 03-01-06-01-01-02-10.json
    -r--r--r-- bigdata/bigdata 377100 2025-01-10 15:44 03-01-06-01-01-02-10.wav
    -r--r--r-- bigdata/bigdata   24 2025-01-10 15:44 03-01-08-01-02-01-16.json
    -r--r--r-- bigdata/bigdata 396324 2025-01-10 15:44 03-01-08-01-02-01-16.wav
    -r--r--r-- bigdata/bigdata   24 2025-01-10 15:44 03-01-08-01-02-02-22.json
    -r--r--r-- bigdata/bigdata 404388 2025-01-10 15:44 03-01-08-01-02-02-22.wav

    $ cat 03-01-08-01-01-01-11.json
    {"emotion": "surprised"}
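Shards like this can be read with Python's standard tarfile module. The sketch below is mine, not part of the dataset tooling: it builds a tiny synthetic shard with the same wav/json pairing (the label value is copied from the example above, the WAV bytes are a placeholder) and then groups members by filename stem, which is the core of any WebDataset reader.

```python
import io
import json
import tarfile

# Build a tiny in-memory tar mimicking a ravdess WebDataset shard:
# each sample is a pair of <stem>.wav and <stem>.json members.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for stem, label in [("03-01-08-01-01-01-11", "surprised")]:
        meta = json.dumps({"emotion": label}).encode()
        info = tarfile.TarInfo(name=stem + ".json")
        info.size = len(meta)
        tar.addfile(info, io.BytesIO(meta))
        audio = b"\x00" * 16  # placeholder for real WAV bytes
        info = tarfile.TarInfo(name=stem + ".wav")
        info.size = len(audio)
        tar.addfile(info, io.BytesIO(audio))

# Read the shard back, pairing each .wav with its .json metadata by stem.
buf.seek(0)
samples = {}
with tarfile.open(fileobj=buf, mode="r") as tar:
    for member in tar.getmembers():
        stem, ext = member.name.rsplit(".", 1)
        data = tar.extractfile(member).read()
        entry = samples.setdefault(stem, {})
        entry[ext] = json.loads(data) if ext == "json" else data

print(samples["03-01-08-01-01-01-11"]["json"])  # {'emotion': 'surprised'}
```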
  12. RAVDESS

    • huggingface.co
    Updated Aug 19, 2024
    Neo Ho (2024). RAVDESS [Dataset]. https://huggingface.co/datasets/windcrossroad/RAVDESS
    Explore at:
    Available download formats: Croissant
    Dataset updated
    Aug 19, 2024
    Authors
    Neo Ho
    Description

    The windcrossroad/RAVDESS dataset is hosted on Hugging Face and contributed by the HF Datasets community.

  13. ravdess-Emotional-speech_song-train_test-csv

    • kaggle.com
    Updated Jan 8, 2021
    Kuntal Das599 (2021). ravdess-Emotional-speech_song-train_test-csv [Dataset]. https://www.kaggle.com/kuntaldas599/ravdessemotionalspeech-songtrain-testcsv
    Explore at:
    Available download formats: Croissant
    Dataset updated
    Jan 8, 2021
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Kuntal Das599
    Description

    Dataset

    This dataset was created by Kuntal Das599

    Contents

  14. SER-RAVDESS-Augmented

    • huggingface.co
    yuval ratzabi, SER-RAVDESS-Augmented [Dataset]. https://huggingface.co/datasets/yuvalira/SER-RAVDESS-Augmented
    Explore at:
    Authors
    yuval ratzabi
    Description

    RAVDESS-SER-Augmented

    This dataset is an augmented version of the RAVDESS speech subset, created to support robust training of Speech Emotion Recognition (SER) models such as MS-SincResNet. It contains .pt files for each sample, storing the raw waveform (3 seconds, 16 kHz mono) and its corresponding emotion label (0–7).

    Dataset Structure

    Each .pt file is a dictionary containing:

    "waveform": 1D float tensor of shape 48000 "label": integer in [0โ€ฆ See the full description on the dataset page: https://huggingface.co/datasets/yuvalira/SER-RAVDESS-Augmented.

  15. ravdess

    • huggingface.co
    Updated Sep 12, 2024
    + more versions
    DuanYu (2024). ravdess [Dataset]. https://huggingface.co/datasets/duanyu027/ravdess
    Explore at:
    Available download formats: Croissant
    Dataset updated
    Sep 12, 2024
    Authors
    DuanYu
    Description

    The duanyu027/ravdess dataset is hosted on Hugging Face and contributed by the HF Datasets community.

  16. RAVDESS

    • huggingface.co
    Updated Apr 4, 2025
    Alexandre Aguedo (2025). RAVDESS [Dataset]. https://huggingface.co/datasets/Aguedo/RAVDESS
    Explore at:
    Dataset updated
    Apr 4, 2025
    Authors
    Alexandre Aguedo
    Description

    RAVDESS Dataset

    This dataset contains a subset of RAVDESS for emotion detection.

  17. ravdess

    • kaggle.com
    Updated Apr 25, 2025
    Jubaer Ahamed Bhuiyan (2025). ravdess [Dataset]. https://www.kaggle.com/datasets/jubaerahamedbhuiyan/ravdess/versions/1
    Explore at:
    Available download formats: Croissant
    Dataset updated
    Apr 25, 2025
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Jubaer Ahamed Bhuiyan
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Jubaer Ahamed Bhuiyan

    Released under CC BY-SA 4.0

    Contents

  18. Description of factor-level coding of RAVDESS filenames

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Steven R. Livingstone; Frank A. Russo (2023). Description of factor-level coding of RAVDESS filenames. [Dataset]. http://doi.org/10.1371/journal.pone.0196391.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Steven R. Livingstone; Frank A. Russo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description of factor-level coding of RAVDESS filenames.

  19. ravdess data

    • kaggle.com
    Updated May 8, 2022
    deepakk6 (2022). ravdess data [Dataset]. https://www.kaggle.com/datasets/deepakk6/ravdess-data/code
    Explore at:
    Available download formats: Croissant
    Dataset updated
    May 8, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    deepakk6
    Description

    Dataset

    This dataset was created by deepakk6

    Contents

  20. Noised versions of RAVDESS

    • kaggle.com
    Updated Apr 15, 2024
    Marwa H22 (2024). Noised versions of RAVDESS [Dataset]. https://www.kaggle.com/datasets/marwah22/noised-versions-of-ravdess/code
    Explore at:
    Available download formats: Croissant
    Dataset updated
    Apr 15, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Marwa H22
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Marwa H22

    Released under Apache 2.0

    Contents


The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)

62 scholarly articles cite this dataset.


Academic paper citation

Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.

Personal use citation

Include a link to this Zenodo page - https://zenodo.org/record/1188976

Commercial Licenses

Commercial licenses for the RAVDESS can be purchased. For more information, please visit our license fee page, or contact us at ravdess@gmail.com.

Contact Information

If you would like further information about the RAVDESS, to purchase a commercial license, or if you experience any issues downloading files, please contact us at ravdess@gmail.com.

Example Videos

Watch a sample of the RAVDESS speech and song videos.

Emotion Classification Users

If you're interested in using machine learning to classify emotional expressions with the RAVDESS, please see our new RAVDESS Facial Landmark Tracking data set [Zenodo project page].

Construction and Validation

Full details on the construction and perceptual validation of the RAVDESS are described in our PLoS ONE paper - https://doi.org/10.1371/journal.pone.0196391.

The RAVDESS contains 7356 files. Each file was rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained adult research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity, interrater reliability, and test-retest intrarater reliability were reported. Validation data is open-access, and can be downloaded along with our paper from PLoS ONE.

Contents

Audio-only files

Audio-only files of all actors (01-24) are available as two separate zip files (~200 MB each):

  • Speech file (Audio_Speech_Actors_01-24.zip, 215 MB) contains 1440 files: 60 trials per actor x 24 actors = 1440.
  • Song file (Audio_Song_Actors_01-24.zip, 198 MB) contains 1012 files: 44 trials per actor x 23 actors = 1012.

Audio-Visual and Video-only files

Video files are provided as separate zip downloads for each actor (01-24, ~500 MB each), and are split into separate speech and song downloads:

  • Speech files (Video_Speech_Actor_01.zip to Video_Speech_Actor_24.zip) collectively contain 2880 files: 60 trials per actor x 2 modalities (AV, VO) x 24 actors = 2880.
  • Song files (Video_Song_Actor_01.zip to Video_Song_Actor_24.zip) collectively contain 2024 files: 44 trials per actor x 2 modalities (AV, VO) x 23 actors = 2024.

File Summary

In total, the RAVDESS collection includes 7356 files (2880+2024+1440+1012 files).
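The total can be reproduced from the per-category trial, actor, and modality figures above:

```python
# Recompute the RAVDESS file counts from the figures stated above.
speech_audio = 60 * 24       # audio-only speech: 60 trials x 24 actors
song_audio   = 44 * 23       # audio-only song: 44 trials x 23 actors (no song for Actor_18)
speech_video = 60 * 2 * 24   # speech in AV + video-only modalities
song_video   = 44 * 2 * 23   # song in AV + video-only modalities

total = speech_video + song_video + speech_audio + song_audio
print(speech_video, song_video, speech_audio, song_audio, total)  # 2880 2024 1440 1012 7356
```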

File naming convention

Each of the 7356 RAVDESS files has a unique filename. The filename consists of a 7-part numerical identifier (e.g., 02-01-06-01-02-01-12.mp4). These identifiers define the stimulus characteristics:

Filename identifiers

  • Modality (01 = full-AV, 02 = video-only, 03 = audio-only).
  • Vocal channel (01 = speech, 02 = song).
  • Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).
  • Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the 'neutral' emotion.
  • Statement (01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door").
  • Repetition (01 = 1st repetition, 02 = 2nd repetition).
  • Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).


Filename example: 02-01-06-01-02-01-12.mp4

  1. Video-only (02)
  2. Speech (01)
  3. Fearful (06)
  4. Normal intensity (01)
  5. Statement "dogs" (02)
  6. 1st Repetition (01)
  7. 12th Actor (12)
  8. Female, as the actor ID number is even.
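A filename can be decoded mechanically from the identifier tables above. A minimal sketch (the function name and output field names are mine; the lookup tables are transcribed from the list):

```python
# Lookup tables transcribed from the filename identifier list above.
MODALITIES = {"01": "full-AV", "02": "video-only", "03": "audio-only"}
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def parse_ravdess_filename(filename):
    """Split a RAVDESS filename into its seven identifier fields."""
    stem = filename.rsplit(".", 1)[0]
    modality, channel, emotion, intensity, statement, repetition, actor = stem.split("-")
    return {
        "modality": MODALITIES[modality],
        "vocal_channel": "speech" if channel == "01" else "song",
        "emotion": EMOTIONS[emotion],
        "intensity": "normal" if intensity == "01" else "strong",
        "statement": ("Kids are talking by the door" if statement == "01"
                      else "Dogs are sitting by the door"),
        "repetition": int(repetition),
        "actor": int(actor),
        # Odd-numbered actors are male, even-numbered are female.
        "actor_sex": "male" if int(actor) % 2 == 1 else "female",
    }

fields = parse_ravdess_filename("02-01-06-01-02-01-12.mp4")
print(fields["modality"], fields["emotion"], fields["actor_sex"])
```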

License information

The RAVDESS is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, CC BY-NC-SA 4.0

Commercial licenses for the RAVDESS can also be purchased. For more information, please visit our license fee page, or contact us at ravdess@gmail.com.

Related Data sets
