43 datasets found
  1. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Oct 19, 2024
    + more versions
    Cite
    Steven R. Livingstone; Frank A. Russo (2024). The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) [Dataset]. http://doi.org/10.5281/zenodo.1188976
    Explore at:
    zip - Available download formats
    Dataset updated
    Oct 19, 2024
    Dataset provided by
    Zenodo - http://zenodo.org/
    Authors
    Steven R. Livingstone; Frank A. Russo
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) - https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description


    The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 7356 files (total size: 24.8 GB). The dataset contains 24 professional actors (12 female, 12 male), vocalizing two lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression. All conditions are available in three modality formats: Audio-only (16bit, 48kHz .wav), Audio-Video (720p H.264, AAC 48kHz, .mp4), and Video-only (no sound). Note, there are no song files for Actor_18.

    The RAVDESS was developed by Dr Steven R. Livingstone, who now leads the Affective Data Science Lab, and Dr Frank A. Russo, who leads the SMART Lab.

    Citing the RAVDESS

    The RAVDESS is released under a Creative Commons Attribution-NonCommercial-ShareAlike license, so please cite the RAVDESS if it is used in your work in any form. Published academic papers should use the academic paper citation for our PLoS ONE paper. Personal works, such as machine learning projects/blog posts, should provide a URL to this Zenodo page, though a reference to our PLoS ONE paper would also be appreciated.

    Academic paper citation

    Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.

    Personal use citation

    Include a link to this Zenodo page - https://zenodo.org/record/1188976

    Commercial Licenses

    Commercial licenses for the RAVDESS can be purchased. For more information, please visit our license fee page, or contact us at ravdess@gmail.com.

    Contact Information

    If you would like further information about the RAVDESS, to purchase a commercial license, or if you experience any issues downloading files, please contact us at ravdess@gmail.com.

    Example Videos

    Watch a sample of the RAVDESS speech and song videos.

    Emotion Classification Users

    If you're interested in using machine learning to classify emotional expressions with the RAVDESS, please see our new RAVDESS Facial Landmark Tracking data set [Zenodo project page].

    Construction and Validation

    Full details on the construction and perceptual validation of the RAVDESS are described in our PLoS ONE paper - https://doi.org/10.1371/journal.pone.0196391.

    The RAVDESS contains 7356 files. Each file was rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained adult research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity, interrater reliability, and test-retest intrarater reliability were reported. Validation data is open-access, and can be downloaded along with our paper from PLoS ONE.

    Contents

    Audio-only files

    Audio-only files of all actors (01-24) are available as two separate zip files (~200 MB each):

    • Speech file (Audio_Speech_Actors_01-24.zip, 215 MB) contains 1440 files: 60 trials per actor x 24 actors = 1440.
    • Song file (Audio_Song_Actors_01-24.zip, 198 MB) contains 1012 files: 44 trials per actor x 23 actors = 1012.

    Audio-Visual and Video-only files

    Video files are provided as separate zip downloads for each actor (01-24, ~500 MB each), and are split into separate speech and song downloads:

    • Speech files (Video_Speech_Actor_01.zip to Video_Speech_Actor_24.zip) collectively contain 2880 files: 60 trials per actor x 2 modalities (AV, VO) x 24 actors = 2880.
    • Song files (Video_Song_Actor_01.zip to Video_Song_Actor_24.zip) collectively contain 2024 files: 44 trials per actor x 2 modalities (AV, VO) x 23 actors = 2024.

    File Summary

    In total, the RAVDESS collection includes 7356 files (2880+2024+1440+1012 files).
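    The arithmetic behind these totals can be checked in a few lines (a quick sketch; trial counts are taken from the Contents section above):

```python
# Verify the RAVDESS file counts stated above.
speech_audio = 60 * 24        # 60 speech trials x 24 actors (audio-only)
song_audio = 44 * 23          # 44 song trials x 23 actors (no song for Actor 18)
speech_video = 60 * 2 * 24    # speech trials x 2 modalities (AV, VO) x 24 actors
song_video = 44 * 2 * 23      # song trials x 2 modalities (AV, VO) x 23 actors

total = speech_video + song_video + speech_audio + song_audio
print(speech_audio, song_audio, speech_video, song_video, total)
# prints: 1440 1012 2880 2024 7356
```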

    File naming convention

    Each of the 7356 RAVDESS files has a unique filename. The filename consists of a 7-part numerical identifier (e.g., 02-01-06-01-02-01-12.mp4). These identifiers define the stimulus characteristics:

    Filename identifiers

    • Modality (01 = full-AV, 02 = video-only, 03 = audio-only).
    • Vocal channel (01 = speech, 02 = song).
    • Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).
    • Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the 'neutral' emotion.
    • Statement (01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door").
    • Repetition (01 = 1st repetition, 02 = 2nd repetition).
    • Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).


    Filename example: 02-01-06-01-02-01-12.mp4

    1. Video-only (02)
    2. Speech (01)
    3. Fearful (06)
    4. Normal intensity (01)
    5. Statement "dogs" (02)
    6. 1st Repetition (01)
    7. 12th Actor (12)
    8. Female, as the actor ID number is even.
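    The identifier scheme above can be decoded mechanically; a minimal sketch (the parse_ravdess function and its label tables are our own illustration, not part of the dataset):

```python
# Decode a RAVDESS filename into its 7 stimulus fields.
EMOTIONS = {1: "neutral", 2: "calm", 3: "happy", 4: "sad",
            5: "angry", 6: "fearful", 7: "disgust", 8: "surprised"}

def parse_ravdess(filename):
    stem = filename.rsplit(".", 1)[0]
    modality, channel, emotion, intensity, statement, repetition, actor = \
        (int(p) for p in stem.split("-"))
    return {
        "modality": {1: "full-AV", 2: "video-only", 3: "audio-only"}[modality],
        "vocal_channel": {1: "speech", 2: "song"}[channel],
        "emotion": EMOTIONS[emotion],
        "intensity": {1: "normal", 2: "strong"}[intensity],
        "statement": {1: "Kids are talking by the door",
                      2: "Dogs are sitting by the door"}[statement],
        "repetition": repetition,
        "actor": actor,
        # Odd-numbered actors are male, even-numbered actors are female.
        "sex": "female" if actor % 2 == 0 else "male",
    }

print(parse_ravdess("02-01-06-01-02-01-12.mp4"))
```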

    License information

    The RAVDESS is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, CC BY-NC-SA 4.0

    Commercial licenses for the RAVDESS can also be purchased. For more information, please visit our license fee page, or contact us at ravdess@gmail.com.

    Related Data sets

  2. RAVDESS

    • huggingface.co
    Updated Oct 12, 2024
    Cite
    Maha Tufail Agro (2024). RAVDESS [Dataset]. https://huggingface.co/datasets/MahiA/RAVDESS
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 12, 2024
    Authors
    Maha Tufail Agro
    License

    MIT License - https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    RAVDESS

    This is an audio classification dataset for Emotion Recognition. Classes = 8, Split = Train-Test.

      Structure
    

    The audios folder contains the audio files; train.csv is the training split and test.csv the testing split.

      Download
    

    import os
    import huggingface_hub

    audio_datasets_path = "DATASET_PATH/Audio-Datasets"
    if not os.path.exists(audio_datasets_path):
        print(f"Given {audio_datasets_path=} does not exist. Specify a valid path ending with…

    See the full description on the dataset page: https://huggingface.co/datasets/MahiA/RAVDESS.

  3. Enhanced RAVDESS Speech Dataset

    • data.niaid.nih.gov
    Updated Oct 2, 2021
    Cite
    Pardo, Bryan (2021). Enhanced RAVDESS Speech Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4783520
    Explore at:
    Dataset updated
    Oct 2, 2021
    Dataset provided by
    Jin, Zeyu
    Bryan, Nicholas J.
    Caceres, Juan-Pablo
    Pardo, Bryan
    Morrison, Max
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) - https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This is a modified version of the speech audio contained within the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset. The original dataset can be found here. The unmodified version of just the speech audio used as source material for this dataset can be found here. This dataset was produced by applying speech enhancement and bandwidth extension to the original speech using HiFi-GAN, which produces high-quality speech at 48 kHz containing significantly less noise and reverb than the original recordings.

    If you use this work as part of an academic publication, please cite the papers corresponding to both the original dataset as well as HiFi-GAN:

    Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.

    Su, Jiaqi, Zeyu Jin, and Adam Finkelstein. "HiFi-GAN: High-fidelity denoising and dereverberation based on speech deep features in adversarial networks." Proc. Interspeech. October 2020.

    Note that there are two recent papers with the name "HiFi-GAN". Please be sure to cite the correct paper as listed here.

  4. Facial Expression and Landmark Tracking (FELT) dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 19, 2024
    Cite
    Liao, Zhenghao (2024). Facial Expression and Landmark Tracking (FELT) dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13243599
    Explore at:
    Dataset updated
    Oct 19, 2024
    Dataset provided by
    Russo, Frank A.
    Livingstone, Steven
    Liao, Zhenghao
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) - https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Contact Information

    If you would like further information about the Facial expression and landmark tracking data set, or if you experience any issues downloading files, please contact us at ravdess@gmail.com.

    Facial Expression examples

    Watch a sample of the facial expression tracking results.

    Commercial Licenses

    Commercial licenses for this dataset can be purchased. For more information, please contact us at ravdess@gmail.com.

    Description

    The Facial Expression and Landmark Tracking (FELT) dataset contains tracked facial expression movements and animated videos from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) [RAVDESS Zenodo page]. Tracking data and videos were produced by Py-Feat 0.6.2 (2024-03-29 release) (Cheong, J.H., Jolly, E., Xie, T. et al. Py-Feat: Python Facial Expression Analysis Toolbox. Affect Sci 4, 781–796 (2023). https://doi.org/10.1007/s42761-023-00191-4) and custom code (github repo). Tracked information includes: facial emotion classification estimates, facial landmark detection (68 points), head pose estimation (yaw, pitch, roll, x, y), and facial Action Unit (AU) recognition. Videos include: landmark overlay videos, AU activation animations, and landmark plot animations.

    The FELT dataset was created at the Affective Data Science Lab.

    This dataset contains tracking data and videos for all 2452 RAVDESS trials. Raw and smoothed tracking data are provided. All tracking movement data are contained in the following archives: raw_motion_speech.zip, smoothed_motion_speech.zip, raw_motion_song.zip, and smoothed_motion_song.zip. Each actor has 104 tracked trials (60 speech, 44 song). Note, there are no song files for Actor 18.

    Total Tracked Files = (24 Actors x 60 Speech trials) + (23 Actors x 44 Song trials) = 2452 CSV files.

    Tracking results for each trial are provided as individual comma separated value files (CSV format). File naming convention of raw and smoothed tracked files is identical to that of the RAVDESS. For example, smoothed tracked file "01-01-01-01-01-01-01.csv" corresponds to RAVDESS audio-video file "01-01-01-01-01-01-01.mp4". For a complete description of the RAVDESS file naming convention and experimental manipulations, please see the RAVDESS Zenodo page.

    Landmark overlays, AU activation, and landmark plot videos for all trials are also provided (720p h264, .mp4). Landmark overlays present tracked landmarks and head pose overlaid on the original RAVDESS actor video. As the RAVDESS does not contain "ground truth" facial landmark locations, the overlay videos provide a visual 'sanity check' for researchers to confirm the general accuracy of the tracking results. Landmark plot animations present landmarks only, anchored to the top left corner of the head bounding box with translational head motion removed. AU activation animations visualize intensity of AU activations (0-1 normalized) as a heatmap over time. The file naming convention of all videos also matches that of the RAVDESS. For example, "Landmark_Overlay/01-01-01-01-01-01-01.mp4", "Landmark_Plot/01-01-01-01-01-01-01.mp4", "ActionUnit_Animation/01-01-01-01-01-01-01.mp4", all correspond to RAVDESS audio-video file "01-01-01-01-01-01-01.mp4".

    Smoothing procedure

    Raw tracking data were first low-pass filtered with a 5th-order Butterworth filter (cutoff_freq = 6, sampling_freq = 29.97, order = 5) to remove high-frequency noise. Data were then smoothed with a Savitzky-Golay filter (window_length = 11, poly_order = 5). scipy.signal (v1.13.1) was used for both procedures.
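    The two-stage procedure above can be sketched with scipy.signal. The filter parameters are those stated in the text; the zero-phase filtfilt application and the synthetic input signal are assumptions for illustration:

```python
import numpy as np
from scipy.signal import butter, filtfilt, savgol_filter

fs = 29.97                        # source video frame rate (Hz)
t = np.arange(0, 5, 1 / fs)       # 5 s of synthetic landmark motion
raw = np.sin(2 * np.pi * 0.5 * t) + \
    0.2 * np.random.default_rng(0).normal(size=t.size)

# Stage 1: 5th-order Butterworth low-pass, 6 Hz cutoff (zero-phase assumed).
b, a = butter(N=5, Wn=6, btype="low", fs=fs)
lowpassed = filtfilt(b, a, raw)

# Stage 2: Savitzky-Golay smoothing (window_length=11, poly_order=5).
smoothed = savgol_filter(lowpassed, window_length=11, polyorder=5)
print(smoothed.shape)
```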

    Landmark Tracking models

    Six separate machine learning models were used by Py-Feat to perform various aspects of tracking and classification. Video outputs generated by different combinations of ML models were visually compared, with final model choice determined by voting of first and second authors. Models were specified in the call to Detector class (described here). Exact function call as follows:

    Detector(face_model='img2pose',
             landmark_model='mobilenet',
             au_model='xgb',
             emotion_model='resmasknet',
             facepose_model='img2pose-c',
             identity_model='facenet',
             device='cuda',
             n_jobs=1,
             verbose=False)

    Default Py-Feat parameters for each model were used in most cases. Non-default values were specified in the call to the detect_video function (described here). Exact function call as follows: detect_video(video_path, skip_frames=None, output_size=(720, 1280), batch_size=5, num_workers=0, pin_memory=False, face_detection_threshold=0.83, face_identity_threshold=0.8)

    Tracking File Output Format

    This data set retained Py-Feat's data output format. The resolution of all input videos was 1280x720. Tracking output units are in pixels; their range of values is (0,0) (top left corner) to (1280,720) (bottom right corner).

    Column 1 = Timing information

    1. frame - The number of the frame (source videos 29.97 fps), range = 1 to n

    Columns 2-5 = Head bounding box

    2-3. FaceRectX, FaceRectY - X and Y coordinates of top-left corner of head bounding box (pixels)

    4-5. FaceRectWidth, FaceRectHeight - Width and Height of head bounding box (pixels)

    Column 6 = Face detection confidence

    FaceScore - Confidence level that a human face was detected, range = 0 to 1

    Columns 7-142 = Facial landmark locations in 2D

    7-142. x_0, ..., x_67, y_0,...y_67 - Location of 2D landmarks in pixels. A figure describing the landmark index can be found here.

    Columns 143-145 = Head pose

    143-145. Pitch, Roll, Yaw - Rotation of the head in degrees (described here). The rotation is in world coordinates with the camera being located at the origin.

    Columns 146-165 = Facial Action Units

    Facial Action Units (AUs) are a way to describe human facial movements (Ekman, Friesen, and Hager, 2002) [wiki link]. More information on Py-Feat's implementation of AUs can be found here.

    146-150, 152-153, 155-158, 160-165. AU01, AU02, AU04, AU05, AU06, AU09, AU10, AU12, AU14, AU15, AU17, AU23, AU24, AU25, AU26, AU28, AU43 - Intensity of AU movement, range from 0 (no muscle contraction) to 1 (maximal muscle contraction).

    151, 154, 159. AU07, AU11, AU20 - Presence or absence of AUs, range 0 (absent, not detected) to 1 (present, detected).

    Columns 166-172 = Emotion classification confidence

    166-172. anger, disgust, fear, happiness, sadness, surprise, neutral - Confidence of classified emotion category, range 0 (0%) to 1 (100%) confidence.

    Columns 173-685 = Face identity score

    Identity of faces contained in the video were classified using the FaceNet model (described here). This procedure generates a 512-dimensional Euclidean embedding space.

    173. Identity - Predicted individual identified in the RAVDESS video. Note, value is always Person_0, as each video contains only a single actor at all times (categorical).

    174-685. Identity_1, ..., Identity_512 - Face embedding vector used by FaceNet to perform facial identity matching.

    Column 686 = Input video

    686. input - The source video processed in this trial.

    Columns 687-688 = Timing information

    687. frame.1 - The number of the frame (source videos 29.97 fps), duplicated column, range = 1 to n

    688. approx_time - Approximate time of the current frame (0.0 to x.x, in seconds)
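    Because the column layout above is regular, the landmark, emotion, and identity columns can be generated by name rather than counted by index; a small sketch (the list names are our own, not Py-Feat's):

```python
# Column-name helpers for the Py-Feat tracking CSVs described above.
landmark_cols = [f"x_{i}" for i in range(68)] + [f"y_{i}" for i in range(68)]
pose_cols = ["Pitch", "Roll", "Yaw"]
emotion_cols = ["anger", "disgust", "fear", "happiness",
                "sadness", "surprise", "neutral"]
identity_cols = ["Identity"] + [f"Identity_{i}" for i in range(1, 513)]

# 136 landmark columns (7-142), 7 emotion columns (166-172),
# 513 identity columns (173-685).
print(len(landmark_cols), len(emotion_cols), len(identity_cols))
# prints: 136 7 513
```

    With pandas, df[landmark_cols] would then slice the 2D landmarks out of a tracking CSV such as "01-01-01-01-01-01-01.csv".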

    Tracking videos

    Landmark Overlay and Landmark Plot videos were produced with the plot_detections function (described here). This function generated individual images for each frame, which were then compiled into a video using the imageio library (described here).

    AU Activation videos were produced with the plot_face function (described here). This function also generated individual images for each frame, which were then compiled into a video using the imageio library. Some frames could not be correctly generated by Py-Feat, producing only the AU heatmap but failing to plot/locate facial landmarks. These frames were dropped prior to compositing the output video. The drop rate was approximately 10% of all frames in each video. Dropped frames were distributed evenly across the video timeline (i.e. no apparent clustering).

    License information

    The RAVDESS Facial expression and landmark tracking data set is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, CC BY-NC-SA 4.0.

    How to cite the RAVDESS Facial Tracking data set

    Academic citation If you use the RAVDESS Facial Tracking data set in an academic publication, please cite both references:

    Liao, Z., Livingstone, SR., & Russo, FA. (2024). RAVDESS Facial expression and landmark tracking (Version 1.0.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.13243600

    Livingstone SR, Russo FA (2018). The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.

    All other attributions If you use the RAVDESS Facial expression and landmark tracking dataset in a form other than an academic publication, such as in a blog post, data science project or competition, school project, or non-commercial product, please use the following attribution: "RAVDESS Facial expression and landmark tracking" by Liao, Livingstone, & Russo is licensed under CC BY-NC-SA 4.0.

    Related Data sets

    The Ryerson Audio-Visual Database of Emotional Speech and Song [Zenodo project page].

  5. RAVDESS Facial Landmark Tracking

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Apr 24, 2025
    Cite
    Riley Swanson; Steven R. Livingstone; Frank A. Russo (2025). RAVDESS Facial Landmark Tracking [Dataset]. http://doi.org/10.5281/zenodo.3255102
    Explore at:
    zip - Available download formats
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo - http://zenodo.org/
    Authors
    Riley Swanson; Steven R. Livingstone; Frank A. Russo
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) - https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Contact Information

    If you would like further information about the RAVDESS Facial Landmark Tracking data set, or if you experience any issues downloading files, please contact us at ravdess@gmail.com.

    Tracking Examples

    Watch a sample of the facial tracking results.

    Description

    The RAVDESS Facial Landmark Tracking dataset contains tracked facial landmark movements from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) [RAVDESS Zenodo page]. Motion tracking of actors' faces was produced by OpenFace 2.1.0 (Baltrusaitis, T., Zadeh, A., Lim, Y. C., & Morency, L. P., 2018). Tracked information includes: facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.

    The Facial Landmark Tracking dataset was created in the Affective Data Science Lab.

    This data set contains tracking for all 2452 RAVDESS trials. All tracking movement data are contained in "FacialTracking_Actors_01-24.zip", which contains 2452 .CSV files. Each actor has 104 tracked trials (60 speech, 44 song). Note, there are no song files for Actor 18.

    Total Tracked Files = (24 Actors x 60 Speech trials) + (23 Actors x 44 Song trials) = 2452 files.

    Tracking results for each trial are provided as individual comma separated value files (CSV format). File naming convention of tracked files is identical to that of the RAVDESS. For example, tracked file "01-01-01-01-01-01-01.csv" corresponds to RAVDESS audio-video file "01-01-01-01-01-01-01.mp4". For a complete description of the RAVDESS file naming convention and experimental manipulations, please see the RAVDESS Zenodo page.

    Tracking overlay videos for all trials are also provided (720p Xvid, .avi), one zip file per Actor. As the RAVDESS does not contain "ground truth" facial landmark locations, the overlay videos provide a visual 'sanity check' for researchers to confirm the general accuracy of the tracking results. The file naming convention of tracking overlay videos also matches that of the RAVDESS. For example, tracking video "01-01-01-01-01-01-01.avi" corresponds to RAVDESS audio-video file "01-01-01-01-01-01-01.mp4".

    Tracking File Output Format

    This data set retained OpenFace's data output format, described here in detail. The resolution of all input videos was 1280x720. When tracking output units are in pixels, their range of values is (0,0) (top left corner) to (1280,720) (bottom right corner).

    Columns 1-3 = Timing and Detection Confidence

    • 1. Frame - The number of the frame (source videos 30 fps), range = 1 to n
    • 2. Timestamp - Time of frame, range = 0 to m
    • 3. Confidence - Tracker confidence level in current landmark detection estimate, range = 0 to 1

    Columns 4-291 = Eye Gaze Detection

    • 4-6. gaze_0_x, gaze_0_y, gaze_0_z - Eye gaze direction vector in world coordinates for eye 0 (normalized), eye 0 is the leftmost eye in the image (think of it as a ray going from the left eye in the image in the direction of the eye gaze).
    • 7-9. gaze_1_x, gaze_1_y, gaze_1_z - Eye gaze direction vector in world coordinates for eye 1 (normalized), eye 1 is the rightmost eye in the image (think of it as a ray going from the right eye in the image in the direction of the eye gaze).
    • 10-11. gaze_angle_x, gaze_angle_y - Eye gaze direction in radians in world coordinates, averaged for both eyes. If a person is looking left-right this will result in a change of gaze_angle_x (from positive to negative); if a person is looking up-down this will result in a change of gaze_angle_y (from negative to positive); if a person is looking straight ahead both angles will be close to 0 (within measurement error).
    • 12-123. eye_lmk_x_0, ..., eye_lmk_x_55, eye_lmk_y_0, ..., eye_lmk_y_55 - Location of 2D eye region landmarks in pixels. A figure describing the landmark index can be found here.
    • 124-291. eye_lmk_X_0, ..., eye_lmk_X_55, eye_lmk_Y_0, ..., eye_lmk_Y_55, eye_lmk_Z_0, ..., eye_lmk_Z_55 - Location of 3D eye region landmarks in millimeters. A figure describing the landmark index can be found here.

    Columns 292-297 = Head pose

    • 292-294. pose_Tx, pose_Ty, pose_Tz - Location of the head with respect to camera in millimeters (positive Z is away from the camera).
    • 295-297. pose_Rx, pose_Ry, pose_Rz - Rotation of the head in radians around X,Y,Z axes with the convention R = Rx * Ry * Rz, left-handed positive sign. This can be seen as pitch (Rx), yaw (Ry), and roll (Rz). The rotation is in world coordinates with the camera being located at the origin.

    Columns 298-433 = Facial Landmarks locations in 2D

    • 298-433. x_0, ..., x_67, y_0,...y_67 - Location of 2D landmarks in pixels. A figure describing the landmark index can be found here.

    Columns 434-637 = Facial Landmarks locations in 3D

    • 434-637. X_0, ..., X_67, Y_0,..., Y_67, Z_0,..., Z_67 - Location of 3D landmarks in millimetres. A figure describing the landmark index can be found here. For these values to be accurate, OpenFace needs to have good estimates for fx,fy,cx,cy.

    Columns 638-677 = Rigid and non-rigid shape parameters

    Parameters of a point distribution model (PDM) that describe the rigid face shape (location, scale and rotation) and non-rigid face shape (deformation due to expression and identity). For more details, please refer to chapter 4.2 of Tadas Baltrusaitis's PhD thesis [download link].

    • 638-643. p_scale, p_rx, p_ry, p_rz, p_tx, p_ty - Scale, rotation, and translation terms of the PDM.
    • 644-677. p_0, ..., p_33 - Non-rigid shape parameters.

    Columns 678-712 = Facial Action Units

    Facial Action Units (AUs) are a way to describe human facial movements (Ekman, Friesen, and Hager, 2002) [wiki link]. More information on OpenFace's implementation of AUs can be found here.

    • 678-694. AU01_r, AU02_r, AU04_r, AU05_r, AU06_r, AU07_r, AU09_r, AU10_r, AU12_r, AU14_r, AU15_r, AU17_r, AU20_r, AU23_r, AU25_r, AU26_r, AU45_r - Intensity of AU movement, range from 0 (no muscle contraction) to 5 (maximal muscle contraction).
    • 695-712. AU01_c, AU02_c, AU04_c, AU05_c, AU06_c, AU07_c, AU09_c, AU10_c, AU12_c, AU14_c, AU15_c, AU17_c, AU20_c, AU23_c, AU25_c, AU26_c, AU28_c, AU45_c - Presence or absence of 18 AUs, range 0 (absent, not detected) to 1 (present, detected).

    Note, OpenFace's columns 2 and 5 (face_id and success, respectively) were not included in this data set. These values were redundant as a single face was detected in all frames, in all 2452 trials.
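    The AU column names above follow a regular pattern and can be generated programmatically; a short sketch (the two AU number lists are transcribed from the column descriptions above):

```python
# OpenFace AU columns: 17 intensity columns (_r, range 0-5) and
# 18 presence columns (_c, 0 absent / 1 present).
AU_R = [1, 2, 4, 5, 6, 7, 9, 10, 12, 14, 15, 17, 20, 23, 25, 26, 45]
AU_C = [1, 2, 4, 5, 6, 7, 9, 10, 12, 14, 15, 17, 20, 23, 25, 26, 28, 45]

au_intensity_cols = [f"AU{n:02d}_r" for n in AU_R]
au_presence_cols = [f"AU{n:02d}_c" for n in AU_C]
print(len(au_intensity_cols), len(au_presence_cols))
# prints: 17 18
```

    Note that AU28 appears only as a presence column, which is why the two lists differ in length.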

    Tracking Overlay Videos

    Tracking overlay videos visualize most aspects of the tracking output described above.

    • Frame - Column 1, Top left corner of video
    • Eye Gaze - Columns 4-11. Indicated by green ray emanating from left and right eyes.
    • Eye region landmarks 2D - Columns 12-123. Red landmarks around left and right eyes, and black circles surrounding left and right irises.
    • Head pose - Columns 292-297. Blue bounding box surrounding the actor's head.
    • Facial landmarks 2D - Columns 298-433. Red landmarks on the participant's left and right eyebrows, nose, lips, and jaw.
    • Facial Action Unit Intensity - Columns 678-694. All 17 AUs are listed on the left side of the video in black text. Intensity level (0-5) of each AU is indicated by the numeric value and blue bar.
    • Facial Action Unit Presence - Columns 695-712. All 18 AUs are listed on the right side of the video in black & green text. Absence of an AU (0) is in black text with the numeric value 0.0. Presence of an AU (1) is in green text with the numeric value 1.0.

    Camera Parameters and 3D Calibration Procedure

    This data set contains accurate estimates of actors' 3D head poses. To produce these, camera parameters at the time of recording were required (distance from camera to actor, and camera field of view). These values were used with OpenCV's camera calibration procedure, described here, to produce estimates of the camera's focal length and optical center at the time of actor recordings. The four values produced by the calibration procedure (fx,fy,cx,cy) were input to OpenFace as command line arguments during facial tracking, described here, to produce accurate estimates of 3D head pose.

    Camera

  6. RAVDESS

    • huggingface.co
    Updated Mar 4, 2025
    + more versions
    Cite
    Hoang Phuc (2025). RAVDESS [Dataset]. https://huggingface.co/datasets/HoangPhuc7679/RAVDESS
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 4, 2025
    Authors
    Hoang Phuc
    License

    Attribution-NoDerivs 4.0 (CC BY-ND 4.0) - https://creativecommons.org/licenses/by-nd/4.0/
    License information was derived automatically

    Description

    The HoangPhuc7679/RAVDESS dataset is hosted on Hugging Face and was contributed by the HF Datasets community.

  7. RAVDESS as .csv

    • kaggle.com
    zip
    Updated Sep 15, 2021
    Cite
    Kartik Khandelwal (2021). RAVDESS as .csv [Dataset]. https://www.kaggle.com/kartik2khandelwal/speech-emotion-dataset
    Explore at:
    zip (781110 bytes) - Available download formats
    Dataset updated
    Sep 15, 2021
    Authors
    Kartik Khandelwal
    Description

    After extracting the features of the RAVDESS dataset for Speech Emotion Recognition, this CSV file was prepared so you don't need to invest a lot of time and effort again.

  8. RAVDESS

    • huggingface.co
    Updated Apr 13, 2025
    + more versions
    Cite
    Yuka Tatsumi (2025). RAVDESS [Dataset]. https://huggingface.co/datasets/yukat237/RAVDESS
    Explore at:
    Dataset updated
    Apr 13, 2025
    Authors
    Yuka Tatsumi
    Description

    The yukat237/RAVDESS dataset is hosted on Hugging Face and was contributed by the HF Datasets community.

  9. RAVDESS (ZENODO) DATASET PRE-PROCESSED

    • kaggle.com
    Updated Mar 19, 2025
    Cite
    Manish Prajapati (2025). RAVDESS (ZENODO) DATASET PRE-PROCESSED [Dataset]. https://www.kaggle.com/datasets/manishprajapati24/ravdess-zenodo-dataset-pre-processed/suggestions
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 19, 2025
    Dataset provided by
    Kaggle - http://kaggle.com/
    Authors
    Manish Prajapati
    License

    MIT License - https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Filename identifiers

    Example filename: 02-01-06-01-02-01-12.mp4

    • Modality (01 = full-AV, 02 = video-only, 03 = audio-only).
    • Vocal channel (01 = speech, 02 = song).
    • Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).
    • Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the 'neutral' emotion.
    • Statement (01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door").
    • Repetition (01 = 1st repetition, 02 = 2nd repetition).
    • Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).

    Filename example: 02-01-06-01-02-01-12.mp4
    1. Video-only (02)
    2. Speech (01)
    3. Fearful (06)
    4. Normal intensity (01)
    5. Statement "dogs" (02)
    6. 1st repetition (01)
    7. 12th actor (12)
    8. Female, as the actor ID number is even

  10. ravdess dataset

    • kaggle.com
    zip
    Updated Feb 14, 2021
    Addy (2021). ravdess dataset [Dataset]. https://www.kaggle.com/addy02/ravdess-dataset
    Explore at:
    Available download formats: zip (225014016 bytes)
    Dataset updated
    Feb 14, 2021
    Authors
    Addy
    Description

    Dataset

    This dataset was created by Addy

    Contents

    It contains the following files:

  11. ravdess in WebDataset Format

    • zenodo.org
    tar
    Updated Jan 23, 2025
    Niu Yadong; Niu Yadong (2025). ravdess in WebDataset Format [Dataset]. http://doi.org/10.5281/zenodo.14722524
    Explore at:
    Available download formats: tar
    Dataset updated
    Jan 23, 2025
    Dataset provided by
    xiaomi
    Authors
    Niu Yadong; Niu Yadong
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is the ravdess dataset, formatted in the WebDataset format. WebDataset files are essentially tar archives, where each example in the dataset is represented by a pair of files: a WAV audio file and a corresponding JSON metadata file. The JSON file contains the class label and other relevant information for that particular audio sample.

    $ tar tvf ravdess_fold_0_0000000.tar |head
    -r--r--r-- bigdata/bigdata 24 2025-01-10 15:44 03-01-08-01-01-01-11.json
    -r--r--r-- bigdata/bigdata 341912 2025-01-10 15:44 03-01-08-01-01-01-11.wav
    -r--r--r-- bigdata/bigdata   22 2025-01-10 15:44 03-01-07-02-01-02-05.json
    -r--r--r-- bigdata/bigdata 424184 2025-01-10 15:44 03-01-07-02-01-02-05.wav
    -r--r--r-- bigdata/bigdata   22 2025-01-10 15:44 03-01-06-01-01-02-10.json
    -r--r--r-- bigdata/bigdata 377100 2025-01-10 15:44 03-01-06-01-01-02-10.wav
    -r--r--r-- bigdata/bigdata   24 2025-01-10 15:44 03-01-08-01-02-01-16.json
    -r--r--r-- bigdata/bigdata 396324 2025-01-10 15:44 03-01-08-01-02-01-16.wav
    -r--r--r-- bigdata/bigdata   24 2025-01-10 15:44 03-01-08-01-02-02-22.json
    -r--r--r-- bigdata/bigdata 404388 2025-01-10 15:44 03-01-08-01-02-02-22.wav

    $ cat 03-01-08-01-01-01-11.json
    {"emotion": "surprised"}
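Shards like this can be read with Python's standard tarfile module. The sketch below is mine, not part of the dataset tooling: it builds a tiny synthetic shard with the same wav/json pairing (the label value is copied from the example above, the WAV bytes are a placeholder) and then groups members by filename stem, which is the core of any WebDataset reader.

```python
import io
import json
import tarfile

# Build a tiny in-memory tar mimicking a ravdess WebDataset shard:
# each sample is a pair of <stem>.wav and <stem>.json members.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for stem, label in [("03-01-08-01-01-01-11", "surprised")]:
        meta = json.dumps({"emotion": label}).encode()
        info = tarfile.TarInfo(name=stem + ".json")
        info.size = len(meta)
        tar.addfile(info, io.BytesIO(meta))
        audio = b"\x00" * 16  # placeholder for real WAV bytes
        info = tarfile.TarInfo(name=stem + ".wav")
        info.size = len(audio)
        tar.addfile(info, io.BytesIO(audio))

# Read the shard back, pairing each .wav with its .json metadata by stem.
buf.seek(0)
samples = {}
with tarfile.open(fileobj=buf, mode="r") as tar:
    for member in tar.getmembers():
        stem, ext = member.name.rsplit(".", 1)
        data = tar.extractfile(member).read()
        entry = samples.setdefault(stem, {})
        entry[ext] = json.loads(data) if ext == "json" else data

print(samples["03-01-08-01-01-01-11"]["json"])  # {'emotion': 'surprised'}
```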
  12. RAVDESS

    • huggingface.co
    Updated Aug 19, 2024
    Neo Ho (2024). RAVDESS [Dataset]. https://huggingface.co/datasets/windcrossroad/RAVDESS
    Explore at:
    Available download formats: Croissant
    Dataset updated
    Aug 19, 2024
    Authors
    Neo Ho
    Description

    The windcrossroad/RAVDESS dataset is hosted on Hugging Face and contributed by the HF Datasets community.

  13. ravdess-Emotional-speech_song-train_test-csv

    • kaggle.com
    Updated Jan 8, 2021
    Kuntal Das599 (2021). ravdess-Emotional-speech_song-train_test-csv [Dataset]. https://www.kaggle.com/kuntaldas599/ravdessemotionalspeech-songtrain-testcsv
    Explore at:
    Available download formats: Croissant
    Dataset updated
    Jan 8, 2021
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Kuntal Das599
    Description

    Dataset

    This dataset was created by Kuntal Das599

    Contents

  14. SER-RAVDESS-Augmented

    • huggingface.co
    yuval ratzabi, SER-RAVDESS-Augmented [Dataset]. https://huggingface.co/datasets/yuvalira/SER-RAVDESS-Augmented
    Explore at:
    Authors
    yuval ratzabi
    Description

    RAVDESS-SER-Augmented

    This dataset is an augmented version of the RAVDESS speech subset, created to support robust training of Speech Emotion Recognition (SER) models such as MS-SincResNet. It contains .pt files for each sample, storing the raw waveform (3 seconds, 16 kHz mono) and its corresponding emotion label (0–7).

    Dataset Structure

    Each .pt file is a dictionary containing:

    "waveform": 1D float tensor of shape 48000 "label": integer in [0โ€ฆ See the full description on the dataset page: https://huggingface.co/datasets/yuvalira/SER-RAVDESS-Augmented.

  15. ravdess

    • huggingface.co
    Updated Sep 12, 2024
    + more versions
    DuanYu (2024). ravdess [Dataset]. https://huggingface.co/datasets/duanyu027/ravdess
    Explore at:
    Available download formats: Croissant
    Dataset updated
    Sep 12, 2024
    Authors
    DuanYu
    Description

    The duanyu027/ravdess dataset is hosted on Hugging Face and contributed by the HF Datasets community.

  16. RAVDESS

    • huggingface.co
    Updated Apr 4, 2025
    Alexandre Aguedo (2025). RAVDESS [Dataset]. https://huggingface.co/datasets/Aguedo/RAVDESS
    Explore at:
    Dataset updated
    Apr 4, 2025
    Authors
    Alexandre Aguedo
    Description

    RAVDESS Dataset

    This dataset contains a subset of RAVDESS for emotion detection.

  17. ravdess

    • kaggle.com
    Updated Apr 25, 2025
    Jubaer Ahamed Bhuiyan (2025). ravdess [Dataset]. https://www.kaggle.com/datasets/jubaerahamedbhuiyan/ravdess/versions/1
    Explore at:
    Available download formats: Croissant
    Dataset updated
    Apr 25, 2025
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Jubaer Ahamed Bhuiyan
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Jubaer Ahamed Bhuiyan

    Released under CC BY-SA 4.0

    Contents

  18. Description of factor-level coding of RAVDESS filenames

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Steven R. Livingstone; Frank A. Russo (2023). Description of factor-level coding of RAVDESS filenames. [Dataset]. http://doi.org/10.1371/journal.pone.0196391.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Steven R. Livingstone; Frank A. Russo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description of factor-level coding of RAVDESS filenames.

  19. ravdess data

    • kaggle.com
    Updated May 8, 2022
    deepakk6 (2022). ravdess data [Dataset]. https://www.kaggle.com/datasets/deepakk6/ravdess-data/code
    Explore at:
    Available download formats: Croissant
    Dataset updated
    May 8, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    deepakk6
    Description

    Dataset

    This dataset was created by deepakk6

    Contents

  20. Noised versions of RAVDESS

    • kaggle.com
    Updated Apr 15, 2024
    Marwa H22 (2024). Noised versions of RAVDESS [Dataset]. https://www.kaggle.com/datasets/marwah22/noised-versions-of-ravdess/code
    Explore at:
    Available download formats: Croissant
    Dataset updated
    Apr 15, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Marwa H22
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Marwa H22

    Released under Apache 2.0

    Contents


The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)

62 scholarly articles cite this dataset.


Academic paper citation

Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.

Personal use citation

Include a link to this Zenodo page - https://zenodo.org/record/1188976

Commercial Licenses

Commercial licenses for the RAVDESS can be purchased. For more information, please visit our license fee page, or contact us at ravdess@gmail.com.

Contact Information

If you would like further information about the RAVDESS, to purchase a commercial license, or if you experience any issues downloading files, please contact us at ravdess@gmail.com.

Example Videos

Watch a sample of the RAVDESS speech and song videos.

Emotion Classification Users

If you're interested in using machine learning to classify emotional expressions with the RAVDESS, please see our new RAVDESS Facial Landmark Tracking data set [Zenodo project page].

Construction and Validation

Full details on the construction and perceptual validation of the RAVDESS are described in our PLoS ONE paper - https://doi.org/10.1371/journal.pone.0196391.

The RAVDESS contains 7356 files. Each file was rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained adult research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity, interrater reliability, and test-retest intrarater reliability were reported. Validation data is open-access, and can be downloaded along with our paper from PLoS ONE.

Contents

Audio-only files

Audio-only files of all actors (01-24) are available as two separate zip files (~200 MB each):

  • Speech file (Audio_Speech_Actors_01-24.zip, 215 MB) contains 1440 files: 60 trials per actor x 24 actors = 1440.
  • Song file (Audio_Song_Actors_01-24.zip, 198 MB) contains 1012 files: 44 trials per actor x 23 actors = 1012.

Audio-Visual and Video-only files

Video files are provided as separate zip downloads for each actor (01-24, ~500 MB each), and are split into separate speech and song downloads:

  • Speech files (Video_Speech_Actor_01.zip to Video_Speech_Actor_24.zip) collectively contain 2880 files: 60 trials per actor x 2 modalities (AV, VO) x 24 actors = 2880.
  • Song files (Video_Song_Actor_01.zip to Video_Song_Actor_24.zip) collectively contain 2024 files: 44 trials per actor x 2 modalities (AV, VO) x 23 actors = 2024.

File Summary

In total, the RAVDESS collection includes 7356 files (2880+2024+1440+1012 files).
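The total can be reproduced from the per-category trial, actor, and modality figures above:

```python
# Recompute the RAVDESS file counts from the figures stated above.
speech_audio = 60 * 24       # audio-only speech: 60 trials x 24 actors
song_audio   = 44 * 23       # audio-only song: 44 trials x 23 actors (no song for Actor_18)
speech_video = 60 * 2 * 24   # speech in AV + video-only modalities
song_video   = 44 * 2 * 23   # song in AV + video-only modalities

total = speech_video + song_video + speech_audio + song_audio
print(speech_video, song_video, speech_audio, song_audio, total)  # 2880 2024 1440 1012 7356
```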

File naming convention

Each of the 7356 RAVDESS files has a unique filename. The filename consists of a 7-part numerical identifier (e.g., 02-01-06-01-02-01-12.mp4). These identifiers define the stimulus characteristics:

Filename identifiers

  • Modality (01 = full-AV, 02 = video-only, 03 = audio-only).
  • Vocal channel (01 = speech, 02 = song).
  • Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).
  • Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the 'neutral' emotion.
  • Statement (01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door").
  • Repetition (01 = 1st repetition, 02 = 2nd repetition).
  • Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).


Filename example: 02-01-06-01-02-01-12.mp4

  1. Video-only (02)
  2. Speech (01)
  3. Fearful (06)
  4. Normal intensity (01)
  5. Statement "dogs" (02)
  6. 1st Repetition (01)
  7. 12th Actor (12)
  8. Female, as the actor ID number is even.
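A filename can be decoded mechanically from the identifier tables above. A minimal sketch (the function name and output field names are mine; the lookup tables are transcribed from the list):

```python
# Lookup tables transcribed from the filename identifier list above.
MODALITIES = {"01": "full-AV", "02": "video-only", "03": "audio-only"}
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def parse_ravdess_filename(filename):
    """Split a RAVDESS filename into its seven identifier fields."""
    stem = filename.rsplit(".", 1)[0]
    modality, channel, emotion, intensity, statement, repetition, actor = stem.split("-")
    return {
        "modality": MODALITIES[modality],
        "vocal_channel": "speech" if channel == "01" else "song",
        "emotion": EMOTIONS[emotion],
        "intensity": "normal" if intensity == "01" else "strong",
        "statement": ("Kids are talking by the door" if statement == "01"
                      else "Dogs are sitting by the door"),
        "repetition": int(repetition),
        "actor": int(actor),
        # Odd-numbered actors are male, even-numbered are female.
        "actor_sex": "male" if int(actor) % 2 == 1 else "female",
    }

fields = parse_ravdess_filename("02-01-06-01-02-01-12.mp4")
print(fields["modality"], fields["emotion"], fields["actor_sex"])
```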

License information

The RAVDESS is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, CC BY-NC-SA 4.0

Commercial licenses for the RAVDESS can also be purchased. For more information, please visit our license fee page, or contact us at ravdess@gmail.com.

Related Data sets
