https://sail.usc.edu/iemocap/Data_Release_Form_IEMOCAP.pdf
The Interactive Emotional Dyadic Motion Capture (IEMOCAP) database is an acted, multimodal and multispeaker database collected at the SAIL lab at USC. It contains approximately 12 hours of audiovisual data, including video, speech, facial motion capture and text transcriptions. It consists of dyadic sessions in which actors perform improvisations or scripted scenarios specifically selected to elicit emotional expressions. The IEMOCAP database is annotated by multiple annotators with categorical labels, such as anger, happiness, sadness and neutrality, as well as dimensional labels such as valence, activation and dominance. The detailed motion capture information, the interactive setting used to elicit authentic emotions, and the size of the database make this corpus a valuable addition to the existing databases in the community for the study and modeling of multimodal and expressive human communication.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Face-to-face conversations are central to human communication and a fascinating example of joint action. Beyond verbal content, one of the primary ways in which information is conveyed in conversations is body language. Body motion in natural conversations has been difficult to study precisely due to the large number of coordinates at play. There is a need for fresh approaches to analyze and understand the data, in order to ask whether dyads show basic building blocks of coupled motion. Here we present a method for analyzing body motion during joint action using depth-sensing cameras, and use it to analyze a sample of scientific conversations. Our method consists of three steps: defining modes of body motion of individual participants, defining dyadic modes made of combinations of these individual modes, and lastly defining motion motifs as dyadic modes that occur significantly more often than expected given the single-person motion statistics. As a proof-of-concept, we analyze the motion of 12 dyads of scientists measured using two Microsoft Kinect cameras. In our sample, we find that out of many possible modes, only two were motion motifs: synchronized parallel torso motion in which the participants swayed from side to side in sync, and still segments where neither person moved. We find evidence of dyad individuality in the use of motion modes. For a randomly selected subset of 5 dyads, this individuality was maintained for at least 6 months. The present approach to simplify complex motion data and to define motion motifs may be used to understand other joint tasks and interactions. The analysis tools developed here and the motion dataset are publicly available.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Parents can effortlessly assist their child to walk, but the mechanism behind such physical coordination is still unknown. Studies have suggested that physical coordination is achieved by interacting humans who update their movement or motion plan in response to the partner’s behaviour. Here, we tested rigidly coupled pairs in a joint reaching task to observe such changes in the partners’ motion plans. However, the joint reaching movements were surprisingly consistent across different trials. A computational model that we developed demonstrated that each of the two partners had a distinct motion plan, which did not change with time. These results suggest that rigidly coupled pairs accomplish joint reaching movements by relying on a pre-programmed motion plan that is independent of the partner’s behaviour.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this paper, we introduce a novel method for classifying and computing the frequencies of movement modes of intra- and interspecific dyads, focusing in particular on distance-mediated approach, retreat, following and side by side movement modes. Besides distance, other factors such as time of day, season, sex, or age can be included in the analysis to assess if they cause frequencies of movement modes to deviate from random. By subdividing the data according to selected factors, our method allows us to identify those responsible for (or correlated with) significant differences in the behaviour of dyadic pairs. We demonstrate and validate our method using both simulated and empirical data. Our simulated data were obtained from a relative-motion, biased random-walk (RM-BRW) model with attraction and repulsion components. Our empirical data were GPS relocation data collected from African elephants in Etosha National Park, Namibia. The simulated data were primarily used to validate our method while the empirical data were used to illustrate the types of behavioural assessment that our methodology reveals. Our method facilitates automated, observer-bias-free analysis of the locomotive interactions of dyads using GPS relocation data, which are becoming increasingly ubiquitous as telemetry and related technologies improve. It should open up a whole new vista of behavioural-interaction type analyses to movement and behavioural ecologists.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The current project holds raw and processed data from 12 couples performing multi-person and single-person sports-related movements. The movements were captured with 8 Sony cameras used with Theia3D (a markerless motion capture system) and 16 Vicon cameras (a marker-based motion capture system). The dataset is described in more detail in the article "Examining the Concurrent Validity of Markerless Motion Capture in Dual-Athlete Team Sports Movements" by Oonk et al. (2025).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dyadic output for the “Blau” object generated from BSANet.rda.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
VAD Scoring: Voice Activity Detection and emotion dimensionality (Valence, Arousal, Dominance) computed using the audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim model.
Metadata: Includes audio duration, transcript length, character counts, and VAD scores
An example of an entry in the metadata file is as follows:

{
  "file_id": "emovdb_amused_1-15_0001_1933",
  "original_path": "..\data_collection\tts_data\processed\emovdb\amused_1-15_0001.wav",
  "dataset": "emovdb",
  "status": "success",
  "error": null,
  "processed_audio_path": "None",
  "transcript_path": "processed_datasets\transcripts\emovdb_amused_1-15_0001_1933.json",
  "vad_path": "processed_datasets\vad_scores\emovdb_amused_1-15_0001_1933.json",
  "text": "Author of the Danger Trail, Phillips Deals, etc.",
  "language": "en",
  "audio_duration": 4.384671201814059,
  "text_length": 48,
  "valence": 0.7305971384048462,
  "arousal": 0.704948365688324,
  "dominance": 0.6887099146842957,
  "vad_confidence": 0.9830486676764241
}
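As a small illustration, the sketch below loads such metadata and filters entries by their valence score. It assumes the metadata is stored as a JSON array of records like the one above in a file named metadata.json; the file name and the threshold are illustrative assumptions, not part of the release.

    import json

    # Assumed file name; the actual metadata file in this release may be named differently.
    with open("metadata.json", encoding="utf-8") as f:
        records = json.load(f)  # assumed: a JSON array of entries like the example above

    # Keep successfully processed clips with clearly positive valence.
    positive = [
        r for r in records
        if r.get("status") == "success" and r.get("valence", 0.0) > 0.6
    ]

    for r in positive[:5]:
        print(r["file_id"], round(r["valence"], 3), round(r["arousal"], 3), round(r["dominance"], 3))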
This work is licensed under CC BY-NC-SA 4.0.
Required Citations: - CREMA-D: Cao, H., Cooper, D. G., Keutmann, M. K., Gur, R. C., Nenkova, A., & Verma, R. (2014). CREMA-D: Crowd-sourced Emotional Multimodal Actors Dataset. IEEE Transactions on Affective Computing, 5(4), 377-390.
EmoV-DB: Adigwe, A., Tits, N., Haddad, K. E., Ostadabbas, S., & Dutoit, T. (2018). The emotional voices database: Towards controlling the emotion dimension in voice generation systems. arXiv preprint arXiv:1806.09514.
IEMOCAP: Busso, C., Bulut, M., Lee, C. C., Kazemzadeh, A., Mower, E., Kim, S., Chang, J. N., Lee, S., & Narayanan, S. S. (2008). IEMOCAP: Interactive emotional dyadic motion capture database. Language Resources and Evaluation, 42(4), 335-359.
RAVDESS: Livingstone, S. R., & Russo, F. A. (2018). The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE, 13(5), e0196391.
This Zenodo repository contains the main dataset for the GENEA Challenge 2023, which is based on the Talking With Hands 16.2M dataset.
Notation:
Please take note of the following nomenclature when reading this document:
main agent refers to the speaker in the dyadic interaction for which the systems generated motions.
interlocutor refers to the speaker in front of the main agent.
Contents:
The "genea2023_trn" and "genea2023_val" zip files contain audio files (in WAV format), time-aligned transcriptions (in TSV format), and motion files (in BVH format) for the training and validation datasets, respectively.
The "genea2023_test" zip file contains audio files (in WAV format) and transcriptions (in TSV format) for the test set, but no motion. The corresponding test motion is available at:
https://zenodo.org/record/8146027
Each zip file also contains a "metadata.csv" file that contains information for all files regarding the speaker ID and whether or not the motion files contain finger motion.
Note that the speech audio in the data sometimes has been replaced by silence for the purpose of anonymisation.
In the test set, files with indices from 0 to 40 correspond to "matched" interactions (the core test set), where main agent and interlocutor data come from the same conversation, whilst file indices from 41 to 69 correspond to "mismatched" interactions (the extended test set), where main agent and interlocutor data come from different conversations.
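For convenience, here is a minimal sketch of this split as a helper function; the function itself is ours and not part of the challenge tooling.

    def test_subset(index: int) -> str:
        """Classify a test-set file index as described above (helper name is ours)."""
        if 0 <= index <= 40:
            return "matched"      # core test set
        if 41 <= index <= 69:
            return "mismatched"   # extended test set
        raise ValueError(f"unexpected test-set file index: {index}")

    print(test_subset(7))   # matched
    print(test_subset(55))  # mismatched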
Folder structure:
main-agent/ (main agent): Encapsulates BVH, TSV, WAV data subfolders for the main agent.
interloctr/ (interlocutor): Encapsulates BVH, TSV, WAV data subfolders for the interlocutor.
bvh/ (motion): Time-aligned 3D full-body motion-capture data in BVH format from a speaking and gesticulating actor. Each file contains the motion of a single person, but each data sample includes files for both the main agent and the interlocutor.
wav/ (audio): Recorded audio data in WAV format from a speaking and gesticulating actor with a close-talking microphone. Parts of the audio recordings have been muted to omit personally identifiable information.
tsv/ (text): Word-level time-aligned text transcriptions of the above audio recordings in TSV format (tab-separated values). For privacy reasons, the transcriptions do not include references to personally identifiable information, similar to the audio files.
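As a small illustration of working with the transcriptions, here is a minimal parsing sketch. It assumes each TSV row contains a start time, an end time and a word; this column layout is an assumption rather than a documented specification.

    import csv

    def read_transcript(tsv_path):
        """Read a word-level, time-aligned transcript from a TSV file.

        Assumes each row holds a start time, an end time and a word token,
        separated by tabs; the exact column layout is an assumption."""
        words = []
        with open(tsv_path, encoding="utf-8", newline="") as f:
            for row in csv.reader(f, delimiter="\t"):
                if len(row) < 3:
                    continue  # skip empty or malformed lines
                words.append((float(row[0]), float(row[1]), row[2]))
        return words

    # Hypothetical usage (the file name below is illustrative only):
    # for start, end, word in read_transcript("main-agent/tsv/example_main-agent.tsv"):
    #     print(f"{start:7.2f} {end:7.2f}  {word}")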
Data processing scripts:
We provide a number of optional scripts for encoding and processing the challenge data:
Audio: Scripts for extracting basic audio features, such as spectrograms, prosodic features, and mel-frequency cepstral coefficients (MFCCs), can be found at this link (a rough librosa-based sketch follows this list).
Text: A script to encode text transcriptions to word vectors using FastText is available here: tsv2wordvectors.py
Motion: If you wish to encode the joint angles from the BVH files to and from an exponential map representation, you can use scripts by Simon Alexanderson based on the PyMo library, which are available here:
bvh2features.py
features2bvh.py
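The linked scripts are the reference implementations for the challenge. As a rough, unofficial stand-in for the audio step, the following sketch computes MFCCs with librosa; its output will not necessarily match the official features.

    import librosa

    def extract_mfcc(wav_path, n_mfcc=13):
        """Compute MFCCs from one of the challenge WAV files.

        Independent sketch using librosa, not the challenge's own scripts;
        parameter choices are illustrative."""
        audio, sr = librosa.load(wav_path, sr=None)  # keep the native sample rate
        return librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, n_frames)

    # mfcc = extract_mfcc("main-agent/wav/example_main-agent.wav")  # hypothetical path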
Attribution:
If you use this material, please cite our latest paper on the GENEA Challenge 2023. At the time of writing (2023-07-25) this is our ACM ICMI 2023 paper:
Taras Kucherenko, Rajmund Nagy, Youngwoo Yoon, Jieyeon Woo, Teodor Nikolov, Mihail Tsakov, and Gustav Eje Henter. 2023. The GENEA Challenge 2023: A large-scale evaluation of gesture generation models in monadic and dyadic settings. In Proceedings of the ACM International Conference on Multimodal Interaction (ICMI ’23). ACM.
Also, please cite the paper about the original dataset from Meta Research:
Gilwoo Lee, Zhiwei Deng, Shugao Ma, Takaaki Shiratori, Siddhartha S. Srinivasa, and Yaser Sheikh. 2019. Talking With Hands 16.2M: A large-scale dataset of synchronized body-finger motion and audio for conversational motion analysis and synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV ’19). IEEE, 763–772.
The motion and audio files are based on the Talking With Hands 16.2M dataset at https://github.com/facebookresearch/TalkingWithHands32M/. The material is available under a CC BY-NC 4.0 (Attribution-NonCommercial 4.0 International) license, with the text provided in LICENSE.txt.
To find more GENEA Challenge 2023 material on the web, please see:
https://genea-workshop.github.io/2023/challenge/
If you have any questions or comments, please contact:
The GENEA Challenge organisers
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This Zenodo repository contains 3D motion in the Biovision Hierarchy (BVH) format for all test-set motion submitted by teams participating in the GENEA Challenge 2023.
Contents:
The "genea_2023_test_bvh" zip file corresponds to the BVH files themselves (2GB).
The repository also contains the time stamps for cutting out the evaluation segments we used (matched and mismatched) for the two kinds of studies we conducted (monadic and dyadic) in the files "monadic_segment_selection_info.csv" and "dyadic_segment_selection_info.csv".
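As a sketch of how such time stamps can be turned into frame ranges for cutting motion segments, assuming the times are given in seconds and the motion frame rate is known (the 30 fps value and the CSV column names below are assumptions, not taken from this repository):

    import csv

    FPS = 30  # assumed motion frame rate; check the BVH headers for the actual value

    def segment_frames(start_s: float, end_s: float, fps: int = FPS):
        """Convert a segment given in seconds into a (first_frame, end_frame) pair."""
        return int(round(start_s * fps)), int(round(end_s * fps))

    # Hypothetical usage; the column names below are assumptions about the CSV layout:
    # with open("dyadic_segment_selection_info.csv", newline="") as f:
    #     for row in csv.DictReader(f):
    #         first, last = segment_frames(float(row["start_time"]), float(row["end_time"]))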
Attribution:
If you use this material, please cite our latest paper on the GENEA Challenge 2023. At the time of writing (2023-07-11) this is our ACM ICMI 2023 paper:
Taras Kucherenko, Rajmund Nagy, Youngwoo Yoon, Jieyeon Woo, Teodor Nikolov, Mihail Tsakov, and Gustav Eje Henter. 2023. The GENEA Challenge 2023: A large-scale evaluation of gesture generation models in monadic and dyadic settings. In Proceedings of the ACM International Conference on Multimodal Interaction (ICMI ’23). ACM.
Condition NA in the data contains motion from the Talking With Hands 16.2M dataset at https://github.com/facebookresearch/TalkingWithHands32M/. These files are licensed under a CC BY-NC 4.0 International license. The remaining material is available under a CC BY 4.0 International license, with the text provided in LICENSE.txt.
To find more GENEA Challenge 2023 material on the web, please see:
* https://genea-workshop.github.io/2023/challenge/
If you have any questions or comments, please contact:
* The GENEA Challenge organisers
The GENEA Challenge 2020 dataset was created for a large-scale open challenge on data-driven automatic co-speech gesture generation. It consists of 50 hours of full-body motion capture, including fingers, of different persons engaging in dyadic conversations.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Coaching is increasingly viewed as a dyadic exchange of verbal and non-verbal interactions driving clients’ progress. Yet, little is known about how the trajectory of dyadic interactions plays out in workplace coaching.
Method: This paper provides a multiple-step exploratory investigation of movement synchrony (MS) of dyads in workplace coaching. We analyzed a publicly available dataset of 173 video-taped dyads. Specifically, we averaged MS per session/dyad to explore the temporal patterns of MS across (a) the cluster of dyads that completed 10 sessions, and (b) a set of 173 dyadic interactions with a varied number of sessions. Additionally, we linked that pattern to several demographic predictors. The results indicate a differential downward trend of MS.
Results: Demographic factors do not predict the best-fitting MS curve types, and only client age and coach experience show a small but significant correlation.
Discussion: We provide contextualized interpretations of these findings and propose conceptual considerations and recommendations for future coaching process research and practice.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study addressed the question of whether or not social collaboration has an effect on delay discounting, the tendency to prefer sooner but smaller over later but larger rewards. We applied a novel paradigm in which participants executed choices between two gains in an individual and in a dyadic decision-making condition. We observed how participants reached mutual consent via joystick movement coordination and found lower discounting and higher decision efficiency in the dyadic condition. In order to establish the underlying mechanism for this dyadic variation, we further tested whether these differences emerge from social facilitation or from inner-group interchange.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ICC (intraclass correlation coefficient), SEM (standard error of measurement) and MDC (minimal detectable change) for the two free-hand polygon image analyses undertaken using the ThermaCam Researcher Professional 2.10 software.