Motion-X is a large-scale 3D expressive whole-body motion dataset comprising 15.6M precise 3D whole-body pose annotations (i.e., SMPL-X) covering 81.1K motion sequences from massive scenes, along with corresponding semantic labels and pose descriptions.
In this paper, we introduce Motion-X++, a large-scale multimodal 3D expressive whole-body human motion dataset. Existing motion datasets predominantly capture body-only poses, lacking facial expressions, hand gestures, and fine-grained pose descriptions, and are typically limited to lab settings with manually labeled text descriptions, which restricts their scalability. To address this issue, we develop a scalable annotation pipeline that automatically captures 3D whole-body human motion and comprehensive textual labels from RGB videos, and we use it to build the Motion-X dataset, comprising 81.1K text-motion pairs. Furthermore, we extend Motion-X into Motion-X++ by improving the annotation pipeline, introducing more data modalities, and scaling up the data quantities. Motion-X++ provides 19.5M 3D whole-body pose annotations covering 120.5K motion sequences from massive scenes, 80.8K RGB videos, 45.3K audios, 19.5M frame-level whole-body pose descriptions, and 120.5K sequence-level semantic labels. Comprehensive experiments validate the accuracy of our annotation pipeline and highlight Motion-X++'s significant benefits for generating expressive, precise, and natural motion with paired multimodal labels, supporting several downstream tasks, including text-driven whole-body motion generation, audio-driven motion generation, 3D whole-body human mesh recovery, and 2D whole-body keypoint estimation.
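To make the SMPL-X representation concrete, the sketch below shows how a per-frame whole-body pose vector decomposes into body, hand, jaw, and expression parameters. It is a minimal Python illustration: the file name, field names, and index ranges are assumptions based on the standard SMPL-X parameterization, not the dataset's documented layout, which should be checked against the Motion-X++ release.

import numpy as np

# Hypothetical layout of a flat per-frame SMPL-X parameter vector.
# Index ranges follow the standard SMPL-X parameterization (axis-angle
# rotations); the actual Motion-X++ storage format may differ.
SMPLX_LAYOUT = {
    "root_orient": slice(0, 3),      # global root rotation
    "pose_body":   slice(3, 66),     # 21 body joints x 3
    "pose_hand":   slice(66, 156),   # 2 hands x 15 joints x 3
    "pose_jaw":    slice(156, 159),  # jaw rotation
    "face_expr":   slice(159, 209),  # 50 expression coefficients
}

motion = np.load("motion.npy")  # assumed shape: (num_frames, feat_dim)
frame0 = {name: motion[0, idx] for name, idx in SMPLX_LAYOUT.items()}
print({name: part.shape for name, part in frame0.items()})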
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Despite the diversity in fish auditory structures, it remains elusive how otolith morphology and swim bladder-inner ear (= otophysic) connections affect otolith motion and inner ear stimulation. A recent study visualized sound-induced otolith motion, but tank acoustics produced a complex mixture of sound pressure and particle motion. To separate sound pressure from sound-induced particle motion, we constructed a transparent standing wave tube-like tank equipped with an inertial shaker at each end while using X-ray phase contrast imaging. Driving the shakers in phase maximised sound pressure at the tank centre, whereas particle motion was maximised when the shakers were driven out of phase (180°). We studied the effects of two types of otophysic connections—i.e. the Weberian apparatus (Carassius auratus) and anterior swim bladder extensions contacting the inner ears (Etroplus canarensis)—on otolith motion when fish were subjected to a 200 Hz stimulus. Saccular otolith motion was more pronounced when the swim bladder walls oscillated under the maximised sound pressure condition. The otolith motion patterns mainly matched the orientation patterns of ciliary bundles on the sensory epithelia. Our setup enabled the characterization of the interplay between the auditory structures and provided the first experimental evidence of how different types of otophysic connections affect otolith motion.
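The phase dependence described above follows from a simplified one-dimensional superposition argument (an illustrative sketch, not part of the dataset description): with the tube centre at x = 0 and two counter-propagating waves of amplitude A, wavenumber k, and angular frequency \omega,

    p_{\text{in}}(x,t) = 2A\cos(kx)\cos(\omega t), \qquad u_{\text{in}}(x,t) \propto \sin(kx)\sin(\omega t),
    p_{\text{out}}(x,t) = 2A\sin(kx)\sin(\omega t), \qquad u_{\text{out}}(x,t) \propto \cos(kx)\cos(\omega t).

In-phase driving thus places a pressure antinode and a particle-motion node at the centre, while 180° out-of-phase driving reverses the two, matching the stimulus conditions reported above.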
Dataset of high-resolution (4096×2160), high-fps (1000 fps) video frames with extreme motion. X-TEST consists of 15 video clips of 33 frames each at 4K and 1000 fps. X-TRAIN consists of 4,408 clips from 110 scenes of various types; each clip contains 65 frames at 1000 fps.
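Clips of this form are typically used to supervise video frame interpolation: two frames spaced several 1000 fps steps apart serve as inputs, and the frames between them as ground truth. The sketch below is a hypothetical illustration of enumerating such (left, target, right) index triplets from a fixed-length clip; the temporal factor and function name are assumptions, not part of the dataset.

# Hypothetical sketch: enumerate interpolation triplets from an
# N-frame, 1000 fps clip for x`factor` temporal upsampling.
def make_triplets(num_frames: int, factor: int = 8):
    """Yield (left, target, right) frame indices."""
    triplets = []
    for left in range(0, num_frames - factor, factor):
        right = left + factor
        for target in range(left + 1, right):
            triplets.append((left, target, right))
    return triplets

# A 33-frame X-TEST clip yields 4 input pairs spaced 8 frames apart,
# each with 7 intermediate ground-truth frames: 28 triplets in total.
print(len(make_triplets(33, factor=8)))  # 28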