AMASS: Archive of Motion Capture As Surface Shapes

Mahmood, Naureen; Ghorbani, Nima; Troje, Nikolaus F.; Pons-Moll, Gerard; Black, Michael J.

doi:10.1109/iccv.2019.00554

Cited by 881 publications (535 citation statements)

References 33 publications

Citation statement categories: supporting, mentioning (532), contrasting, unclassified


“…To do so, we make several significant improvements over SMPLify. Specifically, we learn a new, and better performing, pose prior from a large dataset of motion capture data [47,50] using a variational auto-encoder. This prior is critical because the mapping from 2D features to 3D pose is ambiguous.…”

Section: Introduction, classified as mentioning

confidence: 99%

Expressive Body Capture: 3D Hands, Face, and Body From a Single Image

Pavlakos, Choutas, Ghorbani, et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Self Cite


To facilitate the analysis of human actions, interactions and emotions, we compute a 3D model of human body pose, hand pose, and facial expression from a single monocular image. To achieve this, we use thousands of 3D scans to train a new, unified, 3D model of the human body, SMPL-X, that extends SMPL with fully articulated hands and an expressive face. Learning to regress the parameters of SMPL-X directly from images is challenging without paired images and 3D ground truth. Consequently, we follow the approach of SMPLify, which estimates 2D features and then optimizes model parameters to fit the features. We improve on SMPLify in several significant ways: (1) we detect 2D features corresponding to the face, hands, and feet and fit the full SMPL-X model to these; (2) we train a new neural network pose prior using a large MoCap dataset; (3) we define a new interpenetration penalty that is both fast and accurate; (4) we automatically detect gender and the appropriate body models (male, female, or neutral); (5) our PyTorch implementation achieves a speedup of more than 8× over Chumpy. We use the new method, SMPLify-X, to fit SMPL-X to both controlled images and images in the wild. We evaluate 3D accuracy on a new curated dataset comprising 100 images with pseudo ground-truth. This is a step towards automatic expressive human capture from monocular RGB data. The models, code, and data are available for research purposes at https://smpl-x.is.tue.mpg.de.
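The SMPLify-style approach summarized in the abstract above (detect 2D features, then optimize model parameters so the model fits them) can be illustrated with a toy sketch. Everything here is an assumption for illustration, not SMPLify-X's implementation: a rigid set of 3D joints stands in for the posed body model, only the body translation is optimized, the pinhole camera uses a hypothetical focal length of 1000 px, residuals go through a Geman-McClure robust penalty with a hypothetical σ of 100 px, and a derivative-free compass search replaces the gradient-based optimizer a real pipeline would use.

```python
import math


def project(point, focal, center):
    """Pinhole projection of a camera-space 3D point (x, y, z) with z > 0."""
    x, y, z = point
    return (focal * x / z + center[0], focal * y / z + center[1])


def geman_mcclure(residual_sq, sigma):
    """Geman-McClure robust penalty of a squared residual; saturates at sigma**2."""
    return sigma ** 2 * residual_sq / (sigma ** 2 + residual_sq)


def joint_energy(joints_3d, keypoints_2d, weights, translation,
                 focal=1000.0, center=(0.0, 0.0), sigma=100.0):
    """Confidence-weighted robust 2D reprojection error of translated 3D joints.

    focal, center and sigma are hypothetical values (pixels), not the paper's.
    """
    energy = 0.0
    for joint, (ku, kv), w in zip(joints_3d, keypoints_2d, weights):
        cam = tuple(j + t for j, t in zip(joint, translation))
        if cam[2] <= 1e-6:  # joint at or behind the camera: reject this translation
            return math.inf
        u, v = project(cam, focal, center)
        energy += w * geman_mcclure((u - ku) ** 2 + (v - kv) ** 2, sigma)
    return energy


def fit_translation(joints_3d, keypoints_2d, weights, init,
                    step=0.5, tol=1e-6, max_iters=100_000):
    """Derivative-free compass search over translation only.

    Hypothetical stand-in for a gradient-based optimizer; pose and shape are
    held fixed in this toy example.
    """
    t = list(init)
    best = joint_energy(joints_3d, keypoints_2d, weights, t)
    iters = 0
    while step > tol and iters < max_iters:
        iters += 1
        improved = False
        for axis in range(3):
            for sign in (1.0, -1.0):
                trial = list(t)
                trial[axis] += sign * step
                e = joint_energy(joints_3d, keypoints_2d, weights, trial)
                if e < best:
                    t, best, improved = trial, e, True
        if not improved:
            step *= 0.5  # no axis move helped: refine the step size
    return t, best
```

In use, one would generate or detect `keypoints_2d`, start from a rough translation guess, and call `fit_translation`; the robust penalty keeps a few badly detected keypoints from dominating the fit, which is the usual motivation for a saturating loss in this kind of objective.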


“…We employ our new quantitative dataset with mesh pseudo ground-truth based on Vicon and MoSh++ [41], as described in Section 4. The first row with only E_J is an RGB-only baseline similar to SMPLify-X [49], that we adapt to our needs by using a fixed camera and estimating body translation γ, and gives the biggest "PJE" and "V2V" error.…”

Section: (A), classified as mentioning

confidence: 99%

Resolving 3D Human Pose Ambiguities With 3D Scene Constraints

Hassan, Choutas, Tzionas, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Self Cite


Figure 1: Standard 3D body estimation methods predict bodies that may be inconsistent with the 3D scene even though the results may look reasonable from the camera viewpoint. To address this, we exploit the 3D scene structure and introduce scene constraints for contact and inter-penetration. From left to right: (1) RGB image (top) and 3D scene reconstruction (bottom), (2) overlay of estimated bodies on the original RGB image without (yellow) and with (gray) scene constraints, 3D rendering of both the body and the scene from (3) camera view, (4) top view and (5) side view.

Abstract: To understand and analyze human behavior, we need to capture humans moving in, and interacting with, the world. Most existing methods perform 3D human pose estimation without explicitly considering the scene. We observe however that the world constrains the body and vice-versa. To motivate this, we show that current 3D human pose estimation methods produce results that are not consistent with the 3D scene. Our key contribution is to exploit static 3D scene structure to better estimate human pose from monocular images. The method enforces Proximal Relationships with Object eXclusion and is called PROX. To test this, we collect a new dataset composed of 12 different 3D scenes and RGB sequences of 20 subjects moving in and interacting with the scenes. We represent human pose using the 3D human body model SMPL-X and extend SMPLify-X to estimate body pose using scene constraints. We make use of the 3D scene information by formulating two main constraints. The inter-penetration constraint penalizes intersection between the body model and the surrounding 3D scene. The contact constraint encourages specific parts of the body to be in contact with scene surfaces if they are close enough in distance and orientation. For quantitative evaluation we capture a separate dataset with 180 RGB frames in which the ground-truth body pose is estimated using a motion capture system. We show quantitatively that introducing scene constraints significantly reduces 3D joint error and vertex error. Our code and data are available for research at https://prox.is.tue.mpg.de.
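The PROX quote reports "PJE" and "V2V" errors, and its abstract reports reductions in 3D joint error and vertex error. A minimal sketch of both metrics follows, assuming each is the mean Euclidean distance between corresponding points: joints for the per-joint error, mesh vertices for the vertex-to-vertex error. The exact definitions, units, and any alignment step used in the paper are not stated here, so the function names and the optional root-centering helper are hypothetical. V2V also assumes the estimated mesh and the pseudo ground-truth mesh share the same vertex ordering.

```python
import math


def mean_point_error(pred, gt):
    """Mean Euclidean distance between corresponding 3D points.

    Assumed reading: on joints this is a per-joint error ("PJE"); on mesh
    vertices with shared topology it is a vertex-to-vertex error ("V2V").
    Units follow the inputs (e.g. metres; multiply by 1000 for millimetres).
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, q) for p, q in zip(pred, gt)) / len(pred)


def center_on(points, origin):
    """Translate points so that `origin` maps to (0, 0, 0).

    Hypothetical helper for benchmarks that align e.g. the root joint first;
    whether PROX's evaluation does this is not stated in the quote.
    """
    return [tuple(a - b for a, b in zip(p, origin)) for p in points]
```

Because both metrics reduce to the same computation, one function serves for joints and vertices; only the point sets passed in differ.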


“…To find such poses, we use 3D MoCap datasets [43,44,45] that capture 3D MoCap marker positions, glued onto the skin surface of real human subjects. We then employ MoSh [16,17] that fits our body model to these 3D markers by optimizing over parameters of the body model for articulated pose, translation and shape. The pose specifically is a vector of axis-angle parameters, that describes how to rotate each body part around its corresponding skeleton joint.…”

Section: Human Body Generation, classified as mentioning

confidence: 99%
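The quote above describes the pose as a vector of axis-angle parameters, one rotation per skeleton joint. A minimal sketch of how such a vector can be turned into per-joint rotation matrices with the standard Rodrigues formula is below; the helper names are hypothetical, and composing these local rotations along the kinematic tree (forward kinematics) is not shown.

```python
import math


def axis_angle_to_matrix(rotvec):
    """Rodrigues' formula: axis-angle vector (unit axis * angle in radians) -> 3x3 rotation."""
    rx, ry, rz = rotvec
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:  # (near-)zero rotation: return the identity
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = rx / theta, ry / theta, rz / theta  # unit rotation axis
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    # R = cos(theta) I + sin(theta) [k]_x + (1 - cos(theta)) k k^T
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


def pose_to_matrices(pose):
    """Split a flat pose vector (3 axis-angle values per joint) into per-joint rotations."""
    if len(pose) % 3 != 0:
        raise ValueError("pose length must be a multiple of 3")
    return [axis_angle_to_matrix(pose[i:i + 3]) for i in range(0, len(pose), 3)]


def rotate(matrix, point):
    """Apply a 3x3 rotation matrix to a 3D point."""
    return tuple(sum(matrix[i][j] * point[j] for j in range(3)) for i in range(3))
```

The direction of each 3-vector gives the joint's rotation axis and its length gives the angle, so a compact, unconstrained vector encodes every joint rotation; this is one reason axis-angle is convenient as an optimization variable in model fitting.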

“…We then place humans on random indoor backgrounds and simulate human activities like running, walking, dancing etc. using motion capture data [16,17]. Thus, we create a large virtual dataset that captures the statistics of natural human motion in multi-person scenarios.…”

Section: Introduction, classified as mentioning

confidence: 99%

Learning Multi-human Optical Flow

et al. 2020


The optical flow of humans is well known to be useful for the analysis of human action. Recent optical flow methods focus on training deep networks to approach the problem. However, the training data used by them does not cover the domain of human motion. Therefore, we develop a dataset of multi-human optical flow and train optical flow networks on this dataset. We use a 3D model of the human body and motion capture data to synthesize realistic flow fields in both single- and multi-person images. We then train optical flow networks to estimate human flow fields from pairs of images. We demonstrate that our trained networks are more accurate than a wide range of top methods on held-out test data and that they can generalize well to real image sequences. The code, trained models and the dataset are available for research.
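The abstract above synthesizes flow fields from a 3D body model driven by motion capture data. The core geometric idea can be sketched as projecting each mesh vertex at two consecutive frames and taking the 2D displacement. This is a simplified assumption, not the paper's pipeline: it computes forward flow per vertex only, with no rasterization, occlusion, or visibility handling, and the camera intrinsics are hypothetical.

```python
def project(point, focal, center):
    """Pinhole projection of a camera-space 3D point (x, y, z) with z > 0."""
    x, y, z = point
    return (focal * x / z + center[0], focal * y / z + center[1])


def vertex_flow(verts_t0, verts_t1, focal=500.0, center=(320.0, 240.0)):
    """Forward 2D flow (du, dv) of each mesh vertex between two frames.

    verts_t0 / verts_t1: corresponding camera-space vertices at frames t and t+1
    (e.g. the same body mesh posed with consecutive mocap frames).
    focal and center are hypothetical intrinsics in pixels.
    """
    if len(verts_t0) != len(verts_t1):
        raise ValueError("both frames must have the same vertices")
    flows = []
    for p0, p1 in zip(verts_t0, verts_t1):
        u0, v0 = project(p0, focal, center)
        u1, v1 = project(p1, focal, center)
        flows.append((u1 - u0, v1 - v0))
    return flows
```

A full renderer would rasterize the posed mesh, keep only the front-most surface per pixel, and interpolate these per-vertex displacements into a dense flow field; the per-vertex version shows only where the ground-truth motion comes from.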
