Expressive Body Capture: 3D Hands, Face, and Body From a Single Image

Pavlakos, Georgios; Choutas, Vasileios; Ghorbani, Nima; Bolkart, Timo; Osman, Ahmed A. A.; Tzionas, Dimitrios; Black, Michael J.

doi:10.1109/cvpr.2019.01123

Cited by 1,283 publications

(1,076 citation statements)

References 65 publications

Citation statement types: Supporting, Mentioning (1,070), Contrasting, Unclassified


“…3D point clouds or meshes (Figure 1c), have become a popular and elegant way to capture the soft-tissue and shape of bodies, which are highly important features for person identification, fashion (i.e. clothing sales), and in medicine (21,42,43). However, state-of-the-art performance currently requires body-scanning of many subjects to make body models.…”

Section: Dense-representations Of Bodies (mentioning)

confidence: 99%

Deep learning tools for the measurement of animal behavior in neuroscience

Mathis

2020

Current Opinion in Neurobiology

340

237


Recent advances in computer vision have made accurate, fast and robust measurement of animal behavior a reality. In the past years powerful tools specifically designed to aid the measurement of behavior have come to fruition. Here we discuss how capturing the postures of animals (pose estimation) has been rapidly advancing with new deep learning methods. While challenges still remain, we envision that the fast-paced development of new deep learning tools will rapidly change the landscape of realizable real-world neuroscience.

Highlights:
1. Deep neural networks are shattering performance benchmarks in computer vision for various tasks.
2. Using modern deep learning approaches (DNNs) in the lab is a fruitful approach for robust, fast, and efficient measurement of animal behavior.
3. New DNN-based tools allow for customized tracking approaches, which opens new avenues for more flexible and ethologically relevant real-world neuroscience.


“…Recent methods based on deep learning, extend 3D human pose estimation to complex scenes [32,42,48,50] but the 3D accuracy is limited. To estimate human-scene interaction, however, more realistic body models are needed that include fully articulated hands such as in [31,49].…”

Section: Related Work (mentioning)

confidence: 99%

“…Thus, we penalize poses in which the body interpenetrates scene objects. We formulate this "exclusion principle" as a differentiable loss function that we incorporate into the SMPLify-X pose estimation method [49].…”

Section: Introduction (mentioning)

confidence: 99%

“…Our method extends SMPLify-X [49], which fits a 3D body model "top down" to "bottom up" features (e.g. 2D joint detections).…”

Section: Introduction (mentioning)

confidence: 99%

“…The former is easy to obtain today with many scanning technologies but, if the body model is not accurate, it does not make sense to reason about contact and inter-penetration. Consequently we use the SMPL-X body model [49], which is realistic enough to serve as a "proxy" for the real human in the 3D scene. In particular, the feet, hands, and body of the model have realistic shape and degrees of freedom.…”

Section: Introduction (mentioning)

confidence: 99%


Resolving 3D Human Pose Ambiguities With 3D Scene Constraints

Hassan

Choutas

Tzionas

et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Self Cite

251

299


Figure 1: Standard 3D body estimation methods predict bodies that may be inconsistent with the 3D scene even though the results may look reasonable from the camera viewpoint. To address this, we exploit the 3D scene structure and introduce scene constraints for contact and inter-penetration. From left to right: (1) RGB image (top) and 3D scene reconstruction (bottom), (2) overlay of estimated bodies on the original RGB image without (yellow) and with (gray) scene constraints, 3D rendering of both the body and the scene from (3) camera view, (4) top view and (5) side view.

Abstract: To understand and analyze human behavior, we need to capture humans moving in, and interacting with, the world. Most existing methods perform 3D human pose estimation without explicitly considering the scene. We observe however that the world constrains the body and vice-versa. To motivate this, we show that current 3D human pose estimation methods produce results that are not consistent with the 3D scene. Our key contribution is to exploit static 3D scene structure to better estimate human pose from monocular images. The method enforces Proximal Relationships with Object eXclusion and is called PROX. To test this, we collect a new dataset composed of 12 different 3D scenes and RGB sequences of 20 subjects moving in and interacting with the scenes. We represent human pose using the 3D human body model SMPL-X and extend SMPLify-X to estimate body pose using scene constraints. We make use of the 3D scene information by formulating two main constraints. The inter-penetration constraint penalizes intersection between the body model and the surrounding 3D scene. The contact constraint encourages specific parts of the body to be in contact with scene surfaces if they are close enough in distance and orientation. For quantitative evaluation we capture a separate dataset with 180 RGB frames in which the ground-truth body pose is estimated using a motion capture system. We show quantitatively that introducing scene constraints significantly reduces 3D joint error and vertex error. Our code and data are available for research at https://prox.is.tue.mpg.de.
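
As a rough illustration of the two scene constraints described in this abstract, the following Python sketch scores a set of body vertices against a scene given as a signed distance function. It is not taken from the PROX code: the function names, the flat-floor scene at z = 0, the 0.05 distance threshold, and the squared-depth form of the penalty are all assumptions made for the example, and the orientation test mentioned in the abstract is left out.

```python
from typing import Callable, Iterable, Tuple

Point = Tuple[float, float, float]
SignedDistance = Callable[[Point], float]


def floor_sdf(p: Point) -> float:
    """Hypothetical scene: a flat floor at z = 0 (negative means below the floor)."""
    return p[2]


def penetration_penalty(vertices: Iterable[Point], sdf: SignedDistance) -> float:
    """Sum of squared penetration depths over vertices that lie inside the scene."""
    total = 0.0
    for v in vertices:
        d = sdf(v)
        if d < 0.0:  # only vertices inside scene geometry are penalized
            total += d * d
    return total


def contact_penalty(
    contact_vertices: Iterable[Point],
    sdf: SignedDistance,
    max_dist: float = 0.05,  # assumed threshold, in the same units as the scene
) -> float:
    """Sum of distances for contact-candidate vertices already close to a surface.

    Vertices farther than max_dist, or inside the scene, contribute nothing here.
    """
    total = 0.0
    for v in contact_vertices:
        d = sdf(v)
        if 0.0 <= d <= max_dist:
            total += d
    return total
```

In an optimization-based fitting method, terms of this kind would be added with weights to the existing fitting objective, so that minimizing the total pushes penetrating vertices out of the scene and pulls nearby contact vertices onto it.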


Single‐image human mesh reconstruction by parallel spatial feature aggregation

Liu

2022

Computer Animation & Virtual


Recovering a human mesh from a single image with natural postures is a challenging task in human modeling and animation. Model-free methods regress the mesh vertices from the input image directly to avoid the 6-DoF human joint extraction from the 2D image. However, the lack of global information in the spatial feature aggregation of existing GNNs may result in undesired deformity and inaccuracy in the recovered human mesh. To address this issue, we propose a parallel-aggregating network with a newly designed global layer that extracts spatial features from a random-walk normalized matrix. Moreover, the coarse body mesh (head, hand, foot, etc.) provided by the coarsening network adds human characteristics to the mesh. The local and global spatial features are aggregated to update vertex coordinates following an iterative, coarse-to-fine process to obtain an accurate and smooth human mesh. Experiments validated the effectiveness and robustness of the proposed approaches for single-image human mesh recovery.

