First-person pose recognition using egocentric workspaces

Rogez, Grégory; Supancic, James Steven; Ramanan, Deva

doi:10.1109/cvpr.2015.7299061

Cited by 93 publications

(80 citation statements)

References 34 publications


“…Hand-object reconstruction. Joint reconstruction of hands and objects has been studied with multi-view RGB [2,42,74] and RGB-D input with either optimization [16,17,43,47,62,69,70,71] or classification [51,52,53,54] approaches. These works use rigid objects, except for a few that use articulated [70] or deformable objects [69].…”

Section: Related Work (mentioning)

confidence: 99%

Learning Joint Reconstruction of Hands and Manipulated Objects

Hasson, Varol, Tzionas, et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)


Estimating hand-object manipulations is essential for interpreting and imitating human actions. Previous work has made significant progress towards reconstruction of hand poses and object shapes in isolation. Yet, reconstructing hands and objects during manipulation is a more challenging task due to significant occlusions of both the hand and object. While presenting challenges, manipulations may also simplify the problem since the physics of contact restricts the space of valid hand-object configurations. For example, during manipulation, the hand and object should be in contact but not interpenetrate. In this work, we regularize the joint reconstruction of hands and objects with manipulation constraints. We present an end-to-end learnable model that exploits a novel contact loss that favors physically plausible hand-object constellations. Our approach improves grasp quality metrics over baselines, using RGB images as input. To train and evaluate the model, we also propose a new large-scale synthetic dataset, ObMan, with hand-object manipulations. We demonstrate the transferability of ObMan-trained models to real data.


“…Capturing full 3D body motion from head-mounted cameras is considerably more challenging. Some head-mounted capture systems are based on RGB-D input and reconstruct mostly hand, arm and torso motions [40,57]. Jiang and Grauman [20] reconstruct full body pose from footage taken from a camera worn on the chest by estimating egomotion from the observed scene, but their estimates lack accuracy and have high uncertainty.…”

Section: Related Work (mentioning)

confidence: 99%

xR-EgoPose: Egocentric 3D Human Pose From an HMD Camera

Tomè, Peluse, Agapito, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)


Figure 1: Left: Our xR-EgoPose Dataset setup: (a) external camera viewpoint showing a synthetic character wearing the headset; (b) example of photorealistic image rendered from the egocentric camera perspective; (c) 2D and (d) 3D poses estimated with our algorithm. Right: results on real images; (e) real image acquired with our HMD-mounted camera with predicted 2D heatmaps; (f) estimated 3D pose, showing good generalization to real images.

Abstract: We present a new solution to egocentric 3D body pose estimation from monocular images captured from a downward looking fish-eye camera installed on the rim of a head mounted virtual reality device. This unusual viewpoint, just 2 cm away from the user's face, leads to images with unique visual appearance, characterized by severe self-occlusions and strong perspective distortions that result in a drastic difference in resolution between lower and upper body. Our contribution is two-fold. Firstly, we propose a new encoder-decoder architecture with a novel dual branch decoder designed specifically to account for the varying uncertainty in the 2D joint locations. Our quantitative evaluation, both on synthetic and real-world datasets, shows that our strategy leads to substantial improvements in accuracy over state of the art egocentric pose estimation approaches. Our second contribution is a new large-scale photorealistic synthetic dataset, xR-EgoPose, offering 383K frames of high quality renderings of people with a diversity of skin tones, body shapes, clothing, in a variety of backgrounds and lighting conditions, performing a range of actions. Our experiments show that the high variability in our new synthetic training corpus leads to good generalization to real world footage and to state of the art results on real world datasets with ground truth. Moreover, an evaluation on the Human3.6M benchmark shows that the performance of our method is on par with top performing approaches on the more classic problem of 3D human pose from a third person viewpoint.


“…An important difficulty in hand pose estimation lies in object occlusions and self-occlusions that make it hard to localize hidden joints/parts of the hand. Some authors proposed the use of 3D cameras or depth sensors in conjunction with sensor-based techniques to train hand pose estimators more robust to self-occlusions [17], [97], [79], [69], [98]. However, as discussed above, the use of 3D imaging techniques might not be easily translated to FPV.…”

Section: Hand Pose Estimation and Fingertip Detection (mentioning)

confidence: 99%

“…One of the advantages of using depth images for extracting the hand pose is the possibility to synthesize large training sets of realistic depth maps by using computer graphics [17], [97]. In [17], the authors tackled hand pose estimation as a multiclass classification problem by using a hierarchical cascade architecture.…”

Section: Hand Pose Estimation Using 3D/Depth Sensors (mentioning)

confidence: 99%

“…Compared to other localization tasks, hand pose estimation presents a higher proportion of approaches that use 3D and depth sensors. This choice has several advantages: 1) the possibility to use motion capture methods for automatically obtaining the ground truth joint positions [69], [102]; 2) the availability of multiple streams (color and depth) that can be combined to refine the estimations [79], [106]; and 3) the possibility to synthesize large datasets of realistic depth maps [17], [97]. In the past few years, human pose estimation approaches [103], [104] have been successfully adapted to the egocentric POV, in order to estimate the hand and arm pose from monocular color images [35], [99].…”

Section: Remarks On Hand Pose Estimation (mentioning)

confidence: 99%


Analysis of the Hands in Egocentric Vision: A Survey

Bandini, Zariffa 2023

IEEE Trans. Pattern Anal. Mach. Intell.


Egocentric vision (a.k.a. first-person vision, FPV) applications have thrived over the past few years, thanks to the availability of affordable wearable cameras and large annotated datasets. The position of the wearable camera (usually mounted on the head) allows recording exactly what the camera wearers have in front of them, in particular hands and manipulated objects. This intrinsic advantage enables the study of the hands from multiple perspectives: localizing hands and their parts within the images; understanding what actions and activities the hands are involved in; and developing human-computer interfaces that rely on hand gestures. In this survey, we review the literature that focuses on the hands using egocentric vision, categorizing the existing approaches into: localization (where are the hands or part of them?); interpretation (what are the hands doing?); and application (e.g., systems that used egocentric hand cues for solving a specific problem). Moreover, a list of the most prominent datasets with hand-based annotations is provided.

