2017 IEEE International Conference on Computer Vision (ICCV), 2017
DOI: 10.1109/iccv.2017.425

Learning to Fuse 2D and 3D Image Cues for Monocular Body Pose Estimation

Abstract: Most recent approaches to monocular 3D human pose estimation rely on Deep Learning. They typically involve regressing from an image to either 3D joint coordinates directly or 2D joint locations from which 3D coordinates are inferred. Both approaches have their strengths and weaknesses and we therefore propose a novel architecture designed to deliver the best of both worlds by performing both simultaneously and fusing the information along the way. At the heart of our framework is a trainable fusion scheme that…

Cited by 226 publications (130 citation statements)
References: 69 publications
“…Recently, deep architectures have been used to learn 3D representations from RGB images [69,57,37,56,38,46] thanks to the availability of high precise 3D data [24], and are now able to surpass depth-sensors [39]. Chen and Ramanan [11] divided the problem of 3D pose estimation into two parts.…”
Section: Pose Estimation (mentioning)
confidence: 99%
“…Our work is also related to 3D pose estimation methods that try to recover 3D locations of joints from 2D images or directly from 3D point cloud and volumetric data (see also [30,50] for related surveys). Most recent methods use deep architectures to extract joints for humans [48,19,41,33,38,75,61,42], hands [15,37,20,63,15,64,14], and more recently some species of animals [43]. However, all these approaches aim to predict a pre-defined set of joints for a particular class of objects.…”
Section: Related Work (mentioning)
confidence: 99%
“…Monocular Image Based 3D Pose Estimation. Recently, several methods have been proposed to estimate the 3D pose on the monocular image [11,16,20,28,36,38,40,49].…”
Section: Related Work (mentioning)
confidence: 99%