2016
DOI: 10.1007/978-3-319-46475-6_16

Human Pose Estimation Using Deep Consensus Voting

Abstract: In this paper we consider the problem of human pose estimation from a single still image. We propose a novel approach in which each location in the image votes for the position of each keypoint using a convolutional neural net. The voting scheme allows us to utilize information from the whole image, rather than rely on a sparse set of keypoint locations. Using dense, multi-target votes not only produces good keypoint predictions, but also enables us to compute image-dependent joint keypoint probabilities by look…
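
The voting idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each image location casts a single weighted vote, expressed as an offset to where it believes the keypoint lies, and that votes are simply summed into a heatmap. The function name `aggregate_votes` and the vote format are hypothetical, chosen only for illustration; the truncated abstract does not specify how the network's votes are represented or combined.

```python
def aggregate_votes(votes, height, width):
    """Sum dense votes into a keypoint heatmap (illustrative only).

    votes: iterable of ((y, x), (dy, dx), weight), meaning the location
           (y, x) votes with the given weight for a keypoint at
           (y + dy, x + dx). This vote format is an assumption.
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for (y, x), (dy, dx), weight in votes:
        ty, tx = y + dy, x + dx
        # Votes that point outside the image are discarded.
        if 0 <= ty < height and 0 <= tx < width:
            heatmap[ty][tx] += weight
    return heatmap


# Toy case: every location of a 3x4 grid votes for the keypoint at (1, 2).
toy_votes = [((y, x), (1 - y, 2 - x), 1.0) for y in range(3) for x in range(4)]
toy_heatmap = aggregate_votes(toy_votes, 3, 4)
```

In the toy case all twelve votes land on the same cell, so the heatmap has a single peak at the agreed location. Because every location contributes, a keypoint can still collect support from visible parts of the image when its own neighbourhood is uninformative.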

Cited by 130 publications (75 citation statements)
References 26 publications

“…RefineNet [70] improves the combination of upsampled representations and the representations of the same resolution copied from the downsample process. Other works include: light upsample process [5], [19], [72], [124], possibly with dilated convolutions used in the backbone [47], [69], [91]; light downsample and heavy upsample processes [115], recombinator networks [40]; improving skip connections with more or complicated convolutional units [48], [89], [143], as well as sending information from low-resolution skip connections to high-resolution skip connections [151] or exchanging information between them [34]; studying the details of the upsample process [120]; combining multi-scale pyramid representations [18], [125]; stacking multiple DeconvNets/U-Nets/Hourglass [31], [122] with dense connections [110].…”
Section: Related Work (mentioning)
confidence: 99%
“…Monocular RGB body pose estimation in 2D has been widely researched, but estimates only the 2D skeletal pose [Bourdev and Malik 2009; Felzenszwalb et al 2010; Felzenszwalb and Huttenlocher 2005; Ferrari et al 2009; Pishchulin et al 2013; Wei et al 2016]. Learning-based discriminative methods, in particular deep learning methods [Lifshitz et al 2016; Newell et al 2016; Tompson et al 2014], represent the current state of the art in 2D pose estimation, with some of these methods demonstrating real-time performance [Cao et al 2016; Wei et al 2016]. Monocular RGB estimation of the 3D skeletal pose is a much harder challenge tackled by relatively fewer methods [Bogo et al 2016; Tekin et al 2016b,c; Zhou et al , 2015b.…”
Section: Introduction (mentioning)
confidence: 99%
“…Our proposed method (i.e., Ours-weakC-2) used 9040 images in the MPII (i.e., half of the entire images) for the FS set and other images in "LSP+LSPext+MPII" dataset for the WS set. On the other hand, all images and annotations in MPII and "LSP+LSPext+MPII" were used for training in [74,50,76,49,46] (shown in the upper rows in the table) and [73,18] (shown in the lower rows), respectively. For reference, the results of the baseline [18] that used only half of the entire images in the MPII (i.e., Baseline-2 (HALF) in the table) are shown.…”
Section: Discussion (mentioning)
confidence: 99%
“…Unlike deformable part models, recent DCNN-based human pose estimation methods (e.g., [46,47,48,49,50,18,51]) acquire the position of each body joint from its corresponding heatmap. The heatmap of each joint is outputted from a DCNN as shown in Figure 2.…”
Section: DCNN-based Heatmap Models (mentioning)
confidence: 99%
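
The heatmap read-out mentioned in the statement above can be sketched as follows. This is a minimal illustration rather than the code of any cited method: it assumes one 2D score map per joint and takes the highest-scoring cell as that joint's position. The name `decode_heatmaps` and the dictionary layout are hypothetical.

```python
def decode_heatmaps(heatmaps):
    """Pick the peak of each joint's heatmap (illustrative only).

    heatmaps: dict mapping a joint name to a 2D list of scores
              (an assumed layout, one map per joint).
    Returns a dict mapping each joint name to (y, x, score).
    """
    joints = {}
    for name, grid in heatmaps.items():
        best_y, best_x = 0, 0
        for y, row in enumerate(grid):
            for x, score in enumerate(row):
                # Keep the first cell with the strictly highest score.
                if score > grid[best_y][best_x]:
                    best_y, best_x = y, x
        joints[name] = (best_y, best_x, grid[best_y][best_x])
    return joints


example = decode_heatmaps({
    "head": [[0.1, 0.9], [0.2, 0.3]],
    "ankle": [[0.0, 0.0], [0.7, 0.1]],
})
```

Returning the peak score alongside the coordinates lets a caller treat low-scoring joints as uncertain instead of trusting every argmax equally.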