Human Pose Estimation via Convolutional Part Heatmap Regression

Bulat, Adrian; Tzimiropoulos, Georgios

doi:10.1007/978-3-319-46478-7_44

Cited by 424 publications

(345 citation statements)

References 40 publications

Citation statements by type: Supporting, Mentioning (344), Contrasting, Unclassified


“…Recently, methods based on CNNs have been shown to produce state-of-the-art results for many Computer Vision tasks like image recognition [23], object detection [11] and semantic image segmentation [18]. In the context of landmark localisation, it is natural to formulate the problem as a regression one in which CNN features are regressed in order to provide a joint prediction of the landmarks, see for example recent works on human pose estimation [3,5,20,25]. The idea of joint regression of part detection scoremaps for localisation has been explored in [5], however in the context of human pose estimation.…”

Section: Related Work (mentioning, confidence: 99%)
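The passage above refers to regressing part detection score maps (heatmaps). As a minimal illustration, assuming the common convention of one Gaussian-shaped target map per part and argmax decoding (the Gaussian width and the helper names `gaussian_heatmap` and `decode_argmax` are hypothetical and do not come from the quoted text), encoding landmarks into score maps and reading them back can be sketched in plain Python:

```python
import math

def gaussian_heatmap(height, width, cx, cy, sigma=1.5):
    """Target score map for one part: a 2D Gaussian peaking at (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

def decode_argmax(heatmap):
    """Return the (x, y) pixel with the highest score."""
    best_xy, best_val = (0, 0), float("-inf")
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_val:
                best_xy, best_val = (x, y), value
    return best_xy

# One score map per landmark; a network would regress all of them jointly.
landmarks = [(3, 4), (10, 2), (7, 12)]
targets = [gaussian_heatmap(16, 16, cx, cy) for cx, cy in landmarks]
print([decode_argmax(h) for h in targets])  # [(3, 4), (10, 2), (7, 12)]
```

Argmax decoding is limited to pixel precision; sub-pixel refinement is a common addition but is not shown here.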

Convolutional aggregation of local evidence for large pose face alignment

Bulat, Tzimiropoulos

2016

Proceedings of the British Machine Vision Conference 2016

Self Cite


Methods for unconstrained face alignment must satisfy two requirements: they must not rely on accurate initialisation/face detection and they should perform equally well for the whole spectrum of facial poses. To the best of our knowledge, there are no methods meeting these requirements to satisfactory extent, and in this paper, we propose Convolutional Aggregation of Local Evidence (CALE), a Convolutional Neural Network (CNN) architecture particularly designed for addressing both of them. In particular, to remove the requirement for accurate face detection, our system firstly performs facial part detection, providing confidence scores for the location of each of the facial landmarks (local evidence). Next, these score maps along with early CNN features are aggregated by our system through joint regression in order to refine the landmarks' location. Besides playing the role of a graphical model, CNN regression is a key feature of our system, guiding the network to rely on context for predicting the location of occluded landmarks, typically encountered in very large poses. The whole system is trained end-to-end with intermediate supervision. When applied to AFLW-PIFA, the most challenging human face alignment test set to date, our method provides more than 50% gain in localisation accuracy when compared to other recently published methods for large pose face alignment. Going beyond human faces, we also demonstrate that CALE is effective in dealing with very large changes in shape and appearance, typically encountered in animal faces.
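The abstract mentions end-to-end training with intermediate supervision. A rough sketch, assuming a per-pixel mean-squared-error loss on the heatmaps of every supervised stage and equal stage weights (both assumptions; the abstract does not specify the loss, and the function names are hypothetical), is to sum the per-stage losses so that the detection stage and the regression stage each receive a direct training signal:

```python
def heatmap_mse(pred, target):
    """Mean squared error between two equally sized 2D score maps."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count

def intermediate_supervision_loss(stage_outputs, targets):
    """Sum the heatmap loss over every stage and every landmark.

    stage_outputs: list over stages; each entry is a list of predicted
                   heatmaps, one per landmark (assumed layout).
    targets:       list of ground-truth heatmaps, one per landmark.
    """
    return sum(
        heatmap_mse(pred, target)
        for stage in stage_outputs
        for pred, target in zip(stage, targets)
    )

# Toy example: one landmark on a 2x2 grid, two supervised stages.
targets = [[[0.0, 1.0], [0.0, 0.0]]]
detection_stage = [[[0.0, 0.0], [0.0, 0.0]]]   # misses the peak: MSE = 1/4
regression_stage = [[[0.0, 1.0], [0.0, 0.0]]]  # exact: MSE = 0
loss = intermediate_supervision_loss([detection_stage, regression_stage], targets)
print(loss)  # 0.25
```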


“…Single person pose estimation in images has seen a remarkable progress over the past few years [39,30,6,40,14,16,27,5,32]. However, all these approaches assume that only a single person is visible in the image, and cannot handle realistic cases where several people appear in the scene, and interact with each other.…”

Section: Related Work (mentioning, confidence: 99%)

“…The field of human pose estimation in images has progressed remarkably over the past few years. The methods have advanced from pose estimation of single pre-localized persons [30,6,40,14,16,27,5,32] to the more challenging and realistic case of multiple, potentially overlapping and truncated persons [12,8,30,16,17]. Many applications, such as mentioned before, however, aim to analyze human body motion over time.…”

Section: Introduction (mentioning, confidence: 99%)

PoseTrack: Joint Multi-person Pose Estimation and Tracking

Iqbal, Milan, Gall

2017

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)



In this work, we introduce the challenging problem of joint multi-person pose estimation and tracking of an unknown number of persons in unconstrained videos. Existing methods for multi-person pose estimation in images cannot be applied directly to this problem, since it also requires to solve the problem of person association over time in addition to the pose estimation for each person. We therefore propose a novel method that jointly models multi-person pose estimation and tracking in a single formulation. To this end, we represent body joint detections in a video by a spatio-temporal graph and solve an integer linear program to partition the graph into sub-graphs that correspond to plausible body pose trajectories for each person. The proposed approach implicitly handles occlusion and truncation of persons. Since the problem has not been addressed quantitatively in the literature, we introduce a challenging "Multi-Person PoseTrack" dataset, and also propose a completely unconstrained evaluation protocol that does not make any assumptions about the scale, size, location or the number of persons. Finally, we evaluate the proposed approach and several baseline methods on our new dataset.
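The abstract formulates tracking as partitioning a spatio-temporal graph of joint detections with an integer linear program. The sketch below is deliberately simplified and is not the paper's ILP: it drops node-selection variables and the distinction between spatial and temporal edge costs, and instead enumerates every partition of a handful of detections, scoring each by the sum of pairwise affinities inside its clusters (positive for "same person", negative for "different person"). Exhaustive enumeration finds the exact optimum of this objective, but the number of partitions grows with the Bell numbers, which is why a solver is needed at realistic scale. All names and affinity values are hypothetical.

```python
from itertools import combinations

def partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing block ...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ... or into a block of its own.
        yield [[first]] + smaller

def partition_score(partition, affinity):
    """Sum the affinities of all detection pairs that share a block."""
    total = 0.0
    for block in partition:
        for a, b in combinations(block, 2):
            total += affinity.get((a, b), affinity.get((b, a), 0.0))
    return total

def best_partition(detections, affinity):
    """Exact optimum by exhaustive search; only feasible for a few detections."""
    return max(partitions(detections), key=lambda p: partition_score(p, affinity))

# Hypothetical detections of two people (A, B) in two frames (t0, t1).
detections = ["A_t0", "A_t1", "B_t0", "B_t1"]
affinity = {
    ("A_t0", "A_t1"): 2.0,   # across frames, likely the same person
    ("B_t0", "B_t1"): 2.0,
    ("A_t0", "B_t0"): -3.0,  # same frame, likely different people
    ("A_t1", "B_t1"): -3.0,
    ("A_t0", "B_t1"): -1.0,
    ("A_t1", "B_t0"): -1.0,
}
tracks = best_partition(detections, affinity)
print(sorted(sorted(block) for block in tracks))
# [['A_t0', 'A_t1'], ['B_t0', 'B_t1']]
```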


“…This is mainly due to the availability of deep learning based methods for detecting joints [1][2][3][4][5]. While earlier approaches in this direction [4,6,7] combine the body part detectors with tree structured graphical models, more recent methods [1][2][3][8][9][10] demonstrate that spatial relations between joints can be directly learned by a neural network without the need of an additional graphical model. These approaches, however, assume that only a single person is visible in the image and the location of the person is known a-priori.…”

Section: Introduction (mentioning, confidence: 99%)

“…In [3,8,9] multi-staged CNN architectures are proposed where each stage of the network takes as input the score maps of all parts from its preceding stage. This provides additional information about the interdependence, co-occurrence, and context of parts to each stage, and thereby allows the network to implicitly learn image dependent spatial relationships between parts.…”

Section: Related Work (mentioning, confidence: 99%)
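The multi-stage design described in the quote, where each stage receives the score maps of all parts from the stage before it, amounts to concatenating those maps with the image features along the channel axis. In this toy sketch each stage is a single 1x1 convolution (a per-pixel linear map over channels), and the first stage receives all-zero score maps as its "previous" belief; both are simplifying assumptions, not the architectures of [3,8,9], and the function names are hypothetical.

```python
def conv1x1(channels, weights):
    """Per-pixel linear map: out[k][y][x] = sum_c weights[k][c] * channels[c][y][x]."""
    height, width = len(channels[0]), len(channels[0][0])
    return [
        [[sum(w_k[c] * channels[c][y][x] for c in range(len(channels)))
          for x in range(width)]
         for y in range(height)]
        for w_k in weights
    ]

def run_stages(features, stage_weights, num_parts):
    """Chain stages; stage t sees the image features plus stage t-1's score maps."""
    height, width = len(features[0]), len(features[0][0])
    # Assumption: the first stage gets all-zero maps as its "previous" belief.
    score_maps = [[[0.0] * width for _ in range(height)] for _ in range(num_parts)]
    outputs = []
    for weights in stage_weights:
        stage_input = features + score_maps       # concatenate along channels
        score_maps = conv1x1(stage_input, weights)
        outputs.append(score_maps)                # every stage can be supervised
    return outputs

# One feature channel on a 2x2 grid, one part, two stages.
features = [[[1.0, 2.0], [3.0, 4.0]]]
stage_weights = [
    [[1.0, 0.0]],  # stage 1: copy the feature channel
    [[0.0, 2.0]],  # stage 2: rescale the previous stage's score map
]
outputs = run_stages(features, stage_weights, num_parts=1)
print(outputs[-1])  # [[[2.0, 4.0], [6.0, 8.0]]]
```

Because stage 2 reads stage 1's output as an input channel, its weights can encode how a part's evidence should be reinforced or suppressed given the other parts' maps, which is the context effect the quote describes.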

Multi-person Pose Estimation with Local Joint-to-Person Associations

Iqbal, Gall

2016

Lecture Notes in Computer Science



Despite of the recent success of neural networks for human pose estimation, current approaches are limited to pose estimation of a single person and cannot handle humans in groups or crowds. In this work, we propose a method that estimates the poses of multiple persons in an image in which a person can be occluded by another person or might be truncated. To this end, we consider multi-person pose estimation as a joint-to-person association problem. We construct a fully connected graph from a set of detected joint candidates in an image and resolve the joint-to-person association and outlier detection using integer linear programming. Since solving joint-to-person association jointly for all persons in an image is an NP-hard problem and even approximations are expensive, we solve the problem locally for each person. On the challenging MPII Human Pose Dataset for multiple persons, our approach achieves the accuracy of a state-of-the-art method, but it is 6,000 to 19,000 times faster.
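The abstract solves joint-to-person association locally for each person rather than jointly over the whole image. The sketch below only illustrates that locality: it assumes a known centre for every person and a fixed search radius (both hypothetical), and it replaces the per-person integer linear program with a greedy rule that keeps the highest-scoring candidate of each joint type, so it is a stand-in for, not a reproduction of, the method.

```python
import math

def local_candidates(center, candidates, radius):
    """Keep only joint candidates that lie within `radius` of a person's centre."""
    cx, cy = center
    return [c for c in candidates
            if math.hypot(c["x"] - cx, c["y"] - cy) <= radius]

def associate_locally(person_centers, candidates, radius):
    """Build one pose per person from its local candidates only.

    Greedy stand-in for the per-person ILP: keep the highest-scoring
    candidate of each joint type inside the person's neighbourhood.
    """
    poses = []
    for center in person_centers:
        best = {}
        for c in local_candidates(center, candidates, radius):
            if c["joint"] not in best or c["score"] > best[c["joint"]]["score"]:
                best[c["joint"]] = c
        poses.append({joint: (c["x"], c["y"]) for joint, c in best.items()})
    return poses

candidates = [
    {"joint": "head", "x": 0, "y": -2, "score": 0.9},
    {"joint": "head", "x": 1, "y": -2, "score": 0.4},
    {"joint": "head", "x": 10, "y": -2, "score": 0.8},
    {"joint": "wrist", "x": 11, "y": 1, "score": 0.7},
]
poses = associate_locally([(0, 0), (10, 0)], candidates, radius=4.0)
print(poses)
# [{'head': (0, -2)}, {'head': (10, -2), 'wrist': (11, 1)}]
```

Each person's subproblem only sees a few nearby candidates, which is where the speed-up of a local formulation comes from. Because people are handled independently here, nothing stops two overlapping neighbourhoods from claiming the same candidate, so a real system needs some form of conflict handling.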
