2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017
DOI: 10.1109/cvpr.2017.601

Multi-context Attention for Human Pose Estimation

Abstract: In this paper, we propose to incorporate convolutional neural networks with a multi-context attention mechanism into an end-to-end framework for human pose estimation. We adopt stacked hourglass networks to generate attention maps from features at multiple resolutions with various semantics. The Conditional Random Field (CRF) is utilized to model the correlations among neighboring regions in the attention map. We further combine the holistic attention model, which focuses on the global consistency of the full …

Cited by 674 publications (384 citation statements)
References 46 publications
“…Inspired by recent studies which show that features extracted from different layers of CNNs capture different semantic structures [11][12][13], in this paper we also estimate the attention map using multi-semantic contexts. Features from lower convolutional layers generally respond to low level image features, such as corners or edges, while those from higher layers focus on structures that are more semantically meaningful, such as human faces or buildings.…”
Section: Learning Context Flexible Attention Model For (mentioning)
confidence: 99%
“…In particular, the context information, which is generally addressed by selecting image areas surrounding the localized, target image regions [11], has been shown to provide useful cues to estimate the attention. However, the contextual areas around each target region are usually defined manually in the shape of rectangular bounding boxes [6,7,11,27]. Such manually defined focus areas often may be suboptimal, since the actual contextual area can vary under different situations.…”
Section: B Attention Model For Place Recognition (mentioning)
confidence: 99%
“…Only relying on detection with local information cannot distinguish the joint type from the left or right. To handle this problem, some previous methods [8][9][10] were proposed by increasing the receptive fields to capture global context information to support the discrimination of left/right joint types. In this way, the discrimination of one joint may use the context information of multiple neighbouring body parts.…”
Section: Introduction (mentioning)
confidence: 99%