2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2015.7298664

Efficient object localization using Convolutional Networks

Abstract: Recent state-of-the-art performance on human-body pose estimation has been achieved with Deep Convolutional Networks (ConvNets). Traditional ConvNet architectures include pooling and sub-sampling layers which reduce computational requirements, introduce invariance and prevent over-training. These benefits of pooling come at the cost of reduced localization accuracy. We introduce a novel architecture which includes an efficient 'position refinement' model that is trained to estimate the joint offset location wi…
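The trade-off named in the abstract can be made concrete. Below is a minimal plain-Python sketch, assuming a synthetic score map in place of real network output and hypothetical helper names: the joint is decoded by argmax on a stride-s max-pooled map and mapped back to full resolution, so the best achievable per-axis error grows with the pooling stride.

```python
"""Sketch: how stride-s max-pooling limits localization precision.

Assumption: a synthetic score map stands in for ConvNet output. The joint is
decoded by argmax on the pooled map and mapped back to full resolution at the
winning cell's (rounded) centre. All names are illustrative.
"""
from typing import List, Tuple

Heatmap = List[List[float]]


def max_pool(h: Heatmap, stride: int) -> Heatmap:
    """Non-overlapping stride x stride max-pooling (sides assumed divisible by stride)."""
    rows, cols = len(h) // stride, len(h[0]) // stride
    return [[max(h[r * stride + i][c * stride + j]
                 for i in range(stride) for j in range(stride))
             for c in range(cols)]
            for r in range(rows)]


def argmax2d(h: Heatmap) -> Tuple[int, int]:
    """(row, col) of the largest entry."""
    _, r, c = max((v, r, c) for r, row in enumerate(h) for c, v in enumerate(row))
    return r, c


def pooled_estimate(h: Heatmap, stride: int) -> Tuple[int, int]:
    """Decode on the pooled map, then map the cell index back to full-resolution pixels."""
    r, c = argmax2d(max_pool(h, stride))
    return r * stride + stride // 2, c * stride + stride // 2


def synthetic_heatmap(size: int, peak: Tuple[int, int]) -> Heatmap:
    """Score map with a unique maximum at `peak` (row, col)."""
    return [[-float((r - peak[0]) ** 2 + (c - peak[1]) ** 2) for c in range(size)]
            for r in range(size)]


if __name__ == "__main__":
    size = 16
    for stride in (1, 2, 4, 8):
        worst = 0
        for pr in range(size):
            for pc in range(size):
                er, ec = pooled_estimate(synthetic_heatmap(size, (pr, pc)), stride)
                worst = max(worst, abs(er - pr), abs(ec - pc))
        print(f"stride {stride}: worst per-axis error {worst} px")
```

Under these assumptions the worst-case per-axis error printed is stride // 2 pixels (0, 1, 2 and 4 for strides 1, 2, 4 and 8), which is the precision a later refinement stage has to win back.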

Cited by 1,232 publications (898 citation statements)
References: 18 publications
“…This serves to increase efficiency and reduce memory usage of their method while improving localization performance in the high precision range [16]. One consideration is that for many failure cases a refinement of position within a local window would not offer much improvement since error cases often consist of either occluded or misattributed limbs.…”
Section: Related Work; mentioning (confidence: 99%)
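The limitation described in this excerpt can be shown with a toy score map. What follows is a minimal sketch under stated assumptions, not the refinement model of [16]: "refinement" here is just an argmax inside a small full-resolution window around an initial estimate, applied to a hypothetical map with a strong peak at the true joint and a weaker one at a distractor standing in for the opposite limb. A near miss is corrected, but an estimate that already sits on the wrong limb is not.

```python
"""Sketch: refining a joint estimate inside a local window.

Assumptions: a synthetic map with a strong peak at the true joint and a weaker
peak at a distractor (e.g. the opposite limb) stands in for network output, and
"refinement" is a plain argmax in a (2*radius+1) x (2*radius+1) window, not the
learned refinement model cited as [16].
"""
from typing import List, Optional, Tuple

Heatmap = List[List[float]]
Point = Tuple[int, int]  # (row, col)


def refine_in_window(h: Heatmap, start: Point, radius: int) -> Point:
    """Highest-scoring pixel within `radius` of `start`, clipped to the map."""
    r0, c0 = start
    best: Optional[Point] = None
    for r in range(max(0, r0 - radius), min(len(h), r0 + radius + 1)):
        for c in range(max(0, c0 - radius), min(len(h[0]), c0 + radius + 1)):
            if best is None or h[r][c] > h[best[0]][best[1]]:
                best = (r, c)
    if best is None:
        raise ValueError("search window does not overlap the map")
    return best


def two_peak_heatmap(size: int, true_pt: Point, distractor: Point) -> Heatmap:
    """Height-1.0 bump at `true_pt` plus a height-0.8 bump at `distractor`."""
    def bump(p: Point, r: int, c: int, height: float) -> float:
        return height / (1.0 + (r - p[0]) ** 2 + (c - p[1]) ** 2)
    return [[bump(true_pt, r, c, 1.0) + bump(distractor, r, c, 0.8) for c in range(size)]
            for r in range(size)]


if __name__ == "__main__":
    heat = two_peak_heatmap(64, true_pt=(10, 10), distractor=(10, 40))
    # Near miss: the true peak lies inside the window, so refinement recovers it.
    print(refine_in_window(heat, (12, 13), radius=4))  # (10, 10)
    # Misattributed limb: the window only covers the distractor, so the error persists.
    print(refine_in_window(heat, (11, 38), radius=4))  # (10, 40)
```

Enlarging the window would eventually reach the true peak, but at that point the search is no longer local, which is the excerpt's point about occluded or misattributed limbs.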
“…To improve performance at high precision thresholds the prediction is offset by a quarter of a pixel in the direction of its next highest neighbor before transforming back to the original coordinate space of the image. In MPII Human Pose, some joints do not have a corresponding ground truth annotation. In these cases the joint is either truncated or severely occluded, so for supervision a ground truth heatmap of all zeros is provided.…”
Results table embedded in the excerpt (column headings and the first method name not captured):
… [1]                76.5   59.1
Toshev et al [24]    92.3   82.0
Tompson et al [16]   93.1   89.0
Chen et al [25]      95.3   92.4
Wei et al [18]       97.6   95.0
Our model            99.0   97.0
Section: Training Details; mentioning (confidence: 99%)
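The two training details in this excerpt, an all-zero target for unannotated joints and a quarter-pixel shift at decoding time, can be sketched in plain Python. Assumptions not stated in the excerpt: a Gaussian target with sigma = 1 pixel, `None` marking a missing annotation, "next highest neighbor" read per axis (compare the left/right and the up/down neighbours of the argmax), and no transform back to image coordinates. Function names are hypothetical.

```python
"""Sketch: all-zero targets for missing joints and quarter-pixel decoding.

Assumptions (not from the excerpt): Gaussian targets with sigma = 1 px, `None`
for an unannotated joint, the quarter-pixel shift applied per axis toward the
larger neighbour, and no mapping back to original image coordinates.
"""
import math
from typing import List, Optional, Tuple

Heatmap = List[List[float]]


def target_heatmap(height: int, width: int,
                   joint: Optional[Tuple[float, float]], sigma: float = 1.0) -> Heatmap:
    """Gaussian centred on joint=(x, y); an all-zero map when the joint is unannotated."""
    if joint is None:
        return [[0.0] * width for _ in range(height)]
    x0, y0 = joint
    return [[math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]


def _sign(v: float) -> int:
    return (v > 0) - (v < 0)


def decode_quarter_offset(hm: Heatmap) -> Tuple[float, float]:
    """Integer argmax (x, y), then move 0.25 px toward the higher neighbour on each axis."""
    height, width = len(hm), len(hm[0])
    _, y, x = max((v, y, x) for y, row in enumerate(hm) for x, v in enumerate(row))
    fx, fy = float(x), float(y)
    if 0 < x < width - 1:
        fx += 0.25 * _sign(hm[y][x + 1] - hm[y][x - 1])
    if 0 < y < height - 1:
        fy += 0.25 * _sign(hm[y + 1][x] - hm[y - 1][x])
    return fx, fy


if __name__ == "__main__":
    hm = target_heatmap(12, 12, (5.3, 7.0))
    print(decode_quarter_offset(hm))       # (5.25, 7.0); plain argmax would give (5, 7)
    missing = target_heatmap(12, 12, None)
    print(sum(map(sum, missing)))          # 0.0: supervision for an unannotated joint
```

With the target peak at x = 5.3, plain argmax gives 5 while the shifted estimate is 5.25; along y the two neighbours tie, so no shift is applied.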
“…While the tree-structured models provide efficient inference, they struggle to model long-range characteristics of the human body. With the progress in convolutional neural network architectures, more recent works adopt CNNs to obtain stronger part detectors but still use graphical models to obtain coherent pose estimates [4,6,7].…”
Section: Related Work; mentioning (confidence: 99%)
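The efficiency of tree-structured models comes from exact dynamic programming: with K parts and L candidate locations each, max-sum messages passed from the leaves to the root cost on the order of K·L² score evaluations instead of the L^K needed by exhaustive search. A minimal plain-Python sketch, assuming a toy 1-D location space, made-up detector scores and a quadratic deformation penalty (none of this is taken from the cited works; all names are hypothetical):

```python
"""Sketch: exact MAP inference in a tree-structured part model via max-sum DP.

Assumptions: every part takes one of L discrete 1-D locations, unary scores come
from hypothetical part detectors, and each edge carries a quadratic deformation
penalty around a preferred child-minus-parent offset. All numbers are made up.
"""
import itertools
from typing import Dict, List, Tuple

Tree = Dict[str, Tuple[str, int]]  # child -> (parent, preferred offset)
Unary = Dict[str, List[float]]     # part -> score at each of L locations


def pairwise(child_loc: int, parent_loc: int, offset: int) -> float:
    """Deformation term: 0 at the preferred offset, more negative further away."""
    return -float((child_loc - parent_loc - offset) ** 2)


def total_score(locs: Dict[str, int], tree: Tree, unary: Unary) -> float:
    return (sum(unary[p][locs[p]] for p in unary)
            + sum(pairwise(locs[c], locs[p], off) for c, (p, off) in tree.items()))


def map_inference(root: str, tree: Tree, unary: Unary) -> Dict[str, int]:
    """Leaf-to-root max-sum messages plus backtracking: O(parts * L^2) work."""
    children: Dict[str, List[str]] = {p: [] for p in unary}
    for child, (parent, _) in tree.items():
        children[parent].append(child)
    n_locs = len(unary[root])
    best_child_loc: Dict[str, List[int]] = {}  # child -> best location per parent location

    def subtree_score(part: str) -> List[float]:
        scores = list(unary[part])
        for ch in children[part]:
            ch_scores = subtree_score(ch)
            offset = tree[ch][1]
            choices = []
            for p_loc in range(n_locs):
                c_loc = max(range(n_locs),
                            key=lambda c: ch_scores[c] + pairwise(c, p_loc, offset))
                choices.append(c_loc)
                scores[p_loc] += ch_scores[c_loc] + pairwise(c_loc, p_loc, offset)
            best_child_loc[ch] = choices
        return scores

    root_scores = subtree_score(root)
    locs = {root: max(range(n_locs), key=lambda loc: root_scores[loc])}
    stack = [root]
    while stack:  # backtrack: fix each child given its parent's chosen location
        part = stack.pop()
        for ch in children[part]:
            locs[ch] = best_child_loc[ch][locs[part]]
            stack.append(ch)
    return locs


def brute_force(tree: Tree, unary: Unary) -> Dict[str, int]:
    """Exhaustive search over L ** parts configurations, for comparison only."""
    parts = list(unary)
    n_locs = len(next(iter(unary.values())))
    combos = (dict(zip(parts, c)) for c in itertools.product(range(n_locs), repeat=len(parts)))
    return max(combos, key=lambda locs: total_score(locs, tree, unary))


if __name__ == "__main__":
    tree: Tree = {"head": ("torso", 0), "left_arm": ("torso", -2), "right_arm": ("torso", 2)}
    unary: Unary = {
        "torso":     [0, 1, 3, 2, 0, 0],
        "head":      [0, 0, 2, 1, 0, 0],
        "left_arm":  [2, 1, 0, 0, 0, 0],
        "right_arm": [0, 0, 0, 0, 2, 1],
    }
    best = map_inference("torso", tree, unary)
    print(best)  # {'torso': 2, 'head': 2, 'left_arm': 0, 'right_arm': 4}
    print(total_score(best, tree, unary) == total_score(brute_force(tree, unary), tree, unary))  # True
```

The DP is exact only because the graph is a tree; pairwise terms that close loops (the long-range relations the excerpt mentions) break this decomposition.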
“…This is mainly due to the availability of deep learning based methods for detecting joints [1][2][3][4][5]. While earlier approaches in this direction [4,6,7] combine the body part detectors with tree structured graphical models, more recent methods [1][2][3][8][9][10] demonstrate that spatial relations between joints can be directly learned by a neural network without the need of an additional graphical model. These approaches, however, assume that only a single person is visible in the image and the location of the person is known a-priori.…”
Section: Introduction; mentioning (confidence: 99%)