Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition

Fu, Jianlong; Zheng, Haizhong; Mei, Tao

doi:10.1109/cvpr.2017.476

Cited by 1,221 publications (806 citation statements, 800 of them mentioning); references 21 publications.

“…Some recent works [14,17] adopt STN to localize body parts for person re-identification. Fu et al. [3] attempt to recursively learn discriminative regions for fine-grained image recognition. Wang et al. [33] search the discriminative regions with STN and LSTM for multi-label classification, while not in a label-specific manner.…”

Section: Related Work (mentioning, confidence: 99%)

Improving Pedestrian Attribute Recognition With Weakly-Supervised Multi-Scale Attribute-Specific Localization

Tang, Sheng, Zhang, et al., 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Pedestrian attribute recognition has been an emerging research topic in the area of video surveillance. To predict the presence of a particular attribute, the regions related to that attribute must be localized. However, region annotations are not available for this task, so carving out these attribute-related regions remains challenging. Existing methods apply attribute-agnostic visual attention or heuristic body-part localization mechanisms to enhance local feature representations, but neglect to use the attributes themselves to define local feature areas. We propose a flexible Attribute Localization Module (ALM) that adaptively discovers the most discriminative regions and learns regional features for each attribute at multiple levels. Moreover, a feature pyramid architecture is introduced to enhance attribute-specific localization at low levels with high-level semantic guidance. The proposed framework does not require additional region annotations and can be trained end-to-end with multi-level deep supervision. Extensive experiments show that the proposed method achieves state-of-the-art results on three pedestrian attribute datasets: PETA, RAP, and PA-100K.
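Both the quotation in this entry (STN-based body-part localization, recursive discriminative-region learning) and the ALM abstract rely on cropping a learned region and resampling it at higher resolution. The code below is a minimal pure-Python sketch of that crop-and-zoom step as done by the sampler of a spatial transformer network, assuming a scale-and-translation-only affine transform in normalized [-1, 1] coordinates and bilinear interpolation. The names `crop_and_zoom` and `bilinear_sample` and the parameters `sx, sy, tx, ty` are hypothetical. In a trained model a small localization network would predict those parameters per attribute or per scale; that network is not shown, and this is not the authors' ALM or RA-CNN code.

```python
import math


def bilinear_sample(img, x, y):
    """Sample a 2-D image (list of rows) at continuous pixel coords (x, y).
    Coordinates outside the image are clamped to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
    bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def crop_and_zoom(img, sx, sy, tx, ty, out_h, out_w):
    """Hypothetical constrained affine crop: each output grid point (u, v)
    in [-1, 1] reads the source at (sx * u + tx, sy * v + ty).
    sx, sy < 1 select a smaller region; tx, ty move its center."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        v = -1.0 + 2.0 * i / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            u = -1.0 + 2.0 * j / (out_w - 1) if out_w > 1 else 0.0
            xs, ys = sx * u + tx, sy * v + ty  # normalized source coords
            # map normalized [-1, 1] to pixel coords [0, w-1] and [0, h-1]
            px = (xs + 1.0) * (w - 1) / 2.0
            py = (ys + 1.0) * (h - 1) / 2.0
            row.append(bilinear_sample(img, px, py))
        out.append(row)
    return out
```

With `sx = sy = 1` and `tx = ty = 0` the sketch reproduces the input. Halving the scales and shifting the center selects a quarter-area region and resamples it to the requested output size. Because every step is a smooth function of `sx, sy, tx, ty` (apart from the floor inside the interpolation), gradients can flow back to whatever predicts them. This property lets such localizers train without region annotations.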

“…Attention mechanisms, which highlight different positions or nodes according to their importance, have been widely adopted in the field of computer vision. Xu [11] which recursively explores discriminative spatial regions and harvests multi-scale region based features for fine-grained image recognition. Wu et al. propose to employ a structured attention mechanism to integrate local spatial-temporal representation at trajectory level [46] for more fine-grained video description.…”

Section: Visual Attention Model (mentioning, confidence: 99%)

Motion Guided Attention for Video Salient Object Detection

Chen, Li, et al., 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Video salient object detection aims at discovering the most visually distinctive objects in a video. How to effectively take object motion into consideration is a critical issue. Existing state-of-the-art methods either do not explicitly model and harvest motion cues or ignore spatial contexts within optical flow images. In this paper, we develop a multi-task motion guided video salient object detection network, which learns to accomplish two sub-tasks using two sub-networks: one for salient object detection in still images and the other for motion saliency detection in optical flow images. We further introduce a series of novel motion guided attention modules, which use the motion saliency sub-network to attend to and enhance the sub-network for still images. The two sub-networks learn to adapt to each other through end-to-end training. Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on a wide range of benchmarks. We hope our simple and effective approach will serve as a solid baseline and help ease future research in video salient object detection. Code and models will be made available.
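The abstract says the motion saliency sub-network is used to "attend to and enhance" the still-image sub-network, but it gives no equations for its series of attention modules. The code below is a minimal sketch of one plausible form, assumed here purely for illustration: residual spatial attention, where a sigmoid of the motion branch's map rescales the appearance features as `F * (1 + sigmoid(M))`. The function names and tensor layout (nested lists, channels first) are hypothetical and do not reproduce the paper's modules.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function (no overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def motion_guided_attention(appearance, motion_logits):
    """Hypothetical residual attention.

    appearance:    C x H x W features from the still-image branch.
    motion_logits: H x W saliency logits from the optical-flow branch.
    Returns appearance * (1 + sigmoid(motion)): locations the motion branch
    marks as salient are amplified (up to 2x); elsewhere the features are
    kept (factor near 1) rather than suppressed to zero.
    """
    h, w = len(motion_logits), len(motion_logits[0])
    attn = [[sigmoid(motion_logits[i][j]) for j in range(w)] for i in range(h)]
    return [
        [[channel[i][j] * (1.0 + attn[i][j]) for j in range(w)] for i in range(h)]
        for channel in appearance
    ]
```

The residual term is the main design choice in this sketch. Multiplying by `sigmoid(M)` alone would erase appearance features wherever the motion branch sees nothing, for example on a salient object that happens to be static. Adding the features back keeps the still-image evidence intact.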

“…A straightforward way of implementing a part-based recognition system is to employ the ground-truth part annotations if they exist (e.g., for the CUB-200-2011 birds dataset [21]). Since these annotations are expensive and most fine-grained datasets do not provide them, weakly supervised part detectors are a common choice [3,7,15,23]. The only supervision that these detectors use are class label annotations.…”

Section: Part-based Recognition Approaches (mentioning, confidence: 99%)

“…Finally, an overview about the part feature extraction from the classification-specific bounding-box-parts and about the part-based classification is given in Sect. 3 Fig. 3.…”

Section: Classification-specific Part Estimation (mentioning, confidence: 99%)

Classification-Specific Parts for Improving Fine-Grained Visual Categorization

Korsch, Bodesheim, Denzler, 2019

Lecture Notes in Computer Science

Fine-grained visual categorization is a classification task for distinguishing categories with high intra-class and small inter-class variance. While global approaches use the whole image to perform the classification, part-based solutions gather additional local information in the form of attentions or parts. We propose a novel classification-specific part estimation that uses an initial prediction as well as backpropagation of feature importance via gradient computations to estimate relevant image regions. The subsequently detected parts are thus not only selected by a-posteriori classification knowledge but also have an intrinsic spatial extent that is determined automatically. This is in contrast to most part-based approaches and even to available ground-truth part annotations, which only provide point coordinates and no additional scale information. In experiments on various widely used fine-grained datasets, we show the effectiveness of this part selection method in conjunction with the extracted part features.
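The abstract states that relevant regions are estimated from gradient-based feature importance and that each part's spatial extent is determined automatically, in contrast to point-only part annotations. The code below is a minimal sketch of the second step only. It assumes a precomputed saliency map (for example, absolute input gradients of the initially predicted class score, which requires an autodiff framework and is omitted), a fixed threshold, and 4-connected components. The function name `estimate_parts` and the `threshold` and `max_parts` parameters are hypothetical, and the thresholding and grouping rules are assumptions rather than Korsch et al.'s procedure.

```python
from collections import deque


def estimate_parts(saliency, threshold, max_parts=3):
    """Group above-threshold pixels into 4-connected components and return
    one inclusive bounding box (x0, y0, x1, y1) per component, ranked by the
    total saliency it contains. A box's size follows the component's shape,
    so each part gets a data-driven spatial extent rather than a fixed size.
    """
    h, w = len(saliency), len(saliency[0])
    seen = [[False] * w for _ in range(h)]
    parts = []
    for start_y in range(h):
        for start_x in range(w):
            if seen[start_y][start_x] or saliency[start_y][start_x] <= threshold:
                continue
            # breadth-first flood fill of one connected component
            queue = deque([(start_y, start_x)])
            seen[start_y][start_x] = True
            x0 = x1 = start_x
            y0 = y1 = start_y
            mass = 0.0
            while queue:
                y, x = queue.popleft()
                mass += saliency[y][x]
                x0, x1 = min(x0, x), max(x1, x)
                y0, y1 = min(y0, y), max(y1, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and saliency[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            parts.append((mass, (x0, y0, x1, y1)))
    parts.sort(key=lambda p: p[0], reverse=True)
    return [box for _, box in parts[:max_parts]]
```

For example, on a map with one strong L-shaped blob and one weak vertical streak, the function returns the blob's box first and the streak's box second. Raising `threshold` above every value returns an empty list.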