2017 IEEE International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2017.58

Multi-label Image Recognition by Recurrently Discovering Attentional Regions

Abstract: This paper proposes a novel deep architecture to address multi-label image recognition, a fundamental and practical task towards general visual understanding. Current solutions for this task usually rely on an extra step of extracting hypothesis regions (i.e., region proposals), resulting in redundant computation and sub-optimal performance. In this work, we achieve the interpretable and contextualized multi-label image classification by developing a recurrent memorized-attention module. This module consists o…

Citations: cited by 299 publications (191 citation statements)
References: 31 publications
“…While the former formulates the multi-label classification problem as a structural inference problem which may suffer from a scalability issue due to high computational complexity, the latter predicts the labels in a sequential fashion, based on some orders either pre-defined or learned. Another line of works implicitly model the label correlations via attention mechanisms [36,29]. They consider the relations between attended regions of an image, which can be viewed as local correlations, but still ignore the global correlations between labels which require to be inferred from knowledge beyond a single image.…”
Section: Introduction (mentioning)
confidence: 99%
“…These graph neural networks have been widely employed in various tasks of computer vision and have made very promising progress, e.g. object parsing [31,32], multi-label image recognition [52], visual question answer [46], social relationship understanding [51], person re-identification [42] and action recognition [49]. These work create knowledge graph based on the relationship of different entities, e.g.…”
Section: Datasets (mentioning)
confidence: 99%
“…Wang et al [24] utilized recurrent neural networks (RNNs) to transform labels into embedded label vectors, so that the correlation between labels can be employed. Wang et al [25] introduced a spatial transformer layer and long short-term memory (LSTM) units to capture label correlation.…”
Section: Related Work (mentioning)
confidence: 99%