Multi-label Image Recognition by Recurrently Discovering Attentional Regions

Wang, Zhouxia; Chen, Tianshui; Li, Guanbin; Xu, Ruijia; Li, Lin

doi:10.1109/iccv.2017.58

Cited by 299 publications

(191 citation statements)

References 31 publications

Citation types: Supporting, Mentioning (191), Contrasting

“…While the former formulates the multi-label classification problem as a structural inference problem which may suffer from a scalability issue due to high computational complexity, the latter predicts the labels in a sequential fashion, based on some orders either pre-defined or learned. Another line of works implicitly model the label correlations via attention mechanisms [36,29]. They consider the relations between attended regions of an image, which can be viewed as local correlations, but still ignore the global correlations between labels which require to be inferred from knowledge beyond a single image.…”

Section: Introduction (mentioning)

confidence: 99%

Multi-Label Image Recognition With Graph Convolutional Networks

Chen, Wei, Wang, et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

The task of multi-label image recognition is to predict a set of object labels that present in an image. As objects normally co-occur in an image, it is desirable to model the label dependencies to improve the recognition performance. To capture and explore such important dependencies, we propose a multi-label classification model based on Graph Convolutional Network (GCN). The model builds a directed graph over the object labels, where each node (label) is represented by word embeddings of a label, and GCN is learned to map this label graph into a set of inter-dependent object classifiers. These classifiers are applied to the image descriptors extracted by another sub-net, enabling the whole network to be end-to-end trainable. Furthermore, we propose a novel re-weighted scheme to create an effective label correlation matrix to guide information propagation among the nodes in GCN. Experiments on two multi-label image recognition datasets show that our approach obviously outperforms other existing state-of-the-art methods. In addition, visualization analyses reveal that the classifiers learned by our model maintain meaningful semantic topology.
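The mechanism this abstract describes (a GCN that turns label embeddings into inter-dependent classifiers, which are then applied to an image descriptor) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The two-layer depth, the row normalization of the correlation matrix, the random toy inputs, and every name used (`row_normalize`, `gcn_layer`, `label_scores`, `W1`, `W2`) are assumptions made for illustration only; the paper's re-weighted correlation matrix is not reproduced here.

```python
import numpy as np


def row_normalize(A):
    """Add self-loops, then scale each row to sum to 1 (an assumed normalization)."""
    A_hat = A + np.eye(A.shape[0])
    return A_hat / A_hat.sum(axis=1, keepdims=True)


def gcn_layer(A_norm, H, W, activate=True):
    """One graph convolution: propagate node features H over A_norm, then project with W."""
    out = A_norm @ H @ W
    return np.maximum(out, 0.0) if activate else out


def label_scores(image_feature, A, label_embeddings, W1, W2):
    """Map label embeddings to per-label classifiers and apply them to an image feature."""
    A_norm = row_normalize(A)
    hidden = gcn_layer(A_norm, label_embeddings, W1)
    # The last layer outputs one classifier vector per label, sized like the image feature.
    classifiers = gcn_layer(A_norm, hidden, W2, activate=False)
    return classifiers @ image_feature, classifiers


# Toy sizes and random inputs, purely hypothetical.
rng = np.random.default_rng(0)
num_labels, embed_dim, hidden_dim, feat_dim = 4, 8, 16, 32

label_embeddings = rng.normal(size=(num_labels, embed_dim))  # stands in for word embeddings
A = rng.random((num_labels, num_labels))                     # stands in for a directed label correlation matrix
W1 = rng.normal(size=(embed_dim, hidden_dim)) * 0.1
W2 = rng.normal(size=(hidden_dim, feat_dim)) * 0.1
image_feature = rng.normal(size=feat_dim)                    # stands in for a CNN image descriptor

scores, classifiers = label_scores(image_feature, A, label_embeddings, W1, W2)
probs = 1.0 / (1.0 + np.exp(-scores))  # independent sigmoid per label, as is usual for multi-label output
print(probs.shape)
```

Because the classifiers are produced from the label graph rather than learned independently, a gradient on one label's score also updates the shared GCN weights, which is one way the label dependencies mentioned in the abstract could influence every classifier.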

“…These graph neural networks have been widely employed in various tasks of computer vision and have made very promising progress, e.g. object parsing [31,32], multi-label image recognition [52], visual question answer [46], social relationship understanding [51], person re-identification [42] and action recognition [49]. These work create knowledge graph based on the relationship of different entities, e.g.…”

Section: Datasets (mentioning)

confidence: 99%

Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid

Kuang, Gao, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Self Cite

Matching clothing images from customers and online shopping stores has rich applications in E-commerce. Existing algorithms encoded an image as a global feature vector and performed retrieval with the global representation. However, discriminative local information on clothes are submerged in this global representation, resulting in suboptimal performance. To address this issue, we propose a novel Graph Reasoning Network (GRNet) on a Similarity Pyramid, which learns similarities between a query and a gallery cloth by using both global and local representations in multiple scales. The similarity pyramid is represented by a Graph of similarity, where nodes represent similarities between clothing components at different scales, and the final matching score is obtained by message passing along edges. In GRNet, graph reasoning is solved by training a graph convolutional network, enabling to align salient clothing components to improve clothing retrieval. To facilitate future researches, we introduce a new benchmark FindFashion, containing rich annotations of bounding boxes, views, occlusions, and cropping. Extensive experiments show that GRNet obtains new state-of-the-art results on two challenging benchmarks, e.g. pushing the top-1, top-20, and top-50 accuracies on DeepFashion to 26%, 64%, and 75% (i.e. 4%, 10%, and 10% absolute improvements), outperforming competitors with large margins. On FindFashion, GRNet achieves considerable improvements on all empirical settings.

“…Wang et al [24] utilized recurrent neural networks (RNNs) to transform labels into embedded label vectors, so that the correlation between labels can be employed. Wang et al [25] introduced a spatial transformer layer and long short-term memory (LSTM) units to capture label correlation.…”

Section: Related Work (mentioning)

confidence: 99%

Semi-supervised Graph Embedding for Multi-label Graph Node Classification

Gao, Zhang, Zhou 2019

Web Information Systems Engineering – WISE 2019

The graph convolution network (GCN) is a widely-used facility to realize graph-based semi-supervised learning, which usually integrates node features and graph topologic information to build learning models. However, as for multi-label learning tasks, the supervision part of GCN simply minimizes the cross-entropy loss between the last layer outputs and the ground-truth label distribution, which tends to lose some useful information such as label correlations, so that prevents from obtaining high performance. In this paper, we propose a novel GCN-based semi-supervised learning approach for multi-label classification, namely ML-GCN. ML-GCN first uses a GCN to embed the node features and graph topologic information. Then, it randomly generates a label matrix, where each row (i.e., label vector) represents a kind of labels. The dimension of the label vector is the same as that of the node vector before the last convolution operation of GCN. That is, all labels and nodes are embedded in a uniform vector space. Finally, during the ML-GCN model training, label vectors and node vectors are concatenated to serve as the inputs of the relaxed skipgram model to detect the node-label correlation as well as the label-label correlation. Experimental results on several graph classification datasets show that the proposed ML-GCN outperforms four state-of-the-art methods.
