Proceedings of the 29th ACM International Conference on Information & Knowledge Management 2020
DOI: 10.1145/3340531.3411880

Fast Graph Convolution Network Based Multi-label Image Recognition via Cross-modal Fusion

Abstract: In multi-label image recognition, it has become a popular method to predict those labels that co-occur in an image via modeling the label dependencies. Previous works focus on capturing the correlation between labels, but neglect to effectively fuse the image features and label embeddings, which severely affects the convergence efficiency of the model and inhibits the further precision improvement of multi-label image recognition. To overcome this shortcoming, in this paper, we introduce Multi-modal Factorized…

Cited by 37 publications (17 citation statements)
References 31 publications
“…Xu et al [20] utilized GCN on an affinity graph to capture inherent similarity structure of cross-modal data. Wang et al [21] developed a fast GCN model to realized cross-modal fusion. Jiang et al [22] used knowledge-bridge graph to bridge the cross-modal semantic relations.…”
Section: Related Work (mentioning)
confidence: 99%
“…ML-GCN [2] and A-GCN [22] use GCN to learn the correlation between objects and achieve good results on multi-label images. F-GCN [39] utilizes a cross-modal component to fuse image features and label embeddings, which speeds up the model convergence and achieve comparable results as ML-GCN and A-GCN. In addition, EmotionGCN [9] utilizes GCN to model the correlation between emotions for emotion distribution learning and NRDH [38] adopts GCN to learn the similarity between images for image retrieval.…”
Section: Graph Convolution Network (mentioning)
confidence: 99%
“…By learning the structural similarities between training data points, GCN can integrate the relationships into data features. Formally, GCN takes the correlation matrix A as input and produces the node-level output [38]. The forward propagation process in GCN is described as:…”
Section: Graph Convolutional Network (mentioning)
confidence: 99%
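
The statement above is cut off before its equation, so the citing paper's exact formula is not shown here. As a rough illustration only, the sketch below assumes the widely used propagation rule of Kipf and Welling, H' = ReLU(Â H W), where Â is the adjacency (or correlation) matrix with self-loops added and symmetrically normalized. The function names, the toy matrices, and the choice of normalization are all assumptions made for this example and may differ from what the cited or citing papers use.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square, non-negative matrix A.

    Assumption: self-loops plus symmetric normalization (Kipf and Welling style).
    """
    n = len(adj)
    # Add self-loops so every node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Degree of each node in the self-looped graph (always >= 1 here).
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj_norm, features, weights):
    """One propagation step: ReLU(adj_norm @ features @ weights)."""
    out = matmul(matmul(adj_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in out]


# Hypothetical toy input: a 3-node path graph with 2-dimensional node features.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0, 0.0],
            [0.0, 1.0],
            [1.0, 1.0]]
weights = [[1.0, -1.0],
           [0.5, 2.0]]

adj_norm = normalize_adjacency(adj)
hidden = gcn_layer(adj_norm, features, weights)  # node-level output, one row per node
```

Each row of `hidden` is the updated representation of one node, mixing its own features with those of its neighbours in proportion to the normalized edge weights.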
“…Graph Convolutional Network Hashing (GCNH) [47] is a profound uni-modal hashing method, which performs spectral convolution operations on input data points and affinity graph to generate a binary embedding. Besides, GCN can also be applied to cross-modal hashing models [10,33,38,40]. Since each data pair, which contains two different modality features, is associated with neighboring data pairs, thus the interdependency of the graph structure helps to preserve the intra-modality and inter-modality similarity.…”
Section: Graph Convolutional Network (mentioning)
confidence: 99%