Proceedings of the 2021 International Conference on Multimedia Retrieval
DOI: 10.1145/3460426.3463620

G-CAM: Graph Convolution Network Based Class Activation Mapping for Multi-label Image Recognition

Abstract: In most multi-label image recognition tasks, human visual perception keeps consistent for different spatial transforms of the same image. Existing approaches either learn the perceptual consistency with only image-level supervision or preserve the middle-level feature consistency of attention regions but neglect the (global) label dependencies between different objects over the dataset. To address this issue, we integrate graph convolution network (GCN) and propose G-CAM, which learns visual attention consiste…

Cited by 5 publications (3 citation statements)
References 33 publications
“…Despite saving much of the extra computation required for precise localisation, cropping local regions of the original image and repeating feature extraction and classification is still not intuitively optimal. Wang et al. [13] generated class-aware heatmaps via class activation mapping (CAM) and tested the consistency of visual attention for each input sample under different transforms, ensuring that the network makes its judgments from the features of category-related regions. At a finer granularity, SST [9] utilised a Transformer [24] to capture long-range dependencies among features, which facilitates the aggregation of features from objects of the same category.…”
Section: Related Work
confidence: 99%
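The CAM-based attention-consistency check quoted above can be illustrated with a minimal numeric sketch. The feature map, classifier weights, and flip transform below are hypothetical stand-ins, not the cited paper's actual network: a CAM is the class-weighted sum of feature-map channels, and flip consistency compares the CAM of a flipped input against the flipped CAM of the original.

```python
import numpy as np

def class_activation_map(features, weights, class_idx):
    """CAM for one class: weighted sum of conv channels, normalised to [0, 1].

    features: (C, H, W) convolutional feature maps.
    weights:  (num_classes, C) weights of the final linear classifier.
    """
    cam = np.tensordot(weights[class_idx], features, axes=1)  # -> (H, W)
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()
    return cam

# Attention-consistency check under a horizontal flip: the CAM of the
# flipped input should match the flipped CAM of the original input.
rng = np.random.default_rng(0)
features = rng.random((8, 4, 4))   # toy (C, H, W) feature map
weights = rng.random((3, 8))       # toy classifier for 3 labels

cam = class_activation_map(features, weights, class_idx=1)
cam_of_flipped = class_activation_map(features[:, :, ::-1], weights, 1)
consistency_loss = np.abs(cam_of_flipped - cam[:, ::-1]).mean()
print(consistency_loss)  # 0.0 here, since CAM is linear in the features
```

In this toy setup the loss is exactly zero because the flip is applied to the feature map itself; in a real network the backbone is not flip-equivariant, so the loss is nonzero and serves as the training signal the quoted passage describes.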
“…Hence, another avenue for improvement lies in related research that optimises multi-label classification by identifying category-related regions and establishing connections between category-related features and labels. These methods [9][10][11][12][13][14][15][16][17][18][19] encourage the network to extract semantically consistent discriminative features for distinguishing different categories from category-related local regions, which is crucial in multi-label image classification. However, they tend to neglect the impact of multi-scale variations of targets in multi-label image classification scenarios.…”
Section: Introduction
confidence: 99%
“…They are unable to perform end-to-end training because they update the entire network with a suboptimal multi-step training workflow. The study [20] suggests employing G-CAM to capture the label relationships between diverse image transforms. A graph convolution network (GCN) is used by ML-GCN [11] and AGCN [12] to produce label co-occurrence embeddings for multi-label image classification.…”
Section: Multi-label Image Classification
confidence: 99%
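The label co-occurrence propagation mentioned in the passage above can be sketched with a single graph-convolution layer. The adjacency matrix, label embeddings, and weight matrix below are toy assumptions, not values from any of the cited papers: each label's embedding is updated by mixing in the embeddings of labels it frequently co-occurs with.

```python
import numpy as np

def gcn_layer(adj, node_feats, weight):
    """One graph-convolution layer: H' = ReLU(A_hat @ H @ W),
    where A_hat is the row-normalised adjacency with self-loops added."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)  # row-normalise
    return np.maximum(a_hat @ node_feats @ weight, 0.0)  # ReLU

# Toy label graph: 'person' co-occurs with 'dog' and 'surfboard',
# while 'dog' and 'surfboard' rarely appear together.
adj = np.array([[0., 1., 1.],
                [1., 0., 0.],
                [1., 0., 0.]])
rng = np.random.default_rng(0)
label_embed = rng.random((3, 4))   # initial per-label embeddings
weight = rng.random((4, 4))        # learnable layer weights

out = gcn_layer(adj, label_embed, weight)
print(out.shape)  # (3, 4): one propagated embedding per label
```

In ML-GCN-style models, the propagated label embeddings are then used as per-class classifiers applied to the image features, so that the co-occurrence statistics encoded in the adjacency matrix shape the final multi-label predictions.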