2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
DOI: 10.1109/wacv51458.2022.00271
Inferring the Class Conditional Response Map for Weakly Supervised Semantic Segmentation

Cited by 16 publications (9 citation statements). References: 39 publications.
“…Indeed, WSSS transforms the semantic segmentation task into the much less demanding effort of image-level class annotation. The problem is ill-conditioned and difficult, and a large literature is dedicated to its solution, starting from Zeiler and Fergus (2014) and Zhou et al (2016) up to the most recent contributions (Chang et al, 2020; Sun et al, 2020, 2022; Wang et al, 2020; Wu et al, 2021; Zhang et al, 2021). Semantic segmentation is critical for detecting tip-burn on large canopies because of the difficulty of both identifying it on a dense set of plants and individually localizing each tip-burned plant within the canopy, as shown in Figure 3.…”
Section: Methods
confidence: 99%
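This excerpt points back to the class activation mapping (CAM) line of work from Zhou et al (2016) that underlies the class conditional response maps studied in the cited paper. As a point of reference only, here is a minimal sketch of how a CAM-style response map is typically obtained from image-level labels; the ResNet-50 backbone, layer names, and threshold-to-pseudo-mask step are illustrative assumptions, not the paper's own method.

```python
# Minimal CAM sketch: weight the last conv feature maps by the classifier
# weights of the predicted class, then upsample and normalise.
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

model = resnet50(weights=None).eval()   # stand-in backbone (assumption)

features = {}
def hook(_, __, output):
    features["conv"] = output           # (1, 2048, H/32, W/32) feature map

model.layer4.register_forward_hook(hook)

image = torch.randn(1, 3, 224, 224)     # stand-in for a preprocessed image
with torch.no_grad():
    logits = model(image)
cls = logits.argmax(dim=1).item()       # predicted image-level class

w = model.fc.weight[cls]                                  # (2048,)
cam = torch.einsum("c,bchw->bhw", w, features["conv"])    # class response map
cam = F.relu(cam)
cam = F.interpolate(cam[None], size=image.shape[-2:],
                    mode="bilinear", align_corners=False)[0, 0]
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalise to [0, 1]
# Thresholding `cam` yields the coarse pseudo-mask that WSSS pipelines refine.
```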
“…To address this problem, we propose a linear self-attention that is aware of 2D position. Locality is a widely used assumption in computer vision [10]-[13], [34]-[38]: neighbouring pixels are more likely to belong to the same object than distant pixels. Convolution-based networks are inherently coupled with this assumption, as they always exploit feature locality [39], [40].…”
Section: Self-attention Linearization, Locality
confidence: 99%
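To make the locality assumption in this excerpt concrete, below is a hypothetical sketch that adds a 2D-distance penalty to ordinary softmax attention so that spatially close tokens attend to each other more strongly. This is not the linear self-attention proposed by the citing paper; the function name and the alpha scale are illustrative assumptions.

```python
# Locality-biased attention: penalise attention scores by squared 2D distance.
import torch
import torch.nn.functional as F

def local_biased_attention(q, k, v, height, width, alpha=0.1):
    """q, k, v: (batch, height*width, dim); alpha scales the locality penalty."""
    b, n, d = q.shape
    assert n == height * width

    # Pairwise squared Euclidean distance between token positions on the 2D grid.
    ys, xs = torch.meshgrid(torch.arange(height), torch.arange(width), indexing="ij")
    pos = torch.stack([ys.flatten(), xs.flatten()], dim=-1).float()   # (n, 2)
    dist2 = torch.cdist(pos, pos).pow(2)                              # (n, n)

    scores = q @ k.transpose(-2, -1) / d ** 0.5                       # (b, n, n)
    scores = scores - alpha * dist2                                   # favour neighbours
    attn = F.softmax(scores, dim=-1)
    return attn @ v

# Toy usage: an 8x8 grid of 64 tokens with 32-dim features.
x = torch.randn(2, 64, 32)
out = local_biased_attention(x, x, x, height=8, width=8)
print(out.shape)  # torch.Size([2, 64, 32])
```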
“…Vision features are usually continuous in the temporal [11, 43, 101, 59] and spatial [73, 62, 70, 18] neighbourhood. Analogously, the prediction confidence increases if all the tokens within a local region can reach a consensus on feature importance.…”
Section: Neighborhood Association
confidence: 99%
“…Meanwhile, salient feature activations are usually locally accumulative [11, 73]. In continuous video, critical information contained in the target token can be shared by its spatial and/or temporal neighbour tokens.…”
Section: Introduction
confidence: 99%
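The two excerpts above express the same intuition: a token's importance should be consistent with its spatial and temporal neighbours. A minimal sketch of that neighbourhood-consensus smoothing is given below, assuming a simple average over a fixed spatio-temporal window; the window size and pooling choice are assumptions, not the citing papers' method.

```python
# Smooth per-token importance over a local spatio-temporal window so that
# isolated spikes are damped and locally consistent activations are reinforced.
import torch
import torch.nn.functional as F

def neighbourhood_consensus(importance, window=3):
    """importance: (batch, T, H, W) per-token importance scores."""
    x = importance[:, None]                               # (b, 1, T, H, W)
    x = F.avg_pool3d(x, kernel_size=window, stride=1,
                     padding=window // 2)                 # same-size local mean
    return x[:, 0]

scores = torch.rand(2, 8, 14, 14)      # toy per-token importance for 8 frames
smoothed = neighbourhood_consensus(scores)
print(smoothed.shape)                  # torch.Size([2, 8, 14, 14])
```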