Unlocking the Potential of Ordinary Classifier: Class-specific Adversarial Erasing Framework for Weakly Supervised Semantic Segmentation

Kweon, Hyeokjun; Yoon, Sung-Hoon; Kim, Hyeonseong; Park, Daehee; Yoon, Kuk-Jin

doi:10.1109/iccv48922.2021.00691

Cited by 107 publications

(34 citation statements)

References 27 publications

“…Affinity-based methods [2,17] train a network to learn the pixel-level affinity and apply random walk as post-processing. By guiding the network to keep searching for objects even after the most discriminative region is erased from the image, Adversarial Erasing (AE) methods [19,25,31,50,62] enlarge the CAMs to less-discriminative regions. Various techniques such as multi-dilated convolution [51], stochastic feature selection [27], integration at multiple phases [21], and context decoupling augmentation [41] are designed to make the network generate better CAMs.…”

Section: Related Work | mentioning

confidence: 99%
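
The erasing step that the statement above attributes to AE methods can be illustrated with a minimal NumPy sketch. The function name `erase_discriminative_region`, the 0.6 threshold, and the zero fill value are illustrative assumptions, not details taken from any of the cited papers.

```python
import numpy as np

def erase_discriminative_region(image, cam, threshold=0.6, fill_value=0.0):
    """Hypothetical helper: hide the pixels where a CAM is most confident.

    image: (H, W, C) float array.
    cam:   (H, W) non-negative class activation map for one class.
    Returns the erased image and the boolean mask of erased pixels.
    """
    # Normalise the CAM to [0, 1] so the threshold is scale-independent.
    cam_norm = cam / (cam.max() + 1e-8)
    # Pixels above the threshold are treated as the most discriminative region.
    mask = cam_norm >= threshold
    erased = image.copy()
    erased[mask] = fill_value
    return erased, mask

# Toy input: a 4x4 image whose CAM fires only on the central 2x2 block.
image = np.ones((4, 4, 3), dtype=np.float32)
cam = np.zeros((4, 4), dtype=np.float32)
cam[1:3, 1:3] = 1.0
erased, mask = erase_discriminative_region(image, cam)
```

Feeding `erased` back to the classifier forces it to find evidence outside the masked block, which is the mechanism by which AE methods are said to enlarge CAMs.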

“…3.2, for simplicity, we denote msinf-CAMs of SupportNet and X_m as A, X, respectively. Class Region Masks: Motivated by the observation that the CAMs have enough capability to localize the regions of each class even at the early stage of the training process [21,25], we obtain a regional self-supervision using the CAMs from the SupportNet during the training. We regard the CAM as a pixel-wise score map for being classified to that class.…”

Section: Regional Contrastive Module (RCM) | mentioning

confidence: 99%

“…Therefore, by using the multi-scale inference technique, existing segmentation networks have been able to improve their performance during inference. As the multi-scale inference approach can also relieve large variations in CAMs from multi-scale images, it has also been well-utilized in the field of WSSS [2,3,25,28,49,59] to generate pseudo-labels. CAMs obtained from an image of a specific resolution contain meaningful information that is difficult to be obtained at the other resolutions, so the performance is greatly improved through the multi-scale inference technique.…”

Section: Multi-scale Attentive Module (MAM) | mentioning

confidence: 99%
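
As a rough illustration of the multi-scale inference described in the statement above, the sketch below runs a CAM function on several rescaled copies of an image, resizes each result back to the original resolution, and averages them. The names `resize_nearest`, `multi_scale_cam`, and `cam_fn`, the scale set, nearest-neighbour resizing, and aggregation by averaging are all assumptions made for the example; the cited papers may aggregate differently.

```python
import numpy as np

def resize_nearest(arr, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, ...) array using index sampling."""
    h, w = arr.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return arr[rows][:, cols]

def multi_scale_cam(image, cam_fn, scales=(0.5, 1.0, 2.0)):
    """Average CAMs computed at several input scales (hypothetical helper).

    image:  (H, W, C) array.
    cam_fn: callable mapping an (h, w, C) image to an (h, w) activation map;
            it stands in for a trained classifier's CAM head.
    """
    h, w = image.shape[:2]
    total = np.zeros((h, w), dtype=np.float64)
    for s in scales:
        sh = max(1, int(round(h * s)))
        sw = max(1, int(round(w * s)))
        cam = cam_fn(resize_nearest(image, sh, sw))
        # Bring every CAM back to the original resolution before fusing.
        total += resize_nearest(cam, h, w)
    total /= len(scales)
    return total / (total.max() + 1e-8)

# Toy input: the left half of the image is bright, and the stand-in
# "classifier" simply uses the channel mean as its activation map.
image = np.zeros((4, 4, 3), dtype=np.float64)
image[:, :2] = 1.0
ms_cam = multi_scale_cam(image, lambda img: img.mean(axis=-1))
```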

“…To address this issue, we use the latest non-empty prototype for each class. For the segmentation network, we use Deeplab [4] with ResNet38 backbone as in [2,25,34,43,60].…”

Section: Implementation Details | mentioning

confidence: 99%

“…As with many other previous weakly supervised semantic segmentation (WSSS) approaches [2,3,25,40,49,59], we employ the ResNet38 [53] as the backbone for both the MainNet and the SupportNet. To extract feature maps (X) for class-wise prototypes and pixel-wise contrastive learning, we add an intermediate 1×1 convolution layer.…”

Section: Network Architecture | mentioning

confidence: 99%

Exploring Pixel-level Self-supervision for Weakly Supervised Semantic Segmentation

Yoon, Kweon, Jeong et al. 2021

Preprint

Self Cite

Existing studies in weakly supervised semantic segmentation (WSSS) have utilized class activation maps (CAMs) to localize the class objects. However, since a classification loss is insufficient for providing precise object regions, CAMs tend to be biased towards discriminative patterns (i.e., sparseness) and do not provide precise object boundary information (i.e., impreciseness). To resolve these limitations, we propose a novel framework (composed of MainNet and SupportNet) that derives pixel-level self-supervision from given image-level supervision. In our framework, with the help of the proposed Regional Contrastive Module (RCM) and Multi-scale Attentive Module (MAM), MainNet is trained by self-supervision from the SupportNet. The RCM extracts two forms of self-supervision from SupportNet: (1) class region masks generated from the CAMs and (2) class-wise prototypes obtained from the features according to the class region masks. Then, every pixel-wise feature of the MainNet is trained by the prototype in a contrastive manner, sharpening the resulting CAMs. The MAM utilizes CAMs inferred at multiple scales from the SupportNet as self-supervision to guide the MainNet. Based on the dissimilarity between the multi-scale CAMs from MainNet and SupportNet, CAMs from the MainNet are trained to expand to the less-discriminative regions. The proposed method shows state-of-the-art WSSS performance on both the train and validation sets of the PASCAL VOC 2012 dataset. For reproducibility, code will be available publicly soon.
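
The two forms of self-supervision that the abstract attributes to the RCM can be sketched in NumPy under some loud assumptions: here a pixel joins a class region when that class has the highest CAM score and the score passes a threshold of 0.3, and a prototype is the mean feature over the region (masked average pooling). The function names, the threshold, and the pooling choice are hypothetical and are not taken from the paper.

```python
import numpy as np

def class_region_masks(cams, threshold=0.3):
    """cams: (K, H, W) scores in [0, 1]. Returns (K, H, W) boolean masks."""
    top = cams.argmax(axis=0)                  # winning class per pixel
    confident = cams.max(axis=0) >= threshold  # drop low-score pixels
    class_ids = np.arange(cams.shape[0])[:, None, None]
    return (top[None] == class_ids) & confident[None]

def class_prototypes(features, masks):
    """Masked average pooling (an assumed choice for this sketch).

    features: (D, H, W) feature map; masks: (K, H, W) boolean masks.
    Returns (K, D) prototypes and a (K,) flag marking non-empty regions.
    """
    m = masks.astype(features.dtype)
    counts = m.sum(axis=(1, 2))
    sums = np.einsum('khw,dhw->kd', m, features)
    # Empty regions get a zero prototype and are flagged as invalid.
    protos = sums / np.maximum(counts, 1)[:, None]
    return protos, counts > 0

# Toy input: two classes on a 2x2 map with 2-dimensional features.
cams = np.array([[[0.9, 0.8], [0.1, 0.1]],
                 [[0.1, 0.1], [0.7, 0.2]]])
features = np.array([[[1.0, 3.0], [5.0, 7.0]],
                     [[2.0, 2.0], [4.0, 4.0]]])
masks = class_region_masks(cams)
protos, valid = class_prototypes(features, masks)
```

In a contrastive setup, each pixel feature would then be pulled towards the prototype of its own region and pushed away from the others; the exact loss is not given in the abstract, so it is left out here.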

Light Annotation Fine Segmentation: Histology Image Segmentation Based on VGG Fusion with Global Normalisation CAM

Wang, Liu et al. 2022

Computational Mathematics Modeling in Cancer Analysis

Deep learning has been widely used to segment tumour regions in stained histopathology images. However, precise annotations are expensive and labour-consuming. To reduce the manual annotation workload, we propose a light annotation-based fine-level segmentation approach for histology images based on a VGG-based Fusion network with Global Normalisation CAM. The experts are only required to provide a rough segmentation annotation on the images, and then accurate fine-level segmentation boundaries can be produced using this method. To validate the proposed approach, three histopathology datasets with rough and fine quality segmentation annotation are built. The fine quality labels are used only as ground truth in evaluation. The VFGN-CAM method includes three main components: an annotation enhancement to boost boundary accuracy and model generalisability; a VGG Fusion module that integrates multi-scale tumour features; and a Global Normalisation CAM module that combines local and global gradient information of tumour regions. Our VGG fusion and Global Normalisation CAM outperform the existing methods with a Dice of 84.188%. The final improvement for our proposed methods against the original rough labels is around 22.8%. The codes are released at:xxx.

Weakly Supervised Semantic Segmentation of Echocardiography Videos via Multi-level Features Selection

Chen, Cai, Lai 2022

Pattern Recognition and Computer Vision

Weakly supervised semantic segmentation (WSSS) models relying on class activation maps (CAMs) have achieved desirable performance compared to their non-CAM-based counterparts. However, to make the WSSS task feasible, we need to generate pseudo labels by expanding the seeds from CAMs, which is complex and time-consuming, thus hindering the design of efficient end-to-end (single-stage) WSSS approaches. To tackle the above dilemma, we resort to the off-the-shelf and readily accessible saliency maps for directly obtaining pseudo labels given the image-level class labels. Nevertheless, the salient regions may contain noisy labels and cannot seamlessly fit the target objects, and saliency maps can only be approximated as pseudo labels for simple images containing single-class objects. As such, the segmentation model obtained from these simple images cannot generalize well to complex images containing multi-class objects. To this end, we propose an end-to-end multi-granularity denoising and bidirectional alignment (MDBA) model to alleviate the noisy label and multi-class generalization issues. Specifically, we propose the online noise filtering and progressive noise detection modules to tackle image-level and pixel-level noise, respectively. Moreover, a bidirectional alignment mechanism is proposed to reduce the data distribution gap in both input and output space with simple-to-complex image synthesis and complex-to-simple adversarial learning. MDBA can reach an mIoU of 69.5% and 70.2% on the validation and test sets of the PASCAL VOC 2012 dataset. The source codes and models have been made available at https://github.com/
