Attention-Based Multi-Context Guiding for Few-Shot Semantic Segmentation

Hu, Tao; Yang, Pengwan; Zhang, Chiliang; Yu, Gang; Mu, Yadong; Snoek, Cees G. M.

doi:10.1609/aaai.v33i01.33018441

Cited by 141 publications

(96 citation statements)

References 13 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Segmentation masks with dashed border denote ground truth annotations. classification [25,23,24,18,6,20,12,14] and a few targeting at segmentation tasks [21,17,4,28,4,8].…”

Section: Introductionmentioning

confidence: 99%

PANet: Few-Shot Image Semantic Segmentation With Prototype Alignment

Wang

Liew²,

Zou

et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

1,030

724

View full text Add to dashboard Cite

Despite the great progress made by deep CNNs in image semantic segmentation, they typically require a large number of densely-annotated images for training and are difficult to generalize to unseen object categories. Few-shot segmentation has thus been developed to learn to perform segmentation from only a few annotated examples. In this paper, we tackle the challenging few-shot segmentation problem from a metric learning perspective and present PANet, a novel prototype alignment network to better utilize the information of the support set. Our PANet learns classspecific prototype representations from a few support images within an embedding space and then performs segmentation over the query images through matching each pixel to the learned prototypes. With non-parametric metric learning, PANet offers high-quality prototypes that are representative for each semantic class and meanwhile discriminative for different classes. Moreover, PANet introduces a prototype alignment regularization between support and query. With this, PANet fully exploits knowledge from the support and provides better generalization on few-shot segmentation. Significantly, our model achieves the mIoU score of 48.1% and 55.7% on PASCAL-5 i for 1-shot and 5-shot settings respectively, surpassing the state-of-the-art method by 1.8% and 8.6%.

show abstract

“…Segmentation masks with dashed border denote ground truth annotations. classification [25,23,24,18,6,20,12,14] and a few targeting at segmentation tasks [21,17,4,28,4,8].…”

Section: Introductionmentioning

confidence: 99%

PANet: Few-Shot Image Semantic Segmentation With Prototype Alignment

Wang

Liew²,

Zou

et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

1,030

724

View full text Add to dashboard Cite

show abstract

“…Models employing memory will allow artificial neural networks to retain previously learned features while incorporating new information about image transformations from subsequent training sets. Biologically inspired machine vision techniques are already playing an important role in the development of robust visual representations for challenging tasks such as few-shot learning [204]. As these approaches continue to evolve, an interesting prospect will be their integration into CNN models to facilitate general transformation-invariant visual understanding.…”

Section: Emerging Trends and Future Research Directionsmentioning

confidence: 99%

CNN Architectures for Geometric Transformation-Invariant Feature Representation in Computer Vision: A Review

Mumuni

2021

SN COMPUT. SCI.

View full text Add to dashboard Cite

One of the main challenges in machine vision relates to the problem of obtaining robust representation of visual features that remain unaffected by geometric transformations. This challenge arises naturally in many practical machine vision tasks. For example, in mobile robot applications like simultaneous localization and mapping (SLAM) and visual tracking, object shapes change depending on their orientation in the 3D world, camera proximity, viewpoint, or perspective. In addition, natural phenomena such as occlusion, deformation, and clutter can cause geometric appearance changes of the underlying objects, leading to geometric transformations of the resulting images. Recently, deep learning techniques have proven very successful in visual recognition tasks but they typically perform poorly with small data or when deployed in environments that deviate from training conditions. While convolutional neural networks (CNNs) have inherent representation power that provides a high degree of invariance to geometric image transformations, they are unable to satisfactorily handle nontrivial transformations. In view of this limitation, several techniques have been devised to extend CNNs to handle these situations. This article reviews some of the most promising approaches to extend CNN architectures to handle nontrivial geometric transformations. Key strengths and weaknesses, as well as the application domains of the various approaches are also highlighted. The review shows that although an adequate model for generalized geometric transformations has not yet been formulated, several techniques exist for solving specific problems. Using these methods, it is possible to develop task-oriented solutions to deal with nontrivial transformations.

show abstract

“…Additionally, annotating pixel-level labels is often costly in terms of both human efforts and finance, making the current state of the art segmentation models be not suitable for addressing generalized few-shot semantic segmentation problems. Few-shot object segmentation [15,29,35] has received much attention recently due to its advantages in learning novel categories [8,9] without much annotations. Most previous approaches [30,42] follow the metric-based few-shot learning scheme and make great efforts on developing robust feature embedding to measure the pixel-wise similarity between the object from the support image and the query one.…”

Section: Related Workmentioning

confidence: 99%

Meta Parsing Networks: Towards Generalized Few-shot Scene Parsing with Adaptive Metric Learning

Wei

Yang

2020

Proceedings of the 28th ACM International Conference on Multimedia

View full text Add to dashboard Cite

Recent progress in few-shot segmentation usually aims at performing novel object segmentation using a few annotated examples as guidance. In this work, we advance this few-shot segmentation paradigm towards a more challenging yet general scenario, i.e., Generalized Few-shot Scene Parsing (GFSP). In this task, we take a fully annotated image as guidance to segment all pixels in a query image. Our mission is to study a generalizable and robust segmentation network from the meta-learning perspective so that both seen and unseen categories can be correctly recognized. Different from previous practices, this task performs segmentation on a joint label space consisting of both previously seen and novel categories. Moreover, pixels from these multiple categories need to be simultaneously taken into account, which is actually not well explored before. Accordingly, we present Meta Parsing Networks (MPNet) to better exploit the guidance information in the support set. Our MPNet contains two basic modules, i.e., the Adaptive Deep Metric Learning (ADML) module and the Contrastive Inter-class Distraction (CID) module. Specially, the ADML takes the annotated pixels from the support image as the guidance and adaptively produces high-quality prototypes for learning a deep comparison metric. In addition, MPNet further introduces the CID module learning to enlarge the feature discrepancy of different categories in the embedding space, leading the MPNet to generate more discriminative feature embeddings. We conduct experiments on two newly constructed benchmarks, i.e., GFSP-Cityscapes and GFSP-Pascal-Context. Extensive ablation studies well demonstrate the effectiveness and generalization ability of our MPNet. CCS CONCEPTS • Computing methodologies → Computer vision.

show abstract

Attention-Based Multi-Context Guiding for Few-Shot Semantic Segmentation

Cited by 141 publications

References 13 publications

PANet: Few-Shot Image Semantic Segmentation With Prototype Alignment

PANet: Few-Shot Image Semantic Segmentation With Prototype Alignment

CNN Architectures for Geometric Transformation-Invariant Feature Representation in Computer Vision: A Review

Meta Parsing Networks: Towards Generalized Few-shot Scene Parsing with Adaptive Metric Learning

Contact Info

Product

Resources

About