MCEENet: Multi-Scale Context Enhancement and Edge-Assisted Network for Few-Shot Semantic Segmentation

Zhang, Hongjie; Zhang, Rufei; He, Xiaoyu; Li, Nannan; Wang, Yong; Shen, Sheng

doi:10.3390/s23062922

Cited by 8 publications

(2 citation statements)

References 53 publications


“…Referring video object segmentation (R-VOS). R-VOS introduces the language expression for target object tracking and segmentation, following the trend of vision-language learning (Zhang et al. 2022; Zhang et al. 2023b; Zhu et al. 2023; Fang et al. 2023). Existing R-VOS methods can be broadly classified into three categories.…”

Section: Related Work (mentioning)

confidence: 99%

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation

Yan, Zhang, Guo et al. 2024

AAAI


Recently, video object segmentation (VOS) referred by multi-modal signals, e.g., language and audio, has evoked increasing attention in both industry and academia. It is challenging for exploring the semantic alignment within modalities and the visual correspondence across frames. However, existing methods adopt separate network architectures for different modalities, and neglect the inter-frame temporal interaction with references. In this paper, we propose MUTR, a Multi-modal Unified Temporal transformer for Referring video object segmentation. With a unified framework for the first time, MUTR adopts a DETR-style transformer and is capable of segmenting video objects designated by either text or audio reference. Specifically, we introduce two strategies to fully explore the temporal relations between videos and multi-modal signals. Firstly, for low-level temporal aggregation before the transformer, we enable the multi-modal references to capture multi-scale visual cues from consecutive video frames. This effectively endows the text or audio signals with temporal knowledge and boosts the semantic alignment between modalities. Secondly, for high-level temporal interaction after the transformer, we conduct inter-frame feature communication for different object embeddings, contributing to better object-wise correspondence for tracking along the video. On Ref-YouTube-VOS and AVSBench datasets with respective text and audio references, MUTR achieves +4.2% and +8.7% J&F improvements to state-of-the-art methods, demonstrating our significance for unified multi-modal VOS. Code is released at https://github.com/OpenGVLab/MUTR.


“…Explainability efforts in few-shot image segmentation, particularly in the context of remote sensing aerial images, have focused on the development of novel models and techniques that enhance the performance of segmentation tasks and provide insights into the decision-making process of the models. One such advancement is the Self-Enhanced Mixed Attention Network (SEMANet) proposed by Song et al. (2023). SEMANet utilizes three-modal (Visible-Depth-Thermal) images for few-shot semantic segmentation tasks.…”

Section: Few-shot Image Segmentation In Remote Sensing (mentioning)

confidence: 99%

Unlocking the capabilities of explainable few-shot learning in remote sensing

Lee, Dam, Ferdaus et al. 2024

Artif Intell Rev


Recent advancements have significantly improved the efficiency and effectiveness of deep learning methods for image-based remote sensing tasks. However, the requirement for large amounts of labeled data can limit the applicability of deep neural networks to existing remote sensing datasets. To overcome this challenge, few-shot learning has emerged as a valuable approach for enabling learning with limited data. While previous research has evaluated the effectiveness of few-shot learning methods on satellite-based datasets, little attention has been paid to exploring the applications of these methods to datasets obtained from Unmanned Aerial Vehicles (UAVs), which are increasingly used in remote sensing studies. In this review, we provide an up-to-date overview of both existing and newly proposed few-shot classification techniques, along with appropriate datasets that are used for both satellite-based and UAV-based data. We demonstrate few-shot learning can effectively handle the diverse perspectives in remote sensing data. As an example application, we evaluate state-of-the-art approaches on a UAV disaster scene dataset, yielding promising results. Furthermore, we highlight the significance of incorporating explainable AI (XAI) techniques into few-shot models. In remote sensing, where decisions based on model predictions can have significant consequences, such as in natural disaster response or environmental monitoring, the transparency provided by XAI is crucial. Techniques like attention maps and prototype analysis can help clarify the decision-making processes of these complex models, enhancing their reliability. We identify key challenges including developing flexible few-shot methods to handle diverse remote sensing data effectively. 
This review aims to equip researchers with an improved understanding of few-shot learning’s capabilities and limitations in remote sensing, while pointing out open issues to guide progress in efficient, reliable and interpretable data-efficient techniques.


Self-supervised Few-Shot Learning for Semantic Segmentation: An Annotation-Free Approach

Karimijafarbigloo, Azad, Merhof 2023

Lecture Notes in Computer Science

