Interactive Video Object Segmentation Using Global and Local Transfer Modules

Heo, Yuk; Koh, Yeong Jun

doi:10.1007/978-3-030-58520-4_18

Cited by 25 publications

(58 citation statements)

References 46 publications

“…Caelles et al [4] introduced a round-based interactive VOS process and the automatic simulation algorithm to mimic human interactions in real applications. Many recent interactive algorithms [7,20,22,24] follow this round-based process.…”

Section: Semi-supervised VOS (mentioning)

confidence: 99%

“…They employed global and local distance maps in [31] to match a target frame to an annotated frame and the previous frame, respectively. Heo et al [7] designed global and local transfer modules to effectively transfer features in annotated and previous frames to a target frame. Oh et al [24] encoded annotation regions into keys and values in a non-local manner.…”

Section: Semi-supervised VOS (mentioning)

confidence: 99%

“…In contrast, a real user should spend considerable time to inspect the results and select poorly segmented regions. Since conventional interactive VOS algorithms [7,20,22,24] have been developed based on the simulation in [4], they do not consider the time for finding unsatisfactory results in practice.…”

Section: Introduction (mentioning)

confidence: 99%

“…Moreover, although interactive VOS can use the information in N annotated frames in the N th round, the conventional algorithms [7,20] do not exploit those multiple annotated frames thoroughly. Heo et al [7] simply average the features from multiple annotated frames. Miao et al [20] use only the best matching result between a target frame and multiple annotated frames.…”

Section: Introduction (mentioning)

confidence: 99%
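The two fusion strategies contrasted in the statement above (averaging features over all annotated frames versus keeping only the best match) can be illustrated with a toy sketch. This is not the cited authors' code: the function names, the use of plain feature vectors, and cosine similarity as the matching score are all assumptions made for illustration.

```python
import math


def average_fusion(frame_features):
    """Element-wise mean of the feature vectors from all annotated frames."""
    n = len(frame_features)
    return [sum(vals) / n for vals in zip(*frame_features)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match_fusion(target, frame_features):
    """Keep only the annotated-frame feature most similar to the target."""
    return max(frame_features, key=lambda f: cosine(target, f))


# Hypothetical 2-D features for three annotated frames and one target frame.
annotated = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target = [0.9, 0.1]

avg = average_fusion(annotated)             # every frame contributes equally
best = best_match_fusion(target, annotated)  # only the closest frame is used
```

In this toy setting, averaging weights every annotated frame equally regardless of how relevant it is to the target, while best-match selection discards all but one frame. The citing paper's criticism is that neither strategy exploits multiple annotated frames thoroughly.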

Guided Interactive Video Object Segmentation Using Reliability-Based Attention Maps

Heo, Koh

2021

Preprint

Self Cite

We propose a novel guided interactive segmentation (GIS) algorithm for video objects to improve the segmentation accuracy and reduce the interaction time. First, we design the reliability-based attention module to analyze the reliability of multiple annotated frames. Second, we develop the intersection-aware propagation module to propagate segmentation results to neighboring frames. Third, we introduce the GIS mechanism for a user to select unsatisfactory frames quickly with less effort. Experimental results demonstrate that the proposed algorithm provides more accurate segmentation results at a faster speed than conventional algorithms. Codes are available at https://github.com/yuk6heo/GIS-RAmap.

“…Video object segmentation (VOS) [16,41,115,118] is a fundamental technique to address this issue, whose purpose is to delineate pixel-level moving object masks in each frame. Besides video analysis, many other applications have also benefited from VOS, such as robotic manipulation [1], autonomous cars [70], video editing [43], action segmentation [103], optical flow estimation [24], medical diagnosis [45], interactive segmentation [14,19,37,72,131], URVOS [87], and video captioning [77].…”

Section: Introduction (mentioning)

confidence: 99%

Full-Duplex Strategy for Video Object Segmentation

et al. 2021

Preprint

Previous video object segmentation approaches mainly focus on using simplex solutions between appearance and motion, limiting feature collaboration efficiency among and across these two cues. In this work, we study a novel and efficient full-duplex strategy network (FSNet) to address this issue, by considering a better mutual restraint scheme between motion and appearance in exploiting the cross-modal features from the fusion and decoding stage. Specifically, we introduce the relational cross-attention module (RCAM) to achieve bidirectional message propagation across embedding sub-spaces. To improve the model's robustness and update the inconsistent features from the spatial-temporal embeddings, we adopt the bidirectional purification module (BPM) after the RCAM. Extensive experiments on five popular benchmarks show that our FSNet is robust to various challenging scenarios (e.g., motion blur, occlusion) and achieves favourable performance against existing cutting-edges both in the video object segmentation and video salient object detection tasks. The project is publicly available at: https://dpfan.net/FSNet.
