2020
DOI: 10.48550/arxiv.2010.07958
Preprint

Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement

Abstract: We propose a new matching-based framework for semi-supervised video object segmentation (VOS). Recently, state-of-the-art VOS performance has been achieved by matching-based algorithms, in which feature banks are created to store features for region matching and classification. However, how to effectively organize information in the continuously growing feature bank remains under-explored, and this leads to an inefficient design of the bank. We introduce an adaptive feature bank update scheme to dynamically ab…

Citations: cited by 3 publications (10 citation statements)
References: 33 publications (50 reference statements)
“…Matching-based methods. Recently, state-of-the-art performance has been achieved by matching-based methods [9,32,43,25,22,18,29], which perform feature matching to learn target object appearances offline. FEELVOS [32] and CFBI [43] perform the nearest neighbor matching between the current frame and the first and previous frames in the feature space.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Even though, TransVOS can still learn long-term dependency. Just like most STM-based methods [25,16,22,18,29], we synthesis video clips by applying data augmentations (random affine, color, flip, resize and crop) on a static image of datasets [4,20,15,7]. Then we use the synthetic videos to pretrain our model.…”
Section: Training and Inference
Citation type: mentioning
confidence: 99%
“…Semi-supervised video object segmentation. Following the taxonomy proposed by [19], recent VOS methods can be categorized into implicit and explicit according to the approach followed to address the problem.…”
Section: Introduction
Citation type: mentioning
confidence: 99%