2020
DOI: 10.1007/978-3-030-58536-5_46

Learning What to Learn for Video Object Segmentation

Abstract: Video object segmentation (VOS) is a highly challenging problem, since the target object is only defined during inference with a given first-frame reference mask. The problem of how to capture and utilize this limited target information remains a fundamental research question. We address this by introducing an end-to-end trainable VOS architecture that integrates a differentiable few-shot learning module. This internal learner is designed to predict a powerful parametric model of the target by minimizing a seg…
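The following is a minimal sketch, not the authors' code, of the mechanism the abstract describes: an internal, differentiable few-shot learner that predicts the parameters of a small target model by minimizing a segmentation loss on the reference frame. Because the inner updates are unrolled with differentiable operations, the outer training loss can be backpropagated through the learner into the feature extractor, which is what makes the architecture end-to-end trainable. All names, shapes, and the choice of a 1x1-convolution target model with plain gradient descent are illustrative assumptions.

```python
# Hypothetical sketch of a differentiable inner learner for VOS.
import torch
import torch.nn.functional as F


def learn_target_model(feats, ref_mask, inner_steps=5, lr=1e-1):
    """Unrolled gradient descent on a per-sequence 1x1-conv target model.

    feats:    (B, C, H, W) backbone features of the reference frame
    ref_mask: (B, 1, H, W) first-frame reference mask (float in [0, 1])
    Returns the learned 1x1-conv weights. create_graph=True keeps the
    updates differentiable, so gradients reach `feats` during offline training.
    """
    B, C, H, W = feats.shape
    w = torch.zeros(B, 1, C, 1, 1, requires_grad=True, device=feats.device)
    for _ in range(inner_steps):
        # Apply the target model: (B, 1, H, W) segmentation logits.
        pred = (feats.unsqueeze(1) * w).sum(dim=2)
        loss = F.binary_cross_entropy_with_logits(pred, ref_mask)
        (g,) = torch.autograd.grad(loss, w, create_graph=True)
        w = w - lr * g
    return w


def segment(test_feats, w):
    """Apply the learned target model to features of a new frame."""
    return torch.sigmoid((test_feats.unsqueeze(1) * w).sum(dim=2))
```

In this reading, the outer segmentation loss on later frames is backpropagated through the unrolled inner updates, which is also the property the citation below attributes to LWL (backpropagating through the target model trainer).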

Cited by 118 publications (110 citation statements)
References 36 publications
“…This extends paper D (and paper [2]) to improve the robustness of the correlation-filter model, by additionally modeling distracting objects. The author initiated the project, contributed to the idea development, implementation, experiment design and execution and writing.…”
Section: Contributions (mentioning)
confidence: 62%
“…At 76 percent, our method is approximately 7 percentage points below STM, but is significantly faster at 22 frames per second. Paper D was subsequently extended into the learning-what-to-learn (LWL) method [2] for video object segmentation. The main feature of LWL is that, unlike the method in paper D, it can backpropagate through the target model trainer.…”
Section: Performance Evaluation (mentioning)
confidence: 99%
“…Typically, these layers comprise simple operators like convolutional filters, standard linear operators, point-wise non-linear activation functions, or self-attention modules [110], but they compose a highly descriptive and powerful function. We further note that the layers can also include more complex processes and even optimization algorithms [41,5,6].…”
Section: Representations and Learning (mentioning)
confidence: 99%
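The last excerpt notes that a network layer can itself be an optimization algorithm. A minimal sketch of that idea, under illustrative assumptions (a ridge-regression solver used as a differentiable layer; all names and shapes are hypothetical), is:

```python
# Hypothetical sketch of an "optimization algorithm as a layer":
# a closed-form ridge-regression solve built from differentiable ops.
import torch


def ridge_regression_layer(X, y, lam=1e-2):
    """Solve argmin_w ||Xw - y||^2 + lam * ||w||^2 inside the forward pass.

    X: (N, D) features, y: (N, 1) targets. Returns w: (D, 1).
    Gradients flow through torch.linalg.solve back into X and y.
    """
    D = X.shape[1]
    A = X.T @ X + lam * torch.eye(D, device=X.device, dtype=X.dtype)
    b = X.T @ y
    return torch.linalg.solve(A, b)


# Usage: the solver output can feed later layers, and backprop reaches X and y.
X = torch.randn(64, 16, requires_grad=True)
y = torch.randn(64, 1)
w = ridge_regression_layer(X, y)
loss = w.pow(2).sum()
loss.backward()
print(X.grad.shape)  # torch.Size([64, 16]): gradients passed through the inner solver
```

An unrolled iterative optimizer (as in the sketch after the abstract) plays the same role; the closed-form variant simply makes the "layer is an optimizer" point with a single solve.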