YouTube-VOS: Sequence-to-Sequence Video Object Segmentation

Xu, Ning; Yang, Linjie; Yi, Fan; Yang, Jianchao; Yue, Dingcheng; Liang, Yuchen; Price, Brian; Cohen, Scott; Huang, Thomas S.

doi:10.1007/978-3-030-01228-1_36

Cited by 512 publications

(533 citation statements)

References 43 publications

Supporting

Mentioning

532

Contrasting


“…Datasets We evaluate our method on two video object segmentation datasets: Youtube-VOS [37] and DAVIS-2017 [25]. Training The network is trained using the objective function described in 3.4.…”

Section: Methods (mentioning)

confidence: 99%

“…Semi-supervised video object segmentation: Earlier works in video object segmentation used hand-crafted features based on appearance, boundary and optical flow [1,9,15,27,23]. The availability of large-scale video object segmentation datasets [25,37] enabled us to explore deep learning methods for this problem. Most of the early works are mainly motivated by the image segmentation methods [3,35,20].…”

Section: Related Work (mentioning)

confidence: 99%

“…This is a challenging problem because of issues like occlusion, changes in object appearance over time, motion blur, fast motions, and scale variations of different objects. Deep learning approaches have achieved impressive results and the recent release of the Youtube-VOS dataset [37] has allowed for the training and evaluation of new methods on a wider variety of videos and objects.…”

Section: Introduction (mentioning)

confidence: 99%

“…The majority of current approaches can be divided into two categories. The first are detection-based methods [2,4,14] that learn representations of the object segmented in the first frame and attempt to perform the pixel-wise detection of this object in future frames; the second is propagation-based methods [7,12,28,33,36] that formulate the task as a tracking problem and attempt to propagate the mask to fit the object over time. The first set of methods tends to segment single frames independently and rarely employ temporal information, while the later set segments single frames sequentially and makes use of temporal information, usually in the form of optical flow or RNNs.…”

Section: Introduction (mentioning)

confidence: 99%


CapsuleVOS: Semi-Supervised Video Object Segmentation Using Capsule Routing

Duarte

Rawat

Shah

2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)


In this work we propose a capsule-based approach for semi-supervised video object segmentation. Current video object segmentation methods are frame-based and often require optical flow to capture temporal consistency across frames which can be difficult to compute. To this end, we propose a video based capsule network, CapsuleVOS, which can segment several frames at once conditioned on a reference frame and segmentation mask. This conditioning is performed through a novel routing algorithm for attention-based efficient capsule selection. We address two challenging issues in video object segmentation: 1) segmentation of small objects and 2) occlusion of objects across time. The issue of segmenting small objects is addressed with a zooming module which allows the network to process small spatial regions of the video. Apart from this, the framework utilizes a novel memory module based on recurrent networks which helps in tracking objects when they move out of frame or are occluded. The network is trained end-to-end and we demonstrate its effectiveness on two benchmark video object segmentation datasets; it outperforms current offline approaches on the Youtube-VOS dataset while having a run-time that is almost twice as fast as competing methods. The code is publicly available at https://github.com/KevinDuarte/CapsuleVOS.


“…Video object and instance segmentation problems have received significant attention [1,35,25] attributed to the recent availability of high-quality datasets, e.g., YouTube-VOS [45], DAVIS [29,30]. Given an input video, the aim is to separate the objects or instances from the background at the pixel-level.…”

Section: Introduction (mentioning)

confidence: 99%

DMM-Net: Differentiable Mask-Matching Network for Video Object Segmentation

Zeng

Liao

et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)


In this paper, we propose the differentiable mask-matching network (DMM-Net) for solving the video object segmentation problem where the initial object masks are provided. Relying on the Mask R-CNN backbone, we extract mask proposals per frame and formulate the matching between object templates and proposals at one time step as a linear assignment problem where the cost matrix is predicted by a CNN. We propose a differentiable matching layer by unrolling a projected gradient descent algorithm in which the projection exploits Dykstra's algorithm. We prove that under mild conditions, the matching is guaranteed to converge to the optimum. In practice, it performs similarly to the Hungarian algorithm during inference. Meanwhile, we can back-propagate through it to learn the cost matrix. After matching, a refinement head is leveraged to improve the quality of the matched mask. Our DMM-Net achieves competitive results on the largest video object segmentation dataset YouTube-VOS. On DAVIS 2017, DMM-Net achieves the best performance without online learning on the first frames. Without any fine-tuning, DMM-Net performs comparably to state-of-the-art methods on the SegTrack v2 dataset. At last, our matching layer is very simple to implement; we attach the PyTorch code (< 50 lines) in the supplementary material. Our code is released at https://github.com/ZENGXH/DMM_Net.


PReMVOS: Proposal-Generation, Refinement and Merging for Video Object Segmentation

Luiten

Voigtlaender

Leibe

2019

Lecture Notes in Computer Science

239

263


We address semi-supervised video object segmentation, the task of automatically generating accurate and consistent pixel masks for objects in a video sequence, given the first-frame ground truth annotations. Towards this goal, we present the PReMVOS algorithm (Proposal-generation, Refinement and Merging for Video Object Segmentation). Our method separates this problem into two steps, first generating a set of accurate object segmentation mask proposals for each video frame and then selecting and merging these proposals into accurate and temporally consistent pixel-wise object tracks over a video sequence in a way which is designed to specifically tackle the difficult challenges involved with segmenting multiple objects across a video sequence. Our approach surpasses all previous state-of-the-art results on the DAVIS 2017 video object segmentation benchmark with a J&F mean score of 71.6 on the test-dev dataset, and achieves first place in both the DAVIS 2018 Video Object Segmentation Challenge and the YouTube-VOS 1st Large-scale Video Object Segmentation Challenge.
