2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition 2018
DOI: 10.1109/cvpr.2018.00130

Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning

Abstract: This paper tackles the problem of video object segmentation, given some user annotation which indicates the object of interest. The problem is formulated as pixel-wise retrieval in a learned embedding space: we embed pixels of the same object instance into the vicinity of each other, using a fully convolutional network trained by a modified triplet loss as the embedding model. Then the annotated pixels are set as reference and the rest of the pixels are classified using a nearest-neighbor approach. The propose…
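
The retrieval step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, array shapes, and the margin value are assumptions made for illustration, the embeddings are taken as given rather than produced by a fully convolutional network, and the loss shown is the standard triplet loss, not the modified variant the paper trains with.

```python
import numpy as np


def nearest_neighbor_labels(ref_emb, ref_labels, query_emb):
    """Give each query pixel the label of its closest reference pixel.

    ref_emb:    (R, D) embeddings of the annotated pixels (assumed input)
    ref_labels: (R,)   integer labels, e.g. 0 = background, 1 = object
    query_emb:  (Q, D) embeddings of the pixels still to be classified
    """
    # Squared Euclidean distance between every query and every reference,
    # expanded as |q|^2 - 2 q.r + |r|^2; the result has shape (Q, R).
    d2 = (
        np.sum(query_emb ** 2, axis=1, keepdims=True)
        - 2.0 * query_emb @ ref_emb.T
        + np.sum(ref_emb ** 2, axis=1)
    )
    # Index of the nearest reference for each query, mapped to its label.
    return ref_labels[np.argmin(d2, axis=1)]


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss on (N, D) batches (not the paper's modified loss)."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    # Penalize triplets where the negative is not at least `margin` farther
    # from the anchor than the positive is.
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))


# Toy usage: two annotated pixels, three unlabeled pixels, 2-D embeddings.
ref_emb = np.array([[0.0, 0.0], [10.0, 10.0]])
ref_labels = np.array([0, 1])
query_emb = np.array([[1.0, 1.0], [9.0, 8.0], [4.0, 4.0]])
pred = nearest_neighbor_labels(ref_emb, ref_labels, query_emb)
```

A brute-force distance matrix like this grows as Q × R, so a practical system would likely need a more efficient nearest-neighbor search; the sketch only shows the classification rule itself.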

Cited by 289 publications (233 citation statements)
References 51 publications
“…It has created a large number of synthetic video training data from Pascal VOC [11,12], ECSSD [49] and MSRA10K [7] DAVIS 2017 benchmark, we exclude PReMVOS [38] and OSVOS+ [39] as they both use multiple specialized networks in multiple processes to refine their results. For DAVIS 2016, we compare with OnAVOS [52], FAVOS [5], OSVOS [3], MSK [42], PML [4], SFL [6], OSMN [57], CTN [27] and VPN [26]. We detect multiple objects and evaluate in the way for single-object.…”
Section: Compare With Other Methods (mentioning)
confidence: 99%
“…These faster semi-supervised approaches come in many flavours. For instance, Chen et al [7] learn a metric space for pixel embeddings, which is then used to establish associations between pixels across frames, while Cheng et al [8] suggest to individually track object parts from the first frame with a visual object tracker [2] and then aggregate them according to their similarity with the initialisation mask.…”
Section: Related Work (mentioning)
confidence: 99%
“…PReMVOS [40] 84.9 88.6 -OSVOS [3] 79.8 80.6 -MSK [50] 79.7 75.4 -PML [7] 75.5 79.3 -SFL [9] 76.1 76.0 -VPN [52] 70. 2 pruning.…”
Section: Comparison With the State Of The Art (mentioning)
confidence: 99%
“…Matching or propagation based methods have also been proposed for VOS. Matching based methods [8,19] segment pixels according to the pixel-level matching scores between the features of the first frame and of each subsequent frame ( Fig. 1 (a)), while propagation based methods [9,10,38,40,54,59] mainly rely on temporally deforming the annotated mask of the first frame via predictions of the previous frame [40] ( Fig.…”
Section: Introduction (mentioning)
confidence: 99%