Interactive Video Object Segmentation Using Global and Local Transfer Modules

Heo, Yuk; Koh, Yeong Jun

doi:10.48550/arxiv.2007.08139

Cited by 1 publication

(3 citation statements)

References 52 publications


“…Oh et al [38] measure a 44% increase in error rate when extra training data is omitted, indicating that methods with many parameters are data-hungry and underperform without the help of additional data. [Table columns: Method, AUC J, AUC J&F, Extra data. Rows: Heo et al [39]: 0.771, 0.809; Heo et al [47]: 0.704, -; Oh et al [38]: 0.691, -; Miao et al [40]: 0.749, -, -; Oh et al [48]: 0.702, -, -; Oh et al [38]: 0] In order to gain some insight into the data structure, we show the error rate of prediction in case of individual videos in Fig. 4 after the second and the eighth interaction steps.…”

Section: B Interactive VOS Results (mentioning)

confidence: 99%

“…A slight deficiency of their method is the lack of weight sharing between the two networks in the convolutional layers. Heo et al achieve superior results with their feature information transfer modules [39]. A drawback of their method is the need to use multiple additional segmentation datasets for their training process.…”

Section: Interactive Video Object Segmentation (mentioning)

confidence: 99%

“…Miao et al proposes a more efficient solution by computing all feature representations during the preprocessing stage and only utilizing shallow networks during prediction [40]. They implement global and local memory modules in a somewhat similar fashion to [39].…”

Section: Interactive Video Object Segmentation (mentioning)

confidence: 99%


Fast Interactive Video Object Segmentation with Graph Neural Networks

Varga; Lörincz

2021

Preprint


Pixelwise annotation of image sequences can be very tedious for humans. Interactive video object segmentation aims to utilize automatic methods to speed up the process and reduce the workload of the annotators. Most contemporary approaches rely on deep convolutional networks to collect and process information from human annotations throughout the video. However, such networks contain millions of parameters and need huge amounts of labeled training data to avoid overfitting. Beyond that, label propagation is usually executed as a series of frame-by-frame inference steps, which is difficult to parallelize and is thus time-consuming. In this paper we present a graph neural network based approach for tackling the problem of interactive video object segmentation. Our network operates on superpixel graphs, which allow us to reduce the dimensionality of the problem by several orders of magnitude. We show that our network, with only a few thousand parameters, achieves state-of-the-art performance, while inference remains fast and the network can be trained quickly with very little data.
