Recently, the memory-based approach, which performs non-local matching between previously segmented frames and a query frame, has led to significant improvements in video object segmentation. However, the positional proximity of target objects between the query and the local memory (the previous frame), i.e., temporal smoothness, is often neglected. Some attempts have been made to address this issue, but they are sensitive to large movements of target objects. In this paper, we propose local memory read-and-compare operations to address the problem. First, we propose local memory read and sequential local memory read modules to exploit temporal smoothness between neighboring frames. Second, we propose a memory comparator that reads the global and local memories adaptively by comparing their affinities to the query. Experimental results demonstrate that the proposed algorithm yields more accurate segmentation results than recent state-of-the-art algorithms. For example, the proposed algorithm improves video object segmentation performance by 0.4% and 0.5% in terms of J&F on the widely used DAVIS 2016 and DAVIS 2017 datasets, respectively.
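To make the read-and-compare idea concrete, the following is a minimal PyTorch sketch. The tensor shapes, the softmax-normalized dot-product affinity, and the peak-affinity fusion weight are assumptions for illustration, not the paper's exact formulation; `memory_read` and `compare_and_fuse` are hypothetical names.

```python
# Illustrative sketch of memory read-and-compare; shapes and the
# peak-affinity weighting below are assumptions, not the authors' code.
import torch
import torch.nn.functional as F

def memory_read(query_key, mem_key, mem_value):
    """Non-local matching: read memory values weighted by key affinity.

    query_key: (B, C, HW)   query-frame key features
    mem_key:   (B, C, THW)  keys of the stored memory frames
    mem_value: (B, D, THW)  values of the stored memory frames
    Returns the read value (B, D, HW) and the affinity (B, THW, HW).
    """
    affinity = torch.bmm(mem_key.transpose(1, 2), query_key)       # (B, THW, HW)
    affinity = F.softmax(affinity / query_key.shape[1] ** 0.5, dim=1)
    return torch.bmm(mem_value, affinity), affinity

def compare_and_fuse(query_key, global_mem, local_mem):
    """Comparator: weight the global vs. local reads by how confidently
    each memory matches the query (peak affinity per query position)."""
    g_read, g_aff = memory_read(query_key, *global_mem)
    l_read, l_aff = memory_read(query_key, *local_mem)
    g_conf = g_aff.max(dim=1, keepdim=True).values                 # (B, 1, HW)
    l_conf = l_aff.max(dim=1, keepdim=True).values
    w = g_conf / (g_conf + l_conf + 1e-6)                          # adaptive weight
    return w * g_read + (1.0 - w) * l_read
```

Under this reading, positions where the local memory matches the query more confidently than the global memory are dominated by the local read, which is where temporal smoothness helps.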
An interactive video object segmentation algorithm, which takes scribble annotations on query objects as input, is proposed in this paper. We develop a deep neural network consisting of an annotation network (A-Net) and a transfer network (T-Net). First, given user scribbles on a frame, A-Net yields a segmentation result based on an encoder-decoder architecture. Second, T-Net transfers the segmentation result bidirectionally to the other frames by employing global and local transfer modules. The global transfer module conveys the segmentation information in the annotated frame to a target frame, while the local transfer module propagates the segmentation information in a temporally adjacent frame to the target frame. By applying A-Net and T-Net alternately, a user can obtain desired segmentation results with minimal effort. We train the entire network in two stages, by emulating user scribbles and employing an auxiliary loss. Experimental results demonstrate that the proposed interactive video object segmentation algorithm outperforms conventional state-of-the-art algorithms. Code and models are available at https://github.com/yuk6heo/IVOS-ATNet.
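The alternation of A-Net and T-Net can be sketched as a simple interaction loop. The function below is a hypothetical outline: `a_net` and `t_net` are stand-in callables and their signatures are assumptions; the actual models are in the repository linked above.

```python
# Hypothetical sketch of one A-Net / T-Net interaction round; the
# model call signatures are illustrative assumptions only.
def interactive_round(a_net, t_net, frames, masks, annotated_idx, scribbles):
    """Refine the annotated frame, then propagate bidirectionally.

    frames:        list of N frame tensors
    masks:         list of N current mask estimates (updated in place)
    annotated_idx: index of the frame the user scribbled on
    scribbles:     scribble map for that frame
    """
    # 1) A-Net refines the segmentation of the annotated frame.
    masks[annotated_idx] = a_net(frames[annotated_idx],
                                 masks[annotated_idx], scribbles)

    # 2) T-Net propagates the result to the other frames in both
    #    temporal directions, combining a global transfer from the
    #    annotated frame with a local transfer from the adjacent frame.
    n = len(frames)
    for direction in (+1, -1):
        prev = annotated_idx
        t = annotated_idx + direction
        while 0 <= t < n:
            masks[t] = t_net(frames[t],
                             frames[annotated_idx], masks[annotated_idx],  # global
                             frames[prev], masks[prev])                    # local
            prev = t
            t += direction
    return masks
```

Each user interaction would invoke one such round, so the segmentation converges over a few rounds of scribbling and propagation.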