“…Video object segmentation (VOS) [16,41,115,118] is a fundamental technique to address this issue, whose purpose is to delineate pixel-level moving object masks in each frame. Besides video analysis, many other applications have also benefited from VOS, such as robotic manipulation [1], autonomous cars [70], video editing [43], action segmentation [103], optical flow estimation [24], medical diagnosis [45], interactive segmentation [14,19,37,72,131], URVOS [87], and video captioning [77].…”