“…For propagation-based models [3,23,42], the guidance of segmentation masks from past frames are introduced during the process of mask decoding. For matching-based models [5,10,17,17,26,27,30,35,49,55,63,65,67], an embedding space is learnt for target objects. Recently, STM-based networks [6, 16, 28, 32, 39, 46, 47, 52? ] achieve impressive results with memory networks that memorize and read information from past frames.…”