Object segmentation and object tracking are fundamental research areas in the computer vision community. Both topics must handle some common challenges, such as occlusion, deformation, motion blur, and scale variation. The former additionally faces heterogeneous objects, interacting objects, edge ambiguity, and shape complexity, while the latter suffers from difficulties in handling fast motion, out-of-view targets, and real-time processing. Combining the two problems as video object segmentation and tracking (VOST) can overcome their respective difficulties and improve their performance. VOST can be applied to many practical applications, such as video summarization, high-definition video compression, human-computer interaction, and autonomous vehicles. This article aims to provide a comprehensive review of state-of-the-art tracking methods, classify these methods into different categories, and identify new trends. First, we provide a hierarchical categorization of existing approaches, including unsupervised VOS, semi-supervised VOS, interactive VOS, weakly supervised VOS, and segmentation-based tracking methods. Second, we provide a detailed discussion and overview of the technical characteristics of the different methods. Third, we summarize the characteristics of the related video datasets and provide a variety of evaluation metrics. Finally, we point out a set of interesting directions for future work and draw our own conclusions.