4 ByteDance
https://henghuiding.github.io/MOSE

Figure 1. Examples of video clips from the coMplex video Object SEgmentation (MOSE) dataset. The selected target objects are masked in orange ◼. The most notable feature of MOSE is its complex scenes, which include objects that disappear and reappear, small or inconspicuous objects, heavy occlusions, and crowded environments. For example, the target player in the 2nd row disappears in the 3rd column and has turned around when he reappears in the 4th and 5th columns, which makes re-identifying him challenging. Most videos in MOSE contain crowded and occluded objects, and the target object is seldom the salient one. The goal of the MOSE dataset is to provide a platform that promotes the development of more comprehensive and robust video object segmentation algorithms.