“…To avoid this, we consider the motion similarity between these segments [39]. We assume that segments from the same object have similar motion patterns, which distinguish them from the segments of other objects. Therefore, based on motion similarity, spatial adjacency, and labels, we design a more reliable merging process that groups the segments while separating the occluded semantic regions.…”
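
The merging idea in this passage can be illustrated with a minimal sketch. It is not the authors' implementation: the excerpt does not specify the similarity measure, the threshold, or the data layout, so everything below is an assumption made for illustration. In particular, `motion_similarity` (cosine similarity of mean 2-D motion vectors), `merge_segments`, and the `threshold` value are hypothetical names and choices. Two segments are grouped only if they are spatially adjacent, carry the same label, and move similarly; a union-find structure then collects the groups.

```python
import math


def motion_similarity(u, v):
    """Cosine similarity of two mean 2-D motion vectors (an assumed measure)."""
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0.0 or nv == 0.0:
        # Two static segments count as similar; static vs. moving does not.
        return 1.0 if nu == nv else 0.0
    return (u[0] * v[0] + u[1] * v[1]) / (nu * nv)


def merge_segments(labels, motions, adjacency, threshold=0.9):
    """Group segments that are adjacent, share a label, and move alike.

    labels    -- one label per segment
    motions   -- one mean (dx, dy) motion vector per segment
    adjacency -- iterable of (i, j) index pairs of spatially adjacent segments
    threshold -- hypothetical minimum similarity required to merge
    Returns one group id per segment (the smallest segment index in its group).
    """
    parent = list(range(len(labels)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in adjacency:
        if labels[i] != labels[j]:
            continue  # different labels never merge
        if motion_similarity(motions[i], motions[j]) < threshold:
            continue  # same label but different motion: keep separate
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    return [find(i) for i in range(len(labels))]


# Toy example: segments 0 and 1 belong to one car; segment 2 is a second car
# that touches segment 1 but moves in the opposite direction; 3 is road.
labels = ["car", "car", "car", "road"]
motions = [(1.0, 0.0), (0.9, 0.1), (-1.0, 0.0), (0.0, 0.0)]
adjacency = [(0, 1), (1, 2), (0, 3)]
groups = merge_segments(labels, motions, adjacency)
print(groups)  # [0, 0, 2, 3]
```

In this toy case the two touching "car" segments with opposite motion stay in separate groups, which is the behaviour the passage describes for regions that share a label but belong to different objects. Note that cosine similarity ignores motion magnitude and that union-find merges transitively, so a practical system would likely need a richer motion descriptor and a stricter grouping rule.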