[Figure 1 panels: (I) input image with extreme points provided by the annotator; (II) machine predictions from extreme points; (III) corrective scribbles provided by the annotator; (IV) machine predictions from extreme points and corrective scribbles.]
Figure 1. Illustration of our interactive full image segmentation workflow. First (I) the annotator marks extreme points. Then (II) our model (Sec. 3) uses them to generate a segmentation. This is presented to the annotator, after which we iterate: (III) the annotator makes corrections using scribbles (Sec. 4), and (IV) our model uses them to update the predicted segmentation (Sec. 3).
Abstract

We address interactive full image annotation, where the goal is to accurately segment all object and stuff regions in an image. We propose an interactive, scribble-based annotation framework which operates on the whole image to produce segmentations for all regions. This enables sharing scribble corrections across regions, and allows the annotator to focus on the largest errors made by the machine across the whole image. To realize this, we adapt Mask R-CNN [22] into a fast interactive segmentation framework and introduce an instance-aware loss measured at the pixel level in the full image canvas, which lets predictions for nearby regions properly compete for space. Finally, we compare to interactive single object segmentation on the COCO panoptic dataset [11,27,34]. We demonstrate that our interactive full image segmentation approach leads to a 5% IoU gain, reaching 90% IoU at a budget of four extreme clicks and four corrective scribbles per region.
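The instance-aware, full-canvas loss can be pictured as follows: each region's mask logits are composited onto the shared image canvas, and a per-pixel softmax over regions makes nearby predictions compete for space. The sketch below is our own illustrative approximation of this idea, not the authors' implementation; the function name, the box-based pasting of logits, and the explicit background channel are assumptions.

```python
# Minimal sketch (assumed, not the authors' code) of an instance-aware loss
# computed on the full image canvas: per-region mask logits are pasted into a
# shared canvas and a softmax cross-entropy over regions makes nearby
# predictions compete for every pixel.
import torch
import torch.nn.functional as F


def instance_aware_canvas_loss(region_logits, region_boxes, target_region_ids, canvas_size):
    """region_logits: list of (h_i, w_i) mask logit tensors, one per region.
    region_boxes: list of (y0, x0, y1, x1) canvas boxes, one per region.
    target_region_ids: (H, W) long tensor giving the ground-truth region index
        per pixel (0 .. num_regions-1), or num_regions for background.
    canvas_size: (H, W) of the full image canvas."""
    H, W = canvas_size
    num_regions = len(region_logits)
    # One logit channel per region plus an explicit background channel (assumption).
    canvas = torch.full((num_regions + 1, H, W), -1e4)
    canvas[num_regions] = 0.0  # background logit
    for i, (logits, (y0, x0, y1, x1)) in enumerate(zip(region_logits, region_boxes)):
        # Resize each region's logits to its box and paste them into the canvas.
        resized = F.interpolate(logits[None, None], size=(y1 - y0, x1 - x0),
                                mode="bilinear", align_corners=False)[0, 0]
        canvas[i, y0:y1, x0:x1] = resized
    # Per-pixel softmax cross-entropy over the region dimension: regions
    # compete for space at every pixel of the full image.
    return F.cross_entropy(canvas[None], target_region_ids[None])
```

Compared with training each region mask independently with a binary loss, coupling all regions through one per-pixel softmax penalizes overlapping claims on the same pixel, which is what allows predictions for nearby regions to compete for space.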