“…In terms of segmentation quality, currently only methods based on deep convolutional networks [19,33] are strong enough to tackle segmentation datasets of difficulty similar to what fully-supervised methods can handle, such as the PASCAL VOC 2012 [9], which we make use of in this work. In particular, MIL-FCN [25], MIL-ILP [26] and the approaches of [4,18] leverage deep networks in a multiple instance learning setting, differing mainly in their pooling strategies, i.e. how they convert their internal spatial representation to per-image labels.…”