“…Moreover, this also ignores vast amount of freely available weakly structured data. As a result, recent works in object detection have turned towards utilizing weakly labelled data [5, 6,7,8,9,10,11,12], particularly 20 videos [13,14,15]. Critically, these methods assume that motion or appearance of objects are sufficient descriptors for segmenting them with relative ease, which is indeed the case for large active objects such as flying airplanes and walking tigers.…”