“…Despite significant recent progress [45,15,35,24,17,27,46,49,11,25,22,50,23,13,43,54,20,44], current systems still perform poorly on arbitrary images in the wild [6]. One major obstacle is the lack of diverse training data, as most existing RGB-D datasets were collected via depth sensors and are limited to rooms [39,10,5] and roads [14]. As shown by recent work [6], systems trained on such data are unable to generalize to diverse scenes in the real world.…”