“…It has been recognized that contextual information (object relations, global scene statistics) helps object detection and recognition [197], especially for small objects, occluded objects, and with poor image quality. There was extensive work preceding deep learning [185,193,220,58,78], and also quite a few works in the era of deep learning [82,304,305,35,114]. How to efficiently and effectively incorporate contextual information remains to be explored, possibly guided by how human vision uses context, based on scene graphs [161], or via the full segmentation of objects and scenes using panoptic segmentation [134].…”