“…This approach has been found to be effective for segmentation [177,241] and human pose estimation [194], and has been widely exploited by both one-stage and two-stage detectors to alleviate problems of scale variation across object instances. Representative methods include SharpMask [214], Deconvolutional Single Shot Detector (DSSD) [77], Feature Pyramid Network (FPN) [167], Top-Down Modulation (TDM) [247], Reverse connection with Objectness prior Network (RON) [136], ZIP [156], Scale Transfer Detection Network (STDN) [321], RefineDet [308], StairNet [283], Path Aggregation Network (PANet) [174], Feature Pyramid Reconfiguration (FPR) [137], DetNet [164], Scale Aware Network (SAN) [133], Multi-scale Location-aware Kernel Representation (MLKP) [278] and M2Det [315], as shown in Table 7 and contrasted in Fig. 17.…”
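To make the idea behind one of the listed methods concrete, the following is a minimal sketch of an FPN-style top-down merge: a coarse feature map is upsampled by a factor of two and added element-wise to the next finer map. It is an illustration under simplifying assumptions, not the implementation of any cited method: feature maps are single-channel nested lists, the lateral 1x1 convolutions and output smoothing convolutions are omitted, and the names `upsample_nearest_2x`, `top_down_merge`, and `c2`/`c3`/`c4` are hypothetical.

```python
from typing import List

# Assumption: a single-channel H x W map stands in for a real multi-channel tensor.
FeatureMap = List[List[float]]


def upsample_nearest_2x(fmap: FeatureMap) -> FeatureMap:
    """Double height and width by repeating each value (nearest neighbor)."""
    out: FeatureMap = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out


def top_down_merge(bottom_up: List[FeatureMap]) -> List[FeatureMap]:
    """Fuse maps ordered fine -> coarse, each level half the size of the previous.

    Returns the merged maps in the same order. Learned lateral and smoothing
    convolutions are left out (treated as identity) to keep the sketch short.
    """
    merged = [bottom_up[-1]]  # the coarsest map starts the top-down pathway
    for lateral in reversed(bottom_up[:-1]):
        up = upsample_nearest_2x(merged[0])
        fused = [[a + b for a, b in zip(row_lat, row_up)]
                 for row_lat, row_up in zip(lateral, up)]
        merged.insert(0, fused)
    return merged


# Toy three-level pyramid: 4x4, 2x2, and 1x1 maps with constant values.
c2 = [[1.0] * 4 for _ in range(4)]
c3 = [[10.0] * 2 for _ in range(2)]
c4 = [[100.0]]
p2, p3, p4 = top_down_merge([c2, c3, c4])
```

In this toy pyramid every finer level accumulates the values passed down from the coarser levels, which is the sense in which high-level semantics reach the high-resolution maps used for small objects.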