“…With the development of deep learning, extraordinary processes have been achieved in static image object detection [3,18,38,43,63]. Existing object detectors can be mainly divided into three categories: twostage [3,25,28,42,57], one-stage [43, 45, 49, 54-56, 64, 65] and query-based models [4,23,48,58,63,90]. For better performance, two-stage models generate a set of proposals and then refine the prediction results, like R-CNN fami-lies [11,22,28,57].…”