“…This transfer learning approach enables reusing already existing models [48,55,63,70,75,76,[78][79][80][81][82]87,91,100,103,107], previously trained on public standard object detection benchmarks, such as Pascal VOC [115] (used in [48]), and Microsoft COCO [126] (used in [63,75,78,80,91,107]). Still, as shown in Table 2, it is a common practice as well to exploit not the entire detection model but merely the backbone [49,50,59,68,71,95,106,109], being this a CNN embedded in the detection framework, responsible for extracting from some given input images the different feature maps subsequently exploited by the deeper layers of the detector for predicting the several classes and bounding boxes produced as output. In any case, either globally or just circumscribed to the backbone, weights are initialized with values taken directly from pre-trained models, and then, through a fine-tuning process, the detector is re-trained on an application-specific dataset in order to adjust it to the specific use case to be addressed.…”