“…object detection [21,22] and semantic segmentation [23,24]. In practice, visual recognition (i.e., classification, detection and segmentation) plays a significant role in various computer vision scenarios and applications including transportation (e.g., autonomous vehicles [25,26], drones [27,28] and robots [29,30]), healthcare (e.g., analysis of CT [31] and MRI [32,33] images, cancer detection [34,35] and patient movement analysis [36]), manufacturing (e.g., defect inspection [37,38], scene text recognition [39,40] and product assembly [41,42]), construction (e.g., predictive maintenance [43,44] and personal protective equipment detection [45,46]), agriculture (e.g., crop and livestock surveillance [47,48], automatic weeding [49,50] and insect detection [51,52]), retail (e.g., self-checkout [53,54] and surveillance for unmanned supermarkets [55,56]) and entertainment (e.g., augmented reality [57,58] and virtual reality [59,60]).…”