Aerial visual object tracking plays an increasingly critical role in intelligent perception and task cognition when aircraft perform reconnaissance missions. However, the distinctive object types and viewing angles in aerial imagery make tracking more prone to deformation, scale variation, motion blur, and occlusion, which makes the task very challenging. In this paper, we propose SiamAVOT, a high-performance Siamese network tracker for aerial visual object tracking, to address these problems. First, building on the SiamCAR architecture, we introduce the Bottleneck Attention Module (BAM) after the backbone network to fuse low-level and high-level features, improving recognition of deformed objects. Then, we adopt the Distance-IoU (DIoU) loss for bounding box regression during training to improve the network’s ability to predict scale variation. Finally, we design a Kalman filter online learning module that integrates temporal and spatial trajectory information to handle object motion blur and disappearance under occlusion during inference. The proposed SiamAVOT achieves leading performance on the UAV123 and AVOT40 aerial datasets and runs in real time at 72 FPS.
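To illustrate the DIoU loss used for bounding box regression, the following is a minimal sketch under the commonly used definition: one minus the IoU, plus the squared distance between box centers divided by the squared diagonal of the smallest enclosing box. The function name `diou_loss` and the `(x1, y1, x2, y2)` box format are assumptions made for illustration; this is not the training code of SiamAVOT.

```python
def diou_loss(box_a, box_b):
    """Distance-IoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration only; the box format is an assumption.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union area and IoU.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between the two box centers.
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
       + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    enc_w = max(ax2, bx2) - min(ax1, bx1)
    enc_h = max(ay2, by2) - min(ay1, by1)
    c2 = enc_w ** 2 + enc_h ** 2

    penalty = d2 / c2 if c2 > 0 else 0.0
    return 1.0 - iou + penalty


if __name__ == "__main__":
    print(diou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
    print(diou_loss((0, 0, 1, 1), (2, 0, 3, 1)))  # disjoint boxes
```

Unlike a plain IoU loss, the center-distance term still changes with the prediction when the boxes do not overlap, so the loss remains informative even when the IoU is zero.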