Air-to-ground object detection plays an increasingly important role in a variety of ground awareness and cognition missions, such as fighter aircraft assaulting ground defensive fortifications and striking and destroying ground objects. However, air-to-ground object detection is very challenging because battlefield samples from air-to-ground imaging are scarce, ground background clutter is heavy, and object scales vary widely. In this paper, an improved air-to-ground object detection algorithm, YOLO-A2G, is proposed based on YOLOv5 to address these problems. In YOLO-A2G, first, to counter the shortage of samples, we apply direct and inverse Visual Focus (VF) affine data augmentation on top of YOLOv5's original augmentation pipeline to enrich and expand the training set. We then introduce the Coordinate Attention (CA) mechanism into the head network of YOLOv5 so that it autonomously learns explicit and implicit knowledge, focusing features and removing redundancy. Finally, in the post-processing stage after network prediction, we replace conventional NMS with Weighted Boxes Fusion (WBF) to achieve fusion across spatial scales. Experimental validation on the Air-to-Ground (A2G) dataset shows that YOLO-A2G reaches an mAP of 94%.
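To make the Coordinate Attention step concrete, the sketch below is a minimal PyTorch implementation of a CA block in the style of Hou et al. (2021). It is illustrative only: the reduction ratio, the choice of ReLU as the non-linearity, and the exact placement inside the YOLOv5 head are assumptions, not the authors' published configuration.

```python
# Minimal sketch of a Coordinate Attention (CA) block (assumed configuration).
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        # Pool along one spatial axis at a time to keep positional information.
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # -> (N, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # -> (N, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # Direction-aware pooling along height and width.
        x_h = self.pool_h(x)                      # (N, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)  # (N, C, W, 1)
        # Shared 1x1 conv over the concatenated directional descriptors.
        y = self.act(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        # Per-direction attention maps, applied multiplicatively.
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (N, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (N, C, 1, W)
        return x * a_h * a_w
```

A block like this can be dropped in front of each detection-head branch; the attention maps re-weight features along height and width separately, which is the "feature focusing" role the abstract describes.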
Aerial visual object tracking plays an increasingly critical part in intelligent perception and task cognition when aircraft perform reconnaissance missions. However, the particular object types and viewing angles of aerial imagery make the tracking process prone to deformation, scale variation, motion blur and occlusion, which makes the technique very challenging. In this paper, a high-performance Siamese network tracker for aerial visual object tracking, SiamAVOT, is proposed to address these problems. First, building on the SiamCAR architecture, we introduce the Bottleneck Attention Module (BAM) after the backbone network to fuse low-level and high-level features and improve recognition of deformed objects. Then, we switch to the Distance-IoU (DIoU) loss for bounding-box regression during training to improve the network's ability to handle scale variation. Finally, we design a Kalman-filter online learning module that integrates temporal and spatial trajectory information to handle motion blur and occlusion-induced disappearance at inference time. The proposed SiamAVOT achieves leading performance on the UAV123 and AVOT40 aerial datasets and runs in real time at 72 FPS.
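As a reference point for the regression change, the following is a generic PyTorch implementation of the Distance-IoU loss of Zheng et al.; the (x1, y1, x2, y2) box format, batched mean reduction, and epsilon value are assumptions and may differ from the authors' training code.

```python
# Generic DIoU loss sketch for axis-aligned boxes in (x1, y1, x2, y2) format.
import torch

def diou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """pred, target: (N, 4) box tensors. Returns the mean DIoU loss."""
    # Intersection area.
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)

    # IoU.
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared distance between box centers.
    cpx = (pred[:, 0] + pred[:, 2]) / 2
    cpy = (pred[:, 1] + pred[:, 3]) / 2
    ctx = (target[:, 0] + target[:, 2]) / 2
    cty = (target[:, 1] + target[:, 3]) / 2
    center_dist = (cpx - ctx) ** 2 + (cpy - cty) ** 2

    # Squared diagonal of the smallest enclosing box.
    ex1 = torch.min(pred[:, 0], target[:, 0])
    ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2])
    ey2 = torch.max(pred[:, 3], target[:, 3])
    diag = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps

    # DIoU = IoU - d^2 / c^2 ; loss = 1 - DIoU.
    return (1 - iou + center_dist / diag).mean()
```

Compared with a plain IoU loss, the center-distance penalty keeps gradients informative even when predicted and target boxes do not overlap, which is why DIoU is commonly preferred when object scale changes sharply between frames.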