Detecting small infrared targets that lack texture and shape information against complex background clutter is a challenge that has attracted considerable research attention in recent years. Typical deep learning-based target detection methods rely on deep network structures, in which small targets may be lost in the deeper layers, so they cannot be applied directly to small infrared target detection. We therefore designed the attention fusion feature pyramid network (AFFPN) specifically for this task. It consists of feature extraction and feature fusion modules. In the feature extraction stage, an atrous spatial pyramid pooling module first captures the global contextual prior information of small targets in the deep layer of the network. Subsequently, the designed attention fusion module adaptively enhances the spatial location and semantic information features of small infrared targets in the shallow and deep layers, improving the network's feature representation capability for targets. Finally, high-performance detection is achieved through a multilayer feature fusion mechanism. We also performed a comprehensive ablation study to evaluate the effectiveness of each component. The results demonstrate that the proposed method outperforms state-of-the-art methods on a publicly available dataset. Furthermore, AFFPN was deployed on an NVIDIA Jetson AGX Xavier development board and achieved real-time target detection, further advancing practical research and applications in the field of unmanned aerial vehicle infrared search and tracking.
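As a rough illustration of the atrous (dilated) convolution idea behind atrous spatial pyramid pooling, the following is a minimal 1-D sketch in plain Python. It is not the AFFPN implementation: the function names `dilated_conv1d` and `aspp_1d`, the dilation rates, and the averaging used to merge the branches are all assumptions made for illustration (actual ASPP modules operate on 2-D feature maps and typically concatenate the branches before a 1x1 convolution).

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1-D cross-correlation with a dilation rate.

    Assumes an odd-length kernel so that the output stays centred on the input.
    """
    half = (len(kernel) // 2) * dilation
    padded = [0.0] * half + list(signal) + [0.0] * half
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            # Kernel taps are spaced `dilation` samples apart, which widens
            # the receptive field without adding weights.
            acc += w * padded[i + k * dilation]
        out.append(acc)
    return out


def aspp_1d(signal, kernel, rates=(1, 2, 4)):
    """Run parallel dilated branches and average them.

    Averaging is a hypothetical stand-in for the learned fusion step.
    """
    branches = [dilated_conv1d(signal, kernel, r) for r in rates]
    return [sum(vals) / len(branches) for vals in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    # With dilation 2, a 3-tap kernel spans 5 input samples instead of 3.
    print(dilated_conv1d(impulse, [1, 1, 1], 1))
    print(dilated_conv1d(impulse, [1, 1, 1], 2))
    print(aspp_1d(impulse, [1, 1, 1], rates=(1, 2)))
```

Each branch sees the same input at a different scale, which is how a pyramid of dilation rates can gather wider context around a target that occupies only a few pixels.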
Recognition of surface targets has a vital influence on the development of military and civilian applications such as maritime rescue patrols, illegal-vessel screening, and maritime operation monitoring. However, owing to interference from visual similarity and environmental variations, and to the lack of high-quality datasets, accurate recognition of surface targets has always been a challenging task. In this paper, we introduce a multi-attention residual model based on deep learning methods, in which channel and spatial attention modules are applied for feature fusion. In addition, we use transfer learning to improve the feature expression capabilities of the model when data are limited. A function based on metric learning is adopted to increase the distance between different classes. Finally, a dataset with eight types of surface targets is established. Comparative experiments on our self-built dataset show that the proposed method focuses more on discriminative regions, avoids problems such as vanishing gradients, and achieves better classification results than B-CNN, RA-CNN, MAMC, MA-CNN, and DFL-CNN.
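The abstract does not say which metric-learning function is used. As one common example of a loss that pushes different classes apart, the sketch below implements a triplet margin loss in plain Python. The choice of triplet loss, the Euclidean distance, the margin value, and the names `euclidean` and `triplet_margin_loss` are assumptions for illustration only, not the paper's method.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that is zero once the negative (different class) is at
    least `margin` farther from the anchor than the positive (same class)."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    anchor, positive = [0.0, 0.0], [0.0, 1.0]
    # Well-separated negative: the constraint is already satisfied.
    print(triplet_margin_loss(anchor, positive, [3.0, 4.0]))
    # Negative too close to the anchor: the loss is positive.
    print(triplet_margin_loss(anchor, positive, [0.0, 1.5]))
```

Minimising such a loss over many triplets increases the distance between embeddings of different classes relative to embeddings of the same class, which is the effect the abstract describes.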