You Only Look Once (YOLO) series detectors are well suited to aerial image object detection because of their real-time speed and strong accuracy. Their performance depends heavily on the anchors generated by clustering the training set. However, the effectiveness of the general anchor generation algorithm is limited by the unique data distribution of aerial image datasets. Because the anchors of every layer are generated uniformly and are affected by the overall data distribution, an imbalance in the number of objects of different sizes can cause the anchors to overfit some objects or to be assigned to suboptimal layers. Inspired by experiments with different anchor settings, we propose the Layered Anchor Generation (LAG) algorithm. In LAG, objects are layered by their diagonals, and the anchors of each layer are then generated by analyzing the diagonals and aspect ratios of the objects in the corresponding layer. In this way, the anchors of each layer better match that layer's detection range. Experimental results show that the algorithm generalizes well: it significantly improves the performance of You Only Look Once version 3 (YOLOv3), You Only Look Once version 5 (YOLOv5), You Only Learn One Representation (YOLOR), and Cascade Regions with CNN features (Cascade R-CNN) on the Vision Meets Drone (VisDrone) dataset and the object DetectIon in Optical Remote sensing images (DIOR) dataset, and these improvements are cost-free.
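The layering idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the diagonal cut points are chosen or how diagonals and aspect ratios are combined, so the `thresholds` argument, the quantile-based choice of diagonals, and the use of a median aspect ratio are all assumptions made purely for illustration.

```python
import math
from statistics import median


def split_into_layers(boxes, thresholds):
    """Group (width, height) boxes into layers by diagonal length.

    `thresholds` is a hypothetical ascending list of diagonal cut points;
    it yields len(thresholds) + 1 layers, smallest objects first.
    """
    layers = [[] for _ in range(len(thresholds) + 1)]
    for w, h in boxes:
        d = math.hypot(w, h)
        # Count how many cut points the diagonal exceeds.
        layers[sum(d > t for t in thresholds)].append((w, h))
    return layers


def anchors_for_layer(boxes, n_anchors=3):
    """Build anchors for one layer from its own boxes only.

    Assumption: anchor diagonals are taken at evenly spaced quantiles of
    the layer's diagonals, and all anchors share the layer's median
    width/height ratio. The paper may do this differently.
    """
    if not boxes:
        return []
    diags = sorted(math.hypot(w, h) for w, h in boxes)
    ratio = median(w / h for w, h in boxes)
    anchors = []
    for i in range(n_anchors):
        q = (i + 0.5) / n_anchors
        d = diags[min(int(q * len(diags)), len(diags) - 1)]
        # Recover (w, h) from diagonal d and ratio r = w / h.
        h = d / math.sqrt(1 + ratio ** 2)
        anchors.append((ratio * h, h))
    return anchors


def layered_anchors(boxes, thresholds, n_anchors=3):
    """Generate anchors layer by layer instead of from the whole dataset."""
    return [anchors_for_layer(layer, n_anchors)
            for layer in split_into_layers(boxes, thresholds)]
```

For example, `layered_anchors(boxes, thresholds=(32, 96))` returns one anchor list per detection layer, each computed only from the boxes whose diagonals fall in that layer's range, so a large population of small objects cannot pull the anchors of the other layers toward small sizes.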
The most significant technical challenges in current aerial image object detection are the extremely low accuracy for small objects that are densely distributed within a scene and the lack of semantic information. Moreover, existing detectors with large parameter counts are unsuitable for aerial image object-detection scenarios that target low-end GPUs. To address these challenges, we propose efficient-lightweight You Only Look Once (EL-YOLO), a model that overcomes the limitations of existing detectors and is designed for low-end GPUs. EL-YOLO improves on the baseline models in three key areas. First, we design and compare three model architectures to strengthen the model's focus on small objects and identify the most effective network structure. Second, we design efficient spatial pyramid pooling (ESPP) to enhance the representation of small-object features in aerial images. Third, we introduce the alpha-complete intersection over union (α-CIoU) loss function to tackle the imbalance between positive and negative samples in aerial images. The proposed EL-YOLO method demonstrates strong generalization and robustness on the small-object detection problem in aerial images. The experimental results show that, with the model parameters kept below 10 M and the input image size fixed at 640 × 640 pixels, the APS of EL-YOLOv5 reached 10.8% and 10.7% on two challenging aerial image datasets, DIOR and VisDrone, improving on YOLOv5 by 1.9% and 2.2%, respectively.
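The abstract names the α-CIoU loss but does not give its formula. As a rough sketch, the code below follows the form commonly associated with power-generalized IoU losses, in which each term of the CIoU loss is raised to a power `alpha`. Whether EL-YOLO uses exactly this form is an assumption, and the default `alpha=3.0` is an illustrative value rather than one taken from the source.

```python
import math


def alpha_ciou_loss(box, gt, alpha=3.0, eps=1e-9):
    """Sketch of an alpha-CIoU loss for two (x1, y1, x2, y2) boxes.

    Assumed form: 1 - IoU^a + (rho^2 / c^2)^a + (beta * v)^a,
    which reduces to the standard CIoU loss when alpha == 1.
    """
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    iou = inter / (w * h + gw * gh - inter + eps)

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Squared distance between the two box centers.
    rho2 = ((x1 + x2 - gx1 - gx2) ** 2 + (y1 + y2 - gy1 - gy2) ** 2) / 4

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps))
                              - math.atan(w / (h + eps))) ** 2
    beta = v / ((1 - iou) + v + eps)

    return 1 - iou ** alpha + (rho2 / c2) ** alpha + (beta * v) ** alpha
```

With `alpha > 1`, the IoU term is down-weighted more strongly for low-IoU boxes than for high-IoU ones, which changes how much each predicted box contributes to the loss relative to plain CIoU.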
Remote sensing image object detection has considerable research value in environmental protection and public safety. However, detector performance is unsatisfactory because of the large variability in object size and the complex background noise in remote sensing images, so improving detection performance is essential. Inspired by the idea of "divide and conquer", we propose Multiple Receptive Field Attention (MRFA), a plug-and-play attention method, to address these problems. First, we use multiple receptive field feature map generation to convert the input feature map into four feature maps with different receptive fields. In this way, the small, medium, large, and immense objects in the input feature map are "seen" in these feature maps, respectively. Then, we use multiple attention map fusion to focus on objects of different sizes separately, which effectively suppresses background noise in remote sensing images. Experiments on the remote sensing object detection datasets DIOR and HRRSD demonstrate that our method outperforms other state-of-the-art attention modules. In addition, experiments on the remote sensing semantic segmentation dataset WHDLD and the classification dataset AID demonstrate the generalization and superiority of our method.