Finding tiny persons in drone imagery was, is, and remains an integral and challenging task. Unmanned Aerial Vehicles (UAVs) flying at high speed, at low altitude, and from multiple perspectives produce drastic variations in object scale, which burdens model optimization. Moreover, detection performance on densely packed, faintly discernible persons falls far short of that on large objects in high-resolution aerial images. In this paper, we introduce an image cropping strategy and an attention mechanism on top of YOLOv5 to address small-person detection on the optimized VisDrone2019 dataset. Specifically, we propose a Densely Cropped and Local Attention Network (DCLANet) for object detection, inspired by the observation that the small regions occupied by tiny objects should be fully attended to and relatively magnified within the original image. DCLANet combines Density Map Guided Object Detection in Aerial Images (DMNet) and You Only Look Twice: Rapid Multi-Scale Object Detection in Satellite Imagery (YOLT) to crop images during the training and testing stages, and adds the Bottleneck Attention Module (BAM) to the YOLOv5 baseline so that the detector focuses more on person objects than on irrelevant categories. To further improve DCLANet, we also provide a bag of useful strategies: data augmentation, label fusion, category filtering, and hyperparameter evolution. Extensive experiments on VisDrone2019 show that DCLANet achieves state-of-the-art performance: the person-category AP@0.5 reaches 50.04% on the test-dev subset, surpassing the previous SOTA method (DPNetV3) by 12.01%. In addition, on our optimized VisDrone2019 dataset, AP
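To make the attention component named above more concrete, the following PyTorch sketch shows a generic Bottleneck Attention Module of the kind that could be inserted between stages of a YOLOv5 backbone. It follows the standard BAM formulation (a channel branch and a spatial branch combined into a residual gate); the reduction ratio, dilation rate, and insertion point are illustrative assumptions rather than DCLANet's exact configuration.

```python
import torch
import torch.nn as nn


class BAM(nn.Module):
    """Bottleneck Attention Module: a channel branch and a spatial branch are
    summed, squashed with a sigmoid, and applied as a residual gate,
    out = x * (1 + sigmoid(channel(x) + spatial(x)))."""

    def __init__(self, channels: int, reduction: int = 16, dilation: int = 4):
        super().__init__()
        mid = channels // reduction
        # Channel branch: global average pool followed by a bottleneck MLP
        # (1x1 convolutions); output shape (B, C, 1, 1).
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, mid, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, kernel_size=1),
        )
        # Spatial branch: 1x1 reduction, two dilated 3x3 convolutions for a
        # larger receptive field, then a 1x1 projection to a single attention
        # map of shape (B, 1, H, W).
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, kernel_size=3, padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, kernel_size=3, padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, 1, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Broadcasting the (B, C, 1, 1) and (B, 1, H, W) maps yields (B, C, H, W).
        gate = torch.sigmoid(self.channel(x) + self.spatial(x))
        return x * (1.0 + gate)


# Example: gating a 128-channel feature map from a detector neck.
if __name__ == "__main__":
    feats = torch.randn(2, 128, 40, 40)
    print(BAM(128)(feats).shape)  # torch.Size([2, 128, 40, 40])
```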
Detecting sparse, small, lost persons occupying only a few pixels in high-resolution aerial images was, is, and remains an important and difficult mission, in which accurate monitoring and intelligent co-rescue play a vital role for the search and rescue (SaR) system. However, many problems in existing remote-vision-based SaR systems remain unsolved, such as the shortage of person samples in SaR scenarios and the low tolerance of small objects to bounding-box deviations. To address these issues, a copy-paste mechanism via instance segmentation (ISCP), combined with semi-supervised object detection (SSOD) and a maximum mean discrepancy (MMD) distance, is proposed, which provides highly robust, multi-task, and efficient aerial person detection for the prototype SaR system. Specifically, numerous pseudo-labels are obtained by accurately segmenting the instances of synthetic ISCP samples to obtain their boundaries. The SSOD trainer then uses soft weights to balance the prediction-entropy term of the loss function between ground-truth and unreliable labels. Moreover, a novel MMD-based evaluation metric for anchor-based detectors is proposed to elegantly compute the IoU of bounding boxes. Extensive experiments and ablation studies on Heridal and optimized public datasets demonstrate that our approach is effective and achieves state-of-the-art person detection performance in aerial images.
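As an illustration of the sample-synthesis idea behind ISCP, the NumPy sketch below pastes a single segmented person crop onto an aerial background at a random location and emits the corresponding bounding-box label. The function name and interface are hypothetical; it omits the instance segmentation, blending, and pseudo-label scoring performed by the full ISCP/SSOD pipeline.

```python
import numpy as np


def copy_paste_instance(background, instance, mask, rng=None):
    """Paste one segmented person instance onto an aerial background image.

    background: HxWx3 uint8 image; instance: hxwx3 crop; mask: hxw {0, 1} array.
    Returns the augmented image and the new box label (x1, y1, x2, y2).
    Minimal copy-paste illustration, not the authors' ISCP implementation.
    """
    rng = np.random.default_rng() if rng is None else rng
    H, W = background.shape[:2]
    h, w = mask.shape
    # Random top-left corner such that the pasted instance fits in the frame.
    y0 = int(rng.integers(0, H - h + 1))
    x0 = int(rng.integers(0, W - w + 1))
    out = background.copy()
    region = out[y0:y0 + h, x0:x0 + w]
    m = mask.astype(bool)[..., None]  # hxwx1 mask, broadcast over RGB channels
    out[y0:y0 + h, x0:x0 + w] = np.where(m, instance, region)
    return out, (x0, y0, x0 + w, y0 + h)
```

In a training loop, each synthetic image and box produced this way would be fed to the detector as an unreliable (pseudo-labeled) sample and down-weighted accordingly.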