In this paper, we propose a novel method for person detection in aerial images of nonurban terrain gathered by an Unmanned Aerial Vehicle (UAV), which plays an important role in Search And Rescue (SAR) missions. The UAV in SAR operations contributes significantly due to the ability to survey a larger geographical area from an aerial viewpoint. Because of the high altitude of recording, the object of interest (person) covers a small part of an image (around 0.1%), which makes this task quite challenging. To address this problem, a multimodel deep learning approach is proposed. The solution consists of two different convolutional neural networks in region proposal, as well as in the classification stage. Additionally, contextual information is used in the classification stage in order to improve the detection results. Experimental results tested on the HERIDAL dataset achieved precision of 68.89% and a recall of 94.65%, which is better than current state-of-the-art methods used for person detection in similar scenarios. Consequently, it may be concluded that this approach is suitable for usage as an auxiliary method in real SAR operations.
Recent results in person detection using deep learning methods applied to aerial images gathered by Unmanned Aerial Vehicles (UAVs) have demonstrated the applicability of this approach in scenarios such as Search and Rescue (SAR) operations. In this paper, the continuation of our previous research is presented. The main goal is to further improve detection results, especially in terms of reducing the number of false positive detections and consequently increasing the precision value. We present a new approach that, as input to the multimodel neural network architecture, uses sequences of consecutive images instead of only one static image. Since successive images overlap, the same object of interest needs to be detected in more than one image. The correlation between successive images was calculated, and detected regions in one image were translated to other images based on the displacement vector. The assumption is that an object detected in more than one image has a higher probability of being a true positive detection because it is unlikely that the detection model will find the same false positive detections in multiple images. Based on this information, three different algorithms for rejecting detections and adding detections from one image to other images in the sequence are proposed. All of them achieved precision value about 80% which is increased by almost 20% compared to the current state-of-the-art methods.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.