Background
Automatically counting animals is important for designing appropriate environmental policies, monitoring populations in relation to biodiversity, and maintaining balance among species. Of all living mammals on Earth, 60% are livestock, 36% are humans, and only 4% live in the wild. In a relatively short period, the development of human civilization has caused the loss of 83% of wildlife and 50% of plants, and the rate of species extinction is accelerating. Traditional wildlife surveys provide only rough population estimates. Emerging technologies such as aerial photography, however, make it possible to carry out large-scale surveys quickly and with high accuracy. In this paper, we propose using computer vision, through a deep learning (DL) architecture combined with aerial photography and density maps, to count populations of Steller sea lions and African elephants with high precision.
Results
We trained two deep learning models: a basic UNet without any feature extractor (Model-1) and a second model with the EfficientNet-B5 feature extractor (Model-2). We measured each model's prediction accuracy using the Root Mean Square Error (RMSE) between the predicted and actual animal counts. The results showed an RMSE of 1.88 for counting Steller sea lions and 0.60 for counting African elephants, despite complex backgrounds, varying illumination conditions, and heavy overlap and occlusion of the animals.
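As a rough illustration of the evaluation described above, the following sketch assumes that a model outputs a density map whose pixel values sum to the estimated number of animals in an image, and that RMSE is computed over per-image counts. The function names and toy arrays are hypothetical and are not taken from the paper.

```python
import numpy as np


def count_from_density(density_map):
    """Estimate the animal count as the sum (integral) of a density map."""
    return float(np.sum(density_map))


def rmse(predicted_counts, actual_counts):
    """Root Mean Square Error between predicted and actual counts."""
    predicted = np.asarray(predicted_counts, dtype=float)
    actual = np.asarray(actual_counts, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))


# Toy density maps (assumed data, for illustration only).
density_a = np.zeros((4, 4))
density_a[1, 1] = 1.0
density_a[2, 3] = 1.0              # sums to 2 animals
density_b = np.full((4, 4), 0.25)  # sums to 4 animals

predicted = [count_from_density(density_a), count_from_density(density_b)]
actual = [2, 5]                    # hypothetical ground-truth counts

print(rmse(predicted, actual))     # about 0.707 (count errors of 0 and -1)
```

Summing the density map yields a count without localizing each animal, which is one reason density-based counting can cope with overlap and occlusion.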
Conclusions
Our proposed solution performed very well on the counting problem, with relatively few training parameters and minimal annotation. The adopted approach, which combines DL and density maps, gave better results than state-of-the-art deep learning models used for counting, indicating that the proposed method has the potential to be used more widely in large-scale wildlife surveying projects and initiatives.
Despite the plethora of successful Super-Resolution Reconstruction (SRR) models applied to natural images, their application to remote sensing imagery tends to produce poor results. Remote sensing imagery is often more complicated than natural images and has its own peculiarities: it is of lower resolution, it contains noise, and it often depicts large textured surfaces. As a result, applying non-specialized SRR models such as the Enhanced Super Resolution Generative Adversarial Network (ESRGAN) to remote sensing imagery results in artifacts and poor reconstructions. To address these problems, we propose a novel strategy for enabling an SRR model to output realistic remote sensing images: instead of relying on feature-space similarities as a perceptual loss, the model considers pixel-level information inferred from the normalized Digital Surface Model (nDSM) of the image. This allows better-informed updates during training, derived from a task (elevation map inference) that is closely related to remote sensing. The nDSM auxiliary information is not required in production, however: the model infers a super-resolution image without additional data. We assess our model on two remotely sensed datasets of different spatial resolutions that also contain the DSMs of the images: the DFC2018 dataset and the dataset containing the national LiDAR fly-by of Luxembourg. We compare our model with ESRGAN and show that it achieves better performance and does not introduce any artifacts in the results. In particular, the results for the high-resolution DFC2018 dataset are realistic and almost indistinguishable from the ground truth images.
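The abstract does not specify how the nDSM enters the training objective. The sketch below shows one possible reading, stated as an assumption: a pixel-wise reconstruction term plus a term comparing an elevation map inferred from the super-resolved image against the nDSM. The function names, the `weight` value, and the toy elevation predictor are all hypothetical stand-ins and do not represent the authors' implementation.

```python
import numpy as np


def l1(a, b):
    """Mean absolute difference between two arrays of the same shape."""
    return float(np.mean(np.abs(a - b)))


def elevation_guided_loss(sr_image, hr_image, ndsm, predict_elevation, weight=0.1):
    """Pixel loss plus an elevation-consistency term (illustrative only).

    predict_elevation is an assumed callable mapping an image to a
    per-pixel height map; weight is an assumed balancing factor.
    """
    pixel_term = l1(sr_image, hr_image)
    elevation_term = l1(predict_elevation(sr_image), ndsm)
    return pixel_term + weight * elevation_term


def toy_elevation_predictor(image):
    # Hypothetical stand-in for an elevation-inference network:
    # treats the channel mean as "height".
    return image.mean(axis=-1)


hr = np.ones((8, 8, 3))             # toy ground-truth image
ndsm = toy_elevation_predictor(hr)  # toy nDSM, used during training only
sr_perfect = hr.copy()
sr_degraded = hr * 0.5

loss_perfect = elevation_guided_loss(sr_perfect, hr, ndsm, toy_elevation_predictor)
loss_degraded = elevation_guided_loss(sr_degraded, hr, ndsm, toy_elevation_predictor)
print(loss_perfect, loss_degraded)
```

In this reading, the nDSM appears only inside the loss, so inference needs nothing beyond the low-resolution input, consistent with the abstract's statement that no auxiliary data is required in production.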