Estimating the height of buildings and vegetation in single aerial images is a challenging problem. A task-focused Deep Learning (DL) model that combines architectural features from successful DL models (U-NET and Residual Networks) and learns the mapping from a single aerial imagery to a normalized Digital Surface Model (nDSM) was proposed. The model was trained on aerial images whose corresponding DSM and Digital Terrain Models (DTM) were available and was then used to infer the nDSM of images with no elevation information. The model was evaluated with a dataset covering a large area of Manchester, UK, as well as the 2018 IEEE GRSS Data Fusion Contest LiDAR dataset. The results suggest that the proposed DL architecture is suitable for the task and surpasses other state-of-the-art DL approaches by a large margin.
This paper describes preliminary work in the recent promising approach of generating synthetic training data for facilitating the learning procedure of deep learning (DL) models, with a focus on aerial photos produced by unmanned aerial vehicles (UAV). The general concept and methodology are described, and preliminary results are presented, based on a classification problem of fire identification in forests as well as a counting problem of estimating number of houses in urban areas. The proposed technique constitutes a new possibility for the DL community, especially related to UAV-based imagery analysis, with much potential, promising results, and unexplored ground for further research.
Background The ability to automatically count animals is important to design appropriate environmental policies and to monitor their populations in relation to biodiversity and maintain balance among species. Out of all living mammals on Earth, 60% are livestock, 36% humans, and only 4% are animals that live in the wild. In a relatively short period, development of human civilization caused a loss of 83% of wildlife and 50% of plants. The rate of species extinction is accelerating. Traditional wildlife surveys provide rough population estimates. However, emerging technologies, such as aerial photography, allow to perform large-scale surveys in a short period of time with high accuracy. In this paper, we propose the use of computer vision, through deep learning (DL) architecture, together with aerial photography and density maps, to count the population of Steller sea lions and African elephants with high precision. Results We have trained two deep learning models, a basic UNet without any feature extractor (Model-1) and another with the EfficientNet-B5 feature extractor (Model-2). We measured the model’s prediction accuracy, using Root Mean Square Error (RMSE) for the predicted and actual animal count. The results showed an RMSE of 1.88 and 0.60 to count Steller sea lions and African elephants, respectively, regardless of complex background, different illumination conditions, heavy overlapping and occlusion of the animals. Conclusions Our proposed solution performed very well in the counting prediction problem, with relatively low training parameters and minimum annotation. The approach adopted, combining DL and density maps, provided better results than state-of-art deep learning models used for counting, indicating that the proposed method has the potential to be used more widely in large-scale wildlife surveying projects and initiatives.
Despite the plethora of successful Super-Resolution Reconstruction (SRR) models applied to natural images, their application to remote sensing imagery tends to produce poor results. Remote sensing imagery is often more complicated than natural images and has its peculiarities such as being of lower resolution, it contains noise, and often depicting large textured surfaces. As a result, applying non-specialized SRR models like the Enhanced Super Resolution Generative Adversarial Network (ESRGAN) on remote sensing imagery results in artifacts and poor reconstructions. To address these problems, we propose a novel strategy for enabling an SRR model to output realistic remote sensing images: instead of relying on feature-space similarities as a perceptual loss, the model considers pixel-level information inferred from the normalized Digital Surface Model (nDSM) of the image. This allows the application of betterinformed updates during the training of the model which sources from a task (elevation map inference) that is closely related to remote sensing. Nonetheless, the nDSM auxiliary information is not required during production i.e., the model infers a superresolution image without additional data. We assess our model on two remotely sensed datasets of different spatial resolutions that also contain the DSMs of the images: the DFC2018 dataset and the dataset containing the national LiDAR fly-by of Luxembourg. We compare our model with ESRGAN and we show that it achieves better performance and does not introduce any artifacts in the results. In particular, the results for the high-resolution DFC2018 dataset are realistic and almost indistinguishable from the ground truth images.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.