Automatic building extraction and delineation from high-resolution satellite imagery is an important but challenging task, owing to the extremely large diversity of building appearances. Multiple high-resolution remote sensing data sources are now available, allowing different kinds of information to be integrated in order to improve the extraction accuracy of building outlines. Many algorithms perform building footprint extraction using spectral-based or appearance-based criteria derived from single or fused data sources, but their features are usually hand-crafted, which limits their accuracy. Recently developed fully convolutional networks (FCNs), which resemble conventional convolutional neural networks (CNNs) except that the last fully connected layer is replaced by a further convolutional layer with a large receptive field, have quickly become the state of the art for image recognition tasks, as they make dense pixel-wise classification of input images possible. Building on these advantages, i.e., the automatic extraction of relevant features and dense classification of images, we propose an end-to-end fully convolutional network that effectively combines the spectral and height information from different data sources and automatically generates a full-resolution binary building mask. Our architecture (Fused-FCN4s) consists of three parallel networks merged at a late stage, which helps propagate fine-grained information from earlier layers to higher levels in order to produce an output with more accurate building outlines. The inputs to the proposed Fused-FCN4s are three-band (RGB), panchromatic (PAN), and normalized digital surface model (nDSM) images. Experimental results demonstrate that the fusion of several networks achieves excellent results on complex data. Moreover, the developed model was successfully applied to different cities to demonstrate its generalization capacity.
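The late-fusion idea can be illustrated with a minimal sketch: feature maps from three hypothetical streams (RGB, PAN, and nDSM branches) are concatenated along the channel axis and combined by a per-pixel linear map, i.e., a 1x1 convolution. The function, shapes, and weights below are illustrative assumptions, not the published Fused-FCN4s implementation.

```python
import numpy as np

def late_fusion(rgb_feat, pan_feat, ndsm_feat, weights):
    """Fuse feature maps from three parallel streams at a late stage.

    Each stream output has shape (C, H, W); concatenation along the
    channel axis followed by a 1x1 convolution (a per-pixel linear map
    over channels) lets a network learn how to combine spectral and
    height cues into a joint representation.
    """
    fused = np.concatenate([rgb_feat, pan_feat, ndsm_feat], axis=0)  # (3C, H, W)
    c, h, w = fused.shape
    # A 1x1 convolution is a matrix multiply over the channel dimension.
    return (weights @ fused.reshape(c, h * w)).reshape(-1, h, w)

rgb = np.ones((2, 4, 4))
pan = np.ones((2, 4, 4))
ndsm = np.ones((2, 4, 4))
w = np.full((1, 6), 1.0 / 6.0)   # hypothetical learned 1x1 kernel
out = late_fusion(rgb, pan, ndsm, w)
print(out.shape)  # (1, 4, 4)
```

With uniform inputs and averaging weights, the fused map is again uniform, which makes the per-pixel linear combination easy to verify by hand.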
Building detection and footprint extraction are in high demand for many remote sensing applications. Although most previous works have shown promising results, the automatic extraction of building footprints remains a nontrivial topic, especially in complex urban areas. Recently developed extensions of the CNN framework have made it possible to perform dense pixel-wise classification of input images. Based on these abilities, we propose a methodology that automatically generates a full-resolution binary building mask from a digital surface model (DSM) using a fully convolutional network (FCN) architecture. The advantage of using depth information is that it provides geometrical silhouettes, allows a better separation of buildings from the background, and is invariant to illumination and color variations. The proposed framework has two main steps. First, the FCN is trained on a large set of patches, with the normalized DSM (nDSM) as input and the available ground-truth building mask as target output. Second, the predictions generated by the FCN serve as unary terms for a fully connected conditional random field (FCRF), which enables us to create the final binary building mask. A series of experiments demonstrates that our methodology extracts accurate building footprints that closely match the original building shapes. The quantitative and qualitative analyses show significant improvements of the results compared with the multi-layer fully connected network from our previous work.
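The nDSM input used in the first step is commonly obtained by subtracting a digital terrain model (DTM) from the DSM, leaving only object heights above ground. The clipping threshold and rescaling below are illustrative assumptions, not parameters taken from the paper.

```python
import numpy as np

def normalized_dsm(dsm, dtm, max_height=30.0):
    """Compute a normalized DSM (object heights above ground).

    Subtracting the terrain model removes topography, so buildings and
    other elevated objects stand out against a zero-height ground plane.
    Heights are clipped to [0, max_height] and rescaled to [0, 1] so the
    values are in a convenient range for network inputs.
    """
    ndsm = np.clip(dsm - dtm, 0.0, max_height)
    return ndsm / max_height

dsm = np.array([[102.0, 130.0], [101.0, 100.0]])  # surface heights (m)
dtm = np.array([[100.0, 100.0], [100.0, 100.0]])  # bare-earth heights (m)
print(normalized_dsm(dsm, dtm))  # a 30 m building pixel saturates to 1.0
```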
A digital surface model (DSM) provides the geometry and structure of an urban environment, with buildings being its most prominent objects. Built-up areas change over time due to the rapid expansion of cities: new buildings are constructed, existing ones are extended, and old buildings are torn down. As a result, 3D surface models can improve the understanding and explanation of complex urban scenarios. They are useful in numerous remote sensing applications, including 3D reconstruction and city modeling, planning, visualization, disaster management, navigation, and decision-making. DSMs are typically derived from various acquisition techniques, such as photogrammetry, laser scanning, or synthetic aperture radar (SAR). DSMs generated from very-high-resolution optical stereo satellite imagery often suffer from mismatches, missing values, or blunders, resulting in coarse representations of building shapes. To overcome these problems, we propose a method for generating 3D surface models with building shapes refined to level of detail (LoD) 2 from stereo half-meter-resolution satellite DSMs using deep learning techniques. Specifically, we train a conditional generative adversarial network (cGAN) with an objective function based on least-squares residuals to generate an accurate LoD2-like DSM with enhanced 3D object shapes directly from the noisy stereo DSM input. In addition, to achieve building shapes close to LoD2, we introduce a new approach for generating an artificial DSM with accurate and realistic building geometries from City Geography Markup Language (CityGML) data, on which we then train the proposed cGAN architecture. The experimental results demonstrate the strong potential of creating large-scale remote sensing elevation models in which the buildings exhibit better-quality shapes and roof forms than those obtained from the matching process alone.
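The least-squares objective mentioned above corresponds to the LSGAN formulation, in which the usual sigmoid cross-entropy adversarial loss is replaced by squared residuals. A minimal numpy sketch follows, where score arrays stand in for discriminator outputs; no claim is made about the paper's exact target labels or loss weighting.

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    # Discriminator: push scores on real samples to 1 and on fakes to 0.
    return 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)

def lsgan_g_loss(d_fake):
    # Generator: push discriminator scores on generated samples to 1.
    return 0.5 * np.mean((d_fake - 1.0) ** 2)

# A perfect discriminator (real -> 1, fake -> 0) incurs zero loss:
print(lsgan_d_loss(np.ones(4), np.zeros(4)))  # 0.0
```

Because the penalty grows quadratically with the distance from the target score, least-squares residuals provide non-vanishing gradients even for samples the discriminator classifies confidently, which is the usual motivation for this objective.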
Moreover, the developed model is successfully applied to a city unseen during training to demonstrate its generalization capacity. The improvement of building shapes, including the recovery of disturbed boundaries and the robust reconstruction of precise rooftop geometries, is in demand. Remote sensing technology provides several ways to measure 3D urban morphology. Conventional ground surveying, stereo airborne or satellite photogrammetry, interferometric synthetic aperture radar (InSAR), and light detection and ranging (LIDAR) are the main data sources used to obtain high-resolution elevation information [1]. The main advantage of digital surface models (DSMs) generated from ground surveying and LIDAR is their good quality and detailed object representation. However, their production is costly and time-consuming, and covers relatively small areas compared with spaceborne remote sensing imagery [2]. SAR imagery is operational in all seasons and under different weather conditions. Nevertheless, its side-looking sensor principle is not well suited for building recognition a...
Recent technical developments have made it possible to supply large-scale satellite image coverage, which poses the challenge of analyzing this imagery efficiently. One important task in applications such as urban planning and reconstruction is the automatic extraction of building footprints. The integration of different kinds of information, now achievable thanks to the availability of high-resolution remote sensing data sources, makes it possible to improve the quality of the extracted building outlines. Recently, deep neural networks have been extended from image-level to pixel-level labelling, allowing dense prediction of semantic labels. Based on these advances, we propose an end-to-end U-shaped neural network that efficiently merges depth and spectral information within two parallel networks combined at a late stage for binary building mask generation. Moreover, as satellites usually provide high-resolution panchromatic images but only low-resolution multi-spectral images, we tackle this issue with a residual neural network block. It fuses the images of different spatial resolution at an early stage, before passing the fused information to the U-Net stream responsible for processing the spectral information. In a parallel stream, a stereo digital surface model (DSM) is also processed by a U-Net. Additionally, we demonstrate that our method generalizes to cities that are not included in the training data.
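The early-stage fusion of the high-resolution panchromatic band with the low-resolution multi-spectral bands can be sketched as follows, assuming a resolution ratio of four and simple nearest-neighbour upsampling; the published residual block would instead learn this fusion from data.

```python
import numpy as np

def early_fusion(pan, ms, scale=4):
    """Bring low-resolution multispectral bands onto the panchromatic
    grid and stack them with the pan band.

    Nearest-neighbour upsampling (pixel replication) is used here only
    for illustration; the stacked tensor is what a subsequent residual
    block would consume to learn the actual fusion.
    """
    ms_up = ms.repeat(scale, axis=1).repeat(scale, axis=2)  # (B, H, W)
    return np.concatenate([pan[np.newaxis], ms_up], axis=0)  # (B+1, H, W)

pan = np.zeros((8, 8))    # high-resolution panchromatic band
ms = np.zeros((4, 2, 2))  # four low-resolution spectral bands
print(early_fusion(pan, ms).shape)  # (5, 8, 8)
```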
Various deep learning applications benefit from multi-task learning with multiple regression and classification objectives by exploiting the similarities between the individual tasks. This can result in improved learning efficiency and prediction accuracy for the task-specific models compared with separately trained models. In this paper, we examine such influences for important remote sensing applications, namely elevation model generation and semantic segmentation from stereo half-meter-resolution satellite digital surface models (DSMs). Specifically, we aim to generate good-quality DSMs with complete and accurate level of detail (LoD)2-like building forms, and to assign an object class label to each pixel in the DSMs. For the label assignment task, we select the roof type classification problem, distinguishing between flat, non-flat, and background pixels. To realize these tasks, we train a conditional generative adversarial network (cGAN) with an objective function based on least-squares residuals and an auxiliary term based on normal vectors for further roof surface refinement. In addition, we investigate recently published deep learning architectures for both tasks and develop a final end-to-end network combining the models which, used separately, provide the best results for their individual tasks. Semantic segmentation assigns each pixel the class to which the object belongs; among such tasks, building footprint extraction and roof type classification are among the most challenging but important problems. It is common to use DSMs as input data for classification tasks regarding buildings [6,7], as depth information provides geometrical silhouettes and allows a better understanding of building forms.
Although many attempts have already been made at accurate pixel-wise classification [8,9], it remains a challenging task in practice due to the wide variety of building appearances. In most cases, each task, e.g., depth image generation and pixel-wise image classification, is tackled independently, although the tasks are closely connected. Solving multiple tasks jointly can enhance the performance of each individual task as well as reduce computation time. This observation leads to the advantages of multi-task (MT) learning. The approach of simultaneously improving the generalization performance of multiple outputs from a single input has been applied to numerous machine learning techniques. As a promising concept for convolutional neural networks (CNNs), MT learning has been successfully applied to a variety of problems, such as joint classification and semantic segmentation [10] or classification and object detection [11]. Because the individual tasks may conflict, MT learning is cast as the optimization of an MT loss that minimizes a linear combination of the contributing single-task loss functions. In this work, we aim to produce good-quality LoD2-like DSMs with realistic building geometries together with dense pixel-wise rooftop classification masks defining multiple classes, like ground, flat, and ...
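The linear combination of single-task losses described above is simply L = sum_i w_i * L_i. A minimal sketch follows; the weights here are arbitrary illustrative values, since the abstracts do not state how the actual weights were chosen.

```python
def multi_task_loss(losses, weights):
    """Linear combination of single-task losses.

    In practice the weights are hyperparameters (or learned, e.g. via
    task-uncertainty weighting); here they are fixed for illustration.
    """
    return sum(w * l for w, l in zip(weights, losses))

# e.g. a height-regression loss of 0.8 and a roof-type classification
# loss of 1.2, each weighted by 0.5:
print(multi_task_loss([0.8, 1.2], [0.5, 0.5]))  # 1.0
```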