A digital surface model (DSM) captures the geometry and structure of an urban environment, in which buildings are the most prominent objects. Built-up areas change over time as cities expand rapidly: new buildings are constructed, existing ones are extended, and old ones are torn down. As a result, 3D surface models can improve the understanding and explanation of complex urban scenarios. They are useful in many remote sensing applications, including 3D reconstruction and city modeling, planning, visualization, disaster management, navigation, and decision-making. DSMs are typically derived with acquisition techniques such as photogrammetry, laser scanning, or synthetic aperture radar (SAR). DSMs generated from very high resolution optical stereo satellite imagery have a high resolution but often suffer from mismatches, missing values, or blunders, resulting in a coarse representation of building shapes. To overcome these problems, we propose a method for generating 3D surface models with building shapes refined to level of detail 2 (LoD2) from half-meter resolution stereo satellite DSMs using deep learning techniques. Specifically, we train a conditional generative adversarial network (cGAN) with an objective function based on least-squares residuals to generate an accurate LoD2-like DSM with enhanced 3D object shapes directly from the noisy stereo DSM input. In addition, to achieve building shapes close to LoD2, we introduce a new approach for generating an artificial DSM with accurate and realistic building geometries from City Geography Markup Language (CityGML) data, on which we then train the proposed cGAN architecture. The experimental results demonstrate the strong potential of this approach for creating large-scale remote sensing elevation models in which buildings exhibit better-quality shapes and roof forms than those obtained from the matching process alone.
Moreover, the developed model is successfully applied to a different city, unseen during training, to demonstrate its generalization capacity.

building shapes, including the recovery of disturbed boundaries and robust reconstruction of precise rooftop geometries, is in demand.

Remote sensing technology provides several ways to measure 3D urban morphology. Conventional ground surveying, stereo airborne or satellite photogrammetry, interferometric synthetic aperture radar (InSAR), and light detection and ranging (LIDAR) are the main data sources used to obtain high-resolution elevation information [1]. The main advantage of digital surface models (DSMs) generated from ground surveying and LIDAR is their good quality and detailed object representation. However, their production is costly and time-consuming, and they cover relatively small areas compared with images acquired by spaceborne remote sensing [2]. SAR imagery is operational in all seasons and under different weather conditions. Nevertheless, it has a side-looking sensor principle that is not so useful for building recognition a...