Many deep learning applications benefit from multi-task learning with multiple regression and classification objectives, because it exploits the similarities between individual tasks. This can improve the learning efficiency and prediction accuracy of the task-specific models compared to separately trained models. In this paper, we examine such influences for two important remote sensing applications: elevation model generation and semantic segmentation from stereo half-meter resolution satellite digital surface models (DSMs). Specifically, we aim to generate good-quality DSMs with complete and accurate level of detail (LoD) 2-like building forms and to assign an object class label to each pixel in the DSMs. For the label assignment task, we select the roof type classification problem, which distinguishes between flat, non-flat, and background pixels. To realize those tasks, we train a conditional generative adversarial network (cGAN) with an objective function based on least squares residuals and an auxiliary term based on normal vectors for further roof surface refinement. In addition, we investigate recently published deep learning architectures for both tasks and develop the final end-to-end network, which combines the models that provided the best results for their individual tasks when used separately.
... which the object belongs. In particular, building footprint extraction and roof type classification are among the most challenging yet important problems. It is common to use DSMs as input data for building-related classification tasks [6,7], as depth information provides geometrical silhouettes and allows a better understanding of building forms.
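
The objective mentioned in the abstract, a least-squares adversarial term combined with an auxiliary normal-vector term, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the forward-difference normal estimate, the cosine-based normal penalty, and the weights `w_adv` and `w_normal` are all assumptions made for illustration, and plain Python lists stand in for the tensors a deep learning framework would use.

```python
import math


def lsgan_generator_loss(disc_scores):
    """One common least-squares adversarial term: 0.5 * mean((D(G(x)) - 1)^2)."""
    return 0.5 * sum((s - 1.0) ** 2 for s in disc_scores) / len(disc_scores)


def surface_normals(height):
    """Unit surface normals of a height map given as a list of rows.

    Uses forward differences, so the result covers a (rows-1) x (cols-1) grid.
    """
    normals = []
    for r in range(len(height) - 1):
        row = []
        for c in range(len(height[0]) - 1):
            dz_dx = height[r][c + 1] - height[r][c]
            dz_dy = height[r + 1][c] - height[r][c]
            norm = math.sqrt(dz_dx ** 2 + dz_dy ** 2 + 1.0)
            row.append((-dz_dx / norm, -dz_dy / norm, 1.0 / norm))
        normals.append(row)
    return normals


def normal_loss(pred, target):
    """Mean of (1 - cosine similarity) between predicted and target normals."""
    total, count = 0.0, 0
    for row_p, row_t in zip(surface_normals(pred), surface_normals(target)):
        for p, t in zip(row_p, row_t):
            total += 1.0 - sum(a * b for a, b in zip(p, t))
            count += 1
    return total / count


def generator_objective(disc_scores, pred, target, w_adv=1.0, w_normal=1.0):
    """Weighted sum of both terms; the weights are hypothetical."""
    adv = lsgan_generator_loss(disc_scores)
    return w_adv * adv + w_normal * normal_loss(pred, target)


# Toy 2x2 height maps: a flat roof patch and a patch tilted along x.
flat = [[0.0, 0.0], [0.0, 0.0]]
tilted = [[0.0, 1.0], [0.0, 1.0]]
loss = generator_objective([0.0], tilted, flat)
```

In this toy case the normal term penalizes the tilted prediction because its surface normal deviates from the vertical normal of the flat target, while the adversarial term penalizes a discriminator score far from the "real" label of 1. How such terms are actually weighted and combined with other losses depends on the training setup.
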
Although many attempts have already been made at accurate pixel-wise classification [8,9], it remains a challenging task in practice due to the wide variety of building appearances.
In most cases, each task, e.g., depth image generation and pixel-wise image classification, is tackled independently, although the tasks are closely connected. Solving those multiple tasks jointly can enhance the performance of each individual task and reduce computation time. This observation motivates multi-task (MT) learning. The approach of simultaneously improving the generalization performance of multiple outputs from a single input has been applied to numerous machine learning techniques. As a promising concept for convolutional neural networks (CNNs), MT learning has been shown to handle a variety of problem combinations successfully, such as classification and semantic segmentation [10] or classification and object detection [11]. Because different tasks may conflict, MT learning is regarded as the optimization of an MT loss, which minimizes a linear combination of the contributing single-task loss functions.
In this work, we aim to produce good-quality LoD2-like DSMs with realistic building geometries together with dense pixel-wise rooftop classification masks, defining multiple classes, like ground, flat, and ...