In recent years, advances in computer hardware, graphics rendering algorithms, and computer vision have enabled the use of 3D building reconstructions in archeological structure restoration and urban planning. This paper addresses the reconstruction of realistic 3D models of building façades in urban environments for cultural heritage. The proposed approach extends our previous work on this topic, which introduced a methodology for accurate, realistic 3D façade reconstruction by defining and exploiting a relation between stereoscopic image and tacheometry data. In this work, we repurpose well-known deep neural network architectures from image segmentation and single-image depth prediction for the tasks of façade structural element detection, depth point-cloud generation, and protrusion estimation. The goal is to alleviate drawbacks of our previous design, yielding a more lightweight, robust, flexible, and cost-effective one.