Reducing energy consumption in urban areas, especially in buildings, has recently attracted the attention of many city managers. Because thermal images have limited resolution, especially at edges, creating a high-resolution (HR) surface model from them is challenging. This research proposes a two-phase strategy to generate an HR four-dimensional thermal surface model of building roofs. In the single-source modification phase, an enhanced thermal orthophoto is produced by retraining the enhanced deep residual super-resolution (EDSR) network and then applying state-of-the-art structure from motion, semi-global matching, and space intersection. To overcome the limited resolution gain achievable with single-source methods, the resolution of the final surface model is further increased by combining the thermal data with visible unmanned aerial vehicle (UAV) images. To this end, after the visible orthophoto and digital surface model are generated, buildings and their boundaries are extracted using a multi-feature semantic segmentation method. Next, in the multi-source modification phase, a fine-registered enhanced thermal orthophoto is generated, and thermal edges are identified around the building boundaries. The visible and thermal boundaries are then matched, and any smoothing of temperature across the edges is eliminated. The results show that the average positional difference between the thermal edges and the building boundaries is reduced and that temperature smoothness at the building edges is completely eliminated.
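As an illustration of the evaluation idea mentioned above (the average positional difference between thermal edges and building boundaries), the following is a minimal sketch and not the procedure used in this research. It assumes both edge sets are available as co-registered binary masks on the same pixel grid; the function name `mean_edge_offset`, the `pixel_size` parameter, and the toy masks are hypothetical and chosen only for this example.

```python
import numpy as np


def mean_edge_offset(thermal_edges, building_boundary, pixel_size=1.0):
    """Mean distance from each thermal edge pixel to its nearest boundary pixel.

    Both inputs are assumed (for this sketch) to be boolean masks of equal
    shape on the same grid. pixel_size converts pixels to ground units.
    """
    t = np.argwhere(thermal_edges)       # (n, 2) row/col of thermal edge pixels
    b = np.argwhere(building_boundary)   # (m, 2) row/col of boundary pixels
    if len(t) == 0 or len(b) == 0:
        raise ValueError("both masks must contain at least one edge pixel")
    # Brute-force pairwise Euclidean distances, shape (n, m).
    # Fine for small masks; large rasters would need a distance transform.
    d = np.linalg.norm(t[:, None, :] - b[None, :, :], axis=2)
    return float(d.min(axis=1).mean() * pixel_size)


# Toy data: a vertical building boundary in column 4 of an 8x8 grid.
boundary = np.zeros((8, 8), dtype=bool)
boundary[:, 4] = True

# Hypothetical thermal edge before matching: displaced to column 1.
thermal_before = np.zeros((8, 8), dtype=bool)
thermal_before[:, 1] = True

# Hypothetical thermal edge after matching: aligned with the boundary.
thermal_after = np.zeros((8, 8), dtype=bool)
thermal_after[:, 4] = True

offset_before = mean_edge_offset(thermal_before, boundary)
offset_after = mean_edge_offset(thermal_after, boundary)
print(offset_before, offset_after)
```

In this toy case the offset drops from three pixels to zero once the thermal edge coincides with the boundary; passing a ground sampling distance as `pixel_size` would express the same quantity in metric units.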