Abstract. 3D data generation often requires expensive data collection, such as aerial photogrammetry or LiDAR flights. Where such data are unavailable, for example when areas of interest are inaccessible from aerial platforms, the alternative sources can be highly heterogeneous. They may differ in accuracy, resolution and viewpoint, which challenges standard data processing workflows. We consider the case where only overview satellite images and ground-level GoPro images are available, which we call cross-view data because of the significant differences in viewpoint. For this case, this paper introduces a framework from our project that consists of several novel algorithms. The framework converts such challenging datasets into 3D textured mesh models that contain both rooftop and façade features. The necessary methods include 3D point cloud generation from satellite overview images and from ground-level images, geo-registration, and meshing. We first introduce the problem and discuss the potential challenges, then present the methods we propose to address them. Finally, we apply the proposed framework to a dataset of twelve satellite images and 150k video frames acquired with a vehicle-mounted GoPro camera, and we demonstrate the reconstruction results. We also compare our results with those produced by a straightforward processing pipeline that uses typical geo-registration and meshing methods.