Existing 3D city reconstruction from oblique photography produces only surface models, which lack semantic information about the urban environment and do not represent buildings as individual objects. Here, we propose a method for the semantic segmentation of 3D model data from oblique photography and for the construction of building monomers (individual building models). Mesh data were converted into point sets, which were then clustered into superpoint sets via rough geometric segmentation to facilitate subsequent feature extraction. In the local neighborhood computation of semantic segmentation, a neighborhood search method based on geodesic distances made the neighborhoods more reasonable, and feature information was retained via the superpoint sets. Considering the practical requirements of large-scale 3D datasets, this study offers a robust and efficient segmentation method that combines traditional random forest and Markov random field models to segment 3D scenes semantically. To address the need for modeling individual buildings, our method used the 3D mesh data of buildings as the data source for contour extraction. Model monomer construction and building contour extraction were based on mesh model slices and assessments of geometric similarity, which allowed both processes to be carried out simultaneously and automatically.
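
As an illustration of why a geodesic neighborhood can be more reasonable than a Euclidean one, the following minimal sketch approximates geodesic distance by the shortest path along mesh edges and collects all vertices within a given radius. It is not the implementation used in this study: the function names `build_edge_graph` and `geodesic_neighbors`, the edge-path approximation, and the toy folded mesh are all assumptions made for the example.

```python
import heapq
import math
from collections import defaultdict


def build_edge_graph(vertices, faces):
    """Adjacency map: vertex index -> {neighbor index: edge length}."""
    graph = defaultdict(dict)
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            w = math.dist(vertices[u], vertices[v])
            graph[u][v] = w
            graph[v][u] = w
    return graph


def geodesic_neighbors(graph, source, radius):
    """Return {vertex: distance} for vertices whose shortest
    edge-path distance from `source` is at most `radius`."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph[u].items():
            nd = d + w
            if nd <= radius and nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical toy mesh: a strip folded back over itself, so that
# vertices 0 and 4 are 0.1 apart in space but far apart on the surface.
vertices = [
    (0.0, 0.0, 0.0), (0.0, 1.0, 0.0),  # lower sheet, near end
    (2.0, 0.0, 0.0), (2.0, 1.0, 0.0),  # fold line
    (0.0, 0.0, 0.1), (0.0, 1.0, 0.1),  # upper sheet, near end
]
faces = [(0, 1, 2), (1, 3, 2), (2, 3, 4), (3, 5, 4)]

graph = build_edge_graph(vertices, faces)
near = geodesic_neighbors(graph, source=0, radius=1.5)
print(sorted(near))
```

In this toy case a Euclidean search of radius 1.5 around vertex 0 would also return vertices 4 and 5 on the opposite sheet, whereas the edge-path search keeps only vertices on the same local surface patch.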