Accurate urban planning requires three-dimensional (3D) building models with a high level of detail (LOD). However, most large-scale 3D building models are limited to a low LOD of 1–2, because higher-LOD models require detailed building elements such as walls, windows, doors, and roof shapes to be modeled, and this process is currently performed manually rather than automatically. In this study, an end-to-end framework for creating 3D building models was proposed that integrates multi-source data such as omnidirectional images, building footprints, and aerial photographs. The data sources were linked to one another through the building ID, which was assigned on the basis of their spatial locations. Building element information for the building exterior was then extracted, and detailed LOD3 3D building models were created. Experiments using data from Kobe, Japan, yielded high accuracy in the intermediate processes, including 86.9% accuracy in building matching, 88.3% pixel-based accuracy in building element extraction, and 89.7% accuracy in roof type classification. Eighty-one LOD3 3D building models were created in 8 h, demonstrating that the proposed method can create 3D building models that adequately represent the exterior of actual buildings.
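To illustrate what matching data sources to a building ID by spatial location can look like, the following is a minimal sketch and not the procedure used in this study. It assumes that each omnidirectional image has a capture position in the same planar coordinate system as the footprints, and it assigns the image to the footprint whose boundary is nearest. The function names, the sample building IDs, and the `max_distance` cutoff are all hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from 2D point p to the line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_footprint(p, polygon):
    """Distance from p to the boundary of a footprint given as a vertex list."""
    n = len(polygon)
    return min(
        point_segment_distance(p, polygon[i], polygon[(i + 1) % n])
        for i in range(n)
    )


def match_building_id(camera_xy, footprints, max_distance=30.0):
    """Return the ID of the nearest footprint, or None if none is close enough.

    max_distance is a hypothetical cutoff in the units of the coordinates.
    """
    best_id, best_dist = None, float("inf")
    for building_id, polygon in footprints.items():
        d = distance_to_footprint(camera_xy, polygon)
        if d < best_dist:
            best_id, best_dist = building_id, d
    return best_id if best_dist <= max_distance else None


# Hypothetical footprints (planar coordinates in metres), keyed by building ID.
footprints = {
    "B001": [(0, 0), (10, 0), (10, 8), (0, 8)],
    "B002": [(20, 0), (30, 0), (30, 8), (20, 8)],
}

print(match_building_id((12, -3), footprints))
print(match_building_id((100, 100), footprints))
```

A nearest-boundary rule ignores the camera's viewing direction and any occlusion by neighbouring buildings, so a practical matcher would likely need additional cues. The sketch only shows how a shared building ID can tie an image to a footprint.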