In unmanned aerial vehicle (UAV) large-scale scene modeling, challenges such as missed shots, low overlap, and data gaps due to flight paths and environmental factors, such as variations in lighting, occlusion, and weak textures, often lead to incomplete 3D models with blurred geometric structures and textures. To address these challenges, an implicit–explicit coupling enhancement for a UAV large-scale scene modeling framework is proposed. Benefiting from the mutual promotion of implicit and explicit models, we initially address the issue of missing co-visibility clusters caused by environmental noise through large-scale implicit modeling with UAVs. This enhances the inter-frame photometric and geometric consistency. Subsequently, we enhance the multi-view point cloud reconstruction density via synthetic co-visibility clusters, effectively recovering missing spatial information and constructing a more complete dense point cloud. Finally, during the mesh modeling phase, high-quality 3D modeling of large-scale UAV scenes is achieved by inversely radiating and mapping additional texture details into 3D voxels. The experimental results demonstrate that our method achieves state-of-the-art modeling accuracy across various scenarios, outperforming existing commercial UAV aerial photography software (COLMAP 3.9, Context Capture 2023, PhotoScan 2023, Pix4D 4.5.6) and related algorithms.