Building extraction in landslide-affected scattered mountainous areas is essential for sustainable development: it improves disaster risk management, fosters sustainable land use, safeguards the environment, and supports socio-economic advancement. The task nevertheless remains challenging. This study proposes a Res-Unet-based model to extract landslide-affected buildings from unmanned aerial vehicle (UAV) data in scattered mountainous regions, combining the feature extraction capability of ResNet with the precise localization of U-Net. A landslide-affected, scattered mountainous region within the Three Gorges Reservoir area was selected as a case study to validate the model’s performance. Experimental results indicate that Res-Unet achieves high accuracy and robustness in building recognition, with accuracy (ACC), intersection-over-union (IoU), and F1-score values of 0.9849, 0.9785, and 0.9892, respectively. This performance can be attributed to the combined architecture, which integrates the skip connections and symmetric structure of U-Net with the residual blocks of ResNet. The skip connections preserve low-level detail as spatial resolution is recovered in the decoder, which facilitates multi-scale feature extraction, while the residual blocks mitigate the vanishing gradient problem common in deep network training and thus allow more complex features to be learned. The proposed Res-Unet approach shows significant potential for the accurate recognition and extraction of buildings in complex terrain through the efficient processing of remote sensing images.
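
For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how ACC, IoU, and F1-score are commonly computed pixel-wise for a binary building mask. It assumes the standard confusion-matrix definitions; the abstract does not state the exact formulas or averaging scheme used in the study, and the function names and toy masks below are hypothetical illustrations, not the authors' evaluation code.

```python
from typing import Dict, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # 2-D binary mask: 1 = building, 0 = background


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int, int]:
    """Count pixel-wise (TP, FP, FN, TN) for two masks of equal shape."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Return ACC, IoU, and F1 under the standard binary definitions (assumed)."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    total = tp + fp + fn + tn
    union = tp + fp + fn
    return {
        "acc": (tp + tn) / total if total else 0.0,
        # Convention chosen here: two empty masks count as a perfect match.
        "iou": tp / union if union else 1.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if union else 1.0,
    }


# Toy 3x4 masks (hypothetical data, for illustration only).
truth = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
pred = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]

metrics = segmentation_metrics(pred, truth)
print(metrics)
```

When IoU and F1 are computed from the same confusion counts, they are related by F1 = 2·IoU / (1 + IoU), so the two values always rank models in the same order; ACC additionally rewards correctly classified background pixels.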