This paper addresses multi-scale super-resolution of digital elevation models (DEMs) in remote sensing. We propose a dual-domain multi-scale attention fusion network (DDMAFN) that reconstructs the details of digital elevation images step by step through cascaded sub-networks. The model combines a wavelet guidance and separation module, multi-scale attention fusion blocks, a dilated convolutional inception module, and an edge enhancement module to improve feature extraction and fusion. A new loss function is designed to improve the model's robustness and stability. Experiments show that the proposed model outperforms 15 benchmark models on the PSNR, RMSE, MAE, RMSEslope, and RMSEaspect metrics. On the HMA dataset, the proposed model's PSNR increases by 0.89 dB (~1.81%) and its RMSE decreases by 1.22 m (~8.6%) compared with a state-of-the-art model. Compared with EDEM, which has the best elevation metrics, RMSEslope decreases by 0.79° (~16%). Ablation experiments verify the effectiveness and contribution of each DDMAFN component. Finally, on the SRTM dataset, the proposed model maintains superior performance even under interpolation-based degradation.