Geothermal resources are an efficient, clean, and renewable source of energy. Retrieving land surface temperature (LST) from high-resolution remote sensing satellite images and searching the result for geothermal anomaly areas is an efficient exploration method. However, producing an LST retrieval image requires several calculation steps, which can cause a substantial loss of image information and resolution. Super-resolution reconstruction of LST retrieval images is therefore a current challenge in geothermal resource exploration. Although existing super-resolution methods for LST retrieval images can partially restore image quality, their overall restoration of the regional surface temperature information is still not ideal. We propose CS-Diffusion, a cross-scale reference image super-resolution model based on a diffusion model. First, we propose the Pre-Super-Resolution Network (PreNet), which improves both the quantitative indices and the visual quality of the images. Second, to reduce the white noise in the super-resolved images, we propose the Cross-Scale Reference Image Attention Mechanism (CSRIAM), which greatly reduces image noise and improves overall image quality. Compared with previous methods, our model improves quantitative indices such as Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM), improves visual quality, and better recovers geothermal anomalies. Our experimental results show that CS-Diffusion has a strong ability to restore the quality of LST retrieval images, and the restored images can contribute to subsequent geothermal resource exploration.