This study proposes UNR‐Net, a U‐shaped convolutional neural network that combines non‐local attention mechanisms, Res2Net residual modules, and terrain information. The original U‐Net and a linear regression (LR) method serve as benchmarks. Overall, UNR‐Net shows promise in performing 10× downscaling of daily 2‐m temperature over North China at lead times of 1–7 days and outperforms both U‐Net and LR. Specifically, relative to LR, U‐Net and UNR‐Net achieve Nash‐Sutcliffe Efficiency coefficient values that are higher by 0.052 and 0.077, respectively. The corresponding improvements in pattern correlation coefficient are 0.013 and 0.016, while the root mean square error values are lower by 0.22 and 0.338, respectively. Additionally, the structural similarity index metric is higher by 0.033 and 0.015, respectively. Regions with large errors are primarily distributed in areas of complex terrain such as the Taihang Mountains, where UNR‐Net exhibits noticeable improvements. In addition, a 12‐component error decomposition method is proposed to analyze the error sources of the different models. In general, errors are smallest in summer, and the sequence error component proves to be the main source of error in the 2‐m temperature forecasts. Furthermore, UNR‐Net consistently shows the lowest errors across all 12 error components. Therefore, combining numerical weather prediction models with deep learning is very promising for downscaling temperature forecasts and could be applied to routine forecasting of other atmospheric variables in the future.
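The abstract reports skill as differences in several verification metrics without defining them. As a rough illustration only, the sketch below computes three of them (root mean square error, Nash‐Sutcliffe Efficiency, and pattern correlation) from their common textbook definitions. The function names, the flattening of a field into a plain sequence, and the toy values are assumptions made for this example; the study's exact formulations (for instance, how values are averaged over days, lead times, or grid points) may differ, and the structural similarity index is omitted.

```python
import math
from typing import Sequence


def rmse(forecast: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error: lower is better, 0 is a perfect match."""
    squared = sum((f - o) ** 2 for f, o in zip(forecast, observed))
    return math.sqrt(squared / len(observed))


def nse(forecast: Sequence[float], observed: Sequence[float]) -> float:
    """Nash-Sutcliffe Efficiency: 1 is perfect, 0 is no better than the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((f - o) ** 2 for f, o in zip(forecast, observed))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def pattern_correlation(forecast: Sequence[float], observed: Sequence[float]) -> float:
    """Centered Pearson correlation between two fields flattened to sequences."""
    mean_f = sum(forecast) / len(forecast)
    mean_o = sum(observed) / len(observed)
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecast, observed))
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_f * var_o)


# Toy values (not data from the study): a forecast with a constant +0.5 bias.
observed = [1.0, 2.0, 3.0, 4.0]
forecast = [1.5, 2.5, 3.5, 4.5]

print("RMSE:", rmse(forecast, observed))
print("NSE:", nse(forecast, observed))
print("PCC:", pattern_correlation(forecast, observed))
```

In this toy case the constant bias leaves the pattern correlation at 1 while the RMSE is 0.5 and the NSE drops to 0.8, which is one reason several complementary metrics are usually reported together: higher NSE and pattern correlation indicate better skill, whereas lower RMSE does.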