Urban air temperature data at high spatio-temporal resolution are essential for understanding the urban heat island (UHI) effect and for developing effective mitigation strategies, particularly at the local scale. Obtaining air temperature at high spatial resolution across a whole city is challenging because the number of weather stations in cities is often limited, particularly in less developed cities. In this study, based on the existing weather station network in Guangzhou, China, we compare eight air temperature interpolation models and select the best-performing one to interpolate city-scale air temperature. The models are trained and validated using observational meteorological data from 321 weather stations in Guangzhou. Deep learning-derived land cover information and socio-economic data are encoded as explanatory variables. Regression kriging combined with multiple linear regression gives the best performance, with an average root mean squared error (RMSE) of 0.92 °C and a coefficient of determination (R²) of 0.959. Furthermore, the number and locations of existing weather stations can be optimized using the proposed model. Guided by k-means clustering on geocoordinates and land cover information, the number of weather stations currently operating in Guangzhou can be reduced by 50% (i.e., 160 weather stations) while retaining the model performance. This study proposes and demonstrates an effective model for obtaining city-scale air temperature at high spatio-temporal resolution from sparse weather station data, which is much needed by cities that want to enhance their city-scale air temperature mapping by adding new weather stations to their existing network.
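To illustrate the general idea of regression kriging combined with multiple linear regression, the following is a minimal sketch, not the implementation used in this study. It fits a linear trend on explanatory variables, then interpolates the regression residuals with ordinary kriging and adds the two parts together. Everything specific here is an assumption made for illustration: the function names, the exponential variogram with no nugget, the fixed `corr_range` value (in practice the variogram would be fitted to the empirical residual variogram), and the synthetic station data with two made-up covariates.

```python
import numpy as np


def fit_trend(covariates, temps):
    """Multiple linear regression by least squares (intercept first)."""
    design = np.column_stack([np.ones(len(covariates)), covariates])
    beta, *_ = np.linalg.lstsq(design, temps, rcond=None)
    return beta


def predict_trend(beta, covariates):
    """Evaluate the fitted linear trend at the given covariate rows."""
    design = np.column_stack([np.ones(len(covariates)), covariates])
    return design @ beta


def exponential_variogram(h, sill, corr_range):
    """Assumed variogram model: exponential, zero nugget."""
    return sill * (1.0 - np.exp(-h / corr_range))


def krige_residuals(station_xy, residuals, target_xy, sill, corr_range):
    """Ordinary kriging of the regression residuals at the target points."""
    n = len(station_xy)
    d_ss = np.linalg.norm(station_xy[:, None, :] - station_xy[None, :, :], axis=-1)
    d_st = np.linalg.norm(station_xy[:, None, :] - target_xy[None, :, :], axis=-1)

    # Kriging system with a Lagrange multiplier forcing weights to sum to 1.
    system = np.zeros((n + 1, n + 1))
    system[:n, :n] = exponential_variogram(d_ss, sill, corr_range)
    system[:n, n] = 1.0
    system[n, :n] = 1.0

    rhs = np.vstack([
        exponential_variogram(d_st, sill, corr_range),
        np.ones((1, len(target_xy))),
    ])
    weights = np.linalg.solve(system, rhs)[:n]  # drop the multiplier row
    return weights.T @ residuals


def regression_kriging(station_xy, station_cov, temps,
                       target_xy, target_cov, corr_range=5.0):
    """Trend from regression plus kriged residuals."""
    beta = fit_trend(station_cov, temps)
    residuals = temps - predict_trend(beta, station_cov)
    sill = residuals.var()  # crude sill estimate, assumed for this sketch
    kriged = krige_residuals(station_xy, residuals, target_xy, sill, corr_range)
    return predict_trend(beta, target_cov) + kriged


# Synthetic stand-in data (NOT the Guangzhou observations).
rng = np.random.default_rng(0)
station_xy = rng.uniform(0, 50, size=(40, 2))    # station coordinates, km
station_cov = rng.uniform(0, 1, size=(40, 2))    # two hypothetical covariates
temps = (28.0 + 3.0 * station_cov[:, 0] - 2.0 * station_cov[:, 1]
         + 0.3 * np.sin(station_xy[:, 0] / 10.0) + rng.normal(0, 0.2, 40))

target_xy = rng.uniform(0, 50, size=(5, 2))      # unsampled locations
target_cov = rng.uniform(0, 1, size=(5, 2))

predicted = regression_kriging(station_xy, station_cov, temps,
                               target_xy, target_cov)
print(predicted)
```

Because the assumed variogram has no nugget, this sketch reproduces the observed temperature exactly at each station location. A realistic evaluation would therefore score predictions at held-out stations, for example with cross-validated RMSE.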