With accelerating urbanisation, urban areas are subject to the cumulative, combined effects of various natural factors: changes in temperature cause thermal expansion or contraction of surface materials (rock, soil, etc.), and changes in precipitation and humidity increase the self-weight of soil as water infiltrates along cracks or pores in the ground. As a result, subsidence in urban areas has become a serious geological hazard. However, traditional neural network prediction models are limited both in examining the causal relationships between time series surface deformation data and multiple influencing factors and in using multiple influencing factors for predictive analysis. To this end, this paper takes the main urban area of Hohhot City, Inner Mongolia Autonomous Region, as the study area and uses Sentinel-1A data from March 2017 to February 2023 as the data source, from which time series deformation data were derived using the small baseline subset interferometric synthetic aperture radar (SBAS-InSAR) technique. On this basis, a sparrow search algorithm–convolutional neural network–long short-term memory (SSA-CNN-LSTM) neural network prediction model was built. Six factors, namely temperature, humidity, precipitation, and ground temperature at three depths below the surface (5 cm, 10 cm, and 15 cm), were taken as the model inputs, and the surface deformation data were taken as the model output. The correlation between the spatial and temporal evolution characteristics of ground subsidence in the urban area and the influencing factors was analysed using grey correlation analysis, which indicated that all six factors contribute to some extent to the deformation of the urban surface.
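As an illustration of the grey correlation analysis mentioned above, the following is a minimal sketch of the commonly used Deng formulation of the grey relational grade. It is not taken from the paper: the function names, the min-max normalisation, the resolution coefficient `rho = 0.5`, and all sample values are assumptions chosen for illustration only.

```python
def min_max(series):
    """Scale a series to [0, 1] so factors with different units are comparable."""
    lo, hi = min(series), max(series)
    if hi == lo:
        # A constant series carries no shape information; map it to zeros.
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


def grey_relational_grades(reference, factors, rho=0.5):
    """Return one grey relational grade per factor (hypothetical helper).

    reference: deformation time series.
    factors:   dict mapping a factor name to a series of the same length.
    rho:       resolution coefficient, assumed here to be the usual 0.5.
    """
    ref = min_max(reference)
    # Absolute difference between reference and factor at every epoch.
    deltas = {
        name: [abs(r - f) for r, f in zip(ref, min_max(series))]
        for name, series in factors.items()
    }
    flat = [d for row in deltas.values() for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0.0:
        # Every factor matches the reference exactly after scaling.
        return {name: 1.0 for name in deltas}
    # Grade = mean of the per-epoch relational coefficients.
    return {
        name: sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
        for name, row in deltas.items()
    }


# Made-up sample values, not data from the study.
deformation = [-1.0, -2.5, -3.1, -4.8, -6.0, -6.4]  # mm
factors = {
    "air_temperature": [-8.0, 2.5, 15.0, 22.0, 12.0, -3.0],  # deg C
    "precipitation": [2.0, 8.0, 30.0, 95.0, 40.0, 5.0],      # mm
}
grades = grey_relational_grades(deformation, factors)
for name, grade in grades.items():
    print(f"{name}: {grade:.3f}")
```

Grades lie in (0, 1], and a larger grade means the factor's series follows the shape of the deformation series more closely. In a setting like the one described, each of the six factors would be passed as one entry of `factors`.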
To verify the efficacy of this neural network prediction model, the prediction performance of the multilayer perceptron (MLP), backpropagation (BP), and SSA-CNN-LSTM models was compared and analysed, with correlation coefficients at feature points A1, B1, and C1 of approximately 0.92, 0.83, and 0.93, respectively. The results show that, compared with the traditional MLP and BP neural network models, the SSA-CNN-LSTM model achieves higher performance in predicting time series surface deformation in urban areas, which provides new ideas and methods for this area of research.
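The comparison above relies on correlation coefficients between observed and predicted deformation. The abstract does not state which coefficient is used; the sketch below assumes the Pearson coefficient and adds the root-mean-square error as a complementary measure. The function names and sample values are hypothetical and are not results from the study.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between two equal-length series (assumed metric)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0.0 or var_p == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root-mean-square error, in the same unit as the input series."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Made-up values for a single feature point, for illustration only.
observed = [-1.0, -2.5, -3.1, -4.8, -6.0, -6.4]   # mm
predicted = [-1.3, -2.2, -3.6, -4.5, -5.7, -6.9]  # mm
r = pearson_r(observed, predicted)
print(f"R = {r:.3f}, RMSE = {rmse(observed, predicted):.3f} mm")
```

Computing the same two measures for each model at each feature point gives a like-for-like basis for comparing MLP, BP, and SSA-CNN-LSTM predictions.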