Predicting the displacement of transmission towers is essential for early warning of transmission network deformation, yet the ground subsidence of tower foundations has rarely been predicted. In this study, we first used the multi-temporal interferometric synthetic aperture radar (MT-InSAR) approach to acquire deformation time series for the transmission lines in the Salt Lake area. Based on the k-Shape clustering method and field investigation results, towers #95 and #151, which show representative foundation deformation characteristics, were selected for displacement prediction. The trigger factors of transmission tower deformation were then analyzed by combining the field investigations with the characteristics of saline soil in the Salt Lake area. Next, the displacement and the trigger factors were decomposed by variational mode decomposition (VMD), which closely links the characteristics of the foundation saline soil to the influence of the trigger factors. To quantify the contribution of each trigger factor, the maximal information coefficient (MIC) was calculated, and the optimal trigger factors were selected. Finally, the hyperparameters of the long short-term memory (LSTM) neural network were optimized using a convolutional neural network (CNN) and the grey wolf optimizer (GWO). The results show that the optimized deep learning models outperform the initial model in both generalization ability and prediction accuracy, with the CNN–LSTM model achieving the highest accuracy in predicting the total displacement of tower #151 (RMSE and R² on the validation set are 0.485 and 0.972, respectively). Given the limited research on how multiple factors influence the ground subsidence displacement of transmission towers, the methodology of this study offers a new perspective for monitoring and early warning of ground subsidence disasters in transmission networks.
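
For reference, the two validation metrics quoted above can be computed as in the following minimal sketch. It assumes the standard definitions of RMSE and the coefficient of determination, which the abstract does not state explicitly. The function names and the sample values are hypothetical and are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the observed values are constant")
    return 1.0 - ss_res / ss_tot


# Made-up displacement values for illustration only (not from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"R^2  = {r_squared(observed, predicted):.3f}")
```

A lower RMSE and an R² closer to 1 both indicate a closer fit. RMSE is expressed in the same unit as the displacement series, whereas R² is dimensionless.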