Lane occupancy is a crucial indicator of traffic flow and is significant for traffic management and planning. However, predicting lane occupancy is challenging because numerous influencing factors, such as weather, holidays, and events, render the data nonstationary. To improve lane occupancy prediction accuracy, this study introduces a fusion model that combines the CT-Transformer (a CSPNet-Attention and Two-stage Transformer framework) and the Temporal Convolutional Network-Long Short-Term Memory (TCN-LSTM) model with Variational Mode Decomposition (VMD) for long-term lane occupancy prediction. First, VMD decomposes the original traffic flow data into multiple smooth subsequences. Next, the autocorrelation and partial autocorrelation coefficients of each subsequence are used to determine whether it exhibits seasonal characteristics. Based on these characteristics, each subsequence is assigned to either the CT-Transformer or the TCN-LSTM model for long-term lane occupancy prediction. Finally, the subsequence predictions from both models are combined to derive the ultimate lane occupancy prediction. The core CT-Transformer model, an enhancement of the GBT (Two-stage Transformer) model, comprises two stages: autoregression and prediction. The autoregressive stage generates preliminary predictions from historical data, which are fed into the prediction stage. There, the novel CSPNet-Attention mechanism replaces the conventional attention mechanism in the Encoder, reducing memory usage and computational cost and thereby improving the model's accuracy and robustness. Experiments on the public PeMS dataset demonstrate that the proposed model surpasses existing methods in long-term lane occupancy prediction, with good reliability and generalizability.
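The decompose-then-route step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the VMD subsequences are replaced by synthetic signals, the seasonal lag and autocorrelation threshold are assumed values, and the assignment of seasonal subsequences to the CT-Transformer (and non-seasonal ones to the TCN-LSTM) is an illustrative assumption.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at a given lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

def route_subsequence(sub, season_lag=24, threshold=0.5):
    """Route a decomposed subsequence to a model by its seasonality.

    Subsequences with strong autocorrelation at the candidate seasonal
    lag are sent to one model (labelled 'CT-Transformer' here as an
    assumption); the rest go to 'TCN-LSTM'. The lag and threshold are
    illustrative, not values from the paper.
    """
    r = autocorr(sub, season_lag)
    return "CT-Transformer" if abs(r) >= threshold else "TCN-LSTM"

# Synthetic stand-ins for two VMD subsequences (the paper decomposes
# real lane-occupancy data):
t = np.arange(24 * 14)  # two weeks of hourly samples
rng = np.random.default_rng(0)
seasonal = np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(t.size)
aperiodic = rng.standard_normal(t.size)

print(route_subsequence(seasonal))   # strong lag-24 correlation
print(route_subsequence(aperiodic))  # weak lag-24 correlation
```

In the full pipeline, each routed subsequence would be forecast by its assigned model, and the per-subsequence forecasts summed to reconstruct the final lane occupancy prediction.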