Traffic flow prediction is a key step toward effective guidance and control in intelligent transportation systems. To exploit the short-term non-stationarity and spatio-temporal correlation of traffic flow, we propose a spatio-temporal hybrid prediction model, called ST-VGBiGRU, based on an improved Variational Mode Decomposition (VMD), a Graph Attention Network (GAT), and a Bidirectional Gated Recurrent Unit (BiGRU) network. First, the traffic flow sequence is decomposed into a series of relatively stationary modal components using the VMD algorithm, which reduces its short-term non-stationarity. The high-frequency modal components are then denoised using the Fuzzy Entropy (Fuzzy En) method to improve the accuracy of the decomposition. Next, the GAT network captures the different levels of attention that the prediction node pays to its neighboring traffic nodes, thereby extracting richer spatial features of the traffic flow. Each modal component, now carrying spatial features, is then fed separately into the BiGRU network to capture its temporal correlation. The model parameters are trained with an improved RMSProp algorithm, which improves the model's prediction accuracy while converging faster than the standard RMSProp algorithm. To evaluate the ST-VGBiGRU model, we use the RTMC traffic dataset to conduct ablation experiments on the improved VMD module, the GAT module, and the improved BiGRU module. We additionally use the PeMS traffic dataset to conduct baseline experiments and multi-step prediction experiments against six other models. The results show that our model achieves better prediction performance than all of the baseline models.
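
To make the spatial step more concrete, the following is a minimal sketch of how a single-head graph attention layer could weight the neighbors of one prediction node. It follows the standard GAT formulation (a shared linear projection, a LeakyReLU-scored attention vector, and a softmax over the neighborhood) and is not the implementation used in ST-VGBiGRU. The function name `gat_attention`, the parameters `W` and `a`, and the toy feature values are all assumptions introduced for illustration.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU activation used to score node pairs."""
    return x if x > 0 else slope * x


def gat_attention(features, neighbors, W, a, node):
    """Single-head graph attention for one node (illustrative sketch).

    features  : list of per-node feature vectors
    neighbors : indices of the nodes attended to (may include `node` itself)
    W         : shared projection matrix, given as a list of rows
    a         : attention vector of length 2 * len(W)
    Returns (attention weights, aggregated feature vector).
    """
    def project(h):
        return [sum(w * x for w, x in zip(row, h)) for row in W]

    # Project the target node and each neighbor with the shared matrix W.
    z = {j: project(features[j]) for j in set(neighbors) | {node}}

    # Unnormalized score: LeakyReLU(a . [z_node || z_j]).
    scores = [
        leaky_relu(sum(ak * v for ak, v in zip(a, z[node] + z[j])))
        for j in neighbors
    ]

    # Numerically stable softmax over the neighborhood.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]

    # Attention-weighted sum of the projected neighbor features.
    agg = [
        sum(alpha[k] * z[j][d] for k, j in enumerate(neighbors))
        for d in range(len(W))
    ]
    return alpha, agg


# Toy example: three traffic nodes with two hypothetical features each.
features = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]]
W = [[1.0, 0.0], [0.0, 1.0]]   # identity projection, for readability
a = [0.0, 0.0, 1.0, 0.0]       # scores depend on the neighbor's first feature
alpha, agg = gat_attention(features, [0, 1, 2], W, a, node=0)
```

In this toy setting the neighbor with the largest first feature receives the largest weight. In a trained model, `W` and `a` would be learned jointly with the rest of the network, and the aggregated features would form the spatial input to the temporal stage.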