PM2.5 pollution poses a serious threat to the atmospheric environment and to human health. To forecast PM2.5 concentrations accurately, this study proposes a hybrid model, EMD-SE-GWO-VMD-ZCR-CNN-LSTM. First, empirical mode decomposition (EMD) decomposes the PM2.5 series into subsequences, and sample entropy (SE) measures the complexity of each subsequence. Second, the hyperparameters of variational mode decomposition (VMD) are optimized with the Gray Wolf Optimization (GWO) algorithm, and VMD decomposes the high-complexity subsequences a second time. Next, the resulting sequences are divided into high-frequency and low-frequency groups according to their zero crossing rate (ZCR). The high-frequency sequences are predicted by a convolutional neural network (CNN), and the low-frequency sequences by a long short-term memory (LSTM) network. Finally, the predictions for the high-frequency and low-frequency sequences are reconstructed to obtain the final forecast. Experiments were conducted on data from three air quality monitoring stations in Beijing (1009A, 1010A, and 1011A). Compared with the single and hybrid benchmark models, the proposed model improved R² by an average of 2.63%, 0.59%, and 1.88% at the three stations, respectively, confirming the advantages of the proposed model.
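The two screening criteria in the pipeline, sample entropy for complexity and zero crossing rate for frequency, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The embedding dimension `m=2`, the tolerance `r = 0.2 * std(x)`, and the ZCR threshold `0.05` are common defaults or hypothetical values chosen for illustration, since the abstract does not state them. The helper `split_by_zcr` is likewise hypothetical. EMD, GWO-tuned VMD, and the CNN/LSTM predictors are not shown.

```python
import numpy as np


def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy SampEn(m, r) with tolerance r = r_factor * std(x) (assumed defaults)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = r_factor * np.std(x)

    def count_matches(length):
        # Use the same n - m starting points for both template lengths
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            # Chebyshev distance to every later template (self-matches excluded)
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= r))
        return count

    b = count_matches(m)      # matching pairs of length m
    a = count_matches(m + 1)  # matching pairs of length m + 1
    if a == 0 or b == 0:
        return float("inf")   # undefined: no matches found
    return float(-np.log(a / b))


def zero_crossing_rate(x):
    """Fraction of consecutive sample pairs whose signs differ."""
    s = np.sign(np.asarray(x, dtype=float))
    return float(np.sum(s[:-1] * s[1:] < 0)) / (len(s) - 1)


def split_by_zcr(components, threshold=0.05):
    """Hypothetical rule: ZCR above threshold -> high-frequency (CNN) group, else LSTM group."""
    high, low = [], []
    for idx, comp in enumerate(components):
        (high if zero_crossing_rate(comp) > threshold else low).append(idx)
    return high, low


t = np.arange(500)
fast = np.sin(2 * np.pi * t / 10 + 0.3)   # oscillates every 10 samples
slow = np.sin(2 * np.pi * t / 250 + 0.3)  # slow, trend-like component
noise = np.random.default_rng(0).normal(size=500)

print(sample_entropy(fast), sample_entropy(noise))  # the regular signal scores lower
high, low = split_by_zcr([fast, slow])
print(high, low)  # [0] [1]
```

In this sketch, a component whose ZCR exceeds the assumed threshold would be routed to the CNN and the rest to the LSTM. Likewise, components with high sample entropy would be candidates for the second VMD decomposition.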