Prolonged exposure to high concentrations of suspended particulate matter (SPM), especially fine particulate matter with an aerodynamic diameter of ≤2.5 μm (PM2.5), can seriously harm human health by inducing respiratory diseases and lung cancer. Accurate prediction of PM2.5 concentrations is therefore important for public health management and for governmental environmental management decisions. However, time-series models of PM2.5 concentration built on only a single region and a single specific time period have limited explanatory power, which restricts their spatiotemporal applicability. To address this problem, this paper constructs an optimized PM2.5 concentration prediction model based on a combined Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM) architecture. Hourly data on atmospheric pollutants, meteorological parameters, and Precipitable Water Vapor (PWV) for 10 cities in the Beijing-Tianjin-Hebei metropolitan area from 1–30 September of 2021 and 2022 were used as the training set, and the PM2.5 data from 1–7 October of 2021 and 2022 were used for validation. The experimental results show that, compared with the widely used Back Propagation Neural Network (BPNN) and Long Short-Term Memory (LSTM) models, the CNN-LSTM model reduces the average root mean square error (RMSE) by 25.52% and 14.30%, the average mean absolute error (MAE) by 26.23% and 15.01%, and the average mean absolute percentage error (MAPE) by 35.64% and 16.98%, respectively. In summary, of the three models compared, the CNN-LSTM model has the broadest applicability and the highest prediction accuracy in the Beijing-Tianjin-Hebei metropolitan area. The results of this study can serve as a reference for the relevant departments in the Beijing-Tianjin-Hebei metropolitan area when predicting PM2.5 concentrations and their trends over specific time periods.
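
The three error metrics named in the abstract, and the way a percentage improvement over a baseline model might be derived from them, can be illustrated with a minimal Python sketch. The abstract does not state the exact formula behind the reported percentages, so the relative-reduction definition used here is an assumption. The function names and the sample PM2.5 values are hypothetical and chosen only to make the arithmetic easy to follow; they are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    ratios = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def relative_reduction(baseline_error, model_error):
    """Assumed definition: percentage by which model_error undercuts baseline_error."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Hypothetical hourly PM2.5 values (ug/m^3), not taken from the study.
observed = [30.0, 40.0, 50.0, 60.0]
baseline_pred = [36.0, 32.0, 60.0, 48.0]   # stand-in for a baseline such as BPNN or LSTM
candidate_pred = [33.0, 36.0, 55.0, 54.0]  # stand-in for the CNN-LSTM model

for name, metric in (("RMSE", rmse), ("MAE", mae), ("MAPE", mape)):
    base = metric(observed, baseline_pred)
    cand = metric(observed, candidate_pred)
    print(f"{name}: baseline={base:.3f}, candidate={cand:.3f}, "
          f"reduction={relative_reduction(base, cand):.2f}%")
```

In this toy example every candidate error is exactly half the corresponding baseline error, so all three metrics show a 50% reduction. With real data the three percentages generally differ, as they do in the reported results, because each metric weights large and small errors differently.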