Accurate and fine-grained prediction of PM2.5 concentration is of great significance for air quality control and for human physical and mental health. Traditional approaches, such as time series models, recurrent neural networks (RNNs) and graph convolutional networks (GCNs), cannot effectively integrate spatial–temporal and meteorological factors or manage dynamic edge relationships among scattered monitoring stations. In this paper, a spatial–temporal causal convolution network framework, ST-CCN-PM2.5, is proposed. The spatial effects of both multi-source air pollutants and meteorological factors are considered via a spatial attention mechanism. Time-dependent features are extracted in the causal convolution networks by stacked dilated convolutions and temporal attention. All the hyper-parameters of ST-CCN-PM2.5 are tuned by Bayesian optimization. The model is evaluated on data from air monitoring stations in Haikou against a series of baselines (AR, MA, ARMA, ANN, SVR, GRU, LSTM and ST-GCN). The main results are as follows: (1) For a single station, the RMSE, MAE and R2 values of ST-CCN-PM2.5 improved by 27.05%, 10.38% and 3.56% on average, respectively. (2) Across all stations, ST-CCN-PM2.5 achieves the best performance in win–tie–loss experiments. The numbers of winning stations are 68, 63 and 64 out of 95 stations in terms of RMSE (MSE), MAE and R2, respectively. In addition, the mean MSE, RMSE and MAE of ST-CCN-PM2.5 are 4.94, 2.17 and 1.31, respectively, and the R2 value is 0.92. (3) Shapley analysis shows that wind speed is the most influential factor in fine-grained PM2.5 concentration prediction, while the effects of CO and temperature are moderately significant. A Friedman test under different resampling settings further confirms the advantage of ST-CCN-PM2.5. ST-CCN-PM2.5 thus provides a promising direction for fine-grained PM2.5 prediction.
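As an illustration of the stacked dilated causal convolution mentioned above, the following is a minimal sketch in plain Python, not the ST-CCN-PM2.5 implementation. The function names, the kernel size of 2, the dilation rates 1, 2, 4 and the ReLU between layers are all assumptions made for the example; the abstract does not specify them. The sketch shows the two properties that matter: each output at time t depends only on inputs at or before t, and stacking layers with growing dilation widens the receptive field quickly.

```python
def causal_dilated_conv1d(x, w, dilation=1):
    """Causal dilated 1-D convolution (hypothetical helper).

    y[t] = sum_k w[k] * x[t - k * dilation], where inputs before the
    start of the series are treated as zero (left zero-padding).
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, wk in enumerate(w):
            idx = t - k * dilation
            if idx >= 0:  # never look at future or out-of-range samples
                acc += wk * x[idx]
        y.append(acc)
    return y


def stacked_causal_conv(x, kernels, dilations):
    """Apply several causal dilated layers in sequence, with ReLU between them.

    The ReLU is an assumed activation, chosen only for illustration.
    """
    h = list(x)
    for w, d in zip(kernels, dilations):
        h = [max(v, 0.0) for v in causal_dilated_conv1d(h, w, d)]
    return h


def receptive_field(kernel_size, dilations):
    """Number of past time steps (including the current one) seen by the stack."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # toy hourly readings
    kernels = [[0.5, 0.5]] * 3                          # assumed weights
    dilations = [1, 2, 4]                               # assumed dilation rates
    print(stacked_causal_conv(series, kernels, dilations))
    print(receptive_field(2, dilations))
```

With a kernel size of 2 and dilations 1, 2, 4, three layers cover 8 consecutive time steps, whereas three undilated layers of the same kernel size would cover only 4.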