A model with high accuracy and strong generalization performance helps prevent serious pollution incidents and supports decision-making in urban planning. This paper proposes a new neural network structure for predicting air pollutant concentrations, based on seasonal–trend decomposition using Loess (STL), where Loess denotes locally weighted scatterplot smoothing, and a dependency matrix attention mechanism (DMAttention) based on cosine similarity. The method uses STL for series decomposition, a temporal convolutional network combined with a bidirectional long short-term memory network (TCN-BiLSTM) for feature learning on the decomposed series, and DMAttention to emphasize the features of interdependent moments. The long short-term memory network (LSTM) and the gated recurrent unit network (GRU) serve as the baseline models in the experiments. To test the generalization performance of the model, short-term hourly forecasts were performed using PM2.5, PM10, SO2, NO2, CO, and O3 data. The experimental results show that the proposed model outperforms the baseline models in terms of root mean square error (RMSE) and mean absolute percentage error (MAPE). The MAPE values for the six pollutants are 6.800%, 10.492%, 9.900%, 6.299%, 4.178%, and 7.304%, respectively. Compared with the baseline LSTM and GRU models, the average reduction is 49.111% and 43.212%, respectively.
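
To make the evaluation metrics and the idea of a cosine-similarity dependency matrix concrete, the following is a minimal sketch in plain Python. The RMSE and MAPE functions follow their standard definitions. The attention part is only an assumption about how a cosine-similarity-based mechanism might be written: the function names (`dependency_matrix`, `cosine_attention`) and the softmax row normalization are illustrative choices, not the DMAttention formulation used in the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


def cosine_similarity(u, v):
    """Cosine of the angle between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def dependency_matrix(features):
    """Pairwise cosine similarity between per-time-step feature vectors."""
    return [[cosine_similarity(u, v) for v in features] for u in features]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cosine_attention(features):
    """Assumed illustration: re-express each time step as a weighted sum of
    all time steps, with weights from a softmax over its similarity row."""
    weights = [softmax(row) for row in dependency_matrix(features)]
    dim = len(features[0])
    return [
        [sum(w * f[d] for w, f in zip(w_row, features)) for d in range(dim)]
        for w_row in weights
    ]


if __name__ == "__main__":
    # Hypothetical hourly concentrations, not data from the paper.
    actual = [50.0, 40.0, 80.0]
    predicted = [45.0, 44.0, 80.0]
    print("RMSE:", rmse(actual, predicted))
    print("MAPE (%):", mape(actual, predicted))

    # Hypothetical hidden features for three time steps.
    hidden = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    print(cosine_attention(hidden))
```

In this sketch, time steps whose feature vectors point in similar directions receive larger mutual weights, which is one plausible reading of emphasizing interdependent moments. MAPE is undefined when an observed value is zero, so a real pipeline would need to handle such readings explicitly.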