As the urgency of PM2.5 prediction becomes increasingly ingrained in public awareness, deep-learning methods have been widely used in forecasting concentration trends of PM2.5 and other atmospheric pollutants. Traditional time-series forecasting models, like long short-term memory (LSTM) and temporal convolutional network (TCN), were found to be efficient in atmospheric pollutant estimation, but either the model accuracy was not high enough or the models encountered certain challenges due to their own structure or some specific application scenarios. This study proposed a high-accuracy, hourly PM2.5 forecasting model, poly-dimensional local-LSTM Transformer, namely PD-LL-Transformer, by deep-learning methods, based on air pollutant data and meteorological data, and aerosol optical depth (AOD) data retrieved from the Himawari-8 satellite. This research was based on the Yangtze River Delta Urban Agglomeration (YRDUA), China for 2020–2022. The PD-LL-Transformer had three parts: a poly-dimensional embedding layer, which integrated the advantages of allocating and embedding multi-variate features in a more refined manner and combined the superiority of different temporal processing methods; a local-LSTM block, which combined the advantages of LSTM and TCN; and a Transformer encoder block. Over the test set (the whole year of 2022), the model’s R2 was 0.8929, mean absolute error (MAE) was 4.4523 µg/m3, and root mean squared error (RMSE) was 7.2683 µg/m3, showing great accuracy for PM2.5 prediction. The model surpassed other existing models upon the same tasks and similar datasets, with the help of which a PM2.5 forecasting tool with better performance and applicability could be established.