Accurate prediction of PM2.5 concentration, a problem of major importance in environmental science and public health, remains a substantial challenge. Conventional PM2.5 prediction methods often struggle to capture the complex dynamics and nonlinear relationships inherent in multi-station meteorological data. To address this issue, we propose a deep learning model, the Meteorological Sparse Autoencoding Transformer (MSAFormer). Built on the Transformer architecture, MSAFormer comprises three components: a Meteorological Sparse Autoencoding Module, a Meteorological Positional Embedding Module, and a PM2.5 Prediction Transformer Module. The Sparse Autoencoding Module extracts salient features from high-dimensional, multi-station meteorological data. The Positional Embedding Module then applies a one-dimensional Convolutional Neural Network (1D CNN) to flatten the sparse-encoded features into a form suitable for the subsequent Transformer module. Finally, the PM2.5 Prediction Transformer Module uses a self-attention mechanism to model temporal dependencies in the input data and predict future PM2.5 concentrations. Experimental results show that MSAFormer predicts PM2.5 concentrations in Haidian District, Beijing, significantly more accurately than traditional methods. This research offers a new predictive tool for environmental science and illustrates the potential of deep learning for analyzing environmental meteorological data.
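The abstract does not specify layer sizes, the sparsity penalty, or exactly how the 1D CNN output feeds the attention layer, so the NumPy sketch below is only a minimal, untrained forward pass of a pipeline of this shape under assumed choices. It assumes a ReLU encoder with an L1 activation penalty for the sparse autoencoder, a same-padded 1D convolution along the time axis as the embedding step, one single-head self-attention layer with a residual connection, and a linear head that reads the last time step to produce a scalar PM2.5 estimate. All function names, shapes (24 hourly steps, 5 stations × 6 variables) and hyperparameters are hypothetical illustrations, not the authors' configuration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(n_features, d_sparse, d_model, kernel_size=3, seed=0):
    # Hypothetical random initialization; a real model would be trained.
    rng = np.random.default_rng(seed)
    s = 0.1
    return {
        "w_enc": rng.normal(0, s, (n_features, d_sparse)),
        "b_enc": np.zeros(d_sparse),
        "w_dec": rng.normal(0, s, (d_sparse, n_features)),
        "b_dec": np.zeros(n_features),
        "conv": rng.normal(0, s, (kernel_size, d_sparse, d_model)),
        "wq": rng.normal(0, s, (d_model, d_model)),
        "wk": rng.normal(0, s, (d_model, d_model)),
        "wv": rng.normal(0, s, (d_model, d_model)),
        "w_out": rng.normal(0, s, d_model),
    }

def sparse_encode(x, p):
    # Assumed sparse autoencoder encoder: ReLU projection of each time step's
    # multi-station meteorological vector. x: (T, n_features) -> (T, d_sparse)
    return relu(x @ p["w_enc"] + p["b_enc"])

def sparse_ae_loss(x, p, l1=1e-3):
    # Reconstruction MSE plus an L1 penalty that pushes codes toward sparsity.
    h = sparse_encode(x, p)
    x_hat = h @ p["w_dec"] + p["b_dec"]
    return float(np.mean((x - x_hat) ** 2) + l1 * np.mean(np.abs(h)))

def conv1d_embed(h, kernel):
    # Same-padded 1D convolution along time (assumed embedding step).
    # h: (T, C_in), kernel: (K, C_in, C_out) -> (T, C_out)
    k = kernel.shape[0]
    pad = k // 2
    hp = np.pad(h, ((pad, pad), (0, 0)))
    return np.stack([np.einsum("kc,kco->o", hp[t:t + k], kernel)
                     for t in range(h.shape[0])])

def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product self-attention over time steps.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (T, T)
    return softmax(scores, axis=-1) @ v

def msaformer_sketch_forward(x, p):
    h = sparse_encode(x, p)                 # (T, d_sparse)
    e = conv1d_embed(h, p["conv"])          # (T, d_model)
    a = e + self_attention(e, p["wq"], p["wk"], p["wv"])  # residual connection
    return float(a[-1] @ p["w_out"])        # scalar next-step estimate

# Usage with random stand-in data (not real observations).
T, n_stations, n_vars = 24, 5, 6
x = np.random.default_rng(1).normal(size=(T, n_stations * n_vars))
params = init_params(n_stations * n_vars, d_sparse=64, d_model=32)
print(msaformer_sketch_forward(x, params))
print(sparse_ae_loss(x, params))
```

In a trained version, the sparse autoencoder would typically be fit first with `sparse_ae_loss` and the prediction stages trained afterwards on observed PM2.5 targets; the sketch only shows how data flows between the three stages.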