Accurate traffic data forecasting is essential for improving the efficiency of intelligent transportation systems. Existing traffic prediction models typically model spatial dependency based only on road connectivity, which overlooks hidden spatial dependencies and reduces prediction accuracy. In addition, the temporal dependency in traffic data involves a strict relative positional relationship between time points; existing models often ignore it, which makes temporal dependency difficult to model accurately. To address these problems, this paper proposes MSS-STT, a traffic data prediction method based on a Multi-Spatial Scale Spatio-Temporal Transformer. MSS-STT first employs multiple specialized spatial Transformer networks, each modeling a different spatial scale, to capture spatial dependencies and patterns at various levels. It also uses graph convolutional neural networks to extract static spatial structural features. A gating mechanism then fuses the spatial dependencies from the different spatial scales with the static spatial features. Finally, MSS-STT extracts temporal dependencies by considering both the order of time points and the varying contributions that different relative positions between time points in historical traffic data make to the prediction. Experiments on three real-world datasets from the Caltrans Performance Measurement System (PeMS) show that the proposed MSS-STT model outperforms state-of-the-art methods.
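
The two mechanisms named above, gated fusion of feature branches and attention that depends on the relative position between time points, can be illustrated with a minimal sketch. It is not the MSS-STT implementation: the abstract gives no equations, so the function names (`gated_fusion`, `relative_position_attention`), the scalar gate weights, and the additive bias indexed by the offset between time steps are all assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h_a, h_b, w_a=1.0, w_b=1.0, bias=0.0):
    """Fuse two feature vectors with an elementwise gate (illustrative only).

    z = sigmoid(w_a * a + w_b * b + bias); fused = z * a + (1 - z) * b.
    Scalar weights are an assumed simplification of a learned linear layer.
    """
    fused = []
    for a, b in zip(h_a, h_b):
        z = sigmoid(w_a * a + w_b * b + bias)
        fused.append(z * a + (1.0 - z) * b)
    return fused


def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def relative_position_attention(queries, keys, values, rel_bias):
    """Scaled dot-product attention over T time steps with a relative bias.

    rel_bias maps the offset (i - j) between query step i and key step j
    to a scalar added to the attention score; missing offsets get 0.
    This additive form is one common choice, assumed here.
    """
    d = len(queries[0])
    outputs, weights = [], []
    for i, q in enumerate(queries):
        scores = [
            sum(qc * kc for qc, kc in zip(q, k)) / math.sqrt(d)
            + rel_bias.get(i - j, 0.0)
            for j, k in enumerate(keys)
        ]
        w = softmax(scores)
        weights.append(w)
        outputs.append([
            sum(w[j] * values[j][c] for j in range(len(values)))
            for c in range(len(values[0]))
        ])
    return outputs, weights


# Hypothetical usage: fuse a "dynamic" and a "static" spatial feature vector,
# then attend over three time steps while favoring the immediately preceding step.
fused = gated_fusion([0.2, 1.5], [0.8, -0.3])
steps = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = relative_position_attention(steps, steps, steps, {1: 0.5})
```

In this sketch the gate decides, per feature, how much of each branch to keep, and the bias table lets two time points with the same content receive different attention weights depending on how far apart they are.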