Traffic flow prediction is essential for smart city management and planning, as it helps optimize traffic scheduling and improve overall traffic conditions. However, owing to the correlation and heterogeneity of traffic data, effectively integrating the captured temporal and spatial features remains a significant challenge. This paper proposes the spatial–temporal fusion gated transformer network (STFGTN), an attention-based model that integrates temporal and spatial features to capture the complex spatial–temporal dependencies in road networks. The self-attention mechanism enables the model to capture long-term dependencies and form a global representation of the time series. For temporal features, we incorporate a time embedding layer and a temporal transformer to learn temporal dependencies, which contributes to a more comprehensive and accurate understanding of spatial–temporal dynamic patterns across the entire time series. For spatial features, we employ a DGCN and a spatial transformer to capture global and local spatial dependencies, respectively. In addition, we propose two fusion gate mechanisms that effectively accommodate the complex correlation and heterogeneity of spatial–temporal information, yielding a more accurate reflection of actual traffic flow. Experiments on three real-world datasets demonstrate the superior performance of our approach.
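To make the fusion-gate idea concrete, the following is a minimal PyTorch sketch of one common way a learned gate can blend a temporal representation with a spatial one; the class name GatedFusion, the tensor shapes, and the specific sigmoid gating formula are illustrative assumptions, not the exact STFGTN design.

```python
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    """Illustrative gated fusion of temporal and spatial feature maps (assumed formulation)."""

    def __init__(self, d_model: int):
        super().__init__()
        # Linear projections that parameterize the gate z.
        self.proj_t = nn.Linear(d_model, d_model)
        self.proj_s = nn.Linear(d_model, d_model)

    def forward(self, h_t: torch.Tensor, h_s: torch.Tensor) -> torch.Tensor:
        # h_t, h_s: (batch, num_nodes, time_steps, d_model)
        # The gate decides, element-wise, how much temporal vs. spatial signal to keep.
        z = torch.sigmoid(self.proj_t(h_t) + self.proj_s(h_s))
        return z * h_t + (1.0 - z) * h_s


if __name__ == "__main__":
    fusion = GatedFusion(d_model=64)
    h_t = torch.randn(8, 207, 12, 64)  # e.g., output of a temporal transformer branch
    h_s = torch.randn(8, 207, 12, 64)  # e.g., output of a DGCN / spatial transformer branch
    fused = fusion(h_t, h_s)
    print(fused.shape)  # torch.Size([8, 207, 12, 64])
```

A convex combination driven by a sigmoid gate is a standard way to handle the heterogeneity of the two feature streams, since the model can lean on whichever representation is more informative at each node and time step.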