Accurate prediction of total phosphorus (TP) is crucial for the early detection of eutrophication in water bodies. However, predicting TP concentrations across canal sites is challenging because of the complex spatiotemporal dependencies among them. To address this issue, this study proposes GAT-Informer, a prediction method based on spatiotemporal correlations, to predict TP concentrations in the Beijing–Hangzhou Grand Canal Basin in Changzhou City. The method first creates a feature sequence for each site based on the time-lag relationship of TP concentration between sites. It then constructs spatiotemporal graph data by combining the actual river distance between sites with the correlation of their feature sequences. Next, a graph attention network (GAT) module extracts spatial features by fusing node features. Finally, the Informer network, which uses a sparse attention mechanism, efficiently extracts temporal features to simulate and predict TP concentrations at each site. The model was evaluated using the coefficient of determination (R²), mean absolute error (MAE), and root mean square error (RMSE), with experimental values of 0.9619, 0.1489%, and 0.1999%, respectively. The GAT-Informer model exhibits greater robustness and higher predictive accuracy than traditional water quality prediction models.
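The time-lag and graph-construction steps described above can be illustrated with a minimal sketch. It is not the study's implementation. The lag is assumed to be the shift that maximises the Pearson correlation between an upstream and a downstream series, and the edge weight is a hypothetical combination of exponential distance decay and that correlation. The function names, the `scale_km` parameter, and the toy data are all illustrative assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def best_lag(upstream, downstream, max_lag):
    """Return (lag, r): the shift in time steps that maximises the correlation
    between upstream[t] and downstream[t + lag]."""
    best = (0, pearson(upstream, downstream))
    for lag in range(1, max_lag + 1):
        r = pearson(upstream[:-lag], downstream[lag:])
        if r > best[1]:
            best = (lag, r)
    return best

def edge_weight(distance_km, corr, scale_km=10.0):
    """Hypothetical edge weight (an assumption, not the paper's formula):
    exponential decay with river distance times the non-negative correlation."""
    return math.exp(-distance_km / scale_km) * max(corr, 0.0)

# Toy TP series: the downstream site repeats the upstream signal two steps later.
upstream = [0.10, 0.12, 0.20, 0.35, 0.22, 0.15, 0.11, 0.10, 0.18, 0.30]
downstream = [0.09, 0.10, 0.10, 0.12, 0.20, 0.35, 0.22, 0.15, 0.11, 0.10]

lag, r = best_lag(upstream, downstream, max_lag=4)
w = edge_weight(distance_km=5.0, corr=r)
print(lag, round(r, 3), round(w, 3))
```

In a full pipeline, the lag-aligned sequences would serve as node features and the weights as the graph adjacency passed to the GAT module. The exact weighting rule used in the study is not stated here, so the product form above is only one plausible choice.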