Plant sap flow is an important reference for assessing plant water consumption and estimating plant transpiration, so predicting it accurately is of great significance. Several deep learning models were built and compared using approximately 3 years of continuous eucalyptus sap flow time series data from the SAPFLUXNET open dataset together with six environmental factors: incident shortwave solar radiation, air temperature, air relative humidity, net radiation, vapor pressure deficit, and photosynthetic photon flux density. The experimental results show that the improved Transformer model, which introduces a two-step self-attention mechanism and a simplified design, holds a clear predictive performance advantage over the original Transformer, long short-term memory, gated recurrent unit, and temporal convolutional neural network models. In the short-term 1-h forecast, the mean squared error (MSE) and coefficient of determination (R²) of the improved Transformer model are 0.0191 and 0.965, respectively. Compared with the second-best model, the original Transformer, the MSE is reduced by 22.9% and R² is increased by 1.0%. The improved model also keeps a stable performance advantage in longer-term sap flow prediction. In the longest, 8-h-ahead prediction, the MSE is reduced by 14.9% and R² is increased by 3.0% relative to the second-best Transformer model. Taken together, the results show that the improved Transformer model makes more effective use of environmental information to achieve more accurate sap flow prediction over longer horizons. This study highlights the basic principle and validity of the two-step self-attention network structure and provides a valuable basis for developing more effective methods for predicting plant sap flow.
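The two evaluation metrics reported above, MSE and R², and the percentage changes between models can be computed as in the following minimal sketch. It assumes the standard definitions of both metrics and assumes that each percentage is taken relative to the baseline model's value; the text does not spell out either point. The function names and the toy numbers are illustrative only and are not taken from the study.

```python
def mse(y_true, y_pred):
    """Mean squared error between observed and predicted sap flow."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when the observations are constant")
    return 1.0 - ss_res / ss_tot


def relative_change_percent(new, baseline):
    """Percentage change of `new` relative to `baseline` (assumed convention)."""
    return (new - baseline) / baseline * 100.0


# Toy values for illustration only; these are not data from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(mse(observed, predicted))
print(r2(observed, predicted))
# A negative value means the metric decreased relative to the baseline.
print(relative_change_percent(0.75, 1.0))
```

Under this convention, a lower MSE gives a negative relative change and a higher R² gives a positive one, which matches how the reductions and increases are reported for the 1-h and 8-h horizons.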