In recent years, the information requirements of pumping stations have risen steadily. Predicting flow capacity provides an important reference for flood carrying capacity, water resource scheduling and water safety. To improve the accuracy, stability and generalization ability of the prediction model, a data-driven BiGRU–ARIMA method based on a self-attention mechanism is proposed to predict the flow capacity of a pump station. The bidirectional gated recurrent unit (BiGRU), a variant of the recurrent neural network (RNN), handles nonlinear components well, mitigates the difficulty of capturing long-range dependencies, and has a simple structure. The autoregressive integrated moving average (ARIMA) model has the advantage of being sensitive to linear components. First, features of the pre-processed pump station data are selected and screened using the Pearson correlation coefficient and a self-attention mechanism. Then, a BiGRU is used to model the nonlinear components of the data, and a dropout layer is added to avoid overfitting. The ARIMA model extracts the linear features of the resulting error terms, which serve as correction terms for the predictions of the BiGRU model. Finally, predictions of flow and water level are obtained, and the variation characteristics of flow capacity are analyzed through the relation between flow and water level. Actual production data from a Grade 9 pumping station of Miyun Reservoir are used as a case study to verify the validity of the model. Model performance is evaluated using mean absolute error (MAE), mean absolute percentage error (MAPE) and the linear regression correlation coefficient (R2).
The experimental results show that the SA–BiGRU–ARIMA hybrid prediction model achieves better prediction performance than the single ARIMAX model, the BiGRU model and the BP neural network.
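The residual-correction idea and the evaluation metrics named above can be illustrated with a minimal sketch. It is not the implementation used in this work: a least-squares AR(1) fit stands in for the full ARIMA model, the BiGRU is replaced by a given list of base predictions, R2 is computed as the coefficient of determination, and all data values and function names (`fit_ar1`, `flow_true`, `residuals`) are hypothetical.

```python
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (y_true must be nonzero)."""
    return 100.0 * mean(abs((t - p) / t) for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination (assumed meaning of R2 here)."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_ar1(errors):
    """Least-squares AR(1) coefficient for e[t] ~ phi * e[t-1], no intercept.

    A simplified stand-in for ARIMA, used only to show the correction step.
    """
    num = sum(cur * prev for cur, prev in zip(errors[1:], errors[:-1]))
    den = sum(prev * prev for prev in errors[:-1])
    return num / den if den else 0.0


# Hypothetical observed flow and base-model errors (observed minus predicted).
flow_true = [10.0, 10.5, 11.0, 10.8, 10.2]
residuals = [1.0, 0.8, 0.64, 0.512, 0.4096]
base_pred = [y - e for y, e in zip(flow_true, residuals)]

# Fit the linear error model, then add its one-step-ahead forecast of the
# error to each base prediction. The first point has no previous error,
# so it is left uncorrected.
phi = fit_ar1(residuals)
hybrid_pred = [base_pred[0]] + [
    base_pred[t] + phi * residuals[t - 1] for t in range(1, len(base_pred))
]

for name, pred in (("base", base_pred), ("hybrid", hybrid_pred)):
    print(name, mae(flow_true, pred), mape(flow_true, pred), r2(flow_true, pred))
```

In this toy series the errors decay geometrically, so the linear correction removes almost all of the remaining error after the first step. The coefficient is fitted on the same points it corrects, which is acceptable only for illustration; in practice the error model would be fitted on training residuals and applied to held-out data.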