Flood disasters occur worldwide, and accurate flood risk prediction helps protect human life and property. Driven by topographic variation and rainfall, the water level fluctuates randomly and sharply during a flood, which introduces substantial noise and makes flood prediction more difficult. To improve forecast accuracy, this paper proposes a data-driven flood forecasting method that combines data preprocessing with a two-layer BiLSTM-Attention network. First, Variational Mode Decomposition (VMD) decomposes the water level data to reduce noise and produce suitable Intrinsic Mode Functions (IMFs). Then, an optimized two-layer attention-based Bidirectional Long Short-Term Memory (BiLSTM-Attention) network is constructed to predict each IMF. Finally, two optimization algorithms automatically determine the parameters of VMD and BiLSTM, which increases the model's self-adaptability. In particular, the inertia factor of particle swarm optimization (PSO) is improved, and the improved PSO is used to optimize five hyperparameters of the BiLSTM. The proposed model reduces storage errors on smaller training sets while still achieving good performance. Comparative experiments on three water level data sets from the Yangtze River in China show that the absolute error in peak height is within 2 cm and the relative error in peak arrival time is within 30%. Compared with LSTM, BiLSTM, CNN-BiLSTM-Attention, and other baselines, the proposed model reduces the root mean square error (RMSE) by at least 50% and is particularly advantageous for high-risk forecasting, when the water level exceeds the defense line and fluctuates sharply.
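The abstract does not specify how the PSO inertia factor is improved. As a minimal sketch under that gap, the Python below implements standard PSO with a linearly decreasing inertia weight, a common variant that is assumed here and is not necessarily the authors' modification. The function name `pso_minimize`, its parameters, and the toy objective are hypothetical. In the paper's setting, each particle's position would presumably encode the five BiLSTM hyperparameters, and the objective would be something like a validation error.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 w_max=0.9, w_min=0.4, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box with PSO using a linearly decreasing
    inertia weight (an assumed stand-in for the paper's improved inertia factor).

    bounds: list of (low, high) pairs, one per dimension (hypothetical interface).
    Returns (best_position, best_value).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialize positions uniformly inside the box; velocities start at zero.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    # Global best is the best personal best in the swarm.
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for t in range(n_iters):
        # Inertia weight shrinks from w_max to w_min: broad exploration early,
        # finer exploitation near the best solutions late.
        w = w_max - (w_max - w_min) * t / max(n_iters - 1, 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + cognitive pull + social pull.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clip it back into the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Toy usage: a shifted sphere function whose minimum is at (3, 3).
best, val = pso_minimize(lambda x: sum((xi - 3.0) ** 2 for xi in x),
                         [(-10.0, 10.0)] * 2)
```

Each particle tracks its own best position while the swarm shares a global best, and shrinking the inertia weight over the iterations trades early exploration for late convergence. If any of the tuned hyperparameters are integer-valued, the objective would need to round them before training the network.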