Predicting bus passenger flow is crucial for efficient resource allocation, frequency setting, and route optimization in bus transit systems. However, it remains challenging for a single model to simultaneously capture the periodicity, correlation, and nonlinearity of bus-passenger-flow time series. To address the complex volatility of these time series, this paper proposes a new hybrid bus-passenger-flow prediction model based on wavelet packet decomposition, an attention mechanism, and bidirectional long short-term memory (BiLSTM), with the aim of improving prediction accuracy. This study differs from existing work in three ways. First, the model combines a decomposition strategy with deep learning: wavelet packet decomposition breaks the original data into a series of smoother components, which allows the model to capture the temporal characteristics of passenger-flow data more adequately. Second, through the backward computation of the bidirectional network, the model can take into account information after the predicted moment. Third, an attention mechanism gives the model the ability to focus on important features and reduces the interference of irrelevant information. Prediction experiments are conducted on the Harbin bus-passenger-flow dataset. The experimental results show that the proposed model yields more accurate bus-passenger-flow predictions than five baseline models.
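The decomposition step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Haar wavelet (the abstract does not name the wavelet basis or the decomposition depth), it uses made-up passenger counts, and the function names are hypothetical. It splits a series into 2^level sub-band components that sum back to the original series. In a decomposition-based hybrid scheme, each component would typically be passed to its own predictor and the component forecasts recombined; the abstract does not state how the recombination is done.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_split(x):
    """One Haar analysis step: return (approximation, detail) coefficients."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_merge(approx, detail):
    """Inverse of haar_split: interleave the reconstructed samples."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def wpd_leaves(x, level):
    """Wavelet packet tree: unlike the plain wavelet transform, both the
    approximation and the detail branch are split again at every level."""
    nodes = [list(x)]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            a, d = haar_split(node)
            next_nodes.extend([a, d])
        nodes = next_nodes
    return nodes


def wpd_reconstruct(leaves):
    """Merge sibling nodes pairwise until only the root signal remains."""
    nodes = leaves
    while len(nodes) > 1:
        nodes = [haar_merge(nodes[i], nodes[i + 1])
                 for i in range(0, len(nodes), 2)]
    return nodes[0]


def wpd_components(x, level):
    """Return 2**level time-domain components whose sum equals x."""
    if len(x) % (2 ** level) != 0:
        raise ValueError("series length must be a multiple of 2**level")
    leaves = wpd_leaves(x, level)
    components = []
    for k in range(len(leaves)):
        # Keep only leaf k and zero the others, then invert the transform.
        masked = [leaf if i == k else [0.0] * len(leaf)
                  for i, leaf in enumerate(leaves)]
        components.append(wpd_reconstruct(masked))
    return components


# Hypothetical passenger counts for eight consecutive time slots.
flow = [120.0, 135.0, 160.0, 210.0, 190.0, 150.0, 140.0, 130.0]
components = wpd_components(flow, level=2)
recombined = [sum(c[t] for c in components) for t in range(len(flow))]
```

Because the transform is linear, `recombined` matches `flow` up to floating-point rounding. The first component carries the smooth trend and the remaining components carry progressively finer fluctuations, which is the sense in which the decomposed components are smoother than the raw series.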