Accurate knowledge of passenger flow distribution is crucial for effective crowd management in stations. However, the complexity and randomness of passenger flow, together with the unclear spatiotemporal correlations among functional areas within a station, make it challenging to predict both the spatiotemporal dynamics of inflow and its short-term future distribution trends. Emerging deep learning models offer a promising route to accurate passenger flow distribution prediction. We therefore propose a deep learning architecture, named “ST-Bi-LSTM,” that combines a bidirectional long short-term memory (Bi-LSTM) network with a spatial-temporal attention mechanism. We first outline the methodologies of Bi-LSTM, the DeepWalk-based spatial attention mechanism, and the temporal attention mechanism. The spatial attention mechanism extracts topology information from the station's spatial network and strengthens the representation of passenger flow features in highly correlated areas during forecasting, while the temporal attention Bi-LSTM captures temporal correlations. The architecture comprises four branches, dedicated to real-time station video monitoring data, spatial network topology, functional area attributes, and train timetables. Using in-station CCTV data, passenger travel behavior data, and train timetables, we then apply the architecture to the Tianjin West High-Speed Railway Station. A comparative analysis of prediction performance and time complexity against existing baseline models shows that ST-Bi-LSTM achieves superior accuracy and robustness, reducing RMSE by more than 10%. This study supports the shift in station management from passively responding to passenger flow dynamics to actively predicting them.
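As a rough illustration of what a temporal attention layer over Bi-LSTM outputs does, the sketch below weights per-time-step hidden states and pools them into a single context vector. It assumes plain dot-product scoring against a hypothetical learned `query` vector; the abstract does not specify the scoring function, so this is a generic example of temporal attention, not the authors' formulation. The names `temporal_attention` and `query` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(
    hidden: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Dot-product attention over T per-time-step hidden states.

    hidden: T vectors, assumed here to be concatenated forward/backward
            Bi-LSTM outputs (an assumption for illustration).
    query:  hypothetical learned query vector with the same width.
    Returns (attention weights over time steps, weighted context vector).
    """
    if not hidden:
        raise ValueError("need at least one time step")
    # Score each time step by its similarity to the query.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden]
    # Normalize scores into weights that sum to 1.
    weights = softmax(scores)
    # Context vector: attention-weighted sum of hidden states, per dimension.
    dim = len(hidden[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return weights, context


if __name__ == "__main__":
    # Three time steps of 2-dimensional (toy) hidden states.
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    w, c = temporal_attention(h, query=[1.0, 0.0])
    print(w, c)
```

Time steps whose hidden states align with the query receive larger weights, so the pooled context emphasizes the intervals most relevant to the forecast. In a trained model the query (or a small scoring network) would be learned jointly with the Bi-LSTM.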