Short-term passenger flow prediction (STPFP) helps ease traffic congestion and optimize the allocation of rail transit resources. However, passenger flow time series are nonlinear and nonstationary, which makes STPFP difficult. To address this issue, a hybrid model based on time series decomposition and a reinforcement learning ensemble strategy is proposed. First, an improved arithmetic optimization algorithm is constructed by adding sine chaotic mapping, a new dynamic boundary strategy, and adaptive t-distribution mutation, and is used to optimize the parameters of variational mode decomposition (VMD). Then, the original passenger flow data, which contain noise and irregular nonlinear, nonstationary variations, are decomposed into several intrinsic mode functions (IMFs) by the optimized VMD, which reduces the time-varying complexity of the series and improves its predictability. Next, the IMFs are grouped into series of different frequencies by fluctuation-based dispersion entropy, and diverse models are used to predict the different frequency series. Finally, to avoid the cumulative error caused by directly superposing the prediction result of each IMF, reinforcement learning is adopted to ensemble the multiple models and obtain the multistep passenger flow prediction result. Experiments on four subway station passenger flow datasets show that the proposed method outperforms all benchmark models. The proposed model can therefore help in evaluating the operating status of urban rail transit systems and in improving the level of passenger service.
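To make the sine chaotic mapping step concrete, the following minimal sketch shows one common way such a map can seed the initial population of an optimizer that searches over VMD parameters. It is an illustration under stated assumptions, not the exact formulation used in the proposed model: the map form x_{k+1} = a·sin(πx_k), the seed value, the function names, and the search ranges for the number of modes K and the penalty factor alpha are all hypothetical.

```python
import math


def sine_map_sequence(x0, length, a=1.0):
    """Iterate the sine map x_{k+1} = a * sin(pi * x_k).

    For 0 < a <= 1 and x0 in (0, 1), every value stays in [0, 1].
    """
    seq = []
    x = x0
    for _ in range(length):
        x = a * math.sin(math.pi * x)
        seq.append(x)
    return seq


def init_population(pop_size, bounds, x0=0.37):
    """Scale chaotic values in [0, 1] onto each parameter's search range."""
    dim = len(bounds)
    chaos = sine_map_sequence(x0, pop_size * dim)
    population = []
    for i in range(pop_size):
        candidate = []
        for j, (low, high) in enumerate(bounds):
            c = chaos[i * dim + j]
            candidate.append(low + c * (high - low))
        population.append(candidate)
    return population


# Assumed search ranges (not from the source): number of modes K and
# penalty factor alpha. K would be rounded to an integer before running VMD.
VMD_BOUNDS = [(2.0, 10.0), (100.0, 5000.0)]
population = init_population(pop_size=5, bounds=VMD_BOUNDS)
```

Compared with uniform random sampling, a chaotic sequence is deterministic for a given seed, which makes the initial candidates reproducible while still spreading them across the search range.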