Effective passenger flow management is critical for improving service quality and alleviating congestion in metro networks. However, the dynamic nature of travel demand and the complex structure of metro networks make passenger flow control models difficult to build and solve, and the high computational cost of existing methods further limits their practical application. To address these challenges, this study proposes a new reinforcement learning (RL)-based method for passenger flow control, comprising three components: network state characterization, a passenger flow control model, and an RL model. The study then defines the “action”, “state”, and “reward” concepts in RL according to the decision variables, constraints, and objective function of the constructed passenger flow control programming model. An iterative interaction mechanism is introduced to synchronize the control schemes generated by the RL unit with the network states. Furthermore, to utilize computational resources efficiently, an Asynchronous Advantage Actor-Critic neural network (A3C-NN) is trained to solve the complex programming model. Finally, the proposed approach is validated through a case study using data from the Chengdu Urban Rail Transit (URT) network, demonstrating its effectiveness under various objectives, such as minimizing passenger waiting time, maximizing passenger turnover, and maximizing the number of served passengers.
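The action/state/reward mapping summarized above can be sketched with a toy, tabular one-step advantage actor-critic, used here as a single-worker stand-in for the paper's A3C-NN. All names, queue capacities, arrival rates, and control levels below are illustrative assumptions, not the actual model from the study: the state is a discretized queue length at a station, the action is an inflow-control level, and the reward is a negative waiting-time proxy.

```python
import math
import random

STATES = 11          # assumed queue lengths 0..10 (discretized station state)
ACTIONS = [0, 2, 4]  # assumed inflow-control levels: passengers admitted per step
CAPACITY = 10

def step(queue, admitted):
    """One control interval: admit passengers, then new arrivals join the queue."""
    queue = max(queue - admitted, 0)
    queue = min(queue + random.randint(0, 3), CAPACITY)  # assumed arrival process
    reward = -queue  # waiting-time proxy: fewer queued passengers is better
    return queue, reward

def sample_action(preferences):
    """Sample an action index from a softmax over actor preferences."""
    exps = [math.exp(p) for p in preferences]
    r = random.random() * sum(exps)
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e
        if acc >= r:
            return i
    return len(exps) - 1

def train(episodes=200, horizon=20, alpha=0.1, gamma=0.9):
    """Tabular advantage actor-critic: critic learns state values,
    actor preferences are nudged by the TD-error (advantage)."""
    prefs = [[0.0] * len(ACTIONS) for _ in range(STATES)]  # actor
    values = [0.0] * STATES                                # critic
    for _ in range(episodes):
        s = random.randint(0, CAPACITY)
        for _ in range(horizon):
            a = sample_action(prefs[s])
            s2, reward = step(s, ACTIONS[a])
            advantage = reward + gamma * values[s2] - values[s]  # TD error
            values[s] += alpha * advantage                       # critic update
            prefs[s][a] += alpha * advantage                     # actor update
            s = s2
    return prefs, values
```

In A3C, many such workers would run this update loop asynchronously against shared actor and critic parameters; the tabular version only illustrates the advantage-based update that couples the control scheme (action) to the evolving network state.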