Overflow and flooding can hardly be avoided in urban drainage systems (UDS). Real-time control (RTC) has proven effective at enhancing the use of existing systems for overflow and flooding mitigation (García et al., 2015; Kerkez et al., 2016; Schütze et al., 2002). Recently, a new RTC method based on reinforcement learning (RL) has been developed for flooding mitigation (Mullapudi et al., 2020; Saliba et al., 2020), marking a milestone toward smart water management. Despite the successful application of RL to UDS real-time control in modeling exercises, the effectiveness of different RL algorithms in the context of UDS remains unclear. Meanwhile, handing the control process over to an RL agent still carries risk, for two reasons. First, the consequence of implementing a control strategy given by an RL agent is unknown in practical application. Although value-based RL algorithms use neural networks to score the consequences of control actions (Mnih et al., 2015; Sutton & Barto, 2018), these neural networks are black-box models and do not allow for a reliable evaluation. Safe learning methods improve the reliability of RL by reducing uncertainty, enhancing the RL algorithm, and avoiding damage (Garcia & Fernández, 2015; Martin & Lope, 2009; Pal et al., 2019). However, few of them forecast the consequences, and none has been applied to the RTC of UDS. Second, the nonlinearity of neural networks causes the output of an RL agent to fluctuate when it faces different inputs, which influences the performance of the agent during control. This phenomenon also occurs in other applications of neural networks (Liu et al., 2021) and poses a risk in practical application. Therefore, forecasting the consequences of implementing RL control strategies and reducing the influence of fluctuations in the output of RL agents are both necessary to improve the reliability of RL.