“…Many control approaches have been developed to solve similar problems. Among them, fuzzy logic [7], [13] and deterministic rule-based methods [14], [15] are widely used in online control because of their low development cost. Model predictive control (MPC) [16], [17], Pontryagin's minimum principle (PMP) [18], [19], and the equivalent consumption minimization strategy (ECMS) [20], [21] are typical real-time optimal control methods and can achieve performance close to the optimum. However, MPC relies heavily on prediction accuracy, while PMP and ECMS require considerable "tuning and calibration" effort to refine the co-state or the equivalence factor. Reinforcement learning [22], [23], a recently emerging method, does not require accurate models of the dynamic process or the environment.…”