“…RL based on policy iteration (PI) is an effective method for solving optimization problems. PI alternates between policy evaluation and policy improvement within a critic–actor architecture, in which two neural networks (NNs), the critic NN and the actor NN, approximate the optimal cost function and the optimal control policy, respectively [14, 15]. Integral RL (IRL), an improvement on the RL algorithm, has been investigated in [4, 16, 17]. In [16], a novel PI algorithm with a critic–actor architecture, regarded as IRL, is proposed to solve the optimal control problem by introducing the integral Bellman equation, so that knowledge of the system's internal dynamics is no longer required; the integral term in the policy evaluation step can be treated as the reinforcement signal over the time interval.…”
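To make the alternation of policy evaluation and policy improvement concrete, the following is a minimal sketch of integral-RL policy iteration on a scalar linear-quadratic problem. It is an illustration under stated assumptions, not the algorithm of [16]: the plant dx/dt = a·x + b·u, the cost weights q and r, the interval length T, and all function names are hypothetical, and the critic is reduced to a single quadratic weight p in V(x) = p·x² instead of a neural network. The learner uses only the measured states at the ends of the interval, the accumulated cost, and the input gain b; the drift term a appears only inside the simulator that stands in for the real plant.

```python
import math


def simulate_interval(x0, k, a, b, q, r, T, steps=2000):
    """Stand-in for the real plant: run the policy u = -k*x for T seconds.

    Returns the state at the end of the interval and the integral of the
    running cost q*x^2 + r*u^2 (the integral reinforcement signal).
    """
    dt = T / steps
    x, reinforcement = x0, 0.0

    def f(s):
        # Closed-loop dynamics; the drift a is never exposed to the learner.
        return (a - b * k) * s

    for _ in range(steps):
        # Classical fourth-order Runge-Kutta step for the state.
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        x_next = x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        # Trapezoid rule for the running cost (with u = -k*x).
        reinforcement += 0.5 * dt * (q + r * k * k) * (x * x + x_next * x_next)
        x = x_next
    return x, reinforcement


def irl_policy_iteration(a, b, q, r, k0=0.0, x0=1.0, T=0.5, iterations=10):
    """Integral-RL policy iteration for V(x) = p*x^2 and u = -k*x.

    k0 must be stabilizing (assumption); with a < 0, k0 = 0 qualifies.
    """
    k, p = k0, None
    for _ in range(iterations):
        x_end, reinforcement = simulate_interval(x0, k, a, b, q, r, T)
        # Policy evaluation via the integral Bellman equation:
        #   p*x(t)^2 = integral of cost over [t, t+T] + p*x(t+T)^2
        p = reinforcement / (x0 * x0 - x_end * x_end)
        # Policy improvement: needs the input gain b, but not the drift a.
        k = b * p / r
    return p, k


if __name__ == "__main__":
    p, k = irl_policy_iteration(a=-1.0, b=1.0, q=1.0, r=1.0)
    print(f"learned value weight p = {p:.6f}, feedback gain k = {k:.6f}")
```

For the assumed values a = -1 and b = q = r = 1, the scalar Riccati equation reduces to p² + 2p - 1 = 0, whose positive root is √2 - 1, so the learned weight can be checked against that value. With a neural-network critic, the single division in the evaluation step would be replaced by a least-squares fit over several measured intervals.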