In this paper, an off-policy game Q-learning algorithm is proposed for solving linear discrete-time non-zero-sum multi-player game problems. Unlike the existing Q-learning methods that solve the Riccati equation for multi-player games via on-policy learning, an off-policy game Q-learning method is developed for reaching the Nash equilibrium of multiple players. To this end, a non-zero-sum game problem is first formulated, and the value function and the Q-function defined from each player's individual performance index are rigorously proven to be linear quadratic forms. Then, based on dynamic programming and Q-learning, an off-policy game Q-learning algorithm is developed to find the control policies of the multi-player game, such that the Nash equilibrium is reached under the learned control policies. The merit of this paper lies in the fact that the proposed algorithm does not require the system model parameters to be known a priori and fully utilizes measurable data to learn the Nash equilibrium solution. Moreover, the Nash equilibrium solution remains unbiased when the proposed off-policy game Q-learning algorithm is implemented, even though probing noise is added to the control policies to maintain the persistent excitation condition, whereas a biased Nash equilibrium solution could be produced if on-policy game Q-learning were employed; this is another contribution of this paper.
INDEX TERMS: Adaptive dynamic programming, off-policy Q-learning, non-zero-sum game, Nash equilibrium, discrete-time systems.
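As a rough, minimal illustration of the off-policy Q-learning mechanism described in this abstract, the sketch below reduces the setting to a single player for brevity (the multi-player algorithm stacks all players' inputs into the Q-function kernel). The system matrices A and B below are hypothetical and are used only to simulate data; they are never revealed to the learner.

```python
import numpy as np

np.random.seed(0)

# Hypothetical open-loop-stable plant: used only to generate data,
# never given to the learning algorithm.
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
Q, R = np.eye(2), np.eye(1)
n, m = 2, 1
d = n + m

def quad_basis(z):
    # vech(z z^T) with doubled off-diagonal terms, so theta . phi(z) = z^T H z
    outer = np.outer(z, z)
    rows, cols = np.triu_indices(len(z))
    scale = np.where(rows == cols, 1.0, 2.0)
    return scale * outer[rows, cols]

def unvech(theta, dim):
    # Rebuild the symmetric Q-function kernel H from its upper-triangular parameters.
    H = np.zeros((dim, dim))
    H[np.triu_indices(dim)] = theta
    return H + H.T - np.diag(np.diag(H))

# Off-policy data collection: behavior policy = zero gain plus probing noise.
x = np.array([1.0, -1.0])
data = []
for _ in range(400):
    u = 0.5 * np.random.randn(m)           # probing noise for persistent excitation
    x_next = A @ x + B @ u
    data.append((x, u, x_next))
    x = x_next

# Q-learning policy iteration, reusing the same batch of data (off-policy).
K = np.zeros((m, n))                        # initial stabilizing gain (plant is stable)
for _ in range(15):
    Phi, rhs = [], []
    for xk, uk, xk1 in data:
        z = np.concatenate([xk, uk])                 # actually applied state-input pair
        z1 = np.concatenate([xk1, -K @ xk1])         # target policy at the next state
        Phi.append(quad_basis(z) - quad_basis(z1))
        rhs.append(xk @ Q @ xk + uk @ R @ uk)
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(rhs), rcond=None)
    H = unvech(theta, d)
    K = np.linalg.solve(H[n:, n:], H[n:, :n])        # policy improvement

print("learned feedback gain K =", K)
```

Because the Bellman identity Q(x_k, u_k) = r_k + Q(x_{k+1}, -K x_{k+1}) holds at the input actually applied to the system, the probing noise enters the regression consistently; this is the intuition behind the unbiasedness property claimed for the off-policy scheme, in contrast to on-policy updates that evaluate the noisy behavior policy itself.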
In this article, a novel off-policy cooperative game Q-learning algorithm is proposed for achieving optimal tracking control of linear discrete-time multiplayer systems subject to exogenous dynamic disturbance. The key strategy, for the first time, is to integrate reinforcement learning and cooperative games with output regulation under the discrete-time sampling framework to achieve data-driven optimal tracking control and disturbance rejection. Without knowledge of the state and input matrices of the multiplayer system, or of the dynamics of the exogenous disturbance and the command generator, the coordination equilibrium solution and the steady-state control laws are learned from data by a novel off-policy Q-learning approach, such that the multiplayer system can tolerate the disturbance and follow the reference signal in an optimal manner. Moreover, rigorous theoretical proofs of the unbiasedness of the coordination equilibrium solution and the convergence of the proposed algorithm are presented. Simulation results are given to show the efficacy of the developed approach.
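As a rough sketch of the problem structure described in this abstract, using generic output-regulation notation rather than the paper's own symbols, the exosystem-driven tracking setting can be written as
\[
x_{k+1} = A x_k + \sum_{j=1}^{N} B_j u_k^j + D w_k, \qquad
w_{k+1} = E w_k, \qquad
e_k = C x_k - F w_k,
\]
where the exosystem state $w_k$ stacks the exogenous disturbance and the command-generator (reference) states and $e_k$ is the tracking error. Each player's control is split as $u_k^j = -K_j\,(x_k - X w_k) + U_j w_k$, where the steady-state pair $(X,\{U_j\})$ solves the regulator equations
\[
X E = A X + \sum_{j=1}^{N} B_j U_j + D, \qquad 0 = C X - F,
\]
and the feedback gains $K_j$ come from the cooperative game (coordination equilibrium) learned from data.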
This paper presents a novel off-policy game Q-learning algorithm to solve the H∞ control problem for discrete-time linear multi-player systems with completely unknown system dynamics. The primary contribution of this paper is that the Q-learning strategy employed in the proposed algorithm is implemented as off-policy policy iteration rather than on-policy learning, since off-policy learning has well-known advantages over on-policy learning. All players strive jointly to minimize their common performance index while counteracting the disturbance, which tries to maximize that index; they finally reach the Nash equilibrium of the game, at which the disturbance attenuation condition is satisfied. To find the Nash equilibrium solution, the H∞ control problem is first transformed into an optimal control problem. Then an off-policy Q-learning algorithm is put forward within the typical adaptive dynamic programming (ADP) and game architecture, such that the control policies of all players can be learned using only measured data. More importantly, a rigorous proof that the Nash equilibrium solution obtained by the proposed off-policy game Q-learning algorithm is unbiased is presented. Comparative simulation results are provided to verify the effectiveness and demonstrate the advantages of the proposed method.
INDEX TERMS: H∞ control, off-policy Q-learning, game theory, Nash equilibrium.
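A generic discrete-time formulation consistent with this abstract (the symbols below are standard H∞-game notation, not necessarily the paper's) is
\[
x_{k+1} = A x_k + \sum_{j=1}^{N} B_j u_k^j + D w_k,
\]
with the common performance index
\[
J = \sum_{k=0}^{\infty} \Big( x_k^{\top} Q x_k + \sum_{j=1}^{N} (u_k^j)^{\top} R_j u_k^j - \gamma^2 w_k^{\top} w_k \Big),
\]
minimized jointly by the players $u^1,\dots,u^N$ and maximized by the disturbance $w$. A saddle point (Nash equilibrium) of this game with finite value guarantees, for zero initial state, the disturbance attenuation condition
\[
\sum_{k} \Big( x_k^{\top} Q x_k + \sum_{j} (u_k^j)^{\top} R_j u_k^j \Big) \le \gamma^2 \sum_{k} w_k^{\top} w_k,
\]
which is the H∞ requirement the algorithm targets.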
In this paper, a data-driven optimal control method based on adaptive dynamic programming and game theory is presented for solving the output-feedback H∞ control problem for linear discrete-time systems with multiple players subject to multi-source disturbances. We first transform the H∞ control problem into a multi-player game problem and characterize its theoretical solution according to game theory. Since the system state may not be measurable, we derive output-feedback-based control and disturbance policies through mathematical manipulation. Considering the advantages of off-policy reinforcement learning (RL) over on-policy RL, a novel off-policy game Q-learning algorithm handling mixed competition and cooperation among players is developed, such that the H∞ control problem can finally be solved for linear multi-player systems without knowledge of the system dynamics. Moreover, rigorous proofs of algorithm convergence and the unbiasedness of the solutions are presented. Finally, simulation results demonstrate the effectiveness of the proposed method.
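One common way to obtain output-feedback policies in this data-driven setting, given here as a sketch of the standard construction rather than the paper's exact derivation, is to reconstruct the state from a finite history of measured inputs and outputs:
\[
x_k = M_u \begin{bmatrix} u_{k-1} \\ \vdots \\ u_{k-N} \end{bmatrix}
    + M_y \begin{bmatrix} y_{k-1} \\ \vdots \\ y_{k-N} \end{bmatrix}
    =: M \zeta_k ,
\]
which holds for an observable system and a sufficiently long horizon $N$. Substituting $x_k = M\zeta_k$ into the quadratic Q-function turns it into a quadratic form in the measured data vector $\zeta_k$, so the players' policies and the worst-case disturbance can be expressed as linear feedback on $\zeta_k$ and learned by the off-policy game Q-learning recursion using only input-output data.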