2020
DOI: 10.1109/access.2020.2970760
H∞ Control for Discrete-Time Multi-Player Systems via Off-Policy Q-Learning

Abstract: This paper presents a novel off-policy game Q-learning algorithm to solve the H∞ control problem for discrete-time linear multi-player systems with completely unknown system dynamics. The primary contribution of this paper is that the Q-learning strategy in the proposed algorithm is implemented via off-policy policy iteration rather than on-policy learning, since off-policy learning has well-known advantages over on-policy learning. All players strive together to minimize…
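To make the idea concrete, below is a minimal numerical sketch of the kind of off-policy game Q-learning the abstract describes, written for the two-player zero-sum special case (one controller, one disturbance). The plant matrices A, B, D, the cost weights, the gain γ², and all function names are illustrative assumptions, not taken from the paper; the learner itself touches only the logged data.

```python
# Sketch of off-policy Q-learning policy iteration for a discrete-time
# linear-quadratic zero-sum game (an H-infinity control surrogate).
# All numbers below are illustrative assumptions, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical plant x_{k+1} = A x + B u + D w; used only to generate data,
# never by the learning loop itself.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
D = np.array([[0.1],
              [0.0]])
n, m, q = 2, 1, 1
Qx, Ru, gamma2 = np.eye(n), np.eye(m), 5.0   # stage cost x'Qx + u'Ru - gamma^2 w'w

def collect_data(steps=500):
    """Run an exploratory behavior policy and log (x, u, w, x_next)."""
    xs, us, ws, xns = [], [], [], []
    x = rng.standard_normal(n)
    for _ in range(steps):
        u = 0.5 * rng.standard_normal(m)      # exploration, not the target policy
        w = 0.5 * rng.standard_normal(q)
        xn = A @ x + B @ u + D @ w
        xs.append(x); us.append(u); ws.append(w); xns.append(xn)
        x = xn
    return xs, us, ws, xns

def off_policy_q_learning(data, iters=20):
    xs, us, ws, xns = data
    d = n + m + q
    K = np.zeros((m, n))                      # initial control gain (A is stable)
    L = np.zeros((q, n))                      # initial disturbance gain
    for _ in range(iters):
        # Policy evaluation: fit the Q-kernel H from the Bellman equation
        # z'Hz - z_next'H z_next = r, where z_next uses the *target* policies.
        Phi, y = [], []
        for x, u, w, xn in zip(xs, us, ws, xns):
            z = np.concatenate([x, u, w])
            zn = np.concatenate([xn, -K @ xn, L @ xn])
            Phi.append(np.kron(z, z) - np.kron(zn, zn))
            y.append(x @ Qx @ x + u @ Ru @ u - gamma2 * (w @ w))
        h, *_ = np.linalg.lstsq(np.asarray(Phi), np.asarray(y), rcond=None)
        H = h.reshape(d, d)
        H = 0.5 * (H + H.T)                   # z'Hz only determines the symmetric part
        # Policy improvement: stationarity of z'Hz in (u, w).
        Hux, Hwx = H[n:n+m, :n], H[n+m:, :n]
        blk = H[n:, n:]                       # [[Huu, Huw], [Hwu, Hww]]
        G = np.linalg.solve(blk, np.vstack([Hux, Hwx]))
        K, L = G[:m], -G[m:]                  # target policies u = -K x, w = L x
    return K, L, H

K, L, H = off_policy_q_learning(collect_data())
print("control gain K:\n", K)
print("worst-case disturbance gain L:\n", L)
```

The off-policy character sits in the evaluation step: the logged actions come from an exploratory behavior policy, while the Bellman equation is evaluated at the target policies u = -Kx and w = Lx, so one batch of data can be reused across every iteration. This is a sketch of the general technique, not the paper's multi-player algorithm.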

Cited by 5 publications (5 citation statements) · References 47 publications (103 reference statements)
“…Comparing (41) and (42) with (22) and (23), derived in this paper for solving output-feedback H∞ control of DT systems, one can easily see that solving (22) and (23) seems to be more difficult than solving (41) and (42), since (41) and (42) contain no coupling between the control policy gains and the disturbance policy gains; this stems from the natural difference between DT and CT systems.…”
Section: B. No Bias Analysis of Solutions (mentioning)
confidence: 99%
“…Theorem 1 is used to prove the uniqueness of the control policies and the disturbances in (23). Theorem 1: Given Assumption 2, there exists a unique matrix H such that (22) holds, and the unique optimal control policies and worst-case disturbances are then obtained from (23). Moreover, (23) is the same as (16).…”
Section: B. Output Feedback Control Design (mentioning)
confidence: 99%
“…In [35], a data-based policy-iteration Q-learning algorithm for zero-sum two-player games (ZS-TP-G) was developed for linear systems to eliminate the need for knowledge of the process dynamics. Recent game-theoretic contributions (some in nonzero-sum games) for nonlinear systems are reported in [36]–[38], within the more general framework of robust control [39]–[40].…”
Section: Introduction (mentioning)
confidence: 99%