2022
DOI: 10.1002/rnc.6191

Modified general policy iteration based adaptive dynamic programming for unknown discrete‐time linear systems

Abstract: In this article, the general policy iteration (GPI) method for the optimal control of discrete-time linear systems is studied. First, the existing result on the GPI method is recalled and some new properties are proposed. Based on these new properties, a model-based modified GPI algorithm is proposed with its convergence proof. Moreover, a data-driven implementation of the proposed method is introduced that does not use the information of the system matrices. Compared with the existing results, the condition to…
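For context, the sketch below shows the classical model-based policy iteration (Hewer-type) for the discrete-time LQR problem that GPI generalizes. It is a minimal illustration, not the paper's modified GPI or its data-driven implementation: the system matrices, weights, and initial stabilizing gain are assumed example values, and the only library calls used are standard SciPy solvers.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov, solve_discrete_are

# Illustrative discrete-time system and quadratic weights (assumed, not from the paper).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.eye(1)

def policy_iteration(A, B, Q, R, K0, iters=30):
    """Model-based policy iteration for discrete-time LQR.

    Policy evaluation: solve (A - B K)' P (A - B K) - P + Q + K' R K = 0.
    Policy improvement: K <- (R + B' P B)^{-1} B' P A.
    """
    K = K0
    for _ in range(iters):
        Acl = A - B @ K
        # solve_discrete_lyapunov solves  a X a' - X + q = 0, so pass Acl transposed.
        P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K

# An initial stabilizing gain (assumed; A - B K0 has eigenvalues inside the unit circle).
K0 = np.array([[1.0, 2.0]])
P_pi, K_pi = policy_iteration(A, B, Q, R, K0)

# Cross-check against the discrete algebraic Riccati solution.
P_are = solve_discrete_are(A, B, Q, R)
print(np.allclose(P_pi, P_are, atol=1e-6))
```

Per the abstract, the paper's data-driven implementation avoids the system matrices altogether; loosely, that means replacing the model-based Lyapunov solve above with quantities identified from measured input and state data, a step not shown in this sketch.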

Cited by 7 publications (4 citation statements). References 39 publications.
“…Due to the widespread application of ADP, we can observe its extensions into areas such as NZSG, event-triggered mechanism (ETM), and trajectory tracking. Meanwhile, the iterative methods of ADP include value iteration (VI) 16–19 and policy iteration (PI). 20–22 Ha et al. 23 elaborated a new cost function to develop a VI-based ADP framework to solve the tracking control problem for unknown systems. In Reference 24, a data-driven iterative ADP was proposed to address the nonlinear optimal control problem.…”
Section: Introduction (citation type: mentioning)
Confidence: 99%
“…Policy iteration (PI) and value iteration (VI) are common algorithms in RL, both of which include two processes: policy evaluation and policy improvement. Although most control algorithms are based on PI, 12–15 some VI algorithms 16,17 have also been developed. In the control environment, RL algorithms have been applied in the optimal control or tracking control 18 of single-agent systems as well as the optimal coordination control 19,20 of multi-agent systems.…”
Section: Introduction (citation type: mentioning)
Confidence: 99%
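As a concrete instance of the two processes mentioned in this excerpt, the policy evaluation and policy improvement steps can be written for the discrete-time LQR case as below; this is the standard textbook form, not notation taken from the cited papers.

```latex
% Policy evaluation: cost matrix P_i of the current gain K_i
(A - B K_i)^{\top} P_i (A - B K_i) - P_i + Q + K_i^{\top} R K_i = 0
% Policy improvement: greedy gain with respect to P_i
K_{i+1} = \left(R + B^{\top} P_i B\right)^{-1} B^{\top} P_i A
% Value iteration instead updates the value matrix directly:
P_{i+1} = Q + A^{\top} P_i A - A^{\top} P_i B \left(R + B^{\top} P_i B\right)^{-1} B^{\top} P_i A
```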
“…More interesting works on the ADP technique are referred to in the works of Han et al., Wang et al., and Zhang et al. 9–11 As is known to all, the objective of optimal control is to design an optimal control law that minimizes a predefined performance index. 12 There are various forms of performance indexes, such as the quadratic performance index, 13–17 the average performance index, 18,19 and the discounted performance index. 20–23 Among them, the discounted performance index is usually employed by researchers for optimal tracking control (OTC) problems, 24 in which the system output or state is required to track a reference trajectory with optimal control performance.…”
Citation type: mentioning
Confidence: 99%
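For illustration, one common form of the discounted performance index mentioned in this excerpt, written for a tracking problem; the symbols x_k, r_k, u_k, Q, R, and the discount factor γ are generic notation assumed for this sketch, not taken from the cited works.

```latex
J = \sum_{k=0}^{\infty} \gamma^{k}
    \left[ (x_k - r_k)^{\top} Q \, (x_k - r_k) + u_k^{\top} R \, u_k \right],
\qquad 0 < \gamma \le 1,
```
where $x_k$ is the state, $r_k$ the reference trajectory, and $u_k$ the control input.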