2020
DOI: 10.1002/oca.2597
Optimal tracking control for non‐zero‐sum games of linear discrete‐time systems via off‐policy reinforcement learning

Abstract: In this article, a model-free off-policy reinforcement learning algorithm is applied to address the optimal tracking problem based on multiplayer non-zero-sum games for discrete-time linear systems. In contrast to the traditional method and the policy iteration method for solving optimal tracking problems, the proposed algorithm operates on measured system data rather than knowledge of the system dynamics. To carry out the proposed algorithm, an auxiliary augmented system is constructed via assem…
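The abstract's key idea is that a tracking policy for an augmented state-plus-reference system can be evaluated and improved from measured data alone, without reading the dynamics matrices. Below is a minimal Q-learning-style sketch of that idea, not the paper's exact algorithm: the matrices A, B, F and the weights Q, R are illustrative assumptions used only to simulate data, and the learner itself touches only the sampled tuples.

```python
import numpy as np

# Sketch: data-driven policy iteration on the augmented state X = [x; r].
# A, B (plant) and F (reference generator) simulate data only; the learner
# never reads them. This is an illustration, not the paper's algorithm.
np.random.seed(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
F = np.eye(2)                          # reference dynamics r(k+1) = F r(k)
Q, R, gamma = np.eye(2), np.eye(1), 0.8

n, m = 2, 1
nz = 2 * n                             # dimension of X = [x; r]

def features(X, u):
    # Upper-triangular monomials of z = [X; u], parameterizing Q(z) = z'Hz.
    z = np.concatenate([X, u]).ravel()
    return np.outer(z, z)[np.triu_indices(nz + m)]

K = np.zeros((m, nz))                  # initial admissible gain u = -K X
for _ in range(8):                     # policy iteration from data
    Phi, tgt = [], []
    for _ in range(100):
        x, r = np.random.randn(n, 1), np.random.randn(n, 1)
        X = np.vstack([x, r])
        u = -K @ X + 0.3 * np.random.randn(m, 1)   # behavior policy + noise
        Xn = np.vstack([A @ x + B @ u, F @ r])     # measured next sample
        e = x - r
        c = float(e.T @ Q @ e + u.T @ R @ u)       # one-step tracking cost
        # Bellman equation residual: Q(X,u) - gamma*Q(Xn, -K Xn) = c
        Phi.append(features(X, u) - gamma * features(Xn, -K @ Xn))
        tgt.append(c)
    h = np.linalg.lstsq(np.array(Phi), np.array(tgt), rcond=None)[0]
    # Rebuild symmetric H from the learned upper-triangular coefficients.
    U = np.zeros((nz + m, nz + m))
    U[np.triu_indices(nz + m)] = h
    H = (U + U.T) / 2.0
    # Greedy improvement: u = -(H_uu)^{-1} H_uX X.
    K = np.linalg.solve(H[nz:, nz:], H[nz:, :nz])
print("learned tracking gain K:\n", K)
```

The behavior policy injects probing noise while the Bellman target evaluates the current greedy policy, which is what makes the scheme off-policy.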

Cited by 12 publications (10 citation statements)
References 36 publications
“…The initial setting of the PPD is given as θ(1) = 1. All parameters in the learning laws (19) and (20) … 8 For the case of unknown system dynamics, the parameters m and b are designed by trial and error.…”
Section: Model-Free Adaptive FTC
Citation type: mentioning
confidence: 99%
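For context on the excerpt above, here is a minimal sketch of a compact-form model-free adaptive control (MFAC) loop: the pseudo-partial-derivative (PPD) estimate starts at θ(1) = 1 and is updated from input/output increments alone. The cited learning laws (19) and (20) are not reproduced here; this uses the standard projection-type PPD update, and the plant and gains η, μ, ρ, λ are illustrative assumptions.

```python
import numpy as np

# Sketch of compact-form MFAC: the controller sees only I/O data.
# Dynamic linearization assumption: dy(k+1) = phi(k) * du(k).
eta, mu, rho, lam = 0.5, 1.0, 0.6, 1.0
phi_hat = 1.0                       # initial PPD estimate, theta(1) = 1

def plant(y, u):                    # unknown to the controller; demo only
    return 0.6 * y + 0.5 * u / (1.0 + y**2)

T = 60
y, u = np.zeros(T + 1), np.zeros(T)
y_ref = np.ones(T + 1)              # constant reference
for k in range(1, T):
    du = u[k-1] - (u[k-2] if k >= 2 else 0.0)
    dy = y[k] - y[k-1]
    # Projection-type PPD learning law from measured increments.
    phi_hat += eta * du / (mu + du**2) * (dy - phi_hat * du)
    if abs(phi_hat) < 1e-5:         # standard reset keeps the estimate usable
        phi_hat = 1.0
    # Control law: one-step-ahead tracking using the PPD estimate only.
    u[k] = u[k-1] + rho * phi_hat / (lam + phi_hat**2) * (y_ref[k+1] - y[k])
    y[k+1] = plant(y[k], u[k])
print("final tracking error:", y_ref[T] - y[T])
```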
“…For the case of unknown system dynamics, controllers based on reinforcement learning (RL) have been developed with an actor-critic NN structure 18,19 . A nearly optimal solution can be obtained from the long-term performance index of the Bellman equation 20 . Because the Bellman equation involves future values, the critic NNs are tuned by the temporal difference (TD) error, which drives the learning toward an estimate of the solution of the Hamilton–Jacobi–Bellman (HJB) equation 21,22 .…”
Section: Introduction
Citation type: mentioning
confidence: 99%
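A minimal sketch of the critic tuning this excerpt describes: a critic with quadratic features is updated along the TD error to approximate the value function of the Bellman equation for a fixed policy. The system, gain K, and learning rate below are assumptions for illustration, not taken from the cited works.

```python
import numpy as np

# Sketch: TD(0) tuning of a quadratic critic V(x) = w' phi(x) for a fixed
# stabilizing policy u = -K x. All numbers are illustrative assumptions.
np.random.seed(1)
A = np.array([[0.95, 0.05], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
K = np.array([[0.2, 0.4]])
Q, R, gamma, alpha = np.eye(2), np.eye(1), 0.9, 0.01

def phi(x):                          # quadratic critic features
    z = x.ravel()
    return np.array([z[0]**2, z[0] * z[1], z[1]**2])

w = np.zeros(3)
x = np.random.randn(2, 1)
for _ in range(20000):
    u = -K @ x
    c = float(x.T @ Q @ x + u.T @ R @ u)
    xn = A @ x + B @ u
    delta = c + gamma * w @ phi(xn) - w @ phi(x)   # TD error
    w += alpha * delta * phi(x)                    # gradient-style update
    # Restart when the state decays, to keep the data exciting.
    x = xn if np.linalg.norm(xn) > 1e-3 else np.random.randn(2, 1)
print("critic coefficients [P11, 2*P12, P22]:", w)
```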
“…However, the removal rate of ionic charge species in soil can be influenced by electrical conductivity, mineral composition, porosity, and tortuosity, as well as the physicochemical properties of the soil [10]. The significant drawbacks of EK are electrode corrosion, dehydration, thermal effects, and focusing effects (which occur in soil where acidic (H⁺ ion) and alkaline (OH⁻ ion) fronts collide, causing the accumulation of metal ions in the soil and restricting their removal) [11]. In particular, the focusing effects are the main obstacle to removing heavy metals over long treatment periods [12].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…As is well known, RL methods have been widely used to solve the optimal control of NZS games, but few results have considered the optimal tracking control problem of NZS games. A model-free off-policy RL algorithm is proposed to solve the optimal tracking problem of discrete-time linear NZS games in Reference 51. Using RL methods, the output-regulation-based linear optimal tracking control problems for NZS games in discrete time and continuous time are studied in References 52 and 53, respectively.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
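To make the NZS-game setting concrete, here is a hedged model-based sketch of the coupled design that the data-driven methods above avoid solving directly: each player alternately evaluates its cost under the joint policy and best-responds to the other player's gain. All matrices are illustrative assumptions, and the iteration assumes an admissible (stabilizing) initialization.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Sketch: seeking a feedback Nash equilibrium of a two-player discrete-time
# NZS LQ game by alternating policy evaluation and best response.
A = np.array([[0.98, 0.1], [0.0, 0.9]])
B1 = np.array([[0.0], [1.0]])
B2 = np.array([[1.0], [0.0]])
Q1, Q2 = np.diag([2.0, 1.0]), np.diag([1.0, 2.0])
R11, R22 = np.eye(1), np.eye(1)

K1, K2 = np.zeros((1, 2)), np.zeros((1, 2))   # admissible start (A stable)
for _ in range(50):
    Acl = A - B1 @ K1 - B2 @ K2
    # Policy evaluation: each player's cost-to-go under the joint policy,
    # from the discrete Lyapunov equation P = Acl' P Acl + Q + K' R K.
    P1 = solve_discrete_lyapunov(Acl.T, Q1 + K1.T @ R11 @ K1)
    P2 = solve_discrete_lyapunov(Acl.T, Q2 + K2.T @ R22 @ K2)
    # Policy improvement: best response to the other player's current gain.
    K1 = np.linalg.solve(R11 + B1.T @ P1 @ B1, B1.T @ P1 @ (A - B2 @ K2))
    K2 = np.linalg.solve(R22 + B2.T @ P2 @ B2, B2.T @ P2 @ (A - B1 @ K1))
print("Nash gains:\nK1 =", K1, "\nK2 =", K2)
```

The data-driven algorithms the excerpt surveys replace these model-based Lyapunov solves with equations built from measured trajectories.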