2020
DOI: 10.1177/0142331220940208

Robust control for Markov jump linear systems with unknown transition probabilities – an online temporal differences approach

Abstract: In this paper, an online temporal differences (TD) learning approach is proposed to solve the robust control problem for discrete-time Markov jump linear systems (MJLS) subject to completely unknown transition probabilities (TP). The TD learning algorithm consists of two parts: policy evaluation and policy improvement. In the first part, by observing the mode jumping trajectories instead of solving a set of coupled algebraic Riccati equations, value functions are updated and approximate the TP-related matrices…
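The two-step loop the abstract describes (TD policy evaluation from observed mode jumps, then policy improvement) can be sketched in code. The sketch below is a generic arrangement under stated assumptions, not the paper's algorithm: the plant data, the learned matrices W[i] standing in for the "TP-related matrices", the step-size schedule, and the greedy gain update are all illustrative.

```python
import numpy as np

# Minimal sketch of TD-based policy iteration for a discrete-time MJLS
#   x_{k+1} = A[i] x_k + B[i] u_k,  with mode i jumping according to a TP
# matrix the learner never sees. All system data and the learned matrices
# W[i] (stand-ins for the TP-related matrices) are illustrative assumptions.

rng = np.random.default_rng(0)

# Hypothetical two-mode plant; both modes contractive, so K = 0 is stabilizing.
A = [np.array([[0.9, 0.2], [0.0, 0.7]]), np.array([[0.7, 0.1], [0.2, 0.8]])]
B = [np.array([[0.0], [1.0]]), np.array([[1.0], [0.5]])]
Q, R = np.eye(2), np.eye(1)
P_true = np.array([[0.7, 0.3], [0.4, 0.6]])   # unknown to the learner

K = [np.zeros((1, 2)) for _ in range(2)]      # initial stabilizing gains
W = [np.eye(2) for _ in range(2)]             # W[i] ~ sum_j p_ij Y_j

for sweep in range(10):
    # Policy evaluation: TD(0) along an observed mode trajectory. Sampling
    # the next mode j replaces the unavailable expectation over p_ij.
    i = 0
    for k in range(20000):
        j = int(rng.choice(2, p=P_true[i]))   # observe one mode jump i -> j
        Acl = A[j] - B[j] @ K[j]              # closed loop at the next mode
        Yj = Q + K[j].T @ R @ K[j] + Acl.T @ W[j] @ Acl   # bootstrapped value
        alpha = 100.0 / (100.0 + k)           # diminishing step size
        W[i] += alpha * (Yj - W[i])           # TD update, no TPs required
        i = j
    # Policy improvement: greedy gains from the learned coupled matrices.
    for m in range(2):
        K[m] = np.linalg.solve(R + B[m].T @ W[m] @ B[m], B[m].T @ W[m] @ A[m])

print("learned gains:", [k.round(3) for k in K])
```

The point the sketch illustrates is that observing the next mode j replaces the expectation over the unknown row of the TP matrix, so no transition probabilities are ever estimated explicitly.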

Cited by 5 publications (5 citation statements) · References: 44 publications
“…Then, it should be proved that the online form $Y_i(t)$ and the offline form $\tilde{Y}_i(t)$ converge to the same value. In this regard, $\tilde{Y}_i(t)$ is defined as follows:

$$\tilde{Y}_i(t+1) = \tilde{Y}_i(t) + \sum_{n=0}^{N(t)} \gamma_i(t)\, e_i(t,n)\, \tilde{d}(t,n),$$

$$\tilde{d}(t,n) = N_{r(t,n+1)} + \frac{1}{\eta_{r(t,n+1)}}\, S_{r(t,n+1)}^{\mathrm{T}}\, \Lambda\big(\tilde{Y}(t)\big)\, S_{r(t,n+1)} - \tilde{Y}_{r(t,n)}(t).$$

It can be proved that conditions (a), (b), and (c) of Lemma 2 in Reference 41 are satisfied, which completes the first step of the proof. Then the offline TD value function $\tilde{Y}_i(\ldots$…”
Section: Results
Confidence: 93% (mentioning)
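Read literally, the batch update above sweeps once over a recorded mode trajectory, accumulates eligibility-weighted TD errors, and applies them in one correction per mode. A structural sketch of that pattern follows; the accumulating-trace recursion for $e_i(t,n)$, the stand-in $\Lambda$ operator, and the per-mode data $S_i$, $\eta_i$, $N_i$ are all hypothetical placeholders, since the quotation does not define them.

```python
import numpy as np

rng = np.random.default_rng(1)
modes, dim, lam = 2, 2, 0.5

Y = [np.eye(dim) for _ in range(modes)]               # offline values Y_i(t)
S = [rng.standard_normal((dim, dim)) for _ in range(modes)]  # placeholder S_i
eta = [1.0, 2.0]                                      # placeholder eta_i
N_mat = [np.eye(dim), 0.5 * np.eye(dim)]              # placeholder N_i
P_hat = np.array([[0.6, 0.4], [0.3, 0.7]])            # coupling for the stand-in

def Lam_op(Y):
    # Stand-in for the TP-coupled operator Lambda(Y); a weighted mixture here.
    return [sum(P_hat[i, j] * Y[j] for j in range(modes)) for i in range(modes)]

traj = [0, 1, 1, 0, 1, 0, 0, 1]       # one recorded mode trajectory r(t, n)
gamma = 0.05                          # step size gamma_i(t)
e = np.zeros(modes)                   # eligibility traces e_i(t, n)
corr = [np.zeros((dim, dim)) for _ in range(modes)]

L = Lam_op(Y)                         # Lambda(Y(t)) frozen for the whole sweep
for n in range(len(traj) - 1):
    i, j = traj[n], traj[n + 1]
    e *= lam                          # assumed accumulating-trace recursion
    e[i] += 1.0
    d = N_mat[j] + (S[j].T @ L[j] @ S[j]) / eta[j] - Y[i]   # TD error d(t, n)
    for m in range(modes):
        corr[m] += e[m] * d           # accumulate sum_n e_i(t, n) d(t, n)

for m in range(modes):                # single batch correction per mode
    Y[m] = Y[m] + gamma * corr[m]
```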
“…Remark: The proposed algorithm has some differences from the previous TD algorithm for MJLS. 41 These differences are as follows: in the proposed algorithm, the TPs of the embedded Markov chain (EMC) are unknown, which is different from the case with unavailable one-step TPs.…”
Section: Results
Confidence: 95% (mentioning)
“…In Cheng et al. (2017), the output feedback control problem for nonhomogeneous Markov jump systems was investigated. Chen et al. (2020) considered passive control for nonhomogeneous Markov jump systems with random communication delays.…”
Section: Introduction
Confidence: 99% (mentioning)