Robust control for Markov jump linear systems with unknown transition probabilities – an online temporal differences approach

Chen, Yaogang; Wen, Jiwei; Luan, Xiaoli; Liu, Fei

doi:10.1177/0142331220940208

Cited by 5 publications

(5 citation statements)

References 44 publications


“…Then, it should be proved that the online form $Y_{i}(t)$ and the offline form $\tilde{Y}_{i}(t)$ converge to the same value. In this regard, $\tilde{Y}_{i}(t)$ is defined as follows:

\begin{align} \tilde{Y}_{i}(t + 1) & = \tilde{Y}_{i}(t) + \sum_{n = 0}^{N(t) - 1} γ_{i}(t) \, e_{i}(t, n) \, \tilde{d}(t, n), \end{align}

\begin{align} \tilde{d}(t, n) & = N_{r(t, n + 1)} + \frac{1}{η_{r(t, n + 1)}} S_{r(t, n + 1)}^{\mathrm{T}} Λ(\tilde{Y}(t)) S_{r(t, n + 1)} - \tilde{Y}_{r(t, n)}(t). \end{align}

It can be proved that the conditions (a), (b), and (c) of Lemma 2 in Reference 41 are satisfied, which completes the first step of the proof. Then the offline TD value function

{\tilde{Y}}_{i} (...

…”

Section: Results (mentioning)

confidence: 93%
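The offline update quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the cited authors' implementation: all quantities are taken as scalars, the step size `gamma` is one constant instead of the mode- and time-dependent γ_i(t), the eligibility `e_i(t, n)` is assumed to be a standard accumulating trace with a hypothetical decay parameter `lam`, and the operator Λ, which the excerpt does not define, is passed in as a caller-supplied function `Lam`.

```python
from typing import Callable, Dict, Sequence


def offline_td_update(
    Y: Dict[int, float],
    trajectory: Sequence[int],
    N: Dict[int, float],
    S: Dict[int, float],
    eta: Dict[int, float],
    Lam: Callable[[Dict[int, float]], float],
    gamma: float,
    lam: float,
) -> Dict[int, float]:
    """One offline update of Y over a complete mode trajectory r(t, 0..N(t)).

    Scalar sketch: Y, N, S, eta map each mode to a number. The trace rule
    and the constant step size are assumptions made for illustration.
    """
    modes = list(Y)
    trace = {i: 0.0 for i in modes}      # eligibility e_i(t, n), assumed form
    increment = {i: 0.0 for i in modes}  # sum over n of gamma * e_i * d
    lam_y = Lam(Y)                       # Lambda evaluated at the frozen Y(t)

    for n in range(len(trajectory) - 1):
        cur, nxt = trajectory[n], trajectory[n + 1]
        # Decay every trace, then bump the trace of the mode just visited.
        for i in modes:
            trace[i] *= lam
        trace[cur] += 1.0
        # Temporal difference d(t, n) from the second quoted equation.
        d = N[nxt] + S[nxt] * lam_y * S[nxt] / eta[nxt] - Y[cur]
        for i in modes:
            increment[i] += gamma * trace[i] * d

    # Y is only changed once the whole trajectory has been processed.
    return {i: Y[i] + increment[i] for i in modes}
```

Because the increments are accumulated and applied only after the loop, every temporal difference in one pass is computed from the same frozen Y(t), which is what distinguishes the offline form from an online form that updates after each observed jump.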

“…Remark The proposed algorithm has some differences from the previous TD algorithm for MJLS 41 . These differences are as the following: In the proposed algorithm, TPs of EMC are unknown, which is different from the case with unavailable one‐step TPs.…”

Section: Results (mentioning)

confidence: 95%

“…Proof Based on Lemma 2 and the proof of Theorem 1 in Reference 41, the proof that $Y_{i}(t)$ converges to $\sum_{j \in 𝕄} θ_{i j} P_{j}, \forall i \in 𝕄$ when $t \to \infty$ consists of two parts: First, an offline TD value function $\tilde{Y}_{i}(t)$ is defined, which is updated every time after observing a complete mode trajectory and converges to $\sum_{j \in 𝕄} θ_{i j} P_{j}$. Then, it should be proved that the online form $Y_{i}(t)$ and the offline form $\tilde{Y}_{i}(t)$ converge to the same value.…”

Section: Results (mentioning)

confidence: 99%
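The limit named in this statement, the transition-weighted sum of the P_j over next modes j, can be estimated from observed jumps alone, without access to the transition probabilities θ_ij. The sketch below shows that idea in its simplest form and is a deliberate simplification, not the cited algorithm: the values `P[j]` are assumed fixed and scalar (in the cited work they are coupled to the value function itself), and the step size 1/k is an assumed choice that makes the estimate equal to the sample mean.

```python
from typing import Dict, Iterable


def running_td_estimate(next_modes: Iterable[int], P: Dict[int, float]) -> float:
    """Estimate sum_j theta_ij * P[j] from observed successors of mode i.

    next_modes: modes observed immediately after each visit to mode i.
    P: assumed-known scalar value attached to each mode (illustrative only).
    """
    y = 0.0
    for k, j in enumerate(next_modes, start=1):
        # Move the estimate toward the newly observed target P[j];
        # the step size 1/k turns this into an incremental sample mean.
        y += (P[j] - y) / k
    return y
```

Feeding the function successors drawn with `random.choices` from a hidden row of transition probabilities gives an estimate that approaches the weighted sum as the number of observed jumps grows, which mirrors the "sufficiently rich" observation condition mentioned in the abstract of the citing paper.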

“…In order to solve the nonlinear control problems with uncertain or incomplete system dynamics, some recent new results based on neural network have been reported, such as adaptive optimal control 37‐39 and optimal tracking control 40 . Moreover, it is found that an online TD(λ) algorithm can be developed to effectively estimate the solutions of coupled algebraic Riccati equations (CAREs) for the robust control of MJLSs with no need to TPs information 41 …”

Section: Introduction (mentioning)

confidence: 99%


optimal control for semi‐Markov jump linear systems via TP‐free temporal difference () learning

Chen, Wen, Luan, et al. 2021

Intl J Robust & Nonlinear

Self Cite


In the present study, a temporal difference (TD) learning algorithm is proposed to solve the H∞ optimal control problem for semi‐Markov jump linear systems (S‐MJLSs). The proposed scheme is TP‐free so that it can be applied in cases without pre‐known transition probabilities of embedded Markov chain. Coupled algebraic Riccati equations (CAREs) implied with the analytical solution of control gains are derived by utilizing a S‐MJLS augmented with maximum sojourn time, which contributes to develop the TD learning algorithm. It is proved that for sufficiently rich enough jumping modes and jumping numbers observed online, the value function in TD algorithm converges to CAREs solutions. Finally, an example is carried out to evaluate the learning capability of TD algorithm and the effectiveness of the proposed control method.


“…In Cheng et al (2017), the output feedback control problem for nonhomogeneous Markov jump system has been investigated. Chen et al (2020) consider passive control for nonhomogeneous Markov jump systems with random communication delays.…”

Section: Introduction (mentioning)

confidence: 99%

Event-based asynchronous dissipative control for nonhomogeneous Markov jump systems

Wang, Zhou, et al. 2023

Transactions of the Institute of Measurement and Control


In this paper, the problem of the asynchronous dissipative control is investigated for a class of discrete-time nonhomogeneous Markov jump systems. Hidden Markov model is introduced to represent the asynchronization between the designed controller and the system. On the contrary, the mode-dependent event-triggered mechanism is formulated to alleviate the burden of data transmission in the communication channel. The Lyapunov function, which is dependent on both mode and uncertain parameters, is considered to obtain the sufficient conditions that make the system with strictly [Formula: see text]-[Formula: see text]-dissipative performance. Moreover, gains of the mode-dependent controller and coefficients of the event-triggered mechanism can be co-designed simultaneously. Finally, illustrative simulation and practical examples are provided to demonstrate the effectiveness of the proposed results.


Distributed filtering with time-varying topology: A temporal-difference learning approach in dual games

Xue, Wen, et al. 2025

Signal Processing

