2021
DOI: 10.1109/tac.2020.3037046

Learning Optimal Controllers for Linear Systems With Multiplicative Noise via Policy Gradient

Abstract: The linear quadratic regulator (LQR) problem has reemerged as an important theoretical benchmark for reinforcement learning-based control of complex dynamical systems with continuous state and action spaces. In contrast with nearly all recent work in this area, we consider multiplicative noise models, which are increasingly relevant because they explicitly incorporate inherent uncertainty and variation in the system dynamics and thereby improve robustness properties of the controller. Robustness is a critical …
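The multiplicative-noise LQR setup the abstract refers to can be illustrated with a small simulation. Everything below is a made-up toy instance — the matrices, noise scale, horizon, and the generic zeroth-order (random-search) gradient step are assumptions for illustration, not the paper's actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (made-up) 2-state, 1-input system with multiplicative noise:
#   x_{t+1} = (A + d_t * A1) x_t + (B + g_t * B1) u_t,
# where d_t, g_t are i.i.d. zero-mean scalars scaling the noise
# directions A1 and B1 — noise that multiplies the state and input.
A  = np.array([[1.0, 0.1], [0.0, 1.0]])
B  = np.array([[0.0], [0.1]])
A1 = np.array([[0.0, 0.1], [0.0, 0.0]])
B1 = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
sigma = 0.5

def cost(K, T=50, rollouts=100):
    """Monte Carlo estimate of the finite-horizon quadratic cost under u = -K x."""
    total = 0.0
    for _ in range(rollouts):
        x = rng.standard_normal(2)
        for _ in range(T):
            u = -K @ x
            total += x @ Q @ x + u @ R @ u
            d, g = sigma * rng.standard_normal(2)
            x = (A + d * A1) @ x + (B + g * B1) @ u
    return total / rollouts

# Generic zeroth-order policy-gradient step: perturb the gain K in a random
# direction and estimate a directional gradient from the two resulting costs.
K = np.zeros((1, 2))
for _ in range(10):
    U = rng.standard_normal(K.shape)
    r = 0.1
    grad = (cost(K + r * U) - cost(K - r * U)) / (2 * r) * U
    K = K - 1e-4 * grad
```

The point of the multiplicative model is visible in `cost`: the effective closed-loop matrix is random at every step, so a gain that is optimal for the nominal `(A, B)` pair can perform poorly, and the learned `K` hedges against that variation.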

Cited by 70 publications (58 citation statements)
References 42 publications
“…The LQR problem with unknown system matrices (without the actuator selection component) has been widely studied recently as a benchmark for reinforcement learning (e.g., [1,11,14,33,17,26]). The setting in this direction closest to ours is so-called model-based learning, where the algorithms estimate the system matrices using the system trajectories and design the control policy based on the estimated system matrices.…”
Section: Related Work
confidence: 99%
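The model-based learning setting described in this citation — estimate the system matrices from trajectories, then design the policy from the estimates — can be partially sketched as follows. This is a generic least-squares identification step on a made-up system, not the algorithm of any cited paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up "true" system, unknown to the learner.
A_true = np.array([[0.9, 0.2], [0.0, 0.8]])
B_true = np.array([[0.0], [1.0]])

# Excite the system with random inputs and record state transitions.
T = 500
X, U, Xn = [], [], []
x = np.zeros(2)
for _ in range(T):
    u = rng.standard_normal(1)
    x_next = A_true @ x + B_true @ u + 0.01 * rng.standard_normal(2)
    X.append(x); U.append(u); Xn.append(x_next)
    x = x_next

# Least squares: stack [x_t, u_t] and solve x_{t+1} ≈ [A B] [x_t; u_t].
Z = np.hstack([np.array(X), np.array(U)])            # shape (T, 3)
Theta, *_ = np.linalg.lstsq(Z, np.array(Xn), rcond=None)
A_hat, B_hat = Theta.T[:, :2], Theta.T[:, 2:]
```

With persistently exciting inputs, `A_hat` and `B_hat` converge to the true matrices as the trajectory length grows; a controller is then designed against the estimates (certainty equivalence).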
“…There has been some work on the case of noisy dynamics, but all in the setting of infinite horizon. In [26] the problem with multiplicative noise was discussed, using a relatively straightforward extension of the deterministic dynamics considered in the original framework. In the case of additive noise, [32] studies the global convergence of policy gradient and other learning algorithms for the LQR over an infinite time horizon and with Gaussian noise.…”
Section: Related Work
confidence: 99%
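The global-convergence analyses of policy gradient for infinite-horizon LQR mentioned here rest on an exact gradient formula, ∇C(K) = 2[(R + BᵀP_K B)K − BᵀP_K A]Σ_K, where P_K and Σ_K solve discrete Lyapunov equations. A minimal sketch on a made-up system (this is the generic formula from the policy-gradient-for-LQR literature, not a reproduction of any one paper's code):

```python
import numpy as np

# Made-up stable system and quadratic costs; control law u = -K x.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
Sigma0 = np.eye(2)   # initial-state covariance entering Sigma_K

def dlyap(F, W, iters=500):
    """Solve X = F X F' + W by fixed-point iteration (assumes F stable)."""
    X = W.copy()
    for _ in range(iters):
        X = F @ X @ F.T + W
    return X

def exact_gradient(K):
    Acl = A - B @ K
    P = dlyap(Acl.T, Q + K.T @ R @ K)     # cost-to-go matrix P_K
    Sigma = dlyap(Acl, Sigma0)            # closed-loop state covariance
    E = (R + B.T @ P @ B) @ K - B.T @ P @ A
    return 2 * E @ Sigma

# Gradient descent on the gain; with a small enough step size the
# iterates stay in the stabilizing set and the cost decreases.
K = np.zeros((1, 2))
for _ in range(200):
    K = K - 0.005 * exact_gradient(K)
```

The model-free variants replace `exact_gradient` with a zeroth-order estimate built from sampled costs; the convergence results show the estimate inherits the gradient-dominance structure of the exact expression.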
“…One of the most studied problems is the centralized LQR problem. For this problem, two broad classes of methods have been studied: model-based learning [1,29,12] and model-free learning [16,40,28,20]. In the model-based learning approach, a system model is first estimated from observed system trajectories using a system identification method.…”
Section: Related Work
confidence: 99%
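The estimation-then-design pipeline restated here ends with controller synthesis from the estimated model. As a hedged sketch of that second stage, the following runs value iteration on the discrete-time Riccati equation for made-up estimates `A_hat`, `B_hat` (generic certainty-equivalent LQR design, not any cited paper's specific method):

```python
import numpy as np

# Made-up estimated matrices standing in for the output of system
# identification; in the pipeline above these come from the estimation stage.
A_hat = np.array([[0.95, 0.1], [0.0, 0.9]])
B_hat = np.array([[0.0], [0.5]])
Q, R = np.eye(2), np.eye(1)

# Value iteration on the discrete-time Riccati equation:
#   P <- Q + A' P A - A' P B (R + B' P B)^{-1} B' P A,
# written via the gain K so the fixed point yields the LQR controller.
P = Q.copy()
for _ in range(1000):
    K = np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)
    P = Q + A_hat.T @ P @ (A_hat - B_hat @ K)
# u_t = -K x_t is the certainty-equivalent LQR controller.
```

Certainty equivalence simply ignores the estimation error in `(A_hat, B_hat)`; much of the model-based literature cited here quantifies how that error propagates into suboptimality of the resulting gain.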