2019 IEEE 58th Conference on Decision and Control (CDC) 2019
DOI: 10.1109/cdc40024.2019.9029916
From self-tuning regulators to reinforcement learning and back again

Abstract: Machine and reinforcement learning (RL) are being applied to plan and control the behavior of autonomous systems interacting with the physical world; examples include self-driving vehicles, distributed sensor networks, and agile robots. However, if machine learning is to be applied in these new settings, the resulting algorithms must come with the reliability, robustness, and safety guarantees that are hallmarks of the control theory literature, as failures could be catastrophic. Thus, as RL algorithms are inc…

Cited by 71 publications (70 citation statements)
References 85 publications (122 reference statements)
“…It is this difference that allows us to demonstrate the $\sqrt{T}$-rate. Using this, we show that if the optimal policy to (1) gives degenerate information in a certain sense, then regret must be super-logarithmic. In this regime, one is forced to introduce supplementary excitation beyond the randomness already present in the algorithm.…”
Section: Discussion
confidence: 97%
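The need for supplementary excitation described in this excerpt can be illustrated with a minimal sketch (a hypothetical scalar example, not taken from the cited paper): under a purely deterministic feedback law, the regressors are collinear and least squares cannot identify the input gain, while a small dither restores identifiability.

```python
import numpy as np

rng = np.random.default_rng(0)
a_true, b_true = 0.9, 0.5  # "unknown" dynamics used only to simulate (assumed values)
T = 2000

def run(dither_std):
    """Simulate x_{t+1} = a x_t + b u_t + w_t under a fixed feedback law,
    optionally adding exploratory dither to the input, then estimate (a, b)
    by least squares on the regressors [x_t, u_t]."""
    x = 0.0
    X, U, Y = [], [], []
    for _ in range(T):
        u = -0.8 * x + dither_std * rng.standard_normal()  # feedback + excitation
        x_next = a_true * x + b_true * u + 0.1 * rng.standard_normal()
        X.append(x); U.append(u); Y.append(x_next)
        x = x_next
    Phi = np.column_stack([X, U])
    theta, *_ = np.linalg.lstsq(Phi, np.array(Y), rcond=None)
    return theta  # estimates of (a, b)

for s in (0.0, 0.5):
    a_hat, b_hat = run(s)
    print(f"dither={s}: a_hat={a_hat:.3f}, b_hat={b_hat:.3f}")
```

With `dither_std = 0.0` the input is an exact linear function of the state, so the regressor matrix is rank deficient and the gain estimate is badly biased; with dither the estimates converge to the true values.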
“…We write $K = K(B, \lambda) \in \mathbb{R}^{m \times n}$ for the optimal linear law, and its Jacobian is $G = \nabla_B \operatorname{vec} K(B, \lambda)$. The reference signal $r_t$ is assumed to be known in advance, and we make a standard persistence of excitation assumption, namely that $\sum_{k=1}^{t} r_k r_k^\top \succeq t c I + o(t)$ and that $\|r_t\| > c'$ for some $c, c' > 0$ and sufficiently large $t$. The noise $w_t \in \mathbb{R}^n$ is assumed to be mean zero, independent and identically distributed, with density $q(w)$ admitting Fisher information. The control $u_t \in \mathbb{R}^m$ is constrained to depend only on past inputs and outputs and is in particular oblivious of the parameter $B$; that is, it is adaptive.…”
Section: Problem Formulation
confidence: 99%
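The persistence of excitation condition quoted above can be checked numerically: the condition holds when the smallest eigenvalue of the normalized Gram matrix of the signal stays bounded away from zero. A minimal sketch with illustrative signals (the function name and test signals are hypothetical, not from the paper):

```python
import numpy as np

def min_excitation_eigenvalue(r, t):
    """Smallest eigenvalue of (1/t) * sum_{k=1}^{t} r_k r_k^T.
    Persistence of excitation requires this to stay above some c > 0."""
    gram = sum(np.outer(rk, rk) for rk in r[:t]) / t
    return np.linalg.eigvalsh(gram)[0]

rng = np.random.default_rng(1)
T, n = 500, 2
rich = rng.standard_normal((T, n))            # i.i.d. signal: persistently exciting
flat = np.tile(np.array([1.0, 0.0]), (T, 1))  # one fixed direction: not exciting

print(min_excitation_eigenvalue(rich, T))  # bounded away from 0
print(min_excitation_eigenvalue(flat, T))  # 0: the condition fails
```

A signal confined to a fixed direction gives a singular Gram matrix, so parameters orthogonal to that direction are unidentifiable; this is the "degenerate information" regime in which extra excitation must be injected.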
“…Finally, we provided computable data-dependent bounds that can be used in practical algorithms. In our companion paper [1], we show how these tools can be used to design and analyze self-tuning and adaptive control methods with finite-data guarantees. Although we focused on the full information setting, we note that many of the techniques described extend naturally to the partially observed setting [13], [14], [15].…”
Section: Discussion
confidence: 99%
“…This is in some sense the simplest possible system identification problem, making it the perfect case study for such a tutorial. Our companion paper [1] shows how the results derived in this paper can then be integrated into self-tuning and adaptive control policies with finite-data guarantees. We also refer the reader to Section II of [1] for an in-depth and comprehensive literature review of classical and contemporary results in system identification.…”
Section: Introduction
confidence: 99%