2021
DOI: 10.48550/arxiv.2108.02307
Preprint

Regret Analysis of Learning-Based MPC with Partially-Unknown Cost Function

Abstract: The exploration/exploitation trade-off is an inherent challenge in data-driven and adaptive control. Though this trade-off has been studied for multi-armed bandits, reinforcement learning (RL) for finite Markov chains, and RL for linear control systems, it is less well studied for learning-based control of nonlinear control systems. A significant theoretical challenge in the nonlinear setting is that, unlike the linear case, there is no explicit characterization of an optimal controller for a given set of cost…

Cited by 2 publications (5 citation statements)
References 43 publications
“…The approach achieves Õ(T^{2/3}) regret, where Õ(·) denotes the order up to logarithmic factors. The authors in [28] consider a nonlinear MPC with constraints and state feedback, where the nonlinear dynamics are decoupled into a known nominal linear model and an additive unknown nonlinear term. The unknown nonlinear term is re-estimated during closed-loop operation; to ensure data informativity for the estimation, at each time step the control algorithm randomly chooses to either inject a random input within the input constraints or apply the input computed by the MPC controller.…”
Section: Introduction
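The randomized input-selection scheme described in this citation statement can be sketched as follows. This is a hypothetical illustration: the function name, the uniform exploration distribution, and a fixed exploration probability are assumptions for the sketch, not details taken from the cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_input(mpc_input, u_min, u_max, explore_prob):
    """Return the input to apply at the current time step.

    With probability `explore_prob`, inject a random input drawn
    uniformly within the input constraints [u_min, u_max] (to keep
    the closed-loop data informative for re-estimating the unknown
    nonlinear term); otherwise apply the MPC-computed input, clipped
    to the constraint set.
    """
    if rng.random() < explore_prob:
        return rng.uniform(u_min, u_max, size=np.shape(mpc_input))
    return np.clip(mpc_input, u_min, u_max)
```

A decaying exploration probability (e.g., proportional to t^{-1/3}) is one common way such schemes trade off estimation accuracy against control performance, which is consistent with T^{2/3}-type regret bounds.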
“…The results in [26]-[28] have made significant contributions to designing learning-based RHC controllers for unknown systems and to understanding their finite-time performance. However, there are still important open problems.…”
Section: Introduction
“…Dynamic regret is able to capture the transient performance of the closed-loop system and is defined as the difference between the accumulated closed-loop cost of the controller and some benchmark, which is typically defined in hindsight, i.e., with knowledge of all cost functions. Studying the dynamic regret of controllers for dynamical systems has recently gained increasing attention; compare, e.g., Dogan et al (2021); Didier et al (2022). However, in the literature on OCO-based control, pointwise-in-time state and input constraints are only considered in Nonhoff and Müller (2021); Li et al (2021), and restrictive assumptions or a limited setting are necessary in these works to guarantee constraint satisfaction at all times.…”
Section: Introduction
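The dynamic-regret notion described in the statement above is commonly formalized as follows; this is one standard definition, and the precise choice of benchmark varies across papers:

```latex
R_T \;=\; \sum_{t=0}^{T-1} \ell_t(x_t, u_t) \;-\; \sum_{t=0}^{T-1} \ell_t(x_t^\star, u_t^\star),
```

where $(x_t, u_t)$ is the closed-loop state-input trajectory produced by the controller, $\ell_t$ are the (possibly time-varying) stage costs, and $(x_t^\star, u_t^\star)$ is the benchmark trajectory chosen in hindsight with knowledge of all cost functions.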