On the Optimization Landscape of Dynamic Output Feedback Linear Quadratic Control

Duan, Jingliang; Cao, Wei; Zheng, Yufeng; Zhang, Lin

doi:10.48550/arxiv.2201.09598

Cited by 4 publications

(4 citation statements)

References 22 publications


“…However, beyond these properties, until recently little was known about the geometric and analytical properties of the PO formulation of LQG control. We will mainly summarize results on the optimization landscape of LQG control from Zheng et al (63), especially with respect to the connectivity of the stabilizing set K and the structure of the stationary points; several related extensions can be found in other works (64)(65)(66)(67). Before introducing the results, we discuss a special structure for LQG control with the state-space dynamical controller parameterization in Equation 11.…”

Section: Policy Optimization For Linear Quadratic Gaussian Control: T... (mentioning)

confidence: 99%

“…Recent theoretical results on PO for particular classes of control synthesis problems, some of which are discussed in this survey, not only are exciting but also lead to a new research thrust at the interface of control theory and machine learning. This survey includes control synthesis related to linear quadratic regulator (LQR) theory (35)(36)(37)(38)(39)(40)(41)(42)(43)(44), stabilization (45)(46)(47), linear robust/risk-sensitive control (48)(49)(50)(51)(52)(53)(54)(55), Markov jump linear quadratic control (56-59), Lur'e system control (60), output feedback control (61)(62)(63)(64)(65)(66)(67), and dynamic filtering (68). Surprisingly, some of these strong global convergence results for PO have been obtained in the absence of convexity in the design objective and/or the underlying feasible set.…”

Section: Introduction (mentioning)

confidence: 99%


Toward a Theoretical Foundation of Policy Optimization for Learning Control Policies

Zhang

et al. 2023

Annu. Rev. Control Robot. Auton. Syst.


Gradient-based methods have been widely used for system design and optimization in diverse application domains. Recently, there has been a renewed interest in studying theoretical properties of these methods in the context of control and reinforcement learning. This article surveys some of the recent developments on policy optimization, a gradient-based iterative approach for feedback control synthesis that has been popularized by successes of reinforcement learning. We take an interdisciplinary perspective in our exposition that connects control theory, reinforcement learning, and large-scale optimization. We review a number of recently developed theoretical results on the optimization landscape, global convergence, and sample complexity of gradient-based methods for various continuous control problems, such as the linear quadratic regulator (LQR), H∞ control, risk-sensitive control, linear quadratic Gaussian (LQG) control, and output feedback synthesis. In conjunction with these optimization results, we also discuss how direct policy optimization handles stability and robustness concerns in learning-based control, two main desiderata in control engineering. We conclude the survey by pointing out several challenges and opportunities at the intersection of learning and control.


“…This brings some positive news and opens the possibility of developing global convergent policy search methods for dynamical output feedback problems, such as linear quadratic Gaussian (LQG) control [16]. Two other recent studies are [17], [18]. In [18], the global convergence of policy search over dynamical filters was proved for a simpler estimation problem.…”

Section: Introduction (mentioning)

confidence: 99%

Connectivity of the Feasible and Sublevel Sets of Dynamic Output Feedback Control With Robustness Constraints

Zheng

2023

IEEE Control Syst. Lett.

Self Cite


This paper considers the optimization landscape of linear dynamic output feedback control with H∞ robustness constraints. We consider the feasible set of all the stabilizing full-order dynamical controllers that satisfy an additional H∞ robustness constraint. We show that this H∞-constrained set has at most two path-connected components that are diffeomorphic under a mapping defined by a similarity transformation. Our proof technique utilizes a classical change of variables in H∞ control to establish a surjective mapping from a set with a convex projection to the H∞-constrained set. This proof idea can also be used to establish the same topological properties of strict sublevel sets of linear quadratic Gaussian (LQG) control and optimal H∞ control. Our results bring positive news for gradient-based policy search on robust control problems.


“…The seminal work of [8] first shows that PG methods have global convergence guarantees for the celebrated linear quadratic regulator (LQR) problem. Then, many sample complexity results of PG methods are established for both discrete-time [10], [11] and continuous-time LQR [12], and the PG methods are applied to solve other fundamental Linear Quadratic (LQ) problems, such as risk-sensitive control [13], LQ game [14], linear quadratic Gaussian (LQG) [15], [16] and decentralized control [17], [18], just to name a few. Though these advances lead to fruitful and profound results for model-free control synthesis, they all require a common assumption: a stabilizing controller must be known a prior.…”

Section: Introduction (mentioning)

confidence: 99%

On the Sample Complexity of Stabilizing Linear Systems via Policy Gradient Methods

Zhao, Fu, You

2022

Preprint


Stabilizing unknown dynamical systems with the direct use of data samples has drawn increasing attention in both control and machine learning communities. In this paper, we study the sample complexity of stabilizing linear time-invariant systems via Policy Gradient (PG) methods. Our analysis is built upon a discounted Linear Quadratic Regulator (LQR) framework which alternatively updates the policy and the discount factor of the LQR. In sharp contrast to the existing literature, we propose an explicit rule to adaptively adjust the discount factor by characterizing the stability margin using Lyapunov theory, which has independent interests of its own. We show that the number of iterations per discount factor is uniformly upper bounded, which enables us to prove the sample complexity of stabilizing linear systems via PG methods. Particularly, it only adds a coefficient logarithmic in the spectral radius of the state matrix to the sample complexity of solving LQR problems. We perform numerical experiments to verify our theoretical finding and empirically evaluate the effectiveness of our results on nonlinear systems.
