Task-Optimal Exploration in Linear Dynamical Systems

Wagenmaker, Andrew; Simchowitz, Max; Jamieson, Kevin

doi:10.48550/arxiv.2102.05214

Cited by 2 publications

(3 citation statements)

References 40 publications

“…We discuss (Marjani et al, 2021) in more detail in Section 4.1. In the special case of linear dynamical systems and smooth rewards, a setting which encompasses the Linear Quadratic Regulator problem, Wagenmaker et al (2021) establish a finite-time, instance-dependent lower bound and matching upper bound for ε-optimal policy identification. To our knowledge, this is the only work to obtain an instance-optimal (ε, δ)-PAC result, but their analysis does not apply to tabular MDPs.…”

Section: Related Work (mentioning)

confidence: 99%

Beyond No Regret: Instance-Dependent PAC Reinforcement Learning

Wagenmaker, Simchowitz, Jamieson. 2021. Preprint (self-citation).

The theory of reinforcement learning has focused on two fundamental problems: achieving low regret, and identifying ε-optimal policies. While a simple reduction allows one to apply a low-regret algorithm to obtain an ε-optimal policy and achieve the worst-case optimal rate, it is unknown whether low-regret algorithms can obtain the instance-optimal rate for policy identification. We show that this is not possible: there exists a fundamental tradeoff between achieving low regret and identifying an ε-optimal policy at the instance-optimal rate. Motivated by our negative finding, we propose a new measure of instance-dependent sample complexity for PAC tabular reinforcement learning which explicitly accounts for the attainable state visitation distributions in the underlying MDP. We then propose and analyze a novel, planning-based algorithm which attains this sample complexity, yielding a complexity which scales with the suboptimality gaps and the "reachability" of a state. We show that our algorithm is nearly minimax optimal, and on several examples that our instance-dependent sample complexity offers significant improvements over worst-case bounds.

“…Note that L provides an upper bound on the information in the face of an optimal offline exploration policy. Studying it may therefore assist with experiment design, as in Wagenmaker et al (2021). Rather than the appearance of −K I on the denominator, as we saw in Proposition 3.1, we have A B F .…”

Section: Interesting System-theoretic Quantities (mentioning)

confidence: 96%

“…Lower bounds on the variance of the gradient estimates in policy gradient approaches are supplied in Ziemann et al (2022). Lower bounds for offline linear control are also studied in Wagenmaker et al (2021) with the objective of designing optimal experiments. We instead focus on the LQR setting to understand the dependence of the excess cost on interpretable system-theoretic quantities.…”

Section: Related Work (mentioning)

confidence: 99%

The Fundamental Limitations of Learning Linear-Quadratic Regulators

Lee, Ziemann, Tsiamis, et al. 2023. Preprint.

We present a local minimax lower bound on the excess cost of designing a linear-quadratic controller from offline data. The bound is valid for any offline exploration policy that consists of a stabilizing controller and an energy-bounded exploratory input. The derivation leverages a relaxation of the minimax estimation problem to Bayesian estimation, and an application of Van Trees' inequality. We show that the bound aligns with system-theoretic intuition. In particular, we demonstrate that the lower bound increases when the optimal control objective value increases. We also show that the lower bound increases when the system is poorly excitable, as characterized by the spectrum of the controllability Gramian of the system mapping the noise to the state and the H∞ norm of the system mapping the input to the state. We further show that for some classes of systems, the lower bound may be exponential in the state dimension, demonstrating exponential sample complexity for learning the linear-quadratic regulator offline.
