2021
DOI: 10.48550/arxiv.2102.05214
Preprint

Task-Optimal Exploration in Linear Dynamical Systems

Andrew Wagenmaker,
Max Simchowitz,
Kevin Jamieson

Abstract: Exploration in unknown environments is a fundamental problem in reinforcement learning and control. In this work, we study task-guided exploration and determine what precisely an agent must learn about its environment in order to complete a particular task. Formally, we study a broad class of decision-making problems in the setting of linear dynamical systems, a class that includes the linear quadratic regulator problem. We provide instance- and task-dependent lower bounds which explicitly quantify the difficulty…
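
For readers unfamiliar with the setting named in the abstract, a minimal sketch of the textbook linear quadratic regulator (LQR) instance of this class is given below; the notation (A, B, Q, R, w_t) is the standard one and is not taken from the paper itself.

\[
x_{t+1} = A x_t + B u_t + w_t, \qquad
J(K) \;=\; \limsup_{T \to \infty} \frac{1}{T}\, \mathbb{E}\!\left[ \sum_{t=0}^{T-1} x_t^\top Q x_t + u_t^\top R u_t \right],
\]

where the system matrices A and B are unknown, w_t is process noise, Q and R are known positive semidefinite cost matrices, and the goal is to explore the unknown dynamics just enough to return a near-optimal state-feedback controller u_t = K x_t.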

Cited by 2 publications (3 citation statements)
References: 40 publications
“…We discuss Marjani et al. (2021) in more detail in Section 4.1. In the special case of linear dynamical systems and smooth rewards, a setting which encompasses the Linear Quadratic Regulator problem, Wagenmaker et al. (2021) establish a finite-time, instance-dependent lower bound and matching upper bound for ε-optimal policy identification. To our knowledge, this is the only work to obtain an instance-optimal (ε, δ)-PAC result, but their analysis does not apply to tabular MDPs.…”
Section: Related Work (mentioning)
confidence: 99%
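
For reference, the (ε, δ)-PAC criterion used in this statement has the standard meaning sketched below (written here in a cost-minimization convention; this rendering is ours, not quoted from the citing paper): an algorithm is (ε, δ)-PAC for policy identification if the policy \(\hat{\pi}\) it outputs satisfies

\[
\mathbb{P}\!\left( J(\hat{\pi}) \le \inf_{\pi} J(\pi) + \epsilon \right) \;\ge\; 1 - \delta,
\]

and it is instance-optimal when the number of samples it uses matches the instance-dependent lower bound for the particular system at hand, rather than only a worst-case bound.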
“…Note that L provides an upper bound on the information in the face of an optimal offline exploration policy. Studying it may therefore assist with experiment design, as in Wagenmaker et al. (2021). Rather than the appearance of $[-K \;\; I]$ in the denominator, as we saw in Proposition 3.1, we have $\|[A \;\; B]\|_F$.…”
Section: Interesting System-theoretic Quantities (mentioning)
confidence: 96%
“…Lower bounds on the variance of the gradient estimates in policy gradient approaches are supplied in Ziemann et al. (2022). Lower bounds for offline linear control are also studied in Wagenmaker et al. (2021) with the objective of designing optimal experiments. We instead focus on the LQR setting to understand the dependence of the excess cost on interpretable system-theoretic quantities.…”
Section: Related Work (mentioning)
confidence: 99%