2021
DOI: 10.48550/arxiv.2106.07046
Preprint

Towards Tight Bounds on the Sample Complexity of Average-reward MDPs

Abstract: We prove new upper and lower bounds on the sample complexity of finding an $\epsilon$-optimal policy of an infinite-horizon average-reward Markov decision process (MDP) given access to a generative model. When the mixing time of the probability transition matrix of all policies is at most $t_{\mathrm{mix}}$, we provide an algorithm that solves the problem using $\widetilde{O}(t_{\mathrm{mix}} \epsilon^{-3})$ (oblivious) samples per state-action pair. Further, we provide a lower bound showing that a linear dependence on $t_{\mathrm{mix}}$ is necessary in the worst case for any algorithm…
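To make the abstract's setting concrete, below is a minimal sketch of generative-model access with an oblivious (fixed, identical) per-pair sample budget: each state-action pair receives the same number of i.i.d. next-state draws, from which an empirical transition model is built. This is an illustration of the access model only, not the paper's algorithm; the sampler interface, the function name `empirical_model`, and the toy MDP are all assumptions made for this example.

```python
import numpy as np

def empirical_model(sample_next_state, n_states, n_actions, samples_per_pair):
    """Build an empirical transition model from a generative model.

    `sample_next_state(s, a)` is assumed to return one i.i.d. next state
    for the state-action pair (s, a). The fixed per-pair budget mirrors
    the 'oblivious' sampling pattern described in the abstract.
    """
    P_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(samples_per_pair):
                P_hat[s, a, sample_next_state(s, a)] += 1
    return P_hat / samples_per_pair  # normalize counts to probabilities

# Hypothetical usage: a toy 2-state, 2-action MDP plays the generative model.
rng = np.random.default_rng(0)
P_true = np.array([[[0.9, 0.1], [0.2, 0.8]],
                   [[0.5, 0.5], [0.7, 0.3]]])
sampler = lambda s, a: rng.choice(2, p=P_true[s, a])
P_hat = empirical_model(sampler, n_states=2, n_actions=2, samples_per_pair=1000)
```

The estimated model `P_hat` could then be handed to any tabular planner; the paper's contribution is the bound on how large `samples_per_pair` must be (in terms of $t_{\mathrm{mix}}$ and $\epsilon$) for the resulting policy to be $\epsilon$-optimal.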

Cited by 1 publication (1 citation statement)
References 19 publications
“…Another popular performance measure is the sample complexity, which is the amount of data required to learn a near-optimal policy; see, e.g., Brunskill and Li (2014), Jin and Sidford (2021), Wang (2017)…”
mentioning
confidence: 99%