2008
DOI: 10.1007/978-3-540-89722-4_4

Efficient Reinforcement Learning in Parameterized Models: Discrete Parameter Case

Cited by 5 publications (10 citation statements)
References 6 publications
“…1). Intuitively, the algorithm seeks to identify and use the policy in the input set Π that yields the highest average reward on the […] One can easily prove that the upper bound H always exists for any unichain Markov reward process (see [12, Chap. 8]).…”
Section: Algorithm (mentioning)
Confidence: 99%
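The statement above describes an algorithm that tries to identify, within a given policy set Π, the policy with the highest average reward. A minimal Python sketch of that general idea follows; it is an illustration under assumed names (`best_policy_from_set`, the `run_policy` callback, the bonus constant `c`), not the algorithm of the cited paper: candidate policies are run in growing phases and chosen optimistically by empirical average reward plus an exploration bonus.

```python
import math
import random

def best_policy_from_set(policies, run_policy, horizon, c=2.0):
    """Illustrative optimistic selection of a policy from a finite set Pi.

    policies   : list of hashable policy identifiers
    run_policy : callback(policy, n_steps) -> total reward over n_steps,
                 standing in for executing the policy on the real system
    horizon    : total interaction budget in steps
    """
    steps = {p: 0 for p in policies}     # steps spent running each policy
    total = {p: 0.0 for p in policies}   # cumulative reward per policy
    t, phase = 0, 1
    while t < horizon:
        def ucb(p):
            if steps[p] == 0:
                return float("inf")      # try unexplored policies first
            mean = total[p] / steps[p]
            bonus = math.sqrt(c * math.log(max(t, 2)) / steps[p])
            return mean + bonus          # optimistic average-reward estimate
        p = max(policies, key=ucb)
        n = min(phase, horizon - t)      # geometrically growing phases
        total[p] += run_policy(p, n)
        steps[p] += n
        t += n
        phase *= 2
    # Return the policy with the highest empirical average reward.
    return max(policies, key=lambda p: total[p] / max(steps[p], 1))

# Toy usage: each policy yields Bernoulli rewards with a fixed mean.
means = {"pi_1": 0.3, "pi_2": 0.6, "pi_3": 0.5}
simulate = lambda p, n: sum(random.random() < means[p] for _ in range(n))
print(best_policy_from_set(list(means), simulate, horizon=10_000))
```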
“…Nonetheless, the algorithm inherits the regret of UCRL itself and still displays an O(S√A) dependency on states and actions. In [5] the Parameter Elimination (PEL) algorithm is provided with a set of MDPs. The algorithm is analyzed in the PAC-MDP framework and, under the assumption that the true model actually belongs to the set of MDPs, it is shown to have a performance that does not depend on the size of the state-action space, with only an O(√m) dependency on the number of MDPs m. In our setting, although no model is provided and no assumption on the optimality of π* is made, RLPA achieves the same dependency on m. The span sp(λ_π) of a policy is known to be a critical parameter determining how well and how fast the average reward of a policy can be estimated from samples (see e.g., [1]).…”
Section: Gap-independent Bound (mentioning)
Confidence: 99%
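The statement contrasts a state-action dependent regret with bounds that depend only on the number m of candidate models, as in PEL, where candidates inconsistent with observations are discarded. A hedged sketch of one such elimination step is given below; the function `eliminate_models`, the single scalar test statistic, and the Hoeffding-style radius are illustrative assumptions rather than the PEL procedure itself.

```python
import math

def eliminate_models(candidates, observed_mean, n_samples, delta=0.05):
    """Illustrative elimination over a finite set of candidate models.

    candidates    : dict name -> mean value the model predicts for a test statistic
    observed_mean : empirical mean of that statistic on the real system
    n_samples     : number of samples behind observed_mean
    Keeps candidates whose prediction lies within a Hoeffding-style
    confidence radius of the observation (values assumed to lie in [0, 1]).
    """
    radius = math.sqrt(math.log(2 * len(candidates) / delta) / (2 * n_samples))
    return {name: pred for name, pred in candidates.items()
            if abs(pred - observed_mean) <= radius}

# If the true model is assumed to be among the m candidates, it survives
# every elimination round with high probability while wrong models drop out.
models = {"theta_1": 0.30, "theta_2": 0.55, "theta_3": 0.80}
print(eliminate_models(models, observed_mean=0.52, n_samples=200))
```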
“…We study the question of finding the minimax sample-complexity of reinforcement learning without making the usual Markov assumption, but where the learner has access to a finite set of reinforcement learning environments to which the truth is known to belong. This problem was tackled previously by Dyagilev et al. (2008) and Lattimore et al. (2013a). The new algorithm improves on the theoretical results in both papers and is simultaneously simpler and more elegant.…”
Section: Introduction (mentioning)
Confidence: 96%
“…[6]. For results on adaptive control in the non-countable setting we refer the reader to [7–9] and references therein: these deal with the classical setup, namely they seek a combined estimation and control scheme and consider parameterized models.…”
Section: Introduction (mentioning)
Confidence: 99%