2001
DOI: 10.1007/3-540-44581-1_39

Learning Rates for Q-Learning

Cited by 229 publications (277 citation statements)
References 4 publications
“…Bertsekas and Tsitsiklis have verified the convergence of stochastic iterative algorithms [3] when (41) holds. In fact many traditional RL algorithms have been proved to be stochastic iterative algorithms [3], [4], [47] and QRL is the same as traditional RL, and main differences lie in:…”
Section: A Convergence of QRL (mentioning)
confidence: 99%
“…Interest in linear programming (LP) for solving MDPs has been renewed (de Farias and Van Roy, 2003) because of the well-known stability properties of LP solvers. Other areas that are attracting interest are: risk-sensitive RL (Borkar, 2002; Geibel and Wysotzki, 2005), factored MDPs (Schuurmans and Patrascu, 2002), and analyzing computational complexity (Even-Dar and Mansour, 2003).…”
Section: Discussion (mentioning)
confidence: 99%
“…The sequence (Q_t; t ≥ 0) is indeed known to converge to Q* when appropriate local learning rates are used (Tsitsiklis, 1994; Jaakkola et al., 1994). The rate of convergence of Q-learning was studied in an asymptotic setting and later by Even-Dar and Mansour (2003) in a finite-sample setting. The key observation that led to the discovery of Q-learning is that, unlike the optimal state values, the optimal action-values can be expressed as expectations (compare Equations (13) and (15)).…”
Section: Q-learning in Finite MDPs (mentioning)
confidence: 99%
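
The last statement above refers to Q-learning converging to Q* under "appropriate local learning rates". As a rough illustration only, the sketch below runs tabular Q-learning on a tiny deterministic MDP with a per-pair step size of 1 / n(s, a)^omega, where n(s, a) counts visits to the state-action pair. The polynomial schedule, the value omega = 0.7, the two-state MDP, the uniform random exploration, and all names (q_learning, P, R, visits) are assumptions made for this example and are not taken from the cited excerpts.

```python
import random


def q_learning(P, R, gamma=0.5, omega=0.7, steps=20000, seed=0):
    """Tabular Q-learning on a deterministic MDP (illustrative sketch).

    P[s][a] is the next state and R[s][a] the reward for taking
    action a in state s. The step size for each (s, a) pair is
    1 / n(s, a)**omega, an assumed polynomial "local" learning rate.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    s = 0
    for _ in range(steps):
        a = rng.randrange(n_actions)           # uniform random exploration
        s_next, r = P[s][a], R[s][a]
        visits[s][a] += 1
        alpha = 1.0 / visits[s][a] ** omega    # local learning rate
        target = r + gamma * max(Q[s_next])    # one-step bootstrap target
        Q[s][a] += alpha * (target - Q[s][a])
        s = s_next
    return Q


# Hypothetical two-state, two-action MDP: action 0 leads to state 0,
# action 1 leads to state 1; rewards differ by the starting state.
P = [[0, 1], [0, 1]]
R = [[0.0, 1.0], [0.0, 2.0]]
Q = q_learning(P, R, gamma=0.5)

# Solving the Bellman optimality equations by hand for gamma = 0.5 gives
# Q*(1,1) = 4, Q*(0,1) = 3, Q*(0,0) = Q*(1,0) = 1.5.
Q_STAR = [[1.5, 3.0], [1.5, 4.0]]
```

Comparing Q with Q_STAR entry by entry gives a quick check that the estimates approach the optimal action values on this toy problem. Other schedules (for example omega = 1) can be swapped in to see how the step-size choice affects how quickly the gap closes.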