2014
DOI: 10.1613/jair.4271
Convergence of a Q-learning Variant for Continuous States and Actions

Abstract: This paper presents a reinforcement learning algorithm for solving infinite horizon Markov Decision Processes under the expected total discounted reward criterion when both the state and action spaces are continuous. This algorithm is based on Watkins' Q-learning, but uses Nadaraya-Watson kernel smoothing to generalize knowledge to unvisited states. As expected, continuity conditions must be imposed on the mean rewards and transition probabilities. Using results from kernel regression theory, this algorithm…
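The abstract describes the algorithm only at a high level. The following is a minimal sketch of the underlying idea of Q-learning combined with Nadaraya-Watson kernel smoothing, not a reproduction of the paper's exact algorithm: the Gaussian kernel, the fixed bandwidth, the way samples are stored, and the maximization over a finite set of candidate actions are all illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth):
    """Gaussian kernel on the joint state-action space (an illustrative choice)."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * bandwidth ** 2))

class KernelSmoothedQ:
    """Q-learning with Nadaraya-Watson smoothing over stored samples.

    The Q-value at any query (state, action) is a kernel-weighted average
    of Q-value samples stored at previously visited state-action pairs,
    which is how estimates generalize to unvisited states.
    """

    def __init__(self, bandwidth=0.5, gamma=0.95):
        self.bandwidth = bandwidth
        self.gamma = gamma
        self.keys = []    # visited (state, action) pairs, concatenated
        self.values = []  # corresponding Q-value samples

    def q(self, state, action):
        """Nadaraya-Watson estimate of Q(state, action)."""
        if not self.keys:
            return 0.0
        x = np.concatenate([state, action])
        weights = np.array([gaussian_kernel(x, k, self.bandwidth) for k in self.keys])
        total = weights.sum()
        if total == 0.0:
            return 0.0
        return float(weights @ np.array(self.values) / total)

    def update(self, state, action, reward, next_state, candidate_actions, alpha=0.1):
        """One Q-learning step: bootstrap the target from the smoothed estimate."""
        best_next = max(self.q(next_state, a) for a in candidate_actions)
        target = reward + self.gamma * best_next
        current = self.q(state, action)
        # Store an updated Q-value sample at the visited state-action pair.
        self.keys.append(np.concatenate([state, action]))
        self.values.append(current + alpha * (target - current))
```

In the paper itself, the continuity conditions on the mean rewards and transition probabilities are what justify this kind of smoothing; the sketch above ignores such conditions entirely.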

Cited by 9 publications (5 citation statements) · References 28 publications
“…Under standard assumptions, CAQL with dynamic tolerance {τ_t} converges a.s. to a stationary point (Thm. 1, (Carden, 2014)).…”
Section: Accelerating Max-Q Computation
Confidence: 99%
“…In other words, the controller structure should induce similar decisions from similar observation chains. This typical assumption is also made by the continuous state-action MDP and POMDP literature [7], [8], [19].…”
Section: A Stochastic Kernel-based Finite State Automata
Confidence: 99%
“…Traditionally, the value function V_0(v, τ) is initialized as 0, which may slow down the convergence speed (Carden 2014). Therefore, we propose a warm start strategy that approximates the probability of arriving on time for vehicles at intersection v with time-to-deadline τ as follows: V_0(v, τ) = 1/(1 + e^{−ζ(τ − T_e)}), where ζ is the coefficient.…”
Section: Other Practical Considerations
Confidence: 99%
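The warm-start formula quoted above translates directly into code. A minimal sketch, assuming T_e denotes the expected travel time to the destination and ζ is a tunable coefficient (both taken from the citing paper and not further specified here):

```python
import math

def warm_start_value(tau, expected_travel_time, zeta):
    """Sigmoid warm start V_0(v, tau) = 1 / (1 + exp(-zeta * (tau - T_e))).

    Approximates the probability of arriving on time when the
    time-to-deadline tau exceeds the expected travel time T_e;
    zeta controls how sharply the estimate rises around tau = T_e.
    """
    return 1.0 / (1.0 + math.exp(-zeta * (tau - expected_travel_time)))
```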