2020
DOI: 10.1109/lcsys.2019.2921158

Successive Over-Relaxation $Q$-Learning

Abstract: In a discounted reward Markov Decision Process (MDP), the objective is to find the optimal value function, i.e., the value function corresponding to an optimal policy. This problem reduces to solving a functional equation known as the Bellman equation, and a fixed-point iteration scheme known as value iteration is used to obtain the solution. In the literature, a successive over-relaxation based value iteration scheme has been proposed to speed up the computation of the optimal value function. The speed-up is ach…
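To make the fixed-point iteration in the abstract concrete, here is a minimal sketch of value iteration on a toy MDP, with an optional relaxation parameter `w`. The abstract is cut off before it states the over-relaxed scheme, so the relaxed update used here, `w * (Bellman backup) + (1 - w) * V(s)`, and the upper bound on `w` are assumptions based on the usual successive over-relaxation form, not a transcription of the paper. The toy transition table `P`, reward table `R`, and all function names are hypothetical.

```python
def bellman_backup(P, R, gamma, V, s):
    """Max over actions of one-step reward plus discounted expected next value."""
    return max(
        R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(len(V)))
        for a in range(len(R[s]))
    )


def sor_upper_bound(P, gamma):
    """Assumed admissible bound on w: min over (s, a) of 1 / (1 - gamma * p(s | s, a))."""
    return min(
        1.0 / (1.0 - gamma * P[s][a][s])
        for s in range(len(P))
        for a in range(len(P[s]))
    )


def value_iteration(P, R, gamma, w=1.0, tol=1e-10, max_iters=100_000):
    """Iterate V <- w * T(V) + (1 - w) * V; w = 1 is ordinary value iteration."""
    n = len(R)
    V = [0.0] * n
    for k in range(1, max_iters + 1):
        V_new = [
            w * bellman_backup(P, R, gamma, V, s) + (1.0 - w) * V[s]
            for s in range(n)
        ]
        diff = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if diff < tol:
            return V, k
    return V, max_iters


# Hypothetical 2-state, 2-action MDP: P[s][a][t] = p(t | s, a), R[s][a] = reward.
P = [
    [[0.5, 0.5], [0.9, 0.1]],
    [[0.2, 0.8], [0.6, 0.4]],
]
R = [[1.0, 0.0], [0.5, 2.0]]
gamma = 0.9

w_star = sor_upper_bound(P, gamma)
V_std, iters_std = value_iteration(P, R, gamma, w=1.0)
V_sor, iters_sor = value_iteration(P, R, gamma, w=w_star)
print(V_std, iters_std)
print(V_sor, iters_sor)
```

Under these assumptions, both runs should reach the same value function, because the relaxed operator has the same fixed point as the Bellman operator. The relaxed run contracts with factor `1 - w + w * gamma` rather than `gamma`, which is smaller whenever `w > 1`, and `w > 1` is only admissible when every state has a positive self-transition probability under every action.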

Cited by 9 publications (6 citation statements)
References 14 publications
“…The generalized Bellman operator can be used in other reinforcement learning algorithms as well. For example, it has already been applied to Watkins' Q-learning [10]. It will be interesting to study the rate of convergence and other properties of the modified algorithms, both theoretically and experimentally.…”
Section: Discussion (mentioning)
confidence: 99%
“…A popular method is successive overrelaxation (SOR). SOR technique has been applied previously to solve an MDP when the model information is completely known [5] and also in the setting of model-free reinforcement learning [10]. The latter algorithm is known as SOR Q-learning.…”
Section: A Related Work (mentioning)
confidence: 99%