2007
DOI: 10.1007/s10732-007-9031-5

Accelerating autonomous learning by using heuristic selection of actions

Abstract: This paper investigates how to make improved action selection for online policy learning in robotic scenarios using reinforcement learning (RL) algorithms. Since finding control policies using any RL algorithm can be very time consuming, we propose to combine RL algorithms with heuristic functions for selecting promising actions during the learning process. With this aim, we investigate the use of heuristics for increasing the rate of convergence of RL algorithms and contribute with a new learning algorithm, H…

Cited by 61 publications (40 citation statements)
References 18 publications
“…For example, in the Q-learning algorithm, the approximated state-action value function Q(s, a) will converge to the optimal state-action value function Q*(s, a) for discrete MDPs provided that each state-action pair is visited infinitely often and the learning rate has the property of being square summable but not summable; these conditions are still applicable and not invalidated under the proposed model. Although they follow a different heuristic-based approach, Bianchi et al. (2008) also employ an analogous model that transparently guides the exploration behavior of an underlying reinforcement learning algorithm to increase the rate of convergence; we refer the interested reader to their work for similar theoretical results in other settings.…”
Section: Update-tree(t H) (mentioning)
confidence: 99%
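For reference, the learning-rate conditions named in the excerpt are the standard Robbins-Monro conditions on the tabular Q-learning update; a minimal statement in the excerpt's notation (the LaTeX form below is a sketch, not reproduced from either paper) is:

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha_t \big[ r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \big],
\qquad \sum_t \alpha_t = \infty, \quad \sum_t \alpha_t^2 < \infty.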
“…A Heuristically Accelerated Reinforcement Learning (HARL) algorithm [3] is a way to solve an MDP problem with explicit use of a heuristic function H : S × A → ℝ that influences the choice of actions by the learning agent. H(s, a) defines the heuristic that indicates the importance of performing action a when visiting state s. The heuristic function is strongly associated with the policy: it indicates which action should be taken regardless of the action values of the other actions available in the state.…”
Section: Heuristically Accelerated Reinforcement Learning (mentioning)
confidence: 99%
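The excerpt describes the heuristic function but not the resulting selection rule; one common form in the HARL literature (a sketch, with \xi a real-valued weight on the heuristic, a symbol not taken from the excerpt) is:

\pi(s) = \arg\max_{a} \big[ \hat{Q}(s, a) + \xi \, H(s, a) \big].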
“…The first HARL algorithm proposed was Heuristically Accelerated Q-learning (HAQL) [3], an extension of the Q-learning algorithm [2]. The only difference between the two algorithms is that HAQL makes use of a heuristic function H(s, a) in the ε-greedy action choice rule, which can be written as:…”
Section: Heuristically Accelerated Reinforcement Learning (mentioning)
confidence: 99%
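The quoted rule is cut off after "written as". A minimal Python sketch of an ε-greedy choice biased by a weighted heuristic term, in the spirit of the rule described above (the function name, the dictionary-based tables Q and H, and the additive weight xi are illustrative assumptions, not the paper's exact formulation):

import random

def heuristic_epsilon_greedy(Q, H, state, actions, epsilon=0.1, xi=1.0):
    """Explore with probability epsilon; otherwise act greedily on
    Q(s, a) plus a weighted heuristic bonus xi * H(s, a)."""
    if random.random() < epsilon:
        return random.choice(actions)  # uniform exploration step
    # Greedy step: the heuristic H biases the choice toward promising
    # actions without changing the learned Q values themselves.
    return max(actions, key=lambda a: Q[(state, a)] + xi * H[(state, a)])

With xi = 0 this reduces to ordinary ε-greedy action selection over Q alone, consistent with the excerpt's claim that the heuristic term is the only difference between the two algorithms.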
“…Therefore, it is necessary to study the self-learning model. At present, there is a good deal of related research, but most of it focuses on how students should learn English by themselves, master appropriate learning skills, and make efficient use of modern technology and equipment to support their self-study [8][9][10][11].…”
Section: Introduction (mentioning)
confidence: 99%