2019
DOI: 10.1007/s10732-019-09408-x
Dynamic heuristic acceleration of linearly approximated SARSA(λ): using ant colony optimization to learn heuristics dynamically

Abstract: Heuristically accelerated reinforcement learning (HARL) is a new family of algorithms that combines the advantages of reinforcement learning (RL) with those of heuristic algorithms. To achieve this, the action selection strategy of the standard RL algorithm is modified to take into account a heuristic running in parallel with the RL process. This paper presents two approximated HARL algorithms that make use of pheromone trails to improve the behaviour of linearly approximated SARSA(λ) by dynamically l…
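The abstract describes modifying the RL action selection strategy to account for a heuristic running alongside learning. A common HARL form adds a weighted heuristic term to the value estimate before the greedy choice. The sketch below is a hypothetical illustration of that idea, assuming a linear approximation q(s, a) = w·φ(s, a) and a pheromone-derived heuristic H(s, a); all names, features, and weights are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES, N_ACTIONS = 8, 4
w = rng.normal(size=N_FEATURES)            # weights of the linear approximator

def phi(state, action):
    """Toy one-hot feature vector for a (state, action) pair."""
    v = np.zeros(N_FEATURES)
    v[(state + action) % N_FEATURES] = 1.0
    return v

def q(state, action):
    """Linearly approximated action value."""
    return w @ phi(state, action)

def heuristic(state, action, pheromone):
    """Heuristic term: here simply the pheromone level on (state, action)."""
    return pheromone[state, action]

def select_action(state, pheromone, xi=1.0, epsilon=0.1):
    """Epsilon-greedy over heuristically accelerated values Q + xi * H."""
    if rng.random() < epsilon:
        return int(rng.integers(N_ACTIONS))
    scores = [q(state, a) + xi * heuristic(state, a, pheromone)
              for a in range(N_ACTIONS)]
    return int(np.argmax(scores))

pheromone = np.zeros((16, N_ACTIONS))
pheromone[3, 2] = 10.0          # a strong trail biases action 2 in state 3
a = select_action(3, pheromone, xi=1.0, epsilon=0.0)
```

With the heuristic weight ξ set to zero this reduces to plain ε-greedy over the approximated values, so the heuristic only biases, never replaces, the learned policy.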

Cited by 2 publications (2 citation statements); references 40 publications (55 reference statements).
“…When it receives a new tuple (s, r, d), it returns an action according to a policy that depends on the implemented algorithm (ε-greedy, for example). The model has a state variable Q, an S × A matrix used to implement the learning algorithm (Q-learning or SARSA [4], for example). Once convergence is reached (judged from the values of Q), the model becomes passive and no longer responds to the environment.…”
Section: Motivation and Contribution
confidence: 99%
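The setting described in this excerpt (an S × A Q table updated under an ε-greedy policy) can be sketched with standard tabular SARSA. The tiny chain environment, parameter values, and tie-breaking rule below are illustrative assumptions, not the citing paper's model.

```python
import numpy as np

rng = np.random.default_rng(1)
S, A = 5, 2                     # 5-state chain; actions: 0 = left, 1 = right
Q = np.zeros((S, A))            # the S x A state-action value matrix

def epsilon_greedy(s, epsilon=0.1):
    # Random tie-breaking so the untrained agent is not stuck going left.
    if rng.random() < epsilon or np.all(Q[s] == Q[s][0]):
        return int(rng.integers(A))
    return int(np.argmax(Q[s]))

def step(s, a):
    """Chain MDP: reward 1 for reaching the rightmost state, else 0."""
    s2 = min(s + 1, S - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == S - 1 else 0.0
    return s2, r, s2 == S - 1

alpha, gamma = 0.5, 0.9
for _ in range(200):            # episodes
    s, a = 0, epsilon_greedy(0)
    done = False
    while not done:
        s2, r, done = step(s, a)
        a2 = epsilon_greedy(s2)
        # SARSA: on-policy TD update using the action actually taken next
        Q[s, a] += alpha * (r + gamma * Q[s2, a2] * (not done) - Q[s, a])
        s, a = s2, a2
```

After training, the greedy policy read off the Q table moves right along the chain, which is the "convergence" the excerpt refers to before the model turns passive.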
“…The mechanism of the ACS algorithm is inspired by the natural behaviour of biological ant colonies [2], [3]. The ACS algorithm builds on the notion of reinforcement learning [4]. It uses ants as agents that manipulate the environment via pheromone trails to find the shortest path.…”
Section: Introduction
confidence: 99%
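The pheromone-trail mechanism mentioned in this excerpt can be illustrated with a minimal ant-colony-style path search: ants choose edges probabilistically by pheromone and inverse distance, then deposit pheromone in proportion to path quality. The graph, parameter names, and update rule below are simplified assumptions, not the cited ACS algorithm's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(2)

# A tiny directed graph as a distance matrix; np.inf marks missing edges.
INF = np.inf
dist = np.array([
    [INF, 1.0, 4.0],
    [INF, INF, 1.0],
    [INF, INF, INF],
])
n = dist.shape[0]
tau = np.where(np.isfinite(dist), 1.0, 0.0)   # pheromone on existing edges

def choose_next(node, alpha=1.0, beta=2.0):
    """Probabilistic edge choice weighted by pheromone and inverse distance."""
    weights = np.zeros(n)
    for j in range(n):
        if np.isfinite(dist[node, j]):
            weights[j] = tau[node, j] ** alpha * (1.0 / dist[node, j]) ** beta
    probs = weights / weights.sum()
    return int(rng.choice(n, p=probs))

def walk(start=0, goal=2):
    path, node = [start], start
    while node != goal:
        node = choose_next(node)
        path.append(node)
    return path

def deposit(path, q=1.0, rho=0.1):
    """Evaporate pheromone everywhere, then reinforce the found path."""
    global tau
    length = sum(dist[a, b] for a, b in zip(path, path[1:]))
    tau *= (1.0 - rho)
    for a, b in zip(path, path[1:]):
        tau[a, b] += q / length

for _ in range(50):
    deposit(walk())
```

Because the route 0 → 1 → 2 (length 2) earns larger deposits than the direct edge 0 → 2 (length 4), pheromone concentrates on the shorter path, which is the positive-feedback mechanism the excerpt describes.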