2022
DOI: 10.1007/s40430-022-03399-w

Optimal path planning method based on epsilon-greedy Q-learning algorithm

Cited by 14 publications (7 citation statements)
References 27 publications
“…An ϵ-greedy algorithm was used to define if an exploitative or exploratory a was taken. 53 A linear ϵ-decay was introduced to reduce convergence time, with ϵ starting at 1 and ending at 0.001. 54 Furthermore, the processing is split between three agents, each agent (i) is responsible for finding the optimal policy across three a instead of 27.…”
Section: Methods
confidence: 99%
“…An ϵ-greedy algorithm was used to define if an exploitative or exploratory a was taken. 53 A linear ϵ-decay was introduced to reduce convergence time, with ϵ starting at 1 and ending at 0.001. 54 …”
Section: Methods
confidence: 99%
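Both excerpts above describe the same mechanism: ϵ-greedy action selection with a linear ϵ-decay from 1 down to 0.001. A minimal sketch of that schedule follows; the episode count, Q-table row, and function names are illustrative assumptions, and only the start and end values of ϵ come from the quoted text.

```python
import numpy as np

# Sketch of epsilon-greedy selection with a linear epsilon decay from
# 1.0 down to 0.001, as in the quoted excerpts. N_EPISODES and the
# Q-table shape are assumptions for illustration only.
N_EPISODES = 500
EPS_START, EPS_END = 1.0, 0.001

def epsilon_at(episode: int) -> float:
    """Linearly anneal epsilon over the training run."""
    frac = episode / (N_EPISODES - 1)
    return EPS_START + frac * (EPS_END - EPS_START)

def select_action(q_row: np.ndarray, epsilon: float,
                  rng: np.random.Generator) -> int:
    """Take an exploratory action with probability epsilon, greedy otherwise."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))  # explore: uniform random action
    return int(np.argmax(q_row))              # exploit: best-known action
```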
“…The proposed algorithms (Algorithm 2 and Algorithm 3) have some initialization parameters, such as α, which represents the learning rate that moderates the speed of learning and the update of Q-values (in our approach we assume α = 0.5), and γ, which represents the discount factor that quantifies the importance given to future rewards (in our approach we consider future task placements important, so we assign a sufficiently large value, γ = 0.9). To choose an action (i.e., for the placement or scheduling), Q-learning uses an ϵ-greedy policy [11]. The ϵ-greedy policy is an efficient randomized approach that selects a random action with probability ϵ and the action with the highest estimated reward Q(S, a) with probability (1 − ϵ).…”
Section: Reward (R)
confidence: 99%
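A minimal sketch of the tabular Q-learning loop this excerpt describes, using the quoted α = 0.5 and γ = 0.9. The value of ϵ, the state/action space sizes, and the environment step function are hypothetical placeholders, not details from the source.

```python
import numpy as np

# Tabular Q-learning with the parameters quoted above (alpha = 0.5,
# gamma = 0.9). EPSILON, the space sizes, and step() are assumptions.
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1
N_STATES, N_ACTIONS = 16, 4

Q = np.zeros((N_STATES, N_ACTIONS))
rng = np.random.default_rng(0)

def step(state: int, action: int) -> tuple[int, float]:
    """Hypothetical environment transition; returns (next_state, reward)."""
    next_state = (state + action) % N_STATES
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

state = 0
for _ in range(1000):
    # epsilon-greedy: random action with probability epsilon, else greedy
    if rng.random() < EPSILON:
        action = int(rng.integers(N_ACTIONS))
    else:
        action = int(np.argmax(Q[state]))
    next_state, reward = step(state, action)
    # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    Q[state, action] += ALPHA * (reward + GAMMA * np.max(Q[next_state])
                                 - Q[state, action])
    state = next_state
```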
“…Bulut proposed an improved epsilon-greedy Q-learning (IEGQL) algorithm to enhance efficiency and productivity with respect to path length and computational cost [18]. The IEGQL introduces a reward function that incorporates prior knowledge of the environment for a mobile robot, and mathematical modeling is presented to support optimal action selection while ensuring rapid convergence.…”
Section: Related Work
confidence: 99%