2019
DOI: 10.1109/access.2019.2950055

Reinforcement Learning Based Stochastic Shortest Path Finding in Wireless Sensor Networks

Abstract: Many factors influence the connection states between nodes of wireless sensor networks, such as physical distance and network load, making the network's edge lengths dynamic in many scenarios. This dynamic property means the network essentially forms a graph with stochastic edge lengths. In this paper, we study the stochastic shortest path problem on a directed graph with stochastic edge lengths, using reinforcement learning algorithms. We regard each edge length as a random variable following an unknown…
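As a concrete illustration of the setting the abstract describes, the sketch below applies tabular Q-learning to a toy directed graph whose edge lengths are sampled from unknown (here, uniform) distributions each time an edge is traversed. The graph, distributions, and hyperparameters (`graph`, `edge_length`, `ALPHA`, `GAMMA`, `EPS`) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not the authors' implementation): tabular Q-learning on a
# small directed graph with stochastic edge lengths, as in a WSN whose link
# costs fluctuate with distance and load.
import random
from collections import defaultdict

graph = {0: [1, 2], 1: [3], 2: [3], 3: []}   # node -> successor nodes
edge_length = {                               # (mean, spread) of a uniform length
    (0, 1): (2.0, 1.0), (0, 2): (4.0, 0.5),
    (1, 3): (5.0, 2.0), (2, 3): (1.0, 0.5),
}
GOAL, ALPHA, GAMMA, EPS = 3, 0.1, 1.0, 0.2

def sample_length(u, v):
    mean, spread = edge_length[(u, v)]
    return random.uniform(mean - spread, mean + spread)

Q = defaultdict(float)   # Q[(node, next_node)] ~ expected remaining path length

for _ in range(5000):
    s = 0
    while s != GOAL:
        nbrs = graph[s]
        # epsilon-greedy choice of the next hop
        a = random.choice(nbrs) if random.random() < EPS else min(nbrs, key=lambda v: Q[(s, v)])
        cost = sample_length(s, a)                       # observed stochastic edge length
        nxt_best = 0.0 if a == GOAL else min(Q[(a, v)] for v in graph[a])
        # Q-learning update: minimise expected cumulative edge length
        Q[(s, a)] += ALPHA * (cost + GAMMA * nxt_best - Q[(s, a)])
        s = a

# Greedy policy after learning traces the estimated stochastic shortest path
path, s = [0], 0
while s != GOAL:
    s = min(graph[s], key=lambda v: Q[(s, v)])
    path.append(s)
print(path)   # expected [0, 2, 3] given the illustrative mean lengths above
```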

Cited by 34 publications (22 citation statements)
References 35 publications
“…If the current policy is used to select actions for such an update, the procedure is called "on-policy learning". For instance, in the SARSA method [40], [41], which is an on-policy learning method, the state-action value function is updated as follows…”
Section: B. Off-Policy TD Learning
Mentioning, confidence: 99%
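For reference, the SARSA update this excerpt alludes to (the equation itself is cut off in the snippet) is commonly written as

$$
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]
$$

where $a_{t+1}$ is the action actually chosen by the current policy in state $s_{t+1}$; notation may differ slightly from the cited works [40], [41].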
“…In most cases, a stochastic (e.g., random) policy is selected as the behaviour policy to ensure enough exploration of new states. One of the most practiced off-policy methods is known as Q-learning [41], [42], [44], [46], which updates the value function using the Bellman optimality equation as follows…”
Section: B. Off-Policy TD Learning
Mentioning, confidence: 99%
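The Q-learning update referred to here, based on the Bellman optimality equation, is commonly written as

$$
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
$$

The max over next-state actions, rather than the action the behaviour policy actually takes, is what makes the update off-policy; the exact notation in [41], [42], [44], [46] may differ.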
“…There are different ways to address this problem, including genetic algorithms [4], ant colony optimisation [5], [6], reinforcement learning [7], nearest-neighbour optimisation [8], [9], etc. However, unlike classical shortest path problems, we treat this as a stochastic shortest path problem [10], since the AUV must consider the energy cost of a path in addition to the path length. This is because a shorter path may be more expensive than a longer path if the AUV needs to make many turns along the shorter path.…”
Section: Introduction
Mentioning, confidence: 99%
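A small worked example (with a hypothetical cost model, not taken from the cited works) illustrates why the geometrically shorter path can be more expensive: if path cost is approximated as $c(p) = \ell(p) + \kappa \, n_{\mathrm{turns}}(p)$ with a per-turn penalty $\kappa = 4$, then a 90-unit path with 6 turns costs $90 + 24 = 114$, while a 100-unit path with 1 turn costs $100 + 4 = 104$, so the longer path is the cheaper choice.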