2024
DOI: 10.1088/1361-6501/ad2663

Improve exploration in deep reinforcement learning for UAV path planning using state and action entropy

Hui Lv,
Yadong Chen,
Shibo Li
et al.

Abstract: Despite being a widely adopted development framework for unmanned aerial vehicles (UAVs), deep reinforcement learning (DRL) is often considered sample inefficient. In particular, UAVs struggle to fully explore the state and action space in environments with sparse rewards. While some exploration algorithms have been proposed to overcome the challenge of sparse rewards, they are not specifically tailored to the UAV platform. Consequently, applying those algorithms to UAV path planning may lead to problems such as uns…
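The abstract describes improving exploration under sparse rewards via state and action entropy. As a rough illustration only, and not the authors' exact formulation, here is a minimal Python sketch of an entropy-style intrinsic bonus added to a sparse extrinsic reward; `policy_probs`, `state_visits`, and `beta` are hypothetical names:

```python
import numpy as np

def shaped_reward(extrinsic_r, policy_probs, state_visits, beta=0.01):
    """Hypothetical reward shaping: sparse extrinsic reward plus an
    action-entropy term (policy uncertainty over actions) and a
    count-based novelty proxy for rarely visited states."""
    action_entropy = -np.sum(policy_probs * np.log(policy_probs + 1e-8))
    state_novelty = 1.0 / np.sqrt(state_visits + 1.0)  # decays with visits
    return extrinsic_r + beta * (action_entropy + state_novelty)
```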

Cited by 5 publications (4 citation statements)
References 28 publications
“…where x and y denote both the coordinates of the elements in the cross-correlation matrix φ and the coordinates of the cross-correlation elements in V_t and V_(t+i), respectively, and M denotes the cross-correlation module. We use the Euclidean distance between the corresponding patches in the high-dimensional feature space as the cross-correlation metric, as shown in equation (3): the smaller the distance, the higher the degree of cross-correlation and the smaller the difference between the corresponding patches…”
Section: Cross-correlation
Mentioning confidence: 99%
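The quoted passage defines cross-correlation between patches of V_t and V_(t+i) through Euclidean distance in a high-dimensional feature space, with smaller distances meaning stronger correlation. A minimal NumPy sketch of that distance matrix, assuming hypothetical patch-feature arrays `feat_t` and `feat_ti`:

```python
import numpy as np

def cross_correlation_matrix(feat_t, feat_ti):
    """feat_t, feat_ti: (N, D) arrays of N patch features from frames
    V_t and V_(t+i). Returns the (N, N) matrix phi of pairwise Euclidean
    distances; a smaller phi[x, y] means the patches at positions x and y
    are more strongly cross-correlated."""
    diff = feat_t[:, None, :] - feat_ti[None, :, :]  # (N, N, D) differences
    return np.linalg.norm(diff, axis=-1)             # (N, N) distances
```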
“…In unknown and complex environments, visual odometry [1] plays a crucial role as a key technology for motion estimation and self-localization based solely on the onboard sensors of vehicles [2], drones [3] and robots [4]. Substantial research efforts [5,6] have been dedicated to devising a visual odometry estimation system that integrates accuracy, robustness and efficiency: features are extracted from consecutive images, followed by estimating the relative pose between two adjacent frames based on geometric relationships…”
Section: Introduction
Mentioning confidence: 99%
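The statement summarizes the classic geometric pipeline: extract and match features across adjacent frames, then recover the relative pose. A minimal OpenCV sketch of the two-frame pose step, assuming matched keypoints `pts1`, `pts2` and an intrinsic matrix `K` are already available:

```python
import cv2
import numpy as np

def relative_pose(pts1, pts2, K):
    """pts1, pts2: (N, 2) float arrays of matched pixel coordinates in two
    adjacent frames; K: 3x3 camera intrinsic matrix. Returns rotation R and
    a unit-scale translation t between the frames."""
    E, mask = cv2.findEssentialMat(pts1, pts2, K,
                                   method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t
```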
“…The second layer of the RBF network is the hidden layer, with s nodes whose basis functions are φ_k(Y) (k = 1, 2, …, s). The expression of the t-th node in this layer, based on the G-dimensional Gaussian function, is shown in equation (19)…”
Section: Q-table Improvement
Mentioning confidence: 99%
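The quoted passage describes an RBF hidden layer whose k-th node applies a Gaussian basis function to the input Y; the cited equation (19) is not reproduced here. A minimal NumPy sketch under the common Gaussian-RBF formulation, with hypothetical centers `C` and widths `sigma`:

```python
import numpy as np

def rbf_hidden_layer(Y, C, sigma):
    """Y: (D,) input vector; C: (s, D) node centers; sigma: (s,) widths.
    Returns the s activations phi_k(Y) = exp(-||Y - c_k||^2 / (2 sigma_k^2))."""
    dist2 = np.sum((C - Y) ** 2, axis=1)        # squared distance to each center
    return np.exp(-dist2 / (2.0 * sigma ** 2))  # one Gaussian activation per node
```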
“…Deep reinforcement learning combines the perceptual ability of deep learning with the decision-making ability of reinforcement learning, which effectively addresses the curse of dimensionality and the training difficulties of reinforcement learning [19], and provides a new approach to path-planning problems in complex environments. Deep Q-learning [20], Deep Deterministic Policy Gradient (DDPG) [21], Twin Delayed Deep Deterministic Policy Gradient (TD3) [22], and other methods have been applied to mobile-robot task allocation and control optimization…”
Section: Introduction
Mentioning confidence: 99%
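The statement lists DQN, DDPG, and TD3 as DRL methods used in such settings. As a reference point only, in the standard textbook formulation and not tied to the cited works, a minimal PyTorch sketch of the deep Q-learning temporal-difference loss:

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """batch: (s, a, r, s_next, done) tensors. Computes the standard DQN
    TD loss with target y = r + gamma * max_a' Q_target(s', a') for
    non-terminal next states."""
    s, a, r, s_next, done = batch
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s, a) for taken actions
    with torch.no_grad():
        q_next = target_net(s_next).max(dim=1).values  # max over next actions
        y = r + gamma * (1.0 - done) * q_next          # TD target
    return F.smooth_l1_loss(q, y)
```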