2023
DOI: 10.1109/tie.2022.3190850

A Hierarchical Deep Reinforcement Learning Framework With High Efficiency and Generalization for Fast and Safe Navigation

Abstract: We present a hierarchical deep reinforcement learning (DRL) framework with high sampling efficiency and strong sim-to-real transfer ability for fast and safe navigation: the low-level DRL policy drives the robot toward the target position while keeping a safe distance from obstacles; the high-level DRL policy is added to further enhance navigation safety. We select a waypoint located on the path from the robot to the ultimate goal as the sub-goal to reduce the state space and avoid s…
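The sub-goal idea in the abstract can be illustrated with a minimal sketch. It assumes the path is an ordered list of 2-D waypoints (for example from a global planner) and that the sub-goal is the first waypoint at least a hypothetical `lookahead` distance from the robot. The function names and the (distance, bearing) encoding of the sub-goal are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def select_subgoal(robot: Point, path: List[Point], lookahead: float) -> Point:
    """Return the first waypoint on `path` at least `lookahead` away from the robot.

    Assumption: `path` is ordered from the robot toward the final goal, and its
    last element is the goal itself. If every waypoint is closer than the
    lookahead, the goal is returned directly.
    """
    if not path:
        raise ValueError("path must contain at least the goal")
    for waypoint in path:
        if math.dist(robot, waypoint) >= lookahead:
            return waypoint
    return path[-1]  # goal is within the lookahead: aim at it directly

def to_robot_frame(robot: Point, heading: float, target: Point) -> Tuple[float, float]:
    """Express a target as (distance, relative bearing) in the robot frame."""
    dx, dy = target[0] - robot[0], target[1] - robot[1]
    distance = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx) - heading
    bearing = math.atan2(math.sin(bearing), math.cos(bearing))  # wrap to [-pi, pi]
    return distance, bearing
```

For example, with the robot at the origin, the path `[(0.5, 0.0), (1.0, 0.0), (2.0, 0.5), (3.0, 1.0)]` and a lookahead of 1.5, the sub-goal is `(2.0, 0.5)`. Because the low-level policy sees a nearby sub-goal instead of a possibly distant final goal, the goal component of its observation stays within a bounded range (unless the waypoints themselves are sparse), which is one plausible reading of how the sub-goal "reduces the state space".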

Cited by 23 publications (6 citation statements)
References 29 publications
“…Before the navigation termination signal flag = 0 appears, the agent gets an intensive reward $r_{dis}$. Among them, $w$ is the reward weight, and $r_{dis}$ is determined by the distance the agent moves to the target, as shown in (4), where $d_{tar}(t-1)$ is the Euclidean distance between the agent and the target at the previous moment, and $d_{tar}(t)$ is the Euclidean distance at the current moment. Specifically, it is calculated as shown in (5).…”
Section: Reward Structure (mentioning)
confidence: 99%
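The excerpt does not reproduce equations (4) or (5), so the following is only a sketch under an assumed form: $r_{dis}$ is taken as the progress $d_{tar}(t-1) - d_{tar}(t)$, and the per-step reward as $w \cdot r_{dis}$ while flag = 0. The `terminal_reward` argument and the default values are hypothetical placeholders, not values from the citing paper.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

def distance_reward(prev_pos: Point, curr_pos: Point, target: Point) -> float:
    """Assumed r_dis = d_tar(t - 1) - d_tar(t): progress toward the target this step."""
    d_prev = math.dist(prev_pos, target)  # d_tar(t - 1), Euclidean distance
    d_curr = math.dist(curr_pos, target)  # d_tar(t), Euclidean distance
    return d_prev - d_curr  # > 0 if the agent moved closer, < 0 if it moved away

def step_reward(prev_pos: Point, curr_pos: Point, target: Point, flag: int,
                w: float = 1.0, terminal_reward: float = 0.0) -> float:
    """Dense reward w * r_dis while flag == 0; otherwise a placeholder terminal value.

    `w` and `terminal_reward` are hypothetical defaults chosen for illustration.
    """
    if flag == 0:
        return w * distance_reward(prev_pos, curr_pos, target)
    return terminal_reward
```

One property of this assumed form is that the dense rewards telescope: summed over an episode they equal $w \cdot (d_{tar}(0) - d_{tar}(T))$, so the agent cannot accumulate reward by circling back and forth.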
“…Thanks to their simple structure and strong maneuverability, quadrotor unmanned aerial vehicles (UAVs) have been widely used in disaster rescue, environmental monitoring, border patrol, and other application fields [1]. Autonomous navigation of UAVs in unknown environments is a key technology to ensure that UAVs can complete complex tasks by themselves [2], and its goal is to navigate UAVs moving to desired destinations along collision-free and efficient paths without human intervention [3], [4]. This article aims to develop an autonomous navigation method that allows UAV to achieve collision-free autonomous exploration and autonomous navigation tasks from its starting location to the target location based only on the UAV's visual sensory data and the semantic information of the target, without obtaining the environment map and target location in advance, as opposed to existing works [5], [6].…”
Section: Introduction (mentioning)
confidence: 99%
“…This paper integrates a fully connected layer dueling structure and PER into the DDQN algorithm. During the learning phase, a batch of experience sequences is selected from the prioritized experience replay memory using Equation (18). Unlike the DDQN, the gradient in the proposed method is multiplied by the importance sampling weight $w_i$ in Equation (19), defined as…”
Section: Prioritized Experience Replay (mentioning)
confidence: 99%
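Equations (18) and (19) are not included in the excerpt. The sketch below assumes the standard proportional prioritized experience replay (PER) formulation of Schaul et al.: sampling probability $P(i) = p_i^\alpha / \sum_k p_k^\alpha$ and importance sampling weight $w_i = (N \cdot P(i))^{-\beta} / \max_j w_j$. The default α and β values and all function names are assumptions for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple

def sampling_probabilities(priorities: Sequence[float], alpha: float) -> List[float]:
    """P(i) = p_i^alpha / sum_k p_k^alpha (proportional prioritization)."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]

def importance_weights(probs: Sequence[float], indices: Sequence[int], beta: float) -> List[float]:
    """w_i = (N * P(i))^(-beta), divided by the largest weight in the batch so w_i <= 1."""
    n = len(probs)
    raw = [(n * probs[i]) ** (-beta) for i in indices]
    max_w = max(raw)
    return [w / max_w for w in raw]

def sample_batch(priorities: Sequence[float], batch_size: int, alpha: float = 0.6,
                 beta: float = 0.4,
                 rng: Optional[random.Random] = None) -> Tuple[List[int], List[float]]:
    """Draw indices with probability P(i) and return them with their IS weights."""
    rng = rng if rng is not None else random.Random()
    probs = sampling_probabilities(priorities, alpha)
    indices = rng.choices(range(len(priorities)), weights=probs, k=batch_size)
    return indices, importance_weights(probs, indices, beta)

def weighted_td_loss(td_errors: Sequence[float], weights: Sequence[float]) -> float:
    """Mean of w_i * delta_i^2 / 2. Its gradient w.r.t. delta_i is w_i * delta_i,
    so each sample's gradient is scaled by its importance sampling weight."""
    return sum(w * d * d / 2 for w, d in zip(weights, td_errors)) / len(td_errors)

def updated_priorities(td_errors: Sequence[float], eps: float = 1e-6) -> List[float]:
    """New priorities p_i = |delta_i| + eps, so no transition gets zero probability."""
    return [abs(d) + eps for d in td_errors]
```

The weights correct the bias introduced by non-uniform sampling: frequently sampled (high-priority) transitions get smaller weights, and normalizing by the maximum keeps every weight at most 1, so the correction only scales updates downward.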
“…To address this issue, the double deep Q-network (DDQN) was proposed [16], utilizing two distinct Q-networks for action selection and value estimation to prevent overestimation. Meanwhile, DDQN finds applications in training mobile robots for tasks like optimal navigation and obstacle avoidance [17,18]. This research initially used the DQN algorithm to solve autonomous navigation for mobile robots.…”
Section: Introduction (mentioning)
confidence: 99%
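The distinction between action selection and value estimation can be shown for a single transition in plain Python, with each network's Q-values for the next state passed as a list. This is a sketch of the standard Double DQN target compared with the DQN target; the function names are illustrative.

```python
from typing import Sequence

def dqn_target(reward: float, next_q_target: Sequence[float],
               gamma: float, done: bool) -> float:
    """DQN: y = r + gamma * max_a Q_target(s', a).
    The same network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)

def ddqn_target(reward: float, next_q_online: Sequence[float],
                next_q_target: Sequence[float], gamma: float, done: bool) -> float:
    """Double DQN: y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).
    The online network selects the action; the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```

With online estimates `[1.0, 3.0]` and target estimates `[2.5, 1.5]`, DQN bootstraps from the target network's own maximum (2.5), whereas DDQN evaluates the online network's choice (action 1) with the target network (1.5). Decoupling the two steps means a value that one network happens to overestimate is less likely to be both selected and used as the bootstrap value.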
“…To let go of the assumption of fully known human information in simulations, directly using raw sensor data is a promising alternative [17], [18], [19], [20], [21]. The reciprocal relationship among pedestrians is extracted from consecutive raw sensor data such as high-dimensional LiDAR scans with DNNs.…”
Section: Introduction (mentioning)
confidence: 99%