2021
DOI: 10.3390/s21030841

Non-Communication Decentralized Multi-Robot Collision Avoidance in Grid Map Workspace with Double Deep Q-Network

Abstract: This paper presents a novel decentralized multi-robot collision avoidance method with deep reinforcement learning, which is not only suitable for large-scale multi-robot systems in grid map workspaces, but also directly processes Lidar signals instead of relying on communication between the robots. According to the particularity of the workspace, we handcrafted a reward function, which considers both collision avoidance among the robots and as few changes of direction as possible while the robots are driving. Using …
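The title names the Double Deep Q-Network. As a rough illustration of that general technique, and not of the paper's own implementation, the standard Double DQN target picks the next action with the online network and scores it with the target network. The function name, the plain-list Q-values, and the discount factor below are all assumptions made for this sketch.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    done: bool,
    gamma: float = 0.99,
) -> float:
    """Return the Double DQN bootstrap target for one transition.

    next_q_online / next_q_target are hypothetical per-action Q-values for
    the next state, as produced by the online and target networks.
    """
    if done:
        # Terminal transition (e.g. goal reached or collision): no bootstrap.
        return reward
    # Action selection uses the online network ...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ... while action evaluation uses the target network.
    return reward + gamma * next_q_target[best_action]


# Illustrative values only (not taken from the paper).
online_q = [1.0, 3.0, 2.0]
target_q = [0.5, 1.5, 4.0]
y = double_dqn_target(1.0, online_q, target_q, done=False, gamma=0.9)
y_terminal = double_dqn_target(-1.0, online_q, target_q, done=True, gamma=0.9)
```

With these made-up numbers the online network selects action 1, so the target bootstraps from the target network's value for that action (1.5) rather than from its maximum (4.0). Decoupling selection from evaluation in this way is what Double DQN uses to reduce overestimation.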

Cited by 12 publications (7 citation statements)
References 31 publications
“…Since the application scenarios of this paper are the same as those proposed by [10], and the algorithm selection is consistent, the network structure selection is compared with theirs. Based on Chen et al.'s network structure, we introduced an LSTM layer to obtain temporal information from the data so that the robot can make decisions based on historical information, and we added the CBAM hybrid attention mechanism [12] so that the robot pays more attention to the information of nearby robots.…”
Section: Network Structure Verification Experiments (mentioning)
confidence: 99%
“…For the scenario in this paper, the robot only receives a reward when it reaches the end point or collides. For this sparse reward scenario, adding an appropriate process reward can speed up the training process. Therefore, based on the reward of [10], we design a new reward function for this motion model, and we still judge the effectiveness of the reward function in terms of two metrics, R_acc and R_move. We demonstrate the effectiveness of the reward function through two sets of experiments, one using the new reward function designed in this paper to derive policy2 (i.e., policy2 above), and the other not using the reward function in this paper to derive policy3, using the same network structure as [10], and keeping the other parameters the same for 10,000 rounds.…”
Section: Reward Function Verification Experiments (mentioning)
confidence: 99%
“…For example, PPO was used in multi-robot collision avoidance tasks [66], bipedal robot locomotion [67], etc. DQN has also proved to work well on certain types of tasks on real robots [68]-[70]. Recently, SAC [71] has emerged to combine the strengths of the above two main approaches and has proved its capability in real robot problems like dexterous manipulation [72], mobile robot navigation [73], robot arm control [74], multi-legged robots [75], etc.…”
Section: Reinforcement Learning For Robot Control (mentioning)
confidence: 99%
“…In the adaptive inverse distance weighted method, the hyperparameters of each known point in the model are learned, and the nearest-neighbor statistics of each point are calculated. The multidimensional spatial discrete points are then formed, and spatial modeling is done by the Kriging interpolation method [19,20]. Finally, the coordinates of the interpolation points to be predicted are input into the spatial model to obtain the corresponding hyperparameters of the interpolation points.…”
Section: State Value Reuse (mentioning)
confidence: 99%