2021
DOI: 10.1049/itr2.12046

A centralised training algorithm with D3QN for scalable regular unmanned ground vehicle formation maintenance

Abstract: Unmanned ground vehicles (UGVs) have been widely used to accomplish various missions in civilian and military environments. Formation of a UGV group is an important technique for supporting the broad applications of multi-functional UGVs. This study proposes a scalable regular UGV formation maintenance (SRUFM) algorithm based on deep reinforcement learning (DRL), which aims to use a unified DRL framework to improve the lateral and longitudinal control performance of UGVs in different situations of the fo…

Cited by 5 publications (2 citation statements). References 39 publications.
“…Traditional Q-learning [10] is a value-based reinforcement learning algorithm: Q(s, a) is the expected return of taking action a in state s at a given moment, and the environment feeds back a corresponding reward for the agent's action. The main idea of the algorithm is to store Q values for state-action pairs in a Q-table and then select the action with the maximum expected benefit according to those values. As the environment grows more complex, a Q-table struggles to handle tasks with huge state spaces. In 2013, the DeepMind team [11] proposed the DQN algorithm, which combined deep learning and reinforcement learning for the first time. In 2016, Tom Schaul [12] proposed prioritized experience replay, which uses the temporal-difference error to measure the learning value of each experience: the absolute value of the temporal-difference error ranks the experiences in the replay buffer so that high-error experiences are replayed more frequently, while importance-sampling weights correct the resulting bias, which speeds up training and eases convergence. In 2016, Van Hasselt [13] proposed Double DQN to address DQN's overestimation problem by using different value functions for action selection and action evaluation. Wang [14] further proposed Dueling DQN, in which an advantage function "normalizes" the Q-value against a value baseline, improving learning efficiency and making learning more stable; experience also shows that the advantage function helps to reduce variance, an important factor in overfitting. However, D3QN [15] combined with prioritized experience replay still has shortcomings, and its ability to explore the optimal path remains weak.…”
Section: Introduction to Deep Q Reinforcement Learning
confidence: 99%
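
A minimal PyTorch sketch of the two ideas described in the excerpt above: a dueling head that centres advantages on a value baseline (Dueling DQN) and a target that selects the next action with the online network but evaluates it with the target network (Double DQN), the combination commonly called D3QN. The class and function names (DuelingQNet, double_dqn_target) are illustrative and not taken from the cited papers.

import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Q-network with separate value and advantage streams (Dueling DQN)."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # advantage stream A(s, a)

    def forward(self, state):
        h = self.feature(state)
        v, a = self.value(h), self.advantage(h)
        # Centre the advantages on the value baseline, as described in the excerpt
        return v + a - a.mean(dim=1, keepdim=True)

def double_dqn_target(online_net, target_net, reward, next_state, done, gamma=0.99):
    """Double DQN target: the online net picks the next action, the target net scores it."""
    with torch.no_grad():
        next_action = online_net(next_state).argmax(dim=1, keepdim=True)
        next_q = target_net(next_state).gather(1, next_action).squeeze(1)
    return reward + gamma * (1.0 - done) * next_q

The absolute difference between this target and the online network's Q(s, a) is the temporal-difference error that prioritized experience replay [12] uses to rank transitions in the buffer.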
“…Therefore, many researchers have used deep reinforcement learning methods to study vehicle decision-making problems. [12][13][14] Deep reinforcement learning replaces the value function over a continuous state space with a neural network and outputs discrete actions (as in the DQN algorithm) or continuous actions (as in the DDPG algorithm), which reduces memory requirements, improves decision-making accuracy and training efficiency, improves traffic efficiency and safety, and reduces fuel consumption. Studies have shown that deep reinforcement learning methods outperform traditional methods [15][16][17] and can achieve good results in complex environments.…”
Section: Introduction
confidence: 99%
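
A minimal illustration, in assumed PyTorch code not drawn from the cited works, of the contrast mentioned in the excerpt above: a DQN-style head outputs one Q-value per discrete action and acts by argmax, while a DDPG-style actor maps the state directly to a bounded continuous action vector (e.g. steering and throttle commands). The names DiscreteQHead and ContinuousActor are hypothetical.

import torch
import torch.nn as nn

class DiscreteQHead(nn.Module):
    """DQN-style output: one Q-value per discrete action, act by argmax."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))

    def act(self, state):
        return self.net(state).argmax(dim=-1)   # index of the best discrete action

class ContinuousActor(nn.Module):
    """DDPG-style output: a bounded continuous action vector."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, action_dim), nn.Tanh())

    def act(self, state):
        return self.net(state)   # values in [-1, 1], scaled to actuator limits downstream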