2022
DOI: 10.3390/jmse10030383

An AUV Target-Tracking Method Combining Imitation Learning and Deep Reinforcement Learning

Abstract: This study aims to solve the problems of sparse reward and local convergence that arise when a reinforcement learning algorithm is used as the controller of an AUV. A multi-agent GAIL (MAG) algorithm is proposed, combining the generative adversarial imitation learning (GAIL) algorithm with multi-agent training. GAIL enables the AUV to learn directly from expert demonstrations, overcoming the slow initial training of the network. Parallel training of multiple agents reduces the high correlation between samples to avoi…

Cited by 17 publications (12 citation statements)
References 46 publications
“…For example, four strategies, i.e., prioritized experience replay, actor network indirect supervision training, target network updating with different periods, and expansion of exploration space by applying random noise, were applied in [ 106 ], respectively, to eliminate the correlation of training data, ensure the stability and speed of the convergence of the reinforcement learning AC network, update the critic network faster, and more accurately evaluate and improve the actor network’s generalization ability. Highly correlated data may lead to local convergence in RL [ 133 ]. One solution is to perform random sampling in the experience replay buffer, but this solution is only suitable for off-policy RL [ 133 ].…”
Section: Training and Deployment Methods of RL on Bionic Underwater R... (mentioning)
confidence: 99%
“…Highly correlated data may lead to local convergence in RL [ 133 ]. One solution is to perform random sampling in the experience replay buffer, but this solution is only suitable for off-policy RL [ 133 ]. Another solution is multi-agent RL.…”
Section: Training and Deployment Methods of RL on Bionic Underwater R... (mentioning)
confidence: 99%
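The citation statement above mentions random sampling from an experience replay buffer as one way to reduce correlation between training samples in off-policy RL. The following is a minimal sketch of that general idea, not the implementation used in the cited paper or the citing review. The class name `ReplayBuffer`, its methods, and the transition layout are assumptions chosen for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-capacity transition store with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently evicts the oldest transition when full.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        # Transitions arrive in time order, so neighbours are highly correlated.
        self._storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Drawing uniformly without replacement mixes transitions from
        # different time steps, which breaks up the temporal correlation.
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)


# Usage: store 100 consecutive (toy) transitions, then draw a shuffled batch.
buffer = ReplayBuffer(capacity=50, seed=0)
for t in range(100):
    buffer.add(state=t, action=0, reward=0.0, next_state=t + 1, done=False)
batch = buffer.sample(8)
```

Because the batch is drawn from previously stored transitions rather than from the current policy's latest rollout, this approach applies only to off-policy methods, which matches the limitation noted in the citation statement.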
“…The calculation equation of uncertainty is: The target probability map represents the probability of the existence of a target in a 3D environment [17]. The target probability map of the 3D environment is defined in Equation (5).…”
Section: Uncertainty Map (mentioning)
confidence: 99%
“…The existence of fish and glaciers in the marine environment make it difficult for AUVs to carry out underwater unmanned operation, while an AUV's own energy consumption and communication restriction also limit the efficiency of underwater missions. Research on how to realize the target search and complete the corresponding tasks scientifically and efficiently in the complex 3D environment has been gaining popularity among scholars [3][4][5].…”
Section: Introduction (mentioning)
confidence: 99%
“…Autonomous underwater vehicles (AUVs) are widely used in military and commercial fields due to their small size, low cost, high degree of autonomy and flexible deployment. They are irreplaceable tools in applications such as accident rescue, seabed topographic mapping, object detection, observation of ocean phenomena and marine resource development [1][2][3][4][5][6][7][8]. With the increasing complexity of underwater tasks, it is necessary to improve work efficiency through the cooperation of multiple AUVs, which can even complete tasks that cannot be performed by a single AUV [9].…”
Section: Introduction (mentioning)
confidence: 99%