Autonomous Underwater Vehicle Path Planning Method of Soft Actor–Critic Based on Game Training

Wang, Zhuo; Lu, Hao; Qin, Hongde; Sui, Yancheng

doi:10.3390/jmse10122018

Cited by 5 publications

(6 citation statements)

References 42 publications


“…SAC adopts the actor-critic framework. The actor collects data by interacting with the environment [31], while the critic's value function directs the actor in learning a more efficient strategy by using a policy-based gradient optimizer [32]. The critic learns a value function to measure the quality of state-action pairs from the data collected by the actor, and then helps the actor to update the policy.…”

Section: Algorithmic Framework Based on SAC (mentioning)

confidence: 99%

Soft Actor-Critic and Risk Assessment-Based Reinforcement Learning Method for Ship Path Planning

Wang; Ji (2024). Sustainability


Ship path planning is one of the most important themes in waterway transportation, which is deemed as the cleanest mode of transportation due to its environmentally friendly and energy-efficient nature. A path-planning method that combines the soft actor-critic (SAC) and navigation risk assessment is proposed to address ship path planning in complex water environments. Specifically, a continuous environment model is established based on the Markov decision process (MDP), which considers the characteristics of the ship path-planning problem. To enhance the algorithm’s performance, an information detection strategy for restricted navigation areas is employed to improve state space, converting absolute bearing into relative bearing. Additionally, a risk penalty based on the navigation risk assessment model is introduced to ensure path safety while imposing potential energy rewards regarding navigation distance and turning angle. Finally, experimental results obtained from a navigation simulation environment verify the robustness of the proposed method. The results also demonstrate that the proposed algorithm achieves a smaller path length and sum of turning angles with safety and fuel economy improvement compared with traditional methods such as RRT (rapidly exploring random tree) and DQN (deep Q-network).
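The actor and critic roles described in the citation statement above can be sketched with the standard SAC update quantities. This is a minimal illustration of the textbook formulation, not the implementation of any paper listed here: the function names, the twin-critic minimum, and the default values of `gamma` and `alpha` are assumptions made for the example.

```python
import math


def soft_td_target(reward, done, next_q1, next_q2, next_log_prob,
                   gamma=0.99, alpha=0.2):
    """Critic regression target for one transition.

    The soft value of the next state is the smaller of two critic
    estimates minus the entropy term alpha * log pi(a'|s').
    gamma and alpha are hypothetical defaults for this sketch.
    """
    soft_value = min(next_q1, next_q2) - alpha * next_log_prob
    return reward + gamma * (1.0 - done) * soft_value


def actor_loss(log_prob, q1, q2, alpha=0.2):
    """Per-sample actor loss: the critic's estimate steers the policy.

    The loss falls when the sampled action has a high critic value
    and when the policy keeps high entropy (low log-probability).
    """
    return alpha * log_prob - min(q1, q2)


# One transition collected by the actor, scored by the critic.
target = soft_td_target(reward=1.0, done=0.0, next_q1=2.0, next_q2=1.5,
                        next_log_prob=math.log(0.5))
loss = actor_loss(log_prob=math.log(0.5), q1=2.0, q2=1.5)
print(target, loss)
```

In a full agent the critic would be fitted to `target` by regression and the actor would be updated by descending the gradient of `actor_loss`, both over mini-batches drawn from the data the actor has collected.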



“…Four algorithms based on DDPG, EDDPG, SAC, and ENSAC were applied to verify the effectiveness of MOP observation path planning. The hyperparameters for algorithmic models were initially based on references, i.e., based on the hyperparameters of the SAC algorithm primarily referenced [57][58][59], while the hyperparameters of the ENN network mainly referenced [43,60]. Moreover, the learning rate controls the size of each parameter update.…”

Section: Parameter Setting (mentioning)

confidence: 99%

Adaptive Sampling Path Planning for a 3D Marine Observation Platform Based on Evolutionary Deep Reinforcement Learning

Zhang; Liu; Zhou (2023). JMSE


Adaptive sampling of the marine environment may improve the accuracy of marine numerical prediction models. This study considered adaptive sampling path optimization for a three-dimensional (3D) marine observation platform, leading to a path-planning strategy based on evolutionary deep reinforcement learning. The low sampling efficiency of the reinforcement learning algorithm is improved by evolutionary learning. The combination of these two components as a new algorithm has become a current research trend. We first combined the evolutionary algorithm with different reinforcement learning algorithms to verify the effectiveness of the combination of algorithms with different strategies. Experimental results indicate that the fusion of the two algorithms based on a maximum-entropy strategy is more effective for adaptive sampling using a 3D marine observation platform. Data assimilation experiments indicate that adaptive sampling data from a 3D mobile observation platform based on evolutionary deep reinforcement learning improves the accuracy of marine environment numerical prediction systems.
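The citation statement for this entry notes that the learning rate controls the size of each parameter update. A small sketch of that effect follows, using plain gradient descent on a toy quadratic loss. The loss, the starting point, the step count, and the two learning rates are illustrative assumptions, not settings taken from the cited paper.

```python
def gradient(w):
    """Gradient of the toy loss (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)


def train(learning_rate, steps=1000, w=0.0):
    """Run plain gradient descent and return the final parameter."""
    for _ in range(steps):
        # Each update moves w by learning_rate times the gradient.
        w = w - learning_rate * gradient(w)
    return w


slow = train(learning_rate=0.0001)  # smaller steps per update
fast = train(learning_rate=0.0003)  # larger steps per update
print("distance to optimum:", abs(3.0 - slow), abs(3.0 - fast))
```

With the same number of updates, the larger learning rate ends closer to the minimum on this well-behaved loss. On the non-convex losses of deep networks a rate that is too large can instead make training unstable, which is why it is usually tuned.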


“…If no obstacle is detected in a particular zone, then δ i z t and d i z s t have constant values. Therefore, the detection result χ of the forward-looking sonar can be described as shown below [28]:…”

Section: Hunting Environment Description (mentioning)

confidence: 99%

“…The deep neural networks were constructed using Pytorch, and the network models were trained using GPU. The hyperparameters for S4AC were initially based on references, i.e., based on the hyperparameters of the SAC algorithm primarily referenced our previous work [28], while the hyperparameters of the GAN network mainly referenced [45]. Moreover, the learning rate was slightly adjusted from 0.0001 to 0.0003 to expedite the training process.…”

Section: Experimental Environment and Training Parameters (mentioning)

confidence: 99%

“…The traditional AUV planning methods mainly include rapidly exploring random tree (RRT) [18,19], Artificial Potential Field (APF) [20][21][22], Fuzzy Logic [23,24], and Geofencing [25][26][27] algorithms. These methods require designing algorithm parameters based on underwater conditions, which depends on the designer's understanding of the underwater environment [28]. Additionally, the lack of learning capability makes these methods unable to enhance the path-planning function of AUVs.…”

Section: Introduction (mentioning)

confidence: 99%


State Super Sampling Soft Actor–Critic Algorithm for Multi-AUV Hunting in 3D Underwater Environment

Wang; Sui; Qin; et al. (2023). JMSE

Self Cite


Reinforcement learning (RL) is known for its efficiency and practicality in single-agent planning, but it faces numerous challenges when applied to multi-agent scenarios. In this paper, a Super Sampling Info-GAN (SSIG) algorithm based on Generative Adversarial Networks (GANs) is proposed to address the problem of state instability in Multi-Agent Reinforcement Learning (MARL). The SSIG model allows a pair of GAN networks to analyze the previous state of dynamic system and predict the future state of consecutive state pairs. A multi-agent system (MAS) can deduce the complete state of all collaborating agents through SSIG. The proposed model has the potential to be employed in multi-autonomous underwater vehicle (multi-AUV) planning scenarios by combining it with the Soft Actor–Critic (SAC) algorithm. Hence, this paper presents State Super Sampling Soft Actor–Critic (S4AC), which is a new algorithm that combines the advantages of SSIG and SAC and can be applied to Multi-AUV hunting tasks. The simulation results demonstrate that the proposed algorithm has strong learning ability and adaptability and has a considerable success rate in hunting the evading target in multiple testing scenarios.

