Autonomous underwater vehicles (AUVs) play an increasingly important role in ocean exploration. Existing AUVs are usually not fully autonomous and are generally limited to pre-planned or pre-programmed tasks. Reinforcement learning (RL) and deep reinforcement learning have been introduced into AUV design and research to improve autonomy. However, these methods remain difficult to apply directly to real AUV systems because of sparse rewards and low learning efficiency. In this paper, we propose a deep interactive reinforcement learning method for AUV path following that combines the advantages of deep reinforcement learning and interactive RL. In addition, since a human trainer cannot provide rewards for an AUV operating in the ocean, and the AUV must adapt to a changing environment, we further propose a deep reinforcement learning method that learns from both human rewards and environmental rewards at the same time. We test our methods in two AUV path-following tasks, straight-line and sinusoidal-curve following, by simulation on the Gazebo platform. Our experimental results show that with our proposed deep interactive RL method, the AUV converges faster than a DQN learner trained only on environmental reward. Moreover, an AUV learning with our deep RL method from both human and environmental rewards achieves similar or even better performance than the deep interactive RL method, and can adapt to the actual environment by further learning from environmental rewards.
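The core idea of learning from both human and environmental rewards can be sketched as a tabular Q-learning update driven by a weighted blend of the two signals. This is a minimal illustrative sketch, not the authors' implementation: the blend weight `beta`, the hyperparameters, and the function names are all assumptions introduced here for clarity.

```python
# Sketch: Q-learning update blending a human evaluative signal with the
# environmental reward. All names and constants are illustrative assumptions.

ALPHA, GAMMA, BETA = 0.5, 0.9, 0.7  # learning rate, discount, human-reward weight

def blended_reward(env_reward, human_reward, beta=BETA):
    """Combine environmental and human rewards into one scalar signal."""
    return beta * human_reward + (1.0 - beta) * env_reward

def q_update(q, state, action, env_reward, human_reward, next_state, actions):
    """One tabular Q-learning step using the blended reward."""
    r = blended_reward(env_reward, human_reward)
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (r + GAMMA * best_next - old)
```

In a deep RL setting such as the paper's DQN learner, the same blended reward would replace `r` in the temporal-difference target used to train the Q-network; a schedule that decays `beta` toward zero would let the agent shift from human rewards to environmental rewards over time.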
INDEX TERMS Autonomous Underwater Vehicle, Interactive Reinforcement Learning, Deep Q Network, Path Following
A reinforcement learning agent learns how to perform a task by interacting with its environment. The use of reinforcement learning in real-life applications has been limited by the sample-efficiency problem. Interactive reinforcement learning has been developed to speed up the agent's learning and to facilitate learning from ordinary people by allowing them to provide social feedback, e.g., evaluative feedback, advice, or instruction. Inspired by real-life biological learning scenarios, there are many ways to provide feedback for agent learning, such as via hardware devices or through natural interaction like facial expressions, speech, or gestures. The agent can even learn from feedback delivered via unimodal or multimodal sensory input. This paper reviews methods that enable an interactive reinforcement learning agent to learn from human social feedback, along with the ways of delivering that feedback. Finally, we discuss some open problems and possible future research directions.
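One common family of interactive RL methods surveyed in this line of work learns a model of the human's evaluative feedback and acts greedily with respect to it (a TAMER-style approach). The following is a minimal sketch under stated assumptions: the class name, the incremental-average update, and the +1/-1 feedback encoding are illustrative choices, not a specific published implementation.

```python
# Sketch of a TAMER-style interactive learner: the agent models human
# evaluative feedback H(s, a) (e.g., +1/-1 from a keyboard press or a
# recognized gesture) and picks the action rated highest so far.
# All names and the update rule are illustrative assumptions.

class HumanFeedbackModel:
    def __init__(self):
        self.h = {}  # running mean of feedback per (state, action)
        self.n = {}  # feedback counts per (state, action)

    def update(self, state, action, feedback):
        """Incorporate one human feedback signal as an incremental average."""
        key = (state, action)
        self.n[key] = self.n.get(key, 0) + 1
        old = self.h.get(key, 0.0)
        self.h[key] = old + (feedback - old) / self.n[key]

    def act(self, state, actions):
        """Choose the action the human has rated highest so far."""
        return max(actions, key=lambda a: self.h.get((state, a), 0.0))
```

Multimodal delivery (speech, gestures, facial expressions) changes only how the scalar `feedback` value is produced; the learning rule itself stays the same.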