Underwater target localization using range-only and single-beacon (ROSB) techniques with autonomous vehicles has recently been used to overcome the limitations of more complex methods, such as long baseline and ultra-short baseline systems. Nonetheless, in ROSB target localization methods, the trajectory of the tracking vehicle near the localized target plays an important role in the accuracy of the predicted target position. Here, we investigate a Reinforcement Learning (RL) approach to find the optimal path that an autonomous vehicle should follow in order to maximize the overall accuracy of the predicted target localization while reducing time and power consumption. To accomplish this objective, different experimental tests have been designed using state-of-the-art deep RL algorithms. Our study also compares the results obtained with the analytical Fisher information matrix approach used in previous studies. The results revealed that the policy learned by the RL agent outperforms trajectories based on these analytical solutions; for example, the median predicted error at the beginning of the target's localization is 17% lower. These findings suggest that deep RL for localizing acoustic targets can be successfully applied to in-water applications that include tracking of acoustically tagged marine animals by autonomous underwater vehicles. This is envisioned as a first necessary step to validate the use of RL to tackle such problems, which could later be used in more complex scenarios.
I. INTRODUCTION
One of the main challenges in marine research lies in the positioning of underwater features or assets (e.g., marine species [1] or underwater vehicles [2]). Due to the large attenuation of radio waves in water [3], Global Positioning System (GPS) signals are not suitable for positioning underwater targets. Nonetheless, acoustic signals can fill the underwater communications gap left by radio waves. Acoustic signals have much greater underwater propagation capabilities [4], and therefore a network of nodes or beacons can be deployed and used to localize underwater targets, which may include Autonomous Underwater Vehicles (AUVs), benthic rovers, or acoustically tagged organisms.
Unfortunately, underwater acoustic deployments are often complex and highly expensive, both economically and logistically