Unmanned underwater vehicles (UUVs), which are widely used for underwater cooperative combat, environment detection, and resource exploration, must be localized by underwater acoustic sensor networks (UASNs). However, localization accuracy is hard to guarantee because of the limited bandwidth, long propagation latency, and limited energy resources of UASNs. In this paper, we propose a mobile underwater localization scheme based on reinforcement learning (RL) and neural networks that optimizes anchor node selection in the UASN to localize the target precisely. More specifically, the scheme applies SqueezeNet to identify the line-of-sight (LOS) anchor nodes from the received signals. An RL-based approach then refines the selection among the LOS anchor nodes without requiring knowledge of the underwater environment model, and the Dyna architecture is applied to reduce the convergence time of the anchor node selection. Simulation results based on a nonisovelocity geometry-based underwater acoustic channel model show that the proposed scheme significantly improves the localization accuracy and reduces the energy consumption of the UASN, and thereby achieves trajectory correction.
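To make the role of the Dyna architecture concrete, the following is a minimal tabular Dyna-Q sketch in Python. It is an illustration under stated assumptions, not the scheme evaluated in this paper: the three-level error state, the action defined as the number of LOS anchor nodes used, the error and energy models in `toy_env_step`, and all function and parameter names are hypothetical. The point it shows is that each real interaction is also stored in a learned model and replayed in extra planning updates, which is how Dyna can shorten convergence.

```python
import random


def dyna_q(env_step, n_states, n_actions, episodes=200, steps=20,
           alpha=0.1, gamma=0.9, epsilon=0.1, planning_steps=10, seed=0):
    """Tabular Dyna-Q: direct Q-learning plus replay from a learned model."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    model = {}  # (state, action) -> (reward, next_state) from real experience

    for _ in range(episodes):
        s = 0  # hypothetical start state for every episode
        for _ in range(steps):
            # Epsilon-greedy choice over the candidate anchor selections.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: q[s][i])

            # Real interaction with the (unknown) environment.
            r, s2 = env_step(s, a, rng)
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            model[(s, a)] = (r, s2)

            # Planning: replay transitions sampled from the learned model.
            for _ in range(planning_steps):
                (ps, pa), (pr, ps2) = rng.choice(list(model.items()))
                q[ps][pa] += alpha * (pr + gamma * max(q[ps2]) - q[ps][pa])

            s = s2
    return q, model


def toy_env_step(state, action, rng):
    """Hypothetical environment: more anchors lower error but cost energy."""
    n_anchors = action + 1
    error = 4.0 / n_anchors + rng.uniform(-0.1, 0.1)  # assumed error model
    energy = 0.5 * n_anchors                          # assumed energy model
    reward = -(error + energy)
    # Quantize the error into three assumed levels: low, medium, high.
    next_state = 0 if error < 1.5 else (1 if error < 3.0 else 2)
    return reward, next_state


q_table, learned_model = dyna_q(toy_env_step, n_states=3, n_actions=4)
# Greedy number of anchors to use in each error-level state.
greedy_anchor_counts = [
    max(range(4), key=lambda a: q_table[s][a]) + 1 for s in range(3)
]
```

Setting `planning_steps=0` reduces the sketch to plain Q-learning, which gives a simple baseline for comparing how quickly the two variants settle on an anchor selection in this toy setting.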