Life on Earth depends on healthy oceans, which supply a large share of the planet's oxygen, food, and energy. However, the oceans are under threat from climate change, which is devastating marine ecosystems and the economic and social systems that depend on them. The Internet of Underwater Things (IoUT), a global interconnection of underwater objects, enables round-the-clock monitoring of the oceans. It provides high-resolution data for training machine learning (ML) algorithms that rapidly evaluate potential climate change solutions and speed up decision-making. The sensors in conventional IoUTs are battery-powered, which limits their lifetime and makes them an environmental hazard when they die. In this paper, we propose a sustainable scheme that improves the throughput of underwater networks and enables their wireless charging, allowing them to potentially operate indefinitely. The scheme is based on simultaneous wireless information and power transfer (SWIPT) from an autonomous underwater vehicle (AUV) used for data collection. We model the problem of jointly maximising throughput and harvested power as a Markov decision process (MDP) and develop a model-free reinforcement learning (RL) solution. The reward function incentivises the AUV to find optimal trajectories that maximise throughput and power transfer to the underwater nodes while minimising the AUV's own energy consumption. To the best of our knowledge, this is the first attempt to use RL for this application. The scheme is implemented in an open 3D RL environment developed in MATLAB specifically for this study. The performance results show up to a 207% improvement in energy efficiency compared to a random-trajectory scheme used as a baseline.

Index Terms—wireless underwater sensor networks, machine learning, reinforcement learning, internet of underwater things, simultaneous wireless information and power transfer (SWIPT), autonomous underwater vehicles (AUV).
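To make the joint objective concrete, one minimal sketch of such a reward is a weighted sum over each time step, assuming per-step throughput $T_t$, power harvested by the nodes $P_t$, and AUV energy consumption $E_t$; the symbols and non-negative weights $\alpha$, $\beta$, $\gamma$ are illustrative assumptions, not the paper's actual formulation:

\[
r_t = \alpha\, T_t + \beta\, P_t - \gamma\, E_t
\]

Under this form, the AUV is rewarded for trajectories that increase data collected and power delivered, and penalised in proportion to the propulsion and transmission energy it expends.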