Internet of Underwater Things (IoUT) systems that rely on an Autonomous Underwater Vehicle (AUV) suffer from significant data-collection latency. This work explores heuristic and reinforcement-learning approaches to AUV path planning; specifically, we implement ant colony optimization (ACO) and Q-learning in NS3 simulations. The planning task is modeled on the traveling salesman problem (TSP) with two objectives: determining the shortest path, and balancing the length of the AUV's tour against maximizing the value of information (VoI) of the entire network. Seven scenarios are adopted, each with a distinct ordering strategy for visiting the sensor nodes (SNs). Additionally, the study investigates the integration of the ACO and Q-learning algorithms. The results show that the proposed algorithms obtain the desired AUV path plans for various numbers of SNs (10, 30, and 50), evaluated in terms of path length, energy consumption, data-transfer time, and computation time. Compared with previous studies that employed branch and bound (BB), genetic algorithms (GA), and the ant colony algorithm (ACA), the obtained distance is 7.5% shorter and the VoI is 0.3% higher. Regarding computation time, ACO with 10 SNs is 0.96% faster than BB and 0.32% faster than ACA, but 0.2% slower than GA, while with 30 SNs it is 0.99%, 52.688%, and 30.37% faster than BB, ACA, and GA, respectively. Likewise, Q-learning with 10 SNs takes 95.4%, 90%, and 82.22% less time than the BB, ACA, and GA methods, respectively, and with 30 SNs the reductions are 99.95%, 97.6%, and 96.47%.
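
For illustration only, the following is a minimal sketch of tabular Q-learning applied to a TSP-style visiting order, assuming a Q-table indexed by (current node, next node), a reward equal to the negative hop distance, and randomly placed sensor nodes; the paper's NS3 scenarios, VoI objective, and energy model are not reproduced here, and all coordinates and hyperparameters below are hypothetical.

    import random
    import math

    # Hypothetical SN coordinates; not taken from the paper's scenarios.
    random.seed(0)
    nodes = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(10)]

    def dist(i, j):
        return math.hypot(nodes[i][0] - nodes[j][0], nodes[i][1] - nodes[j][1])

    # Q-table over (current node, candidate next node).
    n = len(nodes)
    Q = [[0.0] * n for _ in range(n)]
    alpha, gamma, eps, episodes = 0.1, 0.9, 0.2, 5000

    for _ in range(episodes):
        current, unvisited = 0, set(range(1, n))
        while unvisited:
            # Epsilon-greedy choice among the unvisited SNs.
            if random.random() < eps:
                nxt = random.choice(list(unvisited))
            else:
                nxt = max(unvisited, key=lambda j: Q[current][j])
            reward = -dist(current, nxt)  # shorter hops earn higher reward
            remaining = unvisited - {nxt}
            future = max((Q[nxt][j] for j in remaining), default=0.0)
            Q[current][nxt] += alpha * (reward + gamma * future - Q[current][nxt])
            current, unvisited = nxt, remaining

    # Greedy rollout of the learned policy yields the visiting order.
    tour, current, unvisited = [0], 0, set(range(1, n))
    while unvisited:
        nxt = max(unvisited, key=lambda j: Q[current][j])
        tour.append(nxt)
        unvisited.discard(nxt)
        current = nxt
    print(tour, sum(dist(tour[k], tour[k + 1]) for k in range(n - 1)))

With 10 SNs the state-action table stays small and the rollout converges quickly, which is consistent with the low Q-learning computation times reported above; a multi-objective variant would add a VoI term to the reward rather than using hop distance alone.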