Underwater wireless sensor networks (UWSNs) are an important communication infrastructure supporting underwater monitoring applications. However, the transmission channel has a high bit error rate, strong multipath effects, and many interference sources. The network nodes also consume a lot of energy, are difficult to resupply with energy, and are prone to position changes. Together, these characteristics make it extremely difficult for UWSNs to achieve reliable and efficient packet forwarding. To address this problem, we propose the Stackelberg Q-learning based multi-hop cooperative routing algorithm (SQMCR). SQMCR builds transmission routes with the Q-learning algorithm, taking into account factors such as delay, remaining energy, and network topology, which improves the rationality and adaptability of next-hop node selection. By using the Stackelberg Q-learning algorithm to balance packet forwarding benefits against energy consumption costs, SQMCR establishes a cooperative communication policy that ensures both the reliability and the efficiency of underwater communications. It also optimizes Q-value initialization and uses a dynamic exploration probability to further improve routing performance. Experimental results show that SQMCR helps UWSNs increase packet forwarding reliability and prolong the network lifetime by 17%. It adapts better to different environments and applications and is more suitable for underwater high-reliability applications.

INDEX TERMS Underwater wireless sensor networks (UWSNs), routing algorithm, cooperative communication, Q-learning, Stackelberg game.

I. INTRODUCTION

Underwater wireless sensor networks are a key component in building the marine Internet of Things [1] and an important part of the underwater dimension of future 6G networks [2].
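The abstract describes next-hop selection by Q-learning that weighs delay, remaining energy, and network topology, together with Q-value initialization and a dynamic exploration probability. The following is a minimal sketch of that general idea under stated assumptions. It is not SQMCR's actual formulation. The reward weights, the normalized "progress" metric used as a stand-in for topology, the multiplicative decay of the exploration probability, and the names `Neighbor`, `reward`, and `QRouter` are all hypothetical. The Stackelberg cooperation step is not modeled.

```python
import random
from dataclasses import dataclass


@dataclass
class Neighbor:
    node_id: int
    delay: float            # estimated one-hop delay in seconds (assumed metric)
    residual_energy: float  # remaining energy, normalized to [0, 1]
    progress: float         # hypothetical topology metric: normalized progress toward the sink, [0, 1]


# Hypothetical weights and delay normalization; not taken from the paper.
W_DELAY, W_ENERGY, W_PROGRESS = 0.3, 0.4, 0.3
MAX_DELAY = 2.0


def reward(n: Neighbor) -> float:
    """Weighted score in which low delay, high residual energy, and more progress score higher."""
    delay_score = 1.0 - min(n.delay / MAX_DELAY, 1.0)
    return W_DELAY * delay_score + W_ENERGY * n.residual_energy + W_PROGRESS * n.progress


class QRouter:
    """Per-node Q-table over candidate next hops with a decaying exploration probability (assumed scheme)."""

    def __init__(self, alpha=0.5, gamma=0.8, eps=0.5, eps_min=0.05, decay=0.9, seed=0):
        self.q: dict[int, float] = {}  # next-hop node_id -> Q-value
        self.alpha, self.gamma = alpha, gamma
        self.eps, self.eps_min, self.decay = eps, eps_min, decay
        self.rng = random.Random(seed)

    def observe(self, neighbors: list[Neighbor]) -> None:
        # Initialize unseen neighbors with their immediate reward instead of 0,
        # so early decisions already favor promising candidates.
        for n in neighbors:
            self.q.setdefault(n.node_id, reward(n))

    def choose(self, neighbors: list[Neighbor]) -> Neighbor:
        # Epsilon-greedy choice: explore with probability eps, otherwise exploit.
        self.observe(neighbors)
        if self.rng.random() < self.eps:
            return self.rng.choice(neighbors)
        return max(neighbors, key=lambda n: self.q[n.node_id])

    def update(self, n: Neighbor, next_best_q: float) -> None:
        # Standard Q-learning update; next_best_q is the best Q-value reported by n.
        old = self.q[n.node_id]
        target = reward(n) + self.gamma * next_best_q
        self.q[n.node_id] = old + self.alpha * (target - old)
        # Shrink the exploration probability as experience accumulates.
        self.eps = max(self.eps_min, self.eps * self.decay)


if __name__ == "__main__":
    neighbors = [
        Neighbor(1, delay=0.4, residual_energy=0.9, progress=0.6),
        Neighbor(2, delay=1.5, residual_energy=0.3, progress=0.8),
    ]
    router = QRouter(eps=0.0)  # pure exploitation for a deterministic demo
    best = router.choose(neighbors)
    print(best.node_id)  # node 1: its initial Q-value (0.78) beats node 2 (0.435)
    router.update(best, next_best_q=0.5)
```

Initializing Q-values from the immediate reward, rather than from zero, lets the first forwarding decisions follow a sensible heuristic while the table is still sparse. Decaying the exploration probability trades early discovery of routes for later stability. Both choices are illustrative stand-ins for the optimizations the abstract mentions.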
UWSNs are widely used in fields such as disaster early warning, pollutant monitoring, hydrological data monitoring, marine resource exploration, and navigation assistance, and they serve as important infrastructure for ocean research, construction, and development [3]. They are composed of sensor nodes, communication nodes, and sink nodes [4]. At present, long-distance underwater wireless data transmission relies mainly on the acoustic channel [5]. The underwater acoustic channel suffers from many problems, such as large transmission delay, limited bandwidth, many interference sources, and severe multipath effects [6], [7]. Underwater communication nodes are moved by water currents, so their positions and neighbor relationships change dynamically; in addition, their communication energy consumption is high, and supplying them with energy is difficult [6], [7], [8]. Together, these unfavorable factors make reliable underwater communication extremely difficult. Yet reliable communication is the basis of various applications in underwater networks [9]. Reliable communication in underwater wireless sensor networks is reflected not ...