Underwater target detection is critical to applications such as environmental monitoring, underwater exploration, and marine resource management. As demand for underwater observation and exploitation grows, so does the need for reliable and efficient detection methods. However, the underwater environment often severely degrades image quality, which reduces detection accuracy. This paper proposes an improved YOLOv5 network for underwater target detection that enhances accuracy and reduces missed detections. First, we add the global attention mechanism (GAM) to the backbone network; GAM retains more channel and spatial information and strengthens cross-dimensional interaction, improving the backbone's feature extraction. Second, we introduce a fusion block based on DAMO-YOLO into the neck, which enhances the network's ability to extract features at different scales. Finally, we use the SIoU loss to measure how well the regression box matches the target box, which accelerates convergence and improves accuracy. Experiments on the URPC2019 dataset show that our model achieves an mAP@0.5 of 80.2%, which is 1.8% and 2.3% higher than YOLOv7 and YOLOv8, respectively, a state-of-the-art (SOTA) result. Moreover, additional evaluations on the MS COCO dataset show that our model reaches an mAP@0.5:0.95 of 51.0%, surpassing advanced methods such as ViDT and RF-Next and demonstrating the versatility of the enhanced architecture.
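Both the SIoU loss and the mAP@0.5 metric mentioned above build on the intersection over union (IoU) of two boxes. The following is a minimal sketch of that shared quantity, not the paper's implementation: the function names `iou` and `is_match` and the `(x1, y1, x2, y2)` corner format are assumptions made for illustration, and the additional angle, distance, and shape penalties that SIoU adds on top of IoU are deliberately omitted.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are assumed to be (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped to zero if disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0


def is_match(pred_box, target_box, threshold=0.5):
    """True if the prediction overlaps the target at or above the IoU threshold.

    A threshold of 0.5 corresponds to the matching criterion behind mAP@0.5.
    """
    return iou(pred_box, target_box) >= threshold
```

For example, `is_match((0, 0, 10, 10), (0, 0, 10, 8))` compares two boxes whose IoU is 0.8, so the prediction would count as a match at the 0.5 threshold. mAP@0.5:0.95 averages the same evaluation over thresholds from 0.5 to 0.95, which is why it is the stricter of the two metrics.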