Underwater structural defects in hydraulic tunnels are highly concealed and difficult to identify with conventional manual methods. Remotely operated vehicles equipped with visible-light cameras provide a noncontact, high-spatial-resolution damage detection solution. However, manually extracting damage-related information from the massive volume of inspection data is time-consuming and labor-intensive. This article proposes an integrated pixel-level instance segmentation and quantification framework for underwater structural multi-defects in hydraulic tunnels based on machine vision and deep learning. First, a video dataset of underwater structural multi-defects in tunnel linings is developed. Next, an improved You Only Look At CoefficienTs for Edge devices (YolactEdge) is used to build the detector, exploiting temporal redundancy in videos. Detectors with three different backbones are compared to balance detection accuracy and efficiency, and a cross-domain transfer learning strategy is introduced to reduce model training cost and data dependency. Various complicated underwater tunnel inspection scenarios, including uneven illumination, tilted shooting, high brightness, and motion blur, are used to evaluate the model's generalization capability. Experimental results show that the ResNet50-based YolactEdge achieves a good balance between accuracy and speed, attaining 92.47 bbox mAP, 92.15 mask mAP, and 39.27 FPS on the test set. A quantification method based on digital image processing techniques is proposed to evaluate the detection results and extract the geometric features of structural defects. The proposed framework accurately identifies the number, size, and area of underwater structural defects, providing data support for subsequent reinforcement.
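As an illustration of the quantification step, the minimal sketch below shows how defect count, size, and area could be extracted from a predicted binary segmentation mask using standard OpenCV contour operations. The mask input, the minimum-area filter, and the `MM_PER_PIXEL` calibration constant are illustrative assumptions, not the implementation reported in the article.

```python
import cv2
import numpy as np

# Assumed calibration: physical length represented by one pixel (mm/pixel).
# In practice this would be derived from camera geometry and shooting distance.
MM_PER_PIXEL = 0.5


def quantify_defects(mask: np.ndarray, min_area_px: int = 50):
    """Extract per-defect geometric features from a binary segmentation mask.

    mask: uint8 array, 255 where a defect pixel is predicted, 0 elsewhere
    (e.g., one class channel of the instance-segmentation output).
    Returns a list of dicts with pixel and physical measurements.
    """
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    defects = []
    for contour in contours:
        area_px = cv2.contourArea(contour)
        if area_px < min_area_px:  # suppress small spurious regions
            continue
        x, y, w, h = cv2.boundingRect(contour)
        defects.append({
            "area_px": area_px,
            "area_mm2": area_px * MM_PER_PIXEL ** 2,
            # bounding-box extent as a rough measure of defect size
            "width_mm": w * MM_PER_PIXEL,
            "height_mm": h * MM_PER_PIXEL,
        })
    return defects


# Usage example on a synthetic mask containing two rectangular "defects"
demo = np.zeros((200, 200), dtype=np.uint8)
cv2.rectangle(demo, (10, 10), (60, 40), 255, -1)
cv2.rectangle(demo, (100, 120), (180, 170), 255, -1)
print(len(quantify_defects(demo)), "defects found")
```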