Cracks in the underwater parts of buildings and infrastructure are common and can significantly compromise structural stability and safety, so early detection and repair are essential. Traditional underwater crack inspection usually relies on manual work, which is time-consuming and dangerous. It is therefore valuable to develop an automated crack recognition and detection system that can be mounted on an underwater robot. In the traditional workflow, the captured underwater video is segmented and preprocessed, and the resulting data are fed into a classical computer vision system for crack recognition and detection, which is far less efficient than current deep learning methods. This paper proposes two convolutional neural networks for cracks in underwater buildings: a recognition network and a detection network. Video footage is collected from multiple bodies of water under varying illumination, depth, turbidity, and other conditions. After a series of computer vision processing steps, the footage is fed into the neural network to expand the data set, and the resulting data set is divided into training, validation, and test sets. The training set is used to train the target recognition network, yielding a comprehensive recognition model for cracks in complex underwater environments. The size and shape of the cracks in the training set are then annotated manually and used to train the crack detection network, yielding the complex-background underwater crack detection model.
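The division of the data set into training, validation, and test sets can be illustrated with a minimal sketch. The function name `split_dataset`, the 70/15/15 ratios, the fixed seed, and the frame file names are assumptions made for illustration; the paper does not state the actual ratios or tooling.

```python
import random


def split_dataset(items, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle items and split them into train, validation, and test lists.

    The fractions are illustrative assumptions, not values from the paper.
    Whatever remains after the train and validation shares becomes the test set.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical frame names standing in for the extracted video frames.
frames = [f"frame_{i:04d}.png" for i in range(100)]
train, val, test = split_dataset(frames)
print(len(train), len(val), len(test))
```

Passing clip identifiers instead of individual frames would keep all frames of one clip in the same subset, which may be preferable when consecutive frames are nearly identical.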
In operation, an underwater robot equipped with a waterproof, anti-shake camera communicates with the shore while it moves through the underwater environment, which is scanned by the target recognition network. The robot follows the recognition results to lock onto the location of a crack and is then steered toward the target crack. At close range, the complex-background underwater crack detection model segments the scaled underwater video captured by the camera, processes it in high definition, and maps the length and width of the cracks with pixel-level accuracy. The results are transmitted to the shore in real time, completing the recognition and detection of cracks in underwater buildings.
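As a rough illustration of mapping crack length and width from a pixel-level segmentation result, the sketch below estimates both quantities from a binary mask. It is not the method of the paper: it assumes a single, roughly straight crack, takes the length as the extent of the crack pixels along their principal axis, and takes the mean width as the pixel area divided by that length. The function name `crack_length_and_width` and the calibration factor `mm_per_pixel` are hypothetical.

```python
import math


def crack_length_and_width(mask, mm_per_pixel=1.0):
    """Estimate length and mean width of one roughly straight crack.

    mask: 2D list of 0/1 values where 1 marks a crack pixel.
    mm_per_pixel: hypothetical scale factor from camera calibration.
    Returns (length, mean_width) in the units implied by mm_per_pixel.
    """
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        return 0.0, 0.0
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    # Second-order central moments of the crack pixels.
    sxx = sum((x - cx) ** 2 for x, _ in pts)
    syy = sum((y - cy) ** 2 for _, y in pts)
    sxy = sum((x - cx) * (y - cy) for x, y in pts)
    # Orientation of the principal axis of the pixel cloud.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    # Project every pixel onto the principal axis to find the crack's extent.
    proj = [(x - cx) * ux + (y - cy) * uy for x, y in pts]
    length_px = max(proj) - min(proj) + 1  # +1 counts both end pixels
    width_px = n / length_px  # mean width = pixel area / length
    return length_px * mm_per_pixel, width_px * mm_per_pixel


# Synthetic mask: a horizontal crack 20 pixels long and 2 pixels thick.
mask = [[1 if r in (2, 3) else 0 for _ in range(20)] for r in range(5)]
length_mm, width_mm = crack_length_and_width(mask, mm_per_pixel=0.5)
print(length_mm, width_mm)
```

Curved or branching cracks would need a different approach, for example measuring along a skeleton of the mask, which this sketch does not attempt.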