Urban water supply and drainage systems are a crucial component of urban infrastructure, directly affecting residents' livelihoods and industrial production. The normal operation of water supply and drainage pipelines is of great significance for conserving water resources and preventing water pollution. However, because these pipelines are deeply buried, made of diverse materials, and extend over long distances, detecting their defects is exceptionally difficult. Traditional detection methods used in practice, such as ground excavation and destructive testing, typically require shutting down the pipelines, which is time-consuming and labor-intensive and often results in significant economic losses. This paper proposes an effective technique for detecting defects in water supply and drainage pipelines. The method captures images of the inner walls of water supply pipelines and then applies a large-scale artificial intelligence model, Grounded Language-Image Pre-training (GLIP), together with a You Only Look Once version 5 (YOLOv5) model to detect defects in them. The experimental results show that GLIP achieves strong detection performance in zero-shot scenarios, while YOLOv5 performs well on existing datasets. Combining the two models balances fast, flexible detection with high precision, making the approach practical and efficient for real-world applications.