In-service urban utility tunnels (UUTs) suffer from defects such as cracks, corrosion, and leakage, which raise the risk of major accidents. However, prevailing detection methods for UUTs still rely on manual inspection and subjective judgment, or on traditional image processing techniques, and such methods may fail to obtain accurate defect information. This study proposes a novel network, UUTNet, built on a purpose-constructed UUT dataset, for defect detection. Because UUT defects exhibit a degree of correlation in their distribution, an attention module is introduced into the Pyramid Scene Parsing Network (PSPNet) to capture this relation. Hybrid dilated convolution is added after the feature extraction layer to expand the receptive field and further extract global and local features. The performance of UUTNet was evaluated in terms of mean intersection over union (MIoU), F1-score, accuracy, and robustness. In comparative experiments, UUTNet achieved the best detection performance, with an MIoU of 0.7615, an accuracy of 0.9806, and an F1-score of 0.8012. Bayesian optimization further improved the MIoU to 0.7847. Model robustness was validated in three extreme inspection scenarios: uneven illumination, high brightness, and obstacle interference. The proposed method provides robust technical support for detecting defects in UUTs and for precisely assessing their distribution and extent.
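To make the evaluation metrics concrete, the following is a minimal sketch of how pixel accuracy, MIoU, and F1-score can be derived from a per-pixel confusion matrix. It is not the authors' evaluation code. It assumes integer class labels per pixel (for example, background, crack, corrosion, leakage), that per-class scores are macro-averaged, and that classes absent from both the ground truth and the prediction are skipped; the abstract does not specify these choices, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels; rows index the ground-truth class, columns the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Pixel accuracy plus macro-averaged IoU (MIoU) and F1 (assumed averaging scheme)."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    ious: List[float] = []
    f1s: List[float] = []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, actually another class
        fn = sum(cm[c]) - tp                       # actually c, predicted another class
        if tp + fp + fn == 0:
            continue  # hypothetical choice: skip classes absent from both maps
        ious.append(tp / (tp + fp + fn))
        f1s.append(2 * tp / (2 * tp + fp + fn))
    return {
        "accuracy": correct / total,
        "miou": sum(ious) / len(ious),
        "f1": sum(f1s) / len(f1s),
    }


# Toy example with three classes and six flattened pixels.
cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
metrics = segmentation_metrics(cm)  # accuracy = 4/6, MIoU = 0.5
```

The example also shows why accuracy and MIoU can diverge so widely (0.9806 versus 0.7615 above): accuracy is dominated by the abundant background pixels, whereas MIoU weights each defect class equally regardless of its pixel count.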