Regular crack inspection of tunnels is essential to their safe operation. At present, manual inspection is time-consuming, subjective and even dangerous, while automatic detection methods remain relatively inaccurate. Detecting tunnel cracks is challenging because cracks are tiny and tunnel images contain many noise patterns. This study proposes a deep learning algorithm, called U-CliqueNet, that combines U-Net with a convolutional neural network with alternately updated clique (CliqueNet) to separate cracks from the background in tunnel images. A consumer-grade DSC-WX700 camera (SONY, Wuxi, China) was used to collect 200 original images; cracks were manually marked, and the images were divided into sub-images with a resolution of 496 × 496 pixels. The resulting tunnel crack dataset contains 60,000 sub-images, of which 50,000 were used for training and 10,000 for testing. Trained and tested on this dataset, the proposed framework achieved a mean pixel accuracy (MPA), mean intersection over union (MIoU), precision and F1-score of 92.25%, 86.96%, 86.32% and 83.40%, respectively. We compared U-CliqueNet with fully convolutional networks (FCN), U-Net, an encoder–decoder network (SegNet) and the multi-scale fusion crack detection (MFCD) algorithm using hypothesis testing, and the MIoU of U-CliqueNet was significantly higher than that of each of the other four algorithms. The area, length and mean width of cracks can also be calculated, and the relative error between the detected and actual mean crack widths ranges from −11.20% to 18.57%. The results show that this framework can be used for fast and accurate semantic segmentation of cracks in tunnel images.
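For readers unfamiliar with the reported metrics, the following is a minimal sketch of how MPA, MIoU, precision and F1-score are commonly computed for a two-class (crack versus background) segmentation mask, together with the signed relative error used for the width comparison. The abstract does not state the exact formulas, so the definitions below are the conventional ones and are an assumption; the function names `segmentation_metrics` and `relative_error_percent` are hypothetical and are not taken from the study.

```python
import numpy as np


def segmentation_metrics(pred, truth):
    """Conventional two-class metrics; masks hold 1 for crack, 0 for background.

    Assumed definitions (not confirmed by the source):
    MPA = mean of per-class pixel accuracy, MIoU = mean of per-class IoU,
    precision and F1 computed for the crack class.
    """
    pred = np.asarray(pred).astype(bool).ravel()
    truth = np.asarray(truth).astype(bool).ravel()

    # Confusion-matrix counts with "crack" as the positive class.
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    tn = int(np.sum(~pred & ~truth))

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    acc_crack = ratio(tp, tp + fn)       # per-class accuracy (equals recall)
    acc_background = ratio(tn, tn + fp)
    iou_crack = ratio(tp, tp + fp + fn)
    iou_background = ratio(tn, tn + fp + fn)

    precision = ratio(tp, tp + fp)
    recall = acc_crack
    f1 = ratio(2 * precision * recall, precision + recall)

    return {
        "MPA": (acc_crack + acc_background) / 2,
        "MIoU": (iou_crack + iou_background) / 2,
        "precision": precision,
        "F1": f1,
    }


def relative_error_percent(detected, actual):
    """Signed relative error in percent: positive means over-estimation."""
    return (detected - actual) / actual * 100.0


if __name__ == "__main__":
    # Toy 2 x 4 masks, for illustration only (not data from the study).
    prediction = [[1, 1, 0, 0], [0, 0, 0, 0]]
    ground_truth = [[1, 0, 1, 0], [0, 0, 0, 0]]
    print(segmentation_metrics(prediction, ground_truth))
    print(relative_error_percent(1.1, 1.0))
```

Averaging over both classes matters here because crack pixels are a small fraction of each image: overall pixel accuracy would be dominated by the background, whereas the class-averaged MPA and MIoU weight the crack class equally.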