In the field of tunnel lining crack identification, semantic segmentation algorithms based on convolutional neural networks (CNNs) are widely used. Owing to the inherent locality of convolution, these algorithms cannot make full use of contextual semantic information, which makes it difficult to capture the global features of cracks. Transformer-based networks can capture global semantic information, but they suffer from strong data dependence and tend to lose local features. To address these issues, this paper proposes SCDeepLab, a hybrid semantic segmentation algorithm for tunnel lining cracks that fuses Swin Transformer and CNN within the encoder-decoder framework of DeepLabv3+. SCDeepLab introduces a joint backbone network built from a CNN-based Inverse Residual Block and a Swin Transformer Block. The former extracts the local detail of the crack to generate the shallow feature layer, whereas the latter extracts global semantic information to obtain the deep feature layer. In addition, an Efficient Channel Attention enhanced Feature Fusion Module is proposed to fuse the shallow and deep features and so combine the advantages of both. Furthermore, transfer learning is adopted to mitigate the data dependence of Swin Transformer. The results show that SCDeepLab achieves a mean intersection over union (mIoU) of 77.41% and a mean pixel accuracy (mPA) of 84.42% on the data sets constructed in this paper, a higher segmentation accuracy than previous CNN-based and Transformer-based semantic segmentation algorithms.
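The two reported metrics can be illustrated with a minimal sketch. It assumes the standard definitions, which the abstract does not spell out: per-class IoU is TP / (TP + FP + FN), per-class pixel accuracy is TP / (TP + FN), and both are averaged over classes. The function names, the flattened-mask input, and the label convention (0 for background, 1 for crack) are hypothetical choices for illustration, not the authors' evaluation code.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels; rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def miou_and_mpa(cm):
    """Return (mIoU, mPA) from a square confusion matrix.

    Assumption: a class is skipped in the IoU average when it appears in
    neither the labels nor the predictions, and in the accuracy average
    when it has no ground-truth pixels. At least one class must remain.
    """
    n = len(cm)
    ious, accs = [], []
    for c in range(n):
        tp = cm[c][c]
        truth = sum(cm[c])                      # pixels labelled c
        pred = sum(cm[r][c] for r in range(n))  # pixels predicted as c
        union = truth + pred - tp               # TP + FP + FN
        if union > 0:
            ious.append(tp / union)
        if truth > 0:
            accs.append(tp / truth)
    return sum(ious) / len(ious), sum(accs) / len(accs)
```

For example, with ground truth `[0, 0, 0, 1, 1, 0]` and prediction `[0, 0, 1, 1, 0, 0]`, the confusion matrix is `[[3, 1], [1, 1]]`. The per-class IoU values are 3/5 and 1/3, and the per-class accuracies are 3/4 and 1/2, so mIoU is about 0.467 and mPA is 0.625.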