To address the low recognition accuracy, poor robustness, and low detection efficiency of existing methods for recognizing joints and fissures on tunnel faces, we present Mask R‐CNN‐TD, a deep learning instance segmentation algorithm that enhances the mask region convolutional neural network (Mask R‐CNN) with a Transformer attention mechanism and a deformable convolutional network. The Transformer attention mechanism improves the backbone network's ability to extract image features by focusing on important regions. The deformable convolutional network enables the model to conform more precisely to the morphological characteristics of joints and fissures on the tunnel face, thereby improving detection accuracy. Experimental results show that Mask R‐CNN‐TD outperforms the Mask R‐CNN series of algorithms and other instance segmentation methods in detection accuracy, with mean average precision scores of 70.5%, 70.8%, 53.2%, and 63.3% for detection box and mask segmentation at thresholds of 0.5 and 0.75, respectively. Building on the stable and efficient Mask R‐CNN‐TD model, we developed a mobile application called tunnel face detector for automatic tunnel face detection on construction sites.
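To make the threshold-based scores concrete, the following minimal sketch shows how a predicted box or mask might be judged against a ground-truth annotation at thresholds of 0.5 and 0.75. It assumes the thresholds are intersection-over-union (IoU) thresholds, as in common COCO-style evaluation; the abstract does not state this explicitly. The function names `box_iou`, `mask_iou`, and `is_true_positive` are hypothetical, and this is not the authors' evaluation code.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Overlap is zero when the boxes do not intersect.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mask_iou(m1, m2):
    """IoU of two binary masks given as equally sized nested lists of 0/1."""
    pairs = [(p, q) for r1, r2 in zip(m1, m2) for p, q in zip(r1, r2)]
    inter = sum(1 for p, q in pairs if p and q)
    union = sum(1 for p, q in pairs if p or q)
    return inter / union if union > 0 else 0.0


def is_true_positive(iou, threshold):
    """Assumed matching rule: a prediction counts if its IoU reaches the threshold."""
    return iou >= threshold


# Illustrative values only, not data from the paper.
pred_box, gt_box = (0, 0, 10, 10), (0, 2, 10, 12)
iou = box_iou(pred_box, gt_box)
for t in (0.5, 0.75):
    print(f"threshold {t}: match = {is_true_positive(iou, t)}")
```

Under this assumption, the same prediction can count as correct at the looser threshold and as a miss at the stricter one, which is why scores at 0.75 are typically lower than at 0.5.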