Accurately characterizing the distribution of joints on the tunnel face is crucial for assessing the stability and safety of the surrounding rock during tunnel construction. This paper applies the Mask R-CNN image segmentation algorithm, a state-of-the-art deep learning model, to identify and extract joints from tunnel face images efficiently and accurately. First, digital images of tunnel faces were captured and stitched, yielding 286 complete images suitable for analysis. Then, the joints on the tunnel face were extracted with three methods: traditional image processing algorithms, the commonly used U-net image segmentation model, and the Mask R-CNN image segmentation model, which is introduced in this paper to address the limited recognition accuracy of existing approaches. Finally, the extraction results of the three methods were compared. The Mask R-CNN model achieved the best joint extraction, with a Dice similarity coefficient of 87.48%, outperforming the traditional methods (60.59%) and the U-net model (75.36%) and enabling accurate and efficient acquisition of rock joints on the tunnel face. These findings suggest that the Mask R-CNN model could be deployed in real-time monitoring systems for tunnel construction projects.
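For readers unfamiliar with the evaluation metric, the Dice similarity coefficient between a predicted joint mask A and a ground-truth mask B is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch in Python with NumPy, not the authors' evaluation code: the function name `dice_coefficient`, the binary-mask input format, and the convention of returning 1.0 for two empty masks are all assumptions made for illustration.

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks (hypothetical helper)."""
    # Treat any non-zero pixel as "joint"; both masks must have the same shape.
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if pred.shape != truth.shape:
        raise ValueError("masks must have the same shape")

    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return float(2.0 * intersection / total)

# Toy 2x3 masks: 3 predicted joint pixels, 3 true joint pixels, 2 in common.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
score = dice_coefficient(pred_mask, true_mask)  # 2*2 / (3+3)
```

In this toy case the score is 2/3, or about 66.7% on the percentage scale used for the figures reported above.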