Cracks in tunnel linings can compromise water tightness, directly affecting overall stability and durability through an increased risk of rebar corrosion. Hence, automatic, timely and accurate crack detection is significant for the safe operation and maintenance of tunnels. In recent years, Convolutional Neural Networks (CNNs) have achieved notable success in computer vision, but the large storage requirements and high computational cost of such models limit their application. To overcome these challenges, this study proposes a lightweight target detector, YOLOv5-lite, which utilizes transfer learning to construct a YOLO single-stage target detection model and adopts the Efficient Intersection over Union (EIoU) loss function to accelerate convergence. To improve model efficiency, a network pruning algorithm is applied to reduce the number of parameters. To compensate for the accuracy loss caused by compressing the network through pruning, a knowledge distillation algorithm is implemented and fused with the pruning procedure. The result is a new lightweight modelling framework that combines high computational efficiency, enabling deployment on mobile devices, with high accuracy and good detection performance. To illustrate the advantages of the proposed method, two experiments were conducted: an extensive evaluation of the YOLOv5 series of models and a comparison with different models. In the evaluation of the YOLOv5 series, the key findings include: (a) The optimized YOLOv5-loss model, incorporating the EIoU loss function, achieved a crack recognition accuracy of 0.97. This model demonstrated superior capability in detecting fine cracks, particularly in corner regions, with an accuracy exceeding 0.85.
The EIoU loss function offers enhanced sensitivity to overlapping regions and more precise boundary localization, both of which are critical for identifying minute or boundary-ambiguous cracks. (b) The YOLOv5-finetuned model, which underwent network pruning alone, achieved an accuracy of only 0.74 and suffered from significant detection gaps, despite achieving a 50.9% reduction in model size. In contrast, the YOLOv5-lite model, refined by combining network pruning with knowledge distillation, maintained a high recognition accuracy of 0.96, only 0.01 below that of the optimal YOLOv5-loss model. When compared with five different models, YOLOv5-lite demonstrated significant advantages: its model size is 184.1 MB smaller than that of YOLOv3 and 659.4 MB smaller than that of Faster R-CNN, while its throughput is higher by 6.69 frames per second (FPS) and 13.24 FPS, respectively. Overall, the proposed method consistently achieves high detection performance while substantially reducing computational demands, making it well suited for real-time applications, particularly on mobile and embedded systems.
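The study's training code is not reproduced here; as a minimal, self-contained sketch of the standard EIoU formulation referenced above (assuming axis-aligned boxes given as (x1, y1, x2, y2) tuples; the function name and box convention are illustrative, not taken from the paper):

```python
def eiou_loss(box_p, box_g):
    """Efficient IoU (EIoU) loss between a predicted and a ground-truth box.

    EIoU = 1 - IoU + (center-distance penalty) + (width/height penalties),
    with each penalty normalized by the smallest enclosing box. The separate
    width/height terms are what distinguish EIoU from CIoU and sharpen
    boundary localization.
    """
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    eps = 1e-9

    # Intersection and union
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union if union > 0 else 0.0

    # Smallest enclosing box: width, height, squared diagonal
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Squared distance between box centers
    rho2 = ((px1 + px2 - gx1 - gx2) / 2) ** 2 + ((py1 + py2 - gy1 - gy2) / 2) ** 2

    # Width and height penalties, normalized by the enclosing box
    wh_pen = ((px2 - px1) - (gx2 - gx1)) ** 2 / (cw ** 2 + eps) \
           + ((py2 - py1) - (gy2 - gy1)) ** 2 / (ch ** 2 + eps)

    return 1.0 - iou + rho2 / c2 + wh_pen
```

For identical boxes the loss is zero, and for disjoint boxes it exceeds 1, so minimizing it drives both overlap and exact width/height agreement, consistent with the faster convergence claimed above.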
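The knowledge distillation step that recovers the pruned model's accuracy can be sketched with the classic soft-target loss (a generic Hinton-style formulation, not the paper's exact recipe; the temperature T, weight alpha, and function names are illustrative assumptions):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / T for z in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.5):
    """Combine a soft-target term (teacher guidance) with a hard-label term.

    Soft term: cross-entropy between teacher and student distributions at
    temperature T, scaled by T**2 so its gradient magnitude stays comparable.
    Hard term: ordinary cross-entropy with the ground-truth label.
    alpha weights the soft term against the hard term.
    """
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    soft = -sum(pt * math.log(ps + 1e-12)
                for pt, ps in zip(p_teacher, p_student)) * T * T
    hard = -math.log(softmax(student_logits)[label] + 1e-12)
    return alpha * soft + (1 - alpha) * hard
```

In a pruning-plus-distillation pipeline like the one described above, the unpruned network plays the teacher and the pruned network the student, so the compressed model is trained to match the teacher's softened output distribution rather than the hard labels alone.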