Detecting leather surface defects has become an important subject in industrial inspection and has attracted significant attention as a challenging task. Traditional image processing techniques struggle to detect leather surface defects that vary in shape, size, background, and noise. Deep learning is a promising solution to this problem. This work presents a comparative study of 26 classical deep learning models for leather surface defect type recognition, with the aim of laying a foundation for the design and development of new leather defect inspection schemes. Using tanned leather from an enterprise, images of eight types of leather surface defects (cavities, pinholes, scratches, rotten surfaces, growth lines, healing wounds, folds, and bacterial wounds) were collected with an ultra-high-definition whole-leather imaging device. Two challenging datasets covering various shapes, sizes, and colours were constructed, and extensive experimental evaluations were conducted. The deep learning models can achieve more than 95% accuracy when defect imaging is ideal and defect variation is limited. When the shapes, sizes, and colours of the eight defect types remain diverse, DenseNet169 performed best with a recognition accuracy of 72.5%, while ShuffleNet, the worst-performing model, reached 64.3%. This systematic, in-depth experimental evaluation shows that deep learning models are promising for leather surface defect detection, but challenges remain.