Genuine leather manufacturing is a multibillion-dollar industry that processes hides from a variety of animals, such as sheep, alligator, goat, ostrich, crocodile, and cow. Given the industry's immense scale, damage may arise from numerous unavoidable causes, leading to surface defects that occur both during the manufacturing process and during the animal's own lifespan. Because leather surface characteristics are heterogeneous and varied, visual inspection of raw materials by human inspectors can be very difficult. To address these quality-control challenges, this paper proposes applying a modern vision transformer (ViT) architecture to low-resolution, image-based anomaly detection, using defect localisation as a means of classifying leather surface defects. Using the low-resolution defective and non-defective images in the open-source Leather Defect detection and Classification dataset, together with the higher-resolution MVTec AD anomaly benchmarking dataset, three configurations of the vision transformer and three deep learning (DL) knowledge transfer methods are compared on leather defect classification and anomaly localisation performance. Experiments show that the proposed ViT method outperforms lightweight state-of-the-art methods in the field in terms of classification accuracy. Beyond classification accuracy, the proposed method also offers a low computational load and modest requirements for both image resolution and the number of training samples.