The wear particle classification algorithm proposed is based on an integrated ResNet50 and Vision Transformer, aiming to address the problems of a complex background, overlapping and similar characteristics of wear particles, low classification accuracy, and the difficult identification of small target wear particles in the region. Firstly, an ESRGAN algorithm is used to improve image resolution, and then the Separable Vision Transformer (SepViT) is introduced to replace ViT. The ResNet50-SepViT model (SV-ERnet) is integrated by combining the ResNet50 network with SepViT through weighted soft voting, enabling the intelligent identification of wear particles through transfer learning. Finally, in order to reveal the action mechanism of SepViT, the different abrasive characteristics extracted by the SepViT model are visually explained using the Grad-CAM visualization method. The experimental results show that the proposed integrated SV-ERnet has a high recognition rate and robustness, with an accuracy of 94.1% on the test set. This accuracy is 1.8%, 6.5%, 4.7%, 4.4%, and 6.8% higher than that of ResNet101, VGG16, MobileNetV2, AlexNet, and EfficientV1, respectively; furthermore, it was found that the optimal weighting factors are 0.5 and 0.5.