The fuel system is a core component of marine diesel engines, and timely, effective fault diagnosis is a prerequisite for the safe navigation of ships. Current data-driven fault diagnosis methods struggle with feature extraction and achieve low accuracy when only small samples are available. To address this challenge, this paper proposes a fault diagnosis method based on a digital twin (DT), a Siamese Vision Transformer (SViT), and K-Nearest Neighbor (KNN) classification. First, a DT model of a 6L21/31 marine medium-speed diesel engine is constructed by integrating mathematical, mechanism, and three-dimensional physical models, completing the mapping from the physical entity to the virtual entity. The DT model is then used to simulate faults and obtain data for different fault types. Next, a feature extraction network that combines a Siamese network with a Vision Transformer (ViT) is proposed for the simulated samples, and an improved KNN classifier based on the attention mechanism is added to the network to enhance the model's classification efficiency. In addition, a Weighted-Similarity loss function is designed using similarity labels and penalty coefficients, which strengthens the model's ability to discriminate between similar sample pairs. Finally, the proposed method is validated on a simulation dataset. Experimental results show that it achieves average accuracies of 97.22%, 98.21%, and 99.13% for training sets with 10, 20, and 30 samples per class, respectively, indicating that it can accurately classify marine fuel system faults under small-sample conditions and has promising application potential.
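As a rough illustration of how an attention-weighted KNN can classify embeddings such as those produced by a Siamese feature extractor, the sketch below assigns a query to the class that receives the largest softmax attention mass among its k most similar support samples. This is a generic formulation assumed for illustration only, not the paper's improved KNN; the function names `cosine` and `attention_knn`, the `temperature` parameter, and the toy two-dimensional embeddings and class labels are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def attention_knn(query, support, labels, k=3, temperature=0.1):
    """Classify `query` by softmax-weighted voting over its k nearest support embeddings.

    Hypothetical illustration: the paper's exact attention formulation is not
    reproduced here; this assumes softmax attention over cosine similarities.
    """
    sims = [cosine(query, s) for s in support]
    # Indices of the k support samples most similar to the query
    top = sorted(range(len(support)), key=lambda i: sims[i], reverse=True)[:k]
    # Softmax attention weights over the selected neighbours (max-subtracted for stability)
    m = max(sims[i] for i in top)
    exps = {i: math.exp((sims[i] - m) / temperature) for i in top}
    z = sum(exps.values())
    # Accumulate normalized attention weight per class label
    scores = defaultdict(float)
    for i in top:
        scores[labels[i]] += exps[i] / z
    pred = max(scores, key=scores.get)
    return pred, dict(scores)


# Toy support set: hypothetical 2-D embeddings with hypothetical fault labels
support = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["normal", "normal", "injector_fault", "injector_fault"]

pred, scores = attention_knn([0.8, 0.2], support, labels, k=3)
print(pred)  # normal
```

Compared with plain majority voting, the softmax weighting lets a very close neighbour outweigh more distant ones, which may matter when each class has only a handful of support samples; the temperature controls how sharply the weights concentrate on the nearest neighbours.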