Fault diagnosis methods based on deep learning have progressed greatly in recent years. However, limited training data and complex working conditions still restrict the application of these intelligent methods. This paper proposes an intelligent bearing fault diagnosis method, the Siamese Vision Transformer, which is suited to limited training data and complex working conditions. The Siamese Vision Transformer combines a Siamese network with a Vision Transformer to efficiently extract feature vectors of input samples in a high-level feature space and to classify the fault. In addition, a new loss function that combines the Kullback-Leibler divergence in both directions is proposed to improve the performance of the model. Furthermore, a new training strategy termed random mask is designed to increase the diversity of the input data. Comparative tests are conducted on the Case Western Reserve University bearing dataset and the Paderborn dataset, and our method achieves reasonably high accuracy with limited data and satisfactory generalization capability in cross-domain tasks.
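The abstract does not specify how the two directions of the Kullback-Leibler divergence are combined. As an illustration only, the sketch below assumes the simplest form, the sum D_KL(P‖Q) + D_KL(Q‖P) over two discrete class distributions (for example, the softmax outputs of the two Siamese branches, which is itself an assumption). The names `kl_divergence` and `bidirectional_kl` are hypothetical, and any weighting or additional loss terms used in the actual model are not reproduced.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """D_KL(p || q) for two discrete distributions over the same classes."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    # Terms with p_i == 0 contribute nothing (convention 0 * log 0 = 0);
    # eps guards against division by zero when q_i == 0.
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def bidirectional_kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Assumed form of the loss: D_KL(p || q) + D_KL(q || p).

    Unlike a single KL term, this sum is symmetric in p and q.
    """
    return kl_divergence(p, q) + kl_divergence(q, p)


if __name__ == "__main__":
    p = [0.7, 0.2, 0.1]  # e.g. class probabilities from one branch (illustrative values)
    q = [0.5, 0.3, 0.2]  # e.g. class probabilities from the other branch
    print(bidirectional_kl(p, q))
    print(bidirectional_kl(p, p))  # 0.0
```

Summing both directions removes the asymmetry of a single KL term, so neither distribution is privileged as the "reference"; whether the paper uses this plain sum or a weighted variant cannot be determined from the abstract.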