To improve the accuracy of bearing fault diagnosis under small-sample, variable-load, and noisy conditions, this paper proposes a fault diagnosis method based on image information fusion and a Vision Transformer (ViT) transfer learning model. First, the method applies the continuous wavelet transform (CWT), Gramian angular summation field (GASF), and Gramian angular difference field (GADF) to the time series data, generating three grayscale images. These three images are then merged into a single information fusion image (IFI) using image processing techniques. Finally, the IFIs are fed into a ViT model that is trained using transfer learning. To verify the effectiveness of the proposed method, experiments are carried out under different working conditions on the rolling bearing dataset from Case Western Reserve University (CWRU). The experimental results show that the proposed method achieves higher accuracy than traditional methods, and that the ViT model trained with transfer learning (TLViT) outperforms the Resnet50 model trained with transfer learning (TLResnet50) under variable-load and small-sample conditions. The results also indicate that the IFI, which combines information from multiple images, is more robust to noise than a single-information image. The proposed method can therefore improve the accuracy of bearing fault diagnosis under small-sample, variable-load, and noisy conditions, and offers a new approach to bearing fault diagnosis.
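
The image-generation steps summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the wavelet, the scales, the image size, or how the three grayscale images are fused, so the Morlet wavelet, the linear scale grid, and the fusion by stacking the images as three channels are all assumptions made here for illustration. The function names (`rescale`, `gramian_fields`, `morlet_scalogram`, `to_gray`, `fuse`) are hypothetical, and the CWT is computed by direct summation for clarity rather than speed.

```python
import math

def rescale(series):
    """Min-max rescale a series to [-1, 1] so that arccos is defined."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in series]

def gramian_fields(series):
    """Return (GASF, GADF): cos(phi_i + phi_j) and sin(phi_i - phi_j)."""
    # Clamp guards against tiny floating-point overshoot outside [-1, 1].
    phi = [math.acos(max(-1.0, min(1.0, x))) for x in rescale(series)]
    gasf = [[math.cos(a + b) for b in phi] for a in phi]
    gadf = [[math.sin(a - b) for b in phi] for a in phi]
    return gasf, gadf

def morlet_scalogram(series, scales, w0=6.0):
    """CWT magnitude by direct summation (assumed Morlet wavelet).

    Rows correspond to scales, columns to time shifts.
    """
    n = len(series)
    rows = []
    for s in scales:
        row = []
        for b in range(n):
            re = im = 0.0
            for t in range(n):
                u = (t - b) / s
                env = math.exp(-0.5 * u * u)
                re += series[t] * env * math.cos(w0 * u)
                im -= series[t] * env * math.sin(w0 * u)
            row.append(math.hypot(re, im) / math.sqrt(s))
        rows.append(row)
    return rows

def to_gray(matrix):
    """Min-max map a real-valued matrix to integer gray levels 0..255."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[round(255 * (v - lo) / span) for v in row] for row in matrix]

def fuse(cwt_img, gasf_img, gadf_img):
    """Assumed fusion: stack three equally sized gray images as channels."""
    shapes = {(len(m), len(m[0])) for m in (cwt_img, gasf_img, gadf_img)}
    if len(shapes) != 1:
        raise ValueError("all three images must have the same shape")
    return [list(zip(r0, r1, r2))
            for r0, r1, r2 in zip(cwt_img, gasf_img, gadf_img)]

# Toy signal standing in for one vibration segment (not CWRU data).
n = 32
signal = [math.sin(2 * math.pi * 3 * t / n) + 0.5 * math.sin(2 * math.pi * 7 * t / n)
          for t in range(n)]
gasf, gadf = gramian_fields(signal)
# Using n scales makes the scalogram n x n, matching the Gramian fields.
scalogram = morlet_scalogram(signal, [1.0 + 0.5 * k for k in range(n)])
ifi = fuse(to_gray(scalogram), to_gray(gasf), to_gray(gadf))
print(len(ifi), len(ifi[0]), len(ifi[0][0]))
```

Stacking the images as channels gives an array with the three-channel layout that pretrained image models typically expect, which is one plausible reason to fuse this way. A real pipeline would also resize each IFI to the input resolution of the chosen ViT.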