To capture both local and global contextual information and to improve the parallel processing capability of bearing fault diagnosis under variable loads and noisy environments, a rolling bearing fault diagnosis method based on a pyramid exponential dilated convolution module (PE-DCM) and a Vision Transformer (ViT) is proposed. First, in the data processing module, the one-dimensional vibration signal is converted into a two-dimensional time-frequency image by continuous wavelet transform (CWT), which allows the model to capture the characteristics of the vibration signal more comprehensively. Second, the PE-DCM is constructed to extract local features of the fault information. Then, the ViT network learns the global features of the fault information, and an adaptive multi-attention mechanism dynamically adjusts the attention weights according to the features of the input data, suppressing noise and unimportant information. Finally, the method is validated experimentally on the Case Western Reserve University (CWRU) bearing dataset and a self-collected MFS bearing dataset. The experimental results show that the method effectively exploits the strong image classification capability of the ViT network and achieves better noise robustness and generalization than the compared fault diagnosis methods.
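The abstract does not specify the wavelet, the scales, the number of convolution branches, or the dilation rates. The NumPy sketch below illustrates the first two stages under assumptions of our own: a complex Morlet wavelet with center frequency w0 = 6 for the CWT, and four parallel 3×3 branches with dilation rates 1, 2, 4 and 8 whose outputs are stacked along a channel axis, which is one plausible reading of "pyramid exponential dilated convolution". The function names (`morlet_cwt`, `dilated_conv2d`, `pyramid_dilated_block`) are hypothetical, the random kernels stand in for learned weights, and the ViT stage is omitted.

```python
import numpy as np


def morlet_cwt(x, scales, w0=6.0):
    """CWT of a 1-D signal with a complex Morlet wavelet (assumed choice).

    Scales are given in samples. Returns the magnitude scalogram with shape
    (len(scales), len(x)), which can be treated as a 2-D time-frequency image.
    """
    x = np.asarray(x, dtype=float)
    rows = []
    for s in scales:
        # Truncate the wavelet at +/-4 scale widths, but never longer than the
        # signal, so that mode="same" keeps the output length equal to len(x).
        half = min(int(np.ceil(4 * s)), (len(x) - 1) // 2)
        t = np.arange(-half, half + 1) / s
        psi = np.pi ** -0.25 * np.exp(1j * w0 * t) * np.exp(-0.5 * t ** 2)
        # Convolving with the time-reversed conjugate wavelet computes
        # W(s, b) = s^(-1/2) * sum_n x[n] * conj(psi((n - b) / s)).
        coeffs = np.convolve(x, np.conj(psi[::-1]), mode="same") / np.sqrt(s)
        rows.append(np.abs(coeffs))
    return np.stack(rows)


def dilated_conv2d(img, kernel, d):
    """3x3 cross-correlation (as in CNN layers) with dilation d and zero 'same' padding."""
    h, w = img.shape
    padded = np.pad(img, d)  # zero padding of d pixels on every side
    out = np.zeros((h, w))
    for a in range(3):
        for b in range(3):
            # Tap (a, b) reads the input shifted by ((a - 1) * d, (b - 1) * d).
            out += kernel[a, b] * padded[a * d:a * d + h, b * d:b * d + w]
    return out


def pyramid_dilated_block(img, num_branches=4, seed=0):
    """Hypothetical pyramid block: parallel 3x3 branches with dilation 1, 2, 4, ...

    Each branch applies ReLU; outputs are stacked to shape (num_branches, H, W).
    """
    rng = np.random.default_rng(seed)
    branches = []
    for k in range(num_branches):
        d = 2 ** k                             # exponential dilation rate
        kernel = rng.standard_normal((3, 3))   # placeholder for learned weights
        branches.append(np.maximum(dilated_conv2d(img, kernel, d), 0.0))
    return np.stack(branches)


# Usage: a 50 Hz tone sampled at 1 kHz -> scalogram -> multi-scale local features.
fs = 1000
t = np.arange(1024) / fs
signal = np.sin(2 * np.pi * 50 * t)
scales = np.arange(2, 64)
image = morlet_cwt(signal, scales)         # shape (62, 1024)
features = pyramid_dilated_block(image)    # shape (4, 62, 1024)
```

With dilation d, a 3×3 kernel covers a (2d+1)×(2d+1) window, so the four branches see 3, 5, 9 and 17 pixels across while each still uses only nine weights. This is how exponentially increasing dilation can widen the local receptive field cheaply before a ViT models global dependencies.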