Rotating machinery (RM), which includes gearboxes, bearings, motors, and generators, is among the most common types of mechanical equipment in engineering applications and plays a vital role in industrial production. Because such machinery often runs at variable speed under complex working conditions, its vibration characteristics are unstable, and diagnosing its faults has become a research hotspot in mechanical fault diagnosis. To address the multi-class fault classification problem for rotating machinery under variable speed and complex working conditions, this paper proposes a fault diagnosis method that combines an improved sensitive mode matrix (ISMM), isometric mapping (ISOMAP), and a convolutional vision transformer (CvT) network. The variable-speed signals are first sampled with overlap to construct a high-dimensional ISMM, which is then mapped into a manifold space through ISOMAP manifold learning. This step extracts the transient fault characteristics of the variable-speed signal, and experiments show that it overcomes the inability of conventional methods to extract features effectively from variable-speed data. Because CvT combines the self-attention mechanism with the convolution used in CNNs, the CvT network is used for feature extraction and for fault recognition and classification. The CvT structure accounts for both global and local feature extraction, which greatly reduces the number of training iterations and the size of the network model. The proposed fault diagnosis model is verified experimentally on two data sets: the laboratory HFXZ-I planetary gearbox variable-speed data set and the public bearing variable-speed data set of the University of Ottawa, Canada. The experimental results show that the proposed model achieves good recognition accuracy and robustness.
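The overlapping sampling step mentioned above can be illustrated with a minimal sketch. It only shows how a one-dimensional signal might be cut into overlapping windows that are stacked as the rows of a matrix; it is not the ISMM construction itself, whose details are not given here. The function name `overlapping_segments` and the parameters `window` and `step` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def overlapping_segments(
    signal: Sequence[float], window: int, step: int
) -> List[List[float]]:
    """Split a 1-D signal into fixed-length windows.

    Windows overlap whenever step < window. A trailing partial
    window is dropped. Names and parameters are illustrative
    assumptions, not taken from the paper.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    segments = []
    # Advance the window start by `step` samples each time.
    for start in range(0, len(signal) - window + 1, step):
        segments.append(list(signal[start:start + window]))
    return segments


# Toy signal standing in for a vibration record.
signal = [float(i) for i in range(10)]
matrix = overlapping_segments(signal, window=4, step=2)
for row in matrix:
    print(row)
```

With `window=4` and `step=2`, consecutive rows share half of their samples. In a pipeline like the one described, a matrix of this kind would be the input to a manifold learning stage such as ISOMAP.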