To address the poor model generalization and low diagnostic efficiency of fault diagnosis under imbalanced data distributions, this paper proposes a novel fault diagnosis method, named VGAIC-FDM, that combines a variational autoencoder generative adversarial network (VAE-GAN) with an improved convolutional neural network (CNN). First, to capture local features of vibration signals, the continuous wavelet transform is used to convert the original one-dimensional fault signals into wavelet time–frequency images. Second, to reduce data dimensionality and simplify the model, the wavelet time–frequency images are converted to single-channel grayscale time–frequency images. Then, the VAE-GAN performs sample augmentation on the grayscale time–frequency images to balance the dataset. Finally, the generated and original images are merged and used to train a CNN classifier optimized with focal loss, achieving fault diagnosis under imbalanced conditions. Experimental results show that VGAIC-FDM effectively captures the latent spatial distribution of real samples and alleviates the impact of uneven sample classification difficulty. As a result, it improves fault diagnosis performance on imbalanced datasets, yielding higher accuracy and F1-scores.
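To illustrate how the loss in the final classification step can reduce the influence of easily classified samples, the following is a minimal sketch that assumes the standard multi-class focal loss, FL(p_t) = -α (1 - p_t)^γ log(p_t). The function names and the values γ = 2 and α = 1 are illustrative assumptions and are not taken from this paper.

```python
import numpy as np


def focal_loss(probs, labels, gamma=2.0, alpha=1.0, eps=1e-12):
    """Mean multi-class focal loss (illustrative sketch).

    probs  : (N, C) array of predicted class probabilities, rows sum to 1
    labels : (N,) array of integer class indices
    gamma  : focusing parameter (assumed value, not from the paper)
    alpha  : global weighting factor (assumed value, not from the paper)
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Probability the model assigns to the true class of each sample.
    p_t = probs[np.arange(len(labels)), labels]
    p_t = np.clip(p_t, eps, 1.0)  # avoid log(0)
    # (1 - p_t)^gamma shrinks the loss of samples that are already
    # classified with high confidence, so hard samples dominate.
    return float(np.mean(-alpha * (1.0 - p_t) ** gamma * np.log(p_t)))


def cross_entropy(probs, labels, eps=1e-12):
    """Plain cross-entropy, i.e. focal loss with gamma = 0 and alpha = 1."""
    return focal_loss(probs, labels, gamma=0.0, alpha=1.0, eps=eps)


if __name__ == "__main__":
    # One easy sample (p_t = 0.9) and one harder sample (p_t = 0.7).
    probs = [[0.9, 0.1], [0.3, 0.7]]
    labels = [0, 1]
    print("cross-entropy:", cross_entropy(probs, labels))
    print("focal loss   :", focal_loss(probs, labels))
```

With γ = 0 the expression reduces to ordinary cross-entropy. Larger γ shifts the training signal toward hard samples, which is the effect the abstract describes as alleviating inconsistent classification difficulty.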