To improve the recognition accuracy of partial discharge (PD) by making full use of the time-frequency characteristics of PD signals and by applying deep learning, this paper proposes a PD pattern recognition method based on the variational mode decomposition (VMD)-Choi-Williams distribution (CWD) spectrum and an optimized convolutional neural network (CNN) with cross-layer feature fusion. First, a PD signal is decomposed into several components by the VMD algorithm, and CWD analysis is applied to these components to obtain the VMD-CWD time-frequency spectrum. Second, the cross-layer feature fusion and optimization CNN (CFFO-CNN) is constructed by introducing cross-layer connections and an optimization algorithm. Third, the VMD-CWD spectrum is used as the input to train the CFFO-CNN, which learns and extracts the intrinsic features of the spectrum. Finally, the trained network is used to recognize the PD types of the test samples. The proposed method is compared with traditional recognition methods such as the back-propagation neural network (BPNN) and the support vector machine (SVM), as well as with several commonly used deep learning algorithms. The experimental results indicate that the proposed method performs significantly better than these existing recognition methods, with accuracy of up to 99.5%. The results show that the CFFO-CNN has superior feature extraction ability: it extracts the internal features of the VMD-CWD spectrum automatically, achieves higher recognition accuracy, and has wider application prospects.

INDEX TERMS Variational mode decomposition (VMD), Choi-Williams distribution (CWD), feature fusion, convolutional neural network (CNN), partial discharge, pattern recognition.
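
As a brief illustration of the CWD step named above, the following minimal sketch evaluates the standard exponential Choi-Williams kernel, exp(-(theta*tau)^2 / sigma), in the ambiguity domain. The abstract does not state the kernel or its parameter, so the function name `choi_williams_kernel`, the parameter `sigma`, and the sample values are assumptions made for illustration, not the authors' implementation. In the ambiguity domain, auto-terms tend to lie near the origin and the axes, while cross-terms tend to lie away from them; the kernel equals 1 on both axes and decays off-axis, with smaller `sigma` giving stronger suppression.

```python
import math


def choi_williams_kernel(theta: float, tau: float, sigma: float = 1.0) -> float:
    """Standard Choi-Williams kernel exp(-(theta * tau)^2 / sigma).

    theta: frequency-lag (Doppler) variable of the ambiguity domain.
    tau:   time-lag variable of the ambiguity domain.
    sigma: positive scaling parameter (hypothetical default; not given in the abstract).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.exp(-((theta * tau) ** 2) / sigma)


# On either axis (theta == 0 or tau == 0) the kernel is exactly 1,
# so components located there are passed through unweighted.
on_axis = choi_williams_kernel(0.0, 3.0, sigma=1.0)

# Off-axis points are attenuated; a smaller sigma attenuates them more strongly.
off_axis_small_sigma = choi_williams_kernel(2.0, 3.0, sigma=0.5)
off_axis_large_sigma = choi_williams_kernel(2.0, 3.0, sigma=50.0)

print(on_axis, off_axis_small_sigma, off_axis_large_sigma)
```

The choice of `sigma` trades cross-term suppression against auto-term resolution, which is why it is left as an explicit argument in this sketch.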