With the development of Industry 4.0, large-capacity power transformers, as a pivotal part of the power system, require fault diagnostic methods with higher intelligence, accuracy, and anti-interference ability. Considering the powerful capability of deep learning methods to extract non-linear features, as well as the differences among these methods in their sensitivity to features, this paper proposes a deep parallel diagnostic method for transformer dissolved gas analysis (DGA). Because the available transformer dataset is insufficient and imbalanced, adaptive synthetic oversampling (ADASYN) was implemented to augment the fault dataset. The newly constructed dataset was then normalized and input into the LSTM-based diagnostic framework. The dataset was also converted into images to serve as the input of the CNN-based diagnostic framework. At the same time, transfer learning was introduced to compensate for the remaining shortage of data. Finally, the two diagnostic models were trained and tested separately, and Dempster-Shafer (DS) evidence theory was introduced to fuse their diagnostic confidence matrices, achieving deep parallel diagnosis. The results show that, without complex feature extraction, the proposed deep parallel diagnostic method reaches a diagnostic accuracy of 96.9%. Even when 3% random noise was superimposed on the dataset, the accuracy decreased by only 0.62%.

Appl. Sci. 2020, 10, 1329 2 of 18

... their shortcomings in learning ability, processing efficiency, and feature extraction ability. For example, the learning ability of fuzzy methods is not satisfactory [7,8]. Neural networks (NN) tend to become trapped in local optima [9,10]. The K-nearest neighbor (KNN) method is inefficient in high-dimensional space [11]. The support vector machine (SVM) is essentially a binary classifier, which makes multi-class problems more cumbersome to handle [12,13].
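The abstract relies on ADASYN to rebalance the DGA fault classes before training. The following is a minimal NumPy sketch of the ADASYN procedure applied to one fault class at a time, assuming each row of `X` holds the gas features of one DGA sample. The function name `adasyn_oversample`, the parameters `k` and `beta`, and the choice to balance toward the largest class are illustrative assumptions, not the settings used in the paper. For production use, the imbalanced-learn package provides a maintained `ADASYN` class (`imblearn.over_sampling.ADASYN`, used through `fit_resample`).

```python
import numpy as np


def adasyn_oversample(X, y, minority, k=5, beta=1.0, rng=None):
    """Return synthetic samples for one class using the ADASYN idea.

    Hypothetical helper: minority samples whose k nearest neighbours mostly
    belong to other classes (harder to learn) receive proportionally more
    synthetic points. Only the synthetic samples are returned.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    idx_min = np.flatnonzero(y == minority)
    X_min = X[idx_min]
    n_min = len(idx_min)
    # Assumption: balance toward the largest class; beta in [0, 1] sets how far.
    n_max = np.unique(y, return_counts=True)[1].max()
    G = int(round((n_max - n_min) * beta))
    if G <= 0 or n_min < 2:
        return X[:0], y[:0]

    # Step 1: difficulty ratio r_i = share of other-class points among the
    # k nearest neighbours of each minority sample (the sample itself excluded).
    d_all = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=2)
    d_all[np.arange(n_min), idx_min] = np.inf
    k_all = min(k, len(X) - 1)
    nn_all = np.argsort(d_all, axis=1)[:, :k_all]
    r = np.sum(y[nn_all] != minority, axis=1) / k_all
    if r.sum() == 0:
        # Assumption: fall back to uniform weights when no minority sample
        # has other-class neighbours (plain ADASYN leaves this undefined).
        r = np.ones(n_min)
    # Step 2: number of synthetic samples g_i for each minority sample.
    g = np.rint(r / r.sum() * G).astype(int)

    # Step 3: interpolate between x_i and a random minority-class neighbour.
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d_min, np.inf)
    nn_min = np.argsort(d_min, axis=1)[:, :min(k, n_min - 1)]
    synthetic = []
    for i, g_i in enumerate(g):
        for _ in range(g_i):
            z = X_min[rng.choice(nn_min[i])]
            synthetic.append(X_min[i] + rng.random() * (z - X_min[i]))
    X_new = np.array(synthetic).reshape(-1, X.shape[1])
    return X_new, np.full(len(X_new), minority, dtype=y.dtype)
```

Calling the helper once per under-represented fault class and stacking the returned samples onto the original arrays yields an augmented dataset; because every synthetic point lies on a segment between two real samples of the same class, it stays inside that class's feature range.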
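The abstract fuses the confidence matrices of the LSTM and CNN models with Dempster-Shafer evidence theory. A minimal sketch follows, assuming that each row of a confidence matrix (for example, softmax outputs) is a basic probability assignment over singleton fault types only, with no mass on compound hypotheses. Under that assumption, Dempster's rule m(A) = m1(A)m2(A) / (1 - K), with conflict K = sum over i != j of m1(i)m2(j), reduces to a row-wise normalized elementwise product. The function name `ds_fuse` is hypothetical, and the paper may construct its evidence differently.

```python
import numpy as np


def ds_fuse(conf_a, conf_b):
    """Fuse two (n_samples, n_classes) confidence matrices with Dempster's rule.

    Assumes each row is a basic probability assignment over singleton fault
    classes only, so no mass is assigned to unions of classes.
    """
    conf_a = np.asarray(conf_a, dtype=float)
    conf_b = np.asarray(conf_b, dtype=float)
    if conf_a.shape != conf_b.shape:
        raise ValueError("confidence matrices must have the same shape")
    # Mass on which both models agree: products for the same class.
    joint = conf_a * conf_b
    # 1 - K: the total non-conflicting mass for each sample.
    agreement = joint.sum(axis=1, keepdims=True)
    if np.any(agreement == 0):
        raise ValueError("total conflict (K = 1): Dempster's rule is undefined")
    # Normalize so each fused row sums to 1.
    return joint / agreement
```

For instance, if the LSTM assigns (0.7, 0.2, 0.1) to three fault types and the CNN assigns (0.6, 0.3, 0.1), the fused row is approximately (0.857, 0.122, 0.020): the class both models favor is reinforced, which is the intended effect of parallel fusion.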
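The abstract reports robustness to 3% random noise but does not state how that noise was generated. One plausible reading, assumed here and not confirmed by the paper, is a uniform relative perturbation of every feature value within ±3%. The helper `add_relative_noise` below is hypothetical and only illustrates that assumption.

```python
import numpy as np


def add_relative_noise(X, level=0.03, rng=None):
    """Scale every value by an independent factor drawn from [1 - level, 1 + level).

    Hypothetical reading of "3% random noise": uniform relative noise
    applied to each gas feature of each sample.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    return X * (1.0 + rng.uniform(-level, level, size=X.shape))
```

Comparing a trained model's accuracy on the clean test set and on `add_relative_noise(X_test)` then gives the kind of accuracy drop quoted in the abstract; other noise models (for example, additive Gaussian noise) would give different figures.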
Some diagnostic studies have also combined traditional ratio methods with intelligent methods, building on earlier research, and have achieved a certain degree of success [14][15][16]. However, the features extracted by ratio methods remain limited, and the anti-interference ability of these combined models needs further testing. In recent years, owing to the strong ability of deep learning to extract complex non-linear features, some studies have introduced it into transformer fault diagnosis, where deep learning methods have achieved significantly higher diagnostic accuracy than traditional machine learning models. The authors in [17] proposed a fault diagnostic method based on a deep belief network (DBN) that used uncoded ratios of DGA data as the model input, and its accuracy was significantly higher than that of traditional methods. The authors in [18] introduced a CNN-based method for identifying and locating winding faults from transformer impulse tests, and their results showed that the method was effective. The authors in [19] studied the internal fault diagnosis of...