In a gear-shaft-bearing system with compound defects, the vibration signals are complex, and the information acquired by a single sensor is easily affected by uncertain factors. Traditional defect diagnosis methods therefore yield low accuracy and cannot meet high-precision diagnosis requirements. To address this, a method is developed to identify the defect types and defect degrees of the gear-shaft-bearing system efficiently. Vibration signals are collected with multiple sensors, fused at the data layer using the dual-tree complex wavelet and optimal weighting factor (OWF) methods, and preprocessed by wavelet transform and FFT. A learning model based on a two-stream CNN (TSCNN), composed of a 1D-CNN and a 2D-CNN, is established, with the resulting wavelet time-frequency map and FFT spectrum as its inputs. The features learned at the output of the fully connected layer are then classified by an SVM. Compared with the OWF-1DCNN and OWF-2DCNN models, the OWF-TSCNN model has a 14.5%–26.6% higher time consumption and converges more slowly. However, its accuracy reaches 100% on the training set and 99.83% on the test set, and its loss value and over-fitting rate are greatly reduced. The OWF-TSCNN model shows improved feature extraction and generalization ability, reaching 100% diagnosis accuracy on different defect types and defect degrees, which makes it more suitable for defect diagnosis of the gear-shaft-bearing system.
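To make the data-layer fusion step more concrete, the following is a minimal sketch of one common formulation of optimal weighting, in which each sensor is weighted in inverse proportion to its noise variance so that the fused signal has minimum variance. This formulation is an assumption: the abstract does not give the OWF formula, and the sketch omits the dual-tree complex wavelet stage. The function names `owf_weights` and `fuse_signals` and the example variances are hypothetical.

```python
def owf_weights(variances):
    """Return inverse-variance weights w_i = (1/var_i) / sum_j (1/var_j).

    Assumption: each sensor's noise variance is known or has been estimated
    beforehand; the source does not say how the weights are obtained.
    """
    if not variances or any(v <= 0 for v in variances):
        raise ValueError("variances must be a non-empty list of positive numbers")
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    return [x / total for x in inverse]


def fuse_signals(signals, variances):
    """Fuse time-aligned sensor signals sample by sample with OWF weights."""
    if len(signals) != len(variances):
        raise ValueError("one variance is required per signal")
    length = len(signals[0])
    if any(len(s) != length for s in signals):
        raise ValueError("all signals must have the same length")
    weights = owf_weights(variances)
    return [
        sum(w * s[k] for w, s in zip(weights, signals))
        for k in range(length)
    ]


if __name__ == "__main__":
    # Two hypothetical sensors; the second is assumed to be four times noisier.
    sensor_a = [1.0, 2.0, 3.0]
    sensor_b = [3.0, 6.0, 9.0]
    print(owf_weights([1.0, 4.0]))
    print(fuse_signals([sensor_a, sensor_b], [1.0, 4.0]))
```

The weights sum to one, so the fused signal stays on the same scale as the inputs, and a noisier sensor contributes less to the result.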