Condition monitoring plays an important role in equipment fault diagnosis. However, existing monitoring methods often collect fault signals along a single dimension, which loses much of the fault information. To address this problem, we built a gearbox test bench with preset faults and constructed a dual-sensor acquisition system that acquires vibration signals in both the horizontal and vertical directions of the gearbox. In addition, because most current signal preprocessing methods adapt poorly, an improved nonlinear adaptive inertia weight particle swarm optimization algorithm (NAPSO) is combined with variational mode decomposition (VMD) to optimize the key VMD parameters, with maximum correlated kurtosis deconvolution (MCKD) as the fitness function. After fault features are extracted from the intrinsic mode functions (IMFs) produced by VMD, a single-layer sparse autoencoder (SAE) and a double-layer stacked sparse autoencoder (SSAE) with different structures are used to fuse the multidimensional information and extract deep features. Finally, hybrid fault diagnosis of the gearbox is performed with a random forest (RF) classifier. Experimental results show that the proposed method can reach an accuracy of 96.0%, an improvement of 3.0% and 4.0% over using only the horizontal or only the vertical sensor signal as input, respectively.
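
As a rough illustration of the quantity behind an MCKD-based fitness function, the sketch below computes the correlated kurtosis of a signal in its commonly used form, CK_M(T) = Σ_n (Π_{m=0..M} y[n − mT])² / (Σ_n y[n]²)^(M+1). This is only a minimal sketch: the function name `correlated_kurtosis` and the parameters `period` (fault period T in samples, assumed to be an integer) and `shifts` (M) are illustrative assumptions, and the abstract does not state how this value is evaluated on the VMD output or passed to NAPSO.

```python
def correlated_kurtosis(signal, period, shifts=1):
    """Correlated kurtosis CK_M(T) of a 1-D signal (hypothetical helper).

    signal: sequence of samples y[0..N-1]
    period: assumed fault period T in samples (integer)
    shifts: number of shifts M; samples before index 0 are treated as zero
    """
    n = len(signal)
    energy = sum(v * v for v in signal)
    if energy == 0:
        return 0.0  # an all-zero signal carries no periodic impulses
    total = 0.0
    for i in range(n):
        # Product of the sample with its M delayed copies, spaced T apart.
        prod = 1.0
        for m in range(shifts + 1):
            j = i - m * period
            prod *= signal[j] if j >= 0 else 0.0
        total += prod * prod
    return total / energy ** (shifts + 1)


# Toy usage: an impulse train with one impulse every 10 samples.
impulses = [1.0 if i % 10 == 0 else 0.0 for i in range(100)]
ck_match = correlated_kurtosis(impulses, period=10)  # period matches the train
ck_miss = correlated_kurtosis(impulses, period=7)    # period does not match
```

On this toy impulse train the value is larger when the assumed period matches the impulse spacing than when it does not, which is the property that makes correlated kurtosis a plausible score for how well a decomposition exposes periodic fault impulses.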