This paper proposes an enhanced fault diagnosis approach for rolling bearings with composite faults, based on an optimized Squeeze and Excitation ResNet (SE-ResNet) model. The method integrates grid search (GS), support vector regression (SVR), ensemble empirical mode decomposition (EEMD), and low-rank multimodal fusion (LMF) to process fused acoustic–vibration signals, with the aim of improving the accuracy and reliability of rolling bearing fault diagnosis. First, an improved EEMD, combined with GS-SVR and a window function, is used to decompose the rolling bearing vibration signal, and the resulting components are filtered and reconstructed using singular value methods. Second, Markov transition fields (MTFs) are used to encode the vibration signals into 2D images, and LMF is used to fuse the vibration and sound signals. Third, an improved Squeeze and Excitation ResNet50 network is proposed for feature identification and classification of rolling bearing composite fault data. Finally, the method is tested and evaluated on rolling bearing data. The experimental results show that, compared with traditional neural networks, the enhanced SE-ResNet integrated with GS-SVR-EEMD and LMF attains higher diagnostic accuracy, and that the proposed approach can be used effectively to diagnose rolling bearing composite faults.
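To make the MTF encoding step concrete, the following is a minimal sketch of a standard first-order Markov transition field in plain Python. It is not the authors' implementation: the function name `markov_transition_field`, the quantile binning, the choice of `n_bins=8`, and the synthetic sine signal are all assumptions made for illustration, and a practical pipeline would more likely use an array library.

```python
import math
from bisect import bisect_right


def markov_transition_field(x, n_bins=8):
    """Encode a 1D signal as a 2D Markov transition field (illustrative sketch)."""
    n = len(x)
    # Quantile bin edges (assumption): each bin holds roughly the same number of samples.
    s = sorted(x)
    edges = [s[(k * n) // n_bins] for k in range(1, n_bins)]
    # Assign every sample to a bin index in 0 .. n_bins - 1.
    q = [bisect_right(edges, v) for v in x]
    # Count first-order transitions between the bins of consecutive samples.
    W = [[0.0] * n_bins for _ in range(n_bins)]
    for a, b in zip(q, q[1:]):
        W[a][b] += 1.0
    # Normalize each non-empty row so it holds transition probabilities.
    for row in W:
        total = sum(row)
        if total > 0:
            row[:] = [c / total for c in row]
    # MTF[i][j] is the transition probability from the bin of x[i] to the bin of x[j].
    return [[W[qi][qj] for qj in q] for qi in q]


# Hypothetical stand-in for a vibration signal: a sine wave with a period of 32 samples.
signal = [math.sin(2 * math.pi * t / 32) for t in range(128)]
mtf = markov_transition_field(signal, n_bins=8)
```

The result is an N × N matrix for a signal of length N, so longer signals are usually segmented or downsampled before encoding to keep the image at a size suitable for a convolutional network.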