This paper proposes a quadratic convolutional neural network (QCNN) that uses both audio and vibration signals for bearing fault diagnosis. To exploit multi-modal information, the audio and vibration signals are first fused by a 1 × 1 convolution. A quadratic convolutional neural network then extracts features from the fused signal. Finally, a decision module performs fault classification. The proposed method exploits the complementary information in the audio and vibration signals and is insensitive to noise. Experimental results show that the proposed method achieves high accuracy for both single and multiple bearing fault diagnosis under noisy conditions. Moreover, combining the two modalities improves performance under all tested conditions.
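
The three stages described above (1 × 1 fusion, quadratic convolution, decision module) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The quadratic neuron form used here, (w_r * x + b_r) multiplied elementwise by (w_g * x + b_g), plus w_b * (x squared) + c, is an assumption taken from the general quadratic-neuron literature, and the pooling-plus-linear decision module, all function names, and all parameters are hypothetical choices made for illustration only.

```python
def fuse_1x1(audio, vibration, w_audio, w_vib, bias=0.0):
    """1x1 convolution over two input channels: a per-sample weighted sum."""
    if len(audio) != len(vibration):
        raise ValueError("audio and vibration must have the same length")
    return [w_audio * a + w_vib * v + bias for a, v in zip(audio, vibration)]


def conv1d(x, kernel, bias=0.0):
    """Valid 1-D cross-correlation, as computed by typical CNN layers."""
    k = len(kernel)
    return [
        sum(kernel[j] * x[i + j] for j in range(k)) + bias
        for i in range(len(x) - k + 1)
    ]


def quadratic_conv1d(x, w_r, w_g, w_b, b_r=0.0, b_g=0.0, c=0.0):
    """Assumed quadratic neuron: product of two linear responses plus a
    linear response on the squared input. All three kernels must have
    the same length so the outputs align."""
    lin_r = conv1d(x, w_r, b_r)
    lin_g = conv1d(x, w_g, b_g)
    power = conv1d([v * v for v in x], w_b, c)
    return [r * g + p for r, g, p in zip(lin_r, lin_g, power)]


def classify(features, class_weights):
    """Hypothetical decision module: global average pooling, one linear
    score (weight, bias) per class, then argmax."""
    pooled = sum(features) / len(features)
    scores = [w * pooled + b for w, b in class_weights]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    audio = [0.1, 0.4, -0.2, 0.3, 0.0]      # made-up samples
    vibration = [0.5, -0.1, 0.2, 0.1, 0.4]  # made-up samples
    fused = fuse_1x1(audio, vibration, w_audio=0.5, w_vib=0.5)
    feats = quadratic_conv1d(fused, w_r=[1.0, 0.0], w_g=[0.0, 1.0], w_b=[1.0, 1.0])
    label = classify(feats, class_weights=[(1.0, 0.0), (-1.0, 0.0)])
    print("predicted class index:", label)
```

In a real network the weights would be learned and the layers stacked. The sketch only shows how the two modalities collapse into one channel before the quadratic layer, and how the quadratic term adds second-order interactions that a standard convolution lacks.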