This study explores a university music teaching system enhanced by auditory perception technology. It examines auditory perception technology and its integration with multimodal music education, and outlines potential applications in university settings. Using the short-time Fourier transform and the wavelet transform, the system computes Mel-frequency cepstral coefficients (MFCCs) and first-order differential (dynamic) music features. These features are then used to construct the multimodal teaching framework, which is implemented using computer programming languages. The multimodal music teaching system was tested, and the results were analyzed with data analysis software. The experimental group and the control group differed significantly (P < 0.05) in four measures of singing skill: fluency (P = 0.005), flexibility (P = 0.003), originality (P = 0.001), and the total singing-skill score (P = 0.004). This study enriches theoretical research on multimodal teaching innovation in music and supports the development of university music education.
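The feature extraction described above can be illustrated with a minimal sketch. It is not the system's implementation: it uses only the Python standard library, and it assumes a Hamming window, a 256-sample frame with a 128-sample hop, 20 triangular mel filters, 13 coefficients, and a regression width of 2 for the first-order differences. All function names (`power_spectrum`, `mel_filterbank`, `mfcc`, `delta`) are hypothetical, and the wavelet-transform step is omitted.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Hamming-windowed power spectrum of one frame via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies, expressed as fractional FFT-bin positions
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(signal, sample_rate, frame_len=256, hop=128, n_filters=20, n_coeffs=13):
    """Frame the signal, then apply mel filtering, log, and a DCT-II per frame."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        # Small floor (assumed value) keeps the logarithm finite for empty bands
        log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                 for row in bank]
        frames.append([sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                           for j, e in enumerate(log_e))
                       for c in range(n_coeffs)])
    return frames

def delta(frames, width=2):
    """First-order differential features via the regression formula, edges clamped."""
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for c in range(len(frames[t])):
            num = sum(n * (frames[min(t + n, last)][c] - frames[max(t - n, 0)][c])
                      for n in range(1, width + 1))
            row.append(num / denom)
        out.append(row)
    return out

# Usage: a synthetic 440 Hz tone stands in for a recorded singing sample.
sr = 8000
tone = [math.sin(2 * math.pi * 440 * i / sr) for i in range(1024)]
static = mfcc(tone, sr)
dynamic = delta(static)
features = [s + d for s, d in zip(static, dynamic)]  # 7 frames, 26 values each
```

The naive DFT keeps the sketch dependency-free at the cost of speed. For real recordings, an FFT-based routine from a numerical library would be the more practical choice.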