With the continuous development of modern information technology, combining intelligent audio processing with vocal music teaching has become a research hotspot. In this paper, we build a vocal music teaching system based on music emotion recognition and musical instrument recognition. We optimize a support vector machine (SVM) with the particle swarm optimization (PSO) algorithm, construct SVM-based methods for music emotion recognition and instrument recognition, and control and optimize the teaching system with a multi-objective proportional-integral-derivative (PID) algorithm. Comparison experiments against other models are used to evaluate the music emotion recognition and instrument recognition performance of the proposed model, and the application effect of the teaching system is then analyzed. The results show that the PSO-optimized SVM performs well in music emotion recognition, achieving a recognition accuracy 16.67% higher than the comparison model and an average adaptability of 70%–90%. Its instrument recognition rate is also 18.17% and 7.45% higher than those of the two comparison models, respectively. After using the vocal music teaching system, 63.04% of the students felt that it promoted their learning, 47.83% reported increased interest in class, and more than 70% were satisfied with its functions. The proposed system can therefore be applied to college vocal music teaching to improve teaching effectiveness.
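The abstract does not state how PSO is configured or how the SVM hyperparameters are encoded. As a minimal sketch, assuming PSO searches over an RBF-kernel SVM's penalty C and kernel width gamma on a log2 scale and minimizes cross-validation error, a standard global-best PSO could look like the following. The function names, the inertia and acceleration constants, and the quadratic `surrogate_cv_error` objective are hypothetical placeholders; a real pipeline would train and score an SVM inside the objective.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO minimizing `objective` inside a box given by `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialize positions uniformly inside the bounds, velocities at zero.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical stand-in for SVM cross-validation error as a function of
# (log2 C, log2 gamma); in practice this would fit and score an SVM.
def surrogate_cv_error(x):
    log_c, log_gamma = x
    return (log_c - 3.0) ** 2 + (log_gamma + 5.0) ** 2


best, err = pso_minimize(surrogate_cv_error, bounds=[(-5, 15), (-15, 3)])
print(f"log2 C = {best[0]:.2f}, log2 gamma = {best[1]:.2f}, error = {err:.4f}")
```

The log2 encoding is a common choice because C and gamma are typically searched over several orders of magnitude; the best position found would then be mapped back with `2 ** best[0]` and `2 ** best[1]` before training the final classifier.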