The article uses the correlation function method to extract the fundamental frequency features of vocal signals and further processes them with the cepstrum method. The audio signals are then corrected by linear smoothing, and the features of the students' audio signals are matched against those of the musical score to determine the accuracy of their singing. The study also preprocesses the audio signals through quantization, frame splitting, and windowing, and constructs an unwritten audio signal feature set based on correlation feature selection. In addition, the article proposes an innovative metaphorical method for improving the teaching mode of vocal music courses in colleges and universities, and it analyzes the auxiliary role of information technology in vocal music teaching by combining simulation experiments with the results of practical application. The results show that the fundamental frequency range of the students' singing is close to the standard, at roughly 123 to 200 Hz, with a maximum error of 24 Hz. With the assistance of information fusion technology, the students' average vocal singing score increased by 6.088 points, demonstrating the method's effectiveness in improving the quality of vocal music teaching.
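The pipeline summarized above (quantization, frame splitting, windowing, correlation-based fundamental frequency extraction, linear smoothing, and comparison against the score) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes 16-bit quantization, a Hamming window, an autocorrelation peak search between 60 and 500 Hz, and a moving average as the "linear smoothing" step. The cepstrum stage and the correlation feature selection are omitted, and all function names (`quantize`, `frame_signal`, `autocorr_f0`, `linear_smooth`, `pitch_track`, `max_error`) are hypothetical.

```python
import math

def quantize(x, bits=16):
    """Clip samples to [-1, 1] and round them to a signed integer grid (assumed 16-bit)."""
    levels = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, s)) * levels) / levels for s in x]

def frame_signal(x, frame_len, hop):
    """Split the signal into overlapping frames of frame_len samples, hop samples apart."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def hamming(frame):
    """Apply a Hamming window to one frame (assumes len(frame) > 1)."""
    n = len(frame)
    return [s * (0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)))
            for k, s in enumerate(frame)]

def autocorr_f0(frame, sr, fmin=60.0, fmax=500.0):
    """Estimate F0 as sr / lag of the largest autocorrelation value in [fmin, fmax].

    Returns 0.0 when no positive correlation is found (treated as unvoiced).
    """
    n = len(frame)
    min_lag = int(sr / fmax)
    max_lag = min(int(sr / fmin), n - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sr / best_lag if best_lag else 0.0

def linear_smooth(track, width=3):
    """Moving-average smoothing of a pitch track (shorter window at the edges)."""
    half = width // 2
    out = []
    for i in range(len(track)):
        seg = track[max(0, i - half):i + half + 1]
        out.append(sum(seg) / len(seg))
    return out

def pitch_track(x, sr, frame_len=400, hop=200):
    """Quantize, frame, window, estimate F0 per frame, then smooth the track."""
    frames = frame_signal(quantize(x), frame_len, hop)
    return linear_smooth([autocorr_f0(hamming(f), sr) for f in frames])

def max_error(estimated, reference):
    """Largest absolute deviation between the sung and the score pitch, frame by frame."""
    return max(abs(e - r) for e, r in zip(estimated, reference))

if __name__ == "__main__":
    sr = 8000
    # Synthetic 160 Hz tone standing in for a recorded student phrase.
    x = [0.6 * math.sin(2 * math.pi * 160 * n / sr) for n in range(int(0.3 * sr))]
    track = pitch_track(x, sr)
    reference = [160.0] * len(track)  # hypothetical score pitch for every frame
    print(len(track), max_error(track, reference))
```

In this sketch the per-frame F0 values are compared directly with a reference pitch taken from the score. A real system would first align the recording with the score in time, a step the abstract does not describe and which is left out here.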