To better cultivate musical literacy and aesthetic sensibility, this paper studies the application of sentiment analysis based on natural language processing (NLP) to vocal music teaching. First, a classification model is trained on category-labeled training text and then used to predict the categories of unlabeled test text. The identified sentiment scores are summed, and sentiment polarity is determined by a threshold chosen after comparing the results obtained under different settings. For feature selection, the mutual information statistic of each feature term is computed, and only terms whose statistic exceeds a threshold are retained. The weight of each retained term is then normalized to a common interval to obtain the final feature weights. The optimal separating hyperplane is found by using the Lagrangian method to convert the optimization into its dual problem, which allows the low-dimensional problem to be mapped into a high-dimensional space. The model also combines the Bayesian approach's ability to adjust weight coefficients automatically during training with the large training capacity of neural networks, improving the flexibility and robustness of the trained model. Convolutional layers perform feature extraction, and max pooling reduces the dimensionality of the feature vectors, yielding a sentiment-semantic model for vocal music teaching. The results show that the proposed method improves both the flexibility and accuracy of the semantic analysis model and the training robustness of the network, reaching an F1 score of 91.65%. The method can accurately mine application data from the “one core and three integrations” vocal teaching model and help refine this teaching model in universities.
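The feature-selection, weight-normalization, and polarity-threshold steps summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The "mutual information statistic" is assumed to be the mutual information I(T; C) between term presence and class label. Feature weights are assumed to be the retained terms' MI scores, min-max scaled to [0, 1], because the abstract does not state the target interval. The sentiment lexicon, threshold values, toy documents, and all function names (`mutual_information`, `select_features`, `min_max_normalize`, `polarity`) are hypothetical.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term):
    """I(T; C) between the presence of `term` in a document and its class label.

    Assumption: this expected-MI definition stands in for the paper's
    "mutual information statistic", whose exact formula is not given.
    """
    n = len(docs)
    has_term = [term in doc for doc in docs]
    joint = Counter(zip(has_term, labels))  # counts of (term present?, class)
    term_counts = Counter(has_term)
    class_counts = Counter(labels)
    mi = 0.0
    # Only non-zero joint counts appear, so the logarithm is always defined.
    for (present, label), count in joint.items():
        p_joint = count / n
        p_term = term_counts[present] / n
        p_class = class_counts[label] / n
        mi += p_joint * math.log(p_joint / (p_term * p_class))
    return mi


def select_features(docs, labels, threshold):
    """Keep only terms whose MI score exceeds `threshold` (hypothetical helper)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    scores = {t: mutual_information(docs, labels, t) for t in vocab}
    return {t: s for t, s in scores.items() if s > threshold}


def min_max_normalize(weights):
    """Scale weights into [0, 1]; the target interval is an assumption."""
    if not weights:
        return {}
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:  # all weights equal: avoid division by zero
        return {t: 1.0 for t in weights}
    return {t: (w - lo) / (hi - lo) for t, w in weights.items()}


def polarity(tokens, lexicon, threshold):
    """Sum lexicon sentiment scores and compare with a symmetric threshold band."""
    score = sum(lexicon.get(tok, 0.0) for tok in tokens)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Hypothetical toy corpus of tokenized student feedback.
    docs = [["sing", "joy", "breath"], ["joy", "melody"],
            ["tired", "breath"], ["tired", "pitch"]]
    labels = ["pos", "pos", "neg", "neg"]
    selected = select_features(docs, labels, threshold=0.1)
    print(sorted(selected))  # ['joy', 'melody', 'pitch', 'sing', 'tired']
    print(min_max_normalize(selected))
    lexicon = {"joy": 1.0, "smile": 0.5, "tired": -1.0}  # hypothetical lexicon
    print(polarity(["joy", "smile", "tired"], lexicon, threshold=0.2))  # positive
```

Using the expected mutual information rather than the pointwise form log(P(t, c) / (P(t) P(c))) keeps every score finite, even for terms that never occur in some class. In the toy run, "breath" appears equally in both classes, scores zero, and is dropped.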