Music education, as a form of quality-oriented education, plays an important role in promoting the all-round development of students. Music evaluation is often highly subjective, but building an evaluation system for music education and teaching quality makes it possible to quantify that quality objectively to a certain extent. In this study, a convolutional neural network (CNN) is applied to an evaluation model for music education teaching. Compared with traditional manual feature extraction, a CNN can learn deep features of the target automatically. The effectiveness of the CNN is assessed through a comparative analysis of several models. Using the GTZAN database as the source of experimental samples, 2000 Mel spectrograms were obtained and divided into 1500 training samples and 500 test samples, and the relevant parameters were initialised. The results show that, in recognising the relevant features of the Mel spectrograms, the dense network model achieves the highest accuracy on the training set. Its accuracy is about 0.917, which is 0.024, 0.041, 0.086 and 0.098 higher than that of the ResNet, Perception Net, VGGNet and AlexNet models, respectively. Its loss curve, F1-value and AUC value are also the best among all models, with the highest F1-value and AUC value reaching 0.926 and 0.922, respectively. On the test set, the average accuracy of this model is likewise higher than that of the other models. Overall, the analysis shows that the model performs well within the evaluation system and can play an important role in evaluating the quality of music education and teaching.
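The reported accuracy gaps imply a training-set accuracy for each comparison model, and the sample counts imply a train/test ratio. The short sketch below only reproduces that arithmetic from the figures quoted above; the variable and function names are illustrative assumptions and are not taken from the study's code.

```python
# Figures quoted in the abstract; all names below are illustrative.
DENSE_TRAIN_ACCURACY = 0.917
ACCURACY_GAPS = {
    "ResNet": 0.024,
    "Perception Net": 0.041,
    "VGGNet": 0.086,
    "AlexNet": 0.098,
}
N_TRAIN, N_TEST = 1500, 500


def implied_accuracies(best, gaps):
    """Subtract each reported gap from the best accuracy, rounded to 3 places."""
    return {name: round(best - gap, 3) for name, gap in gaps.items()}


baseline_accuracy = implied_accuracies(DENSE_TRAIN_ACCURACY, ACCURACY_GAPS)
train_fraction = N_TRAIN / (N_TRAIN + N_TEST)

for name, acc in baseline_accuracy.items():
    print(f"{name}: {acc:.3f}")
print(f"train fraction: {train_fraction:.2f}")
```

Under these figures the comparison models would sit at roughly 0.893, 0.876, 0.831 and 0.819 on the training set, and the split corresponds to a 75/25 train/test ratio.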