This study examines the interactive teaching mode of music and dance instruction supported by computer-vision-based human action recognition. A human action detection and recognition system based on a three-dimensional (3D) convolutional neural network (CNN) is established. Building on this CNN, a dual-channel human action recognition model is proposed, and a visual attention mechanism that uses an interframe difference channel is introduced into the model. Experiments on the Kungliga Tekniska Högskolan (KTH) dataset verify the system's performance in recognizing human dance images. The results show that, after the frame difference channel is added, the dual-channel 3D CNN human action recognition system achieves high accuracy within the first few rounds of training, with the error falling quickly and convergence beginning early; the recognition accuracy of the system on the KTH dataset is 96.6%, which is higher than that of the other methods compared; and, with the 3 × 3 × 3 basic convolution kernel, the classification network achieves its best performance, with a forward-propagation time of 0.0091 seconds. The dual-channel 3D CNN recognition system therefore offers good human action recognition accuracy in the dance interactive teaching mode of music teaching.
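
To make the idea of the interframe difference channel concrete, the following is a minimal sketch, not the system described above. It assumes grayscale frames stored as nested lists and an absolute per-pixel difference between consecutive frames; the names `frame_difference` and `build_channels` are hypothetical, and the choice to drop the first raw frame so that both channels have equal length is an assumption made only for this illustration.

```python
from typing import List, Tuple

Frame = List[List[float]]  # one grayscale frame as rows of pixel intensities


def frame_difference(prev: Frame, curr: Frame) -> Frame:
    """Absolute per-pixel difference between two equally sized frames."""
    return [
        [abs(c - p) for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def build_channels(clip: List[Frame]) -> Tuple[List[Frame], List[Frame]]:
    """Return (raw_frames, difference_frames) for a clip.

    Assumption for this sketch: the raw channel drops the first frame
    so that both channels have the same temporal length.
    """
    diffs = [frame_difference(a, b) for a, b in zip(clip, clip[1:])]
    return clip[1:], diffs


# Toy clip of three 2x2 frames: a bright pixel moves from top-left
# to top-right, then stays still.
clip = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 1.0], [0.0, 0.0]],
    [[0.0, 1.0], [0.0, 0.0]],
]
raw, diff = build_channels(clip)
```

In this toy clip the first difference frame is nonzero only where the pixel moved, and the second is all zeros because nothing changed. This illustrates why a difference channel can direct a network's attention toward moving regions.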