Virtual Reality (VR) technology uses computers to simulate the real world comprehensively. VR has been widely used in college teaching and has broad application prospects. To better apply computer-aided instruction technology to music teaching, a music teaching system based on VR technology is proposed. First, a virtual piano is developed, with the HTC Vive kit and a helmet-mounted Leap Motion sensor as the hardware platform and Unity3D with the related SteamVR and Leap Motion plug-ins as the software platform. Then, a gesture recognition algorithm is proposed and implemented. Specifically, a Dual Channel Convolutional Neural Network (DCCNN) is adopted to recognize the user’s gesture commands. Convolution kernels of two sizes are applied to extract feature information from the images and the gesture commands in the video, which the DCCNN then recognizes. After the spatial and temporal information is extracted, Red-Green-Blue (RGB) images and optical flow images are input into the DCCNN, and the resulting predictions are merged to obtain the final recognition result. The experimental results show that the recognition accuracy of the DCCNN for Curwen gestures reaches 96% and that the accuracy depends on the size of the convolution kernel. Combining convolution kernels of size 5×5 and 7×7 improves the recognition accuracy to 98%. The results of this study can be applied to piano instruction in music teaching and to other VR products, and have broad value for popularization and application.
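
The merging of the two channels' predictions can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the weighted averaging of softmax probabilities below is an assumption, as are the function names, the `rgb_weight` parameter, and the seven-syllable label set used for the Curwen hand signs. The per-channel scores are stand-in numbers rather than outputs of a trained network.

```python
import math

# Illustrative label set (assumption): the seven solfege syllables
# that Curwen hand signs represent.
CURWEN_LABELS = ["do", "re", "mi", "fa", "sol", "la", "ti"]


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_predictions(rgb_logits, flow_logits, rgb_weight=0.5):
    """Late fusion: weighted average of the two channels' class probabilities.

    rgb_weight is a hypothetical parameter; 0.5 weights both channels equally.
    """
    if len(rgb_logits) != len(flow_logits):
        raise ValueError("both channels must score the same set of classes")
    if not 0.0 <= rgb_weight <= 1.0:
        raise ValueError("rgb_weight must lie in [0, 1]")
    rgb_probs = softmax(rgb_logits)
    flow_probs = softmax(flow_logits)
    return [
        rgb_weight * r + (1.0 - rgb_weight) * f
        for r, f in zip(rgb_probs, flow_probs)
    ]


def predict_gesture(rgb_logits, flow_logits, labels=CURWEN_LABELS, rgb_weight=0.5):
    """Return the label with the highest fused probability and that probability."""
    fused = fuse_predictions(rgb_logits, flow_logits, rgb_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best], fused[best]


if __name__ == "__main__":
    # Stand-in scores: the RGB channel favors "do", the optical flow channel "mi".
    rgb_scores = [2.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
    flow_scores = [0.1, 0.1, 3.0, 0.1, 0.1, 0.1, 0.1]
    label, confidence = predict_gesture(rgb_scores, flow_scores)
    print(label, round(confidence, 3))
```

With equal weights, the more confident channel dominates the fused result. Setting `rgb_weight` to 1.0 or 0.0 reduces the sketch to a single-channel classifier, which gives a simple baseline against which to compare the fused prediction.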