To enable computers to recognize human gestures captured by a camera and respond accordingly, this paper proposes a gesture recognition technique for natural human-computer interaction. By combining natural human-computer interaction with music performance, computer vision-based gestures are used to play music in a virtual environment. The virtual piano built for the experiments has 14 keys. Sound playback uses a CWave class together with the object-oriented MFC class library in VC++: an object m_Wave of class CWave is created, and each piano key is monitored so that, once the detected value for a key exceeds a preset threshold, m_Wave.Load() is called to load and play that key's sound. The system addresses a practical need of music enthusiasts, enriches people's cultural life, and has practical value and scalability.
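The per-key triggering rule described above can be sketched in C++ as follows. This is a minimal, hypothetical illustration, not the authors' implementation. Several details are assumptions. The abstract does not say what quantity is compared with the threshold, so `activation[i]` stands for some per-frame measure of how much of key `i` the hand covers. The CWave interface is not given, so playback is injected as a callback (`play`) that stands in for the call to `m_Wave.Load()`. The class name `VirtualPiano` and the choice to fire only when a key first crosses the threshold are illustrative.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Number of virtual keys, as in the described system.
constexpr std::size_t kNumKeys = 14;

// Hypothetical per-frame trigger logic for a camera-driven virtual piano.
class VirtualPiano {
public:
    // Callback invoked with the key index; stands in for m_Wave.Load().
    using PlayFn = std::function<void(std::size_t)>;

    VirtualPiano(double threshold, PlayFn play)
        : threshold_(threshold), play_(std::move(play)) {}

    // Processes one frame of per-key activation values (assumed measure)
    // and returns the indices of keys that fired in this frame.
    std::vector<std::size_t> update(const std::array<double, kNumKeys>& activation) {
        std::vector<std::size_t> fired;
        for (std::size_t i = 0; i < kNumKeys; ++i) {
            const bool above = activation[i] > threshold_;
            // Fire only on the frame where the value first exceeds the
            // threshold, so a hand resting on a key does not replay it
            // every frame (an assumed design choice).
            if (above && !wasAbove_[i]) {
                play_(i);
                fired.push_back(i);
            }
            wasAbove_[i] = above;
        }
        return fired;
    }

private:
    double threshold_;
    PlayFn play_;
    std::array<bool, kNumKeys> wasAbove_{};  // all false initially
};
```

In use, `play` would call whatever loads and plays the WAV file for key `i`. Comparing against the previous frame's state is one simple way to turn a continuous "above threshold" signal into discrete key presses. Calling the sound routine on every frame where the value exceeds the threshold would also follow the rule as stated, but the sound would restart repeatedly while the key stays covered.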