With the rapid development of computer technologies such as the Internet of Things (IoT), wireless communications, edge computing, and data mining [1-18], advanced multimedia technologies are emerging one after another. Owing to its immersive realism, Virtual Reality (VR) offers users a new experience through more natural and realistic human-computer interaction [19-21]. Many multimedia applications based on VR technology, such as virtual shopping communities, immersive VR games, virtual landscape roaming, and virtual art stage performances, are gradually becoming hotspots of the future culture, art, and entertainment markets [22-24]. Among these, multimedia human-computer interaction in art scenes must capture and recognize human body motion accurately and in real time in order to achieve a better interactive effect and artistic sensory experience. To enable more natural and effective communication between people and computers, a motion recognition interactive system must be able to accurately identify complex and varied human actions. As shown in Fig. 1, to digitally preview a dance in a digital performance, the actions of the stage dancers are first captured. Then, as shown in Fig. 2, the captured dance motion is digitally recognized and presented. Figure 3 shows the interaction of the recognized actions in the VR scenario.