Gesture recognition is a long-standing research topic in human-computer interaction. Its goal is to enable natural interaction with machines by recognizing the semantics that gestures express. Occlusion is an unavoidable problem in gesture recognition: when a gesture is occluded, some or even all of its features are lost, which leads to misrecognition or makes the gesture unrecognizable. Studying gesture recognition under occlusion is therefore of great significance. The single shot multibox detector (SSD) algorithm is analyzed, and candidate front-end networks are compared. MobileNets is selected as the front-end network, and the resulting MobileNets-SSD network is improved. Using the improved network model in a TensorFlow environment, self-occluded gestures and object-occluded gestures are each trained on color maps, depth maps, and fused color and depth data, yielding a recognition model for each occlusion type and each input modality. The learning rate, loss function, and average accuracy of the resulting occluded-gesture recognition models are then compared and analyzed.
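The abstract does not say how the color and depth data are fused. As an illustration only, the sketch below shows one common option, early fusion, in which a pixel-aligned depth map is normalized and appended to the RGB image as a fourth channel. The function name `fuse_color_depth`, the `max_depth_mm` parameter, and its 4000 mm default are assumptions made for this example and are not taken from the work described.

```python
import numpy as np


def fuse_color_depth(color, depth, max_depth_mm=4000.0):
    """Stack an RGB image and an aligned depth map into one RGB-D array.

    Hypothetical helper for illustration; not the authors' fusion method.

    color: (H, W, 3) uint8 RGB image.
    depth: (H, W) depth map in millimetres, pixel-aligned with `color`.
    Returns an (H, W, 4) float32 array with every channel scaled to [0, 1].
    """
    if color.ndim != 3 or color.shape[2] != 3:
        raise ValueError("color must have shape (H, W, 3)")
    if depth.shape != color.shape[:2]:
        raise ValueError("depth must have shape (H, W) matching color")

    # Scale 8-bit color values to [0, 1].
    rgb = color.astype(np.float32) / 255.0

    # Clip depth to an assumed working range, then scale to [0, 1].
    d = np.clip(depth.astype(np.float32), 0.0, max_depth_mm) / max_depth_mm

    # Append depth as a fourth channel after R, G, B.
    return np.concatenate([rgb, d[..., np.newaxis]], axis=-1)
```

With this kind of fusion, a detector whose first convolution expects three input channels would need that layer adapted to accept four. Other schemes, such as separate color and depth branches merged at a later layer, are equally plausible readings of "color and depth fusion".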