Combining gesture recognition with aerospace exploration robots enables efficient, contactless control of those robots. In the harsh aerospace environment, however, captured gesture images are inevitably blurred or degraded. Motion-blurred images not only lose part of the transmitted information but also impair the later training of the neural network. To improve the speed and accuracy of motion-blurred gesture recognition, this work studies the YOLOv4 (You Only Look Once, version 4) algorithm from two aspects: motion-blurred image processing and model optimization. For image processing, DeblurGAN-v2 is employed to remove motion blur from the gesture images before they are fed into the YOLOv4 network. For the model structure, the K-means++ algorithm is used to cluster the prior (anchor) boxes and obtain more suitable prior box sizes, and the CBAM (Convolutional Block Attention Module) attention mechanism and an SPP (spatial pyramid pooling) structure are added to YOLOv4 to improve the efficiency of network learning. The training dataset is designed for human–computer interaction in aerospace. To reduce redundant features in the captured images and improve model training, a Wiener filter and a bilateral filter are applied in combination to the blurred images in the dataset as a simple first pass at removing motion blur, and the data are augmented by imitating different environments. A YOLOv4-gesture model is built that combines K-means++ prior box clustering with the CBAM and SPP modules, and a DeblurGAN-v2 model is built to preprocess the input images for YOLOv4 target recognition. Together, YOLOv4-gesture and DeblurGAN-v2 form the YOLOv4-motion-blur-gesture model, which is trained on the augmented and enhanced gesture dataset. The experimental results demonstrate that the YOLOv4-motion-blur-gesture model performs relatively better.
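The prior box clustering step can be illustrated with a short sketch. This is not the authors' implementation: it assumes each labelled gesture box is reduced to its (width, height), uses the common YOLO-style distance 1 − IoU rather than Euclidean distance (the abstract does not state which distance was used), and updates each center with the mean width and height of its members. The function names `iou_wh`, `kmeanspp_init`, and `kmeanspp_anchors` and the example box sizes are hypothetical.

```python
import random
from typing import List, Tuple

# A labelled gesture bounding box reduced to its (width, height) in pixels.
Box = Tuple[float, float]


def iou_wh(a: Box, b: Box) -> float:
    """IoU of two boxes that share a corner, so only their shapes matter."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def distance(a: Box, b: Box) -> float:
    """Assumed YOLO-style distance: 1 - IoU instead of Euclidean distance."""
    return 1.0 - iou_wh(a, b)


def kmeanspp_init(boxes: List[Box], k: int, rng: random.Random) -> List[Box]:
    """K-means++ seeding: each new center is drawn with probability ~ D(x)^2."""
    centers = [rng.choice(boxes)]
    while len(centers) < k:
        # Squared distance from every box to its nearest already-chosen center.
        d2 = [min(distance(b, c) for c in centers) ** 2 for b in boxes]
        if sum(d2) == 0:
            raise ValueError("fewer distinct box shapes than k")
        # Boxes already chosen have weight 0, so they are never picked again.
        centers.append(rng.choices(boxes, weights=d2, k=1)[0])
    return centers


def kmeanspp_anchors(boxes: List[Box], k: int, iters: int = 100, seed: int = 0) -> List[Box]:
    """Cluster box shapes into k prior (anchor) boxes, returned smallest first."""
    rng = random.Random(seed)
    centers = kmeanspp_init(boxes, k, rng)
    assign: List[int] = []
    for _ in range(iters):
        # Assignment step: each box joins the center with the highest IoU.
        new_assign = [min(range(k), key=lambda j: distance(b, centers[j])) for b in boxes]
        if new_assign == assign:
            break  # converged: no box changed cluster
        assign = new_assign
        # Update step: move each center to the mean width/height of its members.
        for j in range(k):
            members = [b for b, a in zip(boxes, assign) if a == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = (
                    sum(w for w, _ in members) / len(members),
                    sum(h for _, h in members) / len(members),
                )
    return sorted(centers, key=lambda c: c[0] * c[1])


# Hypothetical box sizes standing in for a labelled gesture dataset.
boxes = [(10, 12), (11, 13), (9, 11), (50, 60), (52, 58), (48, 62),
         (200, 180), (205, 175), (195, 185)]
anchors = kmeanspp_anchors(boxes, k=3)
```

In practice, YOLOv4 uses nine prior boxes (three per detection scale), so `k` would be 9 and the boxes would come from the labelled gesture dataset, typically rescaled to the network input size. The 1 − IoU distance is the usual choice for anchor clustering because it compares box shapes independently of their absolute size, whereas Euclidean distance would let large boxes dominate.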
The proposed model achieves high inclusiveness and recognition accuracy in the real-time interaction of motion-blurred gestures: it improves the network training speed by 30%, the target detection accuracy by 10%, and the mAP by about 10%. The constructed YOLOv4-motion-blur-gesture model performs stably. It can not only support real-time human–computer interaction in aerospace under complex conditions but can also be applied to other environments with complex backgrounds that require real-time detection.