Wi-Fi sensing has become a research hotspot in the field of perception because it enables intelligent perception of human activities and the surrounding environment. Existing wireless sensing models have a large number of parameters, which makes real-time sensing difficult in scenarios with limited computing power, such as mobile devices. To address this problem, a lightweight recognition model is proposed whose feature extraction module combines Depthwise Separable Convolution (DSC) with a Stacked Gated Recurrent Unit (SGRU). The model first captures the spatial features of human gestures using DSC while preserving the temporal characteristics of the features, and then learns the spatio-temporal characteristics of the gestures using the SGRU network. The performance of the model is validated on the open-source dataset Widar. The results show that the proposed DSC-SGRU model has only 236.891 K parameters and achieves an accuracy of 77.6\%. Compared with existing gesture recognition models, DSC-SGRU greatly reduces the number of parameters while maintaining comparable performance.
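To illustrate why a DSC front end followed by stacked GRU layers can stay small, the following minimal sketch counts trainable parameters for each building block. The layer sizes (32 and 64 channels, a 3x3 kernel, a hidden size of 128, two GRU layers) are hypothetical values chosen only for illustration; they are not the configuration of the proposed model and do not reproduce its 236.891 K parameter count. The GRU count assumes one bias vector per gate; some frameworks use two.

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Parameters of a standard k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def dsc_params(c_in, c_out, k, bias=True):
    """Parameters of a depthwise separable convolution.

    Depthwise step: one k x k filter per input channel.
    Pointwise step: a 1 x 1 convolution mixing c_in channels into c_out.
    """
    depthwise = c_in * k * k + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


def gru_layer_params(input_size, hidden_size):
    """One GRU layer: three gates, each with input weights,
    recurrent weights, and a single bias vector (assumed convention)."""
    per_gate = input_size * hidden_size + hidden_size * hidden_size + hidden_size
    return 3 * per_gate


def stacked_gru_params(input_size, hidden_size, num_layers):
    """Stacked GRU: the first layer reads the input features,
    every later layer reads the previous layer's hidden state."""
    first = gru_layer_params(input_size, hidden_size)
    rest = (num_layers - 1) * gru_layer_params(hidden_size, hidden_size)
    return first + rest


if __name__ == "__main__":
    # Hypothetical sizes, for illustration only.
    standard = conv2d_params(32, 64, 3)
    separable = dsc_params(32, 64, 3)
    recurrent = stacked_gru_params(64, 128, 2)
    print(f"standard conv : {standard}")
    print(f"separable conv: {separable}")
    print(f"2-layer GRU   : {recurrent}")
```

With these illustrative sizes, the separable convolution needs roughly one eighth of the parameters of the standard convolution, which is the kind of saving a DSC-based feature extractor relies on.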