Power quality disturbances (PQDs) are an important problem affecting the safe and stable operation of power systems. Traditional single-modal methods have a large number of parameters and usually focus on only one type of feature, so the extracted features carry incomplete information and the complex and diverse PQD types found in modern power systems are difficult to identify. To address this, this paper proposes a multi-modal parallel feature extraction and classification model. The model attends to both the temporal and the spatial features of PQDs, which effectively improves classification accuracy, and it adopts a lightweight design to reduce the number of model parameters. A long short-term memory (LSTM) network extracts the temporal features from the one-dimensional time-series modality of the PQD. In parallel, a lightweight residual network (LResNet) is designed to extract the spatial features from the two-dimensional image modality of the PQD. The two types of features are then fused into multi-modal spatio-temporal features (MSTF). Finally, the MSTF are input to a support vector machine (SVM) for classification. Simulation results on 20 PQD signals show that the classification accuracy of the proposed multi-modal model reaches 99.94%, while its parameter size is only 0.08 MB. Compared with ResNet18, the proposed method improves accuracy by 2.55% and reduces the number of parameters by 99.25%.
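The parallel structure described above can be illustrated with a minimal PyTorch sketch. It is not the authors' implementation. The layer sizes, the depthwise-separable residual block standing in for LResNet, the fusion by concatenation, the reshaping of the waveform into a 2-D array, and all class and function names (`TemporalBranch`, `LightResidualBlock`, `SpatialBranch`, `extract_mstf`) are assumptions made for illustration, since the abstract does not specify them. The inputs are random placeholders, not PQD data.

```python
import torch
import torch.nn as nn


class TemporalBranch(nn.Module):
    """LSTM over the 1-D waveform; returns the final hidden state."""

    def __init__(self, hidden_size=32):  # hidden size is an assumed value
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)

    def forward(self, signal):  # signal: (batch, seq_len)
        _, (h_n, _) = self.lstm(signal.unsqueeze(-1))
        return h_n[-1]  # (batch, hidden_size)


class LightResidualBlock(nn.Module):
    """Residual block built on a depthwise-separable convolution (assumed design)."""

    def __init__(self, channels):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1,
                                   groups=channels, bias=False)
        self.pointwise = nn.Conv2d(channels, channels, 1, bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        # Skip connection: add the input back to the convolved signal.
        return torch.relu(x + self.bn(self.pointwise(self.depthwise(x))))


class SpatialBranch(nn.Module):
    """Small residual CNN over the 2-D image modality."""

    def __init__(self, channels=8):  # channel count is an assumed value
        super().__init__()
        self.stem = nn.Conv2d(1, channels, 3, padding=1, bias=False)
        self.block = LightResidualBlock(channels)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, image):  # image: (batch, 1, H, W)
        x = torch.relu(self.stem(image))
        return self.pool(self.block(x)).flatten(1)  # (batch, channels)


def extract_mstf(temporal, spatial, signal, image):
    """Fuse both feature vectors by concatenation (assumed fusion rule)."""
    with torch.no_grad():
        return torch.cat([temporal(signal), spatial(image)], dim=1)


torch.manual_seed(0)
temporal, spatial = TemporalBranch().eval(), SpatialBranch().eval()
signal = torch.randn(4, 640)          # 4 placeholder waveforms, 640 samples each
image = signal.reshape(4, 1, 20, 32)  # placeholder 2-D mapping of the same samples
features = extract_mstf(temporal, spatial, signal, image)
n_params = sum(p.numel() for m in (temporal, spatial) for p in m.parameters())
print(tuple(features.shape), n_params)
```

In a full pipeline, the fused feature matrix would be passed, together with class labels, to an SVM classifier such as scikit-learn's `sklearn.svm.SVC`. That step is omitted here because the sketch has no labelled data to train on.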