Advances in artificial intelligence algorithms have allowed many deep learning models to achieve high classification accuracy in event recognition for distributed optical fiber sensing systems. However, the large-scale datasets that deep networks require are difficult to collect from optical fiber vibration sensing systems in real-world scenarios, and the overfitting caused by insufficient training data reduces classification accuracy. In this paper, we propose a fused feature extraction method suited to the small datasets of Φ-OTDR systems. High-dimensional frequency-domain features of the signals are extracted by a transfer learning method based on the VGGish framework. Combining the features from 12 different spatial acquisition points captures the spatial distribution of the signal. The fused spatial and temporal features then undergo a sample feature correction algorithm and are passed to an SVM classifier for event recognition. Experimental results show that VGGish, a convolutional network pre-trained for audio classification, can extract features of Φ-OTDR vibration signals more efficiently. With the corrected multi-domain features, the recognition accuracy for six types of intrusion events reaches 95.0% when only 960 samples are used as the training set. This accuracy is 17.7% higher than that obtained from a single channel using VGGish without fine-tuning. Compared with other CNNs, such as ResNet, the proposed feature extraction method improves accuracy by at least 4.9% on the same dataset.
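The fusion of per-point features described above can be illustrated with a minimal sketch. It assumes each of the 12 acquisition points yields a sequence of 128-dimensional VGGish embeddings (VGGish emits one such vector per roughly 0.96 s frame), that the frames are averaged over time, and that the pooled vectors are concatenated. The time-averaging and concatenation are illustrative assumptions, not necessarily the fusion used in this work. The names `pool_over_time`, `fuse_sample`, and `fake_embeddings` are hypothetical, the embeddings are random stand-ins, and the sample feature correction step is not modeled.

```python
import random
from statistics import fmean

N_POINTS = 12    # spatial acquisition points, as stated in the abstract
EMBED_DIM = 128  # size of one VGGish embedding vector


def pool_over_time(frames):
    """Average per-frame embeddings into one vector (assumed pooling choice)."""
    return [fmean(column) for column in zip(*frames)]


def fuse_sample(per_point_frames):
    """Concatenate the time-pooled embedding of every acquisition point."""
    if len(per_point_frames) != N_POINTS:
        raise ValueError(f"expected {N_POINTS} acquisition points")
    fused = []
    for frames in per_point_frames:
        fused.extend(pool_over_time(frames))
    return fused


def fake_embeddings(n_frames, rng):
    """Random stand-in for VGGish output; real values come from the network."""
    return [[rng.random() for _ in range(EMBED_DIM)] for _ in range(n_frames)]


rng = random.Random(0)
sample = [fake_embeddings(n_frames=3, rng=rng) for _ in range(N_POINTS)]
fused = fuse_sample(sample)
print(len(fused))  # 12 * 128 = 1536
```

Under these assumptions, each event sample becomes one 1536-dimensional vector, which is the kind of fixed-length input an SVM classifier expects.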