Epilepsy is a common neurological disorder, and its diagnosis relies mainly on the analysis of electroencephalogram (EEG) signals. However, raw EEG signals contain few recognizable features. To enrich the network input, the differential features of the signals and their amplitude and phase spectra in the frequency domain are extracted and combined into a two-dimensional feature vector. To recognize these multimodal features, a neural network model based on a multimodal dual-stream network is proposed. It combines one-dimensional convolution, two-dimensional convolution, and LSTM networks: the convolutional layers extract the spatial features of the two-dimensional EEG feature vectors, and the LSTM layers extract the temporal features of the signals, so the hybrid network captures temporal and spatial features at the same time. In addition, a channel attention module is used to focus the model on seizure-related features. Finally, multiple sets of experiments were conducted on the Bonn and New Delhi datasets, and the highest test-set accuracies of 99.69% and 97.5% were obtained, respectively, verifying the superiority of the proposed model for epileptic seizure detection.
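The feature construction described above can be illustrated with a minimal sketch. The function name `extract_features`, the use of a first-order difference, the full-length FFT, and the row order are all assumptions made for illustration; the abstract does not specify these details, and the actual pipeline may differ (for example, in normalization or in whether the raw signal is also included as a row).

```python
import numpy as np

def extract_features(signal):
    """Stack three views of a 1-D EEG segment into a 2-D array of shape (3, n).

    Hypothetical helper: rows are (assumed order) the first-order difference,
    the amplitude spectrum, and the phase spectrum.
    """
    x = np.asarray(signal, dtype=np.float64)

    # First-order difference; prepending x[0] keeps the length at n,
    # so the first difference value is 0.
    diff = np.diff(x, prepend=x[0])

    # Discrete Fourier transform of the segment (full length, assumed).
    spectrum = np.fft.fft(x)
    amplitude = np.abs(spectrum)   # amplitude spectrum
    phase = np.angle(spectrum)     # phase spectrum in radians

    return np.stack([diff, amplitude, phase])

# Usage on a synthetic segment (not real EEG data): a sine wave
# with 10 cycles across 256 samples.
t = np.arange(256) / 256.0
segment = np.sin(2 * np.pi * 10 * t)
features = extract_features(segment)
print(features.shape)  # (3, 256)
```

Stacking the rows to a common length is what makes the result usable as a two-dimensional input for a 2-D convolution, while the original one-dimensional segment remains available for a 1-D convolution or LSTM stream.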