As a major daily task for the popularization of artificial intelligence technology, more and more attention has been paid to the scientific research of mental state electroencephalogram (EEG) in recent years. To retain the spatial information of EEG signals and fully mine the EEG timing-related information, this paper proposes a novel EEG emotion recognition method. First, to obtain the frequency, spatial, and temporal information of multichannel EEG signals more comprehensively, we choose the multidimensional feature structure as the input of the artificial neural network. Then, a neural network model based on depthwise separable convolution is proposed, extracting the input structure’s frequency and spatial features. The network can effectively reduce the computational parameters. Finally, we modeled using the ordered neuronal long short-term memory (ON-LSTM) network, which can automatically learn hierarchical information to extract deep emotional features hidden in EEG time series. The experimental results show that the proposed model can reasonably learn the correlation and temporal dimension information content between EEG multi-channel and improve emotion classification performance. We performed the experimental validation of this paper in two publicly available EEG emotional datasets. In the experiments on the DEAP dataset (a dataset for emotion analysis using EEG, physiological, and video signals), the mean accuracy of emotion recognition for arousal and valence is 95.02% and 94.61%, respectively. In the experiments on the SEED dataset (a dataset collection for various purposes using EEG signals), the average accuracy of emotion recognition is 95.49%.