Objective: emotion recognition based on electroencephalography (EEG) signals has received significant attention in cognitive science and human-computer interaction (HCI). However, most existing studies either treat EEG data as one-dimensional and ignore the relationships between channels, or extract only time-frequency features and omit spatial features. Approach: we propose ERGL, an EEG emotion recognition method based on spatial-temporal features that combines a graph convolutional network (GCN) with long short-term memory (LSTM). First, the one-dimensional EEG vector is converted into a two-dimensional mesh matrix whose layout corresponds to the distribution of brain regions at the EEG electrode locations, so that the spatial correlation between adjacent channels is better represented. Second, the GCN and LSTM are used together to extract spatial-temporal features: the GCN extracts spatial features, and the LSTM units extract temporal features. Finally, a softmax layer performs emotion classification. Main results: extensive experiments were conducted on the DEAP and SEED datasets. On DEAP, the accuracy, precision and F-score were 90.67%, 92.38% and 91.34% for the valence dimension and 90.33%, 91.72% and 90.86% for the arousal dimension. On SEED, the accuracy, precision and F-score for positive, neutral and negative classification reached 94.92%, 95.34% and 94.17%, respectively. Significance: these results indicate that ERGL shows promising performance compared with state-of-the-art emotion recognition methods.
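To make the spatial stage more concrete, the following is a minimal NumPy sketch of two of the steps described above: placing a one-dimensional channel vector onto a two-dimensional mesh, and applying one graph convolution over the channels. It is an illustration under stated assumptions, not the ERGL implementation. The 5-channel 3x3 layout, the neighbourhood rule (channels in adjacent mesh cells are connected), the symmetric adjacency normalisation, and all function names are hypothetical choices, since the abstract does not specify them. The LSTM stage is omitted; in a full pipeline the per-time-step spatial features would be passed to an LSTM before the softmax layer.

```python
import numpy as np

# Hypothetical toy layout: channel name -> (row, col) on a 3x3 mesh.
# Real montages (e.g. DEAP or SEED) have more channels and need a larger grid.
LAYOUT = {"Fp1": (0, 0), "Fp2": (0, 2), "Cz": (1, 1), "O1": (2, 0), "O2": (2, 2)}

def to_mesh(sample, layout=LAYOUT, shape=(3, 3)):
    """Place a 1D vector of per-channel values onto a 2D mesh (zeros elsewhere)."""
    mesh = np.zeros(shape, dtype=float)
    for value, name in zip(sample, layout):
        mesh[layout[name]] = value
    return mesh

def mesh_adjacency(layout=LAYOUT):
    """Connect two channels if their mesh cells touch (assumed rule, incl. diagonals)."""
    names = list(layout)
    n = len(names)
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                (r1, c1), (r2, c2) = layout[names[i]], layout[names[j]]
                if max(abs(r1 - r2), abs(c1 - c2)) <= 1:
                    adj[i, j] = 1.0
    return adj

def normalize_adjacency(adj):
    """Symmetric normalisation with self-loops: D^-1/2 (A + I) D^-1/2 (assumed variant)."""
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_hat, x, w):
    """One graph convolution followed by ReLU: relu(A_hat @ X @ W)."""
    return np.maximum(a_hat @ x @ w, 0.0)

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# One time step of a 5-channel recording, ordered as in LAYOUT.
sample = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
mesh = to_mesh(sample)

adj = mesh_adjacency()
a_hat = normalize_adjacency(adj)

x = rng.normal(size=(5, 4))   # 5 channels, 4 (random) features per channel
w = rng.normal(size=(4, 8))   # random weights standing in for learned ones
h = gcn_layer(a_hat, x, w)    # spatial features, shape (5, 8)

# Stand-in classifier head: pool over channels, project to 3 classes, softmax.
w_out = rng.normal(size=(8, 3))
probs = softmax(h.mean(axis=0) @ w_out)
```

In this toy layout only `Cz` touches the other four channels, so the graph is a star centred on `Cz`; a denser montage would yield a richer neighbourhood structure for the GCN to exploit.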