Deep learning based on convolutional neural networks (CNNs) has achieved success in brain-computer interfaces (BCIs) using scalp electroencephalography (EEG). However, the interpretation of this so-called 'black box' approach and its application to stereo-electroencephalography (SEEG)-based BCIs remain largely unexplored. Therefore, this paper evaluates the decoding performance of deep learning methods on SEEG signals. Methods: Thirty epilepsy patients were recruited, and a paradigm including five hand and forearm motion types was designed. Six methods were used to classify the SEEG data: filter bank common spatial pattern (FBCSP) and five deep learning methods (EEGNet, shallow CNN, deep CNN, ResNet, and a deep CNN variant named STSCNN). Various experiments were conducted to investigate the effects of windowing and model structure, as well as the decoding processes of ResNet and STSCNN. Results: The average classification accuracies for EEGNet, FBCSP, shallow CNN, deep CNN, STSCNN, and ResNet were 35 ± 6.1%, 38 ± 4.9%, 60 ± 3.9%, 60 ± 3.3%, 61 ± 3.2%, and 63 ± 3.1%, respectively. Further analysis of the proposed method demonstrated clear separability between the different classes in the spectral domain. Conclusion: ResNet and STSCNN achieved the first- and second-highest decoding accuracies, respectively. The STSCNN results demonstrated that an extra spatial convolution layer is beneficial, and the decoding process can be partially interpreted from the spatial and spectral perspectives. Significance: This study is the first to investigate the performance of deep learning on SEEG signals. In addition, it demonstrates that the so-called 'black box' method can be partially interpreted.

Index Terms: stereo-electroencephalography (SEEG), brain-computer interface (BCI), forearm and hand motion, deep learning, convolutional neural networks (CNN)
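To make the point about the extra spatial convolution layer concrete, the following is a minimal PyTorch sketch of a spatial-then-temporal CNN decoder for multi-channel SEEG windows. It is not the architecture reported in the paper: the class name SpatialTemporalCNN, the channel counts, kernel widths, window length, and pooling sizes are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class SpatialTemporalCNN(nn.Module):
    """Hypothetical spatial-then-temporal CNN decoder (illustrative only)."""

    def __init__(self, n_channels=64, n_samples=1000, n_classes=5):
        super().__init__()
        # Spatial convolution: mixes all SEEG contacts at each time step,
        # analogous to adding an extra spatial filtering layer.
        self.spatial = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(n_channels, 1)),
            nn.BatchNorm2d(16),
            nn.ELU(),
        )
        # Temporal convolutions: learn temporal/spectral filters per feature map.
        self.temporal = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=(1, 25), padding=(0, 12)),
            nn.BatchNorm2d(32),
            nn.ELU(),
            nn.AvgPool2d(kernel_size=(1, 4)),
            nn.Conv2d(32, 64, kernel_size=(1, 25), padding=(0, 12)),
            nn.BatchNorm2d(64),
            nn.ELU(),
            nn.AvgPool2d(kernel_size=(1, 4)),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * (n_samples // 16), n_classes),
        )

    def forward(self, x):
        # x: (batch, 1, n_channels, n_samples) SEEG window
        x = self.spatial(x)
        x = self.temporal(x)
        return self.classifier(x)

if __name__ == "__main__":
    model = SpatialTemporalCNN()
    dummy = torch.randn(8, 1, 64, 1000)  # batch of 8 assumed SEEG windows
    print(model(dummy).shape)            # -> torch.Size([8, 5]) class scores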