Objective: A motor imagery-based brain-computer interface (MI-BCI) translates spontaneous movement intention from the brain into commands for external devices. MI-BCI systems based on a single modality have been widely studied in recent decades. Recently, with the development of neuroimaging methods, multimodal MI-BCI studies that use multiple neural signals have been proposed, which are promising for enhancing decoding accuracy. Multimodal MI data contain rich common and complementary information, and effective feature representations help improve classification performance. It is therefore important to explore and extract features with high separability and robustness from the rich information in multimodal data.
Approach: In this study, a five-class motor imagery experiment was designed, and electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) data were collected simultaneously. A multimodal MI decoding neural network was proposed. In this network, to enhance the feature representation, the heterogeneous data of the two modalities were aligned in the spatial dimension through the proposed spatial alignment losses, and the multimodal features were aligned and fused in the temporal dimension by an attention-based modality fusion module.
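The fusion module itself is not specified in this summary; as a minimal sketch only, one plausible form of attention-based temporal fusion is scaled dot-product cross-attention, in which EEG feature frames act as queries over fNIRS key/value frames so that the two modalities are aligned in time before concatenation. The function name, feature dimensions, and sequence lengths below are illustrative assumptions, not the authors' architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(eeg_feat, fnirs_feat):
    """Hypothetical fusion step: EEG frames (T_eeg, d) attend to
    fNIRS frames (T_fnirs, d) via scaled dot-product attention,
    then the attended fNIRS context is concatenated per EEG frame."""
    d = eeg_feat.shape[-1]
    scores = eeg_feat @ fnirs_feat.T / np.sqrt(d)   # (T_eeg, T_fnirs)
    weights = softmax(scores, axis=-1)              # rows sum to 1
    attended = weights @ fnirs_feat                 # (T_eeg, d)
    return np.concatenate([eeg_feat, attended], axis=-1)  # (T_eeg, 2d)

# toy example: 8 EEG frames and 4 fNIRS frames of 16-dim features
rng = np.random.default_rng(0)
eeg = rng.standard_normal((8, 16))
fnirs = rng.standard_normal((4, 16))
fused = attention_fuse(eeg, fnirs)
print(fused.shape)  # (8, 32)
```

Because the attention weights form a proper distribution over fNIRS frames, each EEG frame receives a temporally aligned fNIRS context regardless of the two modalities' different sampling rates, which is the practical motivation for attention-based fusion of EEG and fNIRS.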
Main results and significance: The collected dataset was analyzed from temporal, spatial, and frequency perspectives; the results showed that the multimodal data contain visually separable motor imagery patterns. The proposed decoding method achieved the highest decoding accuracy among the compared methods on both the self-collected dataset and a public dataset, and ablation results show that each component of the method contributes to performance. Compared with single-modality decoding, the proposed method obtained 4.6% higher decoding accuracy on the self-collected dataset, indicating that it improves multimodal MI decoding. This study provides a new approach for capturing the rich information in multimodal MI data and enhancing multimodal MI-BCI decoding accuracy.