IntroductionIn recent years, with advancements in wearable devices and biosignal analysis technologies, sports performance analysis has become an increasingly popular research field, particularly due to the growing demand for real-time monitoring of athletes' conditions in sports training and competitive events. Traditional methods of sports performance analysis typically rely on video data or sensor data for motion recognition. However, unimodal data often fails to fully capture the neural state of athletes, leading to limitations in accuracy and real-time performance when dealing with complex movement patterns. Moreover, these methods struggle with multimodal data fusion, making it difficult to fully leverage the deep information from electroencephalogram (EEG) signals.MethodsTo address these challenges, this paper proposes a "Cerebral Transformer" model based on EEG signals and video data. By employing an adaptive attention mechanism and cross-modal fusion, the model effectively combines EEG signals and video streams to achieve precise recognition and analysis of athletes' movements. The model's effectiveness was validated through experiments on four datasets: SEED, DEAP, eSports Sensors, and MODA. The results show that the proposed model outperforms existing mainstream methods in terms of accuracy, recall, and F1 score, while also demonstrating high computational efficiency.Results and discussionThe significance of this study lies in providing a more comprehensive and efficient solution for sports performance analysis. Through cross-modal data fusion, it not only improves the accuracy of complex movement recognition but also provides technical support for monitoring athletes' neural states, offering important applications in sports training and medical rehabilitation.