Convolutional neural networks (CNNs) have demonstrated remarkable performance in EEG signal detection, but they remain limited in global perception. In addition, because EEG signals differ across individuals, epilepsy detection models generalize poorly across patients. To address these issues, this paper presents a cross-patient epilepsy detection method based on a multi-head self-attention mechanism. The method first applies the Short-Time Fourier Transform (STFT) to convert the raw EEG signals into time-frequency features, then models local information with a CNN, next captures global dependencies among features with the multi-head self-attention mechanism of the Transformer, and finally performs epilepsy detection on the resulting features. The model also employs a lightweight multi-head attention module with an alternating structure, which extracts multi-scale features comprehensively while significantly reducing computational cost. Experimental results on the CHB-MIT dataset show that the proposed model achieves an accuracy, sensitivity, specificity, F1 score, and AUC of 92.89%, 96.17%, 92.99%, 94.41%, and 96.77%, respectively. Compared with existing methods, the proposed method achieves better performance and better generalization.
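The processing chain described above (STFT, local feature modeling, multi-head self-attention, detection) can be illustrated with a minimal NumPy sketch. It is not the proposed model: the CNN stage is replaced by a fixed random linear projection, the attention is the standard scaled dot-product form rather than the lightweight alternating module, all weights are untrained, and the input is synthetic noise. The sampling rate, window length, hop size, `d_model`, `n_heads`, and every function name are assumptions chosen for illustration only.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len // 2 + 1)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """Standard scaled dot-product self-attention over a (tokens, d_model) sequence."""
    n_tokens, d_model = x.shape
    d_head = d_model // n_heads

    def split(t):  # (tokens, d_model) -> (heads, tokens, d_head)
        return t.reshape(n_tokens, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, tokens, tokens)
    attn = softmax(scores, axis=-1)
    merged = (attn @ v).transpose(1, 0, 2).reshape(n_tokens, d_model)
    return merged @ w_o, attn

rng = np.random.default_rng(0)
fs = 256                                   # assumed sampling rate in Hz
signal = rng.standard_normal(4 * fs)       # synthetic 4-second single-channel segment
spec = stft_magnitude(signal)              # time-frequency features

d_model, n_heads = 32, 4                   # hypothetical sizes
# Stand-in for the CNN stage: a random linear projection of each time frame.
w_in = rng.standard_normal((spec.shape[1], d_model)) / np.sqrt(spec.shape[1])
tokens = np.log1p(spec) @ w_in             # (n_frames, d_model)

w_q, w_k, w_v, w_o = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(4))
features, attn = multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, n_heads)

# Untrained detection head: mean-pool over time, then a logistic output.
w_cls = rng.standard_normal(d_model) / np.sqrt(d_model)
seizure_prob = 1.0 / (1.0 + np.exp(-(features.mean(axis=0) @ w_cls)))
print(spec.shape, features.shape, attn.shape)
```

Each row of `attn` weights every time frame against all others, which is how the attention stage relates distant frames that a convolution with a small kernel would not connect directly. Because the weights are random, `seizure_prob` carries no diagnostic meaning; the sketch only shows how the tensor shapes flow between the stages.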