Understanding learners' emotions can help optimize instruction and support effective learning interventions. Most existing studies on student emotion recognition rely on multiple manifestations of external behavior and do not fully exploit physiological signals. In this context, on the one hand, a learning emotion EEG dataset (LE-EEG) is constructed, which captures physiological signals reflecting the emotions of boredom, neutrality, and engagement during learning; on the other hand, an EEG emotion classification network based on attention fusion (ECN-AF) is proposed. Specifically, based on the selection of key frequency bands and channels, multi-channel band features are first extracted using a multi-channel backbone network and then fused using attention units. To verify its performance, the proposed model is evaluated on the open-access SEED dataset (N = 15) and the self-collected LE-EEG dataset (N = 45). The experimental results under five-fold cross-validation show the following: (i) on the SEED dataset, the proposed model achieves the highest accuracy of 96.45%, an improvement of 1.37% over the baseline models; and (ii) on the LE-EEG dataset, it achieves the highest accuracy of 95.87%, an improvement of 21.49% over the baseline models.
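The abstract describes fusing per-band features with attention units but does not specify the mechanism. The following is a minimal PyTorch sketch of one plausible formulation, in which each frequency band's feature vector is scored by a small network and the softmax-normalized scores weight a sum over bands. All names, dimensions, and the scoring network are illustrative assumptions, not the authors' ECN-AF implementation.

```python
# Hypothetical sketch of attention-based fusion of per-band EEG features.
# Shapes, layer sizes, and the attention formulation are assumptions for
# illustration; they are not the ECN-AF architecture from the paper.
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    """Fuses one feature vector per frequency band with learned attention."""

    def __init__(self, feat_dim: int):
        super().__init__()
        # Scores each band's feature vector; softmax turns scores into weights.
        self.score = nn.Sequential(
            nn.Linear(feat_dim, feat_dim // 2),
            nn.Tanh(),
            nn.Linear(feat_dim // 2, 1),
        )

    def forward(self, band_feats: torch.Tensor) -> torch.Tensor:
        # band_feats: (batch, num_bands, feat_dim), one row per band,
        # e.g. produced by a per-band backbone over the selected channels.
        scores = self.score(band_feats)             # (batch, num_bands, 1)
        weights = torch.softmax(scores, dim=1)      # attention over bands
        fused = (weights * band_feats).sum(dim=1)   # (batch, feat_dim)
        return fused


# Toy usage: 8 samples, 5 bands (delta..gamma), 64-dim features per band.
feats = torch.randn(8, 5, 64)
fusion = AttentionFusion(feat_dim=64)
print(fusion(feats).shape)  # torch.Size([8, 64])
```

A classifier head (e.g., a linear layer over the fused vector) would then map the fused representation to the three emotion classes (boredom, neutrality, engagement).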