Facial expressions of emotion are among the most natural and powerful means of human communication. Because of the COVID-19 pandemic, educational institutions worldwide were forced to switch rapidly to remote and online learning, and students had to adapt to a variety of readily accessible learning methods, such as mobile learning applications and e-learning systems. This systematic literature review (SLR) extracts and synthesizes information on facial expression recognition (FER) systems for online learning: the emotion classifiers used, the datasets used, the preprocessing techniques applied, the feature extraction approaches used, and the strengths and limitations of previous studies. Based on the search criteria, 701 publications were initially retrieved from five digital databases, of which 48 were selected as primary studies for further analysis. The findings show that deep learning is the most frequently adopted approach for classifying student emotions during online learning. FER-2013 is the most commonly used general FER dataset, while DAiSEE is the most commonly used academic emotion dataset. Among conventional machine learning classifiers, the support vector machine (SVM) is the most widely used in FER systems, while the convolutional neural network (CNN) is the most frequently used deep learning classifier. The review also found fewer real-time FER systems than non-real-time ones. Finally, the highest top-1 accuracy, 94.6%, was achieved by a long-term recurrent convolutional network on an academic emotion dataset; the limitation reported for this result is low illumination and a lack of frontal poses.