In this paper, we consider the problem of automatically analyzing the emotional state of students during online classes based on video surveillance data, a problem of current relevance in e-learning. We propose a novel neural network model for recognizing students' emotions from video images of their faces and use it to construct an algorithm that classifies individual and group emotions of students from video clips. In the first step, the algorithm detects faces, extracts their features, and groups the faces belonging to each student. To increase accuracy, we propose matching students' names extracted by text recognition algorithms. In the second step, specially trained efficient neural networks extract the emotional features of each detected person, aggregate them using statistical functions, and perform the subsequent classification. In the final step, the fragments of the video lesson with the most pronounced student emotions can be visualized. Our experiments on several datasets from EmotiW (Emotion Recognition in the Wild) show that the accuracy of the developed algorithms is comparable to that of known analogues, while their computational performance in emotion classification is higher.
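The clip-level aggregation in the second step can be sketched as follows. This is a minimal illustration only: the particular statistical functions (mean, standard deviation, minimum, maximum), the linear classifier, and all names below are assumptions for exposition, not the exact configuration used in the paper.

```python
import numpy as np

def aggregate_features(frame_features: np.ndarray) -> np.ndarray:
    """Aggregate per-frame emotional feature vectors of shape (T, D)
    into one clip-level descriptor via statistical functions."""
    stats = [
        frame_features.mean(axis=0),  # component-wise mean over frames
        frame_features.std(axis=0),   # component-wise standard deviation
        frame_features.min(axis=0),   # component-wise minimum
        frame_features.max(axis=0),   # component-wise maximum
    ]
    return np.concatenate(stats)      # shape: (4 * D,)

def classify_emotion(descriptor: np.ndarray, weights: np.ndarray,
                     bias: np.ndarray, labels: list[str]) -> str:
    """Toy linear classifier over the aggregated descriptor
    (a stand-in for the final classification layer)."""
    scores = weights @ descriptor + bias
    return labels[int(np.argmax(scores))]
```

For example, per-frame features for one student's face track would be stacked into a `(T, D)` array, aggregated into a `(4 * D,)` descriptor, and passed to the classifier to obtain an emotion label for the clip.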