Student disengagement has become a critical problem in modern classrooms owing to numerous distractions and limited student-teacher interaction. The problem is exacerbated in large offline classrooms, where it is challenging for teachers to monitor students' engagement and maintain the right level of interaction. Traditional approaches to monitoring engagement rely on self-reporting or physical sensing devices, both of which have limitations in offline classroom settings. Analysing students' academic affective states (e.g., moods and emotions) has potential for creating intelligent classrooms that can autonomously monitor and analyse students' engagement and behaviour in real time. A few computer vision based methods have been proposed in the recent literature, but they either work only in the e-learning domain or have limitations in real-time processing and scalability for large offline classes. This paper presents a real-time system for monitoring the group engagement of students by analysing their facial expressions and recognizing academic affective states pertinent to the learning environment: 'boredom,' 'confused,' 'focused,' 'frustrated,' 'yawning,' and 'sleepy.' The methodology comprises pre-processing steps such as face detection, a convolutional neural network (CNN) based facial expression recognition model, and post-processing steps such as frame-wise group engagement estimation. To train the CNN model, we created a dataset of the aforementioned facial expressions from classroom lecture videos and added related samples from three publicly available datasets, BAUM-1, DAiSEE, and YawDD, to improve the generalization of the model's predictions. The trained model achieved train and test accuracies of 78.70% and 76.90%, respectively. The proposed methodology gave promising results when compared with students' self-reported engagement levels.
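As an illustration of the post-processing step mentioned above, frame-wise group engagement can be estimated by mapping each recognized affective state to an engagement weight and averaging over the faces detected in a frame. The weight values and function below are illustrative assumptions, not the paper's exact scheme.

```python
# Illustrative sketch (assumed weights, not the authors' implementation):
# frame-wise group engagement from per-face affective-state predictions.
STATE_ENGAGEMENT = {
    "focused": 1.0,     # fully engaged
    "confused": 0.6,    # partially engaged (actively struggling)
    "frustrated": 0.4,
    "boredom": 0.2,
    "yawning": 0.1,
    "sleepy": 0.0,      # disengaged
}

def frame_engagement(predicted_states):
    """Average engagement over all faces detected in one frame.

    predicted_states: list of affective-state labels, one per detected face.
    Returns a score in [0, 1], or None if no faces were detected.
    """
    if not predicted_states:
        return None
    scores = [STATE_ENGAGEMENT[s] for s in predicted_states]
    return sum(scores) / len(scores)
```

Aggregating such per-frame scores over a sliding window would then give a time series of group engagement for the lecture.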