Due to the COVID-19 crisis, the education sector has been shifted to a virtual environment. Monitoring the engagement level and providing regular feedback during e-classes is one of the major concerns, as this facility lacks in the e-learning environment due to no physical observation of the teacher. According to present study, an engagement detection system to ensure that the students get immediate feedback during e-Learning. Our proposed engagement system analyses the student's behaviour throughout the e-Learning session. The proposed novel approach evaluates three modalities based on the student's behaviour, such as facial expression, eye blink count, and head movement, from the live video streams to predict student engagement in e-learning. The proposed system is implemented based on deep-learning approaches such as VGG-19 and ResNet-50 for facial emotion recognition and the facial landmark approach for eye-blinking and head movement detection. The results from different modalities (for which the algorithms are proposed) are combined to determine the EI (engagement index). Based on EI value, an engaged or disengaged state is predicted. The present study suggests that the proposed facial cues-based multimodal system accurately determines student engagement in real time. The experimental research achieved an accuracy of 92.58% and showed that the proposed engagement detection approach significantly outperforms the existing approaches.