Since the COVID-19 pandemic, the importance of smart online learning systems in education has been widely acknowledged. Measuring student engagement is a crucial step towards such systems: a smart online learning system can automatically adapt to learners' emotions and provide feedback on their motivation. Over the last few decades, online learning environments have generated tremendous interest among researchers in computer-based education. The challenge researchers face is how to measure student engagement based on learners' emotions. There has been increasing interest in computer vision and camera-based solutions as technologies that overcome the limitations of both human observation and the expensive equipment traditionally used to measure student engagement. Several solutions have been proposed to measure student engagement, but few are behavior-based. In response to these issues, we propose a new automatic multimodal approach for measuring student engagement levels in real time. To obtain robust and accurate engagement measures, we combine and analyze three modalities representing students' behaviors: emotions from facial expressions, keyboard keystrokes, and mouse movements. The resulting solution operates in real time, provides a fine-grained engagement level, and requires only inexpensive, commonly available equipment. We validate the proposed approach through three main experiments, covering single-, dual-, and multimodal configurations, on novel engagement datasets; to this end, we build new, realistic student engagement datasets. The multimodal approach achieves the highest accuracy (95.23%) and the lowest mean squared error (MSE) of 0.04.
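The abstract does not specify how the three modality signals are combined, so the sketch below illustrates one plausible scheme: a weighted late fusion of per-modality engagement scores. The `ModalityScores` class, the fusion weights, the thresholds, and the level names are all hypothetical placeholders, not the authors' method.

```python
# Illustrative late-fusion sketch (assumptions only): the paper combines
# facial-expression, keystroke, and mouse signals, but the fusion details
# are not given here, so the weights and thresholds below are assumed.

from dataclasses import dataclass


@dataclass
class ModalityScores:
    """Per-modality engagement scores in [0, 1], produced by upstream models."""
    face: float      # e.g., from a facial-expression classifier
    keyboard: float  # e.g., from keystroke-dynamics features
    mouse: float     # e.g., from mouse-movement features


# Hypothetical fusion weights; in practice these would be learned or tuned.
WEIGHTS = {"face": 0.5, "keyboard": 0.25, "mouse": 0.25}

# Assumed discretization into engagement levels.
LEVELS = ["disengaged", "low", "medium", "high"]


def fuse(scores: ModalityScores) -> float:
    """Weighted late fusion of the three modality scores into one score."""
    return (WEIGHTS["face"] * scores.face
            + WEIGHTS["keyboard"] * scores.keyboard
            + WEIGHTS["mouse"] * scores.mouse)


def engagement_level(fused: float) -> str:
    """Map a fused score in [0, 1] to a discrete engagement level."""
    thresholds = [0.25, 0.5, 0.75]  # assumed, evenly spaced cut points
    for level, t in zip(LEVELS, thresholds):
        if fused < t:
            return level
    return LEVELS[-1]


if __name__ == "__main__":
    sample = ModalityScores(face=0.9, keyboard=0.6, mouse=0.7)
    fused = fuse(sample)
    print(f"fused={fused:.2f} -> {engagement_level(fused)}")  # fused=0.78 -> high
```

In a real-time system of the kind the abstract describes, each upstream model would emit a score per time window and this fusion step would run on every window; a learned fusion layer could replace the fixed weights used here for illustration.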