In this paper we address the problem of recognizing student engagement in prosocial games by exploiting engagement cues from different input modalities. Since engagement is a multifaceted phenomenon with different dimensions, i.e., behavioral, cognitive and affective, we propose the modeling of student engagement using real-time data from both the students and the game. More specifically, we apply body motion and facial expression analysis to identify the affective state of students, while we extract features related to their cognitive and behavioral engagement based on the analysis of their interaction with the game. For the automatic recognition of engagement, we adopt a machine learning approach based on artificial neural networks, while for the annotation of the engagement data, we introduce a novel approach based on the use of games with different degrees of challenge in conjunction with a retrospective self-reporting method. To evaluate the proposed methodology, we conducted real-life experiments in four classes, in three primary schools, with 72 students and 144 gameplay recordings in total. Experimental results show the great potential of the proposed methodology, which improves the classification accuracy of the three distinct dimensions with a detection rate of 85%. A detailed analysis of the role of each component of the Game Engagement Questionnaire (GEQ), i.e., immersion, presence, flow and absorption, in the classification process is also presented in this paper.