Computer vision has achieved strong results in a wide variety of classification, segmentation and object recognition tasks, but tends to struggle when tasks require more contextual assessment. Measuring student engagement is one such complex task, as it requires a strong interpretative component. This research describes a methodology to measure students’ engagement at both the individual (student) level and the collective (classroom) level. Results show that students’ individual behaviour, such as note-taking or hand-raising, is challenging to recognise and does not correlate with students’ self-reported engagement. Interestingly, students’ collective behaviour can be quantified in a more generic way using measures of students’ symmetry, reaction times and eye-gaze intersections. Nonetheless, the evidence for a connection between these collective measures and engagement is rather weak. Although this study does not succeed in providing a proxy for students’ self-reported engagement, our approach sheds light on the needs of future research. More concretely, we suggest that not only the behavioural but also the emotional and cognitive components of engagement should be captured.