Monitoring classrooms using cameras is a non-invasive approach of digitizing students' behaviour. Understanding students' attention span and what type of behaviours may indicate a lack of attention is fundamental for understanding and consequently improving the dynamics of a lecture. Recent studies show useful information regarding classrooms and their students' behaviour throughout the lecture. In this paper we start by presenting an overview about the state of the art on this topic, presenting what we consider to be the most robust and efficient Computer Vision techniques for monitoring classrooms. After the analysis of relevant state of the art, we propose an agent that is theoretically capable of tracking the students' attention and output that data. The main goal of this paper is to contribute to the development of an autonomous agent able to provide information to both teachers and students and we present preliminary results on this topic. We believe this autonomous agent features the best solution for monitoring classrooms since it uses the most suited state of the art approaches for each individual role.