PurposeAttention is one of the most important factors to affect the academic performance of students. Effectively analyzing students' attention in class can promote teachers' precise teaching and students' personalized learning. To intelligently analyze the students' attention in classroom from the first-person perspective, this paper proposes a fusion model based on gaze tracking and object detection. In particular, the proposed attention analysis model does not depend on any smart equipment.Design/methodology/approachGiven a first-person view video of students' learning, the authors first estimate the gazing point by using the deep space–time neural network. Second, single shot multi-box detector and fast segmentation convolutional neural network are comparatively adopted to accurately detect the objects in the video. Third, they predict the gazing objects by combining the results of gazing point estimation and object detection. Finally, the personalized attention of students is analyzed based on the predicted gazing objects and the measurable eye movement criteria.FindingsA large number of experiments are carried out on a public database and a new dataset that is built in a real classroom. The experimental results show that the proposed model not only can accurately track the students' gazing trajectory and effectively analyze the fluctuation of attention of the individual student and all students but also provide a valuable reference to evaluate the process of learning of students.Originality/valueThe contributions of this paper can be summarized as follows. The analysis of students' attention plays an important role in improving teaching quality and student achievement. However, there is little research on how to automatically and intelligently analyze students' attention. To alleviate this problem, this paper focuses on analyzing students' attention by gaze tracking and object detection in classroom teaching, which is significant for practical application in the field of education. The authors proposed an effectively intelligent fusion model based on the deep neural network, which mainly includes the gazing point module and the object detection module, to analyze students' attention in classroom teaching instead of relying on any smart wearable device. They introduce the attention mechanism into the gazing point module to improve the performance of gazing point detection and perform some comparison experiments on the public dataset to prove that the gazing point module can achieve better performance. They associate the eye movement criteria with visual gaze to get quantifiable objective data for students' attention analysis, which can provide a valuable basis to evaluate the learning process of students, provide useful learning information of students for both parents and teachers and support the development of individualized teaching. They built a new database that contains the first-person view videos of 11 subjects in a real classroom and employ it to evaluate the effectiveness and feasibility of the proposed model.