Accurately detecting student classroom behaviors in classroom videos is beneficial for analyzing students’ classroom performance and consequently enhancing teaching effectiveness. To address challenges such as object density, occlusion, and multi-scale scenarios in classroom video images, this paper introduces an improved YOLOv8 classroom detection model. Firstly, by combining modules from the Res2Net and YOLOv8 network models, a novel C2f_Res2block module is proposed. This module, along with MHSA and EMA, is integrated into the YOLOv8 model. Experimental results on a classroom detection dataset demonstrate that the improved model in this paper exhibits better detection performance compared to the original YOLOv8, with an average precision (mAP@0.5) increase of 4.2%.