2023
DOI: 10.3390/s23208385

Student Behavior Detection in the Classroom Based on Improved YOLOv8

Haiwei Chen,
Guohui Zhou,
Huixin Jiang

Abstract: Accurately detecting student classroom behaviors in classroom videos is beneficial for analyzing students’ classroom performance and consequently enhancing teaching effectiveness. To address challenges such as object density, occlusion, and multi-scale scenarios in classroom video images, this paper introduces an improved YOLOv8 classroom detection model. Firstly, by combining modules from the Res2Net and YOLOv8 network models, a novel C2f_Res2block module is proposed. This module, along with MHSA and EMA, is …

Cited by 22 publications (6 citation statements)
References 29 publications
“…Leveraging the simple and lightweight CSPDarkNet-53 network as the basis for the backbone network, YOLOv8 enables the achievement of rapid real-time target detection, particularly suited for scenarios necessitating efficient processing of numerous images. The backbone network comprises five ConvModule modules and four CSPLayer_2Conv modules, each housing multiple CNN layers that excel at capturing local information [33]. However, unlike CNN, the Transformer is not constrained by local interactions, due to its self-attention mechanism allowing parallel computation.…”
Section: Proposed Methodology (mentioning)
confidence: 99%
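The contrast drawn in the statement above, between CNN layers that capture local information and self-attention that is not constrained by local interactions, can be illustrated with a minimal sketch. This is not code from the cited paper or from YOLOv8: the helper names `self_attention` and `local_conv1d` are hypothetical, the attention uses identity query/key/value projections as a simplifying assumption, and the convolution is a plain width-3 averaging filter over a 1D token sequence.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity Q/K/V projections, so each token serves as its
    own query, key, and value. Every output mixes ALL input tokens.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def local_conv1d(tokens, kernel=(1 / 3, 1 / 3, 1 / 3)):
    """Width-3 averaging convolution with zero padding.

    Each output position only sees its immediate neighbours.
    """
    n, d = len(tokens), len(tokens[0])
    half = len(kernel) // 2
    out = []
    for i in range(n):
        acc = [0.0] * d
        for offset, w in enumerate(kernel):
            j = i + offset - half
            if 0 <= j < n:
                for c in range(d):
                    acc[c] += w * tokens[j][c]
        out.append(acc)
    return out


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [2.0, 1.0]]
    changed = [row[:] for row in base]
    changed[-1] = [5.0, 5.0]  # perturb only the most distant token

    # The convolution output at position 0 is unaffected by the far token...
    print(local_conv1d(base)[0] == local_conv1d(changed)[0])
    # ...while the attention output at position 0 changes.
    print(self_attention(base)[0] != self_attention(changed)[0])
```

Perturbing only the last token leaves the first convolution output unchanged but alters the first attention output, which is the sense in which self-attention is not limited to local interactions. A stack of convolutions would widen the receptive field gradually, whereas a single attention layer already spans the whole sequence.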
“…Through prior work, we can accurately identify and frame targets within the original image, leading to the critical next step for computing target coordinate information [71]. This study initially collected target image data in various scenarios and periods [67]. Subsequently, the EasyData data-processing platform was utilized to categorize and label the original images, ultimately yielding 12,004 labeled images.…”
Section: Target Coordinate Information Solution (mentioning)
confidence: 99%
“…At the same time, this part is carried out from the detector to further predict the category and position of the object. Common ways to generate high-resolution features are Perceptual GANs and BFFBB GANs (Better to Follow, Follow to Be Better Generative Adversarial Networks) [20,21]. Its core ideological history takes the input image shrunk by two times and the features extracted through feature extraction as the low-resolution feature and the features extracted from the original image as the corresponding real high-resolution feature; the generator based on a low false high-resolution feature is generated by the resolution feature.…”
Section: Related Work (mentioning)
confidence: 99%