In online Civic and Political Education classroom teaching, students are the main participants in learning activities, and their classroom behaviour directly reflects the teaching effect. In this paper, we take teaching videos of an online Civic and Political Education course as the data source, annotate and preprocess the students’ classroom behaviour data, and augment the data through spatial coordinate transformation and grey-level interpolation. The traditional posture recognition backbone network is improved by combining a linear inverted residual structure with a lightweight attention model, and the OpenPose algorithm is used to extract the skeletal key point information of students’ classroom learning behaviours. The YOLOv5 network is then used as the backbone network: a feature pyramid is introduced to pass image semantic information from top to bottom, and a path aggregation network is combined with it to pass localisation information from bottom to top, achieving feature fusion across levels. Finally, a hybrid attention mechanism is incorporated to further enhance feature extraction and the recognition of students’ classroom behaviour. The average accuracy of the OpenPose algorithm in locating skeletal key points for different types of classroom behaviour is 98.03%, which is 13.84% higher than that of the AlphaPose algorithm. The average accuracy of the YOLOv5-MA model in recognising students’ classroom behaviour is 5.28 percentage points higher than that of the CNN-LSTM model. In the 60-120 second interval, students’ learning motivation increases, and their positive behaviour rate is between 75% and 85%. Online Civic and Political Education needs to focus on students’ positive classroom behaviour in order to better support the improvement of students’ ideological literacy.
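As a rough illustration of the augmentation step, the sketch below rotates a small grey-scale image by mapping each output pixel back to source coordinates (a spatial coordinate transformation) and estimating its intensity by bilinear grey-level interpolation. It is a minimal sketch under stated assumptions: the abstract does not say which transformations or which interpolation scheme are used, so the choice of rotation and bilinear interpolation, and the helper names `bilinear_sample` and `rotate`, are hypothetical.

```python
import math


def bilinear_sample(img, x, y, fill=0.0):
    """Grey-level interpolation: estimate the intensity at a real-valued (x, y).

    Assumption: bilinear interpolation; the paper may use another scheme.
    """
    h, w = len(img), len(img[0])
    eps = 1e-9  # tolerate tiny floating-point overshoot at the borders
    if x < -eps or y < -eps or x > w - 1 + eps or y > h - 1 + eps:
        return fill  # source location lies outside the image
    x = min(max(x, 0.0), w - 1)
    y = min(max(y, 0.0), h - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = (1 - dx) * img[y0][x0] + dx * img[y0][x1]
    bottom = (1 - dx) * img[y1][x0] + dx * img[y1][x1]
    return (1 - dy) * top + dy * bottom


def rotate(img, angle_deg, fill=0.0):
    """Spatial coordinate transformation: rotate about the image centre.

    Each output pixel is mapped back to source coordinates (inverse mapping),
    so every output location receives exactly one interpolated value.
    """
    h, w = len(img), len(img[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = []
    for yo in range(h):
        row = []
        for xo in range(w):
            xr, yr = xo - cx, yo - cy
            xs = cos_t * xr + sin_t * yr + cx
            ys = -sin_t * xr + cos_t * yr + cy
            row.append(bilinear_sample(img, xs, ys, fill))
        out.append(row)
    return out


frame = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # toy 3x3 grey-scale "frame"
augmented = rotate(frame, 15)              # a slightly rotated copy
```

In an augmentation pipeline of this kind, each annotated frame would typically be passed through several such transformations with small random parameters, with the bounding-box or key-point labels transformed by the same mapping.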