Using computer vision technology to change the traditional teaching model has become a research focus for many scholars. However, task-specific training data are lacking, and students in a classroom are densely distributed, varied in pose and appearance, and prone to occluding one another, so existing multi-object detection and tracking techniques perform poorly in teaching scenes. To address these problems, in the multi-object detection stage we take the head as the detection target, because it can represent students in many different poses and is rarely completely occluded; we build a small dataset of teaching scenes annotated with heads and use it to train the Faster R-CNN detector and improve its accuracy. In the multi-object tracking stage, a Kalman filter and the Hungarian algorithm perform the initial tracking, after which the unmatched trajectories are remapped onto the deep feature map, exploiting the strong feature-extraction ability of the Faster R-CNN backbone network. We then correct the unmatched trajectories by fusing each object's deep features with its historical trajectory positions to compute trajectory similarity, which reduces the identity switches and trajectory interruptions caused by changes in object pose and by occlusion. The proposed method is applied to real teaching scenes: it can help teachers track students' status, and the resulting student trajectory sequences can serve as data input for subsequent classroom behavior analysis.
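The re-association step described above, which fuses appearance and historical position to re-link unmatched trajectories, can be illustrated with a minimal sketch. It is not the paper's formula, which the abstract does not give. Several assumptions apply. The appearance vectors are taken as already pooled from the backbone feature map and are passed in as plain lists. "Historical position" is reduced to the track's last box centre, scored with a Gaussian kernel. The weighting `weight`, kernel width `sigma`, gate `threshold`, and the helper names (`fused_similarity`, `rematch`, and so on) are all hypothetical. The exhaustive search finds the same optimal one-to-one assignment that the Hungarian algorithm would, but it only scales to a handful of unmatched tracks. A real tracker would use a Hungarian solver such as `scipy.optimize.linear_sum_assignment` instead.

```python
import itertools
import math
from typing import Dict, List, Optional, Sequence, Tuple

Feature = Sequence[float]        # pooled appearance vector (assumed already extracted)
Center = Tuple[float, float]     # (x, y) box centre in pixels
Item = Tuple[Feature, Center]    # one unmatched trajectory or candidate detection


def cosine_similarity(a: Feature, b: Feature) -> float:
    """Cosine similarity of two appearance vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def position_similarity(p: Center, q: Center, sigma: float) -> float:
    """Gaussian kernel on the distance between a track's last centre and a detection."""
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))


def fused_similarity(track: Item, det: Item, weight: float, sigma: float) -> float:
    """Hypothetical weighted sum of appearance and position similarity."""
    return (weight * cosine_similarity(track[0], det[0])
            + (1.0 - weight) * position_similarity(track[1], det[1], sigma))


def rematch(tracks: List[Item], detections: List[Item], weight: float = 0.7,
            sigma: float = 50.0, threshold: float = 0.5) -> Dict[int, int]:
    """Return {track_index: detection_index} maximising total fused similarity.

    Exhaustive search over one-to-one assignments; pairs scoring below
    `threshold` are never linked, so those tracks stay unmatched.
    """
    sims = [[fused_similarity(t, d, weight, sigma) for d in detections] for t in tracks]
    options: List[Optional[int]] = list(range(len(detections))) + [None]
    best_score = 0.0
    best: Dict[int, int] = {}
    for choice in itertools.product(options, repeat=len(tracks)):
        pairs = [(t, d) for t, d in enumerate(choice) if d is not None]
        used = [d for _, d in pairs]
        if len(used) != len(set(used)):
            continue  # each detection may be claimed by at most one track
        if any(sims[t][d] < threshold for t, d in pairs):
            continue  # gate out weak pairs
        score = sum(sims[t][d] for t, d in pairs)
        if score > best_score:
            best_score, best = score, dict(pairs)
    return best


if __name__ == "__main__":
    # Two lost tracks and two new detections whose order is swapped.
    tracks = [([1.0, 0.0, 0.0], (100.0, 100.0)), ([0.0, 1.0, 0.0], (300.0, 100.0))]
    detections = [([0.1, 0.9, 0.0], (310.0, 105.0)), ([0.95, 0.05, 0.0], (105.0, 98.0))]
    print(rematch(tracks, detections))  # {0: 1, 1: 0}
```

Fusing the two cues is what lets a track survive a pose change: if the appearance score drops, a close position can still carry the pair over the gate, and vice versa. Requiring both to be weak before a pair is rejected is one plausible way to reduce the identity switches the abstract describes.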