Human behavior datasets are characterized by complex backgrounds, diverse poses, partial occlusion, and widely varying target sizes. First, this paper applies the YOLO v3 and YOLO v4 algorithms to detect human objects in videos, and qualitatively analyzes and compares the detection performance of the two algorithms on the UTI, UCF101, HMDB51 and CASIA datasets. Because the vanilla YOLO v4 detects humans incompletely in specific video frames, this paper then proposes an improved YOLO v4 algorithm. Specifically, the improved YOLO v4 introduces the Ghost module into the CBM module to further reduce the number of parameters, and adds a lateral connection to the CSP module to improve the feature representation capability of the network. Furthermore, MaxPool is replaced with SoftPool in the original SPP module, which not only avoids feature loss but also provides a regularization effect, thus improving the generalization ability of the network. Finally, this paper qualitatively compares the detection results of the improved YOLO v4 and the original YOLO v4 on specific datasets. The experimental results show that the improved YOLO v4 effectively handles complex targets in human detection tasks and further improves detection speed.
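To illustrate the difference between MaxPool and SoftPool, the following is a minimal sketch in plain Python rather than the implementation used in this paper. It assumes the standard SoftPool definition, in which each output is the softmax-weighted sum of the activations in its window, and it operates on a single 2-D feature map without padding. The function names `soft_pool2d` and `max_pool2d` and the kernel and stride values are hypothetical; the SPP configuration of the improved network is not specified here.

```python
import math
from typing import List

Matrix = List[List[float]]


def _windows(x: Matrix, kernel: int, stride: int):
    """Yield (row_index, window_values) for every valid kernel position."""
    h, w = len(x), len(x[0])
    for r_idx, r in enumerate(range(0, h - kernel + 1, stride)):
        for c in range(0, w - kernel + 1, stride):
            yield r_idx, [x[r + i][c + j]
                          for i in range(kernel) for j in range(kernel)]


def max_pool2d(x: Matrix, kernel: int, stride: int = 1) -> Matrix:
    """Plain max pooling: keeps only the largest activation per window."""
    out: Matrix = []
    for r_idx, window in _windows(x, kernel, stride):
        if r_idx == len(out):
            out.append([])
        out[r_idx].append(max(window))
    return out


def soft_pool2d(x: Matrix, kernel: int, stride: int = 1) -> Matrix:
    """SoftPool (assumed definition): sum_i(exp(a_i) * a_i) / sum_i(exp(a_i)).

    Every activation in the window contributes, weighted by its softmax score.
    """
    out: Matrix = []
    for r_idx, window in _windows(x, kernel, stride):
        if r_idx == len(out):
            out.append([])
        m = max(window)  # subtract the max for numerical stability
        weights = [math.exp(a - m) for a in window]
        pooled = sum(wt * a for wt, a in zip(weights, window)) / sum(weights)
        out[r_idx].append(pooled)
    return out


# Hypothetical 4x4 feature map, pooled with a 2x2 kernel and stride 2.
feature_map = [
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 0.0],
    [5.0, 5.0, 1.0, 3.0],
    [5.0, 5.0, 2.0, 8.0],
]
soft = soft_pool2d(feature_map, kernel=2, stride=2)
hard = max_pool2d(feature_map, kernel=2, stride=2)
```

In this sketch each SoftPool output lies between the mean and the maximum of its window, so smaller activations still influence the result instead of being discarded as in MaxPool.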