With the widespread deployment of video surveillance systems, visual information has become a key research subject in modern security technology. Computer vision techniques can be applied to intelligent surveillance so that computers process the video automatically, which lets operators read the number of people in an area, or their spatial distribution, directly from surveillance footage. This paper first analyzes the two existing types of target detection algorithms and selects the Fast R-CNN algorithm as its research object. It then combines background modeling with a deep convolutional neural network based on statistical learning in order to fuse all the calibration boxes (bounding boxes) produced by pedestrian detection. Finally, a pedestrian count evaluation method is proposed, and the pedestrian count results are smoothed and fused.
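The two post-processing ideas named above, fusing detection boxes and smoothing the per-frame count, can be illustrated with a minimal sketch. This is not the paper's method, which is not specified here. It assumes boxes are `(x1, y1, x2, y2)` tuples, that boxes are fused by greedy intersection-over-union (IoU) grouping and averaging, and that counts are smoothed with a sliding median. The function names, the 0.5 threshold, and the window size are hypothetical choices made for illustration.

```python
from statistics import median


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_box(group):
    """Coordinate-wise mean of a non-empty list of boxes."""
    n = len(group)
    return tuple(sum(box[i] for box in group) / n for i in range(4))


def fuse_boxes(boxes, iou_threshold=0.5):
    """Assumed fusion rule: greedily add each box to the first group whose
    current average box it overlaps, then return one averaged box per group."""
    groups = []
    for box in boxes:
        for group in groups:
            if iou(box, average_box(group)) >= iou_threshold:
                group.append(box)
                break
        else:
            groups.append([box])  # no sufficiently overlapping group found
    return [average_box(group) for group in groups]


def smooth_counts(counts, window=3):
    """Assumed smoothing rule: centered sliding median over per-frame counts.
    The window is truncated at the start and end of the sequence."""
    half = window // 2
    return [
        median(counts[max(0, i - half): i + half + 1])
        for i in range(len(counts))
    ]


# Hypothetical detections: two heavily overlapping boxes and one distant box.
detections = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
fused = fuse_boxes(detections)
pedestrian_count = len(fused)

# Hypothetical per-frame counts with a one-frame spike.
smoothed = smooth_counts([3, 3, 9, 3, 3])
```

In this example the two overlapping boxes collapse into one, so the frame count is 2, and the sliding median removes the isolated spike in the count sequence. A median is used rather than a mean so that a single-frame miscount does not shift neighboring frames.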