With the development of deep learning and the growing demand for intelligent security, human-computer interaction, shopping guidance and related applications, computer vision techniques for pedestrian identification have shown great application value. This paper studies a pedestrian identification method for surveillance video images based on multi-scale feature learning. First, the deep residual network ResNet and the densely connected convolutional network DenseNet are introduced as baseline networks. A model is then constructed from a hybrid hourglass network module, an enhanced weighted feature pyramid fusion network module and a post-processing module. The loss function is designed to be consistent with that of other traditional models, and its optimization objective comprises three parts: the prediction error of the center point, the prediction error of the offset and the prediction error of the bounding box size. The experimental results verify the effectiveness of the proposed model.
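The three-part objective described above can be illustrated with a minimal sketch. The abstract does not give the form of each term or their weights, so everything below is an assumption made for illustration: a penalty-reduced focal loss for the center-point heatmap, mean absolute error for the offset and size terms, and the hypothetical weights `lambda_off` and `lambda_size`. The function names are likewise illustrative and are not taken from the paper.

```python
import math


def center_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-12):
    """Penalty-reduced focal loss over a flattened heatmap.

    Assumed form for the center-point term; the paper may use another.
    pred and target are equal-length sequences of values in [0, 1].
    """
    total, num_pos = 0.0, 0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        if t == 1.0:
            # Positive location: an object center.
            total -= (1.0 - p) ** alpha * math.log(p)
            num_pos += 1
        else:
            # Negative location, down-weighted when close to a center.
            total -= (1.0 - t) ** beta * p ** alpha * math.log(1.0 - p)
    return total / max(num_pos, 1)


def l1_loss(pred, target):
    """Mean absolute error, used here for the offset and size terms."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / max(len(pred), 1)


def total_loss(heat_pred, heat_gt, off_pred, off_gt, size_pred, size_gt,
               lambda_off=1.0, lambda_size=0.1):
    """Weighted sum of the center, offset and size errors.

    lambda_off and lambda_size are hypothetical weights, not values
    reported in the paper.
    """
    return (center_focal_loss(heat_pred, heat_gt)
            + lambda_off * l1_loss(off_pred, off_gt)
            + lambda_size * l1_loss(size_pred, size_gt))
```

In this sketch a perfect heatmap contributes almost nothing, so the total is dominated by the weighted offset and size errors; in practice the weights would be tuned so that no single term overwhelms the others.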