In recent years, pedestrian detection from a single 2D image has improved dramatically. In very crowded scenes, however, detection performance deteriorates severely and cannot meet the requirements of autonomous driving perception. Multi-view methods have significantly improved pedestrian detection in crowded or fuzzy scenes and are now widely used in autonomous driving. In this paper, we construct a double-branch feature fusion structure: the first branch adopts a lightweight structure, while the second branch extracts further features and collects the feature map produced by each layer. The receptive field is enlarged using dilated convolution. To improve the speed of the model, a keypoint is regressed for each pedestrian instead of the entire object, which removes the need for NMS post-processing. The whole model can be trained end to end. Even in the presence of many people, the method still achieves better accuracy and speed. On the standard Wildtrack and MultiviewX datasets, both accuracy and running speed surpass those of the state-of-the-art model, which is of practical significance for autonomous driving.
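
To illustrate why dilated convolution enlarges the receptive field without adding parameters, the following minimal sketch computes the one-dimensional receptive field of a stack of convolution layers. The function name `receptive_field` and the layer configurations (three 3x3 layers with dilation rates 1, 2, 4) are assumptions chosen for illustration only; they are not the configuration used in this paper.

```python
def receptive_field(layers):
    """Receptive field, in input pixels along one axis, of stacked conv layers.

    `layers` is a list of (kernel_size, stride, dilation) tuples.
    Hypothetical helper for illustration; not taken from the paper.
    """
    rf = 1    # one output position initially covers a single input pixel
    jump = 1  # spacing, in input pixels, between adjacent positions of the current layer's input
    for kernel_size, stride, dilation in layers:
        # A dilated kernel spans dilation * (kernel_size - 1) + 1 positions
        # while keeping the same number of weights.
        effective_kernel = dilation * (kernel_size - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


# Assumed example stacks: three 3x3 layers, stride 1.
plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]

print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 15
```

With the same number of weights, the dilated stack in this example covers 15 input pixels per axis instead of 7, which is the effect the abstract refers to.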