With advances in technology, pedestrian detection is now widely used in fields such as driving assistance and video surveillance. However, although research on single-modal pedestrian detection in visible images is mature, it still cannot meet the demand for pedestrian detection at all times of day. This paper therefore proposes a multi-spectral pedestrian detection method based on image fusion and convolutional neural networks. A total variation model based on local structure transfer retains the infrared intensity distribution and the visible appearance features, and pedestrians are then detected by applying the object detection network YOLOv3 to the multi-spectral fusion results. The detection performance of the proposed method is evaluated and compared with that of detection methods based on four other pixel-level fusion algorithms and two fusion network architectures. The results show that our method achieves superior detection performance and detects pedestrian targets robustly even under harsh illumination conditions and against cluttered backgrounds.