In this paper, we propose a multimodal pedestrian detection algorithm based on heuristic optimization and a deep learning model, designed to cope with the challenges of multi-scale targets, partial occlusion, and environmental interference. The algorithm combines the advantages of traditional feature extraction methods and deep neural networks to improve pedestrian detection accuracy. In the proposed heuristic and deep learning multimodal model, a cross-scale feature fusion module is first designed to improve detection performance on multi-scale and partially occluded objects; it draws on classical features such as appearance and pose as well as deep features in vector form. The module fuses global features from different residual layers with multi-scale local region features, which improves the multi-scale feature fusion capability of the model and expands the receptive field of the backbone network. Then, Euclidean distance (ED) and locality-constrained linear coding (LLC) are introduced to solve the target matching problem, decoupling the input features along the channel dimension so that they can be used for classification and localization. Finally, according to the feature matching error, an IoU filtering module (IFM) is applied to refine the target state and filter out invalid candidate targets so that the real regression parameters can be learned, thus simplifying the network structure and improving the generalization ability of the model. Experimental results show that, compared with other pedestrian detection algorithms, the proposed algorithm detects pedestrian targets in complex environments more accurately, and Rank-1 is improved by 9.2% compared with the single-stage algorithm.
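
To make the matching and filtering steps more concrete, the following is a minimal sketch, not the implementation used in this paper. It assumes boxes are given as (x1, y1, x2, y2), that candidates are first filtered by their IoU with the target box, and that the surviving candidate with the smallest Euclidean feature distance is taken as the match. The function names and the threshold of 0.5 are hypothetical, and the LLC coding step is omitted.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_and_filter(target_feat, target_box, candidates, iou_threshold=0.5):
    """Return the best (feature, box) candidate, or None if all are filtered.

    candidates: list of (feature_vector, box) pairs.
    iou_threshold: hypothetical cut-off, not a value taken from the paper.
    """
    # Filtering step: discard candidates that overlap the target too little.
    kept = [(f, b) for f, b in candidates if iou(target_box, b) >= iou_threshold]
    if not kept:
        return None
    # Matching step: smallest Euclidean distance in feature space wins.
    return min(kept, key=lambda c: euclidean_distance(target_feat, c[0]))
```

In this sketch, a candidate whose features are very close to the target but whose box lies far away is rejected before matching, which illustrates how an IoU filter can remove invalid candidates that feature distance alone would accept.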