Reliable human gesture recognition is essential for effective collision avoidance in autonomous vehicles. Compared with visible-spectrum cameras, infrared imaging enables more robust human gesture recognition in complex environments. However, gesture recognition in infrared images has not been extensively investigated. In this work, we propose a model for detecting human gestures based on an improved YOLO-V3 network that takes a saliency map as a second input channel to enhance feature reuse and improve network performance. Three DenseNet blocks are added before the residual components of the YOLO-V3 network to enhance the propagation of convolutional features. The saliency maps are obtained by multiscale superpixel segmentation, superpixel block clustering, and cellular automata saliency detection. The saliency maps obtained at five scales are then fused using a Bayesian-based fusion algorithm to generate the final saliency image. We collect infrared images covering four main gesture classes, each of which contains several morphologically similar gestures. The training and testing datasets comprise original and augmented infrared images with a resolution of 640 × 480. The experimental results show that the proposed approach enables real-time human gesture detection for autonomous vehicles, with an average detection accuracy of 86.2%.

INDEX TERMS Human gesture recognition, autonomous vehicles, deep learning approach, infrared images, saliency maps.
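
To make the data flow from the multiscale saliency maps to the two-channel network input concrete, the following minimal sketch fuses several saliency maps pixel by pixel and pairs the result with the infrared image. It is an illustration under stated assumptions, not the fusion algorithm used in this work: each map is treated as an independent estimate of the probability that a pixel is salient, with a uniform prior, and the function names `fuse_saliency_maps` and `stack_as_second_channel` as well as the toy values are hypothetical.

```python
from math import prod


def fuse_saliency_maps(maps, eps=1e-6):
    """Fuse same-sized saliency maps (values in [0, 1]) pixel by pixel.

    Assumption: every map is an independent estimate of the probability
    that a pixel is salient, and the prior is uniform. This naive-Bayes
    style rule is a stand-in, not the fusion rule of the paper.
    """
    height, width = len(maps[0]), len(maps[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Clip so that a single 0 or 1 cannot override all other maps.
            values = [min(max(m[y][x], eps), 1.0 - eps) for m in maps]
            salient = prod(values)
            background = prod(1.0 - v for v in values)
            # Posterior probability of "salient" under the assumptions above.
            fused[y][x] = salient / (salient + background)
    return fused


def stack_as_second_channel(infrared, saliency):
    """Pair each infrared pixel with its saliency value (H x W x 2 layout)."""
    return [
        [(ir, s) for ir, s in zip(ir_row, s_row)]
        for ir_row, s_row in zip(infrared, saliency)
    ]


# Toy 1 x 2 image with five hypothetical per-scale saliency maps.
scale_maps = [
    [[0.9, 0.2]],
    [[0.8, 0.1]],
    [[0.7, 0.3]],
    [[0.9, 0.2]],
    [[0.6, 0.4]],
]
fused = fuse_saliency_maps(scale_maps)
network_input = stack_as_second_channel([[0.5, 0.5]], fused)
```

In this toy case, the first pixel is rated salient by all five maps and the second is not, so the fused map pushes them toward 1 and 0 respectively. In practice, the per-scale maps would come from the superpixel and cellular automata stages, and the two-channel array would be converted to the tensor layout expected by the detection network.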