The RGB complementary metal-oxide-semiconductor (CMOS) sensor works within the visible light spectrum. It is therefore very sensitive to environmental light conditions. In contrast, a long-wave infrared (LWIR) sensor, operating in the 8-14 µm spectral band, functions independently of visible light. In this paper, we exploit both visual and thermal perception units for robust object detection. After careful synchronization and (cross-)labeling of the FLIR [1] dataset, this multi-modal perception data passes through a convolutional neural network (CNN) to detect three critical objects on the road, namely pedestrians, bicycles, and cars. After evaluating the RGB and infrared sensors separately (thermal and infrared are often used interchangeably), we compare various network structures to fuse the data effectively at the feature level. Our RGB-thermal (RGBT) fusion network, which takes advantage of a novel entropy-block attention module (EBAM), outperforms the state-of-the-art network [2] by 10% with 82.9% mAP.
I. INTRODUCTION
A statistical projection of traffic fatalities in the United States for the first half of 2021 shows that an estimated 20,160 people died in motor vehicle traffic crashes. This represents an increase of about 18.4 percent over the 17,020 fatalities reported in the first half of 2020 [3]. Breaking down the fatal accidents of 2019 by time of day shows 1,000 more fatal accidents at night than during the day [4]. Given that average traffic is lower at night, the importance of visibility in the dark is evident.
The number of publications on RGB-IR sensor fusion for multi-spectral object detection in the automotive sector has increased within the past two years. However, the lack of data in this research area is still noticeable. There are two main sources of data, namely the FLIR thermal dataset [1] and the KAIST multi-spectral pedestrian detection benchmark [5], both of which provide paired IR and RGB images. FLIR mainly provides three classes (car, pedestrian, and bicycle), whereas KAIST contains only pedestrians.
Nevertheless, the FLIR dataset comes only with IR labels, which poses the first challenge to researchers. Previously published papers [2], [6], and [7] have raised various objections to the dataset. For instance, the usage of different
1 The authors are with the Elektronische Fahrwerksysteme GmbH,