Multispectral Object Detection for Autonomous Vehicles

Karasawa, Takumi; Watanabe, Kohei; Ha, Qishen; Tejero-de-Pablos, Antonio; Ushiku, Yoshitaka; Harada, Tatsuya

doi:10.1145/3126686.3126727

Cited by 124 publications

(83 citation statements)

References 19 publications

“…Many works [61], [91], [99]-[101], [106], [108], [109], [111], [117]-[120], [123] deal with the 2D object detection problem on the front-view 2D image plane. Compared to 2D detection, 3D detection is more challenging since the object's distance to the ego-vehicle needs to be estimated.…”

Section: 2D or 3D Detection (mentioning)

confidence: 99%

Deep Multi-Modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges

Feng

Haase-Schütz

Rosenbaum

et al. 2021

IEEE Trans. Intell. Transport. Syst.

885

338

Recent advancements in perception for autonomous driving are driven by deep learning. In order to achieve robust and accurate scene understanding, autonomous vehicles are usually equipped with different sensors (e.g. cameras, LiDARs, Radars), and multiple sensing modalities can be fused to exploit their complementary properties. In this context, many methods have been proposed for deep multi-modal perception problems. However, there is no general guideline for network architecture design, and questions of "what to fuse", "when to fuse", and "how to fuse" remain open. This review paper attempts to systematically summarize methodologies and discuss challenges for deep multi-modal object detection and semantic segmentation in autonomous driving. To this end, we first provide an overview of on-board sensors on test vehicles, open datasets, and background information for object detection and semantic segmentation in autonomous driving research. We then summarize the fusion methodologies and discuss challenges and open questions. In the appendix, we provide tables that summarize topics and methods. We also provide an interactive online platform to navigate each reference: https://boschresearch.github.io/multimodalperception/.

[Figure labels: Vehicle, Person, Road sign, Traffic light; LiDAR Points Map, Radar Points, RGB Image]

“…Considering non-optimal weather conditions, Pfeuffer and Dietmayer [56] investigated a robust fusion approach for foggy scene segmentation. Besides the image segmentation task mentioned above, there are many other scene understanding tasks that benefit from multimodal fusion, such as object detection [13,87,88], human detection [89,14,90,91], salient object detection [92,93], trip hazard detection [94] and object tracking [69]. Especially for autonomous systems, LiDAR is always employed to provide highly accurate three-dimensional point cloud information [95,96].…”

Section: Applications For Scene Understanding (mentioning)

confidence: 99%

Deep multimodal fusion for semantic image segmentation: A survey

Zhang

Sidibé

Morel

et al. 2021

Image and Vision Computing

144

Recent advances in deep learning have shown excellent performance in various scene understanding tasks. However, in some complex environments or under challenging conditions, it is necessary to employ multiple modalities that provide complementary information on the same scene. A variety of studies have demonstrated that deep multimodal fusion for semantic image segmentation achieves significant performance improvement. These fusion approaches take the benefits of multiple information sources and generate an optimal joint prediction automatically. This paper describes the essential background concepts of deep multimodal fusion and the relevant applications in computer vision. In particular, we provide a systematic survey of multimodal fusion methodologies, multimodal segmentation datasets, and quantitative evaluations on the benchmark datasets. Existing fusion methods are summarized according to a common taxonomy: early fusion, late fusion, and hybrid fusion. Based on their performance, we analyze the strengths and weaknesses of different fusion strategies. Current challenges and design choices are discussed, aiming to provide the reader with a comprehensive and heuristic view of deep multimodal image segmentation.

“…Multispectral [23]: The UTokyo dataset contains a total of 7,512 images (3,740 during the day and 3,772 during the night), which are taken in a university environment at 1 fps using visible-band (RGB colour), Far Infrared (FIR), Mid Infrared (MIR), and Near Infrared (NIR) cameras (as specified within [23]). In this work, we utilise only the Far Infrared (FIR) images, taken by a Nippon Avionics InfReC R500, as a dataset for object detection (denoted as MultispectralFIR).…”

Section: Datasets (mentioning)

confidence: 99%

“…[22] Introducing deep learning to object detection within infrared-band (thermal) imagery is significantly hindered by the absence of such annotated datasets of the same scale and variety. Comparatively, the available datasets for infrared-band (thermal) imagery [23,24] are relatively small. In infrared-band (thermal) imagery the lack of such datasets, which is attributable to the lesser prevalence of this sensing modality in general, artificially restricts an equivalent level of CNN success for this spectral band.…”

Section: Introduction (mentioning)

confidence: 99%