2016 IEEE International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra.2016.7487370

Fusing LIDAR and images for pedestrian detection using convolutional neural networks

Abstract: In this paper, we explore various aspects of fusing LIDAR and color imagery for pedestrian detection in the context of convolutional neural networks (CNNs), which have recently become state of the art for many vision problems. We incorporate LIDAR by up-sampling the point cloud to a dense depth map and then extracting three features representing different aspects of the 3D scene. We then use those features as extra image channels. Specifically, we leverage recent work on HHA [9] (horizontal disparity, height above…
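The pipeline the abstract describes starts by projecting the sparse LIDAR returns into the image plane and up-sampling them into a dense depth map. The sketch below illustrates that step; the function name, the calibration inputs (K, T), and the nearest-neighbour interpolation are illustrative assumptions, not the paper's exact up-sampling method.

```python
import numpy as np
from scipy.interpolate import griddata

def lidar_to_dense_depth(points, K, T, image_shape):
    """Project sparse LIDAR points into the image plane and up-sample
    them into a dense per-pixel depth map.

    points: (N, 3) LIDAR returns in the sensor frame.
    K: (3, 3) camera intrinsics; T: (4, 4) LIDAR-to-camera extrinsics.
    """
    # Move points into the camera frame; drop points behind the camera.
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    cam = (T @ pts_h.T).T[:, :3]
    cam = cam[cam[:, 2] > 0.5]

    # Pinhole projection to pixel coordinates (u, v).
    proj = (K @ cam.T).T
    uv = proj[:, :2] / proj[:, 2:3]

    # Keep only projections that land inside the image.
    h, w = image_shape
    inside = ((uv[:, 0] >= 0) & (uv[:, 0] < w) &
              (uv[:, 1] >= 0) & (uv[:, 1] < h))
    uv, depth = uv[inside], cam[inside, 2]

    # Densify: nearest-neighbour interpolation over the full pixel grid.
    grid_u, grid_v = np.meshgrid(np.arange(w), np.arange(h))
    dense = griddata(uv, depth, (grid_u, grid_v), method="nearest")
    return dense.astype(np.float32)
```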

Cited by 117 publications (69 citation statements)
References 12 publications
“…For example, in [141], the image and the depth map went through two separate CNNs, and only the feature vectors from the last layer were concatenated to jointly carry out the final detection task. In [142], the point cloud was first converted into a three-channel HHA map (Horizontal disparity, Height above ground, and Angle). The HHA and RGB (red-green-blue) images also went through two different CNNs, but the authors found that the fusion should be done at the early-to-middle layers of the CNN instead of the last layer.…”
Section: Fusion
Mentioning confidence: 99%
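The early-versus-late fusion contrast in the quote above is easy to make concrete. Below is a minimal PyTorch sketch of a two-stream RGB+HHA backbone whose fusion point is a constructor argument; the layer widths, the toy scoring head, and the fuse_at parameter are illustrative assumptions, not the architecture of [141] or [142].

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    # One conv stage: 3x3 conv, ReLU, 2x spatial down-sampling.
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.ReLU(inplace=True),
                         nn.MaxPool2d(2))

class TwoStreamFusion(nn.Module):
    """Two-stream RGB+HHA backbone with a configurable fusion depth.

    fuse_at = 0 concatenates the raw 3-channel inputs (early fusion);
    fuse_at = 3 concatenates only the deepest features (late fusion).
    """

    def __init__(self, fuse_at: int = 1):
        super().__init__()
        widths = [3, 32, 64, 128]      # input channels + three stages
        assert 0 <= fuse_at < len(widths)
        # Parallel streams before the fusion point.
        self.rgb = nn.ModuleList(block(widths[i], widths[i + 1])
                                 for i in range(fuse_at))
        self.hha = nn.ModuleList(block(widths[i], widths[i + 1])
                                 for i in range(fuse_at))
        # Shared trunk after fusion; its first stage sees doubled channels.
        trunk = []
        for i in range(fuse_at, len(widths) - 1):
            c_in = widths[i] * (2 if i == fuse_at else 1)
            trunk.append(block(c_in, widths[i + 1]))
        self.trunk = nn.Sequential(*trunk)
        head_in = widths[-1] * (2 if fuse_at == len(widths) - 1 else 1)
        self.head = nn.Conv2d(head_in, 1, 1)   # per-location pedestrian score

    def forward(self, rgb, hha):
        for r, h in zip(self.rgb, self.hha):
            rgb, hha = r(rgb), h(hha)
        x = self.trunk(torch.cat([rgb, hha], dim=1))
        return self.head(x)
```

Sweeping fuse_at from 0 to 3 reproduces the kind of comparison the quoted survey describes: fuse_at = 3 is the late, last-layer concatenation of [141], while smaller values correspond to the early-to-middle fusion that [142] found to work better.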
“…Many works [61], [91], [99]-[101], [106], [108], [109], [111], [117]-[120], [123] deal with the 2D object detection problem on the front-view 2D image plane. Compared to 2D detection, 3D detection is more challenging since the object's distance to the ego-vehicle needs to be estimated.…”
Section: 2D or 3D Detection
Mentioning confidence: 99%
“…Eitel et al. [13] proposed to carry out object recognition by fusing depth maps and color images with a CNN. In [14], LIDAR point clouds were transformed into their HHA (horizontal disparity, height above the ground, and angle) representation [15] and then combined with RGB images using a variety of CNN fusion strategies for performing pedestrian detection. More recently, Asvadi et al. [16] developed a system for vehicle detection that integrates LIDAR and color camera data within a deep learning framework.…”
Section: Related Work
Mentioning confidence: 99%
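For reference, here is one way to build the three HHA channels from a dense depth map, in the spirit of [15]. The flat-ground assumption, the placeholder calibration values, and the normal estimation from depth gradients are simplifications, not the exact encoding used in [14] or [15].

```python
import numpy as np

def depth_to_hha(depth, camera_height=1.7, focal=721.5, baseline=0.54):
    """Encode a dense metric depth map (H, W) as three HHA-style channels.

    camera_height, focal, and baseline are illustrative placeholder
    values, not calibration constants from the paper.
    """
    # Horizontal disparity: inversely proportional to depth.
    disparity = focal * baseline / np.clip(depth, 1e-3, None)

    # Height above ground: back-project each pixel's vertical ray and
    # subtract from the camera height (flat-ground assumption).
    rows, _ = depth.shape
    v = np.arange(rows).reshape(-1, 1) - rows / 2.0
    height = camera_height - depth * v / focal

    # Angle with gravity: estimate surface normals from depth gradients,
    # then measure each normal against the (approximate) vertical axis.
    dzdx = np.gradient(depth, axis=1)
    dzdy = np.gradient(depth, axis=0)
    normals = np.dstack([-dzdx, -dzdy, np.ones_like(depth)])
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    angle = np.degrees(np.arccos(np.clip(normals[..., 1], -1.0, 1.0)))

    # Rescale each channel to 0-255 so it can be stacked like an image.
    def scale(c):
        c = c - c.min()
        return (255 * c / (np.ptp(c) + 1e-6)).astype(np.uint8)

    return np.dstack([scale(disparity), scale(height), scale(angle)])
```

The resulting three-channel map can be fed to the HHA stream of a two-stream network such as the fusion sketch above, or concatenated with RGB as extra image channels, as the abstract describes.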