In this paper, we explore various aspects of fusing LIDAR and color imagery for pedestrian detection in the context of convolutional neural networks (CNNs), which have recently become the state of the art for many vision problems. We incorporate LIDAR by up-sampling the point cloud to a dense depth map and then extracting three features that represent different aspects of the 3D scene. We then use those features as extra image channels. Specifically, we leverage recent work on HHA [9] (horizontal disparity, height above ground, and angle) representations, adapting the code to work on up-sampled LIDAR rather than Microsoft Kinect depth maps. We show, for the first time, that such a representation is applicable to up-sampled LIDAR data, despite its sparsity. Since CNNs learn a deep hierarchy of feature representations, we then explore the question: at what level of representation should we fuse this additional information with the original RGB image channels? We use the KITTI pedestrian detection dataset for our exploration. We first replicate the finding that region-CNNs (R-CNNs) [8] can outperform the original proposal mechanism using only RGB images, but only if fine-tuning is employed. Then, we show that: 1) using HHA features and RGB images performs better than RGB-only, even without any fine-tuning using large RGB web data; 2) fusing RGB and HHA achieves the strongest results if done late, but, under a parameter or computational budget, is best done at the early to middle layers of the hierarchical representation, which tend to represent mid-level features rather than low-level (e.g., edges) or high-level (e.g., object class decision) features; and 3) some of the less successful methods have the most parameters, indicating that increased classification accuracy is not simply a function of increased capacity in the neural network.
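The difference between fusing early and fusing late can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the 64x32 crop size, the 8-dimensional feature size, and the `toy_branch` stand-in for a CNN branch are all hypothetical, chosen only to show where the RGB and HHA streams are joined in each case.

```python
import numpy as np


def early_fusion(rgb, hha):
    """Stack RGB and HHA along the channel axis to form one 6-channel input."""
    if rgb.shape[:2] != hha.shape[:2]:
        raise ValueError("RGB and HHA must share height and width")
    return np.concatenate([rgb, hha], axis=-1)


def toy_branch(image, out_dim, seed):
    """Hypothetical stand-in for a CNN branch (assumption, not a real network):
    global average pooling followed by a fixed random linear map."""
    rng = np.random.default_rng(seed)
    pooled = image.mean(axis=(0, 1))  # shape: (channels,)
    weights = rng.standard_normal((pooled.shape[0], out_dim))
    return pooled @ weights  # shape: (out_dim,)


def late_fusion(rgb_features, hha_features):
    """Concatenate per-branch feature vectors just before a classifier."""
    return np.concatenate([rgb_features, hha_features], axis=-1)


# Hypothetical 64x32 pedestrian crops, one per modality.
rgb = np.zeros((64, 32, 3), dtype=np.float32)
hha = np.ones((64, 32, 3), dtype=np.float32)

# Early fusion: a single network would consume this 6-channel tensor.
fused_input = early_fusion(rgb, hha)

# Late fusion: each modality has its own branch; features are merged at the end.
fused_features = late_fusion(
    toy_branch(rgb, out_dim=8, seed=0),
    toy_branch(hha, out_dim=8, seed=1),
)
```

In this sketch, early fusion adds parameters only where the first layer sees six channels instead of three, whereas late fusion duplicates the whole branch; fusing at an intermediate layer would sit between the two.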