Environment perception constitutes one of the most critical operations performed by semi- and fully-autonomous vehicles. In recent years, Deep Neural Networks (DNNs) have become the standard tool for perception solutions owing to their impressive capabilities in analyzing and modelling complex and dynamic scenes from (often multi-modal) sensory inputs. However, the well-established performance of DNNs comes at the cost of increased time and storage complexity, which may become problematic in automotive perception systems due to the requirement for a short prediction horizon (as in many cases inference must be performed in real time) and the limited computational, storage, and energy resources of mobile systems. A common way of addressing this problem is to transform the original large pretrained networks into new smaller models by utilizing Model Compression and Acceleration (MCA) techniques, improving both their storage and execution efficiency. Within the MCA framework, in this paper, we investigate the application of two state-of-the-art weight-sharing MCA techniques, namely a Vector Quantization (VQ) and a Dictionary Learning (DL) one, as well as two novel extensions, towards the acceleration and compression of widely used DNNs for 2D and 3D object detection in automotive applications. Apart from the individual (uni-modal) networks, we also present and evaluate a multi-modal late-fusion algorithm for combining the detection results of the 2D and 3D detectors. Our evaluation studies are carried out on the KITTI dataset. The obtained results lend themselves to a twofold interpretation. On the one hand, they showcase the significant acceleration and compression gains that can be achieved via the application of weight sharing on the selected DNN detectors, with limited accuracy loss, and highlight the performance differences between the two utilized weight-sharing approaches.
On the other hand, they demonstrate the substantial boost in detection performance obtained by combining the outcomes of the two individual uni-modal detectors using the proposed late-fusion-based multi-modal approach. Indeed, as our experiments have shown, pairing the high-performance DL-based MCA technique with the loss-mitigating effect of the multi-modal fusion approach leads to highly accelerated models (up to approximately 2.5× and 6× for the 2D and 3D detectors, respectively), with the performance loss of the fused results remaining in most cases within single-digit figures (as low as around 1% for the class "cars").

INDEX TERMS model compression and acceleration, multi-modal fusion, object detection, scene analysis, scene understanding, experimental evaluation
I. INTRODUCTION
Autonomous vehicles (AVs) are an integral part of the continuously evolving field of Intelligent Transportation Systems (ITS) [1], and introduce a variety of technical challenges intertwined with the levels of driving automation, as defined for example by the Society of Automotive Engineers (SAE) J3016 standard [2], ranging from "no driving automation" (level 0) to ...