Model compression through quantization is commonly applied to convolutional neural networks (CNNs) deployed on compute- and memory-constrained embedded platforms. Different layers of a CNN can have different numerical precisions for both weights and activations, resulting in a large search space. Together with the hardware (HW) design space, finding the globally optimal HW-CNN combination for a given application becomes daunting. To this end, we propose HW-FlowQ, a systematic approach that enables the co-design of the target hardware platform and the compressed CNN model through quantization. The search space is viewed at three levels of abstraction, allowing the solution space to be narrowed down iteratively before reaching a high-fidelity CNN hardware modeling tool, capable of capturing the effects of mixed-precision quantization strategies on different hardware architectures (processing unit counts, memory levels, cost models, dataflows) and two types of computation engines (bit-parallel vectorized, bit-serial). To combine both worlds, a multi-objective non-dominated sorting genetic algorithm (NSGA-II) is leveraged to establish a Pareto-optimal set of quantization strategies for the target HW metrics at each abstraction level. HW-FlowQ detects optima in a discrete search space, maximizing the task-related accuracy of the underlying CNN while minimizing hardware-related costs. The Pareto-front approach keeps the design space open to a range of non-dominated solutions before the design is refined at a more detailed level of abstraction. At equivalent prediction accuracy, we improve energy and latency by 20% and 45%, respectively, for ResNet56 compared to existing mixed-precision search methods.
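The core of such a search can be illustrated with a minimal NSGA-II-style sketch over per-layer bitwidth strategies. The objective functions `hw_cost` and `accuracy_proxy` below are hypothetical placeholders, not the paper's hardware models or accuracy evaluation; the sketch only shows how non-dominated sorting keeps a Pareto front of candidate quantization strategies open across generations.

```python
# Minimal sketch of an NSGA-II-style bitwidth search (illustrative only).
# hw_cost / accuracy_proxy are placeholder objectives, both to be minimized.
import random

LAYERS = 8                # number of CNN layers to quantize (assumed)
BITS = [2, 4, 8]          # allowed per-layer bitwidths (assumed)

def hw_cost(strategy):
    # Placeholder: hardware cost grows with bitwidth.
    return sum(b * b for b in strategy)

def accuracy_proxy(strategy):
    # Placeholder: accuracy improves with bitwidth; negated so lower is better.
    return -sum(strategy)

def dominates(a, b):
    # a dominates b if it is no worse in every objective and better in one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(pop):
    objs = [(hw_cost(s), accuracy_proxy(s)) for s in pop]
    return [pop[i] for i, oi in enumerate(objs)
            if not any(dominates(objs[j], oi)
                       for j in range(len(objs)) if j != i)]

def mutate(s, rate=0.2):
    return [random.choice(BITS) if random.random() < rate else b for b in s]

def crossover(a, b):
    cut = random.randrange(1, LAYERS)
    return a[:cut] + b[cut:]

pop = [[random.choice(BITS) for _ in range(LAYERS)] for _ in range(30)]
for _ in range(50):
    front = pareto_front(pop)
    kids = [mutate(crossover(random.choice(front), random.choice(front)))
            for _ in range(len(pop) - len(front))]
    pop = front + kids    # elitist survival of the current non-dominated set

print("Non-dominated bitwidth strategies:", pareto_front(pop)[:5])
```

In the actual method, the placeholder objectives would be replaced by task accuracy and by HW metrics (energy, latency) from the modeling tool at the current abstraction level.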
Driveable area detection is a key component of various applications in the field of autonomous driving (AD), such as ground-plane detection, obstacle detection, and maneuver planning. Additionally, bulky and over-parameterized networks can be forgone and replaced with smaller networks for faster inference on embedded systems. Posed as a two-class segmentation task, driveable area detection can be efficiently modeled with slim binary networks. This paper proposes a novel binarized driveable area detection network (binary DAD-Net), which uses only binary weights and activations in the encoder, the bottleneck, and the decoder. The latent space of the bottleneck is efficiently enlarged (×32 → ×16 downsampling) through binary dilated convolutions, allowing more complex features to be learned. Together with automatically generated training data, binary DAD-Net outperforms state-of-the-art semantic segmentation networks on public datasets. Compared to a full-precision model, our approach reduces compute complexity by a factor of 14.3 on an FPGA and requires only 0.9 MB of memory. Commodity SIMD-based AD hardware is therefore capable of accelerating binary DAD-Net.
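A binary dilated convolution of the kind described here can be sketched in PyTorch. This is an illustrative stand-in, not the DAD-Net implementation: the class names `BinarizeSTE` and `BinaryDilatedConv` are assumptions, and the straight-through estimator is one common choice for training binary weights and activations, not necessarily the paper's exact scheme.

```python
# Sketch of a binary 3x3 dilated convolution block (illustrative, assumed names).
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # Forward: strictly binary +1 / -1 values.
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Straight-through estimator: pass gradients where |x| <= 1.
        return grad_out * (x.abs() <= 1).float()

class BinaryDilatedConv(nn.Module):
    def __init__(self, in_ch, out_ch, dilation=2):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, 3, 3) * 0.01)
        self.bn = nn.BatchNorm2d(out_ch)
        self.dilation = dilation

    def forward(self, x):
        xb = BinarizeSTE.apply(x)            # binary activations
        wb = BinarizeSTE.apply(self.weight)  # binary weights
        # Dilation enlarges the receptive field without further downsampling,
        # which is how a x32 bottleneck can be relaxed to x16.
        y = F.conv2d(xb, wb, padding=self.dilation, dilation=self.dilation)
        return self.bn(y)

# Usage: with padding == dilation, a 3x3 kernel keeps spatial resolution.
x = torch.randn(1, 64, 32, 32)
print(BinaryDilatedConv(64, 64)(x).shape)    # torch.Size([1, 64, 32, 32])
```

Because both operands are binary, the convolution reduces to XNOR and popcount operations at inference time, which is what makes SIMD-based hardware a good fit.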