2022 IEEE 16th International Symposium on Applied Computational Intelligence and Informatics (SACI)
DOI: 10.1109/saci55618.2022.9919465
Benchmarking TensorFlow Lite Quantization Algorithms for Deep Neural Networks

Cited by 5 publications (2 citation statements). References 12 publications.
“…Quantization is applied to reduce the numerical representation of the neural network parameters with the aim of decreasing the memory footprint and consequently the model size. Since neural network models are usually highly over-parameterized, the precision could be maintained at a high level [11].…”
Section: B. Neural Model Optimization and Compression Techniques (mentioning, confidence: 99%)
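
To make the memory-footprint argument concrete, here is a back-of-the-envelope sketch (the parameter count is illustrative, not taken from the paper) of why quantizing 32-bit weights to 8 bits yields roughly the 4x compression reported in the second citation statement below:

    # Footprint of a model's weights before and after 8-bit quantization.
    # The parameter count is a hypothetical example, not from the paper.
    num_params = 5_000_000                 # e.g., a mid-sized CNN
    float32_bytes = num_params * 4         # float32: 4 bytes per parameter
    int8_bytes = num_params * 1            # int8: 1 byte per parameter
    print(f"float32: {float32_bytes / 1e6:.1f} MB")           # 20.0 MB
    print(f"int8:    {int8_bytes / 1e6:.1f} MB")              # 5.0 MB
    print(f"compression: {float32_bytes / int8_bytes:.0f}x")  # 4x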
“…Furthermore, representations with less than 8 bits have already been proposed, and even binarization [10]. I. Orasan et al. [11] investigated several post-training quantization solutions using the TensorFlow Lite deep learning framework on CNN models of different sizes. The obtained compression ratio is up to 4 times, and the worst-case accuracy degradation is only 0.43%.…”
Section: Introduction (mentioning, confidence: 99%)
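
For readers unfamiliar with the workflow being benchmarked, below is a minimal sketch of one post-training quantization path in TensorFlow Lite (full-integer quantization with a representative dataset). It is an illustration of the technique, not the paper's exact experimental setup; the SavedModel path is a placeholder and the calibration data here is random.

    import numpy as np
    import tensorflow as tf

    # Placeholder calibration inputs; in practice use real samples from
    # the training distribution (shape must match the model's input).
    calibration_images = np.random.rand(100, 224, 224, 3).astype(np.float32)

    def representative_dataset():
        # Yield a handful of samples so the converter can calibrate
        # activation ranges for integer quantization.
        for image in calibration_images:
            yield [image[np.newaxis, ...]]

    # "saved_model_dir" is a placeholder path to an already-trained model.
    converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    # Optionally force fully integer (int8) kernels and int8 model I/O:
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8

    tflite_model = converter.convert()
    with open("model_int8.tflite", "wb") as f:
        f.write(tflite_model)

Omitting the representative dataset and keeping only optimizations = [tf.lite.Optimize.DEFAULT] gives dynamic-range quantization instead, the other common post-training variant among those the paper compares.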