2021
DOI: 10.3390/electronics10040396

Optimising Hardware Accelerated Neural Networks with Quantisation and a Knowledge Distillation Evolutionary Algorithm

Abstract: This paper compares the latency, accuracy, training time and hardware costs of neural networks compressed with our new multi-objective evolutionary algorithm called NEMOKD, and with quantisation. We evaluate NEMOKD on Intel’s Movidius Myriad X VPU processor, and quantisation on Xilinx’s programmable Z7020 FPGA hardware. Evolving models with NEMOKD increases inference accuracy by up to 82% at the cost of 38% increased latency, with throughput performance of 100–590 image frames-per-second (FPS). Quantisation id…
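NEMOKD evolves student networks that are trained with knowledge distillation, so the distillation loss is the building block its multi-objective search optimises. Below is a minimal sketch of that loss, assuming a PyTorch setup; the temperature, loss weighting and function names are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Soft-target knowledge distillation (Hinton-style).

    Blends cross-entropy on the hard labels with a KL term that pulls
    the student's softened outputs towards the teacher's. The defaults
    here are illustrative, not the values used by NEMOKD.
    """
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=1)
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean")
    ce = F.cross_entropy(student_logits, labels)
    # Scale the KD term by T^2 so its gradient magnitude stays comparable
    # to the cross-entropy term as the temperature changes.
    return alpha * (temperature ** 2) * kd + (1.0 - alpha) * ce
```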

Cited by 14 publications (2 citation statements); references 32 publications.
“…In addition, Han et al [20] added Huffman coding after quantization, which can further reduce the memory size and operation time of the model. Wu, Stewart and Wang et al [21][22][23] designed a new quantization framework for the hardware level, and provided different quantization strategies for different neural networks and hardware structures. Besides pruning and quantization, knowledge distillation is also an effective method of model compression.…”
Section: Introduction (citation intent: mentioning; confidence: 99%)
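The quantisation frameworks referred to here, and evaluated in the paper on the Z7020 FPGA, reduce weight precision to a few bits. A minimal sketch of symmetric uniform post-training quantisation follows, assuming NumPy; it illustrates the general idea only and does not reproduce the quantisation-aware training used in the cited hardware flows.

```python
import numpy as np

def quantise_uniform(weights, n_bits=3):
    """Symmetric uniform quantisation of a weight tensor to n_bits.

    A generic post-training scheme for illustration; bit-width and the
    symmetric mapping are example choices, not the paper's exact scheme.
    """
    q_max = 2 ** (n_bits - 1) - 1                       # e.g. 3 for signed 3-bit
    scale = max(np.max(np.abs(weights)) / q_max, 1e-12)  # guard all-zero tensors
    q = np.clip(np.round(weights / scale), -q_max - 1, q_max).astype(np.int8)
    return q, scale                                      # dequantise with q * scale

# Example: quantise a random layer and measure the error introduced.
w = np.random.randn(128, 64).astype(np.float32)
q, s = quantise_uniform(w, n_bits=3)
print("mean abs error:", np.mean(np.abs(w - q * s)))
```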
“…However, with the advent of the Internet of Things, how to deploy high-performance DCNNs on embedded devices with limited hardware resources has become an urgent problem. To solve this problem, many model compression methods which reduce the model size and computational burden have been proposed, such as network quantization [7,8], model pruning [9], knowledge distillation [10,11], and lightweight model design [12].…”
Section: Introduction (citation intent: mentioning; confidence: 99%)
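Of the compression methods listed in this passage, pruning is the only one not illustrated above. A minimal sketch of unstructured magnitude pruning follows, again assuming NumPy; the sparsity level and helper name are hypothetical and not taken from any of the cited works.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Unstructured magnitude pruning: zero the smallest-magnitude weights.

    The sparsity target is an arbitrary example value; real pipelines
    typically prune iteratively and fine-tune between steps.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    mask = np.abs(weights) > threshold             # keep only larger weights
    return weights * mask, mask
```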