A Hardware-Friendly High-Precision CNN Pruning Method and Its FPGA Implementation

Sui, Xuefu; Lv, Qunbo; Zhi, Liangjie; Zhu, Bofeng; Yang, Yuanbo; Tan, Zheng

doi:10.3390/s23020824

Cited by 11 publications

(1 citation statement)

References 46 publications

“…To overcome these issues, many efforts have been dedicated to reducing the model size and the computation cost without affecting the accuracy by suggesting several optimization techniques. The most commonly used techniques are quantization [8] and pruning [9].…”

Section: Introduction (mentioning)

confidence: 99%

CNN inference acceleration on limited resources FPGA platforms_epilepsy detection case study

Saidi, Othman, Dhouibi et al. 2023

IJ-ICT

The use of convolutional neural networks (CNNs) to analyze and classify electroencephalogram (EEG) signals has recently attracted the interest of researchers seeking to identify epileptic seizures. This success has come with an enormous increase in the computational complexity and memory requirements of CNNs. To boost the performance of CNN inference, several hardware accelerators have been proposed. The high performance and flexibility of the field programmable gate array (FPGA) make it an efficient accelerator for CNNs. Nevertheless, deploying CNN models on resource-limited platforms poses significant challenges. To ease CNN implementation on such platforms, the research community has made several tools and frameworks available, along with different optimization techniques. In this paper, we proposed an FPGA implementation of an automatic seizure detection approach using two CNN models, namely VGG-16 and ResNet-50. To reduce the model size and computation cost, we exploited two optimization approaches: pruning and quantization. Furthermore, we presented the results and discussed the advantages and limitations of two implementation alternatives for the inference acceleration of quantized CNNs on Zynq-7000: a software implementation on the advanced RISC machine (ARM) processor based on the ARM NN software development kit (SDK), and a software/hardware implementation based on the deep learning processor unit (DPU) accelerator and the DNNDK toolkit.

A comprehensive review of model compression techniques in machine learning

Dantas, Sabino da Silva, Cordeiro et al. 2024

Appl Intell

This paper critically examines model compression techniques within the machine learning (ML) domain, emphasizing their role in enhancing model efficiency for deployment in resource-constrained environments, such as mobile devices, edge computing, and Internet of Things (IoT) systems. By systematically exploring compression techniques and lightweight design architectures, it provides a comprehensive understanding of their operational contexts and effectiveness. The synthesis of these strategies reveals a dynamic interplay between model performance and computational demand, highlighting the balance required for optimal application. As ML models grow increasingly complex and data-intensive, the demand for computational resources and memory has surged accordingly. This escalation presents significant challenges for the deployment of artificial intelligence (AI) systems in real-world applications, particularly where hardware capabilities are limited. Therefore, model compression techniques are not merely advantageous but essential for ensuring that these models can be utilized across various domains, maintaining high performance without prohibitive resource requirements. Furthermore, this review underscores the importance of model compression in sustainable AI development. The introduction of hybrid methods, which combine multiple compression techniques, promises to deliver superior performance and efficiency. Additionally, the development of intelligent frameworks capable of selecting the most appropriate compression strategy based on specific application needs is crucial for advancing the field. The practical examples and engineering applications discussed demonstrate the real-world impact of these techniques. By optimizing the balance between model complexity and computational efficiency, model compression ensures that the advancements in AI technology remain sustainable and widely applicable. This comprehensive review thus contributes to the academic discourse and guides innovative solutions for efficient and responsible machine learning practices, paving the way for future advancements in the field.
