2021
DOI: 10.48550/arxiv.2106.14681
Preprint

PQK: Model Compression via Pruning, Quantization, and Knowledge Distillation

Abstract: As edge devices become prevalent, deploying Deep Neural Networks (DNNs) on edge devices has become a critical issue. However, DNNs require high computational resources, which are rarely available on edge devices. To handle this, we propose a novel model compression method for devices with limited computational resources, called PQK, consisting of pruning, quantization, and knowledge distillation (KD) processes. Unlike traditional pruning and KD, PQK makes use of unimportant weights pruned in the pruning proc…
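
The abstract only sketches the pipeline, so a rough illustration may help. The snippet below is a minimal, hypothetical sketch of the prune-then-distill part in PyTorch; it is not the paper's code. SmallNet, magnitude_prune, kd_loss, and the choice of keeping a full-weight copy as the teacher are illustrative assumptions; PQK's actual teacher construction (reusing the pruned-away weights) and its quantization step follow the paper, not this toy.

```python
# Hedged sketch of a prune -> distill flow; names and layer sizes are illustrative.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, num_classes)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))

def magnitude_prune(model: nn.Module, ratio: float = 0.5) -> dict:
    """Zero out the smallest-magnitude weights in each 2D weight tensor."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() < 2:
            continue
        k = max(1, int(p.numel() * ratio))
        threshold = p.detach().abs().flatten().kthvalue(k).values
        mask = (p.detach().abs() > threshold).float()
        p.data.mul_(mask)          # keep only the "important" weights
        masks[name] = mask
    return masks

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Soft-target KL distillation term plus cross-entropy on hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student = SmallNet()
teacher = copy.deepcopy(student)     # crude stand-in: teacher keeps the full weight set
magnitude_prune(student, ratio=0.5)  # student keeps only the large-magnitude weights

x = torch.randn(8, 784)
y = torch.randint(0, 10, (8,))
with torch.no_grad():
    t_logits = teacher(x)
loss = kd_loss(student(x), t_logits, y)
loss.backward()
```

In this toy setup the teacher is simply an unpruned copy of the student, which avoids pre-training a separate teacher; PQK motivates this idea by reusing the weights discarded during pruning rather than a plain copy.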

Cited by 5 publications (6 citation statements)
References 18 publications
“…Other approaches, such as pruning and quantization, are capable of balancing the trade-off between accuracy and compression ratio. Some approaches [140], [227], [175] also combine multiple compression techniques: knowledge distillation, pruning, and quantization, to achieve a better accuracy and compression ratio.…”
Section: E. Energy Efficient Approaches in Autonomous Driving (mentioning)
confidence: 99%
“…Cui and Li, the architects of [18], unveil a complex model compression approach that combines structural pruning with dense knowledge distillation for large language models. Kim et al [19] address the needs of edge devices with PQK, an innovative combination of pruning, quantization, and knowledge distillation. A structured progression of pruning, quantization, and distillation provides a comprehensive strategy for efficient edge-based model deployment.…”
Section: Combination of Pruning and Knowledge Distillation (mentioning)
confidence: 99%
“…The commonly used model compression techniques [23][24][25][26][27] are quantization [28][29][30][31][32][33][34] and pruning [35][36][37][38][39] (weight pruning, layer pruning, filter pruning, and channel pruning). Other approaches are KD, 40 layer partitioning, channel partitioning, spatial partitioning, and workload partitioning. Layer-wise, fused layer parallelization, 41,42 and fused layer partitioning 5 are the few other strategies for implementing DL at the edge.…”
Section: Background and Literature Survey (mentioning)
confidence: 99%
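
Among the techniques listed in the excerpt above, post-training dynamic quantization is the easiest to demonstrate. Below is a minimal sketch using PyTorch's torch.ao.quantization.quantize_dynamic API; the two-layer toy model is an assumption standing in for any trained network and is not the quantization scheme used in PQK itself.

```python
import torch
import torch.nn as nn

# Toy model standing in for a trained network; layer sizes are arbitrary.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()

# Post-training dynamic quantization: weights of nn.Linear layers are stored
# as int8 and dequantized on the fly; activations stay in floating point.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 784)
print(quantized(x).shape)  # torch.Size([1, 10])
```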