2019 2nd Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2) 2019
DOI: 10.1109/emc249363.2019.00013
Run-Time Efficient RNN Compression for Inference on Edge Devices

Abstract: Recurrent neural networks can be large and compute-intensive, yet many applications that benefit from RNNs run on small devices with very limited compute and storage capabilities while still having run-time constraints. As a result, there is a need for techniques that achieve significant compression without negatively impacting inference run-time or task accuracy. This paper explores a new compressed RNN cell implementation called Hybrid Matrix Decomposition (HMD) that achieves this dual objective…


Cited by 17 publications (8 citation statements) | References 18 publications
“…Analogously, [53] proposes hybrid network architectures combining binary and full-precision sections to achieve significant energy efficiency and memory compression while guaranteeing performance. Thakker et al. study a compressed RNN cell implementation called Hybrid Matrix Decomposition (HMD) for model inference [54]. It divides the matrix of network weights into two parts: an unconstrained upper half and a lower half composed of rank-1 blocks.…”
Section: A. State of the Art, 1) Model Compression
confidence: 99%
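As a rough illustration of the HMD structure the citing authors describe, the NumPy sketch below splits a weight matrix into a dense upper half and a lower half built from rank-1 blocks. The matrix sizes, block count, and all names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_out, n_in = 8, 6          # full weight matrix would be (n_out, n_in)
half = n_out // 2           # split point between dense and rank-1 halves
n_blocks = 2                # lower half partitioned into rank-1 blocks (assumed)
rows_per_block = half // n_blocks

# Unconstrained upper half: stored densely, as in an ordinary layer.
W_upper = rng.standard_normal((half, n_in))

# Lower half: each block of rows is a rank-1 outer product u @ v^T, so a
# (rows_per_block x n_in) block costs rows_per_block + n_in parameters.
us = [rng.standard_normal((rows_per_block, 1)) for _ in range(n_blocks)]
vs = [rng.standard_normal((1, n_in)) for _ in range(n_blocks)]

def hmd_matvec(x):
    """Multiply the compressed matrix by a vector x of shape (n_in,)."""
    top = W_upper @ x
    # Each rank-1 block needs only one dot product (v @ x) and a scaling by u.
    bottom = np.concatenate([(u * (v @ x)).ravel() for u, v in zip(us, vs)])
    return np.concatenate([top, bottom])

x = rng.standard_normal(n_in)
print(hmd_matvec(x).shape)  # (8,) -- same output as the uncompressed matrix
```

The rank-1 lower half trades expressiveness for both storage and multiply count, which is consistent with the run-time goal stated in the abstract.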
“…Tensor factorization is an approach that decomposes a single layer into smaller, more efficient layers [15, 21–25]. Such decomposition creates a smaller network.…”
Section: Tensor Factorization
confidence: 99%
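A minimal sketch of that decomposition, assuming the common truncated-SVD formulation: a dense layer W is replaced by two smaller layers U and V whose product approximates W. The rank r and all names below are illustrative, not taken from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 128, 64, 8
W = rng.standard_normal((m, n))

# Truncated SVD gives the best rank-r approximation in Frobenius norm.
U_full, s, Vt = np.linalg.svd(W, full_matrices=False)
U = U_full[:, :r] * s[:r]   # fold singular values into the first factor
V = Vt[:r, :]

x = rng.standard_normal(n)
y_full = W @ x              # original layer: m*n multiplies
y_low  = U @ (V @ x)        # two smaller layers: r*(m+n) multiplies
print(np.linalg.norm(y_full - y_low) / np.linalg.norm(y_full))
```

For r much smaller than min(m, n), the two factored layers store and compute r*(m+n) values instead of m*n, which is why such decomposition "creates a smaller network."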
“…Designing efficient CNNs has been an active, ongoing area of research. In recent years, numerous research efforts have been devoted to compressing neural networks through the use of model pruning [10], quantization [7–9], low-rank tensor decomposition [28–31], compact network architecture design [32], etc.…”
Section: Related Work
confidence: 99%
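To make one of the listed compression families concrete, here is a minimal, hedged sketch of global magnitude pruning in NumPy; the 90% sparsity target and all names are illustrative assumptions, and the cited works cover many more sophisticated variants.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
sparsity = 0.9                                # assumed target: prune 90% of weights

threshold = np.quantile(np.abs(W), sparsity)  # keep the largest 10% by magnitude
mask = np.abs(W) >= threshold
W_pruned = W * mask
print(f"surviving nonzeros: {mask.mean():.2%}")  # ~10% of entries remain
```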