2022
DOI: 10.3390/info13040176

Shrink and Eliminate: A Study of Post-Training Quantization and Repeated Operations Elimination in RNN Models

Abstract: Recurrent neural networks (RNNs) are neural networks (NNs) designed for time-series applications. There is a growing interest in running RNNs to support these applications on edge devices. However, RNNs have large memory and computational demands that make them challenging to implement on edge devices. Quantization is used to shrink the size and the computational needs of such models by decreasing weight and activation precision. Further, the delta networks method increases the sparsity in activation vectors b…
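To make the delta-network idea mentioned in the abstract concrete, here is a minimal sketch of thresholded activation updates, where an element of the activation vector is only propagated when it has changed by more than a threshold since its last emitted value; the threshold value, vector size, and function name are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def delta_gate(x_t, x_prev, threshold=0.1):
    """Suppress activation elements whose change since the last
    reference value is below the threshold, so the corresponding
    matrix-vector work can be skipped (the delta vector gets sparser)."""
    delta = x_t - x_prev
    mask = np.abs(delta) >= threshold
    sparse_delta = np.where(mask, delta, 0.0)
    # The reference state only advances where an update was emitted.
    x_ref = np.where(mask, x_t, x_prev)
    return sparse_delta, x_ref

# Toy usage: most elements change only slightly, so most deltas are zeroed.
rng = np.random.default_rng(0)
x_prev = rng.standard_normal(8)
x_t = x_prev + rng.normal(scale=0.05, size=8)
sparse_delta, x_ref = delta_gate(x_t, x_prev)
print("non-zero deltas:", np.count_nonzero(sparse_delta), "of", x_t.size)
```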

Cited by 4 publications (5 citation statements)
References 24 publications

Citation statements:
“…Thus, we had a better opportunity to have multiple trials to explore our methodology. In our research, we have shown that SRU and LiGRU-based models can provide smaller compressed models than other RNN models [15]. However, they suffer more from high error rates at high compression ratios.…”
Section: Discussion and Limitations
confidence: 79%
“…In LiGRU, we also apply the integer quantization to the weight matrices only. In our work, we have found that in the case of the LiGRU unit, there exists a weight matrix that is more sensitive to quantization [15]. This is the matrix multiplied by the hidden-state vector in the candidate-state vector computation.…”
Section: Post-Training Quantization of SRU and LiGRU
confidence: 94%
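For illustration, a minimal sketch of such selective post-training quantization, assuming a LiGRU-style cell whose weight matrices are kept in a dictionary; the matrix names (W_z, U_z, W_h, U_h), the int8 target, and the per-matrix symmetric scaling are assumptions, with the sensitive recurrent matrix of the candidate-state computation simply left in floating point here.

```python
import numpy as np

def quantize_int8(W):
    """Symmetric per-matrix post-training quantization to int8."""
    scale = np.max(np.abs(W)) / 127.0
    W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return W_q, scale

def quantize_ligru_weights(weights, skip=("U_h",)):
    """Quantize every weight matrix except the ones listed in `skip`
    (here the recurrent matrix used in the candidate-state computation,
    which the excerpt reports as more quantization-sensitive)."""
    quantized = {}
    for name, W in weights.items():
        if name in skip:
            quantized[name] = (W, None)         # kept in floating point
        else:
            quantized[name] = quantize_int8(W)  # (int8 matrix, scale)
    return quantized

# Toy usage with randomly initialized matrices.
rng = np.random.default_rng(0)
weights = {name: rng.standard_normal((16, 16))
           for name in ("W_z", "U_z", "W_h", "U_h")}
q_weights = quantize_ligru_weights(weights)
```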
“…For existing solutions, we summarize the results reported in prior work. For solutions that do not apply quantization to the weights or activations, we estimate their precision at 16 bits per element rather than 32 since neural networks can be quantized to 16-bit fixed-point through post-training quantization without significant accuracy degradation [50].…”
Section: Evaluation of Joint Pruning and Quantization
confidence: 99%
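As a rough illustration of the kind of 16-bit fixed-point post-training quantization assumed in that estimate, the sketch below picks a per-tensor Q-format from the tensor's range; the function names and the format-selection rule are assumptions, not taken from the cited work [50].

```python
import numpy as np

def to_fixed_point_16(x):
    """Post-training quantization of a tensor to signed 16-bit fixed point.

    The number of fractional bits is chosen so that the tensor's range
    fits in an int16 (a simple per-tensor Q-format); no retraining is done."""
    max_abs = float(np.max(np.abs(x)))
    int_bits = max(1, int(np.ceil(np.log2(max_abs + 1e-12))) + 1)  # incl. sign bit
    frac_bits = 16 - int_bits
    x_q = np.clip(np.round(x * (1 << frac_bits)), -32768, 32767).astype(np.int16)
    return x_q, frac_bits

def from_fixed_point_16(x_q, frac_bits):
    """Dequantize back to float32 for accuracy checks."""
    return x_q.astype(np.float32) / (1 << frac_bits)

# Toy usage: quantize a random weight vector and report the worst-case error.
w = np.random.default_rng(0).standard_normal(1000).astype(np.float32)
w_q, frac_bits = to_fixed_point_16(w)
print("max abs error:", np.max(np.abs(w - from_fixed_point_16(w_q, frac_bits))))
```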
“…First, we use post-training quantization as a compression method. Post-training quantization has recently become an increasingly reliable compression method [12][13][14][15]. Thus, the evaluation of one candidate solution would require only running the inference of the NN model.…”
Section: Introduction
confidence: 99%