2022
DOI: 10.3390/info13040176

Shrink and Eliminate: A Study of Post-Training Quantization and Repeated Operations Elimination in RNN Models

Abstract: Recurrent neural networks (RNNs) are neural networks (NNs) designed for time-series applications. There is a growing interest in running RNNs to support these applications on edge devices. However, RNNs have large memory and computational demands that make them challenging to implement on edge devices. Quantization is used to shrink the size and the computational needs of such models by decreasing weight and activation precision. Further, the delta networks method increases the sparsity in activation vectors b…
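To make the delta-network idea mentioned in the abstract concrete, here is a minimal sketch of thresholded activation updates, where an element of the activation vector is only propagated when it has changed by more than a threshold since its last emitted value; the threshold value, vector size, and function name are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def delta_gate(x_t, x_prev, threshold=0.1):
    """Suppress activation elements whose change since the last
    reference value is below the threshold, so the corresponding
    matrix-vector work can be skipped (the delta vector gets sparser)."""
    delta = x_t - x_prev
    mask = np.abs(delta) >= threshold
    sparse_delta = np.where(mask, delta, 0.0)
    # The reference state only advances where an update was emitted.
    x_ref = np.where(mask, x_t, x_prev)
    return sparse_delta, x_ref

# Toy usage: most elements change only slightly, so most deltas are zeroed.
rng = np.random.default_rng(0)
x_prev = rng.standard_normal(8)
x_t = x_prev + rng.normal(scale=0.05, size=8)
sparse_delta, x_ref = delta_gate(x_t, x_prev)
print("non-zero deltas:", np.count_nonzero(sparse_delta), "of", x_t.size)
```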

Cited by 4 publications (5 citation statements)
References 24 publications

Citation statements:
“…Thus, we had a better opportunity to have multiple trials to explore our methodology. In our research, we have shown that SRU and LiGRU-based models can provide smaller compressed models than other RNN models [15]. However, they suffer more from high error rates at high compression ratios.…”
Section: Discussion and Limitations
confidence: 79%
“…In LiGRU, we also apply the integer quantization to the weight matrices only. In our work, we have found that in the case of the LiGRU unit, there exists a weight matrix that is more sensitive to quantization [15]. This is the matrix multiplied by the hidden-state vector in the candidate-state vector computation.…”
Section: Post-Training Quantization of SRU and LiGRU
confidence: 94%
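For illustration, a minimal sketch of such selective post-training quantization, assuming a LiGRU-style cell whose weight matrices are kept in a dictionary; the matrix names (W_z, U_z, W_h, U_h), the int8 target, and the per-matrix symmetric scaling are assumptions, with the sensitive recurrent matrix of the candidate-state computation simply left in floating point here.

```python
import numpy as np

def quantize_int8(W):
    """Symmetric per-matrix post-training quantization to int8."""
    scale = np.max(np.abs(W)) / 127.0
    W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return W_q, scale

def quantize_ligru_weights(weights, skip=("U_h",)):
    """Quantize every weight matrix except the ones listed in `skip`
    (here the recurrent matrix used in the candidate-state computation,
    which the excerpt reports as more quantization-sensitive)."""
    quantized = {}
    for name, W in weights.items():
        if name in skip:
            quantized[name] = (W, None)         # kept in floating point
        else:
            quantized[name] = quantize_int8(W)  # (int8 matrix, scale)
    return quantized

# Toy usage with randomly initialized matrices.
rng = np.random.default_rng(0)
weights = {name: rng.standard_normal((16, 16))
           for name in ("W_z", "U_z", "W_h", "U_h")}
q_weights = quantize_ligru_weights(weights)
```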
“…For existing solutions, we summarize the results reported in prior work. For solutions that do not apply quantization to the weights or activations, we estimate their precision at 16 bits per element rather than 32 since neural networks can be quantized to 16-bit fixed-point through post-training quantization without significant accuracy degradation [50].…”
Section: Evaluation of Joint Pruning and Quantization
confidence: 99%
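As a rough illustration of the kind of 16-bit fixed-point post-training quantization assumed in that estimate, the sketch below picks a per-tensor Q-format from the tensor's range; the function names and the format-selection rule are assumptions, not taken from the cited work [50].

```python
import numpy as np

def to_fixed_point_16(x):
    """Post-training quantization of a tensor to signed 16-bit fixed point.

    The number of fractional bits is chosen so that the tensor's range
    fits in an int16 (a simple per-tensor Q-format); no retraining is done."""
    max_abs = float(np.max(np.abs(x)))
    int_bits = max(1, int(np.ceil(np.log2(max_abs + 1e-12))) + 1)  # incl. sign bit
    frac_bits = 16 - int_bits
    x_q = np.clip(np.round(x * (1 << frac_bits)), -32768, 32767).astype(np.int16)
    return x_q, frac_bits

def from_fixed_point_16(x_q, frac_bits):
    """Dequantize back to float32 for accuracy checks."""
    return x_q.astype(np.float32) / (1 << frac_bits)

# Toy usage: quantize a random weight vector and report the worst-case error.
w = np.random.default_rng(0).standard_normal(1000).astype(np.float32)
w_q, frac_bits = to_fixed_point_16(w)
print("max abs error:", np.max(np.abs(w - from_fixed_point_16(w_q, frac_bits))))
```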
“…First, we use post-training quantization as a compression method. Post-training quantization has recently become an increasingly reliable compression method [12][13][14][15]. Thus, the evaluation of one candidate solution would require only running the inference of the NN model.…”
Section: Introduction
confidence: 99%