2019
DOI: 10.1007/978-3-030-20518-8_26

Trainable Thresholds for Neural Network Quantization


Cited by 3 publications (3 citation statements)
References 6 publications
“…Various quantization techniques have been proposed to make DNNs perform faster and fit larger networks on edge devices with limited storage capacity and energy budget [6][7][8]. An unfortunate consequence of quantization is the reduced accuracy, which can be tackled by increasing the network size, performing quantization only on parameters (and not on activations), or fine-tuning and re-training the network.…”
Section: Introduction
confidence: 99%
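The cited paper takes the fine-tuning route mentioned above: its title indicates that the quantization thresholds themselves are learned during training. Below is a minimal, hypothetical PyTorch sketch of that general idea, a uniform fake-quantizer whose symmetric clipping threshold is a trainable parameter updated through a straight-through estimator. The class name, parameter names, and default values are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class TrainableThresholdQuantizer(nn.Module):
    """Uniform fake-quantizer with a learnable clipping threshold (illustrative sketch)."""

    def __init__(self, num_bits=8, init_threshold=6.0):
        super().__init__()
        self.num_bits = num_bits
        # Learnable symmetric clipping threshold t, giving the range [-t, t].
        self.threshold = nn.Parameter(torch.tensor(float(init_threshold)))

    def forward(self, x):
        t = self.threshold.abs() + 1e-8            # keep the threshold positive
        levels = 2 ** (self.num_bits - 1) - 1      # e.g. 127 for 8-bit signed
        scale = t / levels
        x_clipped = torch.max(torch.min(x, t), -t)           # clip to [-t, t]
        x_quant = torch.round(x_clipped / scale) * scale     # quantize/dequantize
        # Straight-through estimator: the forward pass uses the quantized value,
        # the backward pass treats rounding as identity, so gradients reach the
        # input and (through the clipping) the threshold.
        return x_clipped + (x_quant - x_clipped).detach()

# Example: with a small initial threshold, some values are clipped and the
# threshold receives a gradient, so it can be fine-tuned with the weights.
quantizer = TrainableThresholdQuantizer(num_bits=8, init_threshold=1.0)
x = torch.randn(4, 16, requires_grad=True)
loss = quantizer(x).pow(2).sum()
loss.backward()
print(quantizer.threshold.grad)
```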
“…[23] Recent research shows that post-training quantization that doesn't involve retraining or fine-tuning the full model can also effectively quantize the model. [24][25][26][27] Another critical issue of training convergence appears when utilizing the quantized version of neural networks during both training and testing. Chen et al. proposed a hardware accelerator for convolutional neural networks and showed that the training precision should be selected as 32 bits instead of 16 bits to guarantee convergence.…”
Section: General Approaches Towards Noise In Neural Network
confidence: 99%
“…[23] Recent research shows that post-training quantization that doesn't involve retraining or fine-tuning the full model can also effectively quantize the model. [24-27]…”
Section: Introduction
confidence: 99%
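In contrast to the fine-tuning route, the post-training quantization mentioned in the statements above derives its quantization parameters from a short calibration pass and leaves the model weights untouched. The following is a minimal sketch, assuming simple symmetric max-calibration; the function names and the choice of calibration rule are illustrative assumptions (production tools often use percentile or KL-divergence based calibration instead).

```python
import torch

def calibrate_scale(calibration_batches, num_bits=8):
    """Post-training calibration sketch: derive a symmetric quantization scale
    from the largest absolute activation seen on calibration data."""
    max_abs = 0.0
    for batch in calibration_batches:
        max_abs = max(max_abs, batch.abs().max().item())
    levels = 2 ** (num_bits - 1) - 1
    return max_abs / levels

def quantize_dequantize(x, scale, num_bits=8):
    """Fake-quantize a tensor with the calibrated scale; no retraining involved."""
    levels = 2 ** (num_bits - 1) - 1
    x_int = torch.clamp(torch.round(x / scale), -levels, levels)
    return x_int * scale

# Example: calibrate on a few batches, then quantize new activations directly.
calibration = [torch.randn(32, 64) for _ in range(4)]
scale = calibrate_scale(calibration, num_bits=8)
activations = torch.randn(32, 64)
print(quantize_dequantize(activations, scale).shape)
```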