2023
DOI: 10.1109/access.2023.3238715
Binarized Neural Network With Parameterized Weight Clipping and Quantization Gap Minimization for Online Knowledge Distillation

Abstract: As applications of artificial intelligence grow rapidly, numerous network compression algorithms have been developed for devices with restricted computing resources, such as smartphones and edge and IoT devices. Knowledge distillation (KD) transfers soft labels derived from a teacher model to a less parameterized student model, achieving high accuracy with a reduced computational burden. Moreover, online KD provides parallel computing through collaborative learning between teacher and student networks, thus enhancing the training…
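To make the soft-label transfer described in the abstract concrete, below is a minimal sketch of a standard temperature-scaled distillation loss in PyTorch (Hinton-style). The temperature T, blend weight alpha, and the name kd_loss are illustrative assumptions, not details taken from this paper.

import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soft term: KL divergence between temperature-softened teacher
    # and student distributions; the T*T factor restores the gradient
    # magnitude that temperature scaling shrinks.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

In the online-KD setting the abstract mentions, teacher and student are trained jointly, so teacher_logits would come from a peer network updated in the same step rather than from a frozen pretrained teacher.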

Cited by 2 publications (2 citation statements)
References 22 publications
“…Some researchers focused on extreme quantization, in which only binary or ternary weights and activations are involved [22], [23], [27]. These methods use bit-shift logic instead of high-precision multiplications to achieve a significant acceleration but often result in substantial performance degradation.…”
Section: Related Work, A. Network Quantization (citation type: mentioning; confidence: 99%)
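As a concrete illustration of the excerpt's point (a sketch under assumed {-1, +1} weights, not any cited paper's exact kernel), binarized weights turn a dot product into additions and subtractions, with no real-valued multiplies:

import torch

w = torch.randn(6)
x = torch.randn(6)

# 1-bit weights in {-1, +1}; mapping w >= 0 to +1 avoids sign(0) = 0.
wb = torch.where(w >= 0, torch.ones_like(w), -torch.ones_like(w))

y_mac = (wb * x).sum()                        # naive multiply-accumulate
y_addsub = x[wb > 0].sum() - x[wb < 0].sum()  # multiply-free equivalent
print(torch.allclose(y_mac, y_addsub))        # True

With binary activations as well, the add/subtract form collapses further to XNOR and popcount operations, which is where the reported acceleration, and the accompanying accuracy loss, comes from.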
“…The first approach involves amplifying the representational capability of each layer by expanding the diversity of cases that the parameters can represent [4]–[11]. The second method focuses on refining the gradient mismatch in the backward path [1], [12], [13]. By employing straight-through estimation (STE) [14] and scaling factors, Rastegari et al. [2] demonstrated a notable expansion in the network representation and more accurate parameter updates.…”
Section: Introduction (citation type: mentioning; confidence: 99%)
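Below is a minimal sketch of the straight-through estimation and scaling-factor idea the excerpt attributes to Rastegari et al. [2]. The clipped-identity backward pass and the per-tensor mean-absolute scale are common conventions assumed here, not details verified against this paper.

import torch

class BinarizeSTE(torch.autograd.Function):
    # Forward: hard sign binarization. Backward: straight-through
    # estimator, i.e. pass the gradient through unchanged wherever
    # |w| <= 1 and zero it outside that clip range.
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        return grad_out * (w.abs() <= 1.0).to(grad_out.dtype)

w = torch.randn(8, requires_grad=True)
alpha = w.abs().mean().detach()   # per-tensor scaling factor, kept constant
                                  # here for simplicity (XNOR-Net also
                                  # differentiates through it)
y = alpha * BinarizeSTE.apply(w)
y.sum().backward()                # gradients reach w via the STE

The scaling factor alpha restores the dynamic range lost by binarization, while the STE supplies a usable surrogate gradient for the non-differentiable sign function, which is exactly the gradient-mismatch problem the quoted passage describes.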