2022
DOI: 10.1109/access.2022.3231455

Model Compression via Position-Based Scaled Gradient

Abstract: We propose the position-based scaled gradient (PSG) that scales the gradient depending on the position of a weight vector to make it more compression-friendly. First, we theoretically show that applying PSG to the standard gradient descent (GD), which is called PSGD, is equivalent to the GD in the warped weight space, a space made by warping the original weight space via an appropriately designed invertible function. Second, we empirically show that PSG acting as a regularizer to the weight vectors is favorabl…
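To make the abstract's idea concrete, below is a minimal NumPy sketch of a PSG-style update step. It assumes a hypothetical uniform quantization grid and an elementwise gradient scale proportional to each weight's distance from its nearest grid point, so weights already sitting near compression-friendly positions move little, matching the warped-space intuition. The function name, grid construction, and the exact scaling and normalization are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def psg_like_sgd_step(w, grad, lr=0.1, num_bits=4, eps=1e-8):
    """One SGD step with a PSG-style, position-dependent gradient scale (sketch only)."""
    # Hypothetical uniform symmetric grid spanning the current weight range.
    step = 2.0 * (np.max(np.abs(w)) + eps) / (2 ** num_bits - 1)
    nearest_grid = np.round(w / step) * step

    # Position-based scale: small near a grid point, larger far from it,
    # so compression-friendly weights are perturbed the least.
    scale = np.abs(w - nearest_grid) + eps
    scale = scale / scale.mean()  # keep the average step size comparable to plain SGD

    return w - lr * scale * grad

# Toy usage on a dummy quadratic loss; weights close to a grid point receive smaller updates.
rng = np.random.default_rng(0)
w = rng.normal(size=8)
for _ in range(100):
    grad = 2.0 * (w - 0.3)  # gradient of sum((w - 0.3)**2)
    w = psg_like_sgd_step(w, grad, lr=0.05)
```

Normalizing the scale by its mean keeps the average step size comparable to plain SGD, so the position-dependent factor only redistributes update magnitude across weights rather than changing the effective learning rate.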

Cited by 3 publications (8 citation statements)
References 31 publications
“…A QAT method cannot perform well across various bit-widths and must be retrained whenever the target bit-width of the model changes. To provide quantization robustness, we utilize the gradient-rescaling update rule [9]. More details of the model weight updates using gradients are given in the following knowledge distillation section.…”
Section: A. Pruning
Mentioning confidence: 99%
“…In this work, we consider a training framework for compression-friendly models that focuses on model sparsity and quantization robustness without a retraining phase (on-the-fly quantization [8], [9]). We introduce an efficient training framework combining knowledge distillation, pruning, and quantization, dubbed the Quantization Robust Pruning with Knowledge distillation (QRPK) method.…”
Section: Introduction
Mentioning confidence: 99%
“…However, this is not the biggest problem that model compression can cause for a model. Kim's work [6] proposed the position-based scaled gradient as a training optimizer that scales the gradient depending on the position of a weight vector to make the model compression-friendly, whereas the previous works [9], [7], and [5] focus on mimicking activations by their mean and variance to represent the distribution of activations over the training dataset.…”
Section: Introduction
Mentioning confidence: 99%
“…Moreover, when bringing model compression to an object detection scheme, the model is more numerically sensitive than in image classification. In the image classification work of [6] and [2], the final output of the model is a value clamped between 0 and 1 by the softmax distribution, with the maximum a posteriori pushed onto the correct class. While the maximum posterior value does not need to be determined exactly, the bounding boxes produced for object detection have to fit the image's pixel locations, especially for small-scale objects in the image.…”
Section: Introduction
Mentioning confidence: 99%