2022
DOI: 10.1109/access.2022.3231455

Model Compression via Position-Based Scaled Gradient

Abstract: We propose the position-based scaled gradient (PSG) that scales the gradient depending on the position of a weight vector to make it more compression-friendly. First, we theoretically show that applying PSG to the standard gradient descent (GD), which is called PSGD, is equivalent to the GD in the warped weight space, a space made by warping the original weight space via an appropriately designed invertible function. Second, we empirically show that PSG acting as a regularizer to the weight vectors is favorabl…
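To make the abstract's idea concrete, below is a minimal NumPy sketch of a PSG-style update step. It assumes a hypothetical uniform quantization grid and an elementwise gradient scale proportional to each weight's distance from its nearest grid point, so weights already sitting near compression-friendly positions move little, matching the warped-space intuition. The function name, grid construction, and the exact scaling and normalization are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def psg_like_sgd_step(w, grad, lr=0.1, num_bits=4, eps=1e-8):
    """One SGD step with a PSG-style, position-dependent gradient scale (sketch only)."""
    # Hypothetical uniform symmetric grid spanning the current weight range.
    step = 2.0 * (np.max(np.abs(w)) + eps) / (2 ** num_bits - 1)
    nearest_grid = np.round(w / step) * step

    # Position-based scale: small near a grid point, larger far from it,
    # so compression-friendly weights are perturbed the least.
    scale = np.abs(w - nearest_grid) + eps
    scale = scale / scale.mean()  # keep the average step size comparable to plain SGD

    return w - lr * scale * grad

# Toy usage on a dummy quadratic loss; weights close to a grid point receive smaller updates.
rng = np.random.default_rng(0)
w = rng.normal(size=8)
for _ in range(100):
    grad = 2.0 * (w - 0.3)  # gradient of sum((w - 0.3)**2)
    w = psg_like_sgd_step(w, grad, lr=0.05)
```

Normalizing the scale by its mean keeps the average step size comparable to plain SGD, so the position-dependent factor only redistributes update magnitude across weights rather than changing the effective learning rate.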

Cited by 3 publications (8 citation statements)
References 31 publications
“…A QAT method cannot perform well across various bit-widths and must be retrained whenever the target bit-width of the model changes. To provide quantization robustness, we utilize the gradient-rescaling update rule [9]. More details of the model weight updates using gradients are given in the following knowledge distillation section.…”
Section: A. Pruning
Mentioning confidence: 99%
“…In this work, we consider a training framework for compression-friendly models that focuses on model sparsity and quantization robustness without a retraining phase (on-the-fly quantization [8], [9]). We introduce an efficient training framework combining knowledge distillation, pruning, and quantization, dubbed the Quantization Robust Pruning with Knowledge distillation (QRPK) method.…”
Section: Introduction
Mentioning confidence: 99%
“…However, this is not the biggest problem that model compression can cause for a model. Kim's work [6] proposed the position-based scaled gradient as a training optimizer that scales the gradient depending on the position of a weight vector to make the model compression-friendly, whereas the previous works [9], [7], and [5] focus on mimicking activations by their mean and variance to represent the distribution of activations over the training dataset.…”
Section: Introduction
Mentioning confidence: 99%
“…Moreover, when bringing model compression to an object detection scheme, the model is more numerically sensitive than in image classification. In the image classification work of [6] and [2], the final output of the model is a value clamped between 0 and 1 by the softmax distribution, with the maximum a posteriori pushed onto the correct class. While the maximum posterior value does not need to be determined exactly, the bounding boxes produced for object detection have to fit the image's pixel locations, especially for small-scale objects in the image.…”
Section: Introduction
Mentioning confidence: 99%