2021
DOI: 10.1109/tpami.2021.3084839
Transform Quantization for CNN Compression

Cited by 40 publications (21 citation statements)
References 52 publications
“…We now present a lower bound for the asymptotic minimax risk (4) in estimating θ from the linear model (2). The proof sketch in this section is outlined via Lemmas 4.1, 4.2, and 4.3 and builds up to our proposed lower bound in Thm.…”
Section: Lower Bound for the Minimax Risk
Mentioning, confidence: 99%
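For readers outside this literature, the minimax risk referred to in this excerpt has the generic form below; the citing paper's exact definition (4) and loss function are not reproduced in the excerpt, so this display is a standard-form sketch rather than their precise statement.

$$
R_n(\Theta) \;=\; \inf_{\hat{\theta}} \, \sup_{\theta \in \Theta} \, \mathbb{E}_{\theta}\big\|\hat{\theta}(Y) - \theta\big\|_2^2,
\qquad Y = X\theta + \varepsilon \quad \text{(linear model)}.
$$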
“…Model compression techniques employ lossless and lossy source coding schemes to quantize the model parameters subject to finite precision constraints in order to make them deployable on memory-constrained devices [2,3]. In this work, we adopt an information-theoretic approach to the problem of quantizing linear models without any distributional assumptions on the data or the true model.…”
Section: Significance and Related Work
Mentioning, confidence: 99%
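As a concrete illustration of the lossy step described in this excerpt, the sketch below uniformly quantizes a weight tensor to a fixed number of bits; the resulting integer codes could then be compressed further with a lossless entropy coder. The function name, bit-width, and use of NumPy are illustrative assumptions, not the scheme of the cited paper.

```python
import numpy as np

def uniform_quantize(weights, num_bits=8):
    # Lossy step: map each weight to one of 2**num_bits evenly spaced levels.
    # The integer codes could then be entropy-coded losslessly for storage.
    w_min, w_max = float(weights.min()), float(weights.max())
    levels = 2 ** num_bits - 1
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    codes = np.round((weights - w_min) / scale).astype(np.int32)  # finite-precision codes
    reconstruction = codes * scale + w_min                        # dequantized weights
    return codes, reconstruction

# Toy usage on a random "layer".
w = np.random.randn(64, 128).astype(np.float32)
codes, w_hat = uniform_quantize(w, num_bits=4)
print("MSE distortion:", float(np.mean((w - w_hat) ** 2)))
```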
“…More recently, an iterative pruning and retraining algorithm to further reduce the size of deep models was proposed in [9,21]. The method of network quantization or weight sharing, i.e., employing a clustering algorithm to group the weights in a neural network, and its variants, including vector quantization [22], soft quantization [23,24], fixed point quantization [25], transform quantization [26], and Hessian weighted quantization [11], have been extensively investigated. Matrix factorization, where low-rank approximation of the weights in neural networks is used instead of the original weight matrix, has also been widely studied in [27][28][29].…”
Section: Related Work
Mentioning, confidence: 99%
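The matrix-factorization approach mentioned at the end of this excerpt can be illustrated with a truncated SVD: a weight matrix is replaced by two thin factors whose product approximates it. This is a minimal NumPy sketch under that assumption, not the specific methods of [27]-[29].

```python
import numpy as np

def low_rank_factorize(W, rank):
    # Truncated SVD: store U_r (m x r) and V_r (r x n) instead of W (m x n),
    # i.e. m*r + r*n values rather than m*n.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]   # absorb singular values into the left factor
    V_r = Vt[:rank, :]
    return U_r, V_r

W = np.random.randn(256, 512)
U_r, V_r = low_rank_factorize(W, rank=32)
W_hat = U_r @ V_r
print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```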
“…We consider the following two compression algorithms. The first one is the conditional distribution P_{Ŵ|W} in the proof of achievability (26), which requires the knowledge of w* and is denoted as "Oracle". The second one is the well-known K-means clustering algorithm, where the weights in W are grouped into K clusters and represented by the cluster centers in the reconstruction Ŵ.…”
Section: Evaluation and Visualization
Mentioning, confidence: 99%
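A minimal sketch of the K-means baseline described in this excerpt: the scalar weights of W are clustered into K groups, and the reconstruction Ŵ stores only a cluster index per weight plus the K centers. The helper name and the use of scikit-learn are assumptions for illustration, not the cited paper's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_weight_sharing(W, num_clusters=16):
    # Cluster the scalar weights of W into num_clusters groups and rebuild
    # the reconstruction from the cluster centers (one index per weight).
    flat = W.reshape(-1, 1)
    km = KMeans(n_clusters=num_clusters, n_init=10, random_state=0).fit(flat)
    W_hat = km.cluster_centers_[km.labels_].reshape(W.shape)
    return W_hat, km.labels_.reshape(W.shape), km.cluster_centers_.ravel()

W = np.random.randn(64, 64)
W_hat, labels, centers = kmeans_weight_sharing(W, num_clusters=8)
print("MSE distortion:", float(np.mean((W - W_hat) ** 2)))
```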