2020
DOI: 10.48550/arxiv.2009.05647
Preprint

Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank Approximation

Murad Tukan,
Alaa Maalouf,
Matan Weksler
et al.

Abstract: A common technique for compressing a neural network is to compute the k-rank ℓ2 approximation A_{k,2} of the matrix A ∈ R^{n×d} that corresponds to a fully connected layer (or embedding layer). Here, d is the number of neurons in the layer, n is the number of neurons in the next one, and A_{k,2} can be stored in O((n + d)k) memory instead of O(nd). This ℓ2-approximation minimizes the sum over every entry to the power of p = 2 in the matrix A − A_{k,2}, among every matrix A_{k,2} ∈ R^{n×d} whose rank is k. While it can be computed…
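As a concrete illustration of the storage argument, here is a minimal numpy sketch (not from the paper; the sizes n, d and the rank k below are illustrative) of the standard SVD-based rank-k compression the abstract refers to:

```python
import numpy as np

# Minimal sketch of rank-k SVD compression of a fully connected layer.
# Sizes and rank are illustrative, not taken from the paper.
n, d, k = 512, 1024, 32          # next-layer neurons, layer neurons, target rank
A = np.random.randn(n, d)        # weight matrix of the layer

# Truncated SVD: A ≈ U_k diag(s_k) V_k^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)
L = U[:, :k] * s[:k]             # n x k factor
R = Vt[:k, :]                    # k x d factor

A_k2 = L @ R                     # best rank-k approximation in the l2 (Frobenius) sense
# Storage drops from O(n*d) entries to O((n + d)*k):
print(A.size, L.size + R.size)   # 524288 vs. 49152 entries for these sizes
```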

Cited by 4 publications (4 citation statements) | References 36 publications
“…Similarly to the original variant (the case of squared distance), we can use the EM method to obtain a "good guess" for a solution of this problem; however, the EM method requires an algorithm that solves the problem for the case of k = 1, i.e., computing the subspace that minimizes the sum of (non-squared) distances from those columns (for the sum of squared distances case, SVD is this algorithm). Unfortunately, there are only approximation algorithms (Clarkson and Woodruff, 2015; Tukan et al., 2020) for this case, and the deterministic versions are expensive in terms of running time.…”
Section: A2 Clustering Methods (mentioning, confidence: 99%)
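To illustrate the point in the excerpt above, here is a small numpy sketch (not from the cited works; the data and dimensions are illustrative): for k = 1 and squared distances, the top right singular vector of the data matrix spans the optimal subspace, whereas the sum of non-squared distances has no comparably simple exact solver.

```python
import numpy as np

# Sketch (illustrative data): for k = 1, the subspace through the origin that
# minimizes the sum of SQUARED distances is spanned by the top right singular vector.
rng = np.random.default_rng(0)
P = rng.standard_normal((200, 5))           # 200 points in R^5, one per row

_, _, Vt = np.linalg.svd(P, full_matrices=False)
v = Vt[0]                                   # optimal direction for squared distances

proj = np.outer(P @ v, v)                   # projection of each point onto span{v}
dists = np.linalg.norm(P - proj, axis=1)    # Euclidean distance of each point to the line

print("sum of squared distances:", (dists ** 2).sum())  # minimized exactly by this v
print("sum of distances:        ", dists.sum())         # only approximation algorithms known
```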
“…Knowledge Distillation (KD) (Hinton et al., 2015) is leveraged in some works (Jiao et al., 2019; Sanh et al., 2019) to bridge the gap between a compact model and BERT. Further, many works (Mao et al., 2020; Tukan et al., 2020) use low-rank matrix factorization to deal with the issue. The authors of ROSITA (Liu et al., 2021) outline a methodology that combines weight pruning, KD and low-rank factorization.…”
Section: Compression (mentioning, confidence: 99%)
“…The only difference is that the SVD computation of the optimal subspace for a cluster of points (k = 1) should be replaced by a more involved approximation algorithm for computing the subspace that minimizes the sum over distances to the power of q; see e.g. Tukan et al. (2020b); Clarkson and Woodruff (2015).…”
Section: Generalizations and Extensions (mentioning, confidence: 99%)
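For completeness, here is a hypothetical helper (not from the cited works; the function name, data, and dimensions are illustrative) that evaluates the objective those approximation algorithms target: the sum of distances to the power q from a point set to a candidate k-dimensional subspace.

```python
import numpy as np

def cost(P, V, q):
    """Sum of distances^q from the rows of P to the subspace spanned by
    the orthonormal columns of V (illustrative helper, not the papers' API)."""
    proj = (P @ V) @ V.T                      # orthogonal projection onto span(V)
    dists = np.linalg.norm(P - proj, axis=1)  # Euclidean distance of each point to the subspace
    return (dists ** q).sum()

rng = np.random.default_rng(1)
P = rng.standard_normal((100, 6))                 # 100 points in R^6
V, _ = np.linalg.qr(rng.standard_normal((6, 2)))  # random orthonormal 2-dim candidate subspace
print(cost(P, V, q=1), cost(P, V, q=2))           # non-squared vs. squared objective
```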