2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)
DOI: 10.1109/iccvw.2019.00306

Automated Multi-Stage Compression of Neural Networks

Abstract: Low-rank tensor approximation is very promising for the compression of deep neural networks. We propose a new, simple, and efficient iterative approach that alternates low-rank factorization with smart rank selection and fine-tuning. We demonstrate the efficiency of our method compared to non-iterative ones. Our approach improves the compression rate while maintaining accuracy across a variety of tasks.
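
The abstract describes an alternating scheme: factorize each layer, pick ranks, fine-tune, repeat. A minimal NumPy sketch of such a loop follows; the energy-based rank heuristic and the helper names (truncated_svd, select_rank, fine_tune) are illustrative assumptions, not the authors' actual method, which targets tensor decompositions of convolutional layers rather than plain matrices.

```python
# Minimal sketch of an alternating compress/fine-tune loop. All helper names
# and the energy-based rank rule are hypothetical placeholders.
import numpy as np

def truncated_svd(W, rank):
    """Rank-`rank` factorization W ~ A @ B via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank, :]

def select_rank(W, energy=0.95):
    """Smallest rank retaining `energy` of the squared singular-value mass
    (a simple stand-in for the paper's rank-selection step)."""
    s = np.linalg.svd(W, compute_uv=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cum, energy)) + 1

def compress(model, n_stages=3, fine_tune=lambda m: m):
    """Alternate factorization, rank selection, and fine-tuning over stages.

    `model` is a dict {layer_name: 2-D weight array}; `fine_tune` is a
    placeholder for a training loop that recovers accuracy between stages.
    """
    for _ in range(n_stages):
        for name, W in model.items():
            r = select_rank(W)
            A, B = truncated_svd(W, r)
            model[name] = A @ B   # in practice the factors (A, B) are kept
        model = fine_tune(model)
    return model
```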

Cited by 45 publications (46 citation statements)
References 19 publications

“…It is mentioned in [46] that the difference among different random matrices is negligible. We have also confirmed this in our simulations.…”
supporting
confidence: 82%

“…This procedure is performed by multiplying a given matrix by a random matrix from the right-hand or left-hand side. It has been shown that this approximately preserves the Euclidean distances among columns or rows (depending on which side the random matrix is multiplied) [44], [45]. Let X ∈ R^{I×J} be a given data matrix, and R be a target rank.…”
Section: A Random Projection
mentioning
confidence: 99%
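
The statement above is the Johnson–Lindenstrauss-style argument behind randomized sketching. A quick numerical check of it, under made-up dimensions, might look like this (pure NumPy; the Gaussian matrix and the 1/√R scaling are common choices, not taken from the cited papers):

```python
# Projecting the rows of X with a scaled Gaussian random matrix approximately
# preserves pairwise Euclidean distances. All dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(0)
I, J, R = 50, 1000, 300                 # X is I x J; R is the target rank
X = rng.standard_normal((I, J))

Omega = rng.standard_normal((J, R)) / np.sqrt(R)  # random projection matrix
Y = X @ Omega                           # multiply from the right: I x R

def pairwise_dists(M):
    """All pairwise Euclidean distances between rows of M."""
    diff = M[:, None, :] - M[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

DX, DY = pairwise_dists(X), pairwise_dists(Y)
rel = np.abs(DY - DX) / (DX + np.eye(I))  # eye() avoids 0/0 on the diagonal
print(f"max relative distortion of row distances: {rel.max():.3f}")
```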
“…Low-rank decomposition: Low-rank decomposition algorithms [30], [31], [32] use a lower-rank set of parameters in place of the original set to approximate the CNN and thereby achieve compression. Swaminathan et al. [31] argue that the low-rank decomposition of weight matrices should consider the influence of both the input and the output neurons of a layer.…”
Section: Related Work
mentioning
confidence: 99%
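
As a back-of-the-envelope illustration of why the low-rank decomposition described above compresses a layer: an m×n dense weight matrix replaced by rank-r factors stores r(m+n) parameters instead of mn. A short sketch with arbitrary shapes (not taken from any of the cited works):

```python
# Replacing an m x n weight matrix W with rank-r factors A (m x r) and
# B (r x n) stores r*(m + n) parameters instead of m*n. Shapes are made up.
import numpy as np

m, n, r = 1024, 1024, 64
W = np.random.default_rng(1).standard_normal((m, n))

U, s, Vt = np.linalg.svd(W, full_matrices=False)
A, B = U[:, :r] * s[:r], Vt[:r, :]   # best rank-r approximation: W ~ A @ B

original, compressed = m * n, r * (m + n)
print(f"parameters: {original} -> {compressed} "
      f"({original / compressed:.1f}x compression)")   # 8.0x here
```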
“…Additionally, low-rank approximations are also useful for speeding up the evaluation of convolutional neural networks [151] by using a low-rank representation of the filters, which are used for detecting image features. For a very similar task, the authors in [121,122,172] rely on optimized tensor decompositions. Note that recently these techniques have also been applied to adversarial networks [48].…”
Section: Numerical Linear Algebra In Deep Learning
mentioning
confidence: 99%
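
The low-rank filter representations and tensor decompositions mentioned above are typically Tucker- or CP-style factorizations of the 4-D convolution kernel. Below is a minimal truncated HOSVD (one way to compute a Tucker decomposition) in plain NumPy; the ranks and filter shapes are illustrative, and random filters are not actually low-rank, so the reported error is large on this toy input.

```python
# Truncated higher-order SVD (Tucker-style) of a 4-D convolution filter
# tensor of shape (out_channels, in_channels, kH, kW). Illustrative only.
import numpy as np

def unfold(T, mode):
    """Mode-`mode` matricization of tensor T."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    """Truncated HOSVD: returns a core tensor and one factor per mode."""
    factors, core = [], T
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
        # Contract the current leading axis; it cycles to the back, so after
        # all modes the core axes end up in the original order.
        core = np.tensordot(core, factors[-1], axes=([0], [0]))
    return core, factors

def tucker_to_tensor(core, factors):
    """Reconstruct the full tensor from the core and mode factors."""
    T = core
    for U in factors:
        T = np.tensordot(T, U, axes=([0], [1]))  # same axis-rotation trick
    return T

filters = np.random.default_rng(2).standard_normal((64, 32, 3, 3))
core, factors = hosvd(filters, ranks=(16, 8, 3, 3))
approx = tucker_to_tensor(core, factors)
err = np.linalg.norm(approx - filters) / np.linalg.norm(filters)
print(f"relative reconstruction error: {err:.3f}")  # large for random filters
```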