On Compressing Deep Models by Low Rank and Sparse Decomposition

Yu, Xiyu; Liu, Tongliang; Wang, Xinchao; Tao, Dacheng

doi:10.1109/cvpr.2017.15

Cited by 356 publications

(200 citation statements)

References 6 publications

Supporting

Mentioning

198

Contrasting


“…Compression: Finetuned SVD 2 [34] 2.6x; Circulant CNN 2 [7] 3.6x; Adaptive Fastfood-16 [34] 3.7x; Collins et al. [8] 4x; Zhou et al. [39] 4.3x; ACDC [27] 6.3x; Network Pruning [14] 9.1x; Deep Compression [14] 9.1x; GreBdec [38] 10.2x; Srinivas et al. [30] 10.3x; Guo et al. [13] 17.9x; Binarization ≈32x. … with interesting areas to explore, such as fast classification and sketch-based image retrieval. Reproducibility: Our implementation can be found on GitHub 1…”

Section: Methods (mentioning)

confidence: 99%

Distribution-Aware Binarization of Neural Networks for Sketch Recognition

Prabhu, Batchu, Munagala, et al. 2018

2018 IEEE Winter Conference on Applications of Computer Vision (WACV)


Deep neural networks are highly effective at a range of computational tasks. However, they tend to be computationally expensive, especially in vision-related problems, and also have large memory requirements. One of the most effective methods to achieve significant improvements in computational/spatial efficiency is to binarize the weights and activations in a network. However, naive binarization results in accuracy drops when applied to networks for most tasks. In this work, we present a highly generalized, distribution-aware approach to binarizing deep networks that allows us to retain the advantages of a binarized network, while reducing accuracy drops. We also develop efficient implementations for our proposed approach across different architectures. We present a theoretical analysis of the technique to show the effective representational power of the resulting layers, and explore the forms of data they model best. Experiments on popular datasets show that our technique offers better accuracies than naive binarization, while retaining the same benefits that binarization provides with respect to run-time compression, reduction of computational costs, and power consumption.
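For orientation, the naive baseline that this abstract contrasts against can be sketched as follows. This is a minimal illustration, not the distribution-aware method of the cited paper: the function name `binarize` and the single per-tensor scale `alpha` (mean absolute weight) are assumptions made here for the example. It also shows where the "≈32x" figure quoted above comes from when 32-bit floats are replaced by 1-bit signs.

```python
import numpy as np


def binarize(weights: np.ndarray):
    """Naive binarization: keep the sign of each weight plus one scale.

    Hypothetical helper for illustration; the per-tensor scale is an
    assumption, not something taken from the cited paper.
    """
    alpha = float(np.mean(np.abs(weights)))    # one scale for the whole tensor
    signs = np.where(weights >= 0, 1.0, -1.0)  # every weight becomes +1 or -1
    return signs, alpha


rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)

signs, alpha = binarize(w)
w_hat = alpha * signs                          # coarse approximation of w

# Storage: 32 bits per float weight versus 1 bit per sign,
# ignoring the cost of storing the single scale alpha.
storage_ratio = 32 / 1
```

The approximation `w_hat` discards all magnitude information except the average, which is one way to see why the accuracy drops mentioned in the abstract can occur.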


“…[7,9,22]). Building on the observation that weight matrices are often redundant, another line of research has proposed to use matrix factorization [10,15,35] in order to decompose large weight matrices into factors of smaller matrices before inference.…”
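As a rough illustration of the matrix factorization idea in this statement, the sketch below replaces one weight matrix with two smaller factors obtained from a truncated SVD. It is a generic example under stated assumptions, not the method of any cited work: the helper name `low_rank_factors`, the layer sizes, and the chosen rank are all hypothetical.

```python
import numpy as np


def low_rank_factors(weight: np.ndarray, rank: int):
    """Split an (m, n) matrix into factors of shape (m, rank) and (rank, n).

    Hypothetical helper: keeps the top `rank` singular triplets.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # absorb singular values into the left factor
    b = vt[:rank, :]
    return a, b


rng = np.random.default_rng(0)
m, n, r = 64, 48, 8

# A matrix that is exactly rank r, so the factorization is lossless here.
weight = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))
a, b = low_rank_factors(weight, r)

x = rng.normal(size=n)
y_full = weight @ x              # original layer: m * n multiply-adds
y_fact = a @ (b @ x)             # factored layer: r * (m + n) multiply-adds

params_full = m * n              # 3072 parameters
params_fact = r * (m + n)        # 896 parameters
```

Real weight matrices are only approximately low rank, so in practice the truncation introduces an error that depends on the discarded singular values.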

Section: Related Work (mentioning)

confidence: 99%

Training Compact Deep Learning Models for Video Classification Using Circulant Matrices

Araújo, Négrevergne, Chevaleyre, et al. 2019

Lecture Notes in Computer Science


In real world scenarios, model accuracy is hardly the only factor to consider. Large models consume more memory and are computationally more intensive, which makes them difficult to train and to deploy, especially on mobile devices. In this paper, we build on recent results at the crossroads of Linear Algebra and Deep Learning which demonstrate how imposing a structure on large weight matrices can be used to reduce the size of the model. We propose very compact models for video classification based on state-of-the-art network architectures such as Deep Bag-of-Frames, NetVLAD and NetFisherVectors. We then conduct thorough experiments using the large YouTube-8M video classification dataset. As we will show, the circulant DBoF embedding achieves an excellent trade-off between size and accuracy.
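To make the idea of imposing structure on a weight matrix concrete, here is a small sketch of a circulant matrix-vector product. It assumes the common convention that the matrix is defined by its first column `c`, so that entry (i, j) is `c[(i - j) mod n]`; the function name `circulant_matvec` is hypothetical and this is not the cited paper's implementation. Such a layer stores n values instead of n², and the product reduces to a circular convolution that can be computed with FFTs.

```python
import numpy as np


def circulant_matvec(c: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Multiply the circulant matrix with first column c by the vector x.

    Hypothetical helper: uses the convolution theorem, so the cost is
    O(n log n) instead of O(n^2).
    """
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))


rng = np.random.default_rng(0)
n = 16
c = rng.normal(size=n)           # the only n parameters the layer stores
x = rng.normal(size=n)

# Dense reference built explicitly, for checking only.
dense = np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])

y_fast = circulant_matvec(c, x)
y_dense = dense @ x
```

The dense matrix is constructed here purely to verify the fast path; a compact model would never materialize it.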


“…Apart from pruning, other techniques for CNN acceleration include quantization [10,6], knowledge distillation [16,39], tensor decomposition [11,38] and low-bit arithmetic [35,34]. These methods are complementary and perpendicular to our pruning-based method, so we do not cover these approaches in the experiments, as a common practice in other works [21,20].…”

Section: Related Work (mentioning)

confidence: 99%

“…Neural network compression and acceleration is an effective solution to this problem. Several neural network compression techniques have been proposed during the past years, for example, knowledge distillation [16,39], tensor decomposition [11,38], quantization [10,6], and low-bit arithmetic [35,34]. Among these techniques, pruning is an important approach.…”

Section: Introduction (mentioning)

confidence: 99%

A One-step Pruning-recovery Framework for Acceleration of Convolutional Neural Networks

Wang, Bai, Zhou, et al. 2019

2019 IEEE 31st International Conference on Tools With Artificial Intelligence (ICTAI)


Acceleration of convolutional neural networks has received increasing attention during the past several years. Among various acceleration techniques, filter pruning has its inherent merit by effectively reducing the number of convolution filters. However, most filter pruning methods resort to a tedious and time-consuming layer-by-layer pruning-recovery strategy to avoid a significant drop in accuracy. In this paper, we present an efficient filter pruning framework to solve this problem. Our method accelerates the network in a one-step pruning-recovery manner with a novel optimization objective function, which achieves higher accuracy at much lower cost compared with existing pruning methods. Furthermore, our method allows network compression with global filter pruning. Given a global pruning rate, it can adaptively determine the pruning rate for each single convolutional layer, while these rates are often set as hyper-parameters in previous approaches. Evaluated on VGG-16 and ResNet-50 using ImageNet, our approach outperforms several state-of-the-art methods with less accuracy drop under the same or even far fewer floating-point operations (FLOPs).
