Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture 2017
DOI: 10.1145/3123939.3124552
CirCNN

Abstract: Large-scale deep neural networks (DNNs) are both compute and memory intensive. As the size of DNNs continues to grow, it is critical to improve the energy efficiency and performance while maintaining accuracy. For DNNs, the model size is an important factor affecting performance, scalability and energy efficiency. Weight pruning achieves good compression ratios but suffers from three drawbacks: 1) the irregular network structure after pruning, which affects performance and throughput; 2) the increased training…

Cited by 179 publications (12 citation statements)
References 70 publications
“…We do an extensive comparison of HMD with two other compression techniques: model pruning and matrix factorization. Additionally, we also compared HMD with a structured matrix-based compression technique called block circular decomposition (BCD) [2,9]. BCD-compressed networks were able to recover the baseline accuracy for 2×–4× compression.…”
Section: Results (mentioning, confidence: 99%)
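For context on the 2×–4× figures quoted above, here is a minimal sketch (not from the cited works; the function name and shapes are illustrative) of why the block size of a block circulant decomposition directly sets the compression ratio of a fully-connected weight matrix: each b × b circulant block is determined by a single length-b vector.

```python
# Sketch: parameter count of a dense n x m weight matrix versus its block
# circulant decomposition (BCD) with b x b circulant blocks. Each block is
# fully determined by one length-b defining vector.

def bcd_params(n: int, m: int, b: int) -> int:
    assert n % b == 0 and m % b == 0, "block size must divide both dimensions"
    return (n // b) * (m // b) * b  # = n * m / b

n, m = 1024, 1024
for b in (2, 4, 8):
    ratio = (n * m) / bcd_params(n, m, b)
    print(f"block size {b}: {ratio:.0f}x fewer parameters")
# block size 2: 2x fewer parameters
# block size 4: 4x fewer parameters
# block size 8: 8x fewer parameters
```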
“…CirCNN networks. The CirCNN implementation of neural networks introduced in [2] is a promising approach to reducing the number of parameters while preserving networks' topologies. This is done by replacing matrices and convolution kernels in a neural network with block-circulant matrices and block-circulant convolution kernels.…”
Section: Notations and Preliminaries (mentioning, confidence: 99%)
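As a rough sketch of the computation this structure enables (a minimal illustration using the standard circulant-matrix FFT identity, not the authors' implementation; all names below are illustrative): a weight matrix partitioned into b × b circulant blocks can be multiplied with an input vector block by block in the Fourier domain, so each block costs O(b log b) instead of O(b²) and only its defining vector is stored.

```python
import numpy as np
from scipy.linalg import circulant  # used only for the correctness check

def block_circulant_matvec(defs: np.ndarray, x: np.ndarray) -> np.ndarray:
    """y = W @ x where W consists of p x q circulant blocks of size b.

    defs[i, j] is the defining vector (first column) of block (i, j),
    so defs has shape (p, q, b) and W has shape (p*b, q*b).
    """
    p, q, b = defs.shape
    Xf = np.fft.fft(x.reshape(q, b), axis=-1)   # FFT of each input block
    Wf = np.fft.fft(defs, axis=-1)              # FFT of each defining vector
    Yf = (Wf * Xf[None, :, :]).sum(axis=1)      # accumulate over block columns
    return np.fft.ifft(Yf, axis=-1).real.reshape(p * b)

# Check against an explicitly assembled dense block-circulant matrix.
rng = np.random.default_rng(0)
p, q, b = 2, 3, 4
defs = rng.standard_normal((p, q, b))
x = rng.standard_normal(q * b)
W = np.block([[circulant(defs[i, j]) for j in range(q)] for i in range(p)])
assert np.allclose(W @ x, block_circulant_matvec(defs, x))
```

In this form the layer stores p·q·b weights instead of p·q·b² and trades the dense O(p·q·b²) multiply for O(p·q·b log b) FFT work.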
“…For a convolution layer with kernel W = (W_{ijkℓ}), where the k and ℓ indices represent the input and output channels and the i and j indices represent 2D kernels, we say W is block-circulant if for each fixed i, j the resulting matrix (W_{ijkℓ})_{kℓ} is block-circulant. As suggested in [2], for a single layer the block size should be a constant, while one can choose different block sizes for different layers. Using the CirCNN implementation can significantly reduce the number of learnable parameters.…”
Section: Notations and Preliminaries (mentioning, confidence: 99%)
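To make the layer-wise constraint in this excerpt concrete, here is a small sketch (illustrative names and shapes, not code from [2]) that expands free parameters into a kernel W[i, j, k, ℓ] whose channel matrix is block-circulant for every fixed spatial position (i, j), and shows the resulting parameter reduction.

```python
import numpy as np

def block_circulant_conv_kernel(free: np.ndarray) -> np.ndarray:
    """Expand free parameters into a kernel W of shape (kh, kw, Cin, Cout)
    such that W[i, j, :, :] is block-circulant for every fixed (i, j).

    free has shape (kh, kw, Cin // b, Cout // b, b): one defining vector per
    b x b circulant block and per spatial position.
    """
    kh, kw, p, q, b = free.shape
    idx = (np.arange(b)[:, None] - np.arange(b)[None, :]) % b  # circulant index map
    blocks = free[..., idx]                                    # (kh, kw, p, q, b, b)
    return blocks.transpose(0, 1, 2, 4, 3, 5).reshape(kh, kw, p * b, q * b)

kh = kw = 3
c_in = c_out = 64
b = 4
free = np.random.randn(kh, kw, c_in // b, c_out // b, b)
W = block_circulant_conv_kernel(free)
print(W.shape)             # (3, 3, 64, 64)
print(W.size // free.size) # 4: the kernel has b times more entries than stored parameters
# Each b x b channel block is circulant: column 1 is the defining vector rolled by 1.
assert np.allclose(W[0, 0, :b, 1], np.roll(W[0, 0, :b, 0], 1))
```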
“…Structured matrices have shown significant potential for compression of NNs (Sindhwani et al., 2015; Ding et al., 2017; Cheng et al., 2015; Thakker et al., 2020). Block circular compression is an extension of the structured-matrix-based compression technique, converting every block in a matrix into a structured matrix.…”
Section: Related Work (mentioning, confidence: 99%)