2017 IEEE International Workshop on Signal Processing Systems (SiPS)
DOI: 10.1109/sips.2017.8110021
Structured sparse ternary weight coding of deep neural networks for efficient hardware implementations

Abstract: Deep neural networks (DNNs) usually demand a large number of operations for real-time inference. In particular, fully-connected layers contain a large number of weights and therefore require many off-chip memory accesses during inference. We propose a weight compression method for deep neural networks that allows values of +1 or -1 only at predetermined positions of the weights, so that decoding using a table can be conducted easily. For example, the structured sparse (8,2) coding allows at most two non-zero values …
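The (8,2) coding sketched in the abstract constrains each block of eight weights to at most two non-zero entries, each restricted to +1 or -1, so a decoder only needs a small table of allowed position patterns plus sign bits. The sketch below illustrates that idea under assumed names and layout (encode_block, decode_block, and the pattern enumeration are illustrative choices, not the paper's implementation):

```python
from itertools import combinations

import numpy as np

BLOCK = 8   # weights per block
MAX_NZ = 2  # at most two non-zero weights per block, i.e. (8,2) coding

# All allowed support patterns: C(8,0) + C(8,1) + C(8,2) = 37 entries.
PATTERNS = [p for k in range(MAX_NZ + 1) for p in combinations(range(BLOCK), k)]

def encode_block(block):
    """Map a ternary block with values in {-1, 0, +1} to (pattern index, sign bits)."""
    support = tuple(int(i) for i in np.flatnonzero(block))
    idx = PATTERNS.index(support)          # which positions are non-zero
    signs = [1 if block[i] > 0 else 0 for i in support]
    return idx, signs

def decode_block(idx, signs):
    """Rebuild the ternary block from the table entry and its sign bits."""
    block = np.zeros(BLOCK, dtype=np.int8)
    for pos, s in zip(PATTERNS[idx], signs):
        block[pos] = 1 if s else -1
    return block

w = np.array([0, 0, 1, 0, 0, -1, 0, 0], dtype=np.int8)
idx, signs = encode_block(w)
assert np.array_equal(decode_block(idx, signs), w)
```

Under this particular layout, the 37 patterns fit in a 6-bit index, so a block of eight weights costs at most 8 bits (index plus two sign bits); the actual bit allocation in the paper may differ.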

Cited by 11 publications (9 citation statements). References 11 publications.
“…There are few studies on pruning that consider the divergent distributions of the remaining non-zero weights. The study presented in [33] considers the distribution of non-zero weights, but deals only with the fully-connected layers. Furthermore, its target was to reduce the width of the ternary weight coding without considering the accelerator architecture.…”
Section: E. Previous Pruning Scheme and Accelerator Architecture (mentioning)
confidence: 99%
“…A similar scheme was presented in [33], but the regularity was used only to reduce the amount of weight storage. They did not consider the effect on the accelerator architecture.…”
Section: A. Proposed Accelerator-aware Pruning Scheme (mentioning)
confidence: 99%
“…Interest in low-precision CNNs has increased dramatically in recent years, as research has shown that accuracy similar to floating point can be achieved [Boo and Sung 2017; Courbariaux et al. 2016; Faraone et al. 2017; Mellempudi et al. 2017; Rastegari et al. 2016]. Due to the high computational requirements of CNNs, reduced-precision implementations offer opportunities to reduce hardware costs and training times.…”
Section: Low Precision Network (mentioning)
confidence: 99%
“…The computational complexity of convolutional neural networks (CNNs) imposes limits on certain applications in practice [Jouppi et al. 2017]. There are many approaches to this problem; a common strategy for inference is to reduce the precision of arithmetic operations or to increase sparsity [Boo and Sung 2017; Courbariaux et al. 2016; Faraone et al. 2017; Mellempudi et al. 2017; Rastegari et al. 2016]. It has been shown that low-precision networks can achieve performance comparable to their full-precision counterparts [Courbariaux et al. 2016].…”
Section: Introduction (mentioning)
confidence: 99%
“…For hardware implementation in embedded systems, it is important to achieve high performance and high recognition accuracy with compact network models. Boo and Sung propose the structured sparsity [1], where a rule for look-up tables is applied to the training algorithm. Chen et al. propose a reconfigurable accelerator which contains a Run-Length Coding (RLC) module to compress the feature maps with consecutive zeros [4].…”
Section: Introduction (mentioning)
confidence: 99%
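The Run-Length Coding mentioned for [4] exploits runs of consecutive zeros in feature maps. The snippet below is only a rough sketch of that idea; the function names and the (zero-run, value) pairing are assumptions and do not reproduce the bit widths or packing used by the accelerator in [4]:

```python
import numpy as np

def rlc_encode_zeros(row):
    """Encode a 1-D feature-map row as (zero_run_length, value) pairs (illustrative only)."""
    pairs, run = [], 0
    for v in row:
        if v == 0:
            run += 1
        else:
            pairs.append((run, int(v)))
            run = 0
    if run:
        pairs.append((run, 0))  # trailing zeros, flagged with a zero value
    return pairs

def rlc_decode_zeros(pairs, length):
    """Expand (zero_run_length, value) pairs back into a dense row."""
    row = []
    for run, v in pairs:
        row.extend([0] * run)
        if v != 0:
            row.append(v)
    row.extend([0] * (length - len(row)))
    return np.array(row)

x = np.array([0, 0, 3, 0, 0, 0, 5, 0])
assert np.array_equal(rlc_decode_zeros(rlc_encode_zeros(x), len(x)), x)
```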