Structured sparse ternary weight coding of deep neural networks for efficient hardware implementations

Boo, Yoonho; Sung, Wonyong

doi:10.1109/sips.2017.8110021

Cited by 11 publications

(9 citation statements)

References 11 publications

“…There are few studies on pruning with considering the divergent distributions of remaining non-zero weights. The study presented in [33] considers the distribution of nonzero weights, but deals with only the fully-connected layers. Furthermore, the target was to reduce the width of the ternary weight coding without consideration of the accelerator architectures.…”

Section: E Previous Pruning Scheme and Accelerator Architecture (mentioning)

confidence: 99%

Accelerator-Aware Pruning for Convolutional Neural Networks

Kang

2020

IEEE Trans. Circuits Syst. Video Technol.

Convolutional neural networks have shown tremendous performance capabilities in computer vision tasks, but their excessive amounts of weight storage and arithmetic operations prevent them from being adopted in embedded environments. One of the solutions involves pruning, where certain unimportant weights are forced to have a value of zero. Many pruning schemes have been proposed, but these have mainly focused on the number of pruned weights, scarcely considering ASIC or FPGA accelerator architectures. When a pruned network is run on an accelerator, the lack of architecture consideration causes some inefficiency problems, including internal buffer misalignments and load imbalances. This paper proposes a new pruning scheme that reflects accelerator architectures. In the proposed scheme, pruning is performed so that the same number of weights remain for each weight group corresponding to activations fetched simultaneously. In this way, the pruning scheme resolves the inefficiency problems, doubling the accelerator performance. Even with this constraint, the proposed pruning scheme reached a pruning ratio similar to that of previous unconstrained pruning schemes, not only on AlexNet and VGG16 but also on state-of-the-art very deep networks such as ResNet. Furthermore, the proposed scheme demonstrated a comparable pruning ratio on compact networks such as MobileNet and on slimmed networks that were already pruned in a channel-wise manner. In addition to improving the efficiency of previous sparse accelerators, it will also be shown that the proposed pruning scheme can be used to reduce the logic complexity of sparse accelerators.

Index Terms: deep learning, convolutional neural networks, neural network pruning, neural network accelerator
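
The group constraint described in this abstract (the same number of weights remaining in every weight group) can be illustrated with a minimal sketch. The abstract does not say how the surviving weights are chosen, so the magnitude-based selection below is an assumption, as are the function name `prune_balanced` and the parameters `group_size` and `keep`; this is not the authors' implementation.

```python
def prune_balanced(weights, group_size, keep):
    """Zero all but `keep` weights in every consecutive group of `group_size`.

    Assumption: the survivors are the largest-magnitude weights in each
    group. The abstract only states that every group keeps the same count.
    """
    if group_size <= 0 or len(weights) % group_size != 0:
        raise ValueError("weight count must be a positive multiple of group_size")
    if not 0 <= keep <= group_size:
        raise ValueError("keep must be between 0 and group_size")

    pruned = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Rank positions by descending magnitude; ties go to the lower index.
        ranked = sorted(range(group_size), key=lambda i: (-abs(group[i]), i))
        kept = set(ranked[:keep])
        pruned.extend(w if i in kept else 0.0 for i, w in enumerate(group))
    return pruned


weights = [0.9, -0.1, 0.05, 0.4, -0.7, 0.2, 0.3, -0.01]
pruned = prune_balanced(weights, group_size=4, keep=2)

# Every group now holds exactly `keep` non-zero weights.
nonzeros_per_group = [
    sum(1 for w in pruned[s:s + 4] if w != 0.0) for s in range(0, len(pruned), 4)
]
```

Because every group ends up with the same non-zero count, a hardware unit that processes one group per step would do the same amount of work each time, which is the load-balance property the abstract refers to.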

“…A similar scheme was presented in [33], but the regularity was used only to reduce the amount of weight storage. They did not consider the effect on the accelerator architecture.…”

Section: A Proposed Accelerator-aware Pruning Scheme (mentioning)

confidence: 99%

Accelerator-Aware Pruning for Convolutional Neural Networks

Kang

2020

IEEE Trans. Circuits Syst. Video Technol.

“…Interest in low precision CNNs has dramatically increased in recent years due to research which has shown that similar accuracy to floating point can be achieved [Boo and Sung 2017; Courbariaux et al 2016; Faraone et al 2017; Mellempudi et al 2017; Rastegari et al 2016;. Due to the high computational requirements of CNNs, reduced precision implementations offer opportunities to reduce hardware costs and training times.…”

Section: Low Precision Network (mentioning)

confidence: 99%

“…The computational complexity of convolutional neural networks (CNN) imposes limits to certain applications in practice [Jouppi et al 2017]. There are many approaches to this problem with a common strategy for the inference problem being to reduce the precision of arithmetic operations, or to increase sparsity [Boo and Sung 2017; Courbariaux et al 2016; Faraone et al 2017; Mellempudi et al 2017; Rastegari et al 2016;. It has been shown that low precision networks can achieve comparable performance to their full precision counterparts [Courbariaux et al 2016;].…”

Section: Introduction (mentioning)

confidence: 99%

Unrolling Ternary Neural Networks

Tridgell

Kumm

Hardieck

et al. 2019

ACM Trans. Reconfigurable Technol. Syst.

The computational complexity of neural networks for large scale or real-time applications necessitates hardware acceleration. Most approaches assume that the network architecture and parameters are unknown at design time, permitting usage in a large number of applications. This paper demonstrates, for the case where the neural network architecture and ternary weight values are known a priori, that extremely high throughput implementations of neural network inference can be made by customising the datapath and routing to remove unnecessary computations and data movement. This approach is ideally suited to FPGA implementations as a specialized implementation of a trained network improves efficiency while still retaining generality with the reconfigurability of an FPGA. A VGG style network with ternary weights and fixed point activations is implemented for the CIFAR10 dataset on Amazon's AWS F1 instance. This paper demonstrates how to remove 90% of the operations in convolutional layers by exploiting sparsity and compile-time optimizations. The implementation in hardware achieves 90.9 ± 0.1% accuracy and 122 k frames per second, with a latency of only 29 µs, which is the fastest CNN inference implementation reported so far on an FPGA.
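
Why known ternary weights allow operations to be removed can be shown with a small software sketch. When every weight is -1, 0, or +1 and fixed ahead of time, a dot product needs no multiplications, and zero weights contribute nothing, so they can be dropped before any input arrives. The paper targets an FPGA datapath; the Python below is only an analogy, and the names `compile_ternary_row` and `ternary_dot` are hypothetical.

```python
def compile_ternary_row(weights):
    """Precompute which input positions are added and which are subtracted.

    This stands in for the 'design time' step: zero weights are discarded
    here, so they cost nothing at inference time.
    """
    for w in weights:
        if w not in (-1, 0, 1):
            raise ValueError("weights must be ternary: -1, 0, or +1")
    plus = [i for i, w in enumerate(weights) if w == 1]
    minus = [i for i, w in enumerate(weights) if w == -1]
    return plus, minus


def ternary_dot(compiled, activations):
    """Dot product using only additions and subtractions."""
    plus, minus = compiled
    return sum(activations[i] for i in plus) - sum(activations[i] for i in minus)


weights = [1, 0, 0, -1, 0, 1, 0, 0]
activations = [3, 7, 2, 5, 1, 4, 9, 6]

compiled = compile_ternary_row(weights)
result = ternary_dot(compiled, activations)

# Reference: the ordinary dense dot product over all eight positions.
dense = sum(w * a for w, a in zip(weights, activations))
```

In this toy row only three of the eight positions survive compilation, so the sparse version touches three activations instead of eight while matching the dense result.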

“…For hardware implementation in embedded systems, it is important to achieve high performance and high recognition accuracy with compact network models. Boo et al. propose the structured sparsity [1], where a rule for look-up tables is applied to the training algorithm. Chen et al. propose a reconfigurable accelerator which contains a Run-Length Coding (RLC) module to compress the feature maps with consecutive zeros [4].…”

Section: Introduction (mentioning)

confidence: 99%

Condensation-Net: Memory-Efficient Network Architecture With Cross-Channel Pooling Layers and Virtual Feature Maps

Tse-Wei

Yoshinaga

Gao

et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Lightweight convolutional neural networks is an important research topic in the field of embedded vision. To implement image recognition tasks on a resource-limited hardware platform, it is necessary to reduce the memory size and the computational cost. The contribution of this paper is stated as follows. First, we propose an algorithm to process a specific network architecture (Condensation-Net) without increasing the maximum memory storage for feature maps. The architecture for virtual feature maps saves 26.5% of memory bandwidth by calculating the results of cross-channel pooling before storing the feature map into the memory. Second, we show that cross-channel pooling can improve the accuracy of object detection tasks, such as face detection, because it increases the number of filter weights. Compared with Tiny-YOLOv2, the improvement of accuracy is 2.0% for quantized networks and 1.5% for full-precision networks when the false-positive rate is 0.1. Last but not least, the analysis results show that the overhead to support the cross-channel pooling with the proposed hardware architecture is negligibly small. The extra memory cost to support Condensation-Net is 0.2% of the total size, and the extra gate count is only 1.0% of the total size.
