2017 IEEE International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2017.282

Performance Guaranteed Network Acceleration via High-Order Residual Quantization

Abstract: Input binarization has been shown to be an effective way to accelerate networks. However, previous binarization schemes can be regarded as simple pixel-wise thresholding operations (i.e., order-one approximation) and suffer a large accuracy loss. In this paper, we propose a high-order binarization scheme, which achieves a more accurate approximation while still retaining the advantage of binary operations. In particular, the proposed scheme recursively performs residual quantization and yields a series of binary inp…
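The order-one vs. high-order distinction in the abstract can be illustrated with a short NumPy sketch (the function names and the order-2 setting are illustrative, not taken from the paper): each step binarizes the residual left by the previous step, and the input is reconstructed as a linear combination of the binary terms.

```python
import numpy as np

def binarize(x):
    # Order-one approximation: x ≈ alpha * sign(x), with alpha = mean(|x|),
    # which minimizes the L2 error for the binary tensor sign(x).
    alpha = np.mean(np.abs(x))
    return alpha, np.sign(x)

def horq_decompose(x, order=2):
    # Recursively binarize the residual, producing binary terms whose
    # scales alpha_i shrink as the residual shrinks (sketch only).
    residual = x.astype(np.float64)
    terms = []
    for _ in range(order):
        alpha, b = binarize(residual)
        terms.append((alpha, b))
        residual = residual - alpha * b
    approx = sum(alpha * b for alpha, b in terms)
    return terms, approx

x = np.random.randn(4, 4)
_, approx1 = horq_decompose(x, order=1)
_, approx2 = horq_decompose(x, order=2)
# The order-2 reconstruction error is never larger than the order-1 error.
print(np.linalg.norm(x - approx1), np.linalg.norm(x - approx2))
```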

Cited by 100 publications (71 citation statements)
References 14 publications

“…BNNs [23,40] propose to constrain both weights and activations to binary values (i.e., +1 and -1), so that multiply-accumulate operations can be replaced purely by xnor(·) and popcount(·) operations. To trade off accuracy against complexity, [13,15,29,48] propose to recursively perform residual quantization, yielding a series of binary tensors with decreasing magnitude scales. However, the multiple binarizations form a sequential process that cannot be parallelized.…”
Section: Related Work
confidence: 99%
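As a side note on the xnor(·)/popcount(·) replacement mentioned in the statement above, a toy Python sketch (helper names are hypothetical; bit = 1 encodes +1, bit = 0 encodes -1) shows how a dot product between two {-1, +1} vectors reduces to bitwise operations:

```python
import numpy as np

def binary_dot_xnor_popcount(a_bits, b_bits, n):
    # Dot product of two {-1,+1} vectors packed into Python ints of n bits.
    # matches = popcount(~(a XOR b)) over the n valid bits,
    # dot     = matches - (n - matches) = 2 * matches - n
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask
    matches = bin(xnor).count("1")  # popcount
    return 2 * matches - n

# Check against the floating-point dot product.
rng = np.random.default_rng(0)
a = rng.choice([-1, 1], size=64)
b = rng.choice([-1, 1], size=64)
pack = lambda v: int("".join("1" if x > 0 else "0" for x in v), 2)
assert binary_dot_xnor_popcount(pack(a), pack(b), 64) == int(a @ b)
```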
“…We explore the difference between layer-wise and group-wise design strategies; our approach can be treated as a kind of tensor approximation with similarities to the multiple-binarization methods in [13,15,29,30,48], and the differences are described in Sec. 4.…”
Section: Layer-wise vs. Group-wise Binary Decomposition
confidence: 99%
“…Binarization and Convolution Process of XNOR-Net. To further reduce the quantization error, High-Order Residual Quantization (HORQ) [70] adopts a recursive approximation to the full-precision activation based on the quantized residual, instead of the one-step approximation used in XNOR-Net. It generates the final quantized activation as a linear combination of the approximations from each recursive step.…”
confidence: 99%
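A rough sketch of that idea, assuming NumPy/SciPy, single-channel inputs, and an order-2 setting (the helper names are illustrative, not HORQ's actual implementation): the convolution with a full-precision input is approximated by a linear combination of binary-input convolutions, one per residual term.

```python
import numpy as np
from scipy.signal import correlate2d

def order_one_binarize(x):
    # XNOR-Net-style one-step approximation: x ≈ beta * H with H = sign(x).
    beta = np.mean(np.abs(x))
    return beta, np.sign(x)

def horq_conv2d(x, w, order=2):
    # Approximate the (cross-correlation style) convolution of x with w as a
    # linear combination of binary-input convolutions, one per residual term.
    residual, out = x.astype(np.float64), 0.0
    for _ in range(order):
        beta, h = order_one_binarize(residual)
        out = out + beta * correlate2d(h, w, mode="valid")  # binary-input conv
        residual = residual - beta * h                       # pass residual down
    return out

x, w = np.random.randn(8, 8), np.random.randn(3, 3)
exact = correlate2d(x, w, mode="valid")
err1 = np.linalg.norm(exact - horq_conv2d(x, w, order=1))
err2 = np.linalg.norm(exact - horq_conv2d(x, w, order=2))
print(f"order-1 error: {err1:.3f}, order-2 error: {err2:.3f}")
```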
“…Specifically, the quantized weights are exactly equal to αMᵀZ when Eq. 3 attains its optimal solution. In contrast, the re-training strategy with the cluster regularization introduces less quantization error than the reconstruction-based methods [11,15], since the weights W remain in a highly clustered state after re-training. To further reduce the effect of quantization error on the classification loss, we fine-tune the re-trained model for several epochs.…”
Section: The Whole Quantization Framework
confidence: 99%
“…Rastegari et al. presented XNOR-Net [11], which approximated the full-precision weights by introducing a scaling factor during binarization. To pursue higher accuracy, High-Order Residual Quantization (HORQ) [15] sought to compensate for the information loss of binary quantization by conducting convolutional operations on inputs at different scales and then combining the results. The Ternary Weight Network (TWN) [12] introduced zero as a third quantized value and was the first such method to achieve decent results on the ILSVRC-12 dataset.…”
Section: Related Work
confidence: 99%
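For reference, a hedged sketch of the two weight-quantization schemes mentioned in this statement (the scale and threshold formulas below follow the commonly cited approximations; exact details may differ in the original papers):

```python
import numpy as np

def xnor_binarize_weights(w):
    # XNOR-Net-style weight approximation: W ≈ alpha * B, with B = sign(W)
    # and alpha = mean(|W|) minimizing the L2 reconstruction error.
    alpha = np.mean(np.abs(w))
    return alpha, np.sign(w)

def ternarize_weights(w, delta_scale=0.7):
    # TWN-style ternary weights {-1, 0, +1}; delta ≈ 0.7 * mean(|W|) and the
    # scale alpha is the mean magnitude of the surviving weights (assumed
    # approximation, not a verbatim reproduction of the TWN procedure).
    delta = delta_scale * np.mean(np.abs(w))
    t = np.where(np.abs(w) > delta, np.sign(w), 0.0)
    alpha = np.abs(w[t != 0]).mean() if np.any(t != 0) else 0.0
    return alpha, t

w = np.random.randn(64, 3, 3)
a_bin, b = xnor_binarize_weights(w)
a_ter, t = ternarize_weights(w)
print("binary reconstruction error :", np.linalg.norm(w - a_bin * b))
print("ternary reconstruction error:", np.linalg.norm(w - a_ter * t))
```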