2018
DOI: 10.1109/tnnls.2017.2774288

Quantized CNN: A Unified Approach to Accelerate and Compress Convolutional Networks

Abstract: We are witnessing an explosive development and widespread application of deep neural networks (DNNs) in various fields. However, DNN models, especially convolutional neural networks (CNNs), usually involve massive parameters and are computationally expensive, making them extremely dependent on high-performance hardware. This prohibits their further extensions, e.g., applications on mobile devices. In this paper, we present a quantized CNN, a unified approach to accelerate and compress convolutional networks. G…
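The abstract is truncated above, but the citing excerpts further down describe the paper as exploring scalar and vector quantization [4]. Purely as a hedged illustration of that general family (not the paper's actual algorithm), the sketch below quantizes a fully connected layer's weight matrix with a small learned codebook over weight sub-vectors; every function name and hyperparameter here (sub_dim, k) is an assumption chosen for the example.

```python
# Hedged illustration of codebook (vector) quantization of a weight matrix.
# NOT the paper's exact algorithm; names and hyperparameters are assumptions.
import numpy as np

def kmeans(x, k=256, iters=20, seed=0):
    """Plain k-means over the rows of x; returns (codebook, assignments)."""
    rng = np.random.default_rng(seed)
    codebook = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance between every sub-vector and every codeword.
        d2 = (x * x).sum(1, keepdims=True) - 2.0 * x @ codebook.T + (codebook * codebook).sum(1)
        assign = d2.argmin(1)
        for c in range(k):                      # move each codeword to the mean of its members
            members = x[assign == c]
            if len(members):
                codebook[c] = members.mean(0)
    return codebook, assign

def quantize_fc(W, sub_dim=4, k=256):
    """Split each column of W (in_dim x out_dim) into sub_dim-long pieces and quantize them."""
    in_dim, out_dim = W.shape
    assert in_dim % sub_dim == 0
    sub = W.reshape(in_dim // sub_dim, sub_dim, out_dim).transpose(0, 2, 1).reshape(-1, sub_dim)
    codebook, assign = kmeans(sub, k)
    W_hat = (codebook[assign]
             .reshape(in_dim // sub_dim, out_dim, sub_dim)
             .transpose(0, 2, 1)
             .reshape(in_dim, out_dim))
    # Only `assign` (one small integer per sub-vector) and `codebook` need to be stored.
    return W_hat, codebook, assign

W = np.random.randn(512, 256).astype(np.float32)
W_hat, codebook, codes = quantize_fc(W)
print("relative reconstruction error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```

Compression comes from storing one small integer index per sub-vector plus the codebook instead of full-precision weights; acceleration schemes additionally reuse precomputed inner products between inputs and codewords, which this sketch does not attempt.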

Cited by 107 publications (63 citation statements)
References 11 publications
“… , $C_y$, and $W_{c_y}$ is the $c_y$-th column vector of $W$, and $C_x = C_y = 1024$ is selected for $f^{(14)}(\cdot)$. In (10), $\{f^{(i)}(\cdot)\}_{i \in \{2,5,8,11\}}$ represent the convolutional layers. …”
Section: Network Architecture and Training (mentioning)
confidence: 99%
“…where $d_x \times d_y$ is the size of the convolutional kernel, and $V_x \times V_y$ is the size of the response of a convolutional layer. $W_{v_y, p_k} \in \mathbb{R}^{V_x}$ denotes the weights of the $v_y$-th convolutional kernel, and $X_{p_x} \in \mathbb{R}^{V_x}$ is the input feature map at spatial position $p_x$. Hence we define $p_x$ and $p_k$ as the 2-D spatial positions in the feature maps and convolutional kernels, respectively [13], [14]. In the proposed architecture, we use 256 filters, the first two of which are of size 5 × 5 and the remaining two of size 3 × 3. …”
Section: Network Architecture and Training (mentioning)
confidence: 99%
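The "where…" clause in this excerpt refers to a response equation that did not survive extraction. Purely as a hedged reconstruction in the excerpt's own symbols (the response name $Y$ and the alignment $p_x = p_y + p_k$ are assumptions, and the citing paper's exact form may differ), the generic convolutional response it describes is

\[
  Y_{p_y, v_y} \;=\; \sum_{p_k} \big\langle W_{v_y, p_k},\, X_{p_x} \big\rangle , \qquad p_x = p_y + p_k ,
\]

i.e., the output of kernel $v_y$ at position $p_y$ is a sum of inner products between kernel slices and the input feature vectors they overlap.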
“…In order to accelerate inference and compress the size of DNN models, many network quantization methods have been proposed. Some studies focus on scalar and vector quantization [4,7], while others center on fixed-point quantization [18,19].…”
Section: Neural Network Quantization (mentioning)
confidence: 99%
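The excerpt above contrasts two quantization families. As a hedged sketch of that distinction only (not code from any of the cited works; the function names, bit-width, and codebook size are assumptions): scalar/codebook quantization replaces each weight by the nearest entry of a small learned set of values, while fixed-point quantization snaps weights onto a uniform integer grid with a single scale factor.

```python
# Hedged sketch of the two quantization families contrasted above (illustrative only).
import numpy as np

def fixed_point_quantize(w, bits=8):
    """Symmetric uniform quantization: w is approximated by scale * q with integer q."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale                       # store integers plus one scale per tensor

def scalar_quantize(w, k=16, iters=25, seed=0):
    """1-D k-means over individual weight values; store per-weight codebook indices."""
    rng = np.random.default_rng(seed)
    flat = w.ravel()
    centers = flat[rng.choice(flat.size, size=k, replace=False)].astype(np.float64)
    for _ in range(iters):
        assign = np.abs(flat[:, None] - centers[None, :]).argmin(1)
        for c in range(k):                # recentre each codebook entry on its members
            if np.any(assign == c):
                centers[c] = flat[assign == c].mean()
    return assign.reshape(w.shape), centers

w = np.random.randn(64, 64).astype(np.float32)
q, scale = fixed_point_quantize(w)
idx, centers = scalar_quantize(w)
print("fixed-point max error:", np.abs(w - scale * q).max())
print("codebook   max error:", np.abs(w - centers[idx]).max())
```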
“…Recently, many neural network quantization methods have been proposed. Gong Y. et al. [7] and Cheng J. et al. [4] explored scalar and vector quantization methods for compressing DNNs. Zhou A. et al. [18] and Zhou S. et al. [19] proposed fixed-point quantization methods.…”
Section: Related Work (mentioning)
confidence: 99%
“…Recently, a variety of CNN compression methods have been proposed to tackle the aforementioned issues, such as quantization [9], [10], weight and feature approximation [11], encoding [12], approximation [13], and pruning [14], [15]. Among these, weight-pruning-based methods achieve the highest compression performance, since most pre-trained CNNs contain a considerable number of small-magnitude weights.…”
Section: Introduction (mentioning)
confidence: 99%
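The observation at the end of this excerpt (pre-trained CNNs contain many small-magnitude weights) is the premise of magnitude-based pruning. A minimal sketch follows; it is purely illustrative and not the specific pruning methods cited as [14], [15], and the function name and 90% sparsity target are assumptions.

```python
# Minimal magnitude-pruning sketch: zero the smallest-magnitude weights and keep a mask.
# Illustrative only; not the specific pruning methods cited as [14], [15].
import numpy as np

def magnitude_prune(w, sparsity=0.9):
    """Zero out the fraction `sparsity` of weights with the smallest absolute value."""
    threshold = np.quantile(np.abs(w), sparsity)
    mask = (np.abs(w) >= threshold).astype(w.dtype)
    return w * mask, mask

w = np.random.randn(256, 256).astype(np.float32)
w_pruned, mask = magnitude_prune(w, sparsity=0.9)
print("fraction of weights kept:", mask.mean())   # roughly 0.10
```

The surviving weights are then typically fine-tuned and stored in a sparse format (e.g., CSR), which is where the compression gain comes from.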