We are witnessing an explosive development and widespread application of deep neural networks (DNNs) in various fields. However, DNN models, especially convolutional neural networks (CNNs), usually involve massive parameters and are computationally expensive, making them heavily dependent on high-performance hardware. This prohibits their wider deployment, e.g., in applications on mobile devices. In this paper, we present Quantized CNN, a unified approach to accelerating and compressing convolutional networks. Guided by minimizing the approximation error of each layer's response, both fully connected and convolutional layers are carefully quantized. Inference can be carried out efficiently on the quantized network, with much lower memory and storage consumption. Quantitative evaluation on two publicly available benchmarks demonstrates the promising performance of our approach: with comparable classification accuracy, it achieves $4\times$ to $6\times$ acceleration and $15\times$ to $20\times$ compression. With our method, accurate image classification can even be carried out directly on mobile devices within one second.
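To make the layer-wise quantization idea concrete, below is a minimal sketch of product-quantizing a fully connected layer and computing its approximate response via codeword lookups, which is the source of both the compression (codes instead of floats) and the speed-up (precomputed inner products). All names (`quantize_fc_layer`, `approx_response`) and parameters (`n_subspaces`, `n_codewords`) are illustrative, and plain k-means on the weights is used here as a simplification; the paper's actual objective minimizes the error of the layer's *response*, and it also covers convolutional layers, neither of which this sketch reproduces.

```python
import numpy as np
from sklearn.cluster import KMeans


def quantize_fc_layer(W, n_subspaces=4, n_codewords=16):
    """Product-quantize a weight matrix W (d_in x d_out).

    Each column of W is split into n_subspaces sub-vectors; one
    k-means codebook is learned per subspace, and each sub-vector
    is replaced by the index of its nearest codeword.
    """
    d_in, d_out = W.shape
    assert d_in % n_subspaces == 0
    d_sub = d_in // n_subspaces
    codebooks, codes = [], []
    for s in range(n_subspaces):
        sub = W[s * d_sub:(s + 1) * d_sub, :].T   # (d_out, d_sub) sub-vectors
        km = KMeans(n_clusters=n_codewords, n_init=10).fit(sub)
        codebooks.append(km.cluster_centers_)      # (n_codewords, d_sub)
        codes.append(km.labels_)                   # (d_out,) codeword indices
    return codebooks, codes


def approx_response(x, codebooks, codes):
    """Approximate the layer response W^T x from the quantized weights.

    Inner products with each codeword are computed once per subspace,
    then reused via table lookup for every output unit, replacing most
    multiplications with indexing.
    """
    d_sub = codebooks[0].shape[1]
    y = np.zeros(len(codes[0]))
    for s, (cb, idx) in enumerate(zip(codebooks, codes)):
        x_sub = x[s * d_sub:(s + 1) * d_sub]
        table = cb @ x_sub        # one inner product per codeword
        y += table[idx]           # lookup instead of per-weight multiply
    return y


# Quick sanity check against the exact response (hypothetical sizes).
W = np.random.randn(256, 128)
x = np.random.randn(256)
cb, idx = quantize_fc_layer(W, n_subspaces=4, n_codewords=16)
print(np.linalg.norm(W.T @ x - approx_response(x, cb, idx)))
```

With `n_subspaces=4` and 16 codewords, each 64-dimensional sub-vector is stored as a 4-bit index plus a small shared codebook, which is where compression ratios on the order of those reported above come from.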