Computational scalability allows neural networks on embedded systems to deliver the required inference performance while satisfying strict power and computational resource constraints. This paper presents a simple yet scalable inference method called ProgressiveNN, consisting of bitwise binary (BWB) quantization, accumulative bit-serial (ABS) inference, and batch normalization (BN) retraining. ProgressiveNN requires no modification of the network structure and obtains all network parameters from a single training run. BWB quantization decomposes each parameter into a bitwise format for ABS inference, which then consumes the parameters in most-significant-bit-first order, enabling progressive inference. The evaluation results show that the proposed method provides computational scalability from 12.5% to 100% for ResNet18 on CIFAR-10/100 with a single set of network parameters. They also show that BN retraining suppresses the accuracy degradation of inference performed at low computational cost, restoring inference accuracy to 65% at 1-bit width. This paper also presents a method that dynamically adjusts the bit precision of ProgressiveNN to achieve a better trade-off between computational resource use and accuracy for practical applications that process sequential data with proximity resemblance. The evaluation results indicate that this method increases accuracy by 1.3% at an average bit length of 2 compared with the fixed 2-bit BWB network.
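To illustrate the idea of bit-plane decomposition with MSB-first accumulation, the following minimal NumPy sketch applies it to a single linear layer. It is an assumption-laden illustration, not the paper's implementation: the function names (bwb_quantize, abs_matvec), the weight range of [-1, 1), and the per-bit scaling are all choices made here for the example.

# Minimal NumPy sketch of bitwise binary (BWB) quantization and
# accumulative bit-serial (ABS) inference for one linear layer.
# Names and the weight-range assumption are illustrative, not from the paper.
import numpy as np

def bwb_quantize(w, n_bits=8):
    """Decompose weights assumed to lie in [-1, 1) into a sign and n_bits bit-planes."""
    sign = np.sign(w) + (w == 0)                       # treat zero as +1
    mag = np.minimum(np.abs(w), 1 - 2.0 ** -n_bits)
    q = np.round(mag * (1 << n_bits)).astype(np.int64)
    # bit_planes[k] holds the k-th most significant bit of each weight magnitude
    bit_planes = [((q >> (n_bits - 1 - k)) & 1).astype(w.dtype)
                  for k in range(n_bits)]
    return sign, bit_planes

def abs_matvec(x, sign, bit_planes, use_bits):
    """Accumulate the matrix-vector product over `use_bits` planes, MSB first."""
    acc = np.zeros(bit_planes[0].shape[0], dtype=x.dtype)
    for k in range(use_bits):                          # most significant bit first
        scale = 2.0 ** -(k + 1)                        # value carried by this bit-plane
        acc += scale * ((sign * bit_planes[k]) @ x)
    return acc                                         # partial (progressive) result

# Usage: the partial result improves as more bit-planes are accumulated,
# so inference cost can be traded against accuracy at run time.
rng = np.random.default_rng(0)
W = rng.uniform(-1, 1, size=(4, 16))
x = rng.uniform(-1, 1, size=16)
sign, planes = bwb_quantize(W, n_bits=8)
print("1-bit :", abs_matvec(x, sign, planes, 1))
print("8-bit :", abs_matvec(x, sign, planes, 8))
print("full  :", W @ x)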