This work proposes a novel Energy-Aware Network Operator Search (ENOS) approach to address the energy-accuracy trade-offs of deep neural network (DNN) accelerators. In recent years, novel inference operators such as binary-weight, multiplication-free, and deep-shift have been proposed to improve the computational efficiency of DNNs. Augmenting these operators, corresponding novel computing modes such as compute-in-memory and XOR networks have also been explored. However, simplifying DNN operators invariably comes at the cost of lower accuracy, especially on complex processing tasks. Whereas prior works apply the same operator and computing mode end-to-end, our proposed ENOS framework allows an optimal layer-wise integration of inference operators and computing modes to achieve the desired balance of energy and accuracy. The search in ENOS is formulated as a continuous optimization problem, solvable using typical gradient-descent methods, and thereby scalable to larger DNNs with minimal increase in training cost. We characterize ENOS under two settings. In the first setting, for digital accelerators, we discuss ENOS on multiply-accumulate (MAC) cores that can be reconfigured to different operators. ENOS training methods with single-level and bi-level optimization objectives are discussed and compared. We also discuss a sequential operator-assignment strategy that learns the assignment for only one layer per training step, enabling greater flexibility in converging towards the optimal operator allocation. Furthermore, following Bayesian principles, a sampling-based variational mode of ENOS is presented. ENOS is characterized on the popular DNNs ShuffleNet and SqueezeNet on CIFAR-10 and CIFAR-100. Compared to conventional uni-operator approaches, under the same energy budget, ENOS improves accuracy by 10-20%. In the second setting, for a hybrid digital and compute-in-memory accelerator, we characterize ENOS to assign both the layer-wise computing mode (high-precision digital or low-precision analog compute-in-memory) and the operator while staying within a total compute-in-memory budget. Under varying configurations of the hybrid accelerator, ENOS can leverage the higher energy efficiency of compute-in-memory operations to reduce the operating energy of DNNs by 5× while suffering <1% loss in accuracy. Characterization with ENOS also reveals interesting insights, such as the amenability of different filters to low-complexity operators, minimizing inference energy while maintaining high prediction accuracy.
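To make the continuous relaxation concrete, the sketch below shows one plausible way to implement a differentiable, energy-aware layer-wise operator search in PyTorch. The paper's exact formulation is not reproduced here: the candidate operator set, the per-operator energy costs, and the names `SearchableLayer`, `expected_energy`, `enos_loss`, and the penalty weight `lam` are all illustrative assumptions, following a DARTS-style softmax relaxation over per-layer architecture parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_candidates(in_ch, out_ch):
    # Hypothetical candidate operators for one layer. The real ENOS operator
    # set (full-precision MAC, binary-weight, multiplication-free, deep-shift)
    # would replace these stand-in convolutions.
    return nn.ModuleList([
        nn.Conv2d(in_ch, out_ch, 3, padding=1),              # full-precision MAC
        nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False),  # stand-in: binary-weight
        nn.Conv2d(in_ch, out_ch, 1),                         # stand-in: cheaper operator
    ])

class SearchableLayer(nn.Module):
    """One DNN layer with a continuous (softmax) relaxation over operators."""
    def __init__(self, in_ch, out_ch, op_energy):
        super().__init__()
        self.ops = make_candidates(in_ch, out_ch)
        # Architecture parameters: one logit per candidate operator.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))
        # Assumed per-operator energy costs (arbitrary units).
        self.register_buffer("op_energy", torch.tensor(op_energy))

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)
        # Soft mixture of operator outputs during search; at deployment,
        # only the argmax operator of each layer is kept.
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

    def expected_energy(self):
        # Expected energy of this layer under the current softmax weights.
        return (F.softmax(self.alpha, dim=0) * self.op_energy).sum()

def enos_loss(logits, target, layers, lam=1e-3):
    # Energy-aware objective: task loss plus a weighted expected-energy
    # penalty summed over all searchable layers. lam trades accuracy
    # against energy and is an assumed hyperparameter.
    task = F.cross_entropy(logits, target)
    energy = sum(layer.expected_energy() for layer in layers)
    return task + lam * energy
```

In this style of search, the architecture logits `alpha` can be trained jointly with the network weights (a single-level objective) or alternately on a held-out split (a bi-level objective); after convergence, each layer keeps only its argmax operator, recovering a discrete layer-wise operator assignment.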