Summary
Computationally intensive inference tasks of deep neural networks have driven a revolution in accelerator architecture aimed at reducing both power consumption and latency. The key figure of merit for hardware inference accelerators is the number of multiply-and-accumulate operations per watt (MACs/W); the state of the art so far has been several hundred Giga-MACs/W. We propose a Tera-MACs/W neural hardware inference accelerator (TMA) with 8-bit activations and scalable integer weights of less than one byte. The architecture's main feature is a configurable neural processing element for matrix-vector operations. The proposed neural processing element is a multiplier-less massively parallel processor, which makes it attractive for energy-efficient, high-performance neural network applications. We benchmark our system's latency, power, and performance using AlexNet trained on ImageNet. Finally, we compare our accelerator's throughput and power consumption with those of prior works. The proposed accelerator outperforms state-of-the-art counterparts in energy and area efficiency, achieving 2.3 TMACs/W at 1.0 V on a 28-nm Virtex-7 FPGA chip.
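The abstract does not spell out how the multiplier-less processing element works; one common realization quantizes each weight to a signed power of two, so every multiply reduces to an arithmetic shift. The C sketch below illustrates that idea for a single MAC lane; the type `po2_weight_t` and function `macless_dot` are hypothetical names, not from the paper, and the paper's actual weight encoding may differ.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical power-of-two weight: magnitude = 2^shift, with a sign.
 * This is one assumed encoding consistent with "scalable integer
 * weights of less than one byte"; the paper may use a different one. */
typedef struct {
    int8_t  sign;   /* +1 or -1 */
    uint8_t shift;  /* weight magnitude is 2^shift */
} po2_weight_t;

/* Multiplier-less dot product of 8-bit activations with
 * power-of-two weights: each product is a shift, not a multiply. */
int32_t macless_dot(const uint8_t *act, const po2_weight_t *w, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t p = (int32_t)act[i] << w[i].shift; /* shift replaces multiply */
        acc += (w[i].sign >= 0) ? p : -p;
    }
    return acc;
}
```

On an FPGA, replacing multipliers with shifters of this kind avoids spending DSP blocks and fabric area on full multiplication, which is consistent with the energy- and area-efficiency gains the abstract claims.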