The current trend for deep learning has come with an enormous computational need for billions of Multiply-Accumulate (MAC) operations per inference. Fortunately, reduced precision has demonstrated large benefits with low impact on accuracy, paving the way towards processing in mobile devices and IoT nodes. To this end, various precision-scalable MAC architectures optimized for neural networks have recently been proposed. Yet, it has been hard to comprehend their differences and make a fair judgment of their relative benefits, as they have been implemented with different technologies and performance targets. To overcome this, this work exhaustively reviews the state-of-the-art precision-scalable MAC architectures and unifies them in a new taxonomy. Subsequently, these different topologies are thoroughly benchmarked in a 28 nm commercial CMOS process, across a wide range of performance targets, and with precisions ranging from 2 to 8 bits. Circuits are analyzed for each precision as well as jointly in practical use cases, highlighting the impact of architectures and scalability in terms of energy, throughput, area and bandwidth, aiming to understand the key trends for reducing computation costs in neural-network processing.

Index Terms: ASIC, deep neural networks, precision-scalable circuits, configurable circuits, MAC, multiply-accumulate units.
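To make the benchmarked operation concrete, the following minimal numpy sketch (an illustration, not code from the paper; the quantize helper and all names are hypothetical) performs the reduced-precision integer MAC reduction that such units implement, quantizing a toy dot product to 8, 4 and 2 bits and reporting the error against a float32 reference.

```python
import numpy as np

def quantize(x, n_bits):
    """Symmetric uniform quantization of x to signed n-bit integers."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    return np.round(x / scale).astype(np.int32), scale

# A toy "layer": one dot product between activations and weights.
rng = np.random.default_rng(0)
a = rng.standard_normal(1024).astype(np.float32)
w = rng.standard_normal(1024).astype(np.float32)
ref = float(a @ w)

for n_bits in (8, 4, 2):
    qa, sa = quantize(a, n_bits)
    qw, sw = quantize(w, n_bits)
    # Integer MAC reduction: the operation the benchmarked units implement.
    acc = int(np.sum(qa.astype(np.int64) * qw.astype(np.int64)))
    approx = acc * sa * sw
    print(f"{n_bits}-bit MAC: result {approx:+.3f} "
          f"(fp32 reference {ref:+.3f}, rel. error {abs(approx - ref) / abs(ref):.2%})")
```

The point of the sketch is only the arithmetic: each halving of precision lets a scalable MAC unit pack proportionally more operations into the same datapath, at the cost of the quantization error printed above.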
I. INTRODUCTION

Embedded deep learning has gained a lot of attention nowadays due to its broad application prospects and vast potential market. However, the main challenge to embracing this era of edge intelligence comes from the supply-and-demand gap between the limited energy budget of embedded devices, often battery-powered, and the computationally-intensive deep-learning algorithms, which require billions of Multiply-Accumulate (MAC) operations and data movements. To alleviate this unbalanced relationship, many approaches have been investigated at different levels of abstraction. At the algorithmic level, researchers have introduced hardware-
To enable energy-efficient embedded execution of Deep Neural Networks (DNNs), the critical sections of these workloads, their multiply-accumulate (MAC) operations, need to be carefully optimized. The state of the art (SotA) pursues this through runtime precision-scalable MAC operators, which can support the varying precision needs of DNNs in an energy-efficient way. Yet, to implement the adaptable-precision MAC operation, most SotA solutions rely on separately optimized low-precision multipliers and a precision-variable accumulation scheme, with the possible disadvantages of high control complexity and degraded throughput. This paper first optimizes one of the most effective SotA techniques to support fully-connected DNN layers. This mode, exploiting the transformation of a high-precision multiplier into independent parallel low-precision multipliers, will be called the Sum Separate (SS) mode. In addition, this work suggests an alternative low-precision scheme, i.e., the implicit accumulation of multiple low-precision products within the multiplier itself, called the Sum Together (ST) mode. Based on the two types of MAC arrangement explored, corresponding architectures are proposed to implement DNN processing. The two architectures, yielding the same throughput, are compared at different working precisions (2/4/8/16-bit), based on post-synthesis simulation. The results show that the proposed ST-mode-based architecture outperforms the earlier SS mode by up to 1.6× in energy efficiency (TOPS/W) and 1.5× in area efficiency (GOPS/mm²).
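To make the distinction between the two modes concrete, the following behavioral Python sketch (an illustration under stated assumptions, not the paper's RTL) emulates both on unsigned 4-bit operands using a single wide multiplication with guard bits; real designs instead reconfigure the multiplier's partial-product array at gate level, and all names here are illustrative.

```python
# Behavioral sketch of SS and ST modes on unsigned 4-bit operands,
# each emulated with one wide multiply per call.

K = 4           # low-precision operand width (bits)
M = 2 * K + 1   # guard spacing so the sub-fields of the product cannot overlap

def mac_sum_separate(a0, a1, b0, b1):
    """SS mode: one wide multiply yields two *independent* products."""
    prod = (a0 | (a1 << M)) * (b0 | (b1 << M))
    p0 = prod & ((1 << 2 * K) - 1)              # bits [0, 2K):  a0*b0
    p1 = (prod >> 2 * M) & ((1 << 2 * K) - 1)   # bits [2M, ..): a1*b1
    return p0, p1                               # cross terms sit between the fields

def mac_sum_together(a0, a1, b0, b1):
    """ST mode: the multiplier itself accumulates a0*b0 + a1*b1."""
    prod = (a0 | (a1 << M)) * (b1 | (b0 << M))
    return (prod >> M) & ((1 << (2 * K + 1)) - 1)  # middle field holds the sum

a0, a1, b0, b1 = 15, 15, 15, 15                 # worst-case 4-bit values
assert mac_sum_separate(a0, a1, b0, b1) == (225, 225)
assert mac_sum_together(a0, a1, b0, b1) == 450  # 15*15 + 15*15
```

The design trade-off the abstract describes is visible even here: SS returns two independent products that still need an external adder stage, whereas ST folds the accumulation a0*b0 + a1*b1 into the multiplier itself, which is what shortens the accumulation logic and drives the reported efficiency gains.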
This paper assesses the benefits and pitfalls of Analog in-Memory Compute (AiMC) solutions at the accelerator level. The study shows that AiMC can improve efficiency significantly, yet only when the AiMC array topology and the memory technology are co-optimized with the memory and system architecture. The paper provides design guidelines to maximally exploit the opportunities of AiMC technology and gives an outlook towards emerging memory technologies.