The current trend for deep learning has come with an enormous computational need for billions of Multiply-Accumulate (MAC) operations per inference. Fortunately, reduced precision has demonstrated large benefits with low impact on accuracy, paving the way towards processing in mobile devices and IoT nodes. To this end, various precision-scalable MAC architectures optimized for neural networks have recently been proposed. Yet, it has been hard to comprehend their differences and make a fair judgment of their relative benefits, as they have been implemented with different technologies and performance targets. To overcome this, this work exhaustively reviews the state-of-the-art precision-scalable MAC architectures and unifies them in a new taxonomy. Subsequently, these different topologies are thoroughly benchmarked in a 28 nm commercial CMOS process, across a wide range of performance targets, and with precisions ranging from 2 to 8 bits. Circuits are analyzed for each precision as well as jointly in practical use cases, highlighting the impact of architectures and scalability in terms of energy, throughput, area and bandwidth, aiming to understand the key trends for reducing computation costs in neural-network processing.

Index Terms: ASIC, deep neural networks, precision-scalable circuits, configurable circuits, MAC, multiply-accumulate units.
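To make the benchmarked operation concrete, the following minimal numpy sketch (an illustration, not code from the paper; the quantize helper and all names are hypothetical) performs the reduced-precision integer MAC reduction that such units implement, quantizing a toy dot product to 8, 4 and 2 bits and reporting the error against a float32 reference.

```python
import numpy as np

def quantize(x, n_bits):
    """Symmetric uniform quantization of x to signed n-bit integers."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    return np.round(x / scale).astype(np.int32), scale

# A toy "layer": one dot product between activations and weights.
rng = np.random.default_rng(0)
a = rng.standard_normal(1024).astype(np.float32)
w = rng.standard_normal(1024).astype(np.float32)
ref = float(a @ w)

for n_bits in (8, 4, 2):
    qa, sa = quantize(a, n_bits)
    qw, sw = quantize(w, n_bits)
    # Integer MAC reduction: the operation the benchmarked units implement.
    acc = int(np.sum(qa.astype(np.int64) * qw.astype(np.int64)))
    approx = acc * sa * sw
    print(f"{n_bits}-bit MAC: result {approx:+.3f} "
          f"(fp32 reference {ref:+.3f}, rel. error {abs(approx - ref) / abs(ref):.2%})")
```

The point of the sketch is only the arithmetic: each halving of precision lets a scalable MAC unit pack proportionally more operations into the same datapath, at the cost of the quantization error printed above.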
I. INTRODUCTION

Embedded deep learning has gained a lot of attention nowadays due to its broad application prospects and vast potential market. However, the main challenge to embracing this era of edge intelligence comes from the supply-and-demand gap between the limited energy budget of embedded devices, often battery-powered, and the computationally-intensive deep-learning algorithms, which require billions of Multiply-Accumulate (MAC) operations and data movements. To alleviate this unbalanced relationship, many approaches have been investigated at different levels of abstraction. At the algorithmic level, researchers have introduced hardware-
To enable energy-efficient embedded execution of Deep Neural Networks (DNNs), the critical sections of these workloads, their multiply-accumulate (MAC) operations, need to be carefully optimized. The state of the art (SotA) pursues this through runtime precision-scalable MAC operators, which can support the varying precision needs of DNNs in an energy-efficient way. Yet, to implement the adaptable-precision MAC operation, most SotA solutions rely on separately optimized low-precision multipliers and a precision-variable accumulation scheme, with the possible disadvantages of high control complexity and degraded throughput. This paper first optimizes one of the most effective SotA techniques to support fully-connected DNN layers. This mode, exploiting the transformation of a high-precision multiplier into independent parallel low-precision multipliers, will be called the Sum Separate (SS) mode. In addition, this work suggests an alternative low-precision scheme, i.e., the implicit accumulation of multiple low-precision products within the multiplier itself, called the Sum Together (ST) mode. Based on the two types of MAC arrangement explored, corresponding architectures are proposed to implement DNN processing. The two architectures, yielding the same throughput, are compared at different working precisions (2/4/8/16-bit), based on post-synthesis simulation. The results show that the proposed ST-mode-based architecture outperforms the earlier SS mode by up to 1.6× in energy efficiency (TOPS/W) and 1.5× in area efficiency (GOPS/mm²).
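To make the distinction between the two modes concrete, the following behavioral Python sketch (an illustration under stated assumptions, not the paper's RTL) emulates both on unsigned 4-bit operands using a single wide multiplication with guard bits; real designs instead reconfigure the multiplier's partial-product array at gate level, and all names here are illustrative.

```python
# Behavioral sketch of SS and ST modes on unsigned 4-bit operands,
# each emulated with one wide multiply per call.

K = 4           # low-precision operand width (bits)
M = 2 * K + 1   # guard spacing so the sub-fields of the product cannot overlap

def mac_sum_separate(a0, a1, b0, b1):
    """SS mode: one wide multiply yields two *independent* products."""
    prod = (a0 | (a1 << M)) * (b0 | (b1 << M))
    p0 = prod & ((1 << 2 * K) - 1)              # bits [0, 2K):  a0*b0
    p1 = (prod >> 2 * M) & ((1 << 2 * K) - 1)   # bits [2M, ..): a1*b1
    return p0, p1                               # cross terms sit between the fields

def mac_sum_together(a0, a1, b0, b1):
    """ST mode: the multiplier itself accumulates a0*b0 + a1*b1."""
    prod = (a0 | (a1 << M)) * (b1 | (b0 << M))
    return (prod >> M) & ((1 << (2 * K + 1)) - 1)  # middle field holds the sum

a0, a1, b0, b1 = 15, 15, 15, 15                 # worst-case 4-bit values
assert mac_sum_separate(a0, a1, b0, b1) == (225, 225)
assert mac_sum_together(a0, a1, b0, b1) == 450  # 15*15 + 15*15
```

The design trade-off the abstract describes is visible even here: SS returns two independent products that still need an external adder stage, whereas ST folds the accumulation a0*b0 + a1*b1 into the multiplier itself, which is what shortens the accumulation logic and drives the reported efficiency gains.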
This paper assesses the benefits and pitfalls of Analog in-Memory Compute (AiMC) solutions at the accelerator level. The study shows that AiMC can improve efficiency significantly, yet only when the AiMC array topology and the memory technology are co-optimized with the memory and system architecture. The paper provides design guidelines to maximally exploit the opportunities of AiMC technology and gives an outlook towards emerging memory technologies.