Owing to the widespread adoption of transformer-based artificial neural networks, artificial intelligence (AI) processors are now required to perform matrix–vector multiplication in addition to the conventional matrix–matrix multiplication. However, current AI processor architectures are optimized for general matrix–matrix multiplications (GEMMs), which causes significant throughput degradation when processing general matrix–vector multiplications (GEMVs). In this study, we propose a port-folding GEMV (PF-GEMV) scheme employing multiformat and low-precision techniques while reusing an outer-product-based processor optimized for conventional GEMM operations. This approach achieves 93.7% utilization in GEMV operations with an 8-bit format on an 8×8 processor, resulting in a 7.5× increase in throughput compared with that of the original scheme. Furthermore, when applied to the matrix operations of the GPT-2 large model, a 7× speedup is achieved in single-batch inference.