Due to the limited resources of Internet of Things (IoT) edge devices, deep neural network (DNN) inference requires collaboration with cloud server platforms, where DNN inference is partitioned and offloaded to high-performance servers to reduce end-to-end latency. Because the data-intensive intermediate feature space at the partitioned layer must be transmitted to the servers, efficient compression of the feature space is imperative for high-throughput inference. However, the feature space at deeper layers has different characteristics from natural images, limiting the compression performance of conventional preprocessing and encoding techniques. To tackle this limitation, we introduce a new method for compressing the DNN intermediate feature space using a specialized autoencoder, called auto-tiler. The proposed auto-tiler is designed to include the tiling process and to provide multiple input/output dimensions, supporting various partitioned layers and compression ratios. The results show that auto-tiler achieves 18 to 67 percentage points higher accuracy than existing methods at the same bitrate while reducing processing latency by 73% to 81%. The dimension variability of auto-tiler also reduces the storage overhead by 62% with negligible accuracy loss.
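To make the channel-tiling idea concrete, the sketch below rearranges the channels of an intermediate feature map into a single-channel mosaic and passes it through a small convolutional autoencoder. This is a rough illustration under assumed shapes, not the authors' auto-tiler architecture; the `tile`/`untile` helpers, layer widths, and the 64x28x28 feature size are all hypothetical.

```python
# Hypothetical sketch of a tiling autoencoder for intermediate features.
# All shapes, layer widths, and helper names are illustrative assumptions,
# not the auto-tiler architecture from the paper.
import torch
import torch.nn as nn

def tile(features, grid):
    # features: (N, C, H, W) with C == grid[0] * grid[1]
    # Rearrange the channels into a (N, 1, rows*H, cols*W) mosaic.
    n, c, h, w = features.shape
    rows, cols = grid
    x = features.view(n, rows, cols, h, w)
    return x.permute(0, 1, 3, 2, 4).reshape(n, 1, rows * h, cols * w)

def untile(mosaic, grid, hw):
    # Inverse of tile(): recover the (N, C, H, W) feature tensor.
    n = mosaic.shape[0]
    rows, cols = grid
    h, w = hw
    x = mosaic.reshape(n, rows, h, cols, w).permute(0, 1, 3, 2, 4)
    return x.reshape(n, rows * cols, h, w)

class TilingAutoencoder(nn.Module):
    def __init__(self, bottleneck=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, bottleneck, 3, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(bottleneck, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, mosaic):
        code = self.encoder(mosaic)      # compact representation to transmit
        return self.decoder(code), code

# Example: compress a 64x28x28 intermediate feature map.
features = torch.randn(1, 64, 28, 28)
mosaic = tile(features, grid=(8, 8))                # (1, 1, 224, 224)
model = TilingAutoencoder()
recon, code = model(mosaic)
restored = untile(recon, grid=(8, 8), hw=(28, 28))  # back to (1, 64, 28, 28)
```

Tiling the channels first lets a single 2D convolutional encoder handle feature maps from different partition points by changing only the grid size, which is one way the "multiple input/output dimensions" property could be realized.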
Heterogeneous computing relies on collaboration among different types of processors on shared data. In systems with discrete accelerators (e.g., GP-GPUs), data sharing requires transferring a large amount of data between CPU and accelerator memories and can significantly increase the end-to-end execution time. This paper proposes a novel mechanism called Demand MemCpy (DMC) to hide the data sharing overheads. DMC copies data from host memory to accelerator memory on demand at page granularity. It uses a hardware-only mechanism to fetch the requested page with short latency and a background pre-copy to fetch related pages in advance. Our evaluation shows that DMC can reduce the end-to-end execution time of GP-GPU applications by 25.4% on average by overlapping computation with data transfer and not transferring unused pages.
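The following is a purely behavioral sketch of the demand-fetch-plus-pre-copy idea, written as a software analogy rather than a model of the proposed hardware; `PAGE_SIZE`, `PREFETCH_DEPTH`, and the `DemandMemcpy` class are illustrative assumptions.

```python
# Behavioral sketch (not a hardware model) of demand-driven page copying
# with background pre-copy of neighboring pages.
PAGE_SIZE = 4096        # bytes per page (assumed)
PREFETCH_DEPTH = 4      # pages pre-copied after a demand fetch (assumed)

class DemandMemcpy:
    def __init__(self, host_buffer: bytes):
        self.host = host_buffer
        self.device = {}                      # page number -> copied page bytes
        self.pages = (len(host_buffer) + PAGE_SIZE - 1) // PAGE_SIZE

    def _copy_page(self, pno: int):
        if pno not in self.device and pno < self.pages:
            start = pno * PAGE_SIZE
            self.device[pno] = self.host[start:start + PAGE_SIZE]

    def read(self, offset: int, length: int) -> bytes:
        """Accelerator-side read; faults in missing pages on demand."""
        first = offset // PAGE_SIZE
        last = (offset + length - 1) // PAGE_SIZE
        for pno in range(first, last + 1):
            self._copy_page(pno)              # demand fetch (short-latency path)
        # Background pre-copy: bring the next few pages over in advance so
        # later accesses overlap with computation instead of stalling.
        for pno in range(last + 1, last + 1 + PREFETCH_DEPTH):
            self._copy_page(pno)
        data = b"".join(self.device[p] for p in range(first, last + 1))
        skip = offset - first * PAGE_SIZE
        return data[skip:skip + length]

# Only touched pages (plus a small prefetch window) ever reach "device memory".
dmc = DemandMemcpy(bytes(64 * PAGE_SIZE))
_ = dmc.read(10 * PAGE_SIZE + 100, 2000)
print(sorted(dmc.device))   # pages 10..14 copied, the remaining pages skipped
```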
Shrinking process technology and rising memory densities have made memories increasingly vulnerable to errors. Accordingly, DRAM vendors have introduced On-die Error Correction Code (O-ECC) to protect data against the growing number of errors. Current O-ECC provides weak Single Error Correction (SEC), but future memories will require stronger protection as error rates rise. This paper proposes a novel ECC, called Single Error Correction-Byte-Aligned Double Adjacent Error Correction (SEC-BADAEC), and its construction algorithm to improve memory reliability. SEC-BADAEC requires the same redundancy as SEC O-ECC, but it can also correct some frequent 2-bit error patterns. Our evaluation shows SEC-BADAEC can improve memory reliability by 23.5% and system-level reliability by 29.8% with negligible overheads.
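The underlying requirement for any such code is that every error pattern to be corrected maps to a distinct nonzero syndrome under the parity-check matrix. The toy script below checks that condition for single-bit errors plus double-adjacent errors confined to one byte, using a random search as a stand-in for the paper's construction algorithm; the code-word sizes and the reading of "byte-aligned" as "within one byte" are assumptions, not the O-ECC geometry from the paper.

```python
# Toy feasibility check for a SEC + byte-aligned double-adjacent ECC.
# Sizes and the random search are illustrative assumptions.
import random

K, R = 8, 6             # assumed toy sizes: 8 data bits, 6 check bits
N = K + R               # code-word length

def syndrome(H, err):
    """XOR of the parity-check columns at the flipped bit positions."""
    s = 0
    for pos in err:
        s ^= H[pos]
    return s

def target_patterns(n):
    """Single-bit errors plus double-adjacent errors inside one byte."""
    singles = [(i,) for i in range(n)]
    doubles = [(i, i + 1) for i in range(n - 1) if i % 8 != 7]
    return singles + doubles

def corrects_all(H, patterns):
    # A linear code can correct every pattern in the set (assuming only those
    # patterns occur) iff their syndromes are nonzero and pairwise distinct.
    seen = set()
    for err in patterns:
        s = syndrome(H, err)
        if s == 0 or s in seen:
            return False
        seen.add(s)
    return True

random.seed(0)
patterns = target_patterns(N)
for _ in range(200_000):                      # random stand-in for the construction
    H = [random.randrange(1, 1 << R) for _ in range(N)]   # one column per bit
    if corrects_all(H, patterns):
        print("SEC + byte-aligned DAEC parity-check columns:", H)
        break
```

The point of the sketch is the constraint it encodes: with the same number of check bits as SEC, there is still spare syndrome space that a careful column assignment can dedicate to selected 2-bit adjacent patterns.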
Many deep learning frameworks utilize GEneral Matrix Multiplication (GEMM)-based convolution to accelerate CNN execution. GEMM-based convolution provides faster convolution yet requires a data conversion process called lowering (i.e., im2col), which incurs significant memory overhead and diminishes performance. This paper proposes a novel hardware mechanism, called On-the-fly Lowering Engine (OLE), to eliminate the lowering overheads. Our goal is to offload the lowering overheads from GEMM-based convolution. With OLE, the lowered matrix is neither pre-calculated nor stored in the main memory. Instead, a hardware engine generates the lowered matrix on the fly from the original input matrix to reduce memory footprint and bandwidth requirements. Furthermore, the hardware offloading eliminates CPU cycles for the lowering operation and overlaps computation with lowering to hide the performance overhead. Our evaluation shows that OLE can reduce the memory footprint of convolutional layer inputs to as little as 1/12.5× of the original and the overall memory footprint by up to 33.5%. Moreover, OLE can reduce the execution time of convolutional layers by 57.7% on average, resulting in an average speedup of 2.3× for representative CNN models.
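For reference, a plain NumPy version of im2col lowering is sketched below to make the duplication that OLE avoids concrete; the layer shape, stride, and padding are illustrative assumptions, not a layer from the paper's evaluation.

```python
# Reference im2col lowering and GEMM-based convolution in NumPy.
# The 64x56x56 input, 3x3 kernel, stride 1, and no padding are assumptions.
import numpy as np

def im2col(x, kh, kw, stride=1):
    """Lower an input (C, H, W) into a (C*kh*kw, out_h*out_w) matrix."""
    c, h, w = x.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i*stride:i*stride+kh, j*stride:j*stride+kw]
            cols[:, idx] = patch.reshape(-1)   # one output position per column
            idx += 1
    return cols

# Convolution becomes a single GEMM over the lowered matrix.
c, h, w, k, kh, kw = 64, 56, 56, 128, 3, 3
x = np.random.randn(c, h, w).astype(np.float32)
weights = np.random.randn(k, c * kh * kw).astype(np.float32)
lowered = im2col(x, kh, kw)                    # (576, 2916)
out = weights @ lowered                        # (128, 2916), i.e. (128, 54, 54)

# The lowered copy duplicates each input element up to kh*kw times:
print(lowered.nbytes / x.nbytes)               # ~8.4x larger than the original input
```

In a CPU framework this lowered matrix is materialized in main memory before the GEMM; generating its columns on the fly in hardware is what removes both the extra footprint and the lowering cycles described above.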