As the complexity of high-performance microprocessors increases, functional verification becomes more and more difficult, and RTL simulation emerges as the bottleneck of the design cycle. In this paper, we suggest a C-language-based design and verification methodology to enhance simulation speed, instead of the conventional HDL-based methodologies. The RTL C model, StreC, describes the cycle-based behavior of synchronous circuits and is followed by model refining and optimization using the LifeTime Analyzer (LTA) and Cleaner. The simulation speed of the cycle-based C model makes it possible to test the RTL design with "real-world" application programs at an order-of-magnitude faster speed than commercial event-driven simulators. Using the proposed functional verification methodology, HK486, an Intel 80486-compatible microprocessor, was successfully designed and verified.
Flexible architectures are critical for energy-efficient accelerators in data centers that leverage advances in the performance and energy efficiency of recent algorithms, since they must support state-of-the-art algorithms for various deep learning tasks. Because matrix-multiplication units lie at the core of tensor operations, most recent programmable architectures lack flexibility for layers with diminished dimensions, especially for inference, where a large batch axis is rarely allowed. In addition, exploiting the data reuse inherent in tensor operations when computing a single matrix multiplication is challenging. In this work, an extension of a vector processor in 14 nm is proposed, customized for tensor operations. The flexible architecture enables a tensorized loop that supports various data layouts and different shapes and sizes of tensor operations. It also exploits all possible data reuse, including input, weight, and output. Based on the tensorized loop, the fetch and reduction networks, which unicast or multicast with the ordering of both input data and processing data, can be simplified into a circuit-switching-like network with a configured topology and flow control for each tensor operation. Two processing elements can be fused to optimize latency for a large model, or can operate individually for throughput. As a result, various state-of-the-art models can be processed efficiently with straightforward compiler optimization, and the highest energy efficiency of 13.4 Inferences/s/W on EfficientNetV2-S is demonstrated.