Spin-transfer torque random access memory (STT-RAM) has received increasing attention because of its attractive features: good scalability, zero standby power, non-volatility, and radiation hardness. The use of STT-RAM technology in last-level on-chip caches has been proposed because it minimizes cache leakage power as technology scales down. Furthermore, the cell area of STT-RAM is only 1/9 to 1/3 that of SRAM, which allows a much larger cache within the same die footprint and improves overall system performance by reducing cache misses. However, deploying STT-RAM technology in L1 caches is challenging because of its long, power-consuming write operations. In this paper, we propose both L1 and lower-level cache designs that use STT-RAM. In particular, our designs use STT-RAM cells with different data retention times and write performance, made possible by different magnetic tunneling junction (MTJ) designs. For the fast STT-RAM bits with reduced data retention time, a counter-controlled dynamic refresh scheme is proposed to maintain data validity. Our dynamic scheme saves more than 80% of the refresh energy compared to the simple refresh scheme proposed in previous work. An L1 cache built with ultra-low-retention STT-RAM, coupled with our proposed dynamic refresh scheme, achieves a 9.2% performance improvement and saves up to 30% of the total energy compared to one that uses traditional SRAM. For lower-level caches with relatively large capacity, we propose a data migration scheme that moves data between portions of the cache with different retention characteristics so as to maximize the performance and power benefits. Our experiments show that, on average, our proposed multi-retention-level STT-RAM cache reduces total energy by 30% to 70% compared to
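A minimal sketch of how a counter-controlled dynamic refresh might behave, assuming per-block counters that a global tick advances; this is illustrative only and not the paper's implementation, and the class, method names, and RETENTION_TICKS value are hypothetical:

```python
# Hypothetical sketch of a counter-controlled dynamic refresh scheme for
# low-retention STT-RAM cache blocks. Names and parameters are illustrative,
# not taken from the paper.

RETENTION_TICKS = 4   # refresh deadline, in counter ticks (assumed value)

class CacheBlock:
    def __init__(self):
        self.valid = False
        self.counter = 0   # ticks since the block was last written or refreshed

    def write(self, data):
        self.valid = True
        self.data = data
        self.counter = 0   # a write restores the stored state, so restart the count

    def tick(self):
        """Called once per global refresh period."""
        if not self.valid:
            return
        self.counter += 1
        if self.counter >= RETENTION_TICKS:
            self.refresh()

    def refresh(self):
        # Re-write the stored value before retention expires; in hardware this
        # would be a read followed by a write-back of the same block.
        self.write(self.data)

# Blocks that are written often never reach RETENTION_TICKS, so only idle but
# still-live blocks pay refresh energy -- the intuition behind the savings.
blocks = [CacheBlock() for _ in range(4)]
blocks[0].write("dirty line")
for _ in range(10):
    for b in blocks:
        b.tick()
```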
In today's multi-core systems, cache contention due to true and false sharing can cause unexpected and significant performance degradation. A detailed understanding of a given multi-threaded application's behavior is required to precisely identify such performance bottlenecks. Traditionally, however, such diagnostic information can only be obtained after lengthy simulation of the memory hierarchy. In this paper, we present a novel approach that efficiently analyzes interactions between threads to determine thread correlation and detect true and false sharing. It is based on the following key insight: although the slowdown caused by cache contention depends on factors including the thread-to-core binding and the parameters of the memory hierarchy, the amount of data sharing is primarily a function of the cache line size and the application's behavior. Using memory shadowing and dynamic instrumentation, we implemented a tool that obtains detailed sharing information between threads without simulating the full complexity of the memory hierarchy. The runtime overhead of our approach, a 5× slowdown on average relative to native execution, is significantly less than that of detailed cache simulation. The information collected allows programmers to identify the degree of cache contention in an application, the correlation among its threads, and the sources of significant false sharing. Using our approach, we were able to improve the performance of some applications by up to 12×. For other contention-intensive applications, we were able to shed light on the obstacles that prevent their performance from scaling to many cores.
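A rough sketch of the underlying idea, assuming per-cache-line shadow metadata that records which bytes each thread touches; this is not the paper's tool, and LINE_SIZE, record_access, and classify are hypothetical names:

```python
# Illustrative sketch: separate true from false sharing by recording, per
# cache line, which bytes each thread accessed. Overlapping bytes across
# threads indicate true sharing; disjoint bytes on the same line indicate
# false sharing. Not the paper's implementation.
from collections import defaultdict

LINE_SIZE = 64  # bytes; the key parameter the analysis depends on

# shadow[line_address][thread_id] = set of byte offsets accessed
shadow = defaultdict(lambda: defaultdict(set))

def record_access(thread_id, address, size):
    line = address // LINE_SIZE * LINE_SIZE
    for offset in range(address - line, address - line + size):
        shadow[line][thread_id].add(offset)

def classify(line):
    threads = list(shadow[line])
    if len(threads) < 2:
        return "private"
    combined = set()
    overlap = False
    for t in threads:
        if combined & shadow[line][t]:
            overlap = True
        combined |= shadow[line][t]
    return "true sharing" if overlap else "false sharing"

# Two threads updating adjacent fields that happen to share a cache line:
record_access(thread_id=1, address=0x1000, size=4)   # thread 1 writes bytes 0-3
record_access(thread_id=2, address=0x1004, size=4)   # thread 2 writes bytes 4-7
print(classify(0x1000))                               # -> "false sharing"
```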
One of the major impediments to deploying partially run-time reconfigurable FPGAs as hardware accelerators is the time overhead involved in loading the hardware modules. While configuration prefetching is an effective method for reducing this overhead, mispredicted prefetches may worsen the situation by increasing the number of reconfigurations needed. In this paper, we present a static algorithm for configuration prefetching in partially reconfigurable FPGAs that minimizes the reconfiguration overhead. By making use of profiling information, the interprocedural control flow graph, and the placement of hardware modules, our algorithm predicts hardware execution and prefetches hardware modules as early as possible while minimizing the risk of mispredictions. Our experiments show that our algorithm performs significantly better than current state-of-the-art prefetching algorithms for control-bound applications.
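A hedged sketch of the general idea behind profile-guided prefetch hoisting, assuming edge probabilities come from profiling; the toy graph, THRESHOLD, and function names are hypothetical and the paper's actual algorithm differs in its details:

```python
# Sketch: hoist a prefetch for a hardware module to the earliest control-flow
# node from which the profiled probability of reaching that module stays above
# a threshold, so reconfiguration overlaps with software execution while
# mispredicted prefetches remain rare. Illustrative only.

THRESHOLD = 0.9  # assumed confidence required before issuing a prefetch

# Toy control-flow graph: node -> list of (successor, edge_probability)
cfg = {
    "entry": [("a", 0.5), ("b", 0.5)],
    "a": [("hw_call", 0.95), ("skip", 0.05)],
    "b": [("exit", 1.0)],
    "hw_call": [],
    "skip": [],
    "exit": [],
}

def reach_probability(node, target, cfg, memo=None):
    """Profiled probability of eventually reaching `target` from `node`."""
    if memo is None:
        memo = {}
    if node == target:
        return 1.0
    if node in memo:
        return memo[node]
    memo[node] = sum(p * reach_probability(succ, target, cfg, memo)
                     for succ, p in cfg[node])
    return memo[node]

def earliest_prefetch_point(target, topo_order, cfg):
    """Walk nodes in topological order; prefetch at the first confident one."""
    for node in topo_order:
        if node != target and reach_probability(node, target, cfg) >= THRESHOLD:
            return node
    return target  # fall back to loading the module on demand

print(earliest_prefetch_point("hw_call", ["entry", "a", "b"], cfg))  # -> "a"
```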