Jingwen Leng scite author profile

A locality-aware memory hierarchy for energy-efficient GPU architectures

As GPUs' compute capabilities grow, their memory hierarchy increasingly becomes a bottleneck. Current GPU memory hierarchies use coarse-grained memory accesses to exploit spatial locality, maximize peak bandwidth, simplify control, and reduce cache metadata storage. These coarse-grained memory accesses, however, are a poor match for emerging GPU applications with irregular control flow and memory access patterns. Meanwhile, the massive multi-threading of GPUs and the simplicity of their cache hierarchies make CPU-specific memory system enhancements ineffective for improving the performance of irregular GPU applications. We design and evaluate a locality-aware memory hierarchy for throughput processors, such as GPUs. Our proposed design retains the advantages of coarse-grained accesses for spatially and temporally local programs while permitting selective fine-grained access to memory. By adaptively adjusting the access granularity, memory bandwidth and energy are reduced for data with low spatial/temporal locality without wasting control overheads or prefetching potential for data with high spatial locality. As such, our locality-aware memory hierarchy improves GPU performance, energy-efficiency, and memory throughput for a large range of applications.
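
The idea of adjusting access granularity per miss can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not taken from the paper: the 128-byte line and 32-byte sector sizes, the function names, and the simple history-average predictor with a 0.5 threshold.

```python
LINE_BYTES = 128      # assumed coarse-grained access size
SECTOR_BYTES = 32     # assumed fine-grained access size
SECTORS_PER_LINE = LINE_BYTES // SECTOR_BYTES


def predict_local(past_sector_counts, threshold=0.5):
    """Guess whether a line has high spatial locality (hypothetical predictor).

    past_sector_counts holds how many sectors of the line were touched
    during earlier residencies. With no history, default to coarse-grained.
    """
    if not past_sector_counts:
        return True
    average_used = sum(past_sector_counts) / len(past_sector_counts)
    return average_used / SECTORS_PER_LINE > threshold


def fetch_bytes(sectors_needed, predicted_local):
    """Bytes moved for one miss under the chosen granularity."""
    if predicted_local:
        return LINE_BYTES                       # fetch the whole line
    return len(sectors_needed) * SECTOR_BYTES   # fetch only requested sectors


# A line that was mostly used in the past is fetched whole; a sparsely used
# line is fetched one sector at a time, which saves bandwidth.
dense = fetch_bytes({0, 1, 2, 3}, predict_local([4, 4, 3]))
sparse = fetch_bytes({2}, predict_local([1, 1, 2]))
```

In this toy setting the sparse access moves a quarter of the bytes that a full-line fetch would, while the dense access keeps the coarse-grained behavior.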

Safe limits on voltage reduction efficiency in GPUs

Leng, Buyuktosunoglu, Bertran et al. 2015

Energy efficiency of GPU architectures has emerged as an important aspect of computer system design. In this paper, we explore the energy benefits of reducing the GPU chip's voltage to the safe limit, i.e., the Vmin point. We perform such a study on several commercial off-the-shelf GPU cards. We find that there exists about 20% voltage guardband on those GPUs spanning two architectural generations, which, if "eliminated" completely, can result in up to 25% energy savings on one of the studied GPU cards. The exact improvement magnitude depends on the program's available guardband, because our measurement results unveil a program-dependent Vmin behavior across the studied programs. We make fundamental observations about the program-dependent Vmin behavior. We experimentally determine that the voltage noise has a larger impact on Vmin compared to the process and temperature variation, and the activities during the kernel execution cause large voltage droops. From these findings, we show how to use a kernel's microarchitectural performance counters to predict its Vmin value accurately. The average and maximum prediction errors are 0.5% and 3%, respectively. The accurate Vmin prediction opens up new possibilities of a cross-layer dynamic guardbanding scheme for GPUs, in which software predicts and manages the voltage guardband, while the functional correctness is ensured by a hardware safety net mechanism.
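
As a rough illustration of predicting Vmin from performance counters, the sketch below fits a linear model by ordinary least squares. The abstract does not state which model or counters are used, so the counter names, the synthetic values, and the choice of a linear fit are all assumptions made here for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_vmin_model(counters, vmin):
    """Least-squares fit of vmin ~ w . counters + bias via normal equations."""
    rows = [list(r) + [1.0] for r in counters]   # append a bias column
    n = len(rows[0])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    Aty = [sum(r[i] * v for r, v in zip(rows, vmin)) for i in range(n)]
    return solve(AtA, Aty)


def predict_vmin(coef, x):
    """Predicted Vmin for one kernel's counter vector x."""
    return sum(w * v for w, v in zip(coef, x)) + coef[-1]


# Hypothetical counters per kernel: (IPC, DRAM utilization, issue utilization).
# Values are synthetic, not measurements.
COUNTERS = [
    (0.5, 0.2, 0.3),
    (1.0, 0.4, 0.6),
    (1.5, 0.1, 0.5),
    (2.0, 0.6, 0.9),
    (0.8, 0.8, 0.4),
]
# Synthetic Vmin (volts) generated from a made-up linear rule.
VMIN = [0.85 + 0.02 * a + 0.01 * b + 0.03 * c for a, b, c in COUNTERS]

coef = fit_vmin_model(COUNTERS, VMIN)
errors = [abs(predict_vmin(coef, x) - v) / v for x, v in zip(COUNTERS, VMIN)]
```

Because the synthetic data is exactly linear, the fit recovers it almost perfectly; real measurements would leave a residual error, which is what the reported prediction errors quantify.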

Adversarial Defense Through Network Profiling Based Path Extraction

Qiu, Leng, Guo et al. 2019

Recently, researchers have started decomposing deep neural network models according to their semantics or functions. Recent work has shown the effectiveness of decomposed functional blocks for defending against adversarial attacks, which add small perturbations to the input image to fool the DNN models. This work proposes a profiling-based method to decompose DNN models into different functional blocks, which leads to the effective path as a new approach to exploring DNNs' internal organization. Specifically, the per-image effective paths can be aggregated into a class-level effective path, through which we observe that adversarial images activate effective paths that differ from those of normal images. We propose an effective path similarity-based method to detect adversarial images with an interpretable model, which achieves better accuracy and broader applicability than the state-of-the-art technique. * Jingwen Leng and Minyi Guo are co-corresponding authors of this paper.
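
The aggregation and similarity steps described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: effective paths are modeled as plain sets of unit ids, class-level aggregation is a set union, similarity is the fraction of an image's path that falls inside the class path, and a fixed threshold stands in for the interpretable detection model. None of these specifics, nor the function names, come from the abstract.

```python
def class_effective_path(image_paths):
    """Aggregate per-image effective paths (sets of unit ids) by union."""
    aggregated = set()
    for path in image_paths:
        aggregated |= path
    return aggregated


def path_similarity(image_path, class_path):
    """Fraction of the image's path that lies inside the class-level path."""
    if not image_path:
        return 0.0
    return len(image_path & class_path) / len(image_path)


def looks_adversarial(image_path, class_path, threshold=0.5):
    """Flag an image whose path overlaps too little with its predicted class.

    The threshold rule is a stand-in for a trained detection model.
    """
    return path_similarity(image_path, class_path) < threshold


# Toy data: unit ids are arbitrary integers.
class_path = class_effective_path([{1, 2, 3}, {2, 3, 4}])
normal_score = path_similarity({2, 3, 4}, class_path)
suspect_score = path_similarity({3, 7, 8, 9}, class_path)
flagged = looks_adversarial({3, 7, 8, 9}, class_path)
```

In the toy data the normal image's path lies entirely inside the class-level path, while the suspect image shares only one of its four units and is flagged.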

Adaptive guardband scheduling to improve system-level efficiency of the POWER7+

Lefurgy, Leng et al. 2015

Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity

Guo, Hsueh, Leng et al. 2020

Jingwen Leng

A locality-aware memory hierarchy for energy-efficient GPU architectures

Safe limits on voltage reduction efficiency in GPUs

Adversarial Defense Through Network Profiling Based Path Extraction

Adaptive guardband scheduling to improve system-level efficiency of the POWER7+

Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity
