The emergence of GPGPU applications, bolstered by flexible GPU programming platforms, has created a tremendous challenge in maintaining high energy efficiency in modern GPUs. In this article, we demonstrate that customizing a Streaming Multiprocessor (SM) of a GPU for a lower frequency is significantly more energy efficient than applying DVFS to an SM designed for high-frequency operation. Using a system-level CAD technique, we propose SSAGA (Streaming Multiprocessors Synthesized for Asymmetric GPGPU Applications), an energy-efficient GPU design paradigm. SSAGA creates architecturally identical SM cores, each customized for a different voltage-frequency domain. Our rigorous cross-layer methodology demonstrates an average 20% improvement in energy efficiency over a spatially multitasking GPU across a range of GPGPU applications.
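The abstract does not state an energy model, so the following is only a hypothetical first-order illustration of why synthesizing for a low frequency can beat DVFS. It assumes energy per operation is dynamic switching energy C·V² plus leakage energy V·I_leak/f, and that synthesizing an SM for high frequency (gate upsizing, lower-threshold cells) raises both effective capacitance and leakage. The function `energy_per_op` and every parameter value are illustrative assumptions, not figures or methods from SSAGA.

```python
def energy_per_op(c_eff: float, vdd: float, i_leak: float, freq: float) -> float:
    """First-order energy per operation (normalized units, assumed model).

    dynamic = C_eff * V^2          (switching energy)
    leakage = V * I_leak * (1 / f) (leakage power integrated over one cycle)
    """
    dynamic = c_eff * vdd ** 2
    leakage = vdd * i_leak / freq
    return dynamic + leakage


# Hypothetical normalized operating point for the low-frequency domain.
LOW_V, LOW_F = 0.7, 0.5

# SM synthesized for high frequency, then scaled down with DVFS:
# assumed to carry extra capacitance and leakage from timing-driven sizing.
dvfs_sm = energy_per_op(c_eff=1.3, vdd=LOW_V, i_leak=0.20, freq=LOW_F)

# Architecturally identical SM synthesized directly for the low-frequency target.
custom_sm = energy_per_op(c_eff=1.0, vdd=LOW_V, i_leak=0.08, freq=LOW_F)

savings = 1.0 - custom_sm / dvfs_sm
print(f"DVFS SM: {dvfs_sm:.3f}, custom SM: {custom_sm:.3f}, savings: {savings:.1%}")
```

Under these assumptions, DVFS lowers V and f but cannot remove the extra capacitance and leakage introduced by high-frequency synthesis, which is one plausible intuition behind the abstract's claim; the actual magnitude depends entirely on real synthesis results.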
General-purpose graphics processing units (GPGPUs), owing to their enormous parallelism, have found ubiquitous application in parallel computing. Along with the unprecedented performance offered by each new generation of GPGPUs, their peak power ratings have also increased over the years. In response, Near-Threshold Computing (NTC) has emerged as a remedy, substantially lowering the demand on the power supply unit while striving to achieve super-threshold performance by exploiting higher parallelism. However, severe device-level delay variability arising from process variation (PV) can significantly diminish NTC system performance. This work explores choke points, a device-level characteristic unique to PV at NTC that can exacerbate the delays of parallel GPGPU warps. To improve NTC GPU performance, it proposes a family of holistic circuit-architectural solutions called Choke Point Aware Warp Speculator (CPAWS). CPAWS identifies choke-point-induced critical warps in GPGPU applications and improves their execution latencies in their respective execution units. Compared to a state-of-the-art warp scheduling policy, the best scheme improves the performance and energy efficiency of an NTC GPU by ∼39% and ∼31%, respectively, while incurring marginal hardware overhead.
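As a hedged illustration of why a choke point can slow an entire warp: in the SIMT execution model, a warp instruction completes only when its slowest lane does, so a single lane hitting a PV-induced slow path stretches the whole warp. The sketch below is a toy model, not CPAWS itself; the `Warp` class, `find_critical_warps`, the per-lane delays, and the `slack` threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Warp:
    warp_id: int
    lane_delays: list[float]  # hypothetical per-lane delays (ns) for one instruction

    @property
    def latency(self) -> float:
        # A SIMT warp instruction retires only when its slowest lane finishes.
        return max(self.lane_delays)


def find_critical_warps(warps: list[Warp], nominal_delay: float, slack: float = 1.2) -> list[int]:
    """Return ids of warps whose latency exceeds nominal_delay * slack (toy criterion)."""
    return [w.warp_id for w in warps if w.latency > nominal_delay * slack]


# Usage: warp 1 has one lane on an assumed choke-point path; warp 2 is uniformly
# a little slow but stays within the slack margin.
warps = [
    Warp(0, [1.0] * 32),
    Warp(1, [1.0] * 31 + [1.8]),
    Warp(2, [1.1] * 32),
]
print(find_critical_warps(warps, nominal_delay=1.0))
```

The max-over-lanes rule is why a rare slow device matters far more at warp granularity than its frequency alone suggests; how CPAWS actually detects and accelerates such warps is described in the paper, not here.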