General-purpose graphics processing units (GPGPUs), owing to their enormous parallelism, have found ubiquitous application in parallel computing. Along with the unprecedented performance offered by every new generation of GPGPUs, their peak power rating has also increased over the years. In response, Near-Threshold Computing (NTC) has come to the rescue, offering a substantially lower demand on the power supply unit while striving to achieve super-threshold performance by exploiting higher parallelism. However, severe device-level delay variability arising from process variation (PV) can significantly diminish NTC system performance. In this work, choke points (a unique device-level characteristic of PV at NTC) that can exacerbate the delays of parallel GPGPU warps are explored. To improve NTC GPU performance, a family of holistic circuit-architectural solutions, referred to as Choke Point Aware Warp Speculator (CPAWS), is proposed. CPAWS identifies the choke-point-induced critical warps in GPGPU applications and improves their execution latencies in their respective execution units. Compared to a state-of-the-art warp scheduling policy, the best scheme improves the performance and energy efficiency of an NTC GPU by ∼39% and ∼31%, respectively, while incurring marginal hardware overheads.