CASH-RF: A Compiler-Assisted Hierarchical Register File in GPUs

Oh, Yunho; Jeong, Ipoom; Ro, Won Woo; Yoon, Myung Kuk

doi:10.1109/les.2022.3163749

Cited by 4 publications (2 citation statements)

References 16 publications


“…Many researchers architected the GPU register file using emerging NVM memory technologies. It is well-known that NVM memory technologies have several advantages over SRAM, such as lower static energy consumption and higher density [11], [14], [16], [19], [46], [47]. Li et al. proposed STT-MRAM-based register file in GPUs [14].…”

Section: A. NVM Register File (mentioning; confidence: 99%)

“…Leveraging their low leakage power, implementing the register file using NVMs significantly reduces the leakage energy consumption. Researchers have proposed the hybrid or hierarchical register file composed of SRAM and NVM [11], [14], [19]. In the hybrid register file [14], the SRAM write buffers are inserted ahead of each STT-MRAM-based register bank.…”

Section: Introduction (mentioning; confidence: 99%)
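The snippet above says only that SRAM write buffers sit ahead of each STT-MRAM-based register bank; it gives no buffer size or eviction policy. As a rough illustration of why such a buffer helps, the toy model below assumes a small fully associative buffer with least-recently-used (LRU) write-back to STT-MRAM. The class name `HybridRegisterBank` and its parameters are hypothetical, not taken from the cited work.

```python
from collections import OrderedDict


class HybridRegisterBank:
    """Toy model of an SRAM write buffer placed ahead of one STT-MRAM bank.

    Hypothetical assumptions (not from the cited work): the buffer is fully
    associative, holds `buffer_entries` registers, and writes the least
    recently used entry back to STT-MRAM when it is full.
    """

    def __init__(self, buffer_entries: int = 4):
        self.buffer_entries = buffer_entries
        self.buffer = OrderedDict()  # (warp_id, reg_id) -> value, oldest first
        self.mram = {}               # backing STT-MRAM register bank
        self.mram_writes = 0         # count of slow, energy-hungry NVM writes

    def write(self, warp_id: int, reg_id: int, value: int) -> None:
        key = (warp_id, reg_id)
        if key in self.buffer:
            self.buffer.move_to_end(key)  # rewrite is absorbed by SRAM
        elif len(self.buffer) >= self.buffer_entries:
            old_key, old_value = self.buffer.popitem(last=False)  # evict LRU
            self.mram[old_key] = old_value
            self.mram_writes += 1
        self.buffer[key] = value

    def read(self, warp_id: int, reg_id: int) -> int:
        key = (warp_id, reg_id)
        if key in self.buffer:
            self.buffer.move_to_end(key)
            return self.buffer[key]
        return self.mram.get(key, 0)  # unwritten registers read as 0 here


bank = HybridRegisterBank(buffer_entries=2)
for i in range(10):
    bank.write(0, 1, i)   # ten rewrites of r1 in warp 0 stay in SRAM
print(bank.mram_writes)   # 0
bank.write(0, 2, 5)
bank.write(0, 3, 7)       # third distinct register evicts (0, 1)
print(bank.mram_writes)   # 1
print(bank.read(0, 1))    # 9, now served from STT-MRAM
```

Under these assumptions, repeated writes to a hot register never reach the NVM bank; only evictions do, which is how a buffer can hide NVM's long write latency and write energy.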


TEA-RC: Thread Context-Aware Register Cache for GPUs

Jeong et al. 2022, IEEE Access (self-citation)


Graphics processing units (GPUs) achieve high throughput by exploiting a high degree of thread-level parallelism (TLP). To support such high TLP, GPUs use a large register file that stores the context of all threads and consumes around 20% of total GPU energy. Several previous studies have attempted to reduce the register file's energy consumption by implementing it with an emerging non-volatile memory (NVM), leveraging NVM's higher density and lower leakage power compared with SRAM. To amortize the long access latency of NVM, prior work adopts a hierarchical register file consisting of an SRAM-based register cache and NVM-based registers, where the register cache works as a write buffer. To compute the register cache index, these designs use a subset of the bits of the warp ID and the register ID. This work observes that such an index calculation causes three types of contention that lead to underutilization of the register cache: inter-warp, intra-warp, and false contention. To minimize such contention, this paper proposes a thread context-aware register cache (TEA-RC) for GPUs. In TEA-RC, the cache index is calculated by considering the high correlation between the number of scheduled threads and the register usage of threads. The proposed design shows 28.5% higher performance and 9.1-percentage-point lower energy consumption than the conventional register cache, which concatenates three bits of the warp ID and five bits of the register ID to compute the cache index.

INDEX TERMS Graphics processing units; register file; register cache; volatile memory; non-volatile memory; hybrid register file; hierarchical register file.

This article has been accepted for publication in IEEE Access.
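The abstract names the baseline index function (three bits of warp ID concatenated with five bits of register ID) but not which bits are selected. The sketch below assumes the low-order bits and only counts index values, since the abstract does not say whether an index names one entry or a set. It illustrates inter-warp and intra-warp collisions and how a kernel with few registers per thread leaves most index values unused. It is not TEA-RC's index function; `baseline_index` and `used_fraction` are hypothetical names.

```python
WARP_BITS = 3  # warp-ID bits in the baseline index (per the abstract)
REG_BITS = 5   # register-ID bits in the baseline index (per the abstract)
NUM_INDICES = 1 << (WARP_BITS + REG_BITS)  # 2**8 = 256 index values


def baseline_index(warp_id: int, reg_id: int) -> int:
    """Concatenate warp-ID bits (high) with register-ID bits (low).

    Assumption: the *low-order* bits of each ID are the ones selected.
    """
    warp_part = warp_id & ((1 << WARP_BITS) - 1)
    reg_part = reg_id & ((1 << REG_BITS) - 1)
    return (warp_part << REG_BITS) | reg_part


def used_fraction(num_warps: int, regs_per_thread: int) -> float:
    """Fraction of index values touched when `num_warps` warps each use
    registers r0 .. r(regs_per_thread - 1)."""
    touched = {
        baseline_index(w, r)
        for w in range(num_warps)
        for r in range(regs_per_thread)
    }
    return len(touched) / NUM_INDICES


# Inter-warp: warps 0 and 8 share their low three bits, so r4 collides.
print(baseline_index(0, 4) == baseline_index(8, 4))    # True
# Intra-warp: r1 and r33 share their low five bits within warp 2.
print(baseline_index(2, 1) == baseline_index(2, 33))   # True
# 48 warps using 8 registers each touch only 64 of 256 index values.
print(used_fraction(num_warps=48, regs_per_thread=8))  # 0.25
```

In this toy setting the 48 warps pile onto 64 index values while 192 stay idle, which is one way an index that ignores thread count and register usage can underutilize the cache.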
