2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA) 2018
DOI: 10.1109/isca.2018.00073

RegMutex: Inter-Warp GPU Register Time-Sharing

Cited by 38 publications (9 citation statements)
References 28 publications
“…Table 2 presents the major parameters of the simulated system. Except for Section 4.6, all results are generated using the Fermi [24] configuration, which is the mostly targeted configuration for GPU research even in recent publications [9,10,12,16,31,35,41]. Note that although the simulations are based on Fermi architecture, the principles behind EXPARS are also applicable to newer architectures such as Kepler, Maxwell and Pascal.…”
Section: Methods (mentioning)
confidence: 99%
“…Efficient register space utilization in GPUs. Works in this section aim to share the physical register file space [25,73]. In Ref.…”
Section: Related Work (mentioning)
confidence: 99%
“…In Ref. [25], the authors propose a software-hardware mechanism named Register Mutual Exclusion (RegMutex) to share a subset of physical registers between warps during the GPU kernel execution. RegMutex increases register utilization by sharing the physical register space (has nothing to do with nearest-neighbor data sharing), while NeDa reuses the physical register space efficiently, along with its corresponding data for a group of SP cores residing in a neighborhood window.…”
Section: Related Work (mentioning)
confidence: 99%
“…Several works aim to improve the performance of register files. RegMutex [25] improved performance by sharing a subset of physical registers between warps during the GPU kernel execution. FineReg [42] achieved a higher number of concurrent CTAs by partitioning the register file into two regions, one for active CTAs and another for pending CTAs.…”
Section: Related Work (mentioning)
confidence: 99%