2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)
DOI: 10.1109/hpca.2014.6835965
Supporting x86-64 address translation for 100s of GPU lanes

Cited by 112 publications (98 citation statements) · References 28 publications
“…This is achieved by removing the private TLBs from the SMs and instead using virtual L1s and a single, shared TLB close to the GPU L2 (with a configuration similar to the GPU L2-TLB of VI Hammer) for the translations of the entire cluster. A conservative area analysis using Cacti shows that our MMU requires at least 49% less area compared to an approach that has private TLBs, such as the one proposed by [Power et al 2014] and used by [Power et al 2015]. For VI Hammer we model 16 private, fully associative TLBs with 32 entries each and a shared 1024-entry, 32-way set-associative TLB, while VIPS-G uses only a single 1024-entry, 32-way set-associative extended TLB; that is, in the latter we also account for the extra area of the classification, owner, and V/I bits.…”
Section: Area Reduction Analysis (mentioning)
Confidence: 99%
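As a rough illustration of the organizational difference behind that comparison, the sketch below simply counts translation entries in the two configurations described in the excerpt. This is only back-of-envelope arithmetic under the stated entry and associativity parameters; the cited 49% figure comes from a Cacti area model (which also accounts for the extra per-entry state), not from this count.

```python
# Illustrative sketch only: total TLB entries in the two MMU organizations
# described above. The cited 49% area saving comes from a Cacti model,
# which this back-of-envelope count does not reproduce.

def total_entries(private_tlbs, private_entries, shared_entries):
    """Total translation entries across private per-SM TLBs plus a shared TLB."""
    return private_tlbs * private_entries + shared_entries

# VI Hammer-style organization: 16 private 32-entry fully associative TLBs
# plus a shared 1024-entry, 32-way set-associative TLB.
vi_hammer = total_entries(private_tlbs=16, private_entries=32, shared_entries=1024)

# VIPS-G-style organization: a single 1024-entry, 32-way extended TLB
# (extra per-entry classification/owner/V-I state is ignored here).
vips_g = total_entries(private_tlbs=0, private_entries=0, shared_entries=1024)

print(f"VI Hammer entries: {vi_hammer}")                 # 16*32 + 1024 = 1536
print(f"VIPS-G entries:    {vips_g}")                    # 1024
print(f"Entry reduction:   {1 - vips_g / vi_hammer:.0%}")  # ~33% fewer entries
```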
“…In contrast with the [Power et al 2014] proposal, which relies on private TLBs at every GPU SM in coordination with a highly multi-threaded page walker for the translations, we use a single shared TLB for the whole GPU, attached to the L2, and virtual (VIVT) addressing for the GPU L1s. This is possible with the use of a coherence protocol such as VIPS-G, which is based on self-invalidation and therefore does not involve upward traffic as in the case of SC protocols.…”
Section: Related Work (mentioning)
Confidence: 99%
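To make the two translation paths contrasted above concrete, here is a minimal Python toy model. It is not taken from either paper: the page size, the FIFO eviction, and the page_walk() stub are assumptions for illustration. Path A translates at the SM before a physically addressed L1 access; Path B uses a virtually addressed L1 and defers translation to a single TLB shared at the GPU L2.

```python
PAGE_SHIFT = 12  # assume 4 KiB pages

class SetAssocTLB:
    """Toy set-associative TLB: maps virtual page numbers to physical frame numbers."""
    def __init__(self, entries, ways):
        self.ways = ways
        self.num_sets = entries // ways
        self.sets = [dict() for _ in range(self.num_sets)]

    def lookup(self, vpn):
        return self.sets[vpn % self.num_sets].get(vpn)

    def fill(self, vpn, pfn):
        s = self.sets[vpn % self.num_sets]
        if len(s) >= self.ways:            # naive FIFO-style eviction
            s.pop(next(iter(s)))
        s[vpn] = pfn

def page_walk(vpn):
    return vpn ^ 0x80000                   # stand-in for a real x86-64 page-table walk

def translate(tlb, vpn):
    pfn = tlb.lookup(vpn)
    if pfn is None:
        pfn = page_walk(vpn)
        tlb.fill(vpn, pfn)
    return pfn

# Path A (per-SM private TLBs, as described for Power et al. 2014): every SM
# translates before its L1 access; misses go to a shared TLB, then the walker.
def access_with_private_tlb(private_tlb, shared_tlb, vaddr):
    vpn, off = vaddr >> PAGE_SHIFT, vaddr & ((1 << PAGE_SHIFT) - 1)
    pfn = private_tlb.lookup(vpn)
    if pfn is None:
        pfn = translate(shared_tlb, vpn)
        private_tlb.fill(vpn, pfn)
    return (pfn << PAGE_SHIFT) | off

# Path B (VIPS-G-style): the L1 is virtual (VIVT), so translation is needed
# only when a request leaves the L1 toward the L2, at one GPU-wide shared TLB.
def access_with_virtual_l1(shared_tlb, vaddr, l1_hit):
    if l1_hit:
        return None                        # virtual-L1 hit: no translation at all
    vpn, off = vaddr >> PAGE_SHIFT, vaddr & ((1 << PAGE_SHIFT) - 1)
    return (translate(shared_tlb, vpn) << PAGE_SHIFT) | off

# Example configuration matching the sizes quoted in the excerpts above.
private = SetAssocTLB(entries=32, ways=32)    # fully associative, one per SM
shared  = SetAssocTLB(entries=1024, ways=32)  # shared at the GPU L2
print(hex(access_with_private_tlb(private, shared, 0x7f1234567abc)))
print(hex(access_with_virtual_l1(shared, 0x7f1234567abc, l1_hit=False)))
```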