“…This is achieved by removing the private TLBs from the SMs and instead using virtual L1s and a single, shared TLB close to the GPU L2 (with a configuration similar to the GPU L2-TLB of VI Hammer) for the translations of the entire cluster. A conservative area analysis using CACTI shows that our MMU requires at least 49% less area than an approach with private TLBs, such as the one proposed by [Power et al 2014] and used by [Power et al 2015]. For VI Hammer we model 16 private, 32-entry, fully associative TLBs and a shared 1024-entry, 32-way set-associative TLB, while VIPS-G uses only a single 1024-entry, 32-way set-associative extended TLB; that is, in the latter, we also account for the extra area of the classification, owner, and V/I bits.…”
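
A minimal back-of-the-envelope sketch of the storage comparison between the two MMU organizations, assuming hypothetical per-entry bit counts (the paper's actual 49% figure comes from the CACTI area model, not from these numbers):

```python
# Back-of-the-envelope storage comparison of the two MMU organizations.
# NOTE: the per-entry bit counts below are illustrative assumptions only;
# the 49% area figure quoted in the text comes from CACTI, which also
# captures the high per-entry area cost of fully associative (CAM) TLBs.

ENTRY_BITS = 64   # assumed bits per baseline TLB entry (tag + PPN + flags)
EXTRA_BITS = 4    # assumed extra bits per entry: classification, owner, V/I

# VI Hammer-style MMU: 16 private 32-entry fully associative TLBs
# plus a shared 1024-entry, 32-way set-associative TLB.
vi_hammer_entries = 16 * 32 + 1024
vi_hammer_bits = vi_hammer_entries * ENTRY_BITS

# VIPS-G MMU: a single shared 1024-entry, 32-way set-associative extended TLB.
vips_g_entries = 1024
vips_g_bits = vips_g_entries * (ENTRY_BITS + EXTRA_BITS)

saving = 1.0 - vips_g_bits / vi_hammer_bits
print(f"VI Hammer storage : {vi_hammer_bits} bits ({vi_hammer_entries} entries)")
print(f"VIPS-G storage    : {vips_g_bits} bits ({vips_g_entries} entries)")
print(f"Raw storage saving: {saving:.1%}")
```

Note that the raw bit-count saving in this sketch (about 29%) is smaller than the reported 49% area saving, since CACTI additionally charges the per-entry CAM overhead of the 16 fully associative private TLBs that the VIPS-G design eliminates.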