2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA) 2018
DOI: 10.1109/isca.2018.00025

Scheduling Page Table Walks for Irregular GPU Applications

Cited by 45 publications
(18 citation statements)
references
References 37 publications
“…Our approach is built upon the unique execution characteristics of GPU and effectively increases the TLB reach with minimal hardware overhead. Meanwhile, our approach is complementary to most prior works (e.g., page table walk optimization for irregular applications [56]) and can be combined with them to further improve the UVM performance. Address translation optimizations: There exists a substantial body of research works, both from the OS community and the architecture community, focusing on address translation optimizations [2,7,8,12,29,37,42].…”
Section: Related Work (mentioning)
confidence: 95%
“…Pham et al. [43] proposed a Bloom filter-based hardware mechanism that can be used to reduce the overheads imposed by cache flushes due to virtual page remappings. Shin et al. [56] explored various critical warp-aware page table walking strategies to accelerate irregular application address translations. Margaritov et al. [32] proposed parallel translation prefetching to avoid multiple levels of sequential page table walks in CPUs.…”
Section: Related Work (mentioning)
confidence: 99%
“…Similar to the cache hierarchy, the TLB hierarchy on a GPU consists of multiple levels [41]. Each GPU core or compute unit (CU) is equipped with a private L1-TLB that is typically fully associative to eliminate conflict misses [9,41].…”
Section: Virtual Address Translation in GPUs (mentioning)
confidence: 99%
“…Similar to the cache hierarchy, the TLB hierarchy on a GPU consists of multiple levels [41]. Each GPU core or compute unit (CU) is equipped with a private L1-TLB that is typically fully associative to eliminate conflict misses [9,41]. The L1-TLBs are typically backed by a larger L2-TLB, which is shared between all the available CUs in the GPU and is usually multi-ported to allow for concurrent lookups [9].…”
Section: Virtual Address Translation in GPUs (mentioning)
confidence: 99%