2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)
DOI: 10.1109/hpca47549.2020.00055

Griffin: Hardware-Software Support for Efficient Page Migration in Multi-GPU Systems

Cited by 33 publications (13 citation statements)
References 36 publications
“…All of our experiments are run with a 4KB page size, which is the common page size used in prior studies on address translation hardware design on GPUs [9,41,42]. While larger pages (e.g., 2MB) have the potential of reducing L1-TLB misses, they have large page migration latencies [8,11,19] and can also increase the average number of stalled wavefronts on TLB misses to 100% [8,9] and hence are not always optimal to use.…”
Section: Evaluation Methodology
confidence: 99%
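The page-size tradeoff quoted above can be made concrete with back-of-envelope arithmetic: raw migration time scales linearly with page size, so a 2MB page costs 512x the wire time of a 4KB page. A minimal sketch, assuming a hypothetical 16 GB/s inter-device link (the bandwidth figure is an illustrative assumption, not taken from the cited works):

```python
# Back-of-envelope page migration transfer time.
# LINK_BW is an assumed interconnect bandwidth, not a figure from the paper.
LINK_BW = 16e9  # bytes per second (hypothetical)

def transfer_time_us(page_bytes: int, bw: float = LINK_BW) -> float:
    """Raw wire time to move one page, in microseconds (ignores setup cost)."""
    return page_bytes / bw * 1e6

small = transfer_time_us(4 * 1024)         # 4KB page
large = transfer_time_us(2 * 1024 * 1024)  # 2MB page
print(f"4KB page: {small:.3f} us, 2MB page: {large:.3f} us")
```

This ignores fault handling, TLB invalidation, and setup latency, all of which add further fixed cost per migration; the point is only the linear scaling with page size.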
“…They eliminate the need to perform explicit memory copies, as the GPU driver and the runtime handle all page transfers to/from the GPU. They lower programmer burden by managing CPU-to-GPU and GPU-to-GPU data transfers [1,11] and support oversubscription of memory [29,34]. To support all these capabilities, GPU vendors have added virtual memory support, providing the required hardware and software.…”
Section: Introduction
confidence: 99%
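The quote describes the driver/runtime handling page transfers and memory oversubscription transparently. A toy sketch of the bookkeeping such a runtime might do, with LRU eviction of device-resident pages when capacity is exceeded (all names and the policy are hypothetical simplifications; real drivers are far more involved):

```python
# Toy sketch of unified-memory-style page management with oversubscription:
# on access, a missing page is migrated onto the device; if device memory is
# full, the least-recently-used page is evicted back to the host.
# Purely illustrative; not any vendor's actual algorithm.
from collections import OrderedDict

class ToyUMDevice:
    def __init__(self, capacity_pages: int):
        self.capacity = capacity_pages
        self.resident = OrderedDict()  # page id -> True, kept in LRU order
        self.migrations = 0
        self.evictions = 0

    def access(self, page: int) -> None:
        if page in self.resident:
            self.resident.move_to_end(page)    # hit: refresh LRU position
            return
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)  # evict LRU page to host
            self.evictions += 1
        self.resident[page] = True             # migrate page onto device
        self.migrations += 1

gpu = ToyUMDevice(capacity_pages=2)
for p in [0, 1, 0, 2, 1]:  # working set of 3 pages, capacity 2
    gpu.access(p)
print(gpu.migrations, gpu.evictions)  # -> 4 2
```

Oversubscription shows up directly in the counters: the 3-page working set on a 2-page device forces repeated migrations and evictions that explicit-copy programming would surface to the programmer instead.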
“…Critical mechanisms for UM are prefetching, page eviction due to memory oversubscription, and page migration between GPUs. The works of [Agarwal et al. 2015; Baruah et al. 2020; Ganguly et al. 2019, 2020; Young et al. 2018] proposed new algorithms to improve UM performance in the case of transparent memory management. In contrast, our approach controls page placement and replication manually, based on analysis of memory access patterns.…
Section: CUDA Unified Memory for Multi-GPU Systems
confidence: 99%
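A minimal sketch of the kind of access-pattern analysis this contrast alludes to: scan a trace of page accesses and flag pages that are almost exclusively read as candidates for replication across GPUs (the 0.95 read-fraction threshold and all names are invented for illustration):

```python
# Classify pages as read-mostly (replication candidates) from an access trace.
# The 0.95 threshold and every name here are illustrative assumptions only.
from collections import defaultdict

def replication_candidates(trace, read_threshold=0.95):
    """trace: iterable of (page_id, op) pairs with op in {'R', 'W'}."""
    reads = defaultdict(int)
    total = defaultdict(int)
    for page, op in trace:
        total[page] += 1
        if op == 'R':
            reads[page] += 1
    return sorted(p for p in total if reads[p] / total[p] >= read_threshold)

trace = [(0, 'R')] * 20 + [(1, 'R')] * 10 + [(1, 'W')] * 5 + [(2, 'W')]
print(replication_candidates(trace))  # -> [0]: only page 0 is read-only
```

Replicating a read-mostly page lets every GPU hit it locally; a written page cannot be cheaply replicated because every write would have to be propagated to all copies.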
“…Hardware-based TLB shootdown. There have been a number of approaches to handle the problem of TLB cache coherence at the hardware layer [7,10,12,42,43,48,49,51,60,62]. Several of these hardware-based approaches attempt to squeeze performance using non-traditional TLB designs, such as multi-level TLB hierarchies.…”
Section: Related Work
confidence: 99%
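To make the TLB-coherence problem behind shootdowns concrete, a toy model: each core caches virtual-to-physical translations, so remapping a page on one core must invalidate the stale copies cached by every other core (structure and names are illustrative only; real shootdowns are delivered via inter-processor interrupts):

```python
# Toy model of TLB shootdown: remapping a page on one core must invalidate
# stale cached translations on all other cores. Illustrative only.

class ToyCore:
    def __init__(self):
        self.tlb = {}  # virtual page number -> physical frame number

def remap(cores, initiator, vpage, new_frame):
    """Update the mapping on `initiator`, invalidating stale remote entries."""
    shootdowns = 0
    for core in cores:
        if core is not initiator and vpage in core.tlb:
            del core.tlb[vpage]  # invalidate the stale entry (the "shootdown")
            shootdowns += 1
    initiator.tlb[vpage] = new_frame
    return shootdowns  # remote invalidations needed for this remap

cores = [ToyCore() for _ in range(4)]
for c in cores:
    c.tlb[0x10] = 0xAA  # all four cores cache the same translation
print(remap(cores, cores[0], 0x10, 0xBB))  # -> 3 remote cores invalidated
```

The cost grows with the number of cores sharing a mapping, which is why the hardware-based proposals cited above try to track or limit which TLBs actually hold a given translation.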