2019
DOI: 10.1007/s11227-019-03079-y
A quantitative evaluation of unified memory in GPUs

Cited by 11 publications (12 citation statements) · References 32 publications
“…The difference in memory access patterns across benchmarks puts the hardware prefetcher at different degrees of efficacy. More detailed discussion of UVM hardware prefetchers can be found in other papers such as [6], [9], [19]. Observation/Suggestion: The above results on the effective PCIe bandwidth indicate that the hardware prefetchers currently employed in GPUs cannot fully utilize PCIe bandwidth.…”
Section: Effect of Data Migration on PCIe Bandwidth (mentioning)
confidence: 87%
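The software-side counterpart to the hardware prefetching discussed in this excerpt is an explicit bulk prefetch issued by the application. The sketch below is illustrative rather than code from the cited papers: it uses cudaMemPrefetchAsync to migrate a managed range to the GPU in large, pipelined transfers before the kernel runs, instead of leaving migration to fault-driven, page-granular moves; the allocation size and the trivial kernel are arbitrary choices.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel; any first-touch access pattern would trigger migration.
__global__ void scale(float *x, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

int main() {
    const size_t n = 64UL << 20;                  // 64M floats (~256 MB), arbitrary
    float *x = nullptr;
    cudaMallocManaged(&x, n * sizeof(float));
    for (size_t i = 0; i < n; ++i) x[i] = 1.0f;   // pages now resident on the host

    int dev = 0;
    cudaGetDevice(&dev);
    // Without this call, the kernel's first accesses fault page by page;
    // with it, the driver migrates the whole range as large transfers.
    cudaMemPrefetchAsync(x, n * sizeof(float), dev, 0);

    scale<<<(unsigned)((n + 255) / 256), 256>>>(x, n);
    cudaDeviceSynchronize();
    printf("x[0] = %f\n", x[0]);
    cudaFree(x);
    return 0;
}
```

Timing the kernel with and without the prefetch call is a simple way to expose the gap between fault-driven migration and peak PCIe bandwidth that the excerpt describes.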
“…Due to the large potential benefits of UVM and its associated performance issues, UVM has recently drawn significant attention from the research community. Several optimization techniques have been proposed to mitigate the side effects of UVM [5], [6], [8], [9], [12], [19], [20]. The earliest work is by Zheng et al. [20], which enables on-demand GPU memory and proposes prefetching techniques to improve UVM performance.…”
Section: Introduction (mentioning)
confidence: 99%
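The “on-demand GPU memory” attributed to Zheng et al. [20] corresponds to the demand paging that production UVM now provides: pages migrate when touched, which on Pascal-class and newer GPUs also allows a kernel to work on an allocation larger than device memory. The sketch below shows that behavior under stated assumptions: the 1.5x oversubscription factor is arbitrary, the host is assumed to have enough RAM to back the allocation, and the one-touch-per-64KB-page kernel exists only to force migrations.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Touch one byte per (assumed) 64 KB migration granule to force page faults.
__global__ void touch(char *buf, size_t n, size_t stride) {
    size_t i = ((size_t)blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (i < n) buf[i] += 1;
}

int main() {
    size_t free_b = 0, total_b = 0;
    cudaMemGetInfo(&free_b, &total_b);
    size_t n = total_b + total_b / 2;            // ~1.5x device memory (arbitrary)
    char *buf = nullptr;
    if (cudaMallocManaged(&buf, n) != cudaSuccess) return 1;

    const size_t page = 64 * 1024;               // typical UVM migration granularity
    size_t threads = (n + page - 1) / page;
    touch<<<(unsigned)((threads + 255) / 256), 256>>>(buf, n, page);
    cudaError_t err = cudaDeviceSynchronize();   // pages fault in and evict on demand
    printf("oversubscribed run: %s\n", cudaGetErrorString(err));
    cudaFree(buf);
    return 0;
}
```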
“…If the shared TLB cannot find the entry either, it asks the GMMU to traverse the page table and find the entry (Figure 2). The GMMU utilizes up to 64 page table walker threads to process concurrent requests from multiple SMs in parallel [10]. Once the GMMU finds the mapping, it returns it to the requesting L2, L1, and SM.…”
Section: B. Address Translation in GP-GPU (mentioning)
confidence: 99%
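The translation path in this excerpt (per-SM L1 TLB, shared L2 TLB, then a GMMU page-table walk) can be summarized with a toy host-side model. Everything below is a simplification for illustration only: the unbounded maps stand in for set-associative TLBs and a multi-level radix page table, and the walker counter merely gestures at 64 genuinely concurrent walker threads.

```cuda
#include <cstdint>
#include <cstdio>
#include <unordered_map>

// vpn -> ppn cache; an unbounded map stands in for a set-associative TLB.
struct Tlb {
    std::unordered_map<uint64_t, uint64_t> entries;
    bool lookup(uint64_t vpn, uint64_t &ppn) {
        auto it = entries.find(vpn);
        if (it == entries.end()) return false;
        ppn = it->second;
        return true;
    }
    void fill(uint64_t vpn, uint64_t ppn) { entries[vpn] = ppn; }
};

// The map stands in for the page table; the counter only models the limit
// of 64 concurrent walker threads named in the excerpt [10].
struct Gmmu {
    std::unordered_map<uint64_t, uint64_t> page_table;
    static constexpr int kMaxWalkers = 64;
    int active_walks = 0;
    bool walk(uint64_t vpn, uint64_t &ppn) {
        if (active_walks >= kMaxWalkers) return false;  // walker pool exhausted
        ++active_walks;
        auto it = page_table.find(vpn);
        bool hit = (it != page_table.end());
        if (hit) ppn = it->second;
        --active_walks;
        return hit;
    }
};

// L1 miss -> shared L2 -> GMMU walk; hits are filled back down the levels,
// mirroring "returns the mapping to the requesting L2, L1, and SM".
bool translate(Tlb &l1, Tlb &l2, Gmmu &gmmu, uint64_t vpn, uint64_t &ppn) {
    if (l1.lookup(vpn, ppn)) return true;
    if (l2.lookup(vpn, ppn)) { l1.fill(vpn, ppn); return true; }
    if (gmmu.walk(vpn, ppn)) { l2.fill(vpn, ppn); l1.fill(vpn, ppn); return true; }
    return false;  // unmapped: in UVM this is where a far fault would begin
}

int main() {
    Tlb l1, l2;
    Gmmu gmmu;
    gmmu.page_table[0x42] = 0x1000;
    uint64_t ppn = 0;
    printf("first access (page walk): %d\n", (int)translate(l1, l2, gmmu, 0x42, ppn));
    printf("second access (L1 hit):   %d\n", (int)translate(l1, l2, gmmu, 0x42, ppn));
    return 0;
}
```

A walk that finds no mapping is where UVM's fault-handling path would begin, i.e., the data migration discussed in the other excerpts.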
“…a pool of managed memory is accessible from both CPUs and GPUs using a single pointer within a multi-GPU system [25], [26]. One of the most salient features of Unified Memory is that the system automatically migrates data allocated in Unified Memory (using the cudaMallocManaged API) between the host and device.…”
Section: SpTRSV with Unified Memory: A. Communication Through Unified M… (mentioning)
confidence: 99%
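The single-pointer behavior this excerpt describes is easy to demonstrate in isolation. A minimal sketch, with a vector increment standing in for the paper's SpTRSV kernel: one cudaMallocManaged allocation is written by the CPU, updated by a GPU kernel, and read back by the CPU through the same pointer, with no cudaMemcpy anywhere.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void increment(int *v, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] += 1;
}

int main() {
    const int n = 1 << 20;
    int *v = nullptr;
    cudaMallocManaged(&v, n * sizeof(int));    // one pointer, visible to host and device

    for (int i = 0; i < n; ++i) v[i] = i;      // CPU writes through the pointer
    increment<<<(n + 255) / 256, 256>>>(v, n); // GPU updates the same pointer
    cudaDeviceSynchronize();                   // must complete before the CPU touches v
    printf("v[0]=%d v[%d]=%d\n", v[0], n - 1, v[n - 1]);  // CPU reads, no memcpy
    cudaFree(v);
    return 0;
}
```

The cudaDeviceSynchronize call is load-bearing: host access to a managed allocation while a kernel may still be using it is disallowed on pre-Pascal GPUs and a data race on Pascal and later.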