2015
DOI: 10.1145/2775054.2694381

Page Placement Strategies for GPUs within Heterogeneous Memory Systems

Abstract: Systems from smartphones to supercomputers are increasingly heterogeneous, being composed of both CPUs and GPUs. To maximize cost and energy efficiency, these systems will increasingly use globally-addressable heterogeneous memory systems, making choices about memory page placement critical to performance. In this work we show that current page placement policies are not sufficient to maximize GPU performance in these heterogeneous memory systems. We propose two new page placement policies that improve GPU per…

Cited by 37 publications (31 citation statements)
References 27 publications
“…To reduce the data movement cost, we selectively place some data objects in DRAM at the beginning of the application, instead of placing all data objects in NVM. The existing work has demonstrated the performance benefit of the initial data placement on GPU with HMS [1,25]. Our initial data placement technique on NVM-based HMS is consistent with those existing efforts.…”
Section: Optimization (supporting)
confidence: 82%
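As a hedged illustration of this kind of initial placement, the sketch below uses CUDA's unified-memory hints (cudaMemAdvise and cudaMemPrefetchAsync) to pin a frequently accessed object in fast GPU memory and a large, rarely accessed object in host DRAM before any kernel runs. The object names, sizes, and the hot/cold split are hypothetical; the cited works' own placement mechanisms may differ.

```cuda
// Minimal sketch of one-time initial data placement on a heterogeneous
// memory system. Assumes a CUDA device with unified-memory support;
// `hot`/`cold` and their sizes are hypothetical.
#include <cuda_runtime.h>

int main() {
    const size_t hot_bytes  = 64  << 20;   // frequently accessed object
    const size_t cold_bytes = 256 << 20;   // rarely accessed object
    float *hot, *cold;
    cudaMallocManaged(&hot,  hot_bytes);
    cudaMallocManaged(&cold, cold_bytes);

    const int gpu = 0;
    // Decide placement once, up front: hot data in fast (GPU) memory,
    // cold data in high-capacity host DRAM.
    cudaMemAdvise(hot,  hot_bytes,  cudaMemAdviseSetPreferredLocation, gpu);
    cudaMemAdvise(cold, cold_bytes, cudaMemAdviseSetPreferredLocation,
                  cudaCpuDeviceId);
    cudaMemPrefetchAsync(hot, hot_bytes, gpu, 0); // materialize the placement

    // ... kernels that stream over `hot` and occasionally touch `cold` ...
    cudaDeviceSynchronize();
    cudaFree(hot);
    cudaFree(cold);
    return 0;
}
```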
“…$$BW_{\text{data\_obj}} = \frac{\#\text{data\_access} \times \text{cacheline\_size}}{\dfrac{\#\text{samples\_with\_data\_accesses}}{\#\text{samples}} \times \text{phase\_execution\_time}} \tag{1}$$

The numerator of Equation (1) is the accessed data size. #data_access in the numerator is the number of memory accesses to the data object in main memory.…”
Section: Design (mentioning)
confidence: 99%
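Equation (1) is plain arithmetic once the profiler counters are in hand. The sketch below transcribes it directly; all input values (access count, cache-line size, sample counts, phase time) are hypothetical stand-ins for profiler output, not numbers from the cited paper.

```cuda
// Direct transcription of Equation (1): estimate the bandwidth demand
// of a data object from sampled access counts. All inputs are hypothetical.
#include <cstdio>

double bw_data_obj(double data_accesses,        // #data_access
                   double cacheline_size,       // bytes per access, e.g. 64
                   double samples_with_access,  // #samples_with_data_accesses
                   double samples,              // #samples
                   double phase_time_s) {       // phase execution time (s)
    double accessed_bytes  = data_accesses * cacheline_size;   // the numerator
    double access_fraction = samples_with_access / samples;    // time share
    return accessed_bytes / (access_fraction * phase_time_s);  // bytes/second
}

int main() {
    // 10M accesses to 64 B cache lines, seen in 500 of 1000 samples,
    // during a 0.2 s phase: 0.64 GB / 0.1 s = 6.4 GB/s demand.
    printf("%.2f GB/s\n", bw_data_obj(1e7, 64, 500, 1000, 0.2) / 1e9);
    return 0;
}
```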
“…• Copy data from host memory to device (GPU) memory
• Launch the function (called a kernel) to be executed on the GPU
• Wait until the kernel finishes
• Copy the output from device memory to host memory

In the real-time systems community, GPUs have been studied actively in recent years because of their potential benefits in accelerating demanding data-parallel real-time applications [5]. As observed in [6], GPU kernels typically demand high memory bandwidth to achieve high data parallelism, and if the memory bandwidth required by a GPU kernel is not satisfied, performance can degrade significantly. For discrete GPUs, which have dedicated graphics memories, researchers have focused on addressing interference among co-scheduled GPU tasks.…”
Section: Background and Related Work (mentioning)
confidence: 99%
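The four steps quoted above map one-to-one onto the CUDA runtime API. Here is a minimal, self-contained sketch; the kernel and problem size are placeholders, not from the cited work.

```cuda
// The canonical offload pattern: copy in, launch, wait, copy out.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;                // placeholder data-parallel work
}

int main() {
    const int n = 1 << 20;
    float *h = new float[n], *d;
    for (int i = 0; i < n; ++i) h[i] = (float)i;
    cudaMalloc(&d, n * sizeof(float));

    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice); // 1. host -> device
    scale<<<(n + 255) / 256, 256>>>(d, n);                       // 2. launch the kernel
    cudaDeviceSynchronize();                                     // 3. wait until it finishes
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost); // 4. device -> host

    printf("h[3] = %f\n", h[3]);  // expect 6.0
    cudaFree(d);
    delete[] h;
    return 0;
}
```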
“…Similarly, Micron's Hybrid Memory Cube [4,5] and byte-addressable persistent memories [6-9] are quickly gaining traction. Vendors are combining these high-performance memories with traditional high-capacity and low-cost DRAM, prompting research on heterogeneous memory architectures [2,9-15].…”
Section: Introduction (mentioning)
confidence: 99%