2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA)
DOI: 10.1109/isca.2018.00024
Get Out of the Valley: Power-Efficient Address Mapping for GPUs

Abstract: GPU memory systems adopt a multi-dimensional hardware structure to provide the bandwidth necessary to support 100s to 1000s of concurrent threads. On the software side, GPU-compute workloads also use multi-dimensional structures to organize the threads. We observe that these structures can combine unfavorably and create significant resource imbalance in the memory subsystem, causing low performance and poor power-efficiency. The key issue is that it is highly application-dependent which memory address bits exhib…

Cited by 28 publications (37 citation statements)
References 41 publications
“…We further assume two LLC slices per channel, and a total number of 64 LLC slices or 16 LLC slices per HBM stack. We use the state-of-the-art PAE randomized address mapping scheme to uniformly distribute memory accesses across LLC slices, memory channels, and banks [42]. We further assume a typical cache line size of 128 B.…”
Section: Methods
“…The memory accesses of the SMs are routed to the different LLC slices and MCs based on the memory address. In this work, we use the recently proposed PAE address mapping which evenly distributes memory requests to different addresses across LLC slices and memory controllers to maximize parallelism in the memory subsystem [42].…”
Section: Background and Motivation
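The goal the citing papers attribute to PAE, spreading memory accesses evenly across LLC slices, channels, and banks, can be illustrated with a simple XOR-folding hash. This is a minimal sketch under stated assumptions: the channel count, bit positions, and the 128 B line offset are illustrative choices, not the paper's exact PAE mapping.

```python
# Sketch: fold higher address bits into the channel index via XOR so that
# power-of-two strides spread across channels instead of colliding.
# All constants below are assumptions for illustration.

NUM_CHANNELS = 8       # assumed channel count
CHANNEL_SHIFT = 7      # assumed: channel bits sit just above a 128 B line offset
CHANNEL_BITS = 3       # log2(NUM_CHANNELS)

def channel_of(addr: int) -> int:
    """XOR two higher bit groups into the plain channel-index bits."""
    idx = (addr >> CHANNEL_SHIFT) & (NUM_CHANNELS - 1)
    idx ^= (addr >> (CHANNEL_SHIFT + CHANNEL_BITS)) & (NUM_CHANNELS - 1)
    idx ^= (addr >> (CHANNEL_SHIFT + 2 * CHANNEL_BITS)) & (NUM_CHANNELS - 1)
    return idx

# A 1 KiB stride maps every access to channel 0 under the plain modulo
# mapping, but the XOR fold spreads the same stream over all 8 channels.
channels = [channel_of(i * 1024) for i in range(8)]
```

The design point this illustrates: XORing in higher address bits makes the channel index depend on bits that vary under large strides, which is what breaks the pathological "all requests to one channel" case.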
“…The low-order row bits are then XORed with the bank bits to generate new bank bits. The authors of [2] proposed a binary invertible matrix (BIM) for GPU mapping (Figure 2f), which represents memory remapping operations. The BIM composes all address mapping schemes through AND and XOR operations, and exploits its reversibility property to ensure that all possible correspondences are considered.…”
Section: Address Mapping
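The binary invertible matrix (BIM) idea described above can be sketched as a GF(2) matrix-vector product: each remapped address bit is the XOR reduction of an AND between one matrix row and the input address bits. The 4-bit matrix below is a hand-picked invertible toy example for illustration, not the mapping from the cited work.

```python
# Sketch of a BIM-style remapping over a tiny 4-bit address space.
# Each row is a bitmask; output bit i is the GF(2) inner product
# (AND then XOR-reduce) of row i with the address. The matrix is an
# assumed example, chosen by hand to be invertible over GF(2).

ADDR_BITS = 4
BIM = [0b1000,   # out bit 0 <- addr bit 3
       0b1100,   # out bit 1 <- addr bit 3 XOR addr bit 2
       0b0010,   # out bit 2 <- addr bit 1
       0b0011]   # out bit 3 <- addr bit 1 XOR addr bit 0

def parity(x: int) -> int:
    """XOR-reduce the bits of x (population-count parity)."""
    return bin(x).count("1") & 1

def remap(addr: int) -> int:
    out = 0
    for i, row in enumerate(BIM):
        out |= parity(row & addr) << i
    return out
```

Because the matrix is invertible over GF(2), the remapping is a bijection on the address space: every remapped address is unique, so no two physical locations collide, which is the reversibility property the citing text highlights.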
“…The nine differences are stored in Dif_ram. Among these values, the difference stored in Dif_ram[2] and Dif_ram[5] (000027F0) is equal to M (10224D) output by the PCU, the difference stored in Dif_ram[8] is FFFFAFF8 (<0), and the remaining entries Dif_ram[0], Dif_ram[1], Dif_ram[3], Dif_ram[4], Dif_ram[6], and Dif_ram[7] are all equal. The output terminal S0 of the AND gate is 1, the output terminals S1 and S2 are 0, and the AL output is 100, denoting that the memory access address stream follows the 2D memory access pattern.…”
Section: Arbitration Logic (AL)
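The arbitration logic above classifies an address stream by comparing consecutive address differences. A simplified software analogue of that idea, with assumed names and classification rules rather than the paper's exact circuit, looks like this:

```python
# Sketch: classify an address stream from its consecutive differences.
# A 1D (constant-stride) walk has one distinct difference; a row-major
# 2D walk has a repeated small stride plus a periodic row-jump stride.
# The labels and rules here are illustrative assumptions.

def classify(addrs):
    diffs = [b - a for a, b in zip(addrs, addrs[1:])]
    if len(set(diffs)) == 1:
        return "1D"                      # constant stride throughout
    if len(set(diffs)) == 2:
        small, big = sorted(set(diffs))
        if diffs.count(big) < diffs.count(small):
            return "2D"                  # frequent small stride, rare row jump
    return "irregular"

# Row-major walk over a 4x3 tile inside a wider array (row pitch 16):
addrs = [row * 16 + col for row in range(4) for col in range(3)]
# classify(addrs) returns "2D"
```

The hardware version performs the same comparisons in parallel with comparators and gates (the S0/S1/S2 outputs in the quoted text); the software sketch only shows the underlying difference test.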