2017 IEEE International Symposium on High Performance Computer Architecture (HPCA)
DOI: 10.1109/hpca.2017.58

Architecting an Energy-Efficient DRAM System for GPUs

Cited by 77 publications (28 citation statements). References 29 publications.
“…In our work, we use an in-cache computing architecture similar to BLADE, targeting the L1 cache of ARM-based many-core systems, as opposed to the last-level cache (LLC) targeted by NeuralCache, proposed by Eckert et al. (2018). Emerging memory architectures such as HBM, proposed by Lee et al. (2014), have also been explored, but mainly for GPUs, as discussed by Chatterjee et al. (2017). To the best of our knowledge, this is the first work that simulates in-cache acceleration together with HBM at the system level in Linux-based systems.…”
Section: Related Work
confidence: 99%
“…SJF dynamically trades off the latency of completing all memory requests of a warp against the bandwidth-utilization benefits of FR-FCFS. Chatterjee et al. [10] propose a static reordering scheme to reduce the toggling rate in DRAM. Scheduling is orthogonal to address mapping because it attempts to increase row-buffer hit rates, while address mapping attempts to distribute memory requests evenly across channels and banks.…”
Section: Related Work
confidence: 99%
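The scheduling policy named above can be made concrete with a small sketch. The following Python model is a minimal, hypothetical illustration (names like Bank and fr_fcfs_pick are ours, not from the cited papers) of the two core FR-FCFS rules: serve the oldest row-buffer hit if one exists, otherwise the oldest request. It shows why scheduling exploits the currently open row regardless of how addresses were mapped to channels and banks.

from collections import deque

class Bank:
    """Minimal DRAM bank model: tracks only the currently open row."""
    def __init__(self):
        self.open_row = None

def fr_fcfs_pick(queue, bank):
    """Simplified FR-FCFS: serve the oldest request that hits the open
    row (a row-buffer hit); if none hits, serve the oldest request."""
    for req in queue:                  # queue is ordered oldest-first
        if req["row"] == bank.open_row:
            return req                 # first-ready: row-buffer hit wins
    return queue[0]                    # first-come, first-served fallback

# Usage: row 7 is open, so request 1 is served ahead of the older request 0.
bank = Bank()
bank.open_row = 7
queue = deque([{"id": 0, "row": 3}, {"id": 1, "row": 7}, {"id": 2, "row": 3}])
print(fr_fcfs_pick(queue, bank))   # {'id': 1, 'row': 7}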
“…Thus, low-entropy address bits should be mapped to rows, to exploit row-buffer locality, and high-entropy bits should be mapped to channels and banks, to exploit parallelism. Address mapping schemes have previously been proposed for single-core CPUs [5], multi-core CPUs [7], and GPUs [4], [8], [9], [10]. Our objective is to systematically analyze the entropy of the concurrent memory addresses in GPU-compute workloads and use this insight to derive efficient address-mapping policies.…”
Section: Introduction
confidence: 99%
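The entropy argument above lends itself to a short worked example. The sketch below is a simplified illustration under our own assumptions (a whole-trace analysis, not the authors' tool, which reasons about concurrent addresses): it computes the Shannon entropy of each address bit over a trace. Bits whose entropy is near 0 rarely change and suit the row index; bits whose entropy is near 1 toggle often and suit channel and bank selection.

import math

def bit_entropy(addresses, num_bits=32):
    """Per-bit Shannon entropy over an address trace."""
    n = len(addresses)
    entropies = []
    for b in range(num_bits):
        p = sum((addr >> b) & 1 for addr in addresses) / n
        if p in (0.0, 1.0):
            entropies.append(0.0)  # constant bit: no information, map to row
        else:
            entropies.append(-p * math.log2(p) - (1 - p) * math.log2(1 - p))
    return entropies

# Usage: a strided trace from two hypothetical base addresses. The low
# six bits (the cache-line offset) never change; the stride bits toggle.
trace = [base + 64 * i for base in (0x10000, 0x20000) for i in range(256)]
print([round(h, 2) for h in bit_entropy(trace)[:16]])
# bits 0-5 -> 0.0 (map to row); bits 6-13 -> 1.0 (map to channel/bank)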