2012 39th Annual International Symposium on Computer Architecture (ISCA) 2012
DOI: 10.1109/isca.2012.6237036
Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems

Abstract: When multiple processor (CPU) cores and a GPU are integrated on the same chip and share the off-chip main memory, requests from the GPU can heavily interfere with requests from the CPU cores, leading to low system performance and starvation of CPU cores. Unfortunately, state-of-the-art application-aware memory scheduling algorithms are ineffective at solving this problem at low complexity due to the large amount of GPU traffic. A large and costly request buffer is needed to provide these algorithms with enou…


Cited by 129 publications (163 citation statements) · References 28 publications
“…In case of RD-to-WR, WR needs to wait CL + BL/2 + 2 − WL cycles. Therefore, for a WR/RD command, the maximum delay from each of the commands that have arrived earlier is:…”
Section: A Request-Driven Bounding Approach
confidence: 99%
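The quoted RD-to-WR turnaround bound can be checked numerically. A minimal sketch follows; the timing values (CL=11, WL=8, BL=8, roughly DDR3-1600-like) are illustrative assumptions, not values taken from the cited work.

```python
# Sketch of the quoted RD-to-WR bound: delay = CL + BL/2 + 2 - WL.
# CL = read (CAS) latency, WL = write latency, BL = burst length, all in
# memory clock cycles. The concrete numbers below are hypothetical.
def rd_to_wr_delay(CL, WL, BL):
    """Cycles a WR command must wait after a preceding RD, per the quote."""
    return CL + BL // 2 + 2 - WL

# Example with DDR3-1600-like parameters: 11 + 4 + 2 - 8 = 9 cycles.
print(rd_to_wr_delay(CL=11, WL=8, BL=8))  # → 9
```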
“…If one of the tasks accesses a row different from the currently open one, this memory access causes a row-conflict request, so the re-ordering effect no longer occurs. In many systems, as described in [27,25,6], the re-ordering effect can also be bounded by a hardware threshold N cap , which caps the number of re-orderings between requests. Therefore, the maximum number of row-hits that can be prioritized over older row-conflicts is:…”
Section: A Request-Driven Bounding Approach
confidence: 99%
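The N_cap mechanism the quote describes can be sketched as a scheduler that lets row-hit requests bypass older row-conflict requests at most N_cap consecutive times, which is what bounds a conflict request's waiting time. The request representation and function names below are illustrative, not from the cited papers.

```python
# Hedged sketch of the N_cap cap on row-hit re-ordering: a row-hit request may
# be served ahead of older row-conflict requests, but only n_cap times in a
# row; then the oldest request is forced through and the counter resets.
def pick_with_cap(queue, open_row, hits_since_conflict, n_cap):
    """Return (chosen request, updated re-ordering counter).

    queue is ordered oldest-first; each request is a dict with a "row" key.
    """
    if hits_since_conflict < n_cap:
        for req in queue:                # scan oldest-first
            if req["row"] == open_row:   # row-hit: may be re-ordered ahead
                return req, hits_since_conflict + 1
    return queue[0], 0                   # cap reached (or no hit): serve oldest

# Cap already reached: the oldest request (a row-conflict) is served.
queue = [{"row": 1}, {"row": 5}, {"row": 5}]
req, count = pick_with_cap(queue, open_row=5, hits_since_conflict=2, n_cap=2)
print(req["row"], count)  # → 1 0
```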
“…The scheduler module collects transactions, reorders them with respect to latency and power savings, and issues them to the backend via the channel controller, which takes care of the correct use of the DRAM. DRAMSys supports state-of-the-art scheduling algorithms, FR-FCFS [21], Par-BS [22], and SMS [23], or it can simply disable the scheduling unit. Furthermore, the model has a Reorder Buffer (ROB) to provide in-order responses to the requester, and it also supports a multi-rank configuration of the DRAM subsystem.…”
Section: Models
confidence: 99%
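FR-FCFS, the baseline scheduler named in the quote, serves the oldest row-hit request if one exists and otherwise falls back to plain first-come-first-served. A minimal sketch under an assumed request representation (not DRAMSys's actual interface):

```python
# Hedged sketch of FR-FCFS (first-ready, first-come-first-served): prefer the
# oldest request that hits the currently open row, since it avoids a costly
# precharge/activate; otherwise serve the oldest request in the queue.
from collections import deque

def fr_fcfs_pick(queue, open_row):
    """Return the oldest row-hit request if any, else the oldest request."""
    for req in queue:               # queue is ordered oldest-first
        if req["row"] == open_row:  # row-hit: target row is already open
            return req
    return queue[0]                 # no row-hit: plain FCFS

q = deque([{"id": 0, "row": 3}, {"id": 1, "row": 7}, {"id": 2, "row": 7}])
print(fr_fcfs_pick(q, open_row=7)["id"])  # oldest row-hit → 1
```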
“…This refresh-aware policy can also be applied to more recent DRAM schedulers like Refs. [22] and [23] and can also be used …”
Section: Refresh-Aware Scheduling
confidence: 99%
“…In order to mitigate this issue, previous studies have explored different approaches, including thread scheduling, memory channel partitioning, and address mapping schemes. Thread scheduling techniques are studied in various texts [1, 3–6]. Thread scheduling approaches prioritize memory access requests from different applications (or threads) to give more priority to those whose performance gain could be maximized.…”
Section: Introduction
confidence: 99%