2012 39th Annual International Symposium on Computer Architecture (ISCA) 2012
DOI: 10.1109/isca.2012.6237036
Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems

Abstract: When multiple processor (CPU) cores and a GPU are integrated on the same chip and share the off-chip main memory, requests from the GPU can heavily interfere with requests from the CPU cores, leading to low system performance and starvation of CPU cores. Unfortunately, state-of-the-art application-aware memory scheduling algorithms are ineffective at solving this problem at low complexity due to the large amount of GPU traffic. A large and costly request buffer is needed to provide these algorithms with enou…


Cited by 129 publications (163 citation statements) · References 28 publications
“…In case of RD-to-WR, WR needs to wait CL + BL/2 + 2 − WL cycles. Therefore, for a WR/RD command, the maximum delay from each of the commands that have arrived earlier is:…”
Section: A Request-Driven Bounding Approach
confidence: 99%
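The quoted RD-to-WR turnaround bound can be checked numerically. A minimal sketch follows; the timing values (CL=11, WL=8, BL=8, roughly DDR3-1600-like) are illustrative assumptions, not values taken from the cited work.

```python
# Sketch of the quoted RD-to-WR bound: delay = CL + BL/2 + 2 - WL.
# CL = read (CAS) latency, WL = write latency, BL = burst length, all in
# memory clock cycles. The concrete numbers below are hypothetical.
def rd_to_wr_delay(CL, WL, BL):
    """Cycles a WR command must wait after a preceding RD, per the quote."""
    return CL + BL // 2 + 2 - WL

# Example with DDR3-1600-like parameters: 11 + 4 + 2 - 8 = 9 cycles.
print(rd_to_wr_delay(CL=11, WL=8, BL=8))  # → 9
```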
“…If one of the tasks accesses a row different from the currently open one, this memory access causes a row-conflict request, so the re-ordering effect no longer occurs. In many systems, as described in [27,25,6], the re-ordering effect can also be bounded by a hardware threshold N cap , which caps the number of re-orderings between requests. Therefore, the maximum number of row-hits that can be prioritized over older row-conflicts is:…”
Section: A Request-Driven Bounding Approach
confidence: 99%
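The N_cap mechanism the quote describes can be sketched as a scheduler that lets row-hit requests bypass older row-conflict requests at most N_cap consecutive times, which is what bounds a conflict request's waiting time. The request representation and function names below are illustrative, not from the cited papers.

```python
# Hedged sketch of the N_cap cap on row-hit re-ordering: a row-hit request may
# be served ahead of older row-conflict requests, but only n_cap times in a
# row; then the oldest request is forced through and the counter resets.
def pick_with_cap(queue, open_row, hits_since_conflict, n_cap):
    """Return (chosen request, updated re-ordering counter).

    queue is ordered oldest-first; each request is a dict with a "row" key.
    """
    if hits_since_conflict < n_cap:
        for req in queue:                # scan oldest-first
            if req["row"] == open_row:   # row-hit: may be re-ordered ahead
                return req, hits_since_conflict + 1
    return queue[0], 0                   # cap reached (or no hit): serve oldest

# Cap already reached: the oldest request (a row-conflict) is served.
queue = [{"row": 1}, {"row": 5}, {"row": 5}]
req, count = pick_with_cap(queue, open_row=5, hits_since_conflict=2, n_cap=2)
print(req["row"], count)  # → 1 0
```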
“…The scheduler module collects transactions, reorders them with respect to latency and power savings, and issues them to the backend via the channel controller, which takes care of the correct use of the DRAM. DRAMSys supports state-of-the-art scheduling algorithms, FR-FCFS [21], Par-BS [22], and SMS [23], or it can simply disable the scheduling unit. Furthermore, the model has a Reorder Buffer (ROB) to provide in-order responses to the requester, and it also supports a multi-rank configuration of the DRAM subsystem.…”
Section: Models
confidence: 99%
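FR-FCFS, the baseline scheduler named in the quote, serves the oldest row-hit request if one exists and otherwise falls back to plain first-come-first-served. A minimal sketch under an assumed request representation (not DRAMSys's actual interface):

```python
# Hedged sketch of FR-FCFS (first-ready, first-come-first-served): prefer the
# oldest request that hits the currently open row, since it avoids a costly
# precharge/activate; otherwise serve the oldest request in the queue.
from collections import deque

def fr_fcfs_pick(queue, open_row):
    """Return the oldest row-hit request if any, else the oldest request."""
    for req in queue:               # queue is ordered oldest-first
        if req["row"] == open_row:  # row-hit: target row is already open
            return req
    return queue[0]                 # no row-hit: plain FCFS

q = deque([{"id": 0, "row": 3}, {"id": 1, "row": 7}, {"id": 2, "row": 7}])
print(fr_fcfs_pick(q, open_row=7)["id"])  # oldest row-hit → 1
```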
“…This refresh-aware policy can also be applied to more recent DRAM schedulers like Refs. [22] and [23] and can also be used …”
Section: Refresh-Aware Scheduling
confidence: 99%
“…In order to mitigate this issue, previous studies have explored different approaches, including thread scheduling, memory channel partitioning, and address mapping schemes. Thread scheduling techniques are studied in various texts [1, 3–6]. Thread scheduling approaches prioritize memory access requests from different applications (or threads) to give more priority to those whose performance gain could be maximized.…”
Section: Introduction
confidence: 99%