Quantifying and Optimizing Data Access Parallelism on Manycores

Ryoo, Jihyun; Kislal, Orhan; Tang, Xulong; Kandemir, Mahmut

doi:10.1109/mascots.2018.00022

Cited by 7 publications

(1 citation statement)

References 51 publications


“…Further, we consider CLP to improve performance by reducing cache hits latencies. Hardware Approaches to Memory-Level Parallelism: There have also been hardware researches optimize memory accesses in manycore systems [5,10,14,15,40,59]. Mutlu et al [34] proposed memory request batching to improve intra-thread bank-level parallelism while preserving row-buffer locality.…”

Section: Discussion of Related Work (mentioning, confidence: 99%)
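The memory request batching mentioned in the quoted passage can be illustrated with a simplified scheduler. The Python sketch below follows the general shape of parallelism-aware batch scheduling: it marks a capped number of the oldest requests per thread and bank as a batch, then prefers marked requests, then row-buffer hits, then higher-ranked threads, then older requests. It is a minimal sketch under stated assumptions, not the scheduler from Mutlu et al. [34]. The `Request` type, `MARKING_CAP`, and all function names are hypothetical, and re-forming a batch once it drains is left out. Threads whose marked requests are spread thinly across banks rank highest, so one thread's requests to different banks tend to be serviced together, which is the intra-thread bank-level parallelism the quote refers to.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    """A pending DRAM request (hypothetical model, for illustration only)."""
    thread: int
    bank: int
    row: int
    arrival: int
    marked: bool = False


MARKING_CAP = 2  # hypothetical cap on marked requests per (thread, bank)


def form_batch(queue, cap=MARKING_CAP):
    """Mark up to `cap` of the oldest requests of each thread to each bank."""
    per_pair = defaultdict(list)
    for req in sorted(queue, key=lambda r: r.arrival):
        per_pair[(req.thread, req.bank)].append(req)
    for reqs in per_pair.values():
        for req in reqs[:cap]:
            req.marked = True


def rank_threads(queue):
    """Rank threads by (max marked requests to one bank, total marked requests).

    Lower values rank higher; rank 0 is the highest priority.
    """
    load = defaultdict(lambda: defaultdict(int))
    for req in queue:
        if req.marked:
            load[req.thread][req.bank] += 1
    score = {t: (max(b.values()), sum(b.values())) for t, b in load.items()}
    return {t: i for i, t in enumerate(sorted(score, key=score.get))}


def pick_next(queue, bank, open_row, ranks):
    """Choose the next request for `bank`: marked first, then row hit,
    then higher-ranked thread, then oldest."""
    candidates = [r for r in queue if r.bank == bank]
    if not candidates:
        return None
    lowest = len(ranks)  # threads with no marked requests rank last
    return min(
        candidates,
        key=lambda r: (
            not r.marked,
            r.row != open_row,
            ranks.get(r.thread, lowest),
            r.arrival,
        ),
    )


queue = [
    Request(thread=0, bank=0, row=5, arrival=0),
    Request(thread=0, bank=0, row=5, arrival=1),
    Request(thread=0, bank=0, row=7, arrival=2),
    Request(thread=1, bank=0, row=9, arrival=3),
    Request(thread=1, bank=1, row=3, arrival=4),
]
form_batch(queue)            # the third thread-0 request to bank 0 stays unmarked
ranks = rank_threads(queue)  # thread 1 spreads its load over two banks, so it ranks first
print(pick_next(queue, bank=0, open_row=9, ranks=ranks).thread)  # 1
```

The tuple key in `pick_next` encodes the priority order directly: Python compares tuples element by element, so a later criterion only matters when all earlier ones tie.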

Co-optimizing memory-level parallelism and cache-level parallelism

Tang, Kandemir, Karakoy, et al. 2019

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation

Self Cite


Minimizing cache misses has been the traditional goal in optimizing cache performance using compiler-based techniques. However, continuously increasing dataset sizes combined with large numbers of cache banks and memory banks connected using on-chip networks in emerging manycores/accelerators make cache hit–miss latency optimization as important as cache miss rate minimization. In this paper, we propose compiler support that optimizes both the latencies of last-level cache (LLC) hits and the latencies of LLC misses. Our approach tries to achieve this goal by improving the parallelism exhibited by LLC hits and LLC misses. More specifically, it tries to maximize both cache-level parallelism (CLP) and memory-level parallelism (MLP). This paper presents different incarnations of our approach and evaluates them using a set of 12 multithreaded applications. Our results indicate that (i) optimizing MLP first and CLP later brings, on average, 11.31% performance improvement over an approach that already minimizes the number of LLC misses, and (ii) optimizing CLP first and MLP later brings 9.43% performance improvement. In comparison, balancing MLP and CLP brings 17.32% performance improvement on average.

CCS Concepts: • Computer systems organization → Multicore architectures.
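The abstract's notions of cache-level and memory-level parallelism can be made concrete with a toy metric. The Python sketch below is an illustrative assumption, not the definition used in the paper. It treats an epoch as a group of concurrently issued accesses and counts CLP as the number of distinct LLC banks serving hits and MLP as the number of distinct memory banks serving misses. It also uses made-up line-interleaved and row-interleaved bank mappings; `LINE_SIZE`, `ROW_SIZE`, `NUM_LLC_BANKS`, `NUM_MEM_BANKS`, and the function names are hypothetical. The hit/miss outcome of every access is held fixed, mirroring a baseline that already minimizes LLC misses, and the example shows that regrouping the same accesses can raise both CLP and MLP.

```python
LINE_SIZE = 64        # bytes per cache line (assumed)
ROW_SIZE = 2048       # bytes per DRAM row (assumed)
NUM_LLC_BANKS = 4     # hypothetical number of LLC banks
NUM_MEM_BANKS = 4     # hypothetical number of memory banks


def llc_bank(addr):
    """Line-interleaved address-to-LLC-bank mapping (assumed)."""
    return (addr // LINE_SIZE) % NUM_LLC_BANKS


def mem_bank(addr):
    """Row-interleaved address-to-memory-bank mapping (assumed)."""
    return (addr // ROW_SIZE) % NUM_MEM_BANKS


def epoch_parallelism(accesses):
    """Return (CLP, MLP) for one epoch of concurrently issued accesses.

    accesses: list of (addr, is_llc_hit). CLP counts distinct LLC banks
    serving hits; MLP counts distinct memory banks serving misses.
    """
    hit_banks = {llc_bank(a) for a, hit in accesses if hit}
    miss_banks = {mem_bank(a) for a, hit in accesses if not hit}
    return len(hit_banks), len(miss_banks)


def average_parallelism(epochs):
    """Average (CLP, MLP) over a schedule, i.e., a list of epochs."""
    if not epochs:
        return 0.0, 0.0
    pairs = [epoch_parallelism(e) for e in epochs]
    return (
        sum(c for c, _ in pairs) / len(pairs),
        sum(m for _, m in pairs) / len(pairs),
    )


# The same four hits and four misses, grouped into epochs two ways.
clustered = [
    [(0, True), (256, True), (16384, False), (24576, False)],
    [(64, True), (320, True), (18432, False), (26624, False)],
]
interleaved = [
    [(0, True), (64, True), (16384, False), (18432, False)],
    [(256, True), (320, True), (24576, False), (26624, False)],
]
print(average_parallelism(clustered))    # (1.0, 1.0)
print(average_parallelism(interleaved))  # (2.0, 2.0)
```

This toy model ignores that misses also look up an LLC bank and that bank conflicts carry timing costs; it only counts distinct banks per epoch.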
