Abstract: The performance of memory-bound commercial applications such as databases is limited by increasing memory latencies. In this paper, we show that exploiting memory-level parallelism (MLP) is an effective approach for improving the performance of these applications and that microarchitecture has a profound impact on achievable MLP. Using the epoch model of MLP, we reason how traditional microarchitecture features such as out-of-order issue and state-of-the-art microarchitecture techniques such as runahead executi…
“…Average MLP is needed to consider to estimate the average penalty of each cache miss. We derive the average MLP definition in [2] as the average number of useful outstanding off-chip requests when there is at least one outstanding off-chip requests, denoted by MLP_avg. We have the following equation:…”
Section: Modeling Performance Impact Of Cache Sharing (mentioning)
confidence: 99%
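The average-MLP definition quoted above can be illustrated with a minimal sketch. It assumes a hypothetical per-cycle trace of how many useful off-chip requests are outstanding; the function name `average_mlp`, the trace format, and the choice to return 0.0 when no request is ever outstanding are illustrative assumptions, not taken from the cited papers, and the elided equation is not reproduced here.

```python
def average_mlp(outstanding_per_cycle):
    """Average number of useful outstanding off-chip requests, taken only
    over cycles in which at least one such request is outstanding.

    `outstanding_per_cycle` is a hypothetical trace: one non-negative
    integer per cycle (an assumption made for illustration).
    """
    # Keep only the cycles that have at least one outstanding request.
    busy = [n for n in outstanding_per_cycle if n > 0]
    if not busy:
        # Assumed convention: no off-chip activity means an MLP of 0.0.
        return 0.0
    return sum(busy) / len(busy)


if __name__ == "__main__":
    trace = [0, 1, 2, 2, 0, 0, 3, 1]
    # Five busy cycles with 1 + 2 + 2 + 3 + 1 = 9 outstanding requests.
    print(average_mlp(trace))  # 1.8
```

Idle cycles are excluded from the denominator, so a workload that rarely misses but overlaps its misses well still reports a high average MLP.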
“…By taking MLP related cost of each cache miss into account, [12] modified the standard LRU replacement policy and higher performance guaranteed. [2] analyzed the microarchitecture impact on MLP and developed a detailed model to relating MLP to overall performance.…”
“…The DBCP mechanism correlates data addresses to last-touch accesses, enabling DBCP to increase memory-level parallelism for dependent memory accesses. Modern out-of-order processors often stall, unable to overlap the latency of dependent L1D misses [5]. Correlating miss addresses with last-touch instruction traces enables the predictor to retrieve data-dependent blocks in parallel and enhances memory level parallelism for pointer-dependent traversals (e.g., linked lists or trees).…”