2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA)
DOI: 10.1109/isca52012.2021.00024
Vector Runahead

Abstract: The memory wall places a significant limit on performance for many modern workloads. These applications feature complex chains of dependent, indirect memory accesses, which cannot be picked up by even the most advanced microarchitectural prefetchers. The result is that current out-of-order superscalar processors spend the majority of their time stalled. While it is possible to build special-purpose architectures to exploit the fundamental memory-level parallelism, a microarchitectural technique to automatically…
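For orientation, the sketch below (not taken from the paper; function and array names are illustrative only) shows the kind of dependent, indirect access chain the abstract refers to: each load's address is produced by a previous load, so a stride or spatial prefetcher sees no regular pattern and an out-of-order core stalls on the resulting cache misses.

```c
#include <stddef.h>

/* Illustrative pattern: a chain of dependent, indirect memory accesses. */
long sum_indirect(const int *keys, const long *table,
                  const long *values, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        int k = keys[i];      /* sequential stream: easy for a stride prefetcher  */
        long idx = table[k];  /* indirect: address depends on the loaded value k  */
        sum += values[idx];   /* dependent: cannot issue until table[k] returns   */
    }
    return sum;
}
```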

Cited by 8 publications (2 citation statements). References 91 publications.
“…Optimizing Irregular Memory Accesses: Recent work has made significant strides in domain-agnostic prefetching for irregular applications [10, 43, 57]. Our split-tree structure can be seen as an application-specific prefetcher and achieves "perfect prefetching" in that 1) off-chip data accesses are overlapped with computation, 2) data needed by the accelerator are readily available on-chip without stalls, and 3) no redundant DRAM accesses are needed.…”
Section: Related Work
confidence: 99%
“…We divide the traditional prefetching algorithms into three broad categories: precomputation-based, temporal, and spatial. Precomputation-based prefetchers (e.g., runahead [46, 59, 60, 63, 95-101] and helper-thread execution [33, 40, 41, 45, 73, 74, 87, 106, 119, 128, 141, 144]) pre-execute program code to generate future memory requests. These prefetchers can generate highly-accurate prefetches even when no recognizable pattern exists in program memory requests.…”
Section: Other Related Work
confidence: 99%
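As a rough software analogue of the precomputation idea quoted above (a sketch under assumed names, not how Vector Runahead or any cited hardware scheme is actually implemented), the loop below runs a stripped-down address-generating slice a fixed distance ahead of the main computation and issues GCC/Clang `__builtin_prefetch` hints for the addresses it derives; hardware runahead and helper-thread prefetchers perform this pre-execution speculatively in the microarchitecture instead.

```c
#include <stddef.h>

#define DIST 16  /* assumed lookahead distance; purely illustrative */

long sum_with_precompute(const int *keys, const long *table,
                         const long *values, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + DIST < n) {
            /* Pre-execute the address-generating slice of a future
             * iteration and prefetch the line it will touch. Only the
             * first level of indirection is covered here; the second
             * level (values[table[k_ahead]]) would need the prefetched
             * data back first, which is the gap hardware runahead
             * techniques aim to close. */
            int k_ahead = keys[i + DIST];
            __builtin_prefetch(&table[k_ahead], 0 /* read */, 1 /* low locality */);
        }
        long idx = table[keys[i]];  /* first indirection */
        sum += values[idx];         /* dependent load    */
    }
    return sum;
}
```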