2019
DOI: 10.1109/lca.2019.2910518

Precise Runahead Execution

Abstract: Runahead execution improves processor performance by accurately prefetching long-latency memory accesses. When a long-latency load causes the instruction window to fill up and halt the pipeline, the processor enters runahead mode and keeps speculatively executing code to trigger accurate prefetches. A recent improvement tracks the chain of instructions that leads to the long-latency load, stores it in a runahead buffer, and executes only this chain during runahead execution, with the purpose of generating more…

Citations: cited by 5 publications (8 citation statements)
References: 14 publications
“…Hardware prefetchers can pick up a variety of memory-access patterns, but to achieve the instruction-level visibility necessary to calculate the addresses of complex access patterns in today's workloads [1], one must operate within the core, instead of within the cache. Runahead execution [8,9] is the most promising technique to achieve this.…”
Section: Existing Runahead Techniques
Citation type: mentioning
confidence: 99%
“…First, by skipping over loads for which the data source is not yet ready, it is unsuitable for today's complex indirection patterns that consist of chains of dependent load misses. Second, conventional runahead is limited by both the processor's front-end (fetch/decode/rename) width and available back-end resources (issue queue slots and physical registers) [9]. What is needed is a technique that can overcome the limitations of a processor's resources to generate massive amounts of memory-level parallelism and follow chains of dependent loads to completion, prefetching all data required for many memory accesses in the future.…”
Section: Existing Runahead Techniques
Citation type: mentioning
confidence: 99%
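
The first limitation quoted above (runahead skips loads whose address source is not yet available) can be illustrated with a toy model. The sketch below is not the mechanism of the cited paper or of the citing work; it is a deliberately simplified, hypothetical simulation in which the trace contains only loads, every first access to an address is a long-latency miss, and `Load`, `count_stalls`, and `runahead_depth` are names invented for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Load:
    addr: int                         # cache-line address this load accesses
    depends_on: Optional[int] = None  # trace index of the load that produces addr, if any


def count_stalls(trace: List[Load], runahead_depth: int) -> int:
    """Count long-latency misses that stall the pipeline (toy model, an assumption).

    On each stalling miss the model 'runs ahead' over the next
    runahead_depth loads. A future load is prefetched only if its address
    does not depend on a value that is still missing; otherwise it is
    skipped. runahead_depth = 0 disables runahead.
    """
    cached = set()
    stalls = 0
    for i, load in enumerate(trace):
        if load.addr in cached:
            continue                  # hit: no stall
        stalls += 1
        in_flight = {load.addr}       # lines requested during this miss
        invalid = {i}                 # loads whose result is not available yet
        end = min(i + 1 + runahead_depth, len(trace))
        for j in range(i + 1, end):
            future = trace[j]
            if future.depends_on in invalid:
                invalid.add(j)        # address unknown: skip, poison dependents
            elif future.addr not in cached:
                in_flight.add(future.addr)  # independent miss: prefetch it
                invalid.add(j)        # its data is not back during runahead
        cached |= in_flight           # all requested lines arrive after the stall
    return stalls


# Four independent loads versus a four-deep chain of dependent loads.
independent = [Load(0), Load(1), Load(2), Load(3)]
chained = [Load(0), Load(1, 0), Load(2, 1), Load(3, 2)]

print(count_stalls(independent, runahead_depth=0))  # 4: every load stalls
print(count_stalls(independent, runahead_depth=3))  # 1: the rest are prefetched
print(count_stalls(chained, runahead_depth=3))      # 4: runahead cannot follow the chain
```

In this model, runahead hides three of the four independent misses but none of the dependent ones, which is the behavior the quoted passage describes for chains of dependent load misses. The second limitation (front-end width and back-end resources) is not modeled here beyond the fixed `runahead_depth`.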