Proceedings of the 26th International Symposium on Computer Architecture (Cat. No.99CB36367)
DOI: 10.1109/isca.1999.765938

Speculation techniques for improving load related instruction scheduling

Abstract: State-of-the-art microprocessors achieve high performance by executing multiple instructions per cycle. In an out-of-order engine, the instruction scheduler is responsible for dispatching instructions to execution units based on dependencies, latencies, and resource availability. Most existing instruction schedulers are doing a less than optimal job of scheduling memory accesses and instructions dependent on them, for the following reasons:
• Memory dependencies cannot be resolved prior to execution, so loads ar…

citations
Cited by 76 publications
(78 citation statements)
references
References 19 publications
“…Thus, we do not require such costly dynamic techniques. In this paper, we show that a simple ld/st vectorization is useful (in the context of scientific loops) to solve the same problems tackled in [1,5,7,4]. Coupling our costless software optimization technique with the actual imprecise memory disambiguation mechanisms is less expensive than pure hardware methods, giving nonetheless good performance improvement.…”
Section: Related Work (mentioning)
confidence: 88%
“…Even if we do not avoid all situations of bad relative array offsets in all hardware platforms, and thus few memory disambiguation penalties persist, we showed that we still get high speedups in all experimented processors (up to 54% of performance gain). This simple software solution coupled with imprecise memory disambiguation mechanisms are less expensive than sophisticated totally hardware approaches such as [1,6,5,7,4].…”
Section: Discussion (mentioning)
confidence: 99%
“…Chkmiss is an informing memory operation [30] which provides early warning on upcoming stalling code, essential for a timely control flow change to the alternative execution path. Lightweight techniques to predict misses in the cache hierarchy have been proposed [77,80] and refined to detect a last-level cache (LLC) miss in one cycle [62]. We encode the presence of an LLC cache line in the TLB entries, using a simple bitmap (e.g., 64 bits for 64-byte cache lines in a 4kB page).…”
Section: Chkmiss (mentioning)
confidence: 99%
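
The bitmap arithmetic in the statement above (a 4 kB page holds 4096 / 64 = 64 cache lines, so one bit per line fits in 64 bits) can be sketched as follows. This is only an illustrative software model, not the citing authors' hardware: the names `TlbEntry`, `mark_filled`, `mark_evicted`, and `chkmiss` are hypothetical, and the sketch assumes the bitmap is updated on every LLC fill and eviction.

```python
PAGE_SIZE = 4096                          # 4 kB page, as in the quoted example
LINE_SIZE = 64                            # 64-byte cache line
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE   # one presence bit per line


class TlbEntry:
    """Hypothetical TLB entry extended with an LLC presence bitmap."""

    def __init__(self, vpn):
        self.vpn = vpn        # virtual page number this entry translates
        self.present = 0      # bit i set => line i of the page is in the LLC


def line_index(addr):
    """Index of the cache line within its page."""
    return (addr % PAGE_SIZE) // LINE_SIZE


def mark_filled(entry, addr):
    """Assumed hook: called when the line holding addr is filled into the LLC."""
    entry.present |= 1 << line_index(addr)


def mark_evicted(entry, addr):
    """Assumed hook: called when the line holding addr leaves the LLC."""
    entry.present &= ~(1 << line_index(addr))


def chkmiss(entry, addr):
    """Return True when the bitmap predicts an LLC miss for addr."""
    return ((entry.present >> line_index(addr)) & 1) == 0
```

A single bit test replaces a lookup in the cache hierarchy, which is what makes an early miss warning cheap in this model.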
“…We introduce a scalable SQ design that implements store-load forwarding without associative search. As each dynamic load is renamed, we use store-load dependence prediction [3,9,22] to predict the single in-flight store from which that load is most likely to forward. As illustrated in Figure 1(b), when a load executes, it accesses the SQ only at this predicted index, not associatively.…”
Section: Introduction (mentioning)
confidence: 99%
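
The indexed store-queue access described in the statement above can be sketched in a few lines. This is a simplified software model under stated assumptions, not the citing authors' design: `StoreQueue`, `DependencePredictor`, and `execute_load` are hypothetical names, the predictor is a plain table keyed by instruction address, and verification or recovery after a wrong prediction is not modeled.

```python
class StoreQueue:
    """Circular store queue; each slot holds (address, value) or None."""

    def __init__(self, size):
        self.entries = [None] * size
        self.tail = 0

    def allocate(self, addr, value):
        """Place a store in the next slot and return its index."""
        idx = self.tail % len(self.entries)
        self.entries[idx] = (addr, value)
        self.tail += 1
        return idx


class DependencePredictor:
    """Hypothetical store-load dependence predictor keyed by PC."""

    def __init__(self):
        self.load_to_store_pc = {}   # load PC -> store PC it last forwarded from
        self.inflight = {}           # store PC -> SQ index of youngest in-flight instance

    def train(self, load_pc, store_pc):
        self.load_to_store_pc[load_pc] = store_pc

    def note_store(self, store_pc, sq_idx):
        self.inflight[store_pc] = sq_idx

    def predict(self, load_pc):
        """Return the single predicted SQ index, or None if no prediction."""
        return self.inflight.get(self.load_to_store_pc.get(load_pc))


def execute_load(sq, memory, addr, predicted_idx):
    """Read only the predicted SQ slot (no associative search over all slots)."""
    if predicted_idx is not None:
        entry = sq.entries[predicted_idx]
        if entry is not None and entry[0] == addr:
            return entry[1], True        # value forwarded from the store
    return memory.get(addr, 0), False    # fall back to the memory model
```

The load touches at most one queue slot regardless of queue size, which is the property the statement contrasts with associative search.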