Proceedings of the 26th Annual International Symposium on Microarchitecture 1993
DOI: 10.1109/micro.1993.282750

Predictability of load/store instruction latencies

Abstract: In this paper we present a model of coarse grain dataflow execution. We present one top down and two bottom up methods for generation of multithreaded code, and evaluate their effectiveness. The bottom up techniques start from a fine-grain dataflow graph and coalesce this into coarse-grain clusters. The top down technique generates clusters directly from the intermediate data dependence graph used for compiler optimizations. We discuss the relevant phases in the compilation process. We compare the effectivene…

Cited by 45 publications (68 citation statements)
References 15 publications
“…A lower value proved too aggressive in removing load references from using the cache, and a higher value did not remove a sufficient number of load instructions to help performance. Furthermore, the 75% threshold also relates to the memory bandwidth requirements for a cache line replacement (32 bytes) and a 64-bit load reference (8 bytes), and is the same value settled on by [ASWR93]. Table 3 shows the change in cache hit rate and required memory bandwidth after the poorest performing instructions were marked C/NA.…”
Section: Static Methods
confidence: 99%
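The 75% threshold in the statement above follows from the bandwidth arithmetic it cites: a cached load that misses fills a 32-byte line, while a bypassing load fetches only the 8 bytes it needs, so caching only pays off when the hit rate exceeds 1 − 8/32 = 75%. A minimal sketch of that break-even logic (the per-PC statistics and the `mark_cna` helper are hypothetical, not from the cited work):

```python
LINE_BYTES = 32   # cache line filled on a miss (per the excerpt)
LOAD_BYTES = 8    # one 64-bit load reference
THRESHOLD = 1 - LOAD_BYTES / LINE_BYTES  # 0.75 hit-rate break-even

def traffic_cached(accesses, hit_rate):
    # each miss pulls a full line from memory
    return accesses * (1 - hit_rate) * LINE_BYTES

def traffic_bypass(accesses):
    # a C/NA (cache / no-allocate) load fetches just the word
    return accesses * LOAD_BYTES

def mark_cna(per_pc_stats, threshold=THRESHOLD):
    """Mark load PCs whose hit rate falls below the break-even threshold."""
    return {pc for pc, (accesses, hits) in per_pc_stats.items()
            if hits / accesses < threshold}

# hypothetical per-PC (accesses, hits) statistics
stats = {0x400100: (1000, 950), 0x400108: (1000, 500)}
cna = mark_cna(stats)  # only the 50%-hit-rate load qualifies
```

With a 50% hit rate, the cached load moves 1000 × 0.5 × 32 = 16,000 bytes versus 8,000 bytes when bypassing, which is why such loads are candidates for C/NA marking.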
“…[ASWR93] did not look at an extensive set of benchmark programs, we began by performing experiments similar to theirs in which we measured the miss rate associated with individual load and store instructions for a more extensive set of programs. Using the ATOM program trace facilities [SrEu94] and the SPEC92 suite of benchmarks, such statistics were relatively straight-forward to gather.…”
Section: Reducing Cache Misses (Miss Rate)
confidence: 99%
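Measuring the miss rate of individual load and store instructions, as described above, amounts to keying cache-simulator statistics by the instruction's PC while replaying a memory trace. A minimal sketch under assumed parameters (a small direct-mapped cache; the trace format and sizes are illustrative, not ATOM's actual interface):

```python
from collections import defaultdict

LINE = 32    # line size in bytes
SETS = 256   # direct-mapped, 8 KB cache (illustrative sizes)

def per_pc_miss_rates(trace):
    """trace: iterable of (pc, address) memory references.
    Returns {pc: (accesses, misses)} for a direct-mapped cache."""
    tags = [None] * SETS
    stats = defaultdict(lambda: [0, 0])
    for pc, addr in trace:
        block = addr // LINE
        idx, tag = block % SETS, block // SETS
        stats[pc][0] += 1
        if tags[idx] != tag:      # miss: allocate the line
            stats[pc][1] += 1
            tags[idx] = tag
    return {pc: tuple(s) for pc, s in stats.items()}
```

A load that repeatedly touches one line settles near a 0% miss rate after the first reference, while a streaming load that walks across lines misses on every access; it is exactly this per-instruction skew that the cited experiments exploit.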
“…When software pipelining is applied in VLIW architectures, where instruction latencies and scheduling are fixed at compile-time, execution time can be highly degraded due to the stall time provoked by dependences with memory instructions. Even if a nonblocking cache is used, true dependences with previous memory operations at a near distance can make the processor stall afterwards. The choice of scheduling all loads using the cache-miss latency requires considerable ILP and increases register pressure [1].…”
Section: Introduction
confidence: 99%
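The register-pressure cost mentioned in this excerpt can be made concrete: in a modulo-scheduled loop with initiation interval II, a value that stays live for `latency` cycles needs roughly ceil(latency / II) overlapping register copies. A back-of-the-envelope sketch (the latency and II values are assumed for illustration):

```python
import math

def regs_for_load(latency, ii):
    """Registers needed to hold one load's result in a modulo-scheduled
    loop: the value is live for `latency` cycles, so about
    ceil(latency / II) in-flight copies coexist."""
    return math.ceil(latency / ii)

hit_latency, miss_latency, ii = 2, 20, 1
pressure_hit = regs_for_load(hit_latency, ii)    # 2 registers per load
pressure_miss = regs_for_load(miss_latency, ii)  # 20 registers per load
```

Scheduling every load with the miss latency thus multiplies the register demand of each load's result, which is the pressure the quoted passage warns about.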