This paper investigates the interaction between software pipelining and different software prefetching techniques for VLIW machines. It is shown that processor stalls due to memory dependences have a great impact on execution time. A novel heuristic is proposed and shown to outperform previous proposals.
Introduction

Software pipelining represents a family of loop scheduling techniques that try to exploit ILP by executing consecutive iterations of a loop in parallel. The most popular scheme is called modulo scheduling, and it consists of finding a fixed pattern of operations (of length II, or initiation interval) drawn from distinct iterations ([3]).

Several schemes have been proposed in the literature with the goal of minimizing the II and/or the register pressure, but none of them has evaluated the effect of memory. When software pipelining is applied to VLIW architectures, where instruction latencies and scheduling are fixed at compile time, execution time can be highly degraded by the stall time provoked by dependences with memory instructions. Even if a nonblocking cache is used, true dependences with previous memory operations at a near distance(1) can make the processor stall afterwards. The alternative of scheduling all loads using the cache-miss latency requires considerable ILP and increases register pressure.

Different techniques to improve memory behavior exist and are well known; software prefetching is one of them. The main idea of this method is to bring into the cache the data that will be used in the near future ([2]).

In this paper we investigate the interactions between software pipelining and software prefetching in a VLIW architecture. Some alternatives to perform software prefetching are described, and a novel heuristic is presented. An evaluation in terms of execution time is reported, as well as some conclusions.

(1) Almost all modulo scheduling schemes use a fixed cache-hit latency for all memory operations.
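The overlap that software pipelining achieves can be illustrated at the source level. The sketch below is not from the paper: `scale_pipelined` is a hypothetical, manually pipelined version of a simple loop in which the load for iteration i+1 is issued while iteration i is still computing, mimicking a kernel with prologue and epilogue stages; on a VLIW machine the scheduler would place these overlapped operations in the same long instruction word.

```c
#include <assert.h>

/* Naive loop: each iteration loads a[i], multiplies, then stores.
   The multiply must wait for the full load latency every iteration. */
void scale_naive(const int *a, int *b, int n) {
    for (int i = 0; i < n; i++)
        b[i] = a[i] * 2;
}

/* Software-pipelined sketch: consecutive iterations overlap.
   Each kernel iteration executes the load stage of iteration i+1
   together with the compute/store stage of iteration i. */
void scale_pipelined(const int *a, int *b, int n) {
    if (n == 0) return;
    int loaded = a[0];           /* prologue: fill the pipeline */
    for (int i = 0; i < n - 1; i++) {
        int next = a[i + 1];     /* load stage of iteration i+1 */
        b[i] = loaded * 2;       /* compute/store stage of iteration i */
        loaded = next;
    }
    b[n - 1] = loaded * 2;       /* epilogue: drain the pipeline */
}
```

Both versions compute the same result; the pipelined one simply exposes the load of the next iteration early enough that its latency can be hidden.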
Software prefetching schemes

Software prefetching is an effective technique to tolerate memory latency. When it is used with a nonblocking cache, this technique allows the processor to hide part or all of the memory latency by overlapping the fetching of data with computation.

Software prefetching can be performed through two alternative schemes: binding and nonbinding prefetching. The first alternative, also known as early scheduling of memory operations, moves memory instructions away from the instructions that depend on them. The second alternative introduces special instructions into the code, called prefetch instructions. These are nonfaulting instructions that perform a cache lookup but do not modify any register.

In the study presented in this paper we have evaluated two techniques of binding prefetching:

• Early scheduling always (ESA): all memory operations of the loop are scheduled using the cache-miss latency.

• Early scheduling according to locality (ESL): instructions that have some type of locality are scheduled using the cache-hit latency, and the remaining ones are scheduled using the cache-miss latency.
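The nonbinding scheme can be sketched with the GCC/Clang intrinsic `__builtin_prefetch`, which compiles to a nonfaulting prefetch instruction on targets that have one. This is an illustrative example, not the paper's method; the prefetch distance `D` is an assumed tuning parameter, chosen so the prefetched line arrives before the loop reaches it.

```c
#include <assert.h>
#include <stddef.h>

/* Nonbinding prefetching sketch: the prefetch brings b[i + D] toward
   the cache but writes no register and cannot fault, so correctness
   does not depend on it. D is a hypothetical prefetch distance. */
double sum_with_prefetch(const double *b, size_t n) {
    const size_t D = 16;   /* assumed distance: iterations of lead time */
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + D < n)
            __builtin_prefetch(&b[i + D], 0 /* read */, 3 /* high locality */);
        s += b[i];
    }
    return s;
}
```

Binding prefetching, by contrast, needs no new instruction: the compiler simply schedules the real load earlier (with a longer assumed latency), which is what the ESA and ESL techniques above do.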