Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems - 1991
DOI: 10.1145/106972.106979

Software prefetching

Abstract: We present an approach, called software prefetching, to reducing cache miss latencies. By providing a nonblocking prefetch instruction that causes data at a specified memory address to be brought into cache, the compiler can overlap the memory latency with other computation. Our simulations show that, even when generated by a very simple compiler algorithm, prefetch instructions can eliminate nearly all cache misses, while causing only modest increases in data traffic between memory and cache.
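The transformation the abstract describes can be sketched in a few lines. This is a hedged illustration, not the paper's own algorithm: `__builtin_prefetch` (a GCC/Clang intrinsic) stands in for the proposed nonblocking prefetch instruction, and `PREFETCH_DISTANCE` is an illustrative tuning parameter we introduce here.

```c
#include <stddef.h>

/* Illustrative sketch of compiler-inserted software prefetching:
 * issue a nonblocking prefetch for data needed several iterations
 * ahead, so the miss latency overlaps with useful computation.
 * PREFETCH_DISTANCE is a hypothetical tuning knob, not from the paper. */
#define PREFETCH_DISTANCE 8

double sum_with_prefetch(const double *a, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 1);
        sum += a[i];  /* this work overlaps the in-flight prefetch */
    }
    return sum;
}
```

In practice the prefetch distance would be chosen from the memory latency and the loop body's cycle count; the paper's point is that even a simple compiler heuristic for placing these instructions removes most misses.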

Cited by 342 publications (57 citation statements)
References 10 publications
“…Finally, low level optimizations at the CPU pipeline include several well-known techniques. These techniques may be categorized into loop transformations [9], data access [2] and streaming optimizations (SMP, SIMD and MIMD).…”
Section: Boosting Numerical Codes (mentioning)
confidence: 99%
“…The required data for the scalar execution is loaded into a software-controlled data cache near the scalar registers. To reduce the data miss penalty, we applied software pre-fetching techniques 20) where pre-fetch or pre-load instructions are inserted automatically by the compiler or manually by the programmer to bring data ahead of its use. The pre-load instruction causes a matrix block to be brought from the main memory to the data cache.…”
Section: Architecture Model (mentioning)
confidence: 99%
“…The pre-load instruction causes a matrix block to be brought from the main memory to the data cache. This pre-load instruction looks like a load instruction except no register is specified 20) . To preserve the integrity of data between the scalar unit data cache and the main memory, the altered blocks in the data cache must be written back (or post-stored) into the main memory before switching computation to the matrix unit.…”
Section: Architecture Model (mentioning)
confidence: 99%
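The block pre-load described in the statements above can be sketched as follows. This is a minimal illustration under stated assumptions: `__builtin_prefetch` (GCC/Clang) stands in for the cited pre-load instruction (which behaves like a load with no destination register), and `BLOCK` is an illustrative block size we chose, not one from the cited work.

```c
#include <stddef.h>

/* Hypothetical sketch: pre-load the next matrix block into the data
 * cache while the current block is being processed. BLOCK and the
 * stride of 8 doubles (one 64-byte cache line) are illustrative. */
#define BLOCK 64

double process_blocks(const double *m, size_t n) {
    double acc = 0.0;
    for (size_t b = 0; b < n; b += BLOCK) {
        /* pre-load the next block, one cache line at a time */
        if (b + BLOCK < n)
            for (size_t j = b + BLOCK; j < b + 2 * BLOCK && j < n; j += 8)
                __builtin_prefetch(&m[j], 0, 1);
        size_t end = (b + BLOCK < n) ? b + BLOCK : n;
        for (size_t i = b; i < end; i++)
            acc += m[i];   /* work on the current, already-cached block */
    }
    return acc;
}
```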
“…Software prefetching is an effective technique to tolerate memory latency [4]. Software prefetching can be performed through two alternative schemes: binding and nonbinding prefetching.…”
Section: Introduction (mentioning)
confidence: 99%
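The binding/nonbinding distinction mentioned above can be made concrete with a small sketch. The function names here are ours, not from the cited works: a binding prefetch hoists the load itself, binding the value into a register at prefetch time, while a nonbinding prefetch (as in this paper) only warms the cache and leaves the later load to observe the current value.

```c
/* Illustrative contrast between the two prefetching schemes.
 * Names are hypothetical; __builtin_prefetch is the GCC/Clang
 * intrinsic used as the nonbinding prefetch. */
int binding_style(const int *p) {
    int early = *p;   /* value bound now; a later store to *p is missed */
    /* ... other overlapped work ... */
    return early;
}

int nonbinding_style(const int *p) {
    __builtin_prefetch(p, 0, 1);  /* cache warmed; value not yet read */
    /* ... other overlapped work ... */
    return *p;        /* the load still sees the up-to-date value */
}
```

The nonbinding form is safer in the presence of intervening stores or coherence traffic, which is why the cited paper adopts it.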
“…The use of binding and nonbinding prefetching has been previously studied in [13,1] and [4,9,14,18,3], respectively, among others. However, there are very few works analyzing the interactions of these prefetching schemes with software pipelining techniques.…”
Section: Introduction (mentioning)
confidence: 99%