International Symposium on Code Generation and Optimization, 2004. CGO 2004.
DOI: 10.1109/cgo.2004.1281683

Static identification of delinquent loads

Abstract: The effective use of processor caches is crucial to the performance of applications. It

Cited by 17 publications (23 citation statements)
References 13 publications
“…Empirically, we have observed that a small number of load instructions account for more than 90% of the total data stalls that a program suffers (see Table 1). Our results are in accord with prior work [2,31] that also observed the number of delinquent loads in a program is small when compared to the total number of loads in the same program. This characteristic allows the prefetching system to focus the memory optimizations to a manageable set of instructions, and indeed, we exploit this characteristic in our work.…”
Section: Delinquent Load Selection (supporting)
confidence: 83%
“…Our technique identifies exactly those load and store instructions that tend to cause many cache misses. Panait et al [6] proposed a technique to statically identify the load instructions that cause many cache misses. They call such a load instruction a delinquent load.…”
Section: Related Work (mentioning)
confidence: 99%
“…Previous research [19,20,22,30] has demonstrated various compiler optimizations capable of automatically analyzing delinquent loads from a program and generating effective data-prefetching helper threads. The register partitioning optimization geared towards achieving minimal VMT thread switch overhead is yet another enhancement to such optimizing compiler infrastructure development.…”
Section: Fly-weight VMT Thread Context Switch (mentioning)
confidence: 99%