2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) 2017
DOI: 10.1109/cgo.2017.7863749

Software prefetching for indirect memory accesses

Abstract: Many modern data processing and HPC workloads are heavily memory-latency bound. A tempting proposition to solve this is software prefetching, where special non-blocking loads are used to bring data into the cache hierarchy just before being required. However, these are difficult to insert to effectively improve performance, and techniques for automatic insertion are currently limited. This paper develops a novel compiler pass to automatically generate software prefetches for indirect memory accesses, a special…
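To make the idea in the abstract concrete, here is a minimal hand-written sketch of a software prefetch for an indirect access of the form data[index[i]]. It is not the output of the paper's compiler pass. The function name sum_indirect, the array names, and the look-ahead value PREFETCH_DISTANCE are all hypothetical, and __builtin_prefetch is assumed to be available (it is a GCC and Clang builtin). A suitable look-ahead distance depends on the machine and would need tuning.

```c
#include <stddef.h>

/* Hypothetical look-ahead distance, in loop iterations; tune per machine. */
#define PREFETCH_DISTANCE 16

/* Sums data[index[i]] for i in [0, n), issuing a non-blocking prefetch
 * for the element that will be needed PREFETCH_DISTANCE iterations later. */
long sum_indirect(const long *data, const int *index, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Bounds check: reading index[] past the end would be a fault,
         * even though the prefetch itself never faults. */
        if (i + PREFETCH_DISTANCE < n) {
            /* rw = 0 (read), locality = 3 (keep in all cache levels). */
            __builtin_prefetch(&data[index[i + PREFETCH_DISTANCE]], 0, 3);
        }
        sum += data[index[i]];
    }
    return sum;
}
```

The bounds check and the extra load of index[i + PREFETCH_DISTANCE] are part of the instruction overhead that can offset the benefit of prefetching, so a sketch like this only pays off when the indirect loads would otherwise miss in the cache.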

Cited by 42 publications (54 citation statements)
References: 19 publications
“…Software Prefetching: Software prefetching [28][29][30][31][32][33] provides a way for programmers to insert prefetching instructions into a program targeting various simple and complex patterns. In Ainsworth [34], while the insertion of software prefetches for indirect memory accesses is automated and eliminates the requirement for programmer effort, it cannot guarantee insertion of the instructions in an optimized way for a specific architecture. Furthermore, significant instruction overhead may offset its benefits.…”
Section: Related Work (mentioning)
confidence: 99%
“…In contrast, SWOOP reuses previously computed values and loaded data to reduce overhead and its execution is triggered and hidden by hardware stalls that would be pure performance loss otherwise. Software prefetching [3,21,39,73] lacks precision. Code containing complex control flow cannot rely on software-prefetch instructions inserted a few iterations ahead.…”
Section: Related Work (mentioning)
confidence: 99%