2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS 2015)
DOI: 10.1109/ipdps.2015.103
Compiler-Directed Transformation for Higher-Order Stencils

Abstract: As the cost of data movement increasingly dominates performance, developers of finite-volume and finite-difference solutions for partial differential equations (PDEs) are exploring novel higher-order stencils that increase numerical accuracy and computational intensity. This paper describes a new compiler reordering transformation, applied to stencil operators, that accumulates partial sums in buffers and reuses those partial sums in computing multiple results. This optimization has multiple effects on impr…

Cited by 34 publications (20 citation statements)
References 37 publications
“…To ensure a tight coupling, several prior efforts on guiding register allocation or instruction scheduling were implemented as a compiler pass in research/prototype compilers [7,16,20,41,45], or in open-source production compilers [29,46]. However, like some other recent efforts [6,28,50], we implement our reordering optimization at source level for the following reasons: (1) it allows external optimization for closed-source compilers like NVCC; (2) it allows us to perform transformations like exposing FMAs using operator distributivity and performing kernel fusion/fission, which can be done more effectively and efficiently at source level; and (3) it is input-dependent, not machine- or compiler-dependent: with an implementation coupled to compiler passes, it would have to be re-implemented across compilers with different intermediate representations. Our framework massages the input into a form that is more amenable to further optimization by any GPU compiler, and we use appropriate compilation flags whenever possible to ensure that our reordering optimization is not undone by later compiler passes.…”
Section: Experimental Evaluation (mentioning)
Confidence: 99%
“…Basu et al [6] propose a partial sum optimization implemented within the CHiLL compiler [23]. The partial sums are computed over planes for 3D stencils, and redundant computation is eliminated by performing array common subexpression elimination (CSE) [15].…”
Section: Related Work (mentioning)
Confidence: 99%
“…In fact, many scientific computations are compiled using the -ffast-math flag [4], which allows the compiler to exploit the associativity of floating-point operations to improve performance at the expense of IEEE compliance. Many recent efforts have leveraged operator associativity to drive code optimization strategies [1], [5], [6].…”
Section: Background and Motivation (mentioning)
Confidence: 99%
“…After applying the optimizations specified in the script, CHiLL generates optimized C (or Fortran) code. Recently, CHiLL has been extended to generate OpenMP code [2].…”
Section: CHiLL Background (mentioning)
Confidence: 99%