Proceedings of the 26th International Conference on Compiler Construction 2017
DOI: 10.1145/3033019.3033023

Optimization space pruning without regrets

Abstract: Many computationally-intensive algorithms benefit from the wide parallelism offered by Graphical Processing Units (GPUs). However, the search for a close-to-optimal implementation remains extremely tedious due to the specialization and complexity of GPU architectures. We present a novel approach to automatically discover the best performing code from a given set of possible implementations. It involves a branch and bound algorithm with two distinctive features: (1) an analytic performance model of a lower bound…

Cited by 10 publications (4 citation statements)
References 27 publications
“…However, the influence of invalid configurations is not discussed, as they are mostly focussed on x86 and GPU architectures, where this issue is much less dominant. Another approach is followed by TELAMON [4], which avoids invalid configurations by relying on a constraint-based, manually crafted hardware model. The model predicts the upper performance bound, while avoiding the construction of invalid configurations.…”
Section: Discussion and Related Work (mentioning)
confidence: 99%
“…Coloured petri nets [20] were proposed for GPGPU performance modelling. Another approach [3] builds an analytical performance model to determine the lower bound on execution time. Low-level GPU ISA solving and assembly microbenchmarking [38] has been used to collect data about architectural features and performance.…”
Section: Related Work (mentioning)
confidence: 99%
“…We complement the TAG algorithm with a performance model of the candidates [2]. The model provides a lower bound on the execution time of all implementations derivable from a candidate.…”
Section: Search Strategy (mentioning)
confidence: 99%
“…The lower bound performance model mentioned in Section 5 could not work if it just had access to an intermediate implementation in the compilation process. A similar performance model relying on ad-hoc partial implementations was previously introduced [2]. We generalize the idea by encoding partial implementations as a CSP problem on top of a semantic backbone.…”
Section: Global Heuristics (mentioning)
confidence: 99%