FinPar (2016)
DOI: 10.1145/2898354

Abstract: Commodity many-core hardware is now mainstream, but parallel programming models are still lagging behind in efficiently utilizing the application parallelism. There are (at least) two principal reasons for this. First, real-world programs often take the form of a deeply nested composition of parallel operators, but mapping the available parallelism to the hardware requires a set of transformations that are tedious to do by hand and beyond the capability of the common user. Second, the best optimization strateg…



Cited by 16 publications (12 citation statements: 0 supporting, 12 mentioning, 0 contrasting)
References 62 publications
“…We demonstrate the benefits of our approach by applying it to Futhark's incremental flattening analysis and evaluating a number of (i) real-world applications [14,16] from the remote-sensing and financial domains and (ii) benchmarks from standard suites, such as Rodinia [7] and FinPar [3,20]. In comparison with the OpenTuner-based implementation, our method reduces the tuning time by a factor as high as 22.6× and on average 6.4×, and in 5 out of the 11 cases it finds better thresholds that speed up program execution by as much as 10×.…”
Section: Scope and Contributions of This Paper (mentioning)
confidence: 99%
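For context, the "thresholds" this statement refers to are the comparisons that incremental flattening inserts between a runtime measure of the available parallelism and a tuned constant, selecting among differently parallelized versions of the same code. The sketch below is illustrative only: the kernel names, the row-sum computation, and the hard-coded THRESHOLD are hypothetical, not taken from Futhark or the cited papers.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical tuned constant: in incremental flattening such thresholds
// are found by an autotuner per program and per GPU, not hard-coded.
const long long THRESHOLD = 1 << 15;

// Version 1: exploit only the outer parallelism; each thread owns one row
// and runs the inner reduction sequentially.
__global__ void row_sums_outer(const float *a, float *out, int n, int m) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float s = 0.0f;
        for (int j = 0; j < m; ++j) s += a[(long long)i * m + j];
        out[i] = s;
    }
}

// Version 2: fully flattened; every (i, j) pair becomes a thread and the
// inner reduction is resolved with atomics. This version can win when the
// outer degree n alone is too small to saturate the GPU.
__global__ void row_sums_flat(const float *a, float *out, int n, int m) {
    long long g = (long long)blockIdx.x * blockDim.x + threadIdx.x;
    if (g < (long long)n * m) atomicAdd(&out[g / m], a[g]);
}

// Runtime dispatch on the dataset shape, mirroring the kind of threshold
// comparison that the autotuner instantiates.
void row_sums(const float *a, float *out, int n, int m) {
    if (n >= THRESHOLD) {
        row_sums_outer<<<(n + 255) / 256, 256>>>(a, out, n, m);
    } else {
        cudaMemset(out, 0, n * sizeof(float));
        long long total = (long long)n * m;
        row_sums_flat<<<(unsigned)((total + 255) / 256), 256>>>(a, out, n, m);
    }
}

int main() {
    const int n = 1024, m = 512;  // small n, so the flat version is chosen
    float *a, *out;
    cudaMallocManaged(&a, (size_t)n * m * sizeof(float));
    cudaMallocManaged(&out, (size_t)n * sizeof(float));
    for (long long k = 0; k < (long long)n * m; ++k) a[k] = 1.0f;
    row_sums(a, out, n, m);
    cudaDeviceSynchronize();
    printf("out[0] = %.0f (expected %d)\n", out[0], m);
    cudaFree(a);
    cudaFree(out);
    return 0;
}
```

An autotuner's job is then to pick THRESHOLD so that the dispatch chooses the faster version for the dataset shapes that actually occur, which is why better threshold values translate directly into end-to-end speed-ups.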
“…To make matters even more difficult, the common wisdom does not always hold: in several important cases [3,14] it has been shown that even when the outer parallelism is large enough, exploiting inner levels of parallelism is more efficient, e.g., when the additional parallelism can be mapped to the threads of a CUDA block, and when the intermediate results fit in shared memory. Finally, the best optimization strategy may not even be portable across different generations of the same type of hardware (GPU) from the same vendor [19].…”
Section: Introduction (mentioning)
confidence: 99%
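To illustrate the CUDA-block case described above, here is a minimal sketch (the names and the row-sum example are hypothetical, and it assumes the inner size m is a power of two no larger than the maximum block size) that maps each outer element to a block and the inner parallelism to that block's threads, so the intermediate results of the reduction stay in shared memory:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One block per outer element (row), one thread per inner element: the
// intermediate results of the inner reduction live in on-chip shared memory.
// Assumes m == blockDim.x is a power of two (<= 1024).
__global__ void row_sums_block(const float *a, float *out, int m) {
    extern __shared__ float buf[];
    int row = blockIdx.x;
    int j = threadIdx.x;

    buf[j] = a[(long long)row * m + j];  // stage the row into shared memory
    __syncthreads();

    // Tree reduction over the threads of the block.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (j < stride) buf[j] += buf[j + stride];
        __syncthreads();
    }
    if (j == 0) out[row] = buf[0];
}

int main() {
    const int n = 64, m = 256;  // modest outer degree: n blocks alone would underuse the GPU
    float *a, *out;
    cudaMallocManaged(&a, (size_t)n * m * sizeof(float));
    cudaMallocManaged(&out, (size_t)n * sizeof(float));
    for (int k = 0; k < n * m; ++k) a[k] = 1.0f;

    row_sums_block<<<n, m, m * sizeof(float)>>>(a, out, m);
    cudaDeviceSynchronize();
    printf("out[0] = %.0f (expected %d)\n", out[0], m);

    cudaFree(a);
    cudaFree(out);
    return 0;
}
```

With n = 64 here, parallelizing only the outer level would leave most of the GPU idle; mapping the inner reduction onto the 256 threads of each block recovers that parallelism while keeping its intermediate values on chip, which is exactly the situation the quoted passage describes.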