FinPar (2016)
DOI: 10.1145/2898354

Abstract: Commodity many-core hardware is now mainstream, but parallel programming models are still lagging behind in efficiently utilizing the application parallelism. There are (at least) two principal reasons for this. First, real-world programs often take the form of a deeply nested composition of parallel operators, but mapping the available parallelism to the hardware requires a set of transformations that are tedious to do by hand and beyond the capability of the common user. Second, the best optimization strateg…



Cited by 16 publications (12 citation statements: 0 supporting, 12 mentioning, 0 contrasting)
References 62 publications
“…We demonstrate the benefits of our approach by applying it to Futhark's incremental flattening analysis and evaluating a number of (i) real-world applications [14,16] from the remote-sensing and financial domains and (ii) benchmarks from standard suites, such as Rodinia [7] and FinPar [3,20]. In comparison with the OpenTuner-based implementation, our method reduces the tuning time by a factor as high as 22.6× and on average 6.4×, and in 5 out of the 11 cases it finds better thresholds that speed up program execution by as much as 10×.…”
Section: Scope and Contributions of This Paper (mentioning)
confidence: 99%
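For context, the "thresholds" this statement refers to are the comparisons that incremental flattening inserts between a runtime measure of the available parallelism and a tuned constant, selecting among differently parallelized versions of the same code. The sketch below is illustrative only: the kernel names, the row-sum computation, and the hard-coded THRESHOLD are hypothetical, not taken from Futhark or the cited papers.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical tuned constant: in incremental flattening such thresholds
// are found by an autotuner per program and per GPU, not hard-coded.
const long long THRESHOLD = 1 << 15;

// Version 1: exploit only the outer parallelism; each thread owns one row
// and runs the inner reduction sequentially.
__global__ void row_sums_outer(const float *a, float *out, int n, int m) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float s = 0.0f;
        for (int j = 0; j < m; ++j) s += a[(long long)i * m + j];
        out[i] = s;
    }
}

// Version 2: fully flattened; every (i, j) pair becomes a thread and the
// inner reduction is resolved with atomics. This version can win when the
// outer degree n alone is too small to saturate the GPU.
__global__ void row_sums_flat(const float *a, float *out, int n, int m) {
    long long g = (long long)blockIdx.x * blockDim.x + threadIdx.x;
    if (g < (long long)n * m) atomicAdd(&out[g / m], a[g]);
}

// Runtime dispatch on the dataset shape, mirroring the kind of threshold
// comparison that the autotuner instantiates.
void row_sums(const float *a, float *out, int n, int m) {
    if (n >= THRESHOLD) {
        row_sums_outer<<<(n + 255) / 256, 256>>>(a, out, n, m);
    } else {
        cudaMemset(out, 0, n * sizeof(float));
        long long total = (long long)n * m;
        row_sums_flat<<<(unsigned)((total + 255) / 256), 256>>>(a, out, n, m);
    }
}

int main() {
    const int n = 1024, m = 512;  // small n, so the flat version is chosen
    float *a, *out;
    cudaMallocManaged(&a, (size_t)n * m * sizeof(float));
    cudaMallocManaged(&out, (size_t)n * sizeof(float));
    for (long long k = 0; k < (long long)n * m; ++k) a[k] = 1.0f;
    row_sums(a, out, n, m);
    cudaDeviceSynchronize();
    printf("out[0] = %.0f (expected %d)\n", out[0], m);
    cudaFree(a);
    cudaFree(out);
    return 0;
}
```

An autotuner's job is then to pick THRESHOLD so that the dispatch chooses the faster version for the dataset shapes that actually occur, which is why better threshold values translate directly into end-to-end speed-ups.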
“…To make matters even more difficult, the common wisdom does not always hold: in several important cases [3,14] it has been shown that even when the outer parallelism is large enough, exploiting inner levels of parallelism is more efficient, e.g., when the additional parallelism can be mapped to the threads of a CUDA block, and when the intermediate results fit in shared memory. Finally, the best optimization strategy may not even be portable across different generations of the same type of hardware (GPU) from the same vendor [19].…”
Section: Introduction (mentioning)
confidence: 99%
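To illustrate the CUDA-block case described above, here is a minimal sketch (the names and the row-sum example are hypothetical, and it assumes the inner size m is a power of two no larger than the maximum block size) that maps each outer element to a block and the inner parallelism to that block's threads, so the intermediate results of the reduction stay in shared memory:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One block per outer element (row), one thread per inner element: the
// intermediate results of the inner reduction live in on-chip shared memory.
// Assumes m == blockDim.x is a power of two (<= 1024).
__global__ void row_sums_block(const float *a, float *out, int m) {
    extern __shared__ float buf[];
    int row = blockIdx.x;
    int j = threadIdx.x;

    buf[j] = a[(long long)row * m + j];  // stage the row into shared memory
    __syncthreads();

    // Tree reduction over the threads of the block.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (j < stride) buf[j] += buf[j + stride];
        __syncthreads();
    }
    if (j == 0) out[row] = buf[0];
}

int main() {
    const int n = 64, m = 256;  // modest outer degree: n blocks alone would underuse the GPU
    float *a, *out;
    cudaMallocManaged(&a, (size_t)n * m * sizeof(float));
    cudaMallocManaged(&out, (size_t)n * sizeof(float));
    for (int k = 0; k < n * m; ++k) a[k] = 1.0f;

    row_sums_block<<<n, m, m * sizeof(float)>>>(a, out, m);
    cudaDeviceSynchronize();
    printf("out[0] = %.0f (expected %d)\n", out[0], m);

    cudaFree(a);
    cudaFree(out);
    return 0;
}
```

With n = 64 here, parallelizing only the outer level would leave most of the GPU idle; mapping the inner reduction onto the 256 threads of each block recovers that parallelism while keeping its intermediate values on chip, which is exactly the situation the quoted passage describes.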