Proceedings of the International Workshop on OpenCL 2013 & 2014 - IWOCL '14 2014
DOI: 10.1145/2664666.2664674
Performance portability study of linear algebra kernels in OpenCL

Abstract: The performance portability of OpenCL kernel implementations for common memory bandwidth limited linear algebra operations across different hardware generations of the same vendor as well as across vendors is studied. Certain combinations of kernel implementations and work sizes are found to exhibit good performance across compute kernels, hardware generations, and, to a lesser degree, vendors. As a consequence, it is demonstrated that the optimization of a single kernel is often sufficient to obtain good perf…

Cited by 6 publications
(9 citation statements)
References 12 publications
“…Benchmark results in Section 6 demonstrate that ViennaCL provides performance comparable to or better than vendor-tuned libraries for sparse matrix-vector products and sparse matrix-matrix products. These results complement earlier work, which reported competitive performance of ViennaCL for dense linear algebra operations [44,54]. In addition, benchmark results for pipelined iterative solvers with kernel fusion and two important types of preconditioners allow for a comparison of solver performance on different hardware platforms.…”
supporting
confidence: 85%
“…The parameters in the device database are also useful for the CUDA compute backend, since it includes the best parameters found for the local and global workgroup sizes. Because architectural differences for GPUs from NVIDIA are smaller than across vendors, we found that only setting proper workgroup sizes at runtime is enough to obtain good performance for memory-bandwidth limited kernels on NVIDIA GPUs [44].…”
Section: Device Database
mentioning
confidence: 99%
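
The runtime selection of work-group sizes described in the statement above can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the names `WorkSizes`, `DEVICE_DB`, `DEFAULT`, and `lookup_work_sizes` are hypothetical, the device names and numbers are placeholders rather than measured results, and none of it is taken from ViennaCL's actual device database.

```python
from typing import NamedTuple


class WorkSizes(NamedTuple):
    """Launch configuration for a memory-bandwidth-limited kernel (hypothetical)."""
    local_size: int   # work items per work group
    num_groups: int   # number of work groups launched

    @property
    def global_size(self) -> int:
        # Total number of work items across all work groups.
        return self.local_size * self.num_groups


# Placeholder entries only: device names and values are invented for the sketch.
DEVICE_DB = {
    "example-gpu-a": WorkSizes(local_size=128, num_groups=256),
    "example-gpu-b": WorkSizes(local_size=256, num_groups=128),
}

# Fallback used when a device has no entry in the table.
DEFAULT = WorkSizes(local_size=128, num_groups=128)


def lookup_work_sizes(device_name: str) -> WorkSizes:
    """Return the stored work sizes for a device, or the default if unknown."""
    return DEVICE_DB.get(device_name.strip().lower(), DEFAULT)
```

In such a scheme the kernel source stays the same for every device, and only the launch configuration returned by the lookup changes at runtime.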
“…Rupp et al perform an extensive intervendor and intravendor performance portability investigation of OpenCL using miniature linear algebra kernels. Concentrating on structured grid codes, McIntosh‐Smith et al used 3 benchmarks, including the mini app CloverLeaf, to investigate the performance portability of OpenCL across a number of devices.…”
Section: Related Work
mentioning
confidence: 99%
“…CloverLeaf has also been used to investigate the performance of the OPS DSL. 26,27 Rupp et al 28 CPU with respect to microscopy image analysis. They concluded that the devices had a significant variance between particular operations, exposing some preference for particular operations.…”
Section: Related Work
mentioning
confidence: 99%