Transparent Acceleration for Heterogeneous Platforms With Compilation to OpenCL

Riebler, Heinrich; Vaz, Gavin; Kenter, Tobias; Plessl, Christian

doi:10.1145/3319423

Cited by 7 publications

(3 citation statements)

References 44 publications

“…Several works [7,24,26] automatically detect parallelisable loops in sequential representations and translate the loops to OpenCL kernels. The existing approaches targeting OpenCL follow a similar workflow as OptCL: a sequential program is first converted into an IR.…”

Section: Related Work (mentioning)

confidence: 99%

OptCL: A Middleware to Optimise Performance for High Performance Domain-Specific Languages on Heterogeneous Platforms

Xiao, Andelfinger, Cai, et al. 2022

Lecture Notes in Computer Science

Programming on heterogeneous hardware architectures using OpenCL requires thorough knowledge of the hardware. Many High-Performance Domain-Specific Languages (HPDSLs) are aimed at simplifying the programming efforts by abstracting away hardware details, allowing users to program in a sequential style. However, most HPDSLs still require the users to manually map compute workloads to the best suitable hardware to achieve optimal performance. This again calls for knowledge of the underlying hardware and trial-and-error attempts. Further, very often they only consider an offloading mode where compute-intensive tasks are offloaded to accelerators. During this offloading period, CPUs remain idle, leaving parts of the available computational power untapped. In this work, we propose a tool named OptCL for existing HPDSLs to enable a heterogeneous co-execution mode when capable where CPUs and accelerators can process data simultaneously. Through a static analysis of data dependencies among compute-intensive code regions and performance predictions, the tool selects the best execution schemes out of purely CPU/accelerator execution or co-execution. We show that by enabling co-execution on dedicated and integrated CPU-GPU systems up to 13× and 21× speed-ups can be achieved.

“…Grewe et al [19] translate OMP SMP into OpenCL. HTrOP [26] generates OpenCL applications from the LLVM bitcode. CU2CL [28] and Kim et al [9] translate CUDA into OpenCL to achieve portability.…”

Section: Related Work (mentioning)

confidence: 99%

“…However, this kind of existing work has a serious performance portability issue as an application implemented in CUDA, by definition, is not portable to non-NVIDIA systems. Other existing work [19,33,26,34] generates OpenCL code that can run on a wide range of parallel hardware including GPUs, CPUs, and FPGAs. Given that OpenCL remains as a low-level programming language that exposes many hardware details, maintaining the generated code is often too difficult for non-expert programmers.…”

Section: Introduction (mentioning)

confidence: 99%