PPOpenCL: a performance-portable OpenCL compiler with host and kernel thread code fusion

Ying, Liu; Huang, Lei; Wu, Ming-Chuan; Cui, Huimin; Lv, Fang; Feng, Xiaobing; Xue, Jingling

doi:10.1145/3302516.3307350

Cited by 3 publications

(8 citation statements)

References 47 publications


“…So there are multiple attempts to improve cross‐platform performance portability for OpenCL platforms. One of them is a source‐to‐source OpenCL compiler, PPOpenCL [5]. It was implemented in Clang and is based on fusing the host and kernel thread codes of an OpenCL program.…”

Section: Related Work (mentioning)

confidence: 99%

“…The main ideas of OpenCL are expressed based on the following hierarchy of models: (i) platform model, (ii) execution model, (iii) memory model, and (iv) programming model. In particular, OpenCL provides a platform‐independent abstract platform model that allows arranging computations and data access [5]. This model is based on the host‐centric view, where the platform consists of a host connected to one or more OpenCL compute devices (e.g., GPUs).…”

Section: Heterogeneous Computing Systems and Programming Models (mentioning)

confidence: 99%

“…Heterogeneous computing architectures have become more and more popular in many application areas [1‐4]. Today, heterogeneous platforms are deployed in many setups, ranging from low‐power mobile systems to high‐performance computing systems [5]. The composition of a general‐purpose host CPU combined with specialized computing devices (e.g., GPU, FPGA, or AI accelerator) allows many users to speed up their codes significantly [3,6] and improve the performance per Watt ratio.…”

Section: Introduction (mentioning)

confidence: 99%

“…However, the diversity of accelerators makes cross‐platform programming a big challenge [5], thus forcing programmers to write and maintain multiple source codes for an application on different platforms, for example, CUDA [10] for NVIDIA GPUs and OpenMP for CPUs [11]. OpenCL [12] addresses this cross‐platform programming challenge by providing a unified programming interface for various heterogeneous systems.…”

Section: Introduction (mentioning)

confidence: 99%


Exploration of OpenCL Heterogeneous Programming for Porting Solidification Modeling to CPU‐GPU Platforms

Halbiniak, Szustak, Olas, et al. 2020. Concurrency and Computation.

Summary: This article provides a comprehensive study of OpenCL heterogeneous programming for porting applications to CPU–GPU computing platforms, using solidification modeling as a real‐life application. The aim is to achieve a flexible workload distribution between available CPU–GPU resources and optimize application performance. Considering the solidification application as a use case, we explore the necessary steps required for (i) adaptation of an application to CPU–GPU platforms, and (ii) mapping the application workload onto the OpenCL programming model. The adaptation is based on a reformulation of steps developed previously for CPU–MIC architectures. The mapping process allows us to utilize OpenCL for harnessing CPU and GPU cores using data parallelism, as well as for the management of available compute devices with task parallelism. The performance and energy efficiency of the resulting OpenCL code are experimentally studied for two platforms with powerful GPUs of different generations (Kepler and Volta architectures). The experiments confirm the performance advantage of using computing resources of both GPUs and CPUs. The achieved benefit depends on the relationship between the computing power of CPUs and GPUs. Moreover, this gain comes with a higher average power draw, which increases the energy consumed during the application execution.


“…The work-items are organized into work-groups with each work-group running on one CU, which is mapped to one CPE in SWCL. Thus, the work-items are executed on the CPE serially, which is referred to as serial execution mode in [41]. Work-groups are statically assigned to CPEs using block distribution, by introducing an explicit loop nest, i.e., a work-group loop for each CPE, and work-group barriers are thus supported by loop fission, as in POCL [27], MOCL [72] and SNU-OCL [32].…”

Section: Basic OpenCL Implementation on SW26010 (mentioning)

confidence: 99%
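The statement above describes three mechanisms: serial execution of work-items on one compute element, block distribution of work-groups through a per-CPE work-group loop, and barrier support through loop fission. The following is a minimal sketch of that scheme in plain C, not the SWCL implementation. The constants `NUM_CPES`, `NUM_GROUPS`, and `LOCAL_SIZE`, the function names, and the toy kernel are all hypothetical and chosen only for illustration, and the CPEs are simulated by an ordinary loop.

```c
#include <assert.h>

#define NUM_CPES   4   /* hypothetical number of compute elements */
#define NUM_GROUPS 10  /* hypothetical number of work-groups */
#define LOCAL_SIZE 8   /* hypothetical work-items per work-group */

/* Toy kernel, written per work-item as in OpenCL:
 *     tmp[lid] = in[gid] * 2;
 *     barrier(CLK_LOCAL_MEM_FENCE);
 *     out[gid] = tmp[(lid + 1) % LOCAL_SIZE];
 * Serial execution mode runs all work-items of a group in a loop.
 * Loop fission splits that loop at the barrier, so every work-item
 * finishes the first region before any work-item starts the second. */
void run_work_group(int group, const int *in, int *out) {
    int tmp[LOCAL_SIZE];  /* stands in for __local memory */
    assert(group >= 0 && group < NUM_GROUPS);

    /* Work-item loop 1: code before the barrier. */
    for (int lid = 0; lid < LOCAL_SIZE; ++lid) {
        int gid = group * LOCAL_SIZE + lid;
        tmp[lid] = in[gid] * 2;
    }
    /* The barrier is the boundary between the two loops. */

    /* Work-item loop 2: code after the barrier. */
    for (int lid = 0; lid < LOCAL_SIZE; ++lid) {
        int gid = group * LOCAL_SIZE + lid;
        out[gid] = tmp[(lid + 1) % LOCAL_SIZE];
    }
}

/* Block distribution: each CPE gets one contiguous chunk of groups. */
void block_range(int cpe, int *begin, int *end) {
    int chunk = (NUM_GROUPS + NUM_CPES - 1) / NUM_CPES;  /* ceiling */
    *begin = cpe * chunk;
    if (*begin > NUM_GROUPS) *begin = NUM_GROUPS;
    *end = *begin + chunk;
    if (*end > NUM_GROUPS) *end = NUM_GROUPS;
}

/* Work-group loop executed by one CPE. */
void run_on_cpe(int cpe, const int *in, int *out) {
    int begin, end;
    block_range(cpe, &begin, &end);
    for (int g = begin; g < end; ++g)
        run_work_group(g, in, out);
}

/* Simulates all CPEs one after another; real CPEs would run in parallel. */
void run_all_cpes(const int *in, int *out) {
    for (int cpe = 0; cpe < NUM_CPES; ++cpe)
        run_on_cpe(cpe, in, out);
}
```

With these constants, the ceiling division gives chunks of three work-groups, so the simulated CPEs 0 to 2 each process three groups and CPE 3 processes the remaining one. Splitting the work-item loop is what keeps the rotated read of `tmp` correct: a single fused loop would read `tmp[lid + 1]` before it had been written.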

Bandwidth-Aware Loop Tiling for DMA-Supported Scratchpad Memory

Liu, Cui, et al. 2020. Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques. Self-citation.

Scratchpad Memory (SPM) is widely used in emerging domain-specific architectures and accelerators for improving energy efficiency and time predictability. Typically, SPM-based architectures use DMA for fetching data from off-chip memory and global load instructions for loading fine-grained data directly into registers. For such architectures, neither capacity-only nor bandwidth-only loop tiling can efficiently use the bandwidth and SPM. This paper introduces a bandwidth-aware loop tiling approach that enables a tradeoff between SPM space utilization and bandwidth utilization to be made, by leveraging a runtime tiling framework and a cross-host-kernel IPA. Experimental results demonstrate that our approach can achieve a performance improvement of up to 4x, with a geometric average of 26%.


HALO 1.0: A Hardware-agnostic Accelerator Orchestration Framework for Enabling Hardware-agnostic Programming with True Performance Portability for Heterogeneous HPC

Riera, Bank-Tavakoli, Quraishi, et al. 2020. Preprint.
