2015
DOI: 10.1007/978-3-319-17473-0_2
Directive-Based Compilers for GPUs

Abstract: General-Purpose Graphics Processing Units (GPUs) can be effectively used to enhance the performance of many contemporary scientific applications. However, programming GPUs using machine-specific notations like CUDA or OpenCL can be complex and time-consuming. In addition, the resulting programs are typically fine-tuned for a particular target device. A promising alternative is to program in a conventional and machine-independent notation extended with directives and use compilers to generate GPU code auto…

Cited by 7 publications (4 citation statements); references 18 publications.
“…The XMP extensions are in charge of providing distributed arrays with a small subset of the array operations of H²TAs and without their tile-level features. As for heterogeneity, the fact that XcalableACC relies on OpenACC reduces its portability compared to OpenCL, which we use as backend, and sometimes also the performance, as OpenACC has been found to often offer considerably less performance than manually optimized kernels. In addition, unlike H²TAs, OpenACC requires explicit annotations for data movements between each host and its device(s).…”
Section: Related Work
confidence: 99%
“…The emergence of such systems has led to a resurgence of interest in parallelizing compilers. OpenACC, for instance, has been a target of several different compilers, such as accULL [Reyes et al 2012], ipmacc [Lashgar et al 2014], OpenARC [Lee and Vetter 2014], and pgcc [Ghike et al 2014]. Similarly, OpenMP 4.0 is already supported by several mainstream compilers, including gcc 4.9.0 (for C/C++), gcc 4.9.1 (for Fortran), icc 15.0 (C/C++/Fortran), and LLVM's Clang 3.7, which offers partial support for OpenMP 4.0 for C/C++.…”
Section: Related Work
confidence: 99%
“…However, as discussed in Xu et al., the lack of certain directives often makes the exploitation of multiple accelerators under this paradigm challenging for programmers. The main concern with this strategy, however, is that compiler-based approaches strongly depend on the quality of the compiler, often lacking a reasonable performance model and, worse, strongly underperforming relative to other alternatives due to missing optimization opportunities.…”
Section: Related Work
confidence: 99%