During the past decade, high-performance computing has evolved toward multi-core and many-core architectures. General-purpose processors now feature dozens of coarse-grain (complex) cores, each with four to eight SIMD lanes for parallel computation, together with multi-channel memory buses for high bandwidth. Hardware accelerators such as graphics processing units (GPUs) follow a similar two-level design, with multiple coarse units that contain an even larger number of SIMD lanes and wider memory buses for higher bandwidth. Consequently, the adoption of hardware accelerators is rapidly advancing in performance-sensitive areas. They are particularly relevant in high-throughput disciplines such as high-quality 3D computer graphics and vision, real-time data stream processing, and high-performance scientific computing. The main reason behind this trend is that these accelerators can potentially yield speedups and energy savings orders of magnitude greater than those obtained with optimized implementations for general-purpose CPU cores. A clear indicator of this trend is the prevalence of these accelerators in the supercomputing systems at the top positions of both the TOP500 and Green500 lists. As a result, during the past few years, these architectures have become powerful, capable, and inexpensive mainstream coprocessors, useful for a wide variety of applications. Furthermore, they are nowadays present in a large variety of machines, ranging from low-end single-user platforms to supercomputers.

However, the benefits of heterogeneous systems do not come 'for free': scientists using these platforms have to deal not only with multiple levels of parallelism, but also with the programmability differences among the available accelerators. To address these challenges, a very rich programming environment has developed for these devices, particularly in comparison with the restricted landscape of only a few years ago.
A key criterion for characterizing the new high-level programming tools and libraries for these devices is their position within the triangle of performance, coding comfort, and specialization. The spectrum ranges from high-performance building blocks for common numeric or discrete transformations, through domain-specific libraries that facilitate the solution of a certain class of problems, to general high-level abstractions aimed at increasing programmers' productivity.

In summary, the advances both in the hardware and in the programmability of accelerators, coupled with their potentially appealing performance/power ratio for a wide range of applications, have pushed organizations to invest in heterogeneous systems that include accelerators, and have motivated researchers to port their algorithms to such systems and to develop novel tools to facilitate their usage.

This special issue contributes to this important field with extended and carefully reviewed versions of selected papers from two workshops, namely the 3rd Minisymposium on GPU Computing, which was held as