Introducing and Implementing the Allpairs Skeleton for Programming Multi-GPU Systems

Steuwer, Michel; Friese, Malte; Albers, Sebastian; Gorlatch, Sergei

doi:10.1007/s10766-013-0265-6

Cited by 9 publications

(7 citation statements)

References 11 publications

“…The Allpairs skeleton [24] in SkelCL can be considered as a variant of MapPairs that accepts matrix operands only; any reduction needs be implemented as part of the user function in Allpairs (i.e., by nesting), while we provide the combination MapPairsReduce (i.e., chaining). MapPairs and MapPairsReduce specifically support multiple separate 1D vector operands in both dimensions, as requested for use with the MetalWalls [19] application by EXA2PRO project partner CNRS.…”

Section: Related Work (mentioning)

confidence: 99%
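The nesting described in the statement above can be illustrated with a minimal sequential sketch. It assumes the usual reading of an allpairs computation, where entry (i, j) of the result is a user function applied to row i of the first matrix and row j of the second. The names `Row`, `Matrix`, `dot`, and `allpairs` are hypothetical and chosen for illustration only; this is not the SkelCL or SkePU API, and it does not model any GPU execution.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container types for this sketch (not SkelCL or SkePU types).
using Row = std::vector<float>;
using Matrix = std::vector<Row>;

// Hypothetical user function: a dot product of two rows.
// The reduction (the running sum) is nested inside the user function.
float dot(const Row& a, const Row& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t k = 0; k < a.size(); ++k) {
        sum += a[k] * b[k];
    }
    return sum;
}

// Sequential reference semantics: C[i][j] = f(row i of A, row j of B).
template <typename F>
Matrix allpairs(const Matrix& A, const Matrix& B, F f) {
    Matrix C(A.size(), Row(B.size()));
    for (std::size_t i = 0; i < A.size(); ++i) {
        for (std::size_t j = 0; j < B.size(); ++j) {
            C[i][j] = f(A[i], B[j]);
        }
    }
    return C;
}
```

Calling `allpairs(A, B, dot)` with a 2×2 matrix `A` and a 3×2 matrix `B` yields a 2×3 result, which equals the product of `A` with the transpose of `B`. A chained design would instead produce the pairwise products first and apply the reduction as a separate step.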

SkePU 3: Portable High-Level Programming of Heterogeneous Systems and HPC Clusters

Ernstsson

Ahlqvist

Zouzoula

et al. 2021

Int J Parallel Prog

We present the third generation of the C++-based open-source skeleton programming framework SkePU. Its main new features include new skeletons, new data container types, support for returning multiple objects from skeleton instances and user functions, support for specifying alternative platform-specific user functions to exploit e.g. custom SIMD instructions, generalized scheduling variants for the multicore CPU backends, and a new cluster-backend targeting the custom MPI interface provided by the StarPU task-based runtime system. We have also revised the smart data containers’ memory consistency model for automatic data sharing between main and device memory. The new features are the result of a two-year co-design effort collecting feedback from HPC application partners in the EU H2020 project EXA2PRO, and target especially the HPC application domain and HPC platforms. We evaluate the performance effects of the new features on high-end multicore CPU and GPU systems and on HPC clusters.

“…Implemented as a library, it does not require the usage of a precompiler like SkePU 2, with the downside that user functions are defined as string literals. SkelCL includes the AllPairs skeleton [21], an efficient implementation of certain complex access modes involving multiple matrices. In SkePU 2 matrices are accessed either element-wise or randomly.…”

Section: Related Work (mentioning)

confidence: 99%

SkePU 2: Flexible and Type-Safe Skeleton Programming for Heterogeneous Parallel Systems

Ernstsson

Keßler

2017

Int J Parallel Prog

In this article we present SkePU 2, the next generation of the SkePU C++ skeleton programming framework for heterogeneous parallel systems. We critically examine the design and limitations of the SkePU 1 programming interface. We present a new, flexible and type-safe, interface for skeleton programming in SkePU 2, and a source-to-source transformation tool which knows about SkePU 2 constructs such as skeletons and user functions. We demonstrate how the source-to-source compiler transforms programs to enable efficient execution on parallel heterogeneous systems. We show how SkePU 2 enables new use-cases and applications by increasing the flexibility from SkePU 1, and how programming errors can be caught earlier and easier thanks to improved type safety. We propose a new skeleton, Call, unique in the sense that it does not impose any predefined skeleton structure and can encapsulate arbitrary user-defined multi-backend computations. We also discuss how the source-to-source compiler can enable a new optimization opportunity by selecting among multiple user function specializations when building a parallel program. Finally, we show that the performance of our prototype SkePU 2 implementation closely matches that of SkePU 1.

“…Assume that the user selected global memory-based execution policy. Consider the thread (31,31) in a thread block (3,7). This thread will create and initialize a window iW with local_start_x=3 × 32 + 31 = 127 and local_start_y=7 × 32 + 31 = 255.…”

Section: Example (mentioning)

confidence: 99%
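The index arithmetic in the statement above can be checked with a small sketch. It assumes a block extent of 32 threads in each dimension, which is inferred from the quoted numbers rather than stated in the excerpt. The `Index2D` type and the `local_start` helper are hypothetical names for illustration and are not part of Thrust2D.

```cpp
#include <cassert>

// Hypothetical 2D index type for this sketch.
struct Index2D {
    int x;
    int y;
};

// Hypothetical helper: start coordinate = block index * block extent + thread index.
// The default extent of 32 is an assumption inferred from the quoted example.
Index2D local_start(Index2D block, Index2D thread, int block_dim = 32) {
    // A thread index must lie inside its block.
    assert(thread.x >= 0 && thread.x < block_dim);
    assert(thread.y >= 0 && thread.y < block_dim);
    return {block.x * block_dim + thread.x, block.y * block_dim + thread.y};
}
```

For thread (31,31) in block (3,7), `local_start({3, 7}, {31, 31})` gives x = 3 × 32 + 31 = 127 and y = 7 × 32 + 31 = 255, matching the values in the quoted text.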

“…Other examples of algorithmic skeletons is SkelCL, which uses OpenCL underneath, and Marrow. While SkelCL aims to provide an abstraction over multiple GPUs, Marrow provides a mechanism to combine a set of skeletons to form a complex structure.…”

Section: Related Work (mentioning)

confidence: 99%

Thrust2D: A new design abstraction framework for structured grid class of algorithms

Sarkar

George

Manoj

2018

Concurrency and Computation

An important goal of structured parallel programming has been to provide a design framework that balances between the extent of abstraction built over the hardware and the amount of control given to the programmer to leverage the hardware resource features. Towards this goal, NVIDIA has released an open-source design framework called Thrust based on C++ STL, where the developers can express the functionality in STL style, without having to know the architectural details of the underlying parallel infrastructure. While the framework is generic and portable, it does not support the right abstraction for two-dimensional data, which is heavily used in most of the popular parallel algorithms. In this paper, we proposed Thrust2D, an extension of Thrust to support the abstraction for two-dimensional data, targeted towards structured grid class of applications. We took several structured grid examples from Rodinia benchmark, OpenCV framework, and NVIDIA samples and rewrote them using Thrust2D. We demonstrated that, in some cases, we get nearly 80% reduction in code complexity, and for 12 out of 17 applications we have tested, the kernel performance of Thrust2D versions are well within 85% of the native CUDA versions. When we consider the total execution time, 14 out of 17 Thrust2D versions performance are within 85% of the native CUDA versions. In some cases, the performance of the Thrust2D versions has outperformed the native versions.

KEYWORDS: algorithmic skeleton, cyclomatic complexity, dwarf, GPU, HPC, relative performance, shared memory access, structured grid

INTRODUCTION

A high-performance software should be written with extreme care to get optimum performance from the underlying hardware. Writing such software has traditionally been (and remains) difficult for software developers due to a range of complications related to the task (to be executed in a heterogeneous infrastructure) decomposition, data alignment, communication, synchronization, debugging, and so on. Since the HPC infrastructure is evolving rapidly with new capabilities, any infrastructure upgrade invariably requires the software to be tuned (and possibly rewritten) to get maximum performance from the upgraded hardware. Moreover, for commercial reasons, an application might have to run on widely different hardware simultaneously. One solution to this is to create a software abstraction for application developers, which hides the architectural details, data access complexity, and communication details of the underlying hardware as much as possible. Such an abstraction should provide portability across various infrastructures. Lastly, the abstraction framework should provide an appropriate mechanism to the developer to express the intention to exploit the infrastructure specific features in the code so that the application can optimally utilize the computing capability of the hardware and deliver the performance close to the native implementation. NVIDIA Inc has taken the initiative to develop a lightweight framework based on open-source STL called Thrus...
