Parallel Patterns for General Purpose Many-Core

Buono, Daniele; Danelutto, Marco; Lametti, Silvia; Torquati, Massimo

doi:10.1109/pdp.2013.27

Cited by 16 publications

(8 citation statements)

References 11 publications

“…In computational electromagnetics modelling problems with up to one billion variables have been addressed with both memory- and CPU-intensive algorithms, solving major longstanding problems. More structured approaches based on pattern-based parallel programming effectively cater for the design and development of parallel pipelines for M&S in systems biology and next generation sequencing [1,2], providing developers with portability across a variety of HPC platforms, like clusters of multi-cores [3,4] as well as cloud infrastructures [5].…”

Section: Background and State Of The Art (mentioning)

confidence: 99%

Why High-Performance Modelling and Simulation for Big Data Applications Matters

Grelck

Niewiadomska-Szynkiewicz

Aldinucci

et al. 2019

Lecture Notes in Computer Science

Modelling and Simulation (M&S) offer adequate abstractions to manage the complexity of analysing big data in scientific and engineering domains. Unfortunately, big data problems are often not easily amenable to efficient and effective use of High Performance Computing (HPC) facilities and technologies. Furthermore, M&S communities typically lack the detailed expertise required to exploit the full potential of HPC solutions while HPC specialists may not be fully aware of specific modelling and simulation requirements and applications. The COST Action IC1406 High-Performance Modelling and Simulation for Big Data Applications has created a strategic framework to foster interaction between M&S experts from various application domains on the one hand and HPC experts on the other hand to develop effective solutions for big data applications. One of the tangible outcomes of the COST Action is a collection of case studies from various computing domains. Each case study brought together both HPC and M&S experts, giving witness of the effective cross-pollination facilitated by the COST Action. In this introductory article we argue why joining forces between M&S and HPC communities is both timely in the big data era and crucial for success in many application domains. Moreover, we provide an overview on the state of the art in the various research areas concerned.

“…The root is hardware description perfect that describes idealized many-core hardware and provides the highest level of abstraction for programmers. The device perfect has an unlimited amount of cores (lines 20-22), and each core can run 1 thread (line 23). Each lower level describes hardware in more detail and extends a parent resulting in the hierarchy.…”

Section: Hardware Description Language HDL (mentioning)

confidence: 99%

“…A second means to obtain performance from high-level programs is to provide a programming model in which programmers express their algorithms in terms of algorithmic skeletons [20][21][22]. The skeletons are often manually implemented and optimized.…”

Section: Introduction (mentioning)

confidence: 99%

Stepwise‐refinement for performance: a methodology for many‐core programming

Hijma

Nieuwpoort

Jacobs

et al. 2015

Concurrency and Computation

Many-core hardware is targeted specifically at obtaining high performance, but reaching high performance is often challenging because hardware-specific details have to be taken into account. Although there are many programming systems that try to alleviate many-core programming, some providing a high-level language, others providing a low-level language for control, none of these systems have a clear and systematic methodology as a foundation. In this article, we propose stepwise-refinement for performance: a novel, clear, and structured methodology for obtaining high performance on many-cores. We present a system that supports this methodology, offers multiple levels of abstraction to provide programmers a trade-off between high-level and low-level programming, and provides programmers detailed performance feedback. We evaluate our methodology with several widely varying compute kernels on two different many-core architectures: a Graphical Processing Unit (GPU) and the Xeon Phi. We show that our methodology gives insight in the performance, and that in almost all cases, we gain a substantial performance improvement using our methodology.

Section 2 elaborates how various many-core programming approaches relate to MCL. In Section 3, we introduce our methodology stepwise-refinement for performance. Section 4 gives an overview of MCL and how our system implements our methodology. In Section 5, we give a detailed example of how the process of stepwise-refinement for performance takes place. Section 6 discusses several of the implementation techniques of our system. Section 7 evaluates our techniques for various well-known compute kernels. We conclude the article with a discussion and conclusion.

RELATED WORK

The challenges in many-core programming are widely recognized, and there are many approaches that try to alleviate it. This following section discusses the current status of programming manycores and identifies issues (summarized in Table I) that we try to address in our work. We distinguish three programming approaches: high-level programming, separation of concerns, and a tuning cycle approach. Section 2.2 discusses systems that influenced MCL.

2.1.3. Tuning cycle approach. The tuning cycle approach is an iterative process that usually consists of the following steps: evaluate the performance of an application, analyze the gathered results, and refactor the code to increase the performance. This approach usually fits low-level languages such as CUDA [28] or OpenCL [29] that offer programmers high degrees of control over the code. However, it can also be applied to directive-based programming systems, where in each step, more detailed directives are inserted [30][31][32][33][34][35].

Figure 15. Part of the hardware description mic.

Xeon PhiMic. Intel's Many Integrated Core (MIC) architecture contains several tens of in-order x86 cores with powerful vector units and several hardware threads connected through a ring network. The MIC exposes two layers of parallelism: vector in...

“…The implementation of the ffMDF skeleton has been developed using FastFlow [1,9], a skeleton-based programming framework. FastFlow is a structured parallel programming environment implemented in C++ on top of POSIX threads [5,3].…”

Section: Skeleton-based Design (mentioning)

confidence: 99%

A Lightweight Run-Time Support for Fast Dense Linear Algebra on Multi-Core

Buono

Danelutto

Matteis

et al. 2014

Software Engineering / 811: Parallel and Distributed Computing and Networks / 816: Artificial Intelligence and Applications

The work proposes ffMDF, a lightweight dynamic run-time support able to achieve high performance in the execution of dense linear algebra kernels on shared-cache multi-core. ffMDF implements a dynamic macro-dataflow interpreter processing DAG graphs generated on-the-fly out of standard numeric kernel code. The experimental results demonstrate that the performance obtained using ffMDF on both fine-grain and coarse-grain problems is comparable with or even better than that achieved by de-facto standard solutions (notably PLASMA library), which use separate run-time supports specifically optimised for different computational grains on modern multi-core.

KEY WORDS: Data-flow run-time, dense linear algebra, dynamic scheduling, multi-threading, multi-core.

This work has been partially supported by FP7 STREP ParaPhrase (www.paraphrase-ict.eu).
