Published: 2012
DOI: 10.1142/s0129626412400063

Targeting Heterogeneous Architectures via Macro Data Flow

Abstract: We propose a data flow based run time system as an efficient tool for supporting execution of parallel code on heterogeneous architectures hosting both multicore CPUs and GPUs. We discuss how the proposed run time system may be the target of both structured parallel applications developed using algorithmic skeletons/parallel design patterns and also more "domain specific" programming models. Experimental results demonstrating the feasibility of the approach are presented.
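
The macro data flow model the abstract refers to can be pictured with a small, self-contained sketch: instructions carrying macro-sized pieces of code become fireable once all of their input tokens have arrived, and fireable instructions are dispatched to CPU or GPU workers. The C++ below is an illustrative reconstruction under these assumptions (names such as MdfInstruction and MdfInterpreter are invented for the example); it is not the paper's actual run time system or the FastFlow API.

```cpp
// Minimal sketch of a macro data flow (MDF) interpreter. An MDF instruction
// becomes fireable when all of its input tokens are available; fireable
// instructions are then scheduled on a CPU or GPU worker. Illustrative only.
#include <cstdio>
#include <functional>
#include <queue>
#include <vector>

enum class Device { CPU, GPU };

struct MdfInstruction {
    int missing_inputs;                                 // tokens still to arrive
    std::function<int(const std::vector<int>&)> body;   // macro-sized work unit
    std::vector<int> inputs;                            // tokens received so far
    std::vector<int> consumers;                         // dependent instructions
    Device preferred = Device::CPU;                     // scheduling hint
};

class MdfInterpreter {
public:
    int add(MdfInstruction instr) {
        graph_.push_back(std::move(instr));
        return static_cast<int>(graph_.size()) - 1;
    }

    // Deliver a token; when the target has all its inputs, it becomes fireable.
    void deliver(int target, int token) {
        MdfInstruction& ins = graph_[target];
        ins.inputs.push_back(token);
        if (--ins.missing_inputs == 0) fireable_.push(target);
    }

    void run() {
        while (!fireable_.empty()) {
            int id = fireable_.front();
            fireable_.pop();
            MdfInstruction& ins = graph_[id];
            // A real runtime would offload GPU-preferred instructions via
            // CUDA/OpenCL; here we only print the scheduling decision.
            std::printf("firing %d on %s\n", id,
                        ins.preferred == Device::GPU ? "GPU" : "CPU");
            int result = ins.body(ins.inputs);
            for (int c : ins.consumers) deliver(c, result);
        }
    }

private:
    std::vector<MdfInstruction> graph_;
    std::queue<int> fireable_;
};
```

A program would build the graph with add(), inject the initial tokens with deliver(), and then call run(); in a real run time the fireable queue would be drained concurrently by a pool of CPU and GPU workers rather than by this single loop.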

Cited by 6 publications (5 citation statements)
References 17 publications

“…The feasibility of refactoring code in such a way that a map originally targeting CPU cores only is transformed into a map targeting CPU cores and GPUs has already been demonstrated in [1]. There we have shown not only that using both CPU cores and GPUs improves the performance of programs with respect to the performance achieved when using only CPU cores, but also that an automatic scheduling procedure may be set up which dynamically uses GPUs and CPU cores to achieve optimal load balancing and, therefore, performance.…”
Section: Results
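
The refactoring described in the statement above, a map whose chunks are dynamically shared between CPU cores and GPUs so that load balancing emerges from self-scheduling, can be sketched as follows. The chunk sizes, the single simulated GPU worker and the sqrt payload are assumptions made for illustration only; they are not the scheduling policy or the experiments of [1].

```cpp
// Hedged sketch of a heterogeneous map: CPU workers and one (simulated) GPU
// worker pull chunks from a shared atomic counter, so the faster device
// automatically takes more work and the load balances itself.
#include <algorithm>
#include <atomic>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kChunk = 1024;

void map_chunk(std::vector<float>& data, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) data[i] = std::sqrt(data[i]);
}

void heterogeneous_map(std::vector<float>& data, unsigned cpu_workers) {
    std::atomic<std::size_t> next{0};
    auto worker = [&](bool is_gpu) {
        // The GPU worker grabs larger chunks to amortise offload overhead.
        const std::size_t grab = is_gpu ? 8 * kChunk : kChunk;
        for (;;) {
            std::size_t begin = next.fetch_add(grab);
            if (begin >= data.size()) break;
            std::size_t end = std::min(begin + grab, data.size());
            // A real runtime would launch a CUDA/OpenCL kernel when is_gpu is
            // true; in this sketch both kinds of worker run the same CPU code.
            map_chunk(data, begin, end);
        }
    };
    std::vector<std::thread> pool;
    pool.emplace_back(worker, /*is_gpu=*/true);            // one GPU worker
    for (unsigned i = 0; i < cpu_workers; ++i)
        pool.emplace_back(worker, /*is_gpu=*/false);       // CPU workers
    for (auto& t : pool) t.join();
}
```
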
“…As already suggested in [1] the skeleton tree could also be annotated with information related to the target architecture at hand in order to optimize mappings and/or distribution of data. As an example, let us consider a system S provided with n CPUs and r GPUs defined as S = {cpu_1, …”
Section: Access Driven Optimization
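
A minimal sketch of the annotation idea quoted above: each node of a skeleton tree is tagged with a target chosen from a system description S = {cpu_1, ..., cpu_n, gpu_1, ..., gpu_r}. The node kinds and the toy policy (map nodes go to a GPU when one is available) are assumptions made for the example, not the optimization actually proposed in [1].

```cpp
// Illustrative skeleton tree annotated with target-architecture information.
#include <memory>
#include <vector>

enum class Skeleton { Seq, Pipe, Farm, Map };
enum class Target { CpuCores, Gpu };

struct SystemDescription {
    unsigned n_cpus;  // |{cpu_1, ..., cpu_n}|
    unsigned r_gpus;  // |{gpu_1, ..., gpu_r}|
};

struct SkeletonNode {
    Skeleton kind;
    std::vector<std::unique_ptr<SkeletonNode>> children;
    Target target = Target::CpuCores;  // annotation filled in by annotate()
};

// Walk the tree and attach a target annotation to every node.
void annotate(SkeletonNode& node, const SystemDescription& sys) {
    node.target = (node.kind == Skeleton::Map && sys.r_gpus > 0)
                      ? Target::Gpu
                      : Target::CpuCores;
    for (auto& child : node.children) annotate(*child, sys);
}
```

Annotating the tree this way before instantiating the macro data flow graph would let the compiling tools choose per-subtree mappings and data distributions, which is the optimization opportunity the quoted statement points at.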
“…We are currently working to implement the higher tier "algorithmic skeletons" in such a way that application programmers may seamlessly implement extended FastFlow applications much in the same way as they are used to implementing "single multi-core" applications with the original framework. The whole activity, along with the activities aimed at supporting GPUs within FastFlow [15], is aimed at providing suitable means to implement the computing model designed within ParaPhrase, an FP7 STREP project whose intent is to use parallel design patterns and algorithmic skeletons to program heterogeneous (multi-core plus GPU) collections of processing elements.…”
Section: Discussion
“…Data flow with non-negligible instruction code has been demonstrated to be very effective (in terms of performance) in the case of fine grain computations [18], [19]. Due to the smaller synchronisation overhead (available data determine execution of code, rather than abstract coordination of a template graph) we expect data flow implementations of structured parallel computations to be more efficient also as far as the performance/power trade-off is concerned.…”
Section: A) Unbalanced Embarrassingly Parallel Computations