2016
DOI: 10.1007/s11227-016-1871-z

A parallel pattern for iterative stencil + reduce

Abstract: We advocate the Loop-of-stencil-reduce pattern as a means of simplifying the implementation of data-parallel programs on heterogeneous multi-core platforms. Loop-of-stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce, and, crucially, their usage in a loop in both data-parallel and streaming applications, or a combination of both. The pattern makes it possible to deploy a single stencil computation kernel on different GPUs. We discuss the implementation of Loop-of-stencil-reduce…


Cited by 11 publications (18 citation statements)
References 13 publications (20 reference statements)
“…As proposed in [16], using information from the application collected at runtime (without relying on user hints), it is possible to automatically derive the cutoff technique that is best suited for the application. In [2] we discussed the FastFlow implementation of a Loop-of-stencil-reduce pattern, targeting iterative data-parallel computations on heterogeneous multicores. We showed that various iterative kernels can be easily and effectively parallelised by using Loop-of-stencil-reduce on the available GPUs, exploiting the OpenCL capabilities of the FastFlow parallel framework.…”
Section: Results (mentioning; confidence: 99%)
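As a concrete illustration of the pattern this citation refers to, the following minimal C++ sketch shows a sequential loop-of-stencil-reduce over a 1-D grid: a Jacobi-style 3-point stencil fused with a max-reduce that drives the loop's convergence test. It is an invented example of the pattern's semantics, not the FastFlow/OpenCL implementation discussed in [2]; the kernel, function names, and convergence test are all assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// One iteration: apply a 3-point stencil to 'in', writing 'out', and
// reduce the per-element changes with max to measure convergence.
double stencil_reduce_step(const std::vector<double>& in,
                           std::vector<double>& out) {
    double delta = 0.0;                                     // reduce accumulator
    for (std::size_t i = 1; i + 1 < in.size(); ++i) {
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;     // stencil
        delta = std::max(delta, std::fabs(out[i] - in[i])); // reduce (max)
    }
    return delta;
}

// The "loop" part: iterate stencil+reduce until the reduced value
// says the grid has converged, or the iteration budget runs out.
void loop_of_stencil_reduce(std::vector<double>& grid,
                            double eps, int max_iters) {
    std::vector<double> next = grid;  // double buffer; boundaries stay fixed
    for (int it = 0; it < max_iters; ++it) {
        double delta = stencil_reduce_step(grid, next);
        grid.swap(next);
        if (delta < eps) break;       // loop condition comes from the reduce
    }
}
```

In the pattern as described in the abstract, the stencil and reduce phases would be offloaded as a single kernel to the available GPUs rather than run sequentially as here.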
“…The elements in each level (grid) are organized as boxes, each of which has k elements in every dimension. A level with a grid dimension of m therefore has (m/k)³ boxes. The boxes in each level are distributed evenly among the processes before the computation starts.…”
Section: HPGMG Benchmark (mentioning; confidence: 99%)
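The box count follows directly from the decomposition described above; a small C++ sketch of the arithmetic, with illustrative values of m, k, and the process count that are not taken from the benchmark:

```cpp
#include <cstdio>

// A level with grid dimension m, partitioned into boxes of k elements
// per dimension, contains (m/k)^3 boxes, spread evenly over processes.
int main() {
    const long m = 128, k = 8, nprocs = 32;           // illustrative values
    const long per_dim = m / k;                       // boxes along one axis
    const long boxes   = per_dim * per_dim * per_dim; // (128/8)^3 = 4096
    std::printf("boxes = %ld, ~%ld per process\n", boxes, boxes / nprocs);
    return 0;
}
```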
“…It implies that the computation of one box requires two layers of data from other boxes on each face. The overlapped areas between two boxes, called "ghost areas", therefore have a depth of 2 in this algorithm, and thus the boxes are "enlarged" to size (k + 2·2)³. The stencil computation also requires the three β parameters, with the pattern of βᵢ shown in Figure 3(c).…”
Section: HPGMG Benchmark (mentioning; confidence: 99%)
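A hedged sketch of how such a padded box might be laid out in C++: the interior k³ elements are surrounded by a ghost layer of depth 2 on every face, giving an allocation of (k + 2·2)³. The Box type and its flat indexing convention are assumptions for illustration, not code from HPGMG.

```cpp
#include <cstddef>
#include <vector>

// A box of k^3 interior elements padded with a 2-deep ghost layer on
// every face, so its allocated size is (k + 2*2)^3.
struct Box {
    static constexpr std::size_t GHOST = 2;  // ghost depth per face
    std::size_t k;                           // interior elements per dimension
    std::size_t n;                           // padded elements per dimension
    std::vector<double> data;

    explicit Box(std::size_t k_)
        : k(k_), n(k_ + 2 * GHOST), data(n * n * n, 0.0) {}

    // Access interior cell (i, j, l), each in 0..k-1, offset past the ghosts.
    double& at(std::size_t i, std::size_t j, std::size_t l) {
        return data[((i + GHOST) * n + (j + GHOST)) * n + (l + GHOST)];
    }
};
```

Before each stencil sweep, the ghost cells would be refilled from the faces of neighbouring boxes, which is where the inter-process communication in the benchmark arises.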
“…The map pattern is suitable for multicore architectures because each strand of computation on each node in the map-pattern sequence can be mapped to a core for parallel execution. The map pattern is well suited to embarrassingly parallel problems and can be nested with other patterns (Aldinucci et al, 2016), (Sheshikala et al, 2016) to create a more powerful pattern for computation. A map pattern can be combined with a reduce pattern to form a Map-Reduce pattern, which can further enhance parallel computation.…”
Section: Map Pattern (mentioning; confidence: 99%)
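As a small illustration of composing map with reduce, the sketch below fuses both into a single std::transform_reduce call using the C++17 parallel execution policy (this requires a standard library with parallel-algorithm support); the sum-of-squares example itself is invented, not drawn from the cited papers.

```cpp
#include <execution>  // C++17 parallel execution policies
#include <numeric>    // std::transform_reduce
#include <vector>

// Map followed by reduce, fused into one call: each element is squared
// independently (map), and the partial results are summed (reduce).
// The parallel policy lets the runtime spread the map strands across cores.
double sum_of_squares(const std::vector<double>& xs) {
    return std::transform_reduce(
        std::execution::par, xs.begin(), xs.end(),
        0.0,                              // reduce identity
        std::plus<>{},                    // reduce: +
        [](double x) { return x * x; });  // map: square
}
```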