2013
DOI: 10.1016/j.parco.2013.03.001

Efficient irregular wavefront propagation algorithms on hybrid CPU–GPU machines

Abstract: We address the problem of efficient execution of a computation pattern, referred to here as the irregular wavefront propagation pattern (IWPP), on hybrid systems with multiple CPUs and GPUs. The IWPP is common in several image processing operations. In the IWPP, data elements in the wavefront propagate waves to their neighboring elements on a grid if a propagation condition is satisfied. Elements receiving the propagated waves become part of the wavefront. This pattern results in irregular data accesses and co…
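As a minimal sketch of the pattern described in the abstract, the Python below implements a sequential, queue-based IWPP for one operation that fits it: grayscale morphological reconstruction by dilation on a 4-connected grid. The function name, the 2-D list representation, and the choice to seed every pixel into the initial wavefront are illustrative assumptions, not the paper's implementation. The propagation condition here is that a neighbor's value can still be raised without exceeding the mask.

```python
from collections import deque


def iwpp_reconstruct(marker, mask):
    """Hypothetical helper: grayscale morphological reconstruction by dilation
    (4-connected), written as a queue-based irregular wavefront propagation.

    marker, mask: equal-shape 2-D lists of ints.
    """
    rows, cols = len(mask), len(mask[0])
    # Reconstruction requires marker <= mask, so clip the marker first.
    out = [[min(marker[r][c], mask[r][c]) for c in range(cols)] for r in range(rows)]
    # Assumption: every pixel starts in the wavefront. Tuned versions seed only
    # pixels that can actually propagate (e.g. after raster scans).
    wavefront = deque((r, c) for r in range(rows) for c in range(cols))
    while wavefront:
        r, c = wavefront.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Propagation condition: the neighbor can still be raised.
                new_val = min(out[r][c], mask[nr][nc])
                if out[nr][nc] < new_val:
                    out[nr][nc] = new_val
                    wavefront.append((nr, nc))  # neighbor joins the wavefront
    return out


marker = [[0, 3, 0, 0], [0, 0, 0, 0]]
mask = [[4, 4, 1, 4], [4, 4, 1, 4]]
print(iwpp_reconstruct(marker, mask))  # [[3, 3, 1, 1], [3, 3, 1, 1]]
```

Because only pixels whose values actually change are re-enqueued, the work concentrates on an irregular, data-dependent frontier rather than on full-image sweeps, which is what makes the access pattern hard to parallelize efficiently.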

Cited by 37 publications (50 citation statements)
References 49 publications

“…In order to evaluate the performance of the proposed methods, we have compared the implementations to efficient CPU [3] and GPU [1,18] implementations of the IWPP. We have also benchmarked the processors using the STREAM benchmark [19] to compute regular memory access bandwidth. The memory bandwidth with these benchmarks are presented in Table 3.…”
Section: Experimental Evaluation (mentioning, confidence: 99%)
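The STREAM figures mentioned in this excerpt are reported in MB/s using a fixed byte count per array element for each kernel. The sketch below reproduces that accounting and adds a rough, single-threaded stand-in for the copy kernel using Python's `array` module. The helper names are hypothetical, and the measurement is not the STREAM benchmark itself, which is a compiled C/Fortran program usually built with OpenMP.

```python
import time
from array import array

# Bytes STREAM counts per element (8-byte doubles): copy and scale touch two
# arrays (one read, one write); add and triad touch three (two reads, one write).
STREAM_BYTES_PER_ELEMENT = {"copy": 16, "scale": 16, "add": 24, "triad": 24}


def stream_bandwidth(kernel, n, seconds):
    """Bandwidth in MB/s (10**6 bytes/s) using STREAM's byte-counting convention."""
    return STREAM_BYTES_PER_ELEMENT[kernel] * n / seconds / 1e6


def measure_copy(n=10_000_000, trials=5):
    """Rough single-threaded estimate of the copy kernel (hypothetical helper)."""
    src = array("d", bytes(8 * n))  # n zero-valued doubles
    dst = array("d", bytes(8 * n))
    best = float("inf")
    for _ in range(trials):
        t0 = time.perf_counter()
        dst[:] = src  # bulk copy: read n doubles, write n doubles
        best = min(best, time.perf_counter() - t0)
    # Like STREAM, report the best (fastest) trial.
    return stream_bandwidth("copy", n, best)


print(stream_bandwidth("triad", 10_000_000, 0.5))  # 480.0
print(f"copy: {measure_copy():.0f} MB/s (machine-dependent)")
```

Regular, streaming accesses like these give an upper reference point; the irregular accesses of the IWPP typically achieve only a fraction of this bandwidth.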
“…The multi-core CPU version was executed on Intel E5-processors with 2.6 GHz and employed the 16 computing cores available. The SE10P and 7120P devices are equipped with 61 cores, but the latter has a higher clock rate.…”
Section: Experimental Evaluation (mentioning, confidence: 99%)
“…By stepping along the incremental direction of f and processing all elements associated, data dependencies can be respected. So far, all the existing implementations of wavefront applications on GPUs adopt this data-parallel pattern [22,31,32]. Figure 6 illustrates the processing trace of this pattern for the Needleman-Wunsch algorithm.…”
Section: Wavefront Application (mentioning, confidence: 99%)
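The data-parallel wavefront pattern described in this excerpt can be sketched for the Needleman-Wunsch score matrix. Taking f(i, j) = i + j, every cell on one anti-diagonal depends only on cells with smaller f, so the inner loop below enumerates the cells a GPU implementation could process concurrently. The scoring parameters and function name are illustrative assumptions, and this sequential Python sketch only demonstrates the traversal order.

```python
def nw_score_wavefront(a, b, match=1, mismatch=-1, gap=-1):
    """Hypothetical helper: global alignment score, filled by anti-diagonals."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        H[i][0] = i * gap  # leading gaps in b
    for j in range(1, m + 1):
        H[0][j] = j * gap  # leading gaps in a
    # Step along the incremental direction of f(i, j) = i + j.
    for d in range(2, n + m + 1):
        # All cells with i + j == d are independent of each other:
        # they read only from anti-diagonals d - 1 and d - 2.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
    return H[n][m]


print(nw_score_wavefront("ACGT", "ACGT"))  # 4
print(nw_score_wavefront("AG", "A"))       # 0
```

The result is identical to a row-major fill; only the order changes, which is what exposes one anti-diagonal's worth of parallelism per step.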
“…To port these algorithms to the GPU, we have implemented a hierarchical and scalable queue to store elements (pixels) in fast GPU memories along with several optimizations to reduce execution time. We refer the reader to the following manuscripts [8], [9] for implementation details. The queue-based implementation resulted in significant performance improvements over previously published GPU-enabled versions of the MR algorithm [10].…”
Section: Application Parallelization For High Throughput Execution (mentioning, confidence: 99%)
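The hierarchical queue described in this excerpt is detailed in [8], [9]; the sketch below is only a hypothetical illustration of the general idea, not that design. Each simulated "block" owns a small, fixed-capacity local queue (standing in for fast on-chip memory), elements spill to a shared global queue when the local one is full, and all queues are drained into the next frontier at the end of each round. The class name, capacities, and round-robin assignment of frontier elements to blocks are assumptions; the propagation step is grayscale morphological reconstruction (MR) on a 4-connected grid.

```python
class HierarchicalQueue:
    """Hypothetical two-level queue: small per-block local queues that spill
    into one global queue when full."""

    def __init__(self, num_blocks, local_capacity):
        self.local = [[] for _ in range(num_blocks)]
        self.capacity = local_capacity
        self.global_q = []
        self.spills = 0  # pushes that overflowed to the global level

    def push(self, block, item):
        q = self.local[block]
        if len(q) < self.capacity:
            q.append(item)
        else:
            self.global_q.append(item)
            self.spills += 1

    def drain(self):
        """End of a round: gather all levels into the next frontier."""
        frontier = [x for q in self.local for x in q] + self.global_q
        self.local = [[] for _ in self.local]
        self.global_q = []
        return frontier


def reconstruct_rounds(marker, mask, num_blocks=4, local_capacity=2):
    """Round-based MR by dilation using the hypothetical HierarchicalQueue."""
    rows, cols = len(mask), len(mask[0])
    out = [[min(marker[r][c], mask[r][c]) for c in range(cols)] for r in range(rows)]
    hq = HierarchicalQueue(num_blocks, local_capacity)
    frontier = [(r, c) for r in range(rows) for c in range(cols)]
    while frontier:
        for k, (r, c) in enumerate(frontier):
            block = k % num_blocks  # deal frontier elements round-robin to blocks
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    new_val = min(out[r][c], mask[nr][nc])
                    if out[nr][nc] < new_val:  # propagation condition
                        out[nr][nc] = new_val
                        hq.push(block, (nr, nc))
        frontier = hq.drain()
    return out, hq.spills


result, spills = reconstruct_rounds([[0, 3, 0, 0], [0, 0, 0, 0]],
                                    [[4, 4, 1, 4], [4, 4, 1, 4]])
print(result)  # [[3, 3, 1, 1], [3, 3, 1, 1]]
```

Choosing a small `local_capacity` forces spills, which makes it easy to check that overflow handling does not change the result. In a real GPU queue, the point of the local level is to keep most pushes out of slower global memory.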