2013
DOI: 10.1002/cpe.3016
A parallel scheme for accelerating parameter sweep applications on a GPU

Abstract: This paper proposes a parallel scheme for accelerating parameter sweep applications on a graphics processing unit (GPU). By using hundreds of GPU cores, the scheme simultaneously processes multiple parameters rather than a single parameter. The simultaneous sweeps exploit the similarity of computing behaviors shared by different parameters, allowing memory accesses to be coalesced into a single access if similar irregularities appear among the parameters’ computa…
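The idea sketched in the abstract can be illustrated with a minimal example. This is a hypothetical sketch, not the paper's implementation: the function name `sweep_simultaneous`, the linear update rule, and the step count are all placeholders. The point is that when every parameter advances in lockstep through the same code path, neighbouring "threads" touch neighbouring memory slots, which is exactly the access pattern a GPU can coalesce into a single transaction.

```python
# Hypothetical sketch of a simultaneous parameter sweep (not the paper's code).
# All parameters advance in lockstep; on a GPU, each index k below would be a
# separate thread, and neighbouring threads would read/write neighbouring
# elements of `state`, so their accesses could be coalesced.

def sweep_simultaneous(params, steps):
    state = list(params)              # one state slot per parameter
    for _ in range(steps):
        for k in range(len(state)):   # each k ~ one GPU thread, same code path
            state[k] = 0.5 * state[k] + 1.0  # placeholder update rule
    return state
```

Sweeping one parameter at a time would instead keep a single thread (or block) busy per parameter, losing the coalescing opportunity that the shared access pattern provides.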

Cited by 5 publications (3 citation statements)
References 25 publications
“…Moreover, the updating operations on different elements are mutually independent, hence stencil computation is an embarrassingly parallel scenario to leverage accelerators such as graphics processing units (GPUs). A GPU has thousands of cores and its memory bandwidth is 5−10× as high as that of a CPU, thus extensively utilized in accelerating compute- and memory-intensive applications [7]–[9]. Nonetheless, GPUs possess a relatively limited device-memory capacity, typically in the range of several dozen GBs.…”
Section: Introduction
Confidence: 99%
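The element-wise independence that makes stencil computation embarrassingly parallel can be shown with a minimal 1-D Jacobi-style sweep (a hypothetical sketch, not code from any cited paper): each output element is computed solely from the previous grid, so every element could be updated by its own thread with no synchronization within a sweep.

```python
def jacobi_step(grid):
    """One 3-point stencil sweep over a 1-D grid.

    Each output element depends only on the *previous* grid, never on other
    outputs, so all interior updates are mutually independent -- on a GPU,
    each index i could be handled by a separate thread.
    """
    n = len(grid)
    new = grid[:]                     # boundaries are kept fixed
    for i in range(1, n - 1):
        new[i] = (grid[i - 1] + grid[i] + grid[i + 1]) / 3.0
    return new
```

Writing into a separate output array (`new`) rather than updating in place is what guarantees the independence: no update reads a value another update has already overwritten.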
“…However, this process is time consuming because code and data structures usually must be adapted to the highly-threaded device architecture, which takes full advantage of memory latency hiding mechanisms. For example, arrays of structures must be transformed into structures of arrays to maximise memory access throughput on a GPU (Sung et al., 2012; Ino et al., 2014).…”
Section: Introduction
Confidence: 99%
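The array-of-structures to structure-of-arrays transformation mentioned above can be sketched in a few lines (a hypothetical illustration; the field names and the `aos_to_soa` helper are not from the cited papers). In the SoA layout, all values of one field are contiguous, so consecutive GPU threads reading that field would access consecutive addresses and the hardware could coalesce them.

```python
def aos_to_soa(records):
    """Transpose an array of structures into a structure of arrays.

    In the AoS input, thread k reading records[k]["x"] strides over the other
    fields; in the SoA output, one field's values are contiguous -- the layout
    a GPU can coalesce.
    """
    return {field: [r[field] for r in records] for field in records[0]}

# AoS: fields interleaved per record (illustrative data).
aos = [{"x": 1.0, "y": 2.0, "z": 3.0},
       {"x": 4.0, "y": 5.0, "z": 6.0}]

soa = aos_to_soa(aos)   # soa["x"] holds [1.0, 4.0] back to back
```

In CUDA C/C++ the same change is structural rather than a runtime conversion: `struct {float x, y, z;} p[N]` becomes three arrays `float x[N], y[N], z[N]`.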
“…High-performance clusters and grid systems are practical for performing parameter studies due to their large collection of processors and storage resources [8,19,36], although local computers can also be used thanks to advances in graphics processors and other accelerators [17]. The setup, submission, and orchestration of such jobs in computing clusters may be a challenge, particularly for non-programmers or novice users conducting parameter studies in a parallel or distributed fashion [10,25].…”
Section: Introduction
Confidence: 99%