2010
DOI: 10.1007/978-3-642-13119-6_36

Optimizing Sweep3D for Graphic Processor Unit

Cited by 10 publications (15 citation statements)
References 9 publications
“…There are six memory access operations and eight double-precision arithmetic operations in each step. In the old implementation this data-dependent loop is processed by only one GPU thread [22]. In fact, most arithmetic can be moved out and performed by all threads in a thread block.…”
Section: Solving Recursive S_N Equation
Mentioning confidence: 99%
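The excerpt above describes hoisting the independent loads and arithmetic out of the data-dependent loop so that every thread in the block shares that work. The CUDA sketch below is a minimal illustration of that idea, using a generic first-order recurrence phi[i] = a[i]*phi[i-1] + b[i] as a stand-in for the actual S_N step; the kernel and variable names are illustrative assumptions, not identifiers from Sweep3D.

// Old scheme [22]: a single thread walks the entire data-dependent loop.
__global__ void sweep_serial(const double *a, const double *b,
                             double *phi, int n)
{
    if (threadIdx.x == 0)                      // one thread, fully serial
        for (int i = 1; i < n; ++i)
            phi[i] = a[i] * phi[i - 1] + b[i]; // phi[i] needs phi[i-1]
}

// Revised scheme: all threads stage the independent loads into shared
// memory in parallel; only the true dependence chain stays serial.
__global__ void sweep_hoisted(const double *a, const double *b,
                              double *phi, int n)
{
    extern __shared__ double s[];              // s[0..n): a, s[n..2n): b
    for (int i = threadIdx.x; i < n; i += blockDim.x) {
        s[i]     = a[i];                       // independent work, done
        s[n + i] = b[i];                       // by every thread at once
    }
    __syncthreads();
    if (threadIdx.x == 0)                      // only the recursion is serial
        for (int i = 1; i < n; ++i)
            phi[i] = s[i] * phi[i - 1] + s[n + i];
}

// Example launch (dynamic shared memory sized for both staging arrays):
// sweep_hoisted<<<1, 256, 2 * n * sizeof(double)>>>(a, b, phi, n);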
“…4. Demonstration of the simplified S_n recursion without flux fixup in the previous algorithm [22]. Each S_n recursion is finished by one thread in a thread block.…”
Section: Updating Particle Flux from P_N Moments and DSA Face Currents
Mentioning confidence: 99%
“…This is an extended paper on the previous implementation of Sweep3D [22]. In this paper we describe our experience of developing a Sweep3D implementation for the CUDA platform, and we analyze the bottlenecks of our GPU execution.…”
Section: Introduction
Mentioning confidence: 99%
“…A microarchitecture performance model was applied to predict overall CPI (cycles per instruction) and give a detailed breakdown of processor stalls. We [22] presented multi-dimensional optimization methods for Sweep3D that can be implemented on the fine-grained parallel architecture of the GPU. These methods include thread-level parallelism, more threads with repeated computing, and the use of on-chip shared memory.…”
Section: Introduction
Mentioning confidence: 99%
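One of the optimizations listed in the excerpt above, "more threads and repeated computing", can be read as letting every thread redundantly evaluate a small shared quantity instead of serializing on one thread plus a barrier. The sketch below is a hedged illustration under that reading; mu, eta, dx, dy, and flux_scale are hypothetical placeholders, not Sweep3D identifiers.

__global__ void flux_scale(double *flux, int n,
                           double mu, double eta, double dx, double dy)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Repeated computing: each thread re-derives the same cheap per-angle
    // coefficient, trading a few flops for the removal of a thread-0
    // computation and the __syncthreads() barrier it would require.
    double c = 2.0 * mu / dx + 2.0 * eta / dy;
    if (i < n)
        flux[i] /= c;                          // independent per-cell update
}

// Example launch covering n cells:
// flux_scale<<<(n + 255) / 256, 256>>>(flux, n, mu, eta, dx, dy);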