Optimizing Sweep3D for Graphic Processor Unit

Gong, Chunye; Liu, Jie; Zhang, Gong; Qin, Jin; Xie, Jing

doi:10.1007/978-3-642-13119-6_36

Cited by 10 publications

(15 citation statements)

References 9 publications


“…There are six memory access operations and eight double precision arithmetic operations in each step. In the old implementation this data dependent loop is processed by only one GPU thread [22]. In fact, most of the arithmetic can be moved out of the loop and performed by all threads in a thread block.…”

Section: Solving Recursive S N Equation (mentioning)

confidence: 99%
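The idea in the statement above, moving arithmetic out of a data-dependent loop so that only the true recurrence stays sequential, can be illustrated with a small sketch. The recurrence, the coefficient formulas, and every name below (`src`, `sigma`, `w`, `phi_in`) are hypothetical stand-ins chosen for illustration; they are not the Sn equations from the cited papers. On a GPU, phase 1 would be spread over all threads in a block, while phase 2 stays on one thread.

```python
def recursion_naive(src, sigma, w, phi_in):
    """All arithmetic sits inside the data-dependent loop (one-thread style)."""
    phi = phi_in
    out = []
    for s, sg, wi in zip(src, sigma, w):
        denom = sg + 2.0 * wi      # does not depend on phi
        a = 2.0 * wi / denom       # does not depend on phi
        b = s / denom              # does not depend on phi
        phi = a * phi + b          # the only step that needs the previous phi
        out.append(phi)
    return out


def recursion_hoisted(src, sigma, w, phi_in):
    """Same result, with the phi-independent arithmetic hoisted out."""
    # Phase 1: independent per-cell work. Each element could be computed
    # by a different thread, since no element depends on another.
    denom = [sg + 2.0 * wi for sg, wi in zip(sigma, w)]
    a = [2.0 * wi / d for wi, d in zip(w, denom)]
    b = [s / d for s, d in zip(src, denom)]

    # Phase 2: the remaining sequential recurrence, now one multiply-add
    # per step instead of the full set of operations.
    phi = phi_in
    out = []
    for ai, bi in zip(a, b):
        phi = ai * phi + bi
        out.append(phi)
    return out


# Hypothetical sample data for a four-cell recursion.
src = [1.0, 0.5, 2.0, 1.5]
sigma = [0.8, 1.2, 0.9, 1.1]
w = [0.3, 0.3, 0.4, 0.4]
naive_result = recursion_naive(src, sigma, w, phi_in=0.25)
hoisted_result = recursion_hoisted(src, sigma, w, phi_in=0.25)
```

Both functions perform the same floating-point operations in the same order per cell, so the results should agree; only the placement of the independent work changes.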

“…4. Demonstration of the simplified S n recursion without flux fixup in the previous algorithm [22]. Each S n recursion is finished by one thread in a thread block.…”

Section: Updating Particle Flux From P N Moments and DSA Face Currents (mentioning)

confidence: 99%

“…The control process on CPU deals with the wavefront dependence [22]. Each wavefront is finished by a CUDA kernel invocation.…”

Section: Details Of GPU Implementation (mentioning)

confidence: 99%
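A host-controlled wavefront sweep of the kind this statement describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited implementation: the update rule (source plus the mean of three upstream neighbours) and the names `wavefront_cells`, `sweep`, `source`, and `boundary` are hypothetical. The outer loop plays the role of the CPU control process, and each inner pass over one diagonal plane stands in for a single kernel invocation whose cells are mutually independent.

```python
def wavefront_cells(nx, ny, nz, d):
    """Return all cells (i, j, k) on the diagonal plane i + j + k == d."""
    return [
        (i, j, d - i - j)
        for i in range(nx)
        for j in range(ny)
        if 0 <= d - i - j < nz
    ]


def sweep(nx, ny, nz, source, boundary=0.0):
    """Sweep a 3D grid from corner (0, 0, 0), one wavefront at a time."""
    phi = {}

    def upstream(i, j, k):
        # Cells outside the grid take the (hypothetical) boundary value.
        return phi[(i, j, k)] if min(i, j, k) >= 0 else boundary

    n_wavefronts = nx + ny + nz - 2
    for d in range(n_wavefronts):                 # host loop: handles the dependence
        for i, j, k in wavefront_cells(nx, ny, nz, d):   # stand-in for one kernel launch
            # Every upstream neighbour lies on wavefront d - 1, so cells on
            # wavefront d never read from each other.
            phi[(i, j, k)] = source + (
                upstream(i - 1, j, k)
                + upstream(i, j - 1, k)
                + upstream(i, j, k - 1)
            ) / 3.0
    return phi, n_wavefronts


phi, n_wavefronts = sweep(3, 4, 5, source=1.0)
```

The small wavefronts near the corners contain only a few cells, which is one way to see why the wavefront process limits the number of concurrent threads.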


GPU accelerated simulations of 3D deterministic particle transport using discrete ordinates method

Gong, Liu, Chi, et al. 2011

Journal of Computational Physics

Self Cite


“…This is an extended paper on the previous implementation of Sweep3D [22]. In this paper we describe our experiences of developing Sweep3D implementation for the CUDA platform, and we analyze the bottleneck of our GPU execution.…”

Section: Introduction (mentioning)

confidence: 99%

“…A micro architecture performance model was applied to predict overall CPI (cycles per instruction), and gives a detailed breakdown of processor stalls. We [22] presented multidimensional optimization methods for Sweep3D, which can be implemented on the fine-grained parallel architecture of the GPU. The multi-dimensional optimization methods include thread level parallelism, more threads and repeated computing, and using on-chip shared memory, etc.…”

Section: Introduction (mentioning)

confidence: 99%

Accelerating the Sweep3D for a Graphic Processor Unit

Gong, Liu, Chen, et al. 2011

Journal of Information Processing Systems

Self Cite

Abstract: As a powerful and flexible processor, the Graphic Processing Unit (GPU) offers great capability for solving many high-performance computing problems. Sweep3D, which deterministically simulates single-group, time-independent discrete ordinates (Sn) neutron transport on a 3D Cartesian geometry, represents the key part of a real ASCI application. The wavefront process for parallel computation in Sweep3D limits the number of concurrent threads on the GPU. In this paper, we present multi-dimensional optimization methods for Sweep3D, which can be efficiently implemented on the fine-grained parallel architecture of the GPU. Our results show that the overall performance of Sweep3D on the CPU-GPU hybrid platform can be improved by up to 4.38 times compared with the CPU-based implementation.
