Communication-overlap techniques for improved strong scaling of gyrokinetic Eulerian code beyond 100k cores on the K-computer

Idomura, Yasuhiro; Nakata, M.; Yamada, S.; Machida, Masahiko; Imamura, Toshiyuki; Watanabe, T.‐H.; Nunami, M.; Inoue, Hikaru; Tsutsumi, S.; Miyoshi, Ikuo; Shida, Naoyuki

doi:10.1177/1094342013490973

Cited by 24 publications (6 citation statements); references 27 publications

“…The transpose communications required for parallel 2D FFTs are implemented by using a simple blocking collective communication, which makes implementation easy and the use of specific algorithms possible, like collective communications optimized for the K computer [11]. Second, transpose communications and FFT computations are overlapped by employing a communication thread in a hybrid parallel model of Message Passing Interface (MPI) and Open Multi-Processing (OpenMP), which enables overlaps of computations and blocking communications as well as non-blocking communications [12]. Masking of communication costs significantly improves strong scaling of the perpendicular-space parallelization.…”
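The transpose-based parallel 2D FFT described in the quote can be sketched in a single Python process. The sketch assumes a row-slab decomposition over simulated ranks and is not the authors' code. The function `alltoall` is a hypothetical stand-in for a blocking collective such as `MPI_Alltoall`. The naive `dft` stands in for a library FFT call (the cited work uses FFTW).

```python
import cmath


def dft(x):
    """Naive O(n^2) 1D DFT; a stand-in for a library FFT call."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]


def alltoall(send):
    """Hypothetical blocking all-to-all over simulated ranks.

    send[p][q] is the block rank p sends to rank q; the result satisfies
    recv[q][p] == send[p][q], mirroring what MPI_Alltoall delivers.
    """
    nprocs = len(send)
    return [[send[p][q] for p in range(nprocs)] for q in range(nprocs)]


def parallel_fft2d(slabs):
    """2D DFT of a row-slab-distributed array using one transpose.

    slabs[p] holds the rows owned by rank p. The result is transposed:
    out[q][c] is the spectrum along the first dimension for global
    column q * cols_per_rank + c.
    """
    nprocs = len(slabs)
    # Step 1: 1D transforms along the locally complete (row) dimension.
    slabs = [[dft(row) for row in slab] for slab in slabs]
    cols_per_rank = len(slabs[0][0]) // nprocs  # assumes even divisibility
    # Step 2: pack the column block destined for each rank, then exchange.
    send = [[[row[q * cols_per_rank:(q + 1) * cols_per_rank] for row in slab]
             for q in range(nprocs)]
            for slab in slabs]
    recv = alltoall(send)
    # Step 3: each rank reassembles full columns (rows in global order) and transforms them.
    out = []
    for blocks in recv:  # blocks[p]: rows of rank p, restricted to this rank's columns
        columns = [[row[c] for rows in blocks for row in rows]
                   for c in range(cols_per_rank)]
        out.append([dft(col) for col in columns])
    return out


def dft2d(a):
    """Direct 2D DFT, used only as a reference check."""
    n1, n2 = len(a), len(a[0])
    return [[sum(a[r][c] * cmath.exp(-2j * cmath.pi * (r * k1 / n1 + c * k2 / n2))
                 for r in range(n1) for c in range(n2))
             for k2 in range(n2)]
            for k1 in range(n1)]


a = [[complex(4 * r + c) for c in range(4)] for r in range(4)]
out = parallel_fft2d([a[0:2], a[2:4]])  # two simulated ranks, two rows each
ref = dft2d(a)
max_err = max(abs(out[q][c][k1] - ref[k1][2 * q + c])
              for q in range(2) for c in range(2) for k1 in range(4))
print(max_err < 1e-9)  # True
```

The single exchange in step 2 is the only point where ranks interact. That blocking collective is therefore the communication cost the cited works try to hide behind the 1D transforms.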

Section: Introduction (mentioning)

Computation-Communication Overlap Techniques for Parallel Spectral Calculations in Gyrokinetic Vlasov Simulations

Maeyama, Watanabe, Idomura, et al. (2013). Plasma and Fusion Research. Self cite.

One of the important phenomena in magnetically confined fusion plasmas is plasma turbulence, which causes particle and heat transport and degrades plasma confinement. To address multi-scale turbulence spanning the temporal and spatial scales of both electrons and ions, we extend our gyrokinetic Vlasov simulation code GKV to run efficiently on peta-scale supercomputers. A key numerical technique is the parallel Fast Fourier Transform (FFT) required for parallel spectral calculations, where masking the cost of inter-node transpose communications is essential to improve strong scaling. To mask communication costs, computation-communication overlap techniques are applied to the FFTs and transposes with the help of hybrid parallelization using the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP). Integrated overlaps covering the whole spectral calculation procedure show better scaling than simple overlaps of FFTs and transposes. Masking communication costs significantly improves the strong scaling of the GKV code and yields a substantial speed-up toward multi-scale turbulence simulations.
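The sketch below shows the communication-thread idea from the abstract, with Python threads assumed to stand in for the MPI/OpenMP hybrid. The main thread computes chunk k while a dedicated thread performs the blocking transfer of chunk k-1.

* `time.sleep` is a placeholder for a blocking MPI call. Because it releases the GIL, it genuinely overlaps with the main thread's work.
* All function names are hypothetical.

```python
import queue
import threading
import time


def comm_thread(outbox, delivered, transfer_time):
    """Dedicated communication thread: performs blocking transfers in order."""
    while True:
        item = outbox.get()
        if item is None:               # sentinel: pipeline drained
            return
        idx, payload = item
        time.sleep(transfer_time)      # placeholder for a blocking MPI call
        delivered[idx] = payload


def overlapped(chunks, compute, transfer_time=0.01):
    """Compute each chunk and hand the result to the comm thread immediately,
    so the transfer of chunk k overlaps the computation of chunk k + 1."""
    outbox = queue.Queue()
    delivered = [None] * len(chunks)
    worker = threading.Thread(target=comm_thread,
                              args=(outbox, delivered, transfer_time))
    worker.start()
    for idx, chunk in enumerate(chunks):
        outbox.put((idx, compute(chunk)))
    outbox.put(None)
    worker.join()                      # wait for the last transfer to finish
    return delivered


def slow_square(chunk, compute_time=0.01):
    time.sleep(compute_time)           # placeholder for an FFT or other kernel
    return [x * x for x in chunk]


chunks = [list(range(i, i + 4)) for i in range(0, 32, 4)]
start = time.perf_counter()
result = overlapped(chunks, slow_square)
elapsed = time.perf_counter() - start
# Without overlap this would take roughly 8 * (0.01 + 0.01) s.
print(result[1], round(elapsed, 3))
```

In an MPI code, letting one thread make blocking MPI calls while the others compute requires requesting a suitable thread-support level through `MPI_Init_thread`.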

“…Since operations of 1D FFT are parallelized by OpenMP threads, we employ the thread-safe 1D FFTs by means of FFTW [14]. The idea of computation-communication overlaps with a communication thread has already been tested before [15][16][17]. However, for efficiently masking the communication cost by applying the pipelined overlaps, careful implementations are required: rearrangements of multiple computation kernels to keep computations enough to mask communications, a proper choice of the pipeline length (finer pipelining reduces unoverlapped parts at the beginning and the end of pipelining but increases latencies of MPI), and a regulation of the task granularity for thread parallelization (finer granularity reduces load imbalance on OpenMP threads but increases scheduling overheads).…”
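A toy cost model makes the pipeline-length trade-off in the quote concrete. The model is an illustrative assumption, not the one used in the cited work:

* The total compute time C and total communication time M are split into n stages.
* Each compute stage takes c = C/n.
* Each communication stage takes m = M/n + L, where L is a fixed per-message latency.
* A two-stage compute-then-communicate pipeline then finishes after `c + (n - 1) * max(c, m) + m`.

```python
def pipelined_time(n, compute_total, comm_total, latency):
    """Makespan of an n-stage compute -> communicate pipeline (toy model).

    Only the first compute stage and the last communication stage are
    unoverlapped; every step in between costs the slower of the two.
    """
    c = compute_total / n            # compute time per stage
    m = comm_total / n + latency     # communication time per stage, incl. latency
    return c + (n - 1) * max(c, m) + m


def best_pipeline_length(compute_total, comm_total, latency, n_max=64):
    """Pipeline length in 1..n_max that minimizes the modelled makespan."""
    return min(range(1, n_max + 1),
               key=lambda n: pipelined_time(n, compute_total, comm_total, latency))


C, M, L = 10.0, 8.0, 0.1             # arbitrary illustrative costs
for n in (1, 4, 20, 64):
    print(n, round(pipelined_time(n, C, M, L), 3))
print("best n:", best_pipeline_length(C, M, L))
```

With these numbers, n = 1 (no overlap) costs C + M + L = 18.1. The optimum n = 20 gives 10.5, close to the lower bound max(C, M) = 10. Beyond that point the accumulated per-message latency dominates, which is the balance the quote describes.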

Section: Computation-communication Overlaps (mentioning)

Improved strong scaling of a spectral/finite difference gyrokinetic code for multi-scale plasma turbulence

et al. (2015). Self cite.

“…The domain decomposition model is implemented using a hybrid parallelization model consisting of multi-layer MPI communicators and multi-core OpenMP parallelization. In addition, a novel computation and communication overlap technique [20] is developed using communication threads, which are implemented with a heterogeneous OpenMP programming model. The strong scaling of GT5D is dramatically improved by this latency hiding technique, and on the K-computer, an excellent strong scaling is achieved up to ∼ 0.6 million cores (see Fig.…”
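One way to express the heterogeneous thread model in this quote is to branch on the thread id inside a single parallel region. Thread 0 performs the blocking communication, while threads 1..T-1 split the compute loop among themselves by a static block partition. The sketch below makes these assumptions:

* Python threads stand in for OpenMP.
* A list copy stands in for the MPI exchange.
* `compute_range` and `heterogeneous_step` are hypothetical names, not GT5D routines.

```python
import threading


def compute_range(tid, nthreads, n):
    """Block-partition n iterations over compute threads 1..nthreads-1.

    Thread 0 is reserved for communication, so it gets no iterations.
    Returns the half-open range [start, stop).
    """
    if tid == 0:
        return 0, 0
    ncompute = nthreads - 1
    k = tid - 1                                   # index among compute threads
    base, extra = divmod(n, ncompute)
    start = k * base + min(k, extra)              # first `extra` threads get one more
    stop = start + base + (1 if k < extra else 0)
    return start, stop


def heterogeneous_step(send_buf, work, nthreads=4):
    """Overlap one communication with one compute kernel, split by thread role."""
    recv_buf = []
    out = [None] * len(work)

    def region(tid):
        if tid == 0:
            recv_buf.extend(send_buf)            # placeholder for a blocking MPI exchange
        else:
            start, stop = compute_range(tid, nthreads, len(work))
            for i in range(start, stop):
                out[i] = 2.0 * work[i]           # placeholder compute kernel

    threads = [threading.Thread(target=region, args=(t,)) for t in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:                            # plays the role of the closing barrier
        t.join()
    return recv_buf, out


recv, out = heterogeneous_step([1, 2, 3], list(range(10)), nthreads=4)
print(recv, out)
```

With a static worksharing loop over the whole team, the communication thread would also be assigned iterations. Partitioning manually over the remaining threads avoids that.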

Section: Calculation Model (mentioning)

Progress of Full-f Gyrokinetic Simulation Toward Reactor Relevant Numerical Experiments

Idomura, Nakata, Jolliet (2014). Plasma and Fusion Research. Self cite.

Full-f gyrokinetic simulations compute both turbulent transport and profile formation under fixed power, momentum, and particle input, as in experiments. This approach can determine plasma profiles, provided that the simulation time scale is long enough to establish power, momentum, and particle balance. Recent peta-scale supercomputers have made such long-time-scale simulations feasible, and full-f gyrokinetic simulations are now applied to reactor-relevant numerical experiments. In this paper, the physical models, numerical approaches, and accuracy issues of the gyrokinetic full-f Eulerian code GT5D are summarized, and its recent applications to scaling studies of turbulent transport with respect to plasma size and heating power are reviewed.
