Improving the arithmetic intensity of multigrid with the help of polynomial smoothers

Ghysels, Pieter; Klosiewicz, Przemyslaw; Vanroose, Wim

doi:10.1002/nla.1808

Cited by 24 publications

(20 citation statements)

References 30 publications


“…Thus, for boxes that are 64^3 on the finest level with a 1-deep ghost zone, level 0 of the box data structure can be viewed as a 4D data structure grids[12][66][66][66]. As each process contains one or more boxes, each with 4-8 levels, an additional data structure subdomains[boxes].levels[8] is constructed to index the floating-point data. miniGMG-cuda uses a similar data structure.…”

Section: Data Structures in miniGMG (mentioning)

confidence: 99%
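The box layout quoted above can be made concrete with a small indexing sketch. It assumes a row-major (C-style) ordering with the last index varying fastest, which is what a C array of that shape implies; the function and constant names are illustrative and are not taken from miniGMG.

```python
# Illustrative sketch of the quoted layout; names are hypothetical, not miniGMG's.
BOX_DIM = 64                   # interior cells per dimension on the finest level
GHOST = 1                      # depth of the ghost zone
NUM_GRIDS = 12                 # component grids per box, as in grids[12][66][66][66]
PADDED = BOX_DIM + 2 * GHOST   # 66 cells per dimension including ghost cells


def flat_index(grid, k, j, i):
    """Row-major offset of cell (k, j, i) in component `grid`.

    Interior coordinates run from 0 to BOX_DIM - 1; ghost cells sit at
    -GHOST and BOX_DIM (assumed convention for this sketch).
    """
    if not 0 <= grid < NUM_GRIDS:
        raise IndexError("grid out of range")
    for c in (k, j, i):
        if not -GHOST <= c < BOX_DIM + GHOST:
            raise IndexError("cell coordinate out of range")
    return ((grid * PADDED + (k + GHOST)) * PADDED + (j + GHOST)) * PADDED + (i + GHOST)


def doubles_per_level():
    """Number of floating-point values stored for level 0 of one box."""
    return NUM_GRIDS * PADDED ** 3
```

Under these assumptions, the padding from 64 to 66 cells per dimension is what lets a stencil read its neighbours at the box boundary without special-casing the edges.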

“…Consequently, the performance of common stencil computations used in GMG is typically limited by the memory bandwidth of modern architectures, as the ratio of floating point operations to data movement (i.e., flop-to-byte ratio) is usually well below the machine balance. For this reason, much research has been devoted to reducing data movement for stencil computations using techniques such as cache oblivious algorithms, time skewing, wavefront optimizations and overlapped tiling [30,22,6,7,27,35,18,29,36,8].…”

Section: Introduction (mentioning)

confidence: 99%
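The flop-to-byte argument in the Introduction statement above can be illustrated with a rough estimate. The per-point flop and byte counts and the machine figures below are assumptions chosen for illustration only; they are not measurements from any of the cited papers.

```python
def arithmetic_intensity(flops_per_point, bytes_per_point):
    """Flop-to-byte ratio of a kernel."""
    return flops_per_point / bytes_per_point


def machine_balance(peak_gflops, bandwidth_gbs):
    """Flops the machine can perform per byte moved from memory."""
    return peak_gflops / bandwidth_gbs


def roofline_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Attainable performance under the simple Roofline bound."""
    return min(peak_gflops, intensity * bandwidth_gbs)


# Assumed kernel: a 7-point constant-coefficient stencil with about 8 flops
# per point, moving one 8-byte read and one 8-byte write per point
# (ideal cache reuse, no write-allocate traffic).
stencil_ai = arithmetic_intensity(8, 16)

# Assumed machine: 200 GFLOP/s peak, 50 GB/s memory bandwidth (hypothetical).
balance = machine_balance(200.0, 50.0)
bound = roofline_gflops(stencil_ai, 200.0, 50.0)
```

With these assumed numbers the stencil's intensity is well below the machine balance, so the bound is set by bandwidth rather than by peak compute, which is the situation the statement describes.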


Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers

et al. 2017


GPUs, with their high bandwidths and computational capabilities, are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required use of a GPU-specific programming model like CUDA, OpenCL, or OpenACC. As such, in order to deliver portability across CPU-based and GPU-accelerated supercomputers, programmers are forced to write and maintain two versions of their applications or frameworks. In this paper, we explore the use of a compiler-based autotuning framework based on CUDA-CHiLL to deliver not only portability, but also performance portability across CPU- and GPU-accelerated platforms for the geometric multigrid linear solvers found in many scientific applications. We show that with autotuning we can attain near Roofline (a performance bound for a computation and target architecture) performance across the key operations in the miniGMG benchmark for both CPU- and GPU-based architectures as well as for multiple stencil discretizations and smoothers. We show that our technology is readily interoperable with MPI, resulting in performance at scale equal to that obtained via a hand-optimized MPI+CUDA implementation.


“…Thus, in recent years, numerous efforts have focused on increasing temporal locality by fusing multiple stencil sweeps through techniques like cache oblivious, time skewing, or wavefront [8], [11], [12], [17], [19], [24], [27], [30]-[32]. Many of these efforts examined 2D or constant-coefficient problems, features rarely seen in real-world applications.…”

Section: Related Work (mentioning)

confidence: 99%
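Fusing several stencil sweeps, as described in the Related Work statement above, can be sketched on a 1D toy problem. The sketch uses overlapped tiling (each tile recomputes a halo of width equal to the number of fused sweeps), which is a simpler relative of the time-skewing and wavefront schemes named in the statement and is not the method of any cited paper. All function names are hypothetical.

```python
def sweep(u):
    """One Jacobi-style averaging sweep; the two end values are held fixed."""
    if len(u) < 3:
        return list(u)
    interior = [0.5 * (u[i - 1] + u[i + 1]) for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def naive(u, sweeps):
    """Apply `sweeps` full passes over the whole array."""
    for _ in range(sweeps):
        u = sweep(u)
    return u


def fused_tiled(u, sweeps, tile):
    """Apply `sweeps` passes tile by tile, loading each tile only once.

    Each tile is extended by `sweeps` halo cells per side. Values at the
    artificial halo edges go stale by one cell per sweep, so after `sweeps`
    passes only the halo is contaminated and the tile itself is exact.
    """
    n = len(u)
    out = list(u)
    for start in range(0, n, tile):
        stop = min(start + tile, n)
        lo = max(start - sweeps, 0)   # clip the halo at the true boundary
        hi = min(stop + sweeps, n)
        local = u[lo:hi]
        for _ in range(sweeps):
            local = sweep(local)
        out[start:stop] = local[start - lo:stop - lo]
    return out
```

The trade-off in this sketch is redundant arithmetic in the halos in exchange for touching each tile's data once per group of sweeps instead of once per sweep.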

Optimization of geometric multigrid for emerging multi- and manycore processors

Williams, Kalamkar, Singh, et al. 2012

2012 International Conference for High Performance Computing, Networking, Storage and Analysis


Multigrid methods are widely used to accelerate the convergence of iterative solvers for linear systems used in a number of different application areas. In this paper, we explore optimization techniques for geometric multigrid on existing and emerging multicore systems including the Opteron-based Cray XE6, Intel Xeon E5-2670 and X5550 processor-based Infiniband clusters, as well as the new Intel Xeon Phi coprocessor (Knights Corner). Our work examines a variety of novel techniques including communication-aggregation, threaded wavefront-based DRAM communication-avoiding, dynamic threading decisions, SIMDization, and fusion of operators. We quantify performance through each phase of the V-cycle for both single-node and distributed-memory experiments and provide detailed analysis for each class of optimization. Results show our optimizations yield significant speedups across a variety of subdomain sizes while simultaneously demonstrating the potential of multi- and manycore processors to dramatically accelerate single-node performance. However, our analysis also indicates that improvements in networks and communication will be essential to reap the potential of manycore processors in large-scale multigrid calculations.


“…However, their efficiency deteriorates. We emphasise that the efficient nature of the present implementation patterns makes us hope that they can be used as starting point to realise more competitive smoothers as proposed in [Chen et al 2012; Ernst and Gander 2011; Ghysels et al 2012; Ghysels and Vanroose 2015; Stolk 2015], e.g. Yet, this is future work.…”

Section: Introduction (mentioning)

confidence: 97%

“…in the grid, integrated. Similar techniques have been proposed for multilevel solvers [Adams et al 2016; Mehl et al 2006; Ghysels et al 2012; Ghysels and Vanroose 2015] or Krylov solvers [Chronopoulos and Gear 1989; Hoemmen 2010; Ghysels et al 2013; Ghysels and Vanroose 2014], but, to the best of our knowledge, no other approach offers a solution representation on all levels plus single touch. Multilevel solution representations simplify the handling of hanging nodes, non-linear problems and scale-dependent discretisations [Cools et al 2014b].…”

Section: Introduction (mentioning)

confidence: 99%

Complex Additive Geometric Multilevel Solvers for Helmholtz Equations on Spacetrees

Reps, Weinzierl 2017

ACM Trans. Math. Softw.


We introduce a family of implementations of low order, additive, geometric multilevel solvers for systems of Helmholtz equations arising from Schrödinger equations. Both grid spacing and arithmetics may comprise complex numbers and we thus can apply complex scaling to the indefinite Helmholtz operator. Our implementations are based upon the notion of a spacetree and work exclusively with a finite number of precomputed local element matrices. They are globally matrix-free. Combining various relaxation factors with two grid transfer operators allows us to switch from additive multigrid over a hierarchical basis method into a Bramble-Pasciak-Xu (BPX)-type solver, with several multiscale smoothing variants within one code base. Pipelining allows us to realise full approximation storage (FAS) within the additive environment where, amortised, each grid vertex carrying degrees of freedom is read/written only once per iteration. The codes realise a single-touch policy. Among the features facilitated by matrix-free FAS is arbitrary dynamic mesh refinement (AMR) for all solver variants. AMR as enabler for full multigrid (FMG) cycling (the grid unfolds throughout the computation) allows us to reduce the cost per unknown per order of accuracy. The present paper primarily contributes towards software realisation and design questions. Our experiments show that the consolidation of single-touch FAS, dynamic AMR and vectorisation-friendly, complex scaled, matrix-free FMG cycles delivers a mature implementation blueprint for solvers of Helmholtz equations in general. For this blueprint, we put particular emphasis on a strict implementation formalism as well as some implementation correctness proofs.
