2012
DOI: 10.1002/nla.1808

Improving the arithmetic intensity of multigrid with the help of polynomial smoothers

Abstract: The basic building blocks of a classic multigrid algorithm, which are essentially stencil computations, all have a low ratio of executed floating point operations per byte fetched from memory. This important ratio can be identified as the arithmetic intensity. Applications with a low arithmetic intensity are typically bounded by memory traffic and achieve only a small percentage of the theoretical peak performance of the underlying hardware. We propose a polynomial Chebyshev smoother, which we implemen…
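To make the notion of arithmetic intensity in the abstract concrete, the following is a minimal sketch of a roofline-style estimate. Every number in it is an assumption chosen for illustration and is not taken from the paper: the flop and byte counts for a 7-point stencil, the peak rate, and the memory bandwidth are all hypothetical.

```python
# Illustrative roofline estimate. All kernel and hardware numbers below are
# assumptions for the sake of the example, not values from the paper.


def arithmetic_intensity(flops_per_point: float, bytes_per_point: float) -> float:
    """Floating point operations executed per byte moved to or from memory."""
    return flops_per_point / bytes_per_point


def attainable_gflops(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Simple roofline bound: min(compute peak, intensity * memory bandwidth)."""
    return min(peak_gflops, intensity * bandwidth_gbs)


# Assumed kernel: 7-point constant-coefficient stencil in double precision.
FLOPS_PER_POINT = 8.0   # assumed count, e.g. 6 additions + 2 multiplications
BYTES_PER_POINT = 16.0  # assumed: one 8-byte read + one 8-byte write,
                        # with neighbouring values served from cache

# Assumed (hypothetical) machine.
PEAK_GFLOPS = 100.0     # theoretical peak, GFLOP/s
BANDWIDTH_GBS = 20.0    # sustained memory bandwidth, GB/s

ai = arithmetic_intensity(FLOPS_PER_POINT, BYTES_PER_POINT)
bound = attainable_gflops(ai, PEAK_GFLOPS, BANDWIDTH_GBS)
fraction_of_peak = bound / PEAK_GFLOPS

print(f"arithmetic intensity: {ai} flop/byte")
print(f"attainable rate:      {bound} GFLOP/s")
print(f"fraction of peak:     {fraction_of_peak:.0%}")
```

Under these assumed numbers the kernel is limited by memory bandwidth rather than by the floating point units, which is the situation the abstract describes.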

Cited by 24 publications (20 citation statements)
References 30 publications
“…Thus, for boxes that are 64^3 on the finest level with a 1-deep ghost zone, level 0 of the box data structure can be viewed as a 4D data structure grids[12][66][66][66]. As each process contains one or more boxes, each with 4-8 levels, an additional data structure subdomains[boxes].levels[8] is constructed to index the floating-point data. miniGMG-cuda uses a similar data structure.…”
Section: Data Structures in miniGMG (mentioning)
confidence: 99%
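The layout described in this statement can be sketched as follows. This is a rough illustration and not miniGMG's actual code: the class and function names are hypothetical, the index names k, j, i are assumed, and coarsening by a factor of 2 per dimension between levels is an assumption the statement does not spell out.

```python
from dataclasses import dataclass

GHOST = 1       # 1-deep ghost zone, as in the quoted statement
NUM_GRIDS = 12  # leading dimension of grids[12][66][66][66]


@dataclass
class Level:
    """One level of a box (hypothetical name, not miniGMG's own type)."""
    interior: int  # interior cells per dimension, e.g. 64 on the finest level

    @property
    def padded(self) -> int:
        """Cells per dimension including the ghost zone on both sides."""
        return self.interior + 2 * GHOST

    def num_values(self) -> int:
        """Total floating-point values stored for all grids on this level."""
        return NUM_GRIDS * self.padded ** 3

    def offset(self, grid: int, k: int, j: int, i: int) -> int:
        """Row-major offset of grids[grid][k][j][i]; indices include ghost cells."""
        p = self.padded
        return ((grid * p + k) * p + j) * p + i


def make_box(finest: int, num_levels: int) -> list:
    """Build the levels of one box, halving the interior size per level (assumed)."""
    return [Level(finest >> lvl) for lvl in range(num_levels)]


# One 64^3 box with 4 levels; the statement mentions 4-8 levels per box.
box = make_box(64, 4)
for lvl, level in enumerate(box):
    mib = level.num_values() * 8 / 2 ** 20  # 8 bytes per double
    print(f"level {lvl}: {NUM_GRIDS} x {level.padded}^3 values, {mib:.2f} MiB")
```

A process holding several boxes would keep a list of such boxes, which plays the role of subdomains[boxes].levels[8] in the statement.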
“…Consequently, the performance of common stencil computations used in GMG is typically limited by the memory bandwidth of modern architectures, as the ratio of floating point operations to data movement (i.e., flop-to-byte ratio) is usually well below the machine balance. For this reason, much research has been devoted to reducing data movement for stencil computations using techniques such as cache oblivious algorithms, time skewing, wavefront optimizations and overlapped tiling [30,22,6,7,27,35,18,29,36,8].…”
Section: Introduction (mentioning)
confidence: 99%
“…Thus, in recent years, numerous efforts have focused on increasing temporal locality by fusing multiple stencil sweeps through techniques like cache oblivious, time skewing, or wavefront [8], [11], [12], [17], [19], [24], [27], [30]-[32]. Many of these efforts examined 2D or constant-coefficient problems, features rarely seen in real-world applications.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, their efficiency deteriorates. We emphasise that the efficient nature of the present implementation patterns makes us hope that they can be used as starting point to realise more competitive smoothers as proposed in [Chen et al 2012; Ernst and Gander 2011; Ghysels et al 2012; Ghysels and Vanroose 2015; Stolk 2015], e.g. Yet, this is future work.…”
Section: Introduction (mentioning)
confidence: 97%
“…in the grid, integrated. Similar techniques have been proposed for multilevel solvers [Adams et al 2016; Mehl et al 2006; Ghysels et al 2012; Ghysels and Vanroose 2015] or Krylov solvers [Chronopoulos and Gear 1989; Hoemmen 2010; Ghysels et al 2013; Ghysels and Vanroose 2014], but, to the best of our knowledge, no other approach offers a solution representation on all levels plus single touch. Multilevel solution representations simplify the handling of hanging nodes, non-linear problems and scale-dependent discretisations [Cools et al 2014b].…”
Section: Introduction (mentioning)
confidence: 99%