2013
DOI: 10.1007/978-3-642-16405-7_26

A Geometric Multigrid Solver on GPU Clusters

Cited by 5 publications
(4 citation statements)
References 13 publications
“…An aggregation based AMG, reaching beyond 10^11 degrees of freedom on 300,000 cores, is reported in [8]. A study of the scalability of FE solvers on clusters with GPU accelerators is given in [20]; see also [2,11,23,25,28] for other recent contributions in this direction.…”
Section: Introduction
confidence: 99%
“…An aggregation based AMG, reaching beyond 10^11 degrees of freedom on 300,000 cores, is reported in [8]. A study of the scalability of FE solvers on clusters with GPU accelerators is given in [20]; see also [2,11,23,25,28] for other recent contributions in this direction.…”
Section: Introduction
confidence: 99%
“…For instance, [21] analyzes the performance of an optimized GPU-based implementation of the geometric multigrid method on different state-of-the-art NVIDIA GPUs. Similar work also includes [22][23][24][25][26]. [27] takes portability into account and provides a unified user interface so that optimized multigrid solvers can run on diverse platforms including multicore CPUs and GPUs.…”
Section: Related Work
confidence: 99%
“…Efforts are made on compilation to improve the performance of geometric multi-grid, leveraging function fusion [7,8]. Optimization of geometric multigrid on modern GPUs has also been studied, leveraging auto-tuning technique to find the best thread block configuration and loop tiling [9], and using the unified memory and nvlink to reduce the overhead of communication [24,23,32,31]. For Taihu-Light, we reduce the memory copy overhead with the ghost area by fusing the copy for boundary into the kernel, as introduced in Section 4.3.…”
Section: Related Work
confidence: 99%