2020
DOI: 10.26599/tst.2018.9010112

Heterogeneous parallel algorithm design and performance optimization for WENO on the Sunway Taihulight supercomputer

Abstract: The Weighted Essentially Non-Oscillatory (WENO) scheme is a numerical method for solving hyperbolic conservation laws and is well suited to high-density fluid interface instability problems with strong intermittency. These problems involve large and complex flow structures. To fully utilize the computing power of High Performance Computing (HPC) systems, specific methodologies must be developed to optimize application performance for the particular system's architecture. The Sunway TaihuLight supercomputer is currentl…
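The abstract names WENO without showing what the scheme computes. As an illustration only, the sketch below implements the classical fifth-order WENO reconstruction (WENO-JS, Jiang and Shu) of one interface value in C. The paper's own kernel, stencil order, and parameters may differ: the function name `weno5_left`, the helper `sq`, ε = 1e-6, and the linear weights 0.1/0.6/0.3 are the usual textbook choices assumed here, not details taken from the paper.

```c
/* Hypothetical helper: square of x. */
static double sq(double x) { return x * x; }

/* Sketch of the classical fifth-order WENO (WENO-JS) reconstruction.
   Input: five cell averages v[0..4] = v_{i-2}, v_{i-1}, v_i, v_{i+1}, v_{i+2}.
   Output: the left-biased interface value v_{i+1/2}^-.
   EPS = 1e-6 and the linear weights 0.1/0.6/0.3 are the usual textbook
   choices, assumed here; they are not taken from the paper. */
double weno5_left(const double v[5])
{
    const double EPS = 1e-6;
    const double d0 = 0.1, d1 = 0.6, d2 = 0.3;

    /* Third-order candidate values on the three 3-cell sub-stencils. */
    double q0 = ( 2.0 * v[0] - 7.0 * v[1] + 11.0 * v[2]) / 6.0;
    double q1 = (      -v[1] + 5.0 * v[2] +  2.0 * v[3]) / 6.0;
    double q2 = ( 2.0 * v[2] + 5.0 * v[3] -        v[4]) / 6.0;

    /* Smoothness indicators: large when a sub-stencil spans a discontinuity. */
    double b0 = 13.0 / 12.0 * sq(v[0] - 2.0 * v[1] + v[2])
              + 0.25 * sq(v[0] - 4.0 * v[1] + 3.0 * v[2]);
    double b1 = 13.0 / 12.0 * sq(v[1] - 2.0 * v[2] + v[3])
              + 0.25 * sq(v[1] - v[3]);
    double b2 = 13.0 / 12.0 * sq(v[2] - 2.0 * v[3] + v[4])
              + 0.25 * sq(3.0 * v[2] - 4.0 * v[3] + v[4]);

    /* Nonlinear weights: close to d_k in smooth regions, close to zero on
       sub-stencils that cross a discontinuity. */
    double a0 = d0 / sq(EPS + b0);
    double a1 = d1 / sq(EPS + b1);
    double a2 = d2 / sq(EPS + b2);
    double sum = a0 + a1 + a2;

    /* Convex combination of the candidates. */
    return (a0 * q0 + a1 * q1 + a2 * q2) / sum;
}
```

For linear data such as {0, 1, 2, 3, 4} all three candidates equal 2.5, so the result is 2.5. For the step {0, 0, 0, 1, 1} the two sub-stencils that straddle the jump receive almost no weight and the result is essentially 0, whereas the fixed linear weights alone would give 0.4; this suppression of oscillations near discontinuities is what makes the scheme attractive for the interface-instability problems the abstract mentions.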

Cited by 10 publications (8 citation statements)
References 33 publications

“…Further, because of OpenACC's high-level programming model, the code needs fewer changes and remains simpler and easier for the programmer to understand; moreover, a single source code can target both CPU-only and hybrid CPU-GPU architectures, since the directives can be ignored when no GPU is available in the system. Several investigations have shown that the performance loss with OpenACC, which stems from limited support for shared-memory usage and from bank conflicts, is often not substantial, particularly on the most recent GPUs [11,15]. We employed the hybrid MPI and OpenACC programming method.…”
Section: Algorithm (mentioning)
confidence: 99%
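The excerpt's point that one source file can serve both CPU-only and CPU-GPU builds can be illustrated with a minimal C sketch. This is not code from the citing paper or from the WENO paper; the `saxpy` kernel is a hypothetical stand-in, and the MPI half of the hybrid MPI and OpenACC approach is not shown.

```c
#include <stddef.h>

/* Hypothetical example kernel: y = a*x + y on n elements.
   Built with an OpenACC compiler (for example `nvc -acc` or
   `gcc -fopenacc`), the directive asks for the loop to be run in parallel
   on an accelerator, with x copied to the device and y copied in and back.
   Built without OpenACC support, the compiler ignores the unknown pragma
   and the same loop runs serially on the CPU. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
#pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The numerical result is the same in both builds; only where the loop executes changes, which is why the directive-based approach needs fewer source changes than rewriting the kernel in CUDA.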
“…Several examples of partially adapting weather and climate prediction codes to GPUs have shown performance gains [7][8][9][10][11][12][13][14][15][16][17]. In particular, GPU acceleration of scalar or tracer advection modules using Compute Unified Device Architecture (CUDA) C/Fortran achieves an approximately three-fold speedup [12,18,19].…”
Section: Introduction (mentioning)
confidence: 99%
“…Numerous approaches have emerged in various fields to solve this problem. Among them, the two most effective and common solutions are: one is to restructure the model at the software level using super-individual or other methods; the other is to speed up large-scale computing through distributed parallel computing [91,92] or new computational tools, such as the Quantum tool [93].…”
Section: Large-scale Mams (mentioning)
confidence: 99%
“…The past few decades have witnessed an explosion of data in both the number of observations and parameters, resulting in significant interest in distributed algorithms for solving large-scale machine learning problems [1][2][3][4][5][6][7]. However, efficiently implementing distributed optimization algorithms for machine learning applications is challenging.…”
Section: Introduction (mentioning)
confidence: 99%