Proceedings of the Platform for Advanced Scientific Computing Conference 2022
DOI: 10.1145/3539781.3539785
Reducing communication in the conjugate gradient method

Cited by 6 publications (5 citation statements). References 27 publications.
“…We list the peak FOM of the degree N = 15 tests in Table 2, where we see that, when weak-scaled, we observe 943.6 GFLOPS or higher on each NVIDIA Tesla V100, 1062.8 GFLOPS or higher on each AMD Instinct MI100, and 1287.1 GFLOPS or higher on each GCD of an AMD Instinct MI250X. Compared with other GPU performance values for NekBone in the literature, Karp et al (2020) used a version of NekBone with a native CUDA Poisson operator kernel to report 410 GFLOPS on a single NVIDIA Tesla V100 at degree N = 9. Figure 4(a) shows our hipBone benchmark exceeding this FLOP rate at the lower polynomial degree N = 7, achieving 657.6 GFLOPS on a single NVIDIA Tesla V100 despite the lower arithmetic intensity.…”
Section: Computational Tests
confidence: 72%
“…Gong et al (2016) demonstrated a GPU-accelerated version of NekBone using OpenACC and CUDA Fortran. This version was later improved by Karp et al (2020) using native CUDA C kernels with implementations based on the algorithms from Świrydowicz et al (2019). Porting NekBone to FPGAs was also studied by Brown (2020).…”
Section: Introduction
confidence: 99%
“…All of our solvers are designed with locality across the memory hierarchy in mind, so that modern GPUs with a significant machine imbalance can be used as efficiently as possible. Parts of this optimization process and the theoretical background are described in Karp et al (2022a).…”
Section: Numerical Solvers
confidence: 99%
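The locality optimization referred to above can be illustrated with a small sketch (a generic illustration of kernel fusion in CG, not Neko's actual implementation; all names are hypothetical): several CG vector updates and the residual dot product can be fused into a single pass over memory, so the residual vector is streamed once instead of multiple times.

```python
import numpy as np

def cg_update_unfused(x, r, p, Ap, alpha):
    # Three separate passes over memory: the x-update, the r-update,
    # and then a third pass for the residual dot product.
    x = x + alpha * p
    r = r - alpha * Ap
    rr = float(np.dot(r, r))
    return x, r, rr

def cg_update_fused(x, r, p, Ap, alpha):
    # One pass over memory: x and r are updated and the new residual
    # norm is accumulated in the same loop, reducing data movement.
    rr = 0.0
    for i in range(x.size):
        x[i] += alpha * p[i]
        r[i] -= alpha * Ap[i]
        rr += r[i] * r[i]
    return x, r, rr
```

On a real GPU the fused variant would be a single kernel; the point of the sketch is only that both variants compute identical results while the fused one touches each vector fewer times.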
“…We do this by merging kernels and by utilizing shared memory and registers in compute-heavy kernels, as detailed in, for example, Wahib and Maruyama (2014). For modern GPUs, the spectral element method is in the memory-bound domain, as discussed by Kolev et al (2021), so optimizing the code for temporal and spatial locality is our main priority when designing kernels for the GPU backend in Neko; this was recently considered in depth for the CG method used in Neko by Karp et al (2022a).…”
Section: GPU Implementation Considerations
confidence: 99%
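Why such kernels sit in the memory-bound domain can be sketched with a simple roofline estimate (the numbers below are nominal, V100-like figures chosen for illustration, not measurements from NekBone or Neko):

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops, bytes_moved):
    # Simple roofline model: performance is capped either by the
    # compute peak or by bandwidth times arithmetic intensity (flop/byte).
    intensity = flops / bytes_moved
    return min(peak_gflops, bandwidth_gbs * intensity)

# A double-precision vector update x[i] += a * p[i] does 2 flops per
# element while moving 24 bytes (read x, read p, write x), so with
# ~900 GB/s of bandwidth it can reach only a small fraction of a
# multi-TFLOPS compute peak.
vec_update_gflops = attainable_gflops(7000.0, 900.0, flops=2, bytes_moved=24)
```

With these assumed figures the vector update tops out at 75 GFLOPS, far below the compute peak, which is why locality optimizations that cut bytes moved matter more than raw flop counts.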
“…From a computational standpoint, some of the advantages of SEM are that it can be implemented in a matrix-free fashion, avoiding the explicit construction of any operator matrix, and that its weak element coupling allows operations to be performed mostly on a local basis, reducing communication requirements. These characteristics, among others, allow the method to handle large problems and perform efficiently on a large number of processing elements [21]. NEKO [19] is a portable framework that implements SEM in object-oriented modern Fortran, allowing better control over memory allocation and modularity and thus providing support for multiple compute architectures.…”
Section: Turbulence With Image Generation
confidence: 99%
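The matrix-free operator application mentioned in the excerpt can be sketched in a minimal 1-D finite/spectral element setting (a generic illustration, not NEKO's Fortran implementation; the element connectivity and local matrix below are made up for the example): each element applies a small local matrix to its own degrees of freedom, and the results are scatter-added into the global vector, so no global matrix is ever assembled.

```python
import numpy as np

def matrix_free_apply(u, A_loc, elem_dofs):
    # Apply the operator element by element: local matrix-vector
    # products followed by a scatter-add (the "gather-scatter" step
    # that sums contributions on shared element-boundary dofs).
    v = np.zeros_like(u)
    for dofs in elem_dofs:
        v[dofs] += A_loc @ u[dofs]
    return v

# Hypothetical 1-D mesh: 3 two-node elements sharing boundary dofs,
# each with the same local "stiffness"-like matrix.
A_loc = np.array([[1.0, -1.0],
                  [-1.0, 1.0]])
elem_dofs = [np.array([0, 1]), np.array([1, 2]), np.array([2, 3])]
```

Since only the small local matrix and the element-to-dof map are stored, memory use and communication stay local to each element, which is the property the excerpt attributes to SEM.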