2017
DOI: 10.1175/bams-d-15-00278.1
Parallelization and Performance of the NIM Weather Model on CPU, GPU, and MIC Processors

Abstract: Next-generation supercomputers containing millions of processors will require weather prediction models to be designed and developed by scientists and software experts to ensure portability and efficiency on increasingly diverse HPC systems. … Thanks to Intel, Cray, PGI, and NVIDIA, who were responsible for fixing bugs and providing access to the latest hardware and compilers. Thanks also to the staff at ORNL and TACC for providing system resources and helping to resolve system issues. This work was also supported in part …

Cited by 27 publications (18 citation statements); references 20 publications.
“…Overall, the COSMO model with the rewritten GridTools dynamical core, and with the other components ported with OpenACC directives, runs about 3-4 times faster on GPUs than the original code on CPUs when comparing hardware of the same generation (Leutwyler et al. 2016). Similar speedups have been reported by other studies (e.g., Govett et al. 2017).…”
Section: Use of OpenACC (supporting)
confidence: 91%
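The directive-based porting this statement describes keeps a single source: existing loops are annotated and the compiler generates GPU code from them. A minimal sketch in C++ (the cited models are largely Fortran; the field names, 5-point stencil, and data clauses here are illustrative assumptions, not code from COSMO or NIM):

```cpp
#include <iostream>
#include <vector>

// Illustrative smoothing loop over a 2-D field; names and sizes are assumed.
void smooth(double* out, const double* in, int nx, int ny) {
    // With an OpenACC compiler (e.g., nvc++ -acc) this loop nest is offloaded
    // to the GPU; otherwise the pragma is ignored and the same source runs on
    // the CPU, which is the single-source portability directives target.
    #pragma acc parallel loop collapse(2) copyin(in[0:nx*ny]) copyout(out[0:nx*ny])
    for (int j = 1; j < ny - 1; ++j)
        for (int i = 1; i < nx - 1; ++i)
            out[j*nx + i] = 0.25 * (in[j*nx + i - 1] + in[j*nx + i + 1]
                                  + in[(j-1)*nx + i] + in[(j+1)*nx + i]);
}

int main() {
    const int nx = 256, ny = 256;
    std::vector<double> in(nx * ny, 1.0), out(nx * ny, 0.0);
    smooth(out.data(), in.data(), nx, ny);
    std::cout << out[nx + 1] << '\n';   // prints 1 (interior of a uniform field)
}
```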
“…Tiling of data. Certain operations, such as stencil computations (Gan et al., 2017), have complicated data-access patterns; their computation and data access can be organised in a tiling pattern (Bandishti et al., 2012) so that the work on different lines, planes, or cubes can be pipelined and overlapped.…”
Section: Other Tuning Techniques at a Glimpse (mentioning)
confidence: 99%
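A minimal sketch of that tiling idea in C++ (the stencil, grid layout, and tile sizes are assumptions for illustration, not taken from Gan et al. or Bandishti et al.):

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

// Tile sizes are illustrative; in practice they are tuned so a tile plus its
// halo fits in cache (or GPU shared memory).
constexpr int TI = 64, TJ = 64;

// 5-point stencil evaluated tile by tile over the interior of the grid.
void stencil_tiled(double* out, const double* in, int nx, int ny) {
    for (int jj = 1; jj < ny - 1; jj += TJ)                          // tile loops
        for (int ii = 1; ii < nx - 1; ii += TI)
            for (int j = jj; j < std::min(jj + TJ, ny - 1); ++j)     // intra-tile loops:
                for (int i = ii; i < std::min(ii + TI, nx - 1); ++i) // data stays cache-resident
                    out[j*nx + i] = 0.25 * (in[j*nx + i - 1] + in[j*nx + i + 1]
                                          + in[(j-1)*nx + i] + in[(j+1)*nx + i]);
}

int main() {
    const int nx = 512, ny = 512;
    std::vector<double> in(nx * ny, 1.0), out(nx * ny, 0.0);
    stencil_tiled(out.data(), in.data(), nx, ny);
    std::cout << out[nx + 1] << '\n';   // prints 1 for a uniform field
}
```

Because distinct tiles touch disjoint output regions, they can also be handed to different threads or overlapped with data movement, which is the pipelining the statement refers to.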
“…For the porting of a model at such a level, the three challenges mentioned above (the heavy burden of legacy code, hundreds of hotspots distributed through the code, and the mismatch between the existing code and the emerging hardware) combine to produce still more challenges. Faced with tens of thousands of lines of code, researchers and developers have to either perform an extensive rewrite of the code (Xu et al., 2014) or invest years of effort in redesigned methodologies and tools (Gysi et al., 2015).…”
Section: Introduction (mentioning)
confidence: 99%
“…For example, the notion of DSLs as a solution has a tried and tested heritage: examples include the Kokkos array library (Edwards et al., 2012), which like GridTools uses C++ templates to provide an interface to distributed data that can support multiple hardware back ends, and, from computational chemistry, sophisticated codes (Valiev et al., 2010) built on top of a toolkit (Nieplocha et al., 2006) that facilitates shared-memory programming. Arguably DSLs are starting to be more prevalent because of the advent of better tooling for their development and because the code they generate can be better optimised by autotuning (Gropp and Snir, 2013). However, in our case, we still believe that human ex…”
Section: Related Work (mentioning)
confidence: 94%
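A self-contained sketch of the C++-template pattern this statement describes: one kernel body, with the hardware back end selected by a compile-time tag. The `Serial`/`Threads` tags and this `parallel_for` are assumptions for illustration and far simpler than the real Kokkos or GridTools interfaces:

```cpp
#include <cstddef>
#include <iostream>
#include <thread>
#include <type_traits>
#include <vector>

struct Serial  {};   // execute on the calling thread
struct Threads {};   // execute across std::thread workers

// One generic entry point; the Backend tag picks the implementation at
// compile time. This mirrors the template-based pattern the cited libraries
// build on; it is not their actual API.
template <typename Backend, typename F>
void parallel_for(std::size_t n, F body) {
    if constexpr (std::is_same_v<Backend, Serial>) {
        for (std::size_t i = 0; i < n; ++i) body(i);
    } else {
        std::size_t nt = std::thread::hardware_concurrency();
        if (nt == 0) nt = 4;                       // fallback if unknown
        std::vector<std::thread> pool;
        for (std::size_t t = 0; t < nt; ++t)
            pool.emplace_back([&body, t, nt, n] {  // each worker takes a strided
                for (std::size_t i = t; i < n; i += nt) body(i);  // slice of indices
            });
        for (auto& th : pool) th.join();
    }
}

int main() {
    std::vector<double> a(1000);
    // The same kernel body compiles against either back end unchanged.
    parallel_for<Threads>(a.size(), [&](std::size_t i) { a[i] = 2.0 * i; });
    parallel_for<Serial>(a.size(),  [&](std::size_t i) { a[i] += 1.0; });
    std::cout << a[10] << '\n';                    // prints 21
}
```

The design point is that the kernel author writes the body once; retargeting to a new architecture means adding a back end, not rewriting the science code.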
“…for memory optimisations). With the advent of exascale systems, entirely new programming models are likely to be necessary (Gropp and Snir, 2013), potentially deploying new tools, or even the same tools (MPI, OpenMP), to deliver entirely new algorithmic constructs such as thread pools and task-based parallelism (e.g. Perez et al, 2008).…”
Section: Where Is the Concurrency? (mentioning)
confidence: 99%
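A minimal sketch of the task-based style this statement alludes to, using standard C++ futures as a stand-in for the richer task runtimes it cites (e.g., Perez et al., 2008); the chunked reduction is an illustrative assumption:

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <iostream>
#include <numeric>
#include <vector>

// Each chunk of the reduction becomes an independent task; std::async hands
// the tasks to worker threads and the futures collect their partial results.
double task_sum(const std::vector<double>& v, std::size_t chunk) {
    std::vector<std::future<double>> tasks;
    for (std::size_t lo = 0; lo < v.size(); lo += chunk) {
        const std::size_t hi = std::min(lo + chunk, v.size());
        tasks.push_back(std::async(std::launch::async, [&v, lo, hi] {
            return std::accumulate(v.begin() + lo, v.begin() + hi, 0.0);
        }));
    }
    double total = 0.0;
    for (auto& t : tasks) total += t.get();   // wait for and combine the tasks
    return total;
}

int main() {
    std::vector<double> v(1'000'000, 1.0);
    std::cout << task_sum(v, 100'000) << '\n';   // prints 1e+06
}
```

Unlike a bulk-synchronous loop, the tasks here have no fixed mapping to threads, which is the scheduling flexibility the quoted passage expects new programming models to exploit.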