2020
DOI: 10.1002/cpe.6016

An in‐depth introduction of multi‐workgroup tiling for improving the locality of explicit one‐step methods for ODE systems with limited access distance on GPUs

Abstract: This article considers a locality optimization technique for the parallel solution of a special class of large systems of ordinary differential equations (ODEs) by explicit one‐step methods on GPUs. This technique is based on tiling across the stages of the one‐step method and is enabled by the special structure of the class of ODE systems considered, that is, the limited access distance. The focus of this article is on increasing the range of access distances for which the tiling technique can provide…

Citations: cited by 5 publications (3 citation statements)
References: 20 publications
“…They often involve feeding the stencil into a stencil compiler such as PLuTo [25], Pochoir [130], or Devito [91], which will output optimized code to compute the action of the stencil across some prespecified grid of initial data for multiple timesteps. Cutting-edge stencil code generators feature many improvements over simple looping algorithms, including better cache efficiency [49,85], parallelism [83], and low-level compiler optimizations. These systems all perform the same set of updates on the stencil grid, although they vary in the order that these updates are performed.…”
Section: Related Work and Its Limitations
mentioning
confidence: 99%
“…Another related research area is the solution of ODE systems on established platforms such as CPUs and GPUs. This article was partially inspired by the preceding work of Korch and Werner [22], who propose an automatic code generation approach for explicit ODE methods on GPUs, which is based on a data flow representation of the method. However, while the preceding paper aims at locality optimizations to exploit the memory hierarchy of a GPU by different tiling schemes, the present work is focused on automatically creating efficient pipeline layouts for the FPGA.…”
Section: Related Work
mentioning
confidence: 99%
“…Paper [3] proposes a locality optimization technique for the parallel solution on GPUs of large systems of ODEs by explicit one‐step methods. This technique is based on tiling across the stages of a one‐step method and is enabled by a special structure of the class of ODE systems—with the limited access distance.…”
mentioning
confidence: 99%