2017
DOI: 10.1002/cpe.4053

TOAST: Automatic tiling for iterative stencil computations on GPUs

Abstract: The stencil pattern is important in many scientific and engineering domains, spurring great interest from researchers and industry. In recent years, various optimizations have been proposed for parallel stencil applications running on graphics processing units (GPUs). In particular, tiling is a technique that can significantly enhance application performance by improving data locality and by reducing the volume of communication between host memory and GPU. In addition, tiling enables stencil applicatio…

Cited by 14 publications (3 citation statements)
References: 33 publications
“…According to [22] tiling can improve parallel stencil applications in at least 3 ways. First, tiling partitions loop data and computations into tiles, thereby enabling the GPU to handle amounts of input data that exceed the capacity of its internal memory.…”
Section: Optimizations (mentioning)
confidence: 99%
“…In addition, researchers such as Alyson Deives Pereira and Rodrigo Caetano Rocha did research on OpenACC's extending support for stencil calculations (Alyson et al 2017a; Alyson et al 2017a, 2015; Alyson et al 2017c; Rodrigo et al 2017), and proposed an extension pragma stencil. They designed and developed a source-to-source compiler that can identify stencil pragma and performs corresponding code transformations to generate efficient PSkel code that can be compiled and executed on GPU-accelerated devices.…”
Section: OpenACC Related Work (mentioning)
confidence: 99%
“…When stencil computations are performed on a tile, the neighborhood dependencies inherent to the stencil parallel pattern must be taken into account when partitioning the data. One of the main solutions for satisfying these dependencies is to use overlapped blocks, which results in redundant data and computation per tile [Meng and Skadron 2011, Holewinski et al 2012, Rocha et al 2017]. This technique is very important on low-power manycores such as the MPPA-256, where the communication overhead can be high.…”
Section: Adaptation of the PSkel Framework for the MPPA-256 (unclassified)
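
The overlapped-block idea mentioned in the last citation statement can be illustrated with a small sketch. It is not the TOAST or PSkel implementation. It assumes a 1D three-point averaging stencil with fixed boundary cells, and the names `step`, `run_untiled`, and `run_overlapped` are hypothetical. Each tile is loaded together with a halo of neighboring cells, iterated independently, and only its interior is written back, so the halo cells are computed redundantly by adjacent tiles.

```python
def step(a):
    """One 3-point averaging sweep; the first and last cells are held fixed."""
    out = a[:]
    for i in range(1, len(a) - 1):
        out[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return out


def run_untiled(a, steps):
    """Reference: sweep the whole array `steps` times."""
    for _ in range(steps):
        a = step(a)
    return a


def run_overlapped(a, steps, tile):
    """Overlapped tiling: each tile carries a halo and runs all steps locally.

    Assumption for this sketch: stencil radius 1, so a halo of `steps`
    cells per side is enough to keep the tile interior exact.
    """
    n = len(a)
    halo = steps  # radius (1) times the number of local iterations
    result = a[:]
    for start in range(0, n, tile):
        end = min(start + tile, n)
        lo = max(0, start - halo)   # clamp the halo at the array edges
        hi = min(n, end + halo)
        local = a[lo:hi]            # tile plus redundant halo cells
        for _ in range(steps):
            local = step(local)     # stale halo edges only pollute the halo
        result[start:end] = local[start - lo:end - lo]  # keep the interior
    return result
```

Calling `run_overlapped(data, 3, 8)` on a list of floats should give the same values as `run_untiled(data, 3)`. The cost is the redundant halo work per tile, which grows with the number of iterations fused into one tile.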