TOAST: Automatic tiling for iterative stencil computations on GPUs

Rocha, Rodrigo C. O.; Pereira, Alyson D.; Ramos, L.E.S.; Góes, Luís F. W.

doi:10.1002/cpe.4053

Cited by 14 publications

(3 citation statements)

References 33 publications


“…According to [22], tiling can improve parallel stencil applications in at least three ways. First, tiling partitions loop data and computations into tiles, thereby enabling the GPU to handle amounts of input data that exceed the capacity of its internal memory.…”

Section: Optimizations (mentioning)

confidence: 99%

GPU performance analysis for viscoacoustic wave equations using fast stencil computation from the symbolic specification

et al. 2023


Seismic forward modeling is a computationally and data-intensive stage in the seismic processing workflow. Profiling the kernels of seismic forward modeling algorithms showed that they access a wide variety of memory locations, in addition to incurring the computational cost of the floating-point operations needed to solve wave equations numerically. In this context, the Roofline model was used to analyze six representative seismic modeling kernels in a GPU environment, in order to identify performance bottlenecks and suggest improvements to these wave equation propagators. Based on this analysis, six viscoacoustic equations were implemented using the Devito tool. Experimental data show that optimizations that increase data reuse and decrease off-chip memory traffic can significantly improve performance.
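The first benefit named in the citation statement above (partitioning data into tiles so that an input larger than device memory can still be processed) can be illustrated with a minimal sketch. This is not the TOAST or PSkel implementation; `tile_ranges`, `process_in_tiles`, and the `capacity_cells` limit are hypothetical names, the list slicing only stands in for host-to-device copies, and the kernel is pointwise, so neighborhood dependencies are ignored here.

```python
import math


def tile_ranges(height, width, tile_h, tile_w):
    """Yield (row_start, row_end, col_start, col_end) per tile, ends exclusive."""
    for r in range(0, height, tile_h):
        for c in range(0, width, tile_w):
            yield r, min(r + tile_h, height), c, min(c + tile_w, width)


def process_in_tiles(grid, capacity_cells, kernel):
    """Apply a pointwise kernel tile by tile; only one tile is 'resident' at a time.

    capacity_cells is a hypothetical stand-in for the device memory limit.
    """
    height, width = len(grid), len(grid[0])
    # Largest square tile whose cell count fits the assumed capacity.
    side = math.isqrt(capacity_cells)
    out = [[None] * width for _ in range(height)]
    tiles_processed = 0
    for r0, r1, c0, c1 in tile_ranges(height, width, side, side):
        # Stand-in for copying one tile to the device.
        tile = [row[c0:c1] for row in grid[r0:r1]]
        assert len(tile) * len(tile[0]) <= capacity_cells
        result = [[kernel(v) for v in row] for row in tile]
        # Stand-in for copying the tile's results back to the host.
        for i, row in enumerate(result):
            out[r0 + i][c0:c1] = row
        tiles_processed += 1
    return out, tiles_processed


if __name__ == "__main__":
    grid = [[r * 7 + c for c in range(7)] for r in range(5)]
    doubled, count = process_in_tiles(grid, capacity_cells=9, kernel=lambda v: 2 * v)
    print(count, doubled[4][6])
```

In this sketch the tile size is derived only from the assumed capacity; a real tiling scheme would also weigh transfer cost and the stencil's neighborhood.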


“…In addition, researchers such as Alyson Deives Pereira and Rodrigo Caetano Rocha studied extending OpenACC's support for stencil computations (Alyson et al 2017a; Alyson et al 2015; Alyson et al 2017c; Rodrigo et al 2017) and proposed an extension, the stencil pragma. They designed and developed a source-to-source compiler that identifies the stencil pragma and performs the corresponding code transformations to generate efficient PSkel code that can be compiled and executed on GPU-accelerated devices.…”

Section: OpenACC Related Work (mentioning)

confidence: 99%

An automatic mapping technique for OpenACC kernel code based on deeply fused and heterogeneous many-core architecture

et al. 2020


OpenACC has become a popular programming interface for many-core application programming. Internationally, much research has been done on OpenACC for CPU + GPU heterogeneous many-core architectures. Among these efforts, the PGI OpenACC compiler developed by NVIDIA is the most advanced. However, there is little research on OpenACC for the Home Grown Heterogeneous Many-Core (HGHM) architecture, which differs from a GPU. This paper proposes an automatic mapping technique, based on the OpenACC compiler, that maps OpenACC kernel code to a heterogeneous and deeply fused many-core architecture. The approach uses the compiler's static analysis and feedback-driven dynamic analysis to automatically map a program's parallel kernel code to many-core devices, and it greatly improves the transformation quality of the compiler. Experimental results show that this technique can greatly improve the efficiency of using OpenACC to port applications to heterogeneous and fused many-core systems without impacting program acceleration performance.


“…When stencil computations are performed on a tile, the neighborhood dependencies inherent to the stencil parallel pattern must be considered while partitioning the data. One of the main solutions for satisfying these dependencies is overlapped blocks, which result in redundant data and computation per tile [Meng and Skadron 2011, Holewinski et al 2012, Rocha et al 2017]. This technique is very important on low-power manycores such as the MPPA-256, where the communication overhead can be high.…”

Section: Adapting the PSkel Framework to the MPPA-256 (unclassified)

Execução Energeticamente Eficiente de Aplicações Estêncil com o Processador Manycore MPPA-256

Podestá, Pereira, Rocha et al. 2017

Anais Do XVIII Simpósio Em Sistemas Computacionais De Alto Desempenho (SSCAD 2017)


This paper proposes an adaptation of the PSkel framework for the low-power manycore processor MPPA-256. The framework simplifies the development of iterative stencil applications for the MPPA-256 by hiding implementation details from the developer. The results obtained on the MPPA-256 showed a reduction in the energy consumption of iterative stencil applications of up to 1.45x compared with an Intel Broadwell multicore processor.
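The overlapped-block idea in the citation statement above can be sketched for a one-dimensional, three-point stencil. This is an illustrative sketch only, not code from PSkel or from the cited papers; the function names, the integer stencil, and the fixed-boundary rule are assumptions chosen so the tiled result can be compared exactly with an untiled reference. Each tile is loaded with a halo as wide as the number of fused iterations, iterated locally without communication, and only its interior is written back.

```python
def step(cells):
    """One iteration of a 3-point stencil; the two end cells are held fixed."""
    new = list(cells)
    for i in range(1, len(cells) - 1):
        new[i] = cells[i - 1] + cells[i] + cells[i + 1]
    return new


def run_reference(cells, steps):
    """Untiled reference: apply the stencil to the whole domain."""
    for _ in range(steps):
        cells = step(cells)
    return cells


def run_overlapped_tiles(cells, steps, tile_size):
    """Compute `steps` iterations per tile using overlapped (halo) regions.

    Returns the result and the total number of cells loaded, which exceeds
    len(cells) whenever halos overlap (the redundant data the text mentions).
    """
    n = len(cells)
    halo = steps  # stencil radius (1) times the number of fused iterations
    out = [None] * n
    loaded = 0
    for t0 in range(0, n, tile_size):
        t1 = min(t0 + tile_size, n)
        lo, hi = max(0, t0 - halo), min(n, t1 + halo)
        local = cells[lo:hi]  # tile plus halo, clamped to the domain
        loaded += len(local)
        for _ in range(steps):
            # Halo cells go stale one layer per iteration; the interior stays valid.
            local = step(local)
        out[t0:t1] = local[t0 - lo:t1 - lo]
    return out, loaded


if __name__ == "__main__":
    data = [(i * i) % 7 for i in range(20)]
    tiled, loaded = run_overlapped_tiles(data, steps=3, tile_size=6)
    print(tiled == run_reference(data, 3), loaded, len(data))
```

The trade-off is visible in the returned count: more fused iterations mean fewer exchanges between tiles but wider halos, and therefore more redundant loads and computation per tile.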
