YASK—Yet Another Stencil Kernel: A Framework for HPC Stencil Code-Generation and Tuning

Yount, Charles; Tobin, Josh; Breuer, Alexander; Durán, Alejandro

doi:10.1109/wolfhpc.2016.08

Cited by 53 publications

(32 citation statements)

References 14 publications

“…On Xeon and Xeon Phi processors, one of the most highly-optimized implementations was introduced by Yount [13], which uses a technique called "Vector Folding" that is suitable for wide-vector architectures. This implementation was further optimized and made available to public as the "YASK" framework [9]. We use this framework in our evaluation.…”

Section: Introduction (mentioning)

confidence: 99%

High-Performance High-Order Stencil Computation on FPGAs Using OpenCL

Zohouri

Podobas

Matsuoka

2018

2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

In this paper we evaluate the performance of FPGAs for high-order stencil computation using High-Level Synthesis. We show that despite the higher computation intensity and on-chip memory requirement of such stencils compared to first-order ones, our design technique with combined spatial and temporal blocking remains effective. This allows us to reach similar, or even higher, compute performance compared to first-order stencils. We use an OpenCL-based design that, apart from parameterizing performance knobs, also parameterizes the stencil radius. Furthermore, we show that our performance model exhibits the same accuracy as first-order stencils in predicting the performance of high-order ones. On an Intel Arria 10 GX 1150 device, for 2D and 3D star-shaped stencils, we achieve over 700 and 270 GFLOP/s of compute performance, respectively, up to a stencil radius of four. These results outperform the state-of-the-art YASK framework on a modern Xeon for 2D and 3D stencils, and outperform a modern Xeon Phi for 2D stencils, while achieving competitive performance in 3D. Furthermore, our FPGA design achieves better power efficiency in almost all cases.

“…Each backend transformation pass is based on manipulating an input AST and returning a new, different AST. One of the reasons behind this software engineering strategy, which is clearly more challenging than a template-based solution, is to ease the integration of external tools, such as the YASK stencil optimizer [Yount16]. We are currently in the process of integrating YASK to complement the DLE, so that YASK may replace some (but not all) DLE passes.…”

Section: Integration With YASK (mentioning)

confidence: 99%

“…Creating the proper Python bindings in YASK so that Devito can drive the code generation process. It has been shown that real-world stencil codes optimised through YASK may achieve an exceptionally high fraction of the attainable machine peak [Yount15], [Yount16]. Further, initial prototyping (manual optimization of Devito-generated code through YASK) revealed that YASK may also outperform the loop optimization engine currently available in Devito, besides ensuring seamless performance portability across a range of computer architectures.…”

Section: Integration With YASK (mentioning)

confidence: 99%

Optimised finite difference computation from symbolic equations

Lange

Kukreja

Luporini

et al. 2017

Proceedings of the 16th Python in Science Conference

Self Cite

Domain-specific high-productivity environments are playing an increasingly important role in scientific computing due to the levels of abstraction and automation they provide. In this paper we introduce Devito, an open-source domain-specific framework for solving partial differential equations from symbolic problem definitions by the finite difference method. We highlight the generation and automated execution of highly optimized stencil code from only a few lines of high-level symbolic Python for a set of scientific equations, before exploring the use of Devito operators in seismic inversion problems.

“…a combination of the number of potential loops blocking sizes for 3D scenario with any additional context, would create an intractable set of combinations that would take years to explore. Thus, we used the YASK framework [7] and its genetic algorithm (GA) auto-tuning system to find near-optimum settings. For each desired result, the GA runs for at least three separate experiments to avoid finding a local minimum prematurely.…”

Section: Optimization Strategies (mentioning)

confidence: 99%

Optimizing Fully Anisotropic Elastic Propagation on 2nd Generation Intel Xeon Phi Processors

et al. 2017

This work shows several optimization strategies evaluated and applied to an elastic wave propagation engine, based on a Fully Staggered Grid, running on the latest Intel Xeon Phi processors, the second generation of the product (code-named Knights Landing). Our fully optimized code shows a speed-up of about 4x when compared with the same algorithm optimized for the previous generation processor. Authors also thank Repsol for the permission to publish the present research, carried out at the Repsol-BSC Research Center. This work has received funding from the European Union's Horizon 2020 Programme (2014-2020) and from the Brazilian Ministry of Science, Technology and Innovation through Rede Nacional de Pesquisa (RNP) under the HPC4E Project (www.hpc4e.eu), grant agreement No. 689772.
