Improving polyhedral code generation for high-level synthesis

Zuo, Wei; Li, Peng; Chen, Deming; Pouchet, Louis-Noël; Zhong, Sheng; Cong, Jason

doi:10.1109/codes-isss.2013.6659002

Cited by 32 publications

(24 citation statements)

References 18 publications

“…exchanging the outer and inner loop) result in different polyhedra, and potentially different IIs. Polyhedral-based optimizations have been applied to synthesize memory architectures [51], improve throughput [52], and optimize resource usage [53].…”

Section: Loop Optimizations (mentioning)

confidence: 99%

A Survey and Evaluation of FPGA High-Level Synthesis Tools

Nane

Sima

Pilato

et al. 2016

IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.

Abstract: High-level synthesis (HLS) is increasingly popular for the design of high-performance and energy-efficient heterogeneous systems, shortening time-to-market and addressing today's system complexity. HLS allows designers to work at a higher level of abstraction by using a software program to specify the hardware functionality. Additionally, HLS is particularly interesting for designing FPGA circuits, where hardware implementations can be easily refined and replaced in the target device. Recent years have seen much activity in the HLS research community, with a plethora of HLS tool offerings, from both industry and academia. All these tools may have different input languages, perform different internal optimizations, and produce results of different quality, even for the very same input description. Hence, it is challenging to compare their performance and understand which is the best for the hardware to be implemented. We present a comprehensive analysis of recent HLS tools, as well as an overview of the areas of active interest in the HLS research community. We also present a first-published methodology to evaluate different HLS tools. We use our methodology to compare one commercial and three academic tools on a common set of C benchmarks, aiming at performing an in-depth evaluation in terms of performance and use of resources.

“…However, this is an important optimization which could be profitably used in complement to our approach. Zuo et al [38] propose several source-level transformations to simplify the control for affine loop nests in front of an HLS tool. This approach is relevant when the outcome of a polyhedral optimization is a single unperfect loop nest with all the program statements.…”

Section: Related Work (mentioning)

confidence: 99%

“…For instance, Alias et al [2] propose a source-level approach at C level before high-level synthesis to produce an optimized I/O system for a circuit. Zuo et al [38] optimize the control structure at source-level on a C program before using VivadoHLS. In this report, we will not follow the same guidelines.…”

Section: Introduction (mentioning)

confidence: 99%

Optimizing Affine Control With Semantic Factorizations

Alias

Plesco

2017

ACM Trans. Archit. Code Optim.

Hardware accelerators generated by polyhedral synthesis make extensive use of affine expressions (affine functions and convex polyhedra) in control and steering logic. Since the control is pipelined, these affine objects must be evaluated at the same time for different values, which forbids aggressive reuse of operators. In this report, we propose an algorithm to factorize a collection of affine expressions without preventing pipelining. Our key contributions are (i) to use semantic factorizations exploiting arithmetic properties of addition and multiplication and (ii) to rely on a cost function whose minimization ensures a correct usage of FPGA resources. Our algorithm is totally parametrized by the cost function, which can be customized to fit a target FPGA. Experimental results on a large pool of linear algebra kernels show a significant improvement compared to traditional low-level RTL optimizations. In particular, we show how our method reduces resource consumption by revealing hidden strength reductions.

Key-words: High-level synthesis, polyhedral compilation, affine control, FPGA

* CNRS/ENS-Lyon/Inria/UCBL/Université de Lyon

† XtremLogic SAS

French title: Optimisation du contrôle affine avec des factorisations sémantiques

Résumé (translated from French): Hardware accelerators compiled by polyhedral synthesis algorithms make intensive use of affine expressions (piecewise affine functions, convex polyhedra) in their control. Since the control is pipelined, these affine objects must be evaluated at the same time for different input values, which forbids aggressive reuse of operators. In this report, we propose an algorithm to factorize a collection of affine expressions without preventing pipelining. Our contributions are (i) the use of factorizations exploiting the arithmetic properties of addition and multiplication and (ii) a cost function whose minimization ensures an efficient use of FPGA resources. Our algorithm is totally parametrized by the cost function, which can be adapted to a given target FPGA. Experimental results show that our algorithm advantageously complements the low-level RTL optimizations implemented in industrial synthesis tools. In particular, we show how our algorithm reduces circuit size by revealing hidden strength reductions.
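The idea of factoring a collection of affine expressions can be illustrated with a minimal Python sketch. This is not the algorithm from the report: the two expressions, the coefficient 6, and the function names `eval_naive` and `eval_factored` are assumptions chosen for illustration. It only shows how distributivity exposes a shared term, and how a constant multiplication can then be written with shifts and an add.

```python
def eval_naive(i, j):
    """Evaluate two affine expressions independently (four multiplications)."""
    a = 6 * i + 6 * j + 1
    b = 6 * i + 6 * j + 7
    return a, b


def eval_factored(i, j):
    """Same values after factoring 6*i + 6*j into 6*(i + j), shared by both.

    Because 6 = 4 + 2, the remaining constant multiplication is written as
    two shifts and one add, which is a simple strength reduction.
    """
    s = i + j
    common = (s << 2) + (s << 1)  # equals 6 * s, without a multiplier
    return common + 1, common + 7
```

Both functions return the same pair for any integers `i` and `j`; only the number and kind of operators differ, which is the quantity a resource-oriented cost function would measure.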

“…Secondly, the polyhedral framework generates x86 optimized code with complicated loop bounds resulting in many extra divisions, and min/max operations. In [15] the authors remove some of the x86 artifacts in the generated output code with a HLS friendly code generator, but the fundamental problem of complex bounds remains. …”

Section: Related Work (mentioning)

confidence: 99%

Inter-Tile Reuse Optimization Applied to Bandwidth Constrained Embedded Accelerators

Peemen

Mesman

Corporaal

2015

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015

Abstract: The adoption of High-Level Synthesis (HLS) tools has significantly reduced accelerator design time. A complex scaling problem that remains is the data transfer bottleneck. To scale up performance, accelerators require huge amounts of data and are often limited by interconnect resources. In addition, the energy spent by the accelerator is often dominated by the transfer of data, either in the form of memory references or data movement on interconnect. In this paper we drastically reduce accelerator communication by exploration of computation reordering and local buffer usage. Consequently, we present a new analytical methodology to optimize nested loops for inter-tile data reuse with loop transformations like interchange and tiling. We focus on embedded accelerators that can be used in a multi-accelerator System on Chip (SoC), so performance, area, and energy are key in this exploration. 1) On three common embedded applications in the image/video processing domain (demosaicing, block matching, object detection), we show that our methodology reduces data movement up to 2.1x compared to the best case of intra-tile optimization. 2) We demonstrate that our small accelerators (1-3% FPGA resources) can boost a simple MicroBlaze soft-core to the performance level of a high-end Intel i7 processor.
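The citation statement for this entry mentions loop bounds that carry extra divisions and min/max operations. A minimal Python sketch of where such bounds come from is given below, assuming a one-dimensional loop tiled by a fixed tile size; the function names `visit_original` and `visit_tiled` are hypothetical and the code is not taken from either paper.

```python
def visit_original(n):
    """Iteration order of the untiled loop: i = 0 .. n-1."""
    return [i for i in range(n)]


def visit_tiled(n, tile):
    """Same iterations after tiling the loop by `tile`.

    The tile count needs a ceiling division, and the inner upper bound needs
    min() because the last tile may be partial. These are the kinds of bound
    expressions that make the generated control logic more expensive.
    """
    order = []
    num_tiles = (n + tile - 1) // tile  # ceiling division
    for t in range(num_tiles):
        lo = t * tile
        hi = min(n, lo + tile)  # min() guards the partial last tile
        for i in range(lo, hi):
            order.append(i)
    return order
```

When `n` is a multiple of `tile`, the `min()` is redundant and the bound is a constant offset, which is one reason code generators try to separate full tiles from partial ones.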
