Stefano Cherubin scite author profile

et al. 2016

The ANTAREX project aims to express application self-adaptivity through a Domain Specific Language (DSL) and to manage and autotune applications at runtime for green and heterogeneous High Performance Computing (HPC) systems up to Exascale. The DSL approach allows the definition of energy-efficiency, performance, and adaptivity strategies, as well as their enforcement at runtime through application autotuning and resource and power management. Using a mini-app extracted from one of the project's application use cases, we show an initial exploration of application precision tuning enabled by the DSL.

Tools for Reduced Precision Computation

Agosta

2020

ACM Comput. Surv.

The use of reduced precision to improve performance metrics such as computation latency and power consumption is a common practice in the embedded systems field. This practice is emerging as a new trend in High Performance Computing (HPC), especially when new error-tolerant applications are considered. However, standard compiler frameworks do not support automated precision customization, and manual tuning and code transformation remain the usual approach in most domains. In recent years, researchers have been studying ways to improve the automation of this process. This article surveys this body of work, identifying the critical steps of this process, the most advanced tools available, and the open challenges in this research area. We conclude that, while several mature tools exist, there is still a gap to close, especially for tools based on static analysis rather than profiling, as well as for integration within mainstream, industry-strength compiler frameworks.

TAFFO: Tuning Assistant for Floating to Fixed Point Optimization

IEEE Embedded Syst. Lett.

Cattaneo

Chiari

et al. 2020

libVersioningCompiler: An easy-to-use library for dynamic generation and invocation of multiple code versions

Agosta

2018

SoftwareX

Dynamic Precision Autotuning with TAFFO

ACM Trans. Archit. Code Optim.

Cattaneo

Chiari

et al. 2020

Many classes of applications, both in the embedded and high performance domains, can trade off the accuracy of the computed results for computation performance. One way to achieve such a trade-off is precision tuning, that is, modifying the data types used for the computation by reducing the bit width or by changing the representation from floating point to fixed point. We present a methodology for high-accuracy dynamic precision tuning based on the identification of input classes (i.e., classes of input datasets that benefit from similar optimizations). When a new input region is detected, the application kernels are re-compiled on the fly with the appropriate selection of parameters. In this way, we obtain a continuous optimization approach that enables the exploitation of reduced precision computation while progressively exploring the solution space, thus reducing the time required by compilation overheads. We provide tools to support the automation of the runtime part of the solution, leaving to the user only the task of identifying the input classes. Our approach provides a significant performance boost (up to 320%) on typical approximate computing benchmarks without meaningfully affecting the accuracy of the result, since the error always remains below 3%.
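
The idea described in this abstract can be illustrated with a minimal Python sketch. It is not TAFFO's implementation or API: the magnitude-based classifier, the `FRAC_BITS` table, and the `VersionedKernel` class are hypothetical names introduced here, and building a Python closure merely stands in for re-compiling a kernel on the fly. The sketch converts floating-point operands to fixed point, picks the number of fractional bits per input class, and caches one kernel version per class so that each class is built only once.

```python
from typing import Callable, Dict, List


def to_fixed(x: float, frac_bits: int) -> int:
    """Quantize x to a signed fixed-point integer with frac_bits fractional bits."""
    return round(x * (1 << frac_bits))


def from_fixed(q: int, frac_bits: int) -> float:
    """Convert a fixed-point integer back to floating point."""
    return q / (1 << frac_bits)


def make_dot_kernel(frac_bits: int) -> Callable[[List[float], List[float]], float]:
    """Stand-in for re-compiling a kernel with a chosen fixed-point format."""
    def dot(a: List[float], b: List[float]) -> float:
        acc = 0
        for x, y in zip(a, b):
            # The product of two values with frac_bits fractional bits
            # carries 2 * frac_bits fractional bits.
            acc += to_fixed(x, frac_bits) * to_fixed(y, frac_bits)
        return from_fixed(acc, 2 * frac_bits)
    return dot


def classify(values: List[float]) -> str:
    """Hypothetical input classifier: groups inputs by peak magnitude."""
    peak = max(abs(v) for v in values)
    return "small" if peak < 1.0 else "large"


# Hypothetical tuning table: fractional bits chosen per input class.
FRAC_BITS: Dict[str, int] = {"small": 16, "large": 8}


class VersionedKernel:
    """Builds one kernel version per input class and reuses it afterwards."""

    def __init__(self) -> None:
        self._cache: Dict[str, Callable[[List[float], List[float]], float]] = {}
        self.builds = 0  # number of (simulated) re-compilations so far

    def __call__(self, a: List[float], b: List[float]) -> float:
        input_class = classify(a + b)
        if input_class not in self._cache:
            self._cache[input_class] = make_dot_kernel(FRAC_BITS[input_class])
            self.builds += 1
        return self._cache[input_class](a, b)
```

Calling a `VersionedKernel` instance first with operands below 1.0 in magnitude and then with larger operands triggers two builds; any later call that falls into an already seen class reuses the cached version, which is the mechanism that amortizes the compilation overhead in this simplified setting.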