Proceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures, 2018
DOI: 10.1145/3183767.3183776
Aspect-Driven Mixed-Precision Tuning Targeting GPUs

Abstract: Writing mixed-precision kernels makes it possible to achieve higher throughput while keeping the precision of the outputs within given limits. The recent introduction of native half-precision arithmetic in several GPUs, such as the NVIDIA P100 and the AMD Vega 10, makes precision tuning even more relevant. However, it is not trivial to manually determine which variables should be represented as half precision instead of single or double precision. Although the use of half-precision arithmetic c…
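
To make the idea concrete, the following sketch shows what a mixed-precision OpenCL kernel of this kind might look like, held as a C++ string constant the way a host program would pass it to the OpenCL runtime. The kernel name, the axpy-style computation, and the choice of which variables are demoted to half precision are illustrative assumptions, not an example taken from the paper.

// Minimal sketch of a mixed-precision OpenCL kernel, assuming a device with
// native fp16 support (e.g. NVIDIA P100 or AMD Vega 10). The kernel name and
// the decision to keep the inputs in half precision while accumulating in
// single precision are illustrative choices, not the paper's benchmark.
const char* kMixedPrecisionKernel = R"CLC(
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

__kernel void axpy_mixed(__global const half *x,   // demoted to half precision
                         __global const half *y,   // demoted to half precision
                         __global float *out,      // result kept in single precision
                         const float a)
{
    const size_t i = get_global_id(0);
    // Loads are half precision (less memory traffic), but the multiply-add is
    // promoted back to float so the output error stays within the given limit.
    const float xi = convert_float(x[i]);
    const float yi = convert_float(y[i]);
    out[i] = a * xi + yi;
}
)CLC";

A precision-tuning tool would typically start from an all-single or all-double version of such a kernel, lower selected declarations to half precision, and re-check the output error after each change.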

Cited by 10 publications (5 citation statements). References 18 publications.

“…Tools in the state-of-the-art are aimed at automatically producing an optimized version of a given numerical program that sacrifices computation accuracy to obtain performance gains. Such tools either target the entire program [17,15,19,2], or just computational kernels identified by the user [16,22,20,9]. Performance gains are obtained by using smaller data types, by using fixed point in place of floating point computations, or both.…”
Section: Related Work (mentioning)
confidence: 99%
“…LARA promotes modularity and aspect reuse, and supports embedding JavaScript code to specify more sophisticated strategies. As shown in [12], we support exploration of mixed-precision OpenCL kernels by using half-, single-, and double-precision floating-point data types. We additionally support fixed-point representations through a custom C++ template-based implementation for HPC systems, which has already been used in [13].…”
Section: Precision Tuning (mentioning)
confidence: 99%
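
The citing paper above mentions a custom C++ template-based fixed-point implementation used alongside the half/single/double exploration. The fragment below is only a minimal sketch of what such a template might look like; the class name, the 32-bit backing store, and the FRAC parameter are assumptions, not the actual API of the ANTAREX library.

#include <cstdint>

// Minimal sketch of a template-based fixed-point type (requires C++14).
// FRAC is the number of fractional bits; a signed 32-bit raw value is assumed,
// so FRAC must stay well below 31 for this sketch to behave correctly.
template <unsigned FRAC>
class FixedPoint {
public:
    constexpr FixedPoint() : raw_(0) {}
    constexpr explicit FixedPoint(double v)
        : raw_(static_cast<int32_t>(v * (1 << FRAC))) {}

    constexpr double toDouble() const {
        return static_cast<double>(raw_) / (1 << FRAC);
    }

    // Addition keeps the same scaling, so no shift is needed.
    friend constexpr FixedPoint operator+(FixedPoint a, FixedPoint b) {
        return fromRaw(a.raw_ + b.raw_);
    }

    // Multiplication widens to 64 bits, then shifts back by FRAC bits.
    friend constexpr FixedPoint operator*(FixedPoint a, FixedPoint b) {
        return fromRaw(static_cast<int32_t>(
            (static_cast<int64_t>(a.raw_) * b.raw_) >> FRAC));
    }

private:
    static constexpr FixedPoint fromRaw(int32_t r) {
        FixedPoint f;
        f.raw_ = r;
        return f;
    }

    int32_t raw_;  // two's-complement value scaled by 2^FRAC
};

With such a type, precision tuning amounts to choosing the fractional bit count per variable (e.g. FixedPoint<16> for a value with 16 fractional bits) and checking the resulting error, analogously to choosing half versus float in the OpenCL case.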
“…Error-tolerating applications are increasingly common in the emerging field of real-time HPC. In ANTAREX, we explored both precision tuning of floating-point computation on GPGPU accelerators [12] and floating- to fixed-point conversion, followed by tuning of the fixed-point representation in terms of bit width and point position [13,14]. c) Memoization: Memoization has been employed for a long time as a performance optimization technique, albeit primarily in functional languages.…”
Section: The ANTAREX Approach (mentioning)
confidence: 99%
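
For the memoization point, the fragment below is a minimal C++ sketch of the technique being referred to: caching the results of a pure, expensive function so that repeated calls with the same argument skip recomputation. The function names and the use of a double key are illustrative assumptions, not code from the ANTAREX project.

#include <cmath>
#include <unordered_map>

// A stand-in for an expensive, side-effect-free computation.
double expensive(double x) {
    double acc = 0.0;
    for (int i = 0; i < 1000; ++i) acc += std::sin(x + i);
    return acc;
}

// Memoized wrapper: results are cached per argument value; note that the
// lookup relies on exact bit-wise equality of the double key.
double memoized_expensive(double x) {
    static std::unordered_map<double, double> cache;
    auto it = cache.find(x);
    if (it != cache.end()) return it->second;  // cache hit: skip recomputation
    const double result = expensive(x);
    cache.emplace(x, result);
    return result;
}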
“…It has 209 nodes based on Intel Sandy Bridge CPUs. It also contains 23 GPU-accelerated nodes and 4 MIC-accelerated nodes.…”
Section: B. IT4Innovations Platform and Roadmap (mentioning)
confidence: 99%