Proceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures, 2018
DOI: 10.1145/3183767.3183776
Aspect-Driven Mixed-Precision Tuning Targeting GPUs

Abstract: Writing mixed-precision kernels makes it possible to achieve higher throughput while keeping the precision of the outputs within given limits. The recent introduction of native half-precision arithmetic in several GPUs, such as the NVIDIA P100 and the AMD Vega 10, makes precision tuning even more relevant. However, it is not trivial to manually determine which variables should be represented as half precision instead of single or double precision. Although the use of half-precision arithmetic c…
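
To make the idea concrete, the following sketch shows what a mixed-precision OpenCL kernel of this kind might look like, held as a C++ string constant the way a host program would pass it to the OpenCL runtime. The kernel name, the axpy-style computation, and the choice of which variables are demoted to half precision are illustrative assumptions, not an example taken from the paper.

// Minimal sketch of a mixed-precision OpenCL kernel, assuming a device with
// native fp16 support (e.g. NVIDIA P100 or AMD Vega 10). The kernel name and
// the decision to keep the inputs in half precision while accumulating in
// single precision are illustrative choices, not the paper's benchmark.
const char* kMixedPrecisionKernel = R"CLC(
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

__kernel void axpy_mixed(__global const half *x,   // demoted to half precision
                         __global const half *y,   // demoted to half precision
                         __global float *out,      // result kept in single precision
                         const float a)
{
    const size_t i = get_global_id(0);
    // Loads are half precision (less memory traffic), but the multiply-add is
    // promoted back to float so the output error stays within the given limit.
    const float xi = convert_float(x[i]);
    const float yi = convert_float(y[i]);
    out[i] = a * xi + yi;
}
)CLC";

A precision-tuning tool would typically start from an all-single or all-double version of such a kernel, lower selected declarations to half precision, and re-check the output error after each change.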

Cited by 10 publications (5 citation statements). References 18 publications.

“…Tools in the state-of-the-art are aimed at automatically producing an optimized version of a given numerical program that sacrifices computation accuracy to obtain performance gains. Such tools either target the entire program [17,15,19,2], or just computational kernels identified by the user [16,22,20,9]. Performance gains are obtained by using smaller data types, by using fixed point in place of floating point computations, or both.…”
Section: Related Work (mentioning)
confidence: 99%
“…LARA promotes modularity and aspect reuse, and supports embedding JavaScript code to specify more sophisticated strategies. As shown in [12], we support exploration of mixed-precision OpenCL kernels by using half-, single-, and double-precision floating-point data types. We additionally support fixed-point representations through a custom C++ template-based implementation for HPC systems, which has already been used in [13].…”
Section: Precision Tuning (mentioning)
confidence: 99%
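
The citing paper above mentions a custom C++ template-based fixed-point implementation used alongside the half/single/double exploration. The fragment below is only a minimal sketch of what such a template might look like; the class name, the 32-bit backing store, and the FRAC parameter are assumptions, not the actual API of the ANTAREX library.

#include <cstdint>

// Minimal sketch of a template-based fixed-point type (requires C++14).
// FRAC is the number of fractional bits; a signed 32-bit raw value is assumed,
// so FRAC must stay well below 31 for this sketch to behave correctly.
template <unsigned FRAC>
class FixedPoint {
public:
    constexpr FixedPoint() : raw_(0) {}
    constexpr explicit FixedPoint(double v)
        : raw_(static_cast<int32_t>(v * (1 << FRAC))) {}

    constexpr double toDouble() const {
        return static_cast<double>(raw_) / (1 << FRAC);
    }

    // Addition keeps the same scaling, so no shift is needed.
    friend constexpr FixedPoint operator+(FixedPoint a, FixedPoint b) {
        return fromRaw(a.raw_ + b.raw_);
    }

    // Multiplication widens to 64 bits, then shifts back by FRAC bits.
    friend constexpr FixedPoint operator*(FixedPoint a, FixedPoint b) {
        return fromRaw(static_cast<int32_t>(
            (static_cast<int64_t>(a.raw_) * b.raw_) >> FRAC));
    }

private:
    static constexpr FixedPoint fromRaw(int32_t r) {
        FixedPoint f;
        f.raw_ = r;
        return f;
    }

    int32_t raw_;  // two's-complement value scaled by 2^FRAC
};

With such a type, precision tuning amounts to choosing the fractional bit count per variable (e.g. FixedPoint<16> for a value with 16 fractional bits) and checking the resulting error, analogously to choosing half versus float in the OpenCL case.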
“…Error-tolerating applications are increasingly common in the emerging field of real-time HPC. In ANTAREX, we explored both precision tuning of floating-point computation on GPGPU accelerators [12] and floating- to fixed-point conversion, followed by tuning of the fixed-point representation in terms of bit width and point position [13,14]. c) Memoization: Memoization has been employed for a long time as a performance optimization technique, albeit primarily in functional languages.…”
Section: The ANTAREX Approach (mentioning)
confidence: 99%
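
For the memoization point, the fragment below is a minimal C++ sketch of the technique being referred to: caching the results of a pure, expensive function so that repeated calls with the same argument skip recomputation. The function names and the use of a double key are illustrative assumptions, not code from the ANTAREX project.

#include <cmath>
#include <unordered_map>

// A stand-in for an expensive, side-effect-free computation.
double expensive(double x) {
    double acc = 0.0;
    for (int i = 0; i < 1000; ++i) acc += std::sin(x + i);
    return acc;
}

// Memoized wrapper: results are cached per argument value; note that the
// lookup relies on exact bit-wise equality of the double key.
double memoized_expensive(double x) {
    static std::unordered_map<double, double> cache;
    auto it = cache.find(x);
    if (it != cache.end()) return it->second;  // cache hit: skip recomputation
    const double result = expensive(x);
    cache.emplace(x, result);
    return result;
}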
“…It has 209 nodes based on Intel Sandy Bridge CPUs. It also contains 23 GPU-accelerated nodes and 4 MIC-accelerated nodes.…”
Section: B. IT4Innovations Platform and Roadmap (mentioning)
confidence: 99%