2017
DOI: 10.1145/3151032
A Framework for Automated and Controlled Floating-Point Accuracy Reduction in Graphics Applications on GPUs

Abstract: Reducing the precision of floating-point values can improve performance and/or reduce energy expenditure in computer graphics, among other applications. However, reducing the precision level of floating-point values in a controlled fashion needs support both at the compiler and at the microarchitecture level. At the compiler level, a method is needed to automate the reduction of precision of each floating-point value. At the microarchitecture level, a lower precision of each floating-point register can allow …

Cited by 11 publications (18 citation statements) | References 25 publications
“…Reducing the precision of floating point [6,21,42] and fixed point [22] numbers has been used to alleviate the memory bandwidth bottleneck in deep neural networks [22], GPU workloads [42], and other approximation-tolerant applications [21], thereby improving performance and energy efficiency. However, the compression ratio is still limited between 2:1 and 4:1 despite the loss of precision, as these approaches do not exploit inter-value similarities to compress data.…”
Section: Related Work
“…Approximate deduplication of individual cachelines increases cache capacity [39]; however, multiple values need to match at cacheline granularity. A form of lossy compression has been applied in approximate computing, but it is constrained to reducing the precision of single values by truncating their least significant bits [6,21,22,42] and therefore achieves limited compression ratios.…”
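The precision reduction described in the statement above operates at the bit level: zeroing low-order mantissa bits of an IEEE 754 value keeps the sign and exponent intact while shedding accuracy. A minimal Python sketch of that idea (the function name and the choice of 8 kept mantissa bits are illustrative, not taken from any of the cited papers):

```python
import struct

def truncate_float32(x: float, keep_mantissa_bits: int) -> float:
    """Zero out the low-order mantissa bits of a float32 value.

    IEEE 754 single precision has 23 mantissa bits; keeping fewer
    reduces precision while preserving the sign and exponent fields.
    """
    assert 0 <= keep_mantissa_bits <= 23
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    mask = (0xFFFFFFFF << (23 - keep_mantissa_bits)) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]

print(truncate_float32(3.14159265, 8))  # → 3.140625
```

Keeping 8 of the 23 mantissa bits rounds pi down to 3.140625; keeping all 23 bits leaves the value unchanged.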
“…Automated precision tuning for an application was investigated in [3], [4]. The algorithm of [3] adapts the delta-debugging-based search algorithm to seek a 1-minimal test case (i.e., for a 1-minimal test case, replacing any variable with a lower precision violates either the accuracy constraint or the performance constraint).…”
Section: B. Automated Precision Tuning
“…The algorithm of [3] adapts the delta-debugging-based search algorithm to seek a 1-minimal test case (i.e., for a 1-minimal test case, replacing any variable with a lower precision violates either the accuracy constraint or the performance constraint). Another automated precision tuning study was proposed in [4], investigating precision tuning for a lower-level implementation.…”
Section: B. Automated Precision Tuning
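The 1-minimal search quoted above can be sketched as a greedy loop: start with every variable at high precision, try lowering one variable at a time, and keep the change only if the accuracy check still passes. This is a simplified illustration of the delta-debugging idea, not the actual algorithm of [3]; `run_with` and `error_ok` are hypothetical callbacks standing in for running the program and checking its error bound.

```python
def tune_precision(variables, run_with, error_ok):
    """Greedily lower per-variable precision. The result is 1-minimal:
    lowering any remaining double-precision variable breaks accuracy."""
    config = {v: "double" for v in variables}
    changed = True
    while changed:
        changed = False
        for v in variables:
            if config[v] != "double":
                continue
            trial = {**config, v: "float"}
            if error_ok(run_with(trial)):  # accuracy still acceptable?
                config = trial
                changed = True
    return config

# Toy harness: pretend variable "a" is accuracy-critical.
def run_with(cfg):
    return 1e-3 if cfg["a"] == "float" else 1e-8  # simulated error

print(tune_precision(["a", "b", "c"], run_with, lambda e: e < 1e-6))
# → {'a': 'double', 'b': 'float', 'c': 'float'}
```

The outer `while` loop re-scans after every accepted change, so the search terminates only when no single further demotion passes the accuracy check.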