High-Performance FPGA-Based General Reduction Methods

Morris, Gerald R.; Zhuo, Ling; Prasanna, Viktor K.

doi:10.1109/fccm.2005.42

Cited by 16 publications

(15 citation statements)

References 1 publication


“…All the research papers consulted in this research [5,[19][20][21]. [3][4][14][15]18,[21][22][23][24][28][29] did not provide direct relationship between the width of fraction and the width of available unsigned registers in determining the upper bound of accuracy of decimal fraction computed within binary floating-point. We establish the direct relationships between y fraction bits and z bit of unsigned integral registers in the statement (8) and (9). On the reverse direction z bit unsigned integral register will be capable of representing decimal-digits fraction accurately without sophisticated algorithm.…”

Section: Results (mentioning)

confidence: 99%

Are Ieee 754 32-Bit and 64-Bit Binary Floating-Point Accurate Enough?

Hutabarat, Purnama, Hariadi et al. 2011

MST


This paper investigates the accuracy of floating-point values and attempts to reveal their real accuracy. The methods used are assignment of values, assignment of the values of arithmetic expressions, and output of the values using a floating-point format that helps reveal the accuracy. The programming tools used are Visual C# 9, Visual C++ 9, Java 5, and Visual BASIC 9, all running on Intel 80x86 hardware. The results show that 1×10^-x cannot be accurately represented, and that the approximate accuracy ranges only from 7 to 16 decimal digits.
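The abstract's claim that powers of ten such as 1×10^-1 are not exactly representable can be checked with a short sketch. This is not the cited authors' test program (they used C#, C++, Java, and Visual BASIC); it is a minimal Python illustration, and the helper names `to_float32` and `is_exact` are hypothetical.

```python
import struct
from decimal import Decimal
from fractions import Fraction


def to_float32(x: float) -> float:
    """Round a Python float (IEEE 754 binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def is_exact(x: float, numerator: int, denominator: int) -> bool:
    """Return True if the stored binary value equals numerator/denominator exactly."""
    return Fraction(x) == Fraction(numerator, denominator)


tenth64 = 0.1              # nearest binary64 value to 1/10
tenth32 = to_float32(0.1)  # nearest binary32 value to 1/10

# Decimal(float) converts the stored binary value exactly, with no rounding,
# so the printed digits show where the stored value departs from 0.1.
print(Decimal(tenth64))
print(Decimal(tenth32))

print(is_exact(tenth64, 1, 10))  # False: 1/10 has no finite binary expansion
print(is_exact(0.5, 1, 2))       # True: 1/2 is a power of two
```

The binary32 value departs from 0.1 around the ninth decimal place and the binary64 value around the eighteenth, which is broadly in line with the 7 to 16 digit range quoted in the abstract.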


“…The deeply pipelined nature of the arithmetic units on FPGAs, such as those for floating point, make reduction operators non-trivial to implement, and much research has gone into efficient reduction circuits [23][24][25]. Reduction implementations must balance throughput, resource usage and latency.…”

Section: Reductions (mentioning)

confidence: 99%

dfesnippets: An Open-Source Library for Dataflow Acceleration on FPGAs

Grigoras, Burovskiy, Arram et al. 2017

Lecture Notes in Computer Science


Abstract. Highly-tuned FPGA implementations can achieve significant performance and power efficiency gains over general purpose hardware. However the limited development productivity has prevented mainstream adoption of FPGAs in many areas such as High Performance Computing. High level standard development libraries are increasingly adopted in improving productivity. We propose an approach for performance critical applications including standard library modules, benchmarking facilities and application benchmarks to support a variety of use-cases. We implement the proposed approach as an open-source library for a commercially available FPGA system and highlight applications and productivity gains.
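The citation statement quoted for this paper notes that deeply pipelined arithmetic units make reductions non-trivial. A software sketch can show why: with an adder of latency L, a naive running sum must wait L cycles between dependent additions, whereas L interleaved partial sums keep the adder busy at the cost of a final combine step. The code below is an illustrative model only. It is not the circuit from Morris, Zhuo, and Prasanna or from dfesnippets, and the function names and the cycle estimates are assumptions.

```python
import math


def naive_cycles(n: int, latency: int) -> int:
    """Rough cycle count for a single running sum of n values.

    Each of the n - 1 additions depends on the previous result, so each
    one occupies the adder for its full latency.
    """
    return max(n - 1, 0) * latency


def interleaved_sum(values, latency: int):
    """Sum values round-robin into `latency` independent partial sums."""
    partial = [0.0] * latency
    for i, v in enumerate(values):
        # Element i only depends on element i - latency, whose result
        # has already left the pipeline by the time it is needed.
        partial[i % latency] += v
    total = 0.0
    for p in partial:  # final combine of the partial sums
        total += p
    return total, partial


def interleaved_cycles(n: int, latency: int) -> int:
    """Rough cycle estimate: issue n inputs, drain the pipeline, then
    combine the partial sums in a binary tree (an assumed combine scheme)."""
    return n + latency + math.ceil(math.log2(latency)) * latency


values = [float(v) for v in range(1, 101)]
total, partial = interleaved_sum(values, latency=8)
print(total)
print(naive_cycles(len(values), 8), interleaved_cycles(len(values), 8))
```

Because floating-point addition is not associative, the interleaved result can differ slightly from a strictly sequential sum for general inputs; the integer-valued inputs here avoid that effect. The need to buffer and combine the partial sums is one source of the throughput, resource, and latency trade-off mentioned in the quote.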


“…However, Carte flushes inner loop pipelines, which significantly reduces the performance of this application. Finally, the VHDL-based accumulator IP cores described in [13] will not work in pipelined loops since the latency depends upon the number of elements in the input stream. In short, existing accumulation solutions fail for this application.…”

Section: Partial Summation Unit: Postscript (mentioning)

confidence: 99%