13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'05)
DOI: 10.1109/fccm.2005.42

High-Performance FPGA-Based General Reduction Methods

Cited by 16 publications (15 citation statements)
References 1 publication
“…All the research papers consulted in this research [5,[19][20][21]. [3][4][14][15]18,[21][22][23][24][28][29] did not provide direct relationship between the width of fraction and the width of available unsigned registers in determining the upper bound of accuracy of decimal fraction computed within binary floating-point. We establish the direct relationships between y fraction bits and z bit of unsigned integral registers in the statement (8) and (9). On the reverse direction z bit unsigned integral register will be capable of representing decimal-digits fraction accurately without sophisticated algorithm.…”
Section: Results (mentioning)
confidence: 99%
“…The deeply pipelined nature of the arithmetic units on FPGAs, such as those for floating point, make reduction operators non-trivial to implement, and much research has gone into efficient reduction circuits [23][24][25]. Reduction implementations must balance throughput, resource usage and latency.…”
Section: Reductions (mentioning)
confidence: 99%
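
The difficulty described in the statement above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the circuit from the cited paper: `latency` stands for the pipeline depth of a hypothetical floating-point adder, and both function names are made up for illustration. A naive accumulator must wait for each addition to finish before issuing the next one, whereas spreading the inputs over `latency` independent partial sums removes that dependency at the cost of a final reduction step.

```python
def naive_cycles(n, latency):
    """Cycles to sum n values when each add waits for the previous result.

    Assumes a hypothetical adder that takes `latency` cycles per addition
    and a strictly sequential dependency between additions.
    """
    # n - 1 dependent additions, each occupying the full pipeline depth
    return max(n - 1, 0) * latency


def interleaved_sum(values, latency):
    """Sum values using `latency` independent partial accumulators.

    Input i is added to partial sum i % latency, so consecutive inputs
    never depend on an addition that would still be in flight.
    """
    partials = [0.0] * latency
    for i, v in enumerate(values):
        partials[i % latency] += v
    # The partial sums still need a final reduction (done sequentially here)
    total = 0.0
    for p in partials:
        total += p
    return total


if __name__ == "__main__":
    data = [float(x) for x in range(1, 11)]
    print(naive_cycles(len(data), 4))
    print(interleaved_sum(data, 4))
```

Because floating-point addition is not associative, the interleaved order can round differently from a strictly sequential sum on general inputs. The final reduction of the partial sums is also part of the throughput, resource and latency balance that the statement mentions.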
“…However, Carte flushes inner loop pipelines, which significantly reduces the performance of this application. Finally, the VHDL-based accumulator IP cores described in [13] will not work in pipelined loops since the latency depends upon the number of elements in the input stream. In short, existing accumulation solutions fail for this application.…”
Section: Partial Summation Unit: Postscript (mentioning)
confidence: 99%