2000
DOI: 10.1109/12.841125

Accelerating pipelined integer and floating-point accumulations in configurable hardware with delayed addition techniques

Abstract: The speed of arithmetic calculations in configurable hardware is limited by carry propagation, even with the dedicated hardware found in recent FPGAs. This paper proposes and evaluates an approach called delayed addition that reduces the carry-propagation bottleneck and improves the performance of arithmetic calculations. Our approach employs the idea used in Wallace trees to store the results in an intermediate form and delay addition until the end of a repeated calculation such as accumulation or dot-product;…
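The idea summarized in the abstract can be illustrated with a small software model. The sketch below is not the paper's hardware design; it only assumes the standard carry-save (3:2 compressor) scheme used in Wallace trees, where the running total is kept as a separate sum word and carry word, and a single carry-propagating addition is performed at the very end. The function names and the 32-bit default width are hypothetical choices made for this illustration.

```python
def carry_save_step(s, c, x, width=32):
    """One 3:2 compression step: fold x into the (sum, carry) pair.

    No carry ripples across bit positions here; each output bit depends
    only on the three input bits in the same column.
    """
    mask = (1 << width) - 1
    new_s = s ^ c ^ x                            # per-bit sum without carries
    new_c = ((s & c) | (s & x) | (c & x)) << 1   # per-bit majority, shifted up
    return new_s & mask, new_c & mask


def delayed_accumulate(values, width=32):
    """Accumulate integers modulo 2**width, delaying the real addition.

    The only carry-propagating addition is the final s + c.
    """
    mask = (1 << width) - 1
    s, c = 0, 0
    for x in values:
        s, c = carry_save_step(s, c, x & mask, width)
    return (s + c) & mask                        # single "real" addition


if __name__ == "__main__":
    data = [3, 5, 7, 11]
    print(delayed_accumulate(data), sum(data) % 2**32)
```

In hardware, each `carry_save_step` has a delay that is independent of the word width, which is why postponing the final addition can shorten the critical path of a repeated accumulation.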

Citations: cited by 68 publications (38 citation statements)
References: 15 publications
“…Previous attempts to pipeline floating-point accumulation, such as [6], do so only at the expense of assuming associativity and thereby producing non-compliant results. In contrast, the techniques introduced here show how to exploit associativity while obtaining results identical to the sequential ordering specified in the program.…”
Section: Background A. Non-associativity of Floating-point Accumu… (mentioning)
confidence: 99%
“…Thus, power consumption is reduced, but large precision formats require more cycles to complete the operation. The proposal described in Vangal et al [2006] considers delayed addition, as in Luo and Martonosi [2000], for optimizing the delay of large sums. Moreover, authors optimize the SP FP-MAC by adding in 32 base.…”
Section: Related Work (mentioning)
confidence: 99%
“…They can be used to implement complex applications, including floating-point ones [Scrofano et al 2008; Underwood 2004; Woods and VanCourt 2008; Zhuo and Prasanna 2008]. This has sparked the development of floating-point units targeting FPGAs, first for the basic operators mimicking thoses found in microprocessors [Belanovic and Leeser 2002; Louca et al 1996; Shirazi et al 1995], then more recently for operators which are more FPGA-specific, for instance accumulators [Bodnar et al 2006; de Dinechin et al 2008; Luo and Martonosi 2000; Wang et al 2006] or elementary functions [Doss and Riley 2004; Pineiro et al 2004].…”
Section: Introduction 1 Floating Point Acceleration Using FPGAs (mentioning)
confidence: 99%
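The first citation statement above turns on the fact that floating-point addition is not associative, so reordering an accumulation can change the result. The following minimal sketch shows the effect with IEEE 754 double precision in Python. It does not reproduce any technique from the cited papers; the function names and the chosen values are assumptions made for this illustration.

```python
def sum_left_to_right(a, b, c):
    """Sequential ordering, as a program loop would specify: (a + b) + c."""
    return (a + b) + c


def sum_regrouped(a, b, c):
    """Regrouped ordering, as a reassociating pipeline might compute: a + (b + c)."""
    return a + (b + c)


if __name__ == "__main__":
    # Near 1e16 adjacent doubles are 2 apart, so adding 1.0 to -1e16
    # rounds back to -1e16 and the small term is lost.
    a, b, c = 1e16, -1e16, 1.0
    print(sum_left_to_right(a, b, c))  # large terms cancel first, 1.0 survives
    print(sum_regrouped(a, b, c))      # 1.0 is absorbed before the cancellation
```

The two orderings differ only in grouping, yet they return different values, which is why an accumulator that assumes associativity may not match the sequential result.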