Multispeculative Addition Applied to Datapath Synthesis

Barrio, Alberto A. Del; Hermida, R.; Memik, Seda Öǧrenci; Mendias, J.M.; Molina, M.C.

doi:10.1109/tcad.2012.2208966

Cited by 27 publications

(24 citation statements)

References 20 publications

“…Besides, every module is usually implemented as a KS, the fastest but also the largest type of design. Nevertheless, as noted in Del Barrio et al [2012], KS area behaves linear when its width is small. Hence, a Multispeculative Adder (MSADD) composed of several small KS modules working in parallel has a linear area and a reduced execution time.…”

Section: Our Proposed Design (mentioning)

confidence: 72%

“…The idea of splitting the addition can also be found in integer arithmetic [Nowick 1996; Lu 2003; Verma et al 2008; Del Barrio et al 2012]. An n-bit adder is divided into several k-bit fragments operating in parallel, k < n, because if k is large enough the carry-in of a chunk will be quasi-independent from the carry-out of the previous fragment.…”

Section: Related Work (mentioning)

confidence: 99%
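
The quasi-independence argument in the statement above can be checked numerically. The following is a minimal sketch, not taken from the cited papers: it assumes uniformly distributed operands, and the helper names `carry_out` and `dependence_fraction` are hypothetical. It enumerates all k-bit operand pairs and counts how often the carry-out of a fragment changes when its carry-in changes. This happens only when every bit position propagates the carry, so the fraction is 2^-k and shrinks quickly as k grows.

```python
from itertools import product


def carry_out(a: int, b: int, cin: int, k: int) -> int:
    """Carry-out of the k-bit addition a + b + cin."""
    return (a + b + cin) >> k


def dependence_fraction(k: int) -> float:
    """Fraction of k-bit operand pairs whose carry-out depends on the carry-in."""
    dependent = sum(
        carry_out(a, b, 0, k) != carry_out(a, b, 1, k)
        for a, b in product(range(1 << k), repeat=2)
    )
    return dependent / (1 << (2 * k))


if __name__ == "__main__":
    # Exhaustive enumeration, so keep k small.
    for k in (1, 2, 4, 6):
        print(k, dependence_fraction(k))
```

Real workloads are not uniformly distributed, so this only illustrates why larger fragments make a fixed carry guess more likely to be right.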

“…Secondly, in order to reduce the power consumption as much as possible, the replication of many k-bit modules, as proposed in [Lu 2003; Verma et al 2008], is not suitable because of their area and power overhead. On the contrary, the work presented in Del Barrio et al [2012] proposes to divide the n-bit adder into exactly n/k k-bit fragments working in parallel.…”

Section: Related Work (mentioning)

confidence: 99%

“…Del Barrio et al [2012] there are several approaches to predict these carries, but the important fact is that if there is a failure in the prediction, a single cycle will be enough for correcting simultaneously all the mispredictions. Nevertheless, the possibility of propagating these mispredictions through the most significant modules, although extremely low, still exists.…”

Section: Our Proposed Design (mentioning)

confidence: 99%

“…It is true that with the previous four cases it is possible to correct and/or complement every k-bit result in parallel. However, as stated in Del Barrio et al [2012], a misprediction can be propagated to the most significant module. Although most mispredictions can be corrected in a single step, there still exist cases where more steps are necessary.…”

Section: Lemma 1 Let S C Be Two N-bit Vectors Hence C1(s (mentioning)

confidence: 99%
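
The behaviour described in the statements above (n/k fragments added in parallel with predicted carry-ins, a correction step on misprediction, and occasional propagation that needs further steps) can be modelled in software. This is a hedged sketch, not the authors' hardware: the function name `multispeculative_add` is hypothetical, the default predictor that guesses every carry-in as 0 is an assumption, and each loop iteration stands in for one correction cycle.

```python
def multispeculative_add(a, b, n, k, predictions=None):
    """Add two n-bit integers with n/k k-bit fragments and predicted carry-ins.

    Returns (sum modulo 2**n, number of correction steps).
    """
    assert n % k == 0, "n must be a multiple of k"
    m = n // k
    mask = (1 << k) - 1
    # Assumed static predictor: guess carry-in 0 for every fragment.
    carry_in = list(predictions) if predictions is not None else [0] * m
    carry_in[0] = 0  # the least significant fragment has no incoming carry
    steps = 0
    while True:
        # All fragments are evaluated "in parallel" with the current carry-ins.
        partial = [
            ((a >> (i * k)) & mask) + ((b >> (i * k)) & mask) + carry_in[i]
            for i in range(m)
        ]
        carry_out = [p >> k for p in partial]
        # A fragment mispredicted if its carry-in differs from the real
        # carry-out of the fragment below it.
        wrong = [i for i in range(1, m) if carry_in[i] != carry_out[i - 1]]
        if not wrong:
            break
        for i in wrong:  # all detected mispredictions are fixed in one step
            carry_in[i] = carry_out[i - 1]
        steps += 1
    result = sum((p & mask) << (i * k) for i, p in enumerate(partial))
    return result, steps


if __name__ == "__main__":
    print(multispeculative_add(0x0001, 0x0002, 16, 4))  # no misprediction
    print(multispeculative_add(0x000F, 0x0001, 16, 4))  # one correction step
    print(multispeculative_add(0x00FF, 0x0001, 16, 4))  # correction propagates
```

In the third call the corrected fragment is all ones, so fixing its carry-in changes its own carry-out and the next fragment becomes wrong in turn. That is the multi-step case the statements mention.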

Ultra-low-power adder stage design for exascale floating point units

Barrio

Bagherzadeh

Hermida

2014

ACM Trans. Embed. Comput. Syst.

Self Cite

Currently, the most powerful supercomputers can provide tens of petaflops. Future many-core systems are estimated to provide an exaflop. However, the power budget limitation still makes these machines unfeasible and unaffordable. Floating Point Units (FPUs) are critical from both the power consumption and performance points of view of today's microprocessors and supercomputers. The literature offers very different designs. Some of them focus on increasing performance no matter the penalty, and others on decreasing power at the expense of lower performance. In this article, we propose a novel approach for reducing the power of the FPU without degrading the remaining parameters. Concretely, this power reduction is also accompanied by an area reduction and a performance improvement. Hence, an overall energy gain will be produced. According to our experiments, our proposed unit consumes 17.5%, 23% and 16.5% less energy for single, double and quadruple precision, with an additional 15%, 21.5% and 14.5% delay reduction, respectively. Furthermore, area is also diminished by 4%, 4.5% and 5%.

Improving circuit performance with multispeculative additive trees in high-level synthesis

Barrio

Hermida

Memik

et al. 2014

Microelectronics Journal

Self Cite

The recent introduction of Variable Latency Functional Units (VLFUs) has broadened the design space of High-Level Synthesis (HLS). Nevertheless, their use is restricted to only a few operators in the datapaths because the number of cases to control grows exponentially. In this work an instance of VLFUs is described, and based on its structure, the average latency of tree structures is improved. Multispeculative Functional Units (MSFUs) are arithmetic Functional Units that operate using several predictors for the carry signal. In spite of utilizing more than one predictor, none or only one additional very short cycle is enough for producing the correct result in the majority of the cases. In this paper our proposal takes advantage of multispeculation in order to increase the performance of tree structures with a negligible area penalty. By judiciously introducing these structures into computation trees, it will only be necessary to predict the carry signals in certain selected nodes, thus minimizing the total number of predictions and the number of operations that can potentially mispredict. Hence, the average latency will be diminished and thus performance will be increased. Our experiments show that it is possible to improve execution time by 26%. Furthermore, our flow outperforms previous approaches with Speculative FUs.
