Abstract-We present the details of our energy-efficient asynchronous floating-point multiplier (FPM). We discuss the design trade-offs of various multiplier implementations. A higher-radix array multiplier design with an operand-dependent carry-propagation adder and a low-handshake-overhead pipeline design is presented, which yields significant energy savings while preserving average throughput. Our FPM also includes hardware handling of the denormal and underflow cases. When compared against a custom synchronous FPM design, our asynchronous FPM consumes 3X less energy per operation while operating at 2.3X higher throughput. To our knowledge, this is the first detailed design of a high-performance asynchronous IEEE-754 compliant double-precision floating-point multiplier.

Keywords-Floating-point arithmetic; asynchronous logic circuits; very-large-scale integration; pipeline processing

I. INTRODUCTION

Energy-efficient floating-point computation is important for a wide range of applications. Traditionally, VLSI designers relied primarily on CMOS technology and voltage scaling to reduce power consumption [4]. With the transistor threshold voltage fixed [10], VDD has been scaling very slowly, if at all, which means that performance improvements now come at the cost of increased energy consumption. Furthermore, process variations in the deep sub-micron range have made devices far less robust, making it increasingly difficult for synchronous designers to overcome the problems associated with clock skew and clock distribution [6]. The findings of a recent in-depth study on scaling supercomputer petaFLOP performance by a further 1000X indicate that current design practices and technologies are inadequate to achieve the desired throughput within a sustainable power budget [1]. This underscores a pressing need for alternative design practices that reduce the energy consumption of floating-point computation while preserving robust behavior in advanced technology nodes.

At the other end of the spectrum, embedded systems that have traditionally been considered low performance are demanding ever higher throughput within the same power budget to support compute-intensive floating-point applications that improve the user experience. Since these applications are deployed on portable devices with limited battery life, it is critical that we develop energy-efficient floating-point hardware for these embedded systems, not simply high-performance floating-point hardware.

The IEEE 754 standard [19] for binary floating-point arithmetic provides a precise specification of floating-point number formats, computation operations, and exceptions and their handling. The combination of a vast range of inputs, special cases, and rounding modes makes the hardware implementation of fully IEEE 754 compliant floating-point arithmetic a very challenging task. Ignoring certain aspects of the standard can lead to unexpected consequences in the context of numerical algorithms. Hence, most floating-point hardware is IEEE-comp...
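To make the format and the special cases concrete, the following is a minimal software sketch (not from the paper, and purely illustrative) of how a double-precision operand decomposes into the IEEE-754 fields of 1 sign bit, 11 exponent bits, and 52 fraction bits, and how the zero, denormal, infinity, and NaN cases that a compliant FPM must handle are distinguished. The names fp64_fields, decompose, and classify are hypothetical helpers introduced here only for illustration.

```c
/* Illustrative sketch: IEEE-754 double-precision field extraction and
 * classification of the special cases a compliant FPM must handle. */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

typedef struct {
    uint32_t sign;      /* 1 bit                                   */
    uint32_t exponent;  /* 11 bits, biased by 1023                 */
    uint64_t fraction;  /* 52 bits, without the implicit leading 1 */
} fp64_fields;

static fp64_fields decompose(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);          /* reinterpret the bit pattern */
    fp64_fields f;
    f.sign     = (uint32_t)(bits >> 63);
    f.exponent = (uint32_t)((bits >> 52) & 0x7FF);
    f.fraction = bits & 0xFFFFFFFFFFFFFULL;  /* low 52 bits */
    return f;
}

static const char *classify(fp64_fields f) {
    if (f.exponent == 0)     return f.fraction ? "denormal" : "zero";
    if (f.exponent == 0x7FF) return f.fraction ? "NaN"      : "infinity";
    return "normal";
}

int main(void) {
    /* 5e-324 is the smallest positive denormal in double precision. */
    double samples[] = { 1.5, 0.0, 5e-324 };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        fp64_fields f = decompose(samples[i]);
        printf("%-12g sign=%u exp=%4u frac=0x%013llx  %s\n",
               samples[i], f.sign, f.exponent,
               (unsigned long long)f.fraction, classify(f));
    }
    return 0;
}
```

In hardware, each of these classifications corresponds to a distinct datapath case; the paper's FPM handles the denormal and underflow cases directly in hardware rather than trapping to software.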