RoCoCo: Row and Column Compression for high-performance multiplication on FPGAs

Ugurdag, Fatih; Keskin, Okan; Tunc, Cihan; Temizkan, Fatih; Fici, Gurbey; Dedeoglu, Soner

doi:10.1109/ewdts.2011.6116419

Cited by 5 publications

(5 citation statements)

References 9 publications

“…8, and the related figures to follow. Now, consider f(13), which is given by (5). Its select vector is 0001 because only dLUT0's output is summed with TIV_new.…”

Section: Fr-dlut Microarchitecture (mentioning)

confidence: 99%

“…In our work, the summation is done by SignedSummation shown in Fig. 11, which is based on the CCT generator proposed in [13] (called RoCoCo). RoCoCo handles only the summation of unsigned numbers, which is why the following conversion had to be done.…”

Section: Fr-dlut Microarchitecture (mentioning)

confidence: 99%

“…Equation (12) does a little optimization by adding the rightmost two numbers in (10) offline and hence combining them into a single number (1011111111110ēxxx). Equation (13), on the other hand, eliminates bit position 16 of that combined number as the sum is guaranteed to be 16 bits (bit positions 0 through 15).…”

Section: Fr-dlut Microarchitecture (mentioning)

confidence: 99%

“…11 expose bits of signals and recombine them. The bit-level manipulation before the CCT is a pictorial representation of equations (9) through (13). Note that the number of zeros to the left of ē does not have to be one as in (13) in the general case.…”

Section: Fr-dlut Microarchitecture (mentioning)

confidence: 99%

“…The bit-level manipulation before the CCT is a pictorial representation of equations (9) through (13). Note that the number of zeros to the left of ē does not have to be one as in (13) in the general case. It is lg∆ − 1 zeros (assuming ∆ is a power of 2) as in Fig.…”

Section: Fr-dlut Microarchitecture (mentioning)

confidence: 99%

Semi- and Fully-Random Access LUTs for Smooth Functions

Gener, Aydin, Gören, et al. 2020

IFIP Advances in Information and Communication Technology

Look-Up Table (LUT) implementation of complicated functions often offers lower latency than algebraic implementations, at the expense of a significant area penalty. If the function is smooth, the Multi-Partite table method (MP) can circumvent the area problem by breaking up the implementation into multiple smaller LUTs. However, even some of these smaller LUTs may be big in high-accuracy MP implementations. Lossless LUT compression can be applied to these LUTs to further improve area and, in some cases, even timing. The state of the art in the literature decomposes the Table of Initial Values (TIV) of MP into a table of pivots and tables of differences from the pivots. Our technique instead places differences of consecutive elements in the difference tables, which results in a smaller range of differences that fit in fewer bits. Constraining the difference of consecutive input values, hence semi-random access, allows us to further optimize designs. We also propose variants of our techniques with variable-length coding. We implemented Verilog generators of MP for sine and exponential using a conventional LUT as well as different versions of the state of the art and our technique. We synthesized the generated designs on FPGA and found that our techniques produce up to 29% improvement in area, 11% improvement in timing, and 26% improvement in area-time product over the state of the art.
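The difference between the two decompositions described in the abstract can be illustrated with a small software sketch. This is not the authors' Verilog generator or their microarchitecture: the table contents (a 10-bit quarter-wave sine with 64 entries), the block size of 8, and all function names are assumptions chosen for illustration only. It compares the bit width needed for differences from a per-block pivot against the bit width needed for differences between consecutive entries, and checks that the consecutive-difference form is lossless.

```python
import math


def bits_for(values):
    """Bits needed to hold the largest of a list of non-negative integers."""
    return max(1, max(values).bit_length())


def pivot_differences(table, block):
    """Difference of each entry from the first entry (pivot) of its block."""
    return [v - table[(i // block) * block] for i, v in enumerate(table)]


def consecutive_differences(table, block):
    """Difference of each entry from its predecessor; 0 at each block start."""
    return [0 if i % block == 0 else v - table[i - 1]
            for i, v in enumerate(table)]


def reconstruct(pivots, diffs, block):
    """Rebuild the table from pivots and consecutive differences."""
    out = []
    for i, d in enumerate(diffs):
        out.append(pivots[i // block] if i % block == 0 else out[-1] + d)
    return out


# Hypothetical table: monotone, so every difference is non-negative.
table = [round(1023 * math.sin(math.pi / 2 * i / 64)) for i in range(64)]
block = 8  # assumed number of entries sharing one pivot

pivot_bits = bits_for(pivot_differences(table, block))
consec_diffs = consecutive_differences(table, block)
consec_bits = bits_for(consec_diffs)
rebuilt = reconstruct(table[::block], consec_diffs, block)

print(f"pivot-difference width: {pivot_bits} bits")
print(f"consecutive-difference width: {consec_bits} bits")
```

Note that the narrower differences come at a cost the sketch does not model: recovering an entry requires summing several differences, which in hardware calls for a multi-operand adder.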
