An FPGA Logic Cell and Carry Chain Configurable as a 6:2 or 7:2 Compressor

Parandeh-Afshar, Hadi; Brisk, Philip; Ienne, Paolo

doi:10.1145/1575774.1575778

Cited by 3 publications

(5 citation statements)

References 47 publications

“…To the best of our knowledge, the (4:2) compressor (see Figure 3a) is the only FPGA-friendly [11] design that targets Xilinx FPGAs, while no efficient compressors exist for Intel devices. Parandeh-Afshar et al. [19] addressed this issue by proposing configurable carry-chains as modifications to the Intel Adaptive Logic Module (ALM), supporting 6:2 and/or 7:2 compressors.…”

Section: Compressors (mentioning)

confidence: 99%

“…Stage 0 in Figure 1b is a compressor tree that produces sum and carry bits as inputs into Stage 1, which are then evaluated by an RCA to produce the final result (see HA→FA→HA row in Figure 4b, which is the RCA stage). Compressor trees can be built using GPCs, compressors, or both, and efficient compressor tree design is an active area of research with large bodies of existing literature [11,12,18–22,29].…”

Section: Adder and Compressor Trees (mentioning)

confidence: 99%
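
The two-stage structure described in the statement above (a compressor tree that reduces many operands to a sum row and a carry row, followed by a final carry-propagate addition) can be illustrated with a minimal software sketch. This is not the cited paper's hardware design: it uses plain 3:2 compressors (full adders applied bitwise) instead of GPCs or 6:2/7:2 compressors, assumes non-negative integer operands, and the function names `compress_3_2` and `compressor_tree_sum` are hypothetical.

```python
def compress_3_2(a, b, c):
    """Bitwise 3:2 compressor: three rows in, a sum row and a carry row out.

    The invariant a + b + c == s + carry holds for non-negative ints.
    """
    s = a ^ b ^ c                                 # per-bit sum, no carry propagation
    carry = ((a & b) | (a & c) | (b & c)) << 1    # per-bit majority, shifted one column left
    return s, carry


def compressor_tree_sum(operands):
    """Stage 0: reduce the rows to at most two; Stage 1: one final addition."""
    rows = list(operands)
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3          # rows that fit into complete groups of three
        next_rows = []
        for i in range(0, full, 3):
            s, carry = compress_3_2(rows[i], rows[i + 1], rows[i + 2])
            next_rows.extend([s, carry])
        next_rows.extend(rows[full:])             # leftover rows pass through to the next level
        rows = next_rows
    # The built-in addition stands in for the ripple-carry adder (RCA) stage.
    return sum(rows)
```

Each level replaces every group of three rows with two, so carries are only propagated once, in the final addition. For example, `compressor_tree_sum([3, 5, 7, 9, 11])` gives the same result as `sum([3, 5, 7, 9, 11])`.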

“…The reader is encouraged to read [19] for a more detailed background on parallel counters, GPCs, compressors, and different methods of compressor tree implementations.…”

Section: Adder and Compressor Trees (mentioning)

confidence: 99%

“…[11,12]. In [19], the authors proposed architectural changes to the Intel ALM carry-chains such that large compressors like (6:2) and (7:2) can be efficiently mapped to single ALMs. Although their proposed compressor is very efficient, for modern applications such as BNN popcounting [13], these compressors would be significantly underutilized.…”

Section: Related Work (mentioning)

confidence: 99%

“…[12] demonstrated software techniques that automate the design of optimal compressor tree implementations for FPGAs. However, modern FPGA lookup table (LUT) based architectures are not particularly efficient for implementation of compressor trees [19].…”

Section: Introduction (mentioning)

confidence: 99%

Luxor

Rasoulinezhad, Siddhartha, Zhou, et al. 2020

Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

We propose two tiers of modifications to FPGA logic cell architecture to deliver a variety of performance and utilization benefits with only minor area overheads. In the first tier, we augment existing commercial logic cell datapaths with a 6-input XOR gate in order to improve the expressiveness of each element, while maintaining backward compatibility. This new architecture is vendor-agnostic, and we refer to it as LUXOR. We also consider a secondary tier of vendor-specific modifications to both Xilinx and Intel FPGAs, which we refer to as X-LUXOR+ and I-LUXOR+ respectively. We demonstrate that compressor tree synthesis using generalized parallel counters (GPCs) is further improved with the proposed modifications. Using both the Intel adaptive logic module and the Xilinx slice at the 65nm technology node for a comparative study, it is shown that the silicon area overhead is less than 0.5% for LUXOR and 5-6% for LUXOR+, while the delay increments are 1-6% and 3-9% respectively. We demonstrate that LUXOR can deliver an average reduction of 13-19% in logic utilization on micro-benchmarks from a variety of domains. BNN benchmarks benefit the most with an average reduction of 37-47% in logic utilization, which is due to the highly-efficient mapping of the XnorPopcount operation on our proposed LUXOR+ logic cells.
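
The abstract names the XnorPopcount operation without defining it. As commonly used in binarized neural networks, it XNORs two bit vectors and counts the resulting ones; the popcount step is itself a multi-operand addition of single bits, which is why it maps onto compressor trees. A minimal sketch under that assumed definition follows; the function name `xnor_popcount` and the explicit `width` parameter are hypothetical and not taken from the paper.

```python
def xnor_popcount(a, b, width):
    """Count the bit positions (out of `width`) where a and b agree.

    a and b are non-negative ints holding `width`-bit vectors.
    """
    mask = (1 << width) - 1          # keep only the low `width` bits
    agree = ~(a ^ b) & mask          # XNOR: 1 wherever the two bits match
    return bin(agree).count("1")     # popcount: add up the 1 bits
```

For instance, `xnor_popcount(0b1010, 0b1001, 4)` counts the two high-order positions where the vectors match.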
