2018 28th International Conference on Field Programmable Logic and Applications (FPL)
DOI: 10.1109/fpl.2018.00014
Embracing Diversity: Enhanced DSP Blocks for Low-Precision Deep Learning on FPGAs

Cited by 57 publications (32 citation statements)
References 12 publications
“…With the proposed column-wise MVM, one column of the weight matrix naturally shares the same element of the input vector, which helps us pack four 8-bit or ten 2-bit multiplications into one DSP block on Intel FPGAs [47] to reduce hardware resources. Moreover, this would not be a restriction (and would come at lower cost) with a novel DSP similar to what was proposed in [48], which will be adopted in the next-generation Agilex devices [49].…”
Section: Low-Precision Multiplications with DSP Block Sharing
Confidence: 99%
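The shared-operand packing this excerpt relies on can be illustrated with a small software model. Below is a minimal sketch, not taken from the cited works: it packs two unsigned 8-bit weights that share one activation into a single wide multiplication, the same trick a DSP block's wide multiplier enables. Sign handling and the four-way/ten-way packings are omitted, and `packed_mul` and `SHIFT` are illustrative names.

```python
# Minimal sketch: two unsigned 8-bit products sharing one operand,
# recovered from a single wide multiplication (the DSP packing trick).

SHIFT = 18  # field offset; each 16-bit product cannot spill into the next field

def packed_mul(w0: int, w1: int, x: int) -> tuple[int, int]:
    """Return (w0*x, w1*x) computed with one multiplication."""
    assert 0 <= w0 < 256 and 0 <= w1 < 256 and 0 <= x < 256
    packed = (w1 << SHIFT) | w0      # both weights placed in one wide word
    product = packed * x             # the single wide multiplication
    return product & 0xFFFF, (product >> SHIFT) & 0xFFFF

# Sanity check against the two independent multiplications.
for w0, w1, x in [(3, 200, 7), (255, 255, 255), (0, 1, 128)]:
    assert packed_mul(w0, w1, x) == (w0 * x, w1 * x)
```

The field offset must exceed the product width so the low product never carries into the high field; signed operands would additionally require a correction term on the upper field.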
“…When using lower precisions on FPGAs, many authors have implemented multipliers in LUTs instead of DSPs to achieve higher resource efficiency. Boutros et al. [15] proposed enhancing DSP blocks to support low-precision MACs with roughly 12% area overhead and no drop in achievable frequency. One such enhanced DSP can perform one 27 × 27, two 18 × 19, four 9 × 9, or eight 4 × 4 parallel MACs.…”
Section: Fixed-Point Representation
Confidence: 99%
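The packing ratios quoted in this excerpt translate directly into resource estimates. A back-of-the-envelope helper, assuming only the per-block MAC counts stated above (the names are illustrative, not from the paper):

```python
# MACs per enhanced DSP block at each supported precision, as quoted above.
MACS_PER_BLOCK = {27: 1, 18: 2, 9: 4, 4: 8}

def dsp_blocks_needed(macs_per_cycle: int, precision_bits: int) -> int:
    """Ceiling division: blocks required to sustain a given MAC throughput."""
    per_block = MACS_PER_BLOCK[precision_bits]
    return -(-macs_per_cycle // per_block)

# A layer needing 512 MACs/cycle: 512 blocks at 27-bit, but only 64 at 4-bit.
assert dsp_blocks_needed(512, 27) == 512
assert dsp_blocks_needed(512, 4) == 64
```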
“…3) Many overflow cases after adding the error-reduction term. SIMD Accurate/Approximate Multiplier: the authors of [6, 19] have shown performance and energy improvements in FPGA-based DNNs by modifying the ASIC-based DSP block to perform two approximate multiplications with a common operand. Recently, [23] proposed an approximate SIMD design (using 8 × 8 truncated multipliers) for ASIC platforms.…”
Section: Related Work
Confidence: 99%
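For intuition on the truncated multipliers mentioned here, a software model is sketched below. It is a hedged illustration, not the design from [23]: it sums only the partial-product bits at column weight k or above, the standard truncation scheme that trades a bounded error for area savings; `truncated_mul` is a hypothetical name.

```python
def truncated_mul(a: int, b: int, k: int = 4, width: int = 8) -> int:
    """Approximate a*b by discarding partial-product columns below weight k."""
    assert 0 <= a < (1 << width) and 0 <= b < (1 << width)
    total = 0
    for i in range(width):                 # bit i of a
        for j in range(width):             # bit j of b
            if i + j >= k and (a >> i) & 1 and (b >> j) & 1:
                total += 1 << (i + j)      # keep only high-weight columns
    return total

# Worst-case error is the total weight of the dropped bits: (k-1)*2**k + 1.
print(truncated_mul(200, 55), 200 * 55)    # 10992 vs. the exact 11000
```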
“…Nevertheless, despite their advantages, off-the-shelf fixed-precision DSP blocks fall short of fulfilling design requirements in a variety of domains. Besides being unable to perform division, several shortcomings testify to their inefficiency: 1) their fixed locations in FPGAs impose routing complexity and often degrade the performance of some circuits [17] (and of the Viterbi decoder and the Reed-Solomon and JPEG encoders discussed in [30]); 2) they cannot be efficiently utilized for multiplication precisions below 18 bits [6, 19] (the comparable performance and better energy efficiency of small-scale LUT-based multipliers over DSP blocks further encourage their deployment in, e.g., neural networks); 3) their ratio to LUTs is limited (< 0.001), which becomes a bottleneck in multiplication-intensive applications or concurrently executing programs.…”
Section: Introduction
Confidence: 99%