2020
DOI: 10.1088/2632-2153/aba042

Compressing deep neural networks on FPGAs to binary and ternary precision with hls4ml

Abstract: We present the implementation of binary and ternary neural networks in the hls4ml library, designed to automatically convert deep neural network models to digital circuits with FPGA firmware. Starting from benchmark models trained with floating point precision, we investigate different strategies to reduce the network's resource consumption by reducing the numerical precision of the network parameters to binary or ternary. We discuss the trade-off between model accuracy and resource consumption. In addition, w…
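For context, the sketch below shows how a trained Keras model is typically handed to hls4ml for conversion to an HLS project. It is illustrative only: the functions shown (config_from_keras_model, convert_from_keras_model) exist in recent hls4ml releases, but the options, the target FPGA part, and the model architecture are assumptions, not details taken from the paper.

```python
# Hedged sketch of a typical hls4ml conversion flow (not the paper's exact recipe).
import hls4ml
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dense

# A small fully connected benchmark-style model (architecture is illustrative).
model = Sequential([
    Dense(64, input_shape=(16,)),
    Activation('relu'),
    Dense(5),
    Activation('softmax'),
])

# Derive a per-layer precision/configuration dictionary from the Keras model.
config = hls4ml.utils.config_from_keras_model(model, granularity='name')

# Convert to an HLS project; the output directory and FPGA part are assumptions.
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='hls4ml_prj',
    part='xcku115-flvb2104-2-i',
)
hls_model.compile()  # builds a bit-accurate C-simulation library for inference checks
```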

Cited by 71 publications (49 citation statements)
References 15 publications

“…This allows compression of the model size, but to some extent sacrifices accuracy. Recently, support for quantization-aware-trained binary and ternary precision DNNs [43] has been included in the library. This greatly reduces the model size, but requiring such an extremely low precision for each parameter type sacrifices accuracy and generalization.…”
Section: Motivation
confidence: 99%
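As a hedged illustration of the ternary quantization the excerpt refers to, the sketch below maps floating-point weights to {-1, 0, +1} with a simple magnitude threshold. The function name, the threshold heuristic, and the scale choice are assumptions for illustration; they are not the hls4ml or quantization-aware-training implementation of [43].

```python
import numpy as np

def ternarize(weights, rel_threshold=0.7):
    """Map float weights to {-1, 0, +1} using a magnitude threshold.

    Illustrative only: the 0.7 * mean(|w|) threshold is a common heuristic,
    not the scheme used by hls4ml or the training recipe in [43].
    """
    delta = rel_threshold * np.mean(np.abs(weights))
    q = np.zeros_like(weights, dtype=np.int8)
    q[weights > delta] = 1
    q[weights < -delta] = -1
    return q

w = np.random.randn(4, 4).astype(np.float32)   # 32-bit weights: 4 bytes each
print(ternarize(w))                            # ternary weights: 2 bits each
```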
“…For example, when using a quantizer with a given alpha parameter (i.e., scaled weights), hls4ml inserts an operation to re-scale the layer output. For binary and ternary weights and activations, the same strategies as in [43] are used. With binary layers, the arithmetical value of -1 is encoded as 0, allowing the product to be expressed as an XNOR operation.…”
Section: Ultra Low-Latency Quantized Model on FPGA Hardware
confidence: 99%
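A minimal Python sketch of the encoding described in the excerpt above: with the arithmetic value -1 encoded as bit 0 and +1 as bit 1, the product of two binary values becomes an XNOR of their bits, and a dot product reduces to counting bit matches. The function names are illustrative; the actual hls4ml firmware realizes this in HLS C++ on packed bit vectors.

```python
def encode(value):
    # Encode the arithmetic value -1 as bit 0 and +1 as bit 1.
    return 1 if value == 1 else 0

def binary_product(a, b):
    # For a, b in {-1, +1}: a * b = +1 exactly when the encoded bits match,
    # so the product is the XNOR of the two bits, decoded back to +/-1.
    bit = 1 - (encode(a) ^ encode(b))  # XNOR
    return 1 if bit else -1

def binary_dot(xs, ws):
    # A dot product of binary vectors reduces to counting XNOR matches:
    # sum = (#matches) - (#mismatches) = 2 * matches - length.
    matches = sum(1 - (encode(x) ^ encode(w)) for x, w in zip(xs, ws))
    return 2 * matches - len(xs)

assert binary_product(-1, -1) == 1 and binary_product(-1, 1) == -1
print(binary_dot([1, -1, 1], [1, 1, -1]))  # -> -1
```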
“…Development of ML models deployable to FPGA-based L1T systems is helped by tools for automatic network-to-circuit conversion such as hls4ml. Using hls4ml, several solutions for HEP-specific tasks (e.g., jet tagging) have been provided (Duarte et al., 2018; Coelho et al., 2020; Di Guglielmo et al., 2020; Summers et al., 2020), exploiting models with simpler architectures than the one shown here. This tool has been applied extensively for tasks in the HL-LHC upgrade of the CMS L1T system, including an autoencoder for anomaly detection, and DNNs for muon energy regression and identification, tau lepton identification, and vector boson fusion event classification (CMS Collaboration, 2020).…”
Section: Related Work
confidence: 99%
“…On the other hand, the quantized model uses more LUTs, mainly for the multiplications in the GARNET encoders and decoders, as discussed in Section 4. However, it is known that the expected LUT usage tends to be overestimated in Vivado HLS, while the expected DSP usage tends to be accurate (Duarte et al., 2018; Di Guglielmo et al., 2020). The DSP usage of 3.1 × 10³ for the continuous model is well within the limit of the target device, but is more than what is available on a single die slice (2.8 × 10³) (Xilinx, 2020).…”
Section: Model Synthesis and Performance
confidence: 99%