Toolflows for Mapping Convolutional Neural Networks on FPGAs

Venieris, Stylianos I.; Kouris, Alexandros; Bouganis, Christos-Savvas

doi:10.1145/3186332

Cited by 153 publications

(74 citation statements)

References 87 publications


“…Moreover, [53] and [54] presented automated frameworks specifically tailored for FPGA-based binarised and spiking neural networks respectively, while [55] proposed a library for the mapping of ConvNets on diverse embedded platforms, together with a comparative study of their design spaces. Finally, [12] provides a detailed survey of ConvNet-to-FPGA toolflows.…”

Section: Related Work (mentioning)

confidence: 99%


fpgaConvNet: Mapping Regular and Irregular Convolutional Neural Networks on FPGAs

Venieris

Bouganis

2019

IEEE Trans. Neural Netw. Learning Syst.

Self Cite

121


Since the renaissance of neural networks, convolutional neural networks (ConvNets) have demonstrated state-of-the-art performance in several emerging artificial intelligence tasks. The deployment of ConvNets in real-life applications requires power-efficient designs that meet the application-level performance needs. In this context, field-programmable gate arrays (FPGAs) can provide a potential platform that can be tailored to application-specific requirements. However, with the complexity of ConvNet models increasing rapidly, the ConvNet-to-FPGA design space becomes prohibitively large. This paper presents fpgaConvNet, an end-to-end framework for the optimized mapping of ConvNets on FPGAs. The proposed framework comprises an automated design methodology based on the synchronous dataflow (SDF) paradigm and defines a set of SDF transformations in order to efficiently navigate the architectural design space. By proposing a systematic multiobjective optimization formulation, the presented framework is able to generate hardware designs that are co-optimized for the ConvNet workload, the target device, and the application's performance metric of interest. Quantitative evaluation shows that the proposed methodology yields hardware designs that improve the performance by up to 6.65× over highly optimized graphics processing unit designs for the same power constraints and achieve up to 2.94× higher performance density compared with the state-of-the-art FPGA-based ConvNet architectures.


Section: Related Work (mentioning)

confidence: 99%

“…Nevertheless, several issues increase the complexity of ConvNet system development on FPGAs [12]. With FPGAs' size and resource specifications advancing at a fast pace and with ConvNets becoming more complex, the possible mappings of a ConvNet on an FPGA lie on a large multidimensional design space that cannot be explored manually.…”

Section: Introduction (mentioning)

confidence: 99%

fpgaConvNet: Mapping Regular and Irregular Convolutional Neural Networks on FPGAs

Venieris

Bouganis

2019

IEEE Trans. Neural Netw. Learning Syst.

Self Cite

121


“…while the number of elements included in the unrolled sliding window for all channels of the input feature map volume is $P = K_H K_W N_{IN}$ (7). In the case of FC layers, batching is employed to form a similar $R_{FC} \times P$ matrix, each row of which contains the input feature vector of size $P$ for a different input sample, and hence $R_{FC} = \text{BatchSize}$ (8).…”

Section: Architecture (mentioning)

confidence: 99%
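
The two relations quoted above can be illustrated with a minimal sketch. The function names and the example layer sizes (a 3×3 kernel over 64 input channels, a batch of 8) are assumptions chosen for illustration and do not come from the cited paper.

```python
from typing import List, Sequence


def unrolled_window_size(k_h: int, k_w: int, n_in: int) -> int:
    """Eq. (7): P = K_H * K_W * N_IN, the number of elements in one
    unrolled sliding window across all input channels."""
    return k_h * k_w * n_in


def fc_batch_matrix(samples: Sequence[Sequence[float]]) -> List[List[float]]:
    """Eq. (8): stack one feature vector per input sample so that the
    matrix has R_FC = BatchSize rows and P columns."""
    p = len(samples[0])
    if any(len(s) != p for s in samples):
        raise ValueError("all feature vectors must have the same length P")
    return [list(s) for s in samples]


# Hypothetical layer: 3x3 kernel, 64 input channels.
p = unrolled_window_size(3, 3, 64)

# Hypothetical batch of 8 samples, each a feature vector of length P.
matrix = fc_batch_matrix([[0.0] * p for _ in range(8)])
r_fc = len(matrix)
print(p, r_fc, len(matrix[0]))
```

Framing FC layers this way gives them the same matrix shape as unrolled convolutions, which is what lets a single matrix-multiplication engine serve both layer types.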

“…In this context, FPGAs constitute a promising platform for CNN inference due to their customisability which enables the use of optimised low-precision arithmetic units to achieve performance gains [7]. Existing FPGA-based accelerators have produced hardware designs that span from uniform 16-bit precision [8] [9] with minimal effect on accuracy, down to very high-performance binarised networks [10], but at a significant accuracy loss.…”

Section: Introduction (mentioning)

confidence: 99%

CascadeCNN: Pushing the Performance Limits of Quantisation in Convolutional Neural Networks

Kouris

Venieris

Bouganis

2018

2018 28th International Conference on Field Programmable Logic and Applications (FPL)

Self Cite


This work presents CascadeCNN, an automated toolflow that pushes the quantisation limits of any given CNN model, aiming to perform high-throughput inference. A two-stage architecture tailored for any given CNN-FPGA pair is generated, consisting of a low- and a high-precision unit in a cascade. A confidence evaluation unit is employed to identify misclassified cases from the excessively low-precision unit and forward them to the high-precision unit for re-processing. Experiments demonstrate that the proposed toolflow can achieve a performance boost of up to 55% for VGG-16 and 48% for AlexNet over the baseline design for the same resource budget and accuracy, without the need to retrain the model or access the training data.
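
The cascade described in this abstract can be sketched in software as a simple control flow. Everything below is an illustrative assumption: the `cascade_predict` name, the scalar confidence score, the fixed threshold, and the toy classifiers are hypothetical stand-ins, since the abstract does not specify how the confidence evaluation unit works.

```python
from typing import Callable, List, Sequence, Tuple

# A classifier returns (predicted_label, confidence in [0, 1]).
Classifier = Callable[[float], Tuple[int, float]]


def cascade_predict(
    inputs: Sequence[float],
    low_precision: Classifier,
    high_precision: Classifier,
    threshold: float,
) -> Tuple[List[int], int]:
    """Run every input through the low-precision unit and re-process
    only the low-confidence cases on the high-precision unit."""
    labels: List[int] = []
    forwarded = 0
    for x in inputs:
        label, confidence = low_precision(x)
        if confidence < threshold:  # assumed confidence check
            label, _ = high_precision(x)
            forwarded += 1
        labels.append(label)
    return labels, forwarded


# Toy stand-ins for the two units (hypothetical, for illustration only).
def toy_low(x: float) -> Tuple[int, float]:
    return int(x > 0), abs(x)


def toy_high(x: float) -> Tuple[int, float]:
    return int(x > -0.1), 1.0


labels, forwarded = cascade_predict([0.9, 0.1, -0.8, -0.05], toy_low, toy_high, 0.5)
print(labels, forwarded)
```

The throughput gain in such a scheme depends on how few inputs fall below the threshold, because only those pay the cost of the high-precision pass.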


“…The standard 2D convolution layers, from which the CNN is constructed, occupy over 90% of the overall processing time [17] and their latency $T_i$ on the accelerator needs to be estimated to determine the best hardware configuration through DSE. For 2D convolution, there are several categories of parallelism including filter parallelism ($P_F$) or channel parallelism ($P_C$) in addition to spatial and kernel parallelisms.…”

Section: Introduction (mentioning)

confidence: 99%
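
To make the role of $P_F$ and $P_C$ concrete, the sketch below gives a first-order analytic cycle count for one convolution layer. It is an assumed textbook-style model that ignores memory transfers and pipeline stalls, not the estimator of the cited paper; the function name and layer sizes are hypothetical.

```python
import math


def conv_latency_cycles(
    h_out: int,
    w_out: int,
    k_h: int,
    k_w: int,
    n_in: int,
    n_out: int,
    p_f: int = 1,
    p_c: int = 1,
) -> int:
    """Assumed first-order model: one multiply-accumulate per cycle per
    parallel lane, with P_F filters and P_C channels processed at once."""
    filter_passes = math.ceil(n_out / p_f)   # groups of filters
    channel_passes = math.ceil(n_in / p_c)   # groups of input channels
    return h_out * w_out * k_h * k_w * filter_passes * channel_passes


# Hypothetical layer: 4x4 output, 3x3 kernel, 8 input and 16 output channels.
sequential = conv_latency_cycles(4, 4, 3, 3, 8, 16)
parallel = conv_latency_cycles(4, 4, 3, 3, 8, 16, p_f=4, p_c=2)
print(sequential, parallel, sequential / parallel)
```

A design space exploration loop would evaluate an estimate like this for each candidate ($P_F$, $P_C$) pair, which is why the accuracy of the latency model matters for the configuration that is finally chosen.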

Improving Performance Estimation for FPGA-Based Accelerators for Convolutional Neural Networks

Ferianc

Fan

Chu

et al.

2020

Lecture Notes in Computer Science


Field-programmable gate array (FPGA) based accelerators are being widely used for acceleration of convolutional neural networks (CNNs) due to their potential in improving the performance and reconfigurability for specific application instances. To determine the optimal configuration of an FPGA-based accelerator, it is necessary to explore the design space and an accurate performance prediction plays an important role during the exploration. This work introduces a novel method for fast and accurate estimation of latency based on a Gaussian process parametrised by an analytic approximation and coupled with runtime data. The experiments conducted on three different CNNs on an FPGA-based accelerator on Intel Arria 10 GX 1150 demonstrated a 30.7% improvement in accuracy with respect to the mean absolute error in comparison to a standard analytic method in leave-one-out cross-validation.
