2019
DOI: 10.3390/jlpea10010001

Energy-Efficient Architecture for CNNs Inference on Heterogeneous FPGA

Abstract: Due to the huge requirements in terms of both computational and memory capabilities, implementing energy-efficient and high-performance Convolutional Neural Networks (CNNs) by exploiting embedded systems still represents a major challenge for hardware designers. This paper presents the complete design of a heterogeneous embedded system realized by using a Field-Programmable Gate Array Systems-on-Chip (SoC) and suitable to accelerate the inference of Convolutional Neural Networks in power-constrained environmen…

citations
Cited by 18 publications
(9 citation statements)
references
References 29 publications
“…The table also includes the comparison with some of the novel FPGA accelerators. The motivation of the authors in [ 19 , 20 , 21 , 38 ], was optimizing a design to obtain higher GOPs, maximize the performance, or reducing power consumption. On the contrary, our focus is to increase the frequency and keep inference engines idle to save dynamic power consumption.…”
Section: Results (mentioning)
confidence: 99%
“…For instance, Spagnolo et al proposed an energy-efficient hardware accelerator for CNN using heterogeneous FPGA. Their system on chip (SoC) architecture is structured to support the efficient Single-Instruction-Multiple-Data (SIMD) paradigm for computing both convolutional and fully connected layers [ 19 ]. Since all computations are applied on the FPGA and controlled by an embedded processor, they obtained better performance than the GPU.…”
Section: Related Work (mentioning)
confidence: 99%
“…Therefore, we can argue that our architecture outperforms the work in Reference [ 46 ]. Next, we compare our work to an FPGA-based SIMD CNN accelerator design [ 47 ]. The results are shown in Table 6 which indicated performance improvements in our design.…”
Section: Results (mentioning)
confidence: 99%
“…The authors in Ref. [105] designed a system architecture based on heterogeneous FPGA with DSPs, supporting SIMD paradigm to efficiently process parallel computation for CNNs layers (Convolution and fully connected layers). The proposed architecture required lower computational time (47%) over non‐SIMD computational implementation.…”
Section: Hw Acceleration Approaches (mentioning)
confidence: 99%