2021
DOI: 10.3390/jimaging7100210
Design of Flexible Hardware Accelerators for Image Convolutions and Transposed Convolutions

Abstract: Nowadays, computer vision relies heavily on convolutional neural networks (CNNs) to perform complex and accurate tasks. Among them, super-resolution CNNs represent a meaningful example, due to the presence of both convolutional (CONV) and transposed convolutional (TCONV) layers. While the former exploit multiply-and-accumulate (MAC) operations to extract features of interest from incoming feature maps (fmaps), the latter perform MACs to tune the spatial resolution of the received fmaps properly. The ever-growi…

Cited by 8 publications (4 citation statements)
References 24 publications
“…The proposed accelerator reaches an output throughput of ≈753 Mega pixels per second, corresponding to the generation of UHD images at a frame rate higher than 95 fps, which meets the target latency requirements [16]. Compared to the prior works [20] and [23], which accelerate the final TCONV layer through innovative filter and pixel decomposition schemes that do not reduce the computational complexity, the proposed architecture sustains an output throughput 1.52 and 15.66 times higher, respectively, even when processing larger input frames. At the same time, in comparison with the architecture proposed in [24], the amount of LUTs, FFs and BRAMs is reduced by 73.8%, 34.9% and 51.5%, respectively.…”
Section: B. Hardware Evaluation
confidence: 76%
“…Furthermore, they introduce many unnecessary multiplications by zeros and require more complex strategies to access data memory, making video streaming infeasible in some cases. Most prior hardware designs, targeting either Field Programmable Gate Array (FPGA) [20], [23]–[24] or Application Specific Integrated Circuit (ASIC) [21]–[22], [25] technologies, rely on transforming the TCONV into multiple sub-convolutions. While these methods perform accurate TCONVs by re-arranging either the filter kernels [20], [21], [24] or the incoming pixels [23], they are not effective in reducing the number of MAC operations involved, and they introduce latency overheads and/or additional on-chip memory requirements.…”
Section: Hardware-Oriented Acceleration Methods
confidence: 99%
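The "multiplications by zeros" mentioned above come from the naive zero-insertion formulation of the transposed convolution: the input fmap is upsampled by inserting (stride − 1) zeros between pixels, then a plain convolution is applied, so many MAC windows overlap inserted or padding zeros. The sketch below (a minimal NumPy illustration; function names are mine, not from the paper or the cited accelerators) checks this formulation against the direct output-scatter definition of TCONV and counts the wasted zero operands:

```python
import numpy as np

def tconv_direct(x, k, stride=2):
    """Reference TCONV: scatter each input pixel, scaled by the kernel,
    into the output at stride-spaced positions (no zero-insertion)."""
    h, w = x.shape
    kh, kw = k.shape
    out = np.zeros((stride * (h - 1) + kh, stride * (w - 1) + kw))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * k
    return out

def tconv_zero_insertion(x, k, stride=2):
    """TCONV via zero-insertion: upsample by inserting zeros, pad, then
    convolve (cross-correlate with the flipped kernel). Also counts how
    many window operands are zero, i.e. wasted MACs."""
    h, w = x.shape
    kh, kw = k.shape
    up = np.zeros((stride * (h - 1) + 1, stride * (w - 1) + 1))
    up[::stride, ::stride] = x                      # zero-inserted fmap
    padded = np.pad(up, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    out = np.zeros((padded.shape[0] - kh + 1, padded.shape[1] - kw + 1))
    zero_macs = 0
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            win = padded[i:i + kh, j:j + kw]
            zero_macs += int(np.count_nonzero(win == 0))
            out[i, j] = np.sum(win * k[::-1, ::-1])  # true convolution
    return out, zero_macs
```

Both paths produce identical outputs, but the zero-insertion path performs many MACs whose input operand is an inserted zero; the sub-convolution schemes cited above re-arrange kernels or pixels to avoid materializing those zeros, without lowering the MAC count itself.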