2021
DOI: 10.3390/jimaging7100210
Design of Flexible Hardware Accelerators for Image Convolutions and Transposed Convolutions

Abstract: Nowadays, computer vision relies heavily on convolutional neural networks (CNNs) to perform complex and accurate tasks. Among them, super-resolution CNNs represent a meaningful example, due to the presence of both convolutional (CONV) and transposed convolutional (TCONV) layers. While the former exploit multiply-and-accumulate (MAC) operations to extract features of interest from incoming feature maps (fmaps), the latter perform MACs to tune the spatial resolution of the received fmaps properly. The ever-growi…

Cited by 8 publications (4 citation statements)
References 24 publications
“…The proposed accelerator reaches an output throughput of ≈753 Mega pixels per second, corresponding to the generation of UHD images at a frame rate higher than 95 fps, which meets the target latency requirements [16]. Compared to the prior works [20] and [23], which accelerate the final TCONV layer through innovative filter and pixel decomposition schemes that do not reduce the computational complexity, the proposed architecture sustains an output throughput 1.52 and 15.66 times higher, respectively, even when processing larger input frames. At the same time, in comparison with the architecture proposed in [24], the amount of LUTs, FFs and BRAMs is reduced by 73.8%, 34.9% and 51.5%, respectively.…”
Section: B. Hardware Evaluation
confidence: 76%
“…Furthermore, they introduce many unnecessary multiplications by zeros and require more complex strategies to access data memory, making video streaming infeasible in some cases. Most prior hardware designs, targeting either Field Programmable Gate Array (FPGA) [20], [23]–[24] or Application Specific Integrated Circuit (ASIC) [21]–[22], [25] technologies, rely on transforming the TCONV into multiple sub-convolutions. While these methods perform accurate TCONVs by re-arranging either the filter kernels [20], [21], [24] or the incoming pixels [23], they are not effective in reducing the number of MAC operations involved, and they introduce latency overheads and/or additional on-chip memory requirements.…”
Section: Hardware-Oriented Acceleration Methods
confidence: 99%
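The "multiplications by zeros" mentioned above come from the naive zero-insertion formulation of the transposed convolution: the input fmap is upsampled by inserting (stride − 1) zeros between pixels, then a plain convolution is applied, so many MAC windows overlap inserted or padding zeros. The sketch below (a minimal NumPy illustration; function names are mine, not from the paper or the cited accelerators) checks this formulation against the direct output-scatter definition of TCONV and counts the wasted zero operands:

```python
import numpy as np

def tconv_direct(x, k, stride=2):
    """Reference TCONV: scatter each input pixel, scaled by the kernel,
    into the output at stride-spaced positions (no zero-insertion)."""
    h, w = x.shape
    kh, kw = k.shape
    out = np.zeros((stride * (h - 1) + kh, stride * (w - 1) + kw))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * k
    return out

def tconv_zero_insertion(x, k, stride=2):
    """TCONV via zero-insertion: upsample by inserting zeros, pad, then
    convolve (cross-correlate with the flipped kernel). Also counts how
    many window operands are zero, i.e. wasted MACs."""
    h, w = x.shape
    kh, kw = k.shape
    up = np.zeros((stride * (h - 1) + 1, stride * (w - 1) + 1))
    up[::stride, ::stride] = x                      # zero-inserted fmap
    padded = np.pad(up, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    out = np.zeros((padded.shape[0] - kh + 1, padded.shape[1] - kw + 1))
    zero_macs = 0
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            win = padded[i:i + kh, j:j + kw]
            zero_macs += int(np.count_nonzero(win == 0))
            out[i, j] = np.sum(win * k[::-1, ::-1])  # true convolution
    return out, zero_macs
```

Both paths produce identical outputs, but the zero-insertion path performs many MACs whose input operand is an inserted zero; the sub-convolution schemes cited above re-arrange kernels or pixels to avoid materializing those zeros, without lowering the MAC count itself.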