“…The proposed accelerator reaches an output throughput of ≈753 megapixels per second, corresponding to the generation of UHD images at a frame rate higher than 95 fps, which meets the target latency requirements [16]. Compared with prior works [20] and [23], which accelerate the final TCONV layer through innovative filter and pixel decomposition schemes that do not reduce the computational complexity, the proposed architecture sustains an output throughput 1.52 and 15.66 times higher, respectively, even while processing larger input frames. In addition, compared with the architecture proposed in [24], the numbers of LUTs, FFs, and BRAMs are reduced by 73.8%, 34.9%, and 51.5%, respectively.…”
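
The relation between pixel throughput, frame rate, and the quoted speed-up factors can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 3840×2160 frame size is an assumption, since the excerpt says "UHD" without giving dimensions. With that assumed frame size, the rounded 753 Mpixel/s figure works out to roughly 90.8 fps, so the quoted rate above 95 fps presumably rests on a different frame size or an unrounded throughput that the excerpt does not state. Dividing the throughput by the reported speed-ups gives the implied throughput of [20] and [23], about 495 and 48 Mpixel/s.

```python
def frames_per_second(throughput_px_per_s: float, width: int, height: int) -> float:
    """Frame rate sustained by a given output pixel throughput."""
    return throughput_px_per_s / (width * height)


def implied_baseline(throughput_px_per_s: float, speedup: float) -> float:
    """Throughput of a prior design, inferred from a reported speed-up factor."""
    return throughput_px_per_s / speedup


# Figures taken from the excerpt.
THROUGHPUT = 753e6          # output pixels per second (rounded, "≈753")
SPEEDUPS = {"[20]": 1.52, "[23]": 15.66}

# Assumption: UHD means 3840 x 2160; the excerpt does not give the frame size.
UHD_WIDTH, UHD_HEIGHT = 3840, 2160

fps = frames_per_second(THROUGHPUT, UHD_WIDTH, UHD_HEIGHT)
baselines = {ref: implied_baseline(THROUGHPUT, s) for ref, s in SPEEDUPS.items()}

print(f"frame rate at {UHD_WIDTH}x{UHD_HEIGHT}: {fps:.1f} fps")
for ref, value in baselines.items():
    print(f"implied throughput of {ref}: {value / 1e6:.1f} Mpixel/s")
```

Keeping the frame size as a parameter makes it easy to repeat the check once the exact output resolution used in the evaluation is known.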