In this paper, a hardware implementation in reconfigurable logic of a single-pass connected component labelling (CCL) and connected component analysis (CCA) module is presented. The main novelty of the design is its support for a video stream in 2 and 4 pixels per clock (2 and 4 ppc) formats and real-time processing of a 4K/UHD video stream (3840 × 2160 pixels) at 60 frames per second. We discuss several approaches to the problem and present the selected ones in detail. The proposed module was verified in an example application, the segmentation of skin-colour areas, on the ZCU 102 and ZCU 104 evaluation boards equipped with Xilinx Zynq UltraScale+ MPSoC devices.
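The single-pass idea can be illustrated with a small software analogue. The sketch below is not the hardware architecture described above: it processes one pixel per step rather than 2 or 4 ppc, and the function name `single_pass_cca`, the choice of 8-connectivity, and the reported features (area and bounding box) are assumptions made for illustration. It keeps only one previous row of labels and merges feature records as soon as two labels turn out to belong to the same component, so no second pass over the image is needed.

```python
def single_pass_cca(image):
    """Return sorted (area, (min_r, min_c, max_r, max_c)) per component.

    image: list of equal-length rows holding 0 (background) or 1 (object).
    Assumption: 8-connectivity; only one row of labels is buffered.
    """
    parent = [0]   # union-find table; index 0 is reserved for background
    feats = {}     # root label -> [area, min_r, min_c, max_r, max_c]

    def find(a):
        # Follow parent links to the root, halving the path on the way.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        # Merge two labels and combine their accumulated features.
        ra, rb = find(a), find(b)
        if ra == rb:
            return ra
        if rb < ra:
            ra, rb = rb, ra
        parent[rb] = ra
        fa, fb = feats[ra], feats.pop(rb)
        fa[0] += fb[0]
        fa[1], fa[2] = min(fa[1], fb[1]), min(fa[2], fb[2])
        fa[3], fa[4] = max(fa[3], fb[3]), max(fa[4], fb[4])
        return ra

    width = len(image[0]) if image else 0
    prev = [0] * width                      # labels of the previous row
    for r, row in enumerate(image):
        cur = [0] * width
        for c, pix in enumerate(row):
            if not pix:
                continue
            # Already-visited neighbours: left, upper-left, upper, upper-right.
            neigh = []
            if c > 0 and cur[c - 1]:
                neigh.append(cur[c - 1])
            for dc in (-1, 0, 1):
                if 0 <= c + dc < width and prev[c + dc]:
                    neigh.append(prev[c + dc])
            if not neigh:
                label = len(parent)         # start a new component
                parent.append(label)
                feats[label] = [0, r, c, r, c]
            else:
                label = find(neigh[0])
                for n in neigh[1:]:
                    label = union(label, n)
            f = feats[label]
            f[0] += 1                       # accumulate area
            f[1], f[2] = min(f[1], r), min(f[2], c)
            f[3], f[4] = max(f[3], r), max(f[4], c)
            cur[c] = label
        prev = cur
    return sorted((f[0], (f[1], f[2], f[3], f[4])) for f in feats.values())


if __name__ == "__main__":
    u_shape = [[1, 0, 1],
               [1, 1, 1]]
    print(single_pass_cca(u_shape))
```

In the U-shaped example the two arms receive different labels in the first row and are merged when the connecting pixel is reached in the second row, which is the situation a single-pass design has to resolve on the fly.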
Deploying Deep Neural Networks in low-power embedded devices for real-time-constrained applications requires optimization of the memory and computational complexity of the networks, usually by quantizing the weights. Most existing works employ linear quantization, which causes considerable degradation in accuracy for weight bit widths lower than 8. Since the distribution of weights is usually non-uniform (with most weights concentrated around zero), other methods, such as logarithmic quantization, are more suitable because they preserve the shape of the weight distribution more precisely. Moreover, using a base-2 logarithmic representation allows the multiplication to be optimized by replacing it with bit shifting. In this paper, we explore non-linear quantization techniques for exploiting lower bit precision and identify favorable hardware implementation options. We developed a Quantization Aware Training (QAT) algorithm that allows training of low bit width Power-of-Two (PoT) networks and achieves accuracies on par with state-of-the-art floating point models for different tasks. We explored PoT weight encoding techniques and investigated hardware designs of MAC units for three different quantization schemes (uniform, PoT and Additive-PoT (APoT)) to show the increased efficiency of the proposed approach. Finally, the experiments showed that for low bit width precision, non-uniform quantization performs better than uniform quantization and, at the same time, PoT quantization vastly reduces the computational complexity of the neural network.
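The link between PoT weights and bit shifting can be made concrete with a minimal sketch. It is not the QAT algorithm or the weight encoding from the paper: the function names `pot_quantize` and `pot_multiply`, the 4-bit layout (one sign bit, remaining codes for zero and the exponents), and rounding in the log domain are all assumptions chosen for illustration. Weights are assumed to lie in [-1, 1] and activations are assumed to be integers (for example, fixed-point values).

```python
import math


def pot_quantize(w, bits=4):
    """Round w to a signed power of two, returned as (sign, exponent).

    Assumed encoding: 1 sign bit; the other bits - 1 bits give 2**(bits-1)
    codes, one for zero and the rest for exponents 0, -1, ..., min_exp.
    sign == 0 encodes a zero weight.
    """
    min_exp = -(2 ** (bits - 1) - 2)
    if w == 0:
        return 0, 0
    exp = round(math.log2(abs(w)))   # nearest exponent in the log domain
    if exp < min_exp:                # too small to represent: flush to zero
        return 0, 0
    exp = min(exp, 0)                # clip magnitudes above 1 to 2**0
    return (1 if w > 0 else -1), exp


def pot_multiply(x, sign, exp):
    """Multiply integer activation x by sign * 2**exp using only a shift."""
    if sign == 0:
        return 0
    return sign * (x >> -exp)        # exp <= 0, so this is a right shift


if __name__ == "__main__":
    for w in (0.3, -0.5, 0.001):
        sign, exp = pot_quantize(w)
        print(w, (sign, exp), pot_multiply(100, sign, exp))
```

With this encoding, a weight of 0.3 becomes 2^-2, so multiplying an activation of 100 by it reduces to a right shift by two positions. The right shift truncates low-order bits, which is one of the places where a real design would need to decide on rounding.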
In this paper, research on the optimisation of visual object tracking using a Siamese neural network for embedded vision systems is presented. It was assumed that the solution should operate in real time, preferably for a high-resolution video stream, with the lowest possible energy consumption. To meet these requirements, techniques such as the reduction of computational precision and pruning were considered. Brevitas, a tool dedicated to the optimisation and quantisation of neural networks for FPGA implementation, was used. A number of training scenarios were tested with varying levels of optimisation, from 16-bit integer uniform quantisation to ternary and binary networks. Next, the influence of these optimisations on the tracking performance was evaluated. It was possible to reduce the size of the convolutional filters by up to 10 times relative to the original network. The obtained results indicate that quantisation can significantly reduce the memory and computational complexity of the proposed network while still enabling precise tracking, thus allowing its use in embedded vision systems. Moreover, quantisation of the weights positively affects network training by decreasing overfitting.
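The range of precisions mentioned above, from uniform integer quantisation down to ternary and binary weights, can be sketched in plain Python. This is only an illustration of the three weight representations, not the Brevitas API and not the training procedure used in the paper: the function names, the symmetric per-tensor scale, and the ternary threshold of 0.7 times the mean absolute weight are assumptions.

```python
def uniform_quantize(weights, bits):
    """Symmetric uniform quantisation to signed integers.

    Returns (integer codes, scale) so that weight ~= code * scale.
    Assumption: a single per-tensor scale taken from the largest |weight|.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def ternary_quantize(weights, ratio=0.7):
    """Map weights to {-1, 0, +1}.

    Assumption: weights whose magnitude is at most ratio * mean(|w|)
    are set to zero; the ratio is a heuristic, not a value from the paper.
    """
    threshold = ratio * sum(abs(w) for w in weights) / len(weights)
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1)
            for w in weights]


def binary_quantize(weights):
    """Map weights to {-1, +1} by sign (zero is treated as positive)."""
    return [1 if w >= 0 else -1 for w in weights]


if __name__ == "__main__":
    w = [3.0, -1.2, 0.4]
    print(uniform_quantize(w, 3))
    print(ternary_quantize(w))
    print(binary_quantize(w))
```

Each step down in precision shrinks the storage per weight (for example, from 16 bits to 2 bits for ternary codes), which is the mechanism behind the memory reduction reported above; the accuracy cost of each step has to be measured on the tracking task itself.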