In this paper, a shape tracking application based on landmark alignment is transformed to support implementation on Field Programmable Gate Arrays (FPGAs). This poses several challenges: a) computationally intensive operations have to be replaced by faster ones, b) specific loops have to be modified (e.g., unrolled) so that operations can be executed in parallel on different hardware resources, c) multiple pre-trained models have to be compared in terms of speed and accuracy, d) partial loading of the pre-trained models has to be examined so that their parameters fit in the Block Random Access Memories (BRAMs) of the FPGA for faster access, and e) alternative arithmetic representations have to be evaluated for higher speed and lower resource usage.
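As an illustration of challenges b) and e), the sketch below computes a dot product in plain C using a hypothetical Q16.16 fixed-point format (16 integer bits, 16 fractional bits) and a loop manually unrolled by a factor of 4. The format, the unrolling factor and the names `to_fixed`, `q_mul` and `dot_q16` are assumptions for illustration only, not the representation evaluated in this work. Separate accumulators are kept so that the four products do not form a single serial dependency chain, which is what would allow a high-level synthesis tool to map them to parallel multipliers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Q16.16 fixed-point type: 16 integer and 16 fractional bits. */
typedef int32_t q16_16;
#define Q_FRAC_BITS 16

/* Convert to/from double (truncates toward zero when converting to fixed). */
static q16_16 to_fixed(double x) { return (q16_16)(x * (1 << Q_FRAC_BITS)); }
static double to_double(q16_16 x) { return (double)x / (1 << Q_FRAC_BITS); }

/* Fixed-point multiply with a 64-bit intermediate to avoid overflow.
   Assumes arithmetic right shift of negative values, as on common compilers. */
static q16_16 q_mul(q16_16 a, q16_16 b) {
    return (q16_16)(((int64_t)a * b) >> Q_FRAC_BITS);
}

/* Dot product with the loop unrolled by 4; four independent accumulators
   let the multiplications proceed in parallel. n must be a multiple of 4. */
static q16_16 dot_q16(const q16_16 *a, const q16_16 *b, int n) {
    assert(n % 4 == 0);
    q16_16 acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    for (int i = 0; i < n; i += 4) {
        acc0 += q_mul(a[i],     b[i]);
        acc1 += q_mul(a[i + 1], b[i + 1]);
        acc2 += q_mul(a[i + 2], b[i + 2]);
        acc3 += q_mul(a[i + 3], b[i + 3]);
    }
    return acc0 + acc1 + acc2 + acc3;
}
```

For example, with `a = {1.0, 2.0, 0.5, -1.5}` and `b = {2.0, 0.25, 4.0, 2.0}` converted through `to_fixed`, `to_double(dot_q16(a, b, 4))` yields 1.5. In practice, HLS tools can also unroll loops through directives rather than by rewriting the source.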
Our approach employs the C++ Deformable Shape Tracking (DEST) implementation of face alignment, which is based on an Ensemble of Regression Trees. The DEST application uses Eigen library routines to implement algebraic operations, and these routines proved to be quite slow. The main achievement of this paper is the replacement of the appropriate Eigen calls in time-critical paths with fast C code that can be used directly to synthesize reconfigurable hardware implementations. Eliminating the computationally intensive Eigen calls has already made the face alignment application more than 240 times faster. In this paper, we examine how the modified source code structure of the DEST application can be used to address the challenges described above.
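To give a concrete sense of what library-free C code for an ensemble of regression trees can look like, the following is a minimal sketch of evaluating a single tree. It assumes a hypothetical flattened layout: split nodes stored breadth-first (node k has children 2k+1 and 2k+2), each split comparing the intensity difference of two sampled pixels against a threshold, and each leaf holding a shape increment. The `Tree` struct, `apply_tree`, `DEPTH` and `NUM_COORDS` are illustrative names, not DEST's actual data structures, and the branch direction is an arbitrary convention here. Using only statically sized arrays and scalar arithmetic is what makes such code amenable to high-level synthesis, where fixed-size arrays can be mapped to on-chip memories such as BRAMs.

```c
#include <assert.h>

/* Hypothetical flattened tree of fixed depth; real models are deeper and
   predict many more landmark coordinates. */
#define DEPTH 2
#define NUM_SPLITS ((1 << DEPTH) - 1)  /* internal nodes 0 .. NUM_SPLITS-1 */
#define NUM_LEAVES (1 << DEPTH)
#define NUM_COORDS 4                   /* e.g. 2 landmarks x (x, y) */

typedef struct {
    int   pixel_a[NUM_SPLITS];          /* first sampled pixel per split */
    int   pixel_b[NUM_SPLITS];          /* second sampled pixel per split */
    float threshold[NUM_SPLITS];        /* threshold on the intensity difference */
    float leaf[NUM_LEAVES][NUM_COORDS]; /* shape increment stored at each leaf */
} Tree;

/* Walk one tree from the root to a leaf and add that leaf's shape
   increment to the current shape estimate, in place. */
static void apply_tree(const Tree *t, const float *pixels, float *shape) {
    int node = 0;
    for (int d = 0; d < DEPTH; ++d) {
        float diff = pixels[t->pixel_a[node]] - pixels[t->pixel_b[node]];
        /* Convention assumed here: diff > threshold goes to the left child. */
        node = (diff > t->threshold[node]) ? 2 * node + 1 : 2 * node + 2;
    }
    int leaf = node - NUM_SPLITS;       /* leaves follow the split nodes */
    assert(leaf >= 0 && leaf < NUM_LEAVES);
    for (int c = 0; c < NUM_COORDS; ++c)
        shape[c] += t->leaf[leaf][c];
}
```

Breadth-first indexing replaces pointer-based tree nodes with simple index arithmetic, so the tree can be stored in contiguous arrays and the fixed-trip-count loops can be unrolled or pipelined.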