Panagiotis Christakos scite author profile

et al. 2021

Shape tracking is based on landmark detection and alignment. Open-source code and pre-trained models are available for an implementation based on an ensemble of regression trees. This work employs the C++ Deformable Shape Tracking (DEST) implementation of face alignment, which uses the Eigen template library for algebraic operations. The overhead of the Eigen library calls is measured, and selected computationally intensive operations are ported from Eigen to custom C code, which substantially accelerates the shape tracking application. An important outcome of this work is that the restructured code can be directly implemented in reconfigurable hardware for further speed improvement. Driver drowsiness and distraction detection applications exploit shape tracking by measuring landmark distances in order to detect eye blinking, yawning, etc. Fast video processing and accuracy are mandatory in these safety-critical applications. The modified software implementation of the original DEST face alignment method presented in this paper is almost 250 times faster, owing to the custom implementation of computationally intensive vector/matrix operations and rotations. The Eigen library is still used in non-time-critical parts of the code for compact description and higher readability. Flattening of nested routines and inlining are also used to eliminate excessive argument copies, data type checking, and conversions.
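The idea of detecting eye blinking from landmark distances can be illustrated with a minimal Python sketch. It uses the eye aspect ratio, a common measure from the blink detection literature; the abstract does not say which measure the authors use, so the six-point ordering, the function names and the 0.2 threshold are all assumptions made for illustration.

```python
import math


def dist(p, q):
    """Euclidean distance between two (x, y) landmarks."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def eye_aspect_ratio(eye):
    """Ratio of eye height to eye width from six landmarks.

    Assumed ordering (hypothetical, not taken from the paper):
    p1 and p4 are the eye corners, p2 and p3 lie on the upper lid,
    p6 and p5 lie on the lower lid below p2 and p3 respectively.
    """
    p1, p2, p3, p4, p5, p6 = eye
    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))


def is_closed(eye, threshold=0.2):
    """Treat the eye as closed when the ratio drops below the threshold.

    The threshold value is an illustrative assumption.
    """
    return eye_aspect_ratio(eye) < threshold
```

A blink would then show up as a short run of frames in which is_closed returns True for both eyes; a similar height-to-width ratio on the mouth landmarks could serve for yawning.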

Challenges Towards Hardware Acceleration of the Deformable Shape Tracking Application

Petrellis

et al. 2021

In this paper, a shape tracking application based on landmark alignment is transformed to support implementation in Field Programmable Gate Arrays (FPGAs). This poses several challenges: a) computationally intensive operations have to be replaced by faster ones, b) specific loops have to be modified (e.g., unrolled) so that operations can run in parallel on different hardware resources, c) multiple pre-trained models have to be compared in terms of speed and accuracy, d) partial loading of the pre-trained models has to be examined in order to fit their parameters in the Block Random Access Memories (BRAMs) of the FPGA for faster access, and e) alternative arithmetic representations have to be evaluated for higher speed and reduced resource usage. Our approach employs the C++ Deformable Shape Tracking (DEST) implementation of face alignment, which is based on an Ensemble of Regression Trees. The DEST application implements algebraic operations with Eigen library routines, which prove to be quite slow. The contribution of this paper is the replacement of the appropriate Eigen calls in time-critical paths with fast C code that can be directly used to synthesize reconfigurable hardware implementations. Eliminating the computationally intensive Eigen calls has already made the face alignment application more than 240 times faster. We examine how the modified source code structure of the DEST application can be used to address the challenges described above.
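Challenge e) can be made concrete with a small fixed-point sketch in Python. The abstract does not state which arithmetic representations were evaluated, so the Q16.16 format, the function names and the rotation example are assumptions chosen only to show how floating-point multiplications can be replaced by integer multiply-and-shift operations, which are cheaper to build in hardware.

```python
FRAC_BITS = 16          # assumed Q16.16 format: 16 fractional bits
ONE = 1 << FRAC_BITS    # fixed-point representation of 1.0


def to_fixed(x):
    """Convert a float to a fixed-point integer."""
    return int(round(x * ONE))


def to_float(q):
    """Convert a fixed-point integer back to a float."""
    return q / ONE


def fixed_mul(a, b):
    """Multiply two fixed-point values.

    Python integers never overflow; in C the product would need a
    64-bit intermediate before the shift.
    """
    return (a * b) >> FRAC_BITS


def rotate_fixed(x, y, cos_q, sin_q):
    """Rotate the fixed-point point (x, y) using integer operations only."""
    return (fixed_mul(x, cos_q) - fixed_mul(y, sin_q),
            fixed_mul(x, sin_q) + fixed_mul(y, cos_q))
```

For example, rotate_fixed(ONE, 0, 0, ONE) rotates the point (1, 0) by 90 degrees and yields (0, ONE). The trade-off is precision: each multiplication discards the bits below 2^-16, so accuracy would have to be checked against the floating-point version.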

A High Performance and Robust FPGA Implementation of a Driver State Monitoring Application

Christakos,

Petrellis,

Mousouliotis

et al. 2023

Preprint

A high-performance Driver State Monitoring (DSM) application for the detection of driver drowsiness is presented in this paper. It relies on an Ensemble of Regression Trees (ERT) machine learning method that aligns 68 facial landmarks. Special focus is placed on accelerating the frame processing using reconfigurable hardware. Reducing the frame processing latency saves time that can be used to apply frame-to-frame facial shape coherency rules. False face detections and false shape estimations can then be ignored, which improves the robustness and accuracy of the DSM application without reducing the frame processing rate, which can reach 65 frames per second. The sensitivity and precision in yawning recognition can reach 93% and 97%, respectively. Implementing the employed DSM algorithm in reconfigurable hardware is challenging, since the kernel arguments require large data transfers and the degree of data reuse in the computational kernel is low. For this reason, unconventional hardware acceleration techniques have been employed that can also be useful for accelerating several other applications.
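A frame-to-frame coherency rule might look like the following Python sketch. The abstract does not specify the rules actually applied, so the criterion used here (mean landmark displacement against the last accepted shape), the function names and the 20-pixel limit are hypothetical.

```python
import math


def mean_displacement(prev, curr):
    """Average distance moved by corresponding landmarks between two shapes."""
    total = 0.0
    for (px, py), (cx, cy) in zip(prev, curr):
        total += math.hypot(cx - px, cy - py)
    return total / len(prev)


def accept_shape(prev, curr, max_shift=20.0):
    """Accept the first shape, then only shapes close to the previous one.

    max_shift (in pixels) is an illustrative assumption.
    """
    if prev is None:
        return True
    return mean_displacement(prev, curr) <= max_shift


def filter_shapes(shapes, max_shift=20.0):
    """For each frame, return the most recent accepted shape.

    A frame with no detection (None) or an incoherent shape reuses
    the last accepted shape instead of the new estimate.
    """
    last = None
    result = []
    for shape in shapes:
        if shape is not None and accept_shape(last, shape, max_shift):
            last = shape
        result.append(last)
    return result
```

This sketch trusts the first detected shape unconditionally; a practical rule set would probably also reset after several consecutive rejections so that one bad initial frame cannot block all later ones.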

Exploiting Vitis Framework for Accelerating Sobel Algorithm

Mousouliotis

et al. 2021

Edge detection is one of the most common operations in the image processing domain. In this work, alternative implementations of the Sobel algorithm are tested on a Xilinx ZCU102 embedded platform, demonstrating how different optimization techniques can be conveniently configured in the Xilinx Vitis environment. We exploit (a) the Xilinx Runtime library (XRT), which allows the reconfigurable logic to be reprogrammed in real time, and (b) the various high-level attributes offered by the OpenCL API for efficient resource allocation in the state-of-the-art Xilinx UltraScale Multi-Processor Systems-on-Chip (MPSoCs). Specifically, different implementations of the Sobel algorithm (varying the data transfer models and data packing modes) are demonstrated and analyzed. Our experimental results show that, starting from a CPU implementation with 656 ms latency, the frame processing time is reduced to between 17 ms and 22 ms depending on the allocated resources, leading to a solution that is up to 38 times faster.
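For reference, the Sobel operator itself can be sketched in a few lines of plain Python. This is a software baseline for illustration only, not the Vitis or OpenCL kernels evaluated in the paper; the |gx| + |gy| magnitude approximation, the clamping to 255 and the zeroed border are assumptions.

```python
# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
GX = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
GY = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel(image):
    """Apply the Sobel operator to a grayscale image (list of rows).

    Border pixels are left at 0. The gradient magnitude is
    approximated as |gx| + |gy| and clamped to 255.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            for ky in range(3):
                for kx in range(3):
                    pixel = image[y + ky - 1][x + kx - 1]
                    gx += GX[ky][kx] * pixel
                    gy += GY[ky][kx] * pixel
            out[y][x] = min(255, abs(gx) + abs(gy))
    return out
```

Every output pixel depends only on its 3x3 neighbourhood, which is why the algorithm lends itself to the parallel and pipelined implementations compared in the abstract.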

High Speed Implementation of the Deformable Shape Tracking Face Alignment Algorithm

Petrellis

et al. 2021

The 2D facial landmark alignment method, implemented in C++ in the open-source libraries DLIB and Deformable Shape Tracking (DEST), is used in several applications such as driver drowsiness detection. The most challenging of these applications require fast video frame processing. Therefore, the alignment of the facial landmarks in a single video frame has to be performed with the minimum possible latency and without loss of precision. In this paper, the DEST implementation of the face alignment method, which is based on regression trees, is heavily restructured to reduce latency. The resulting face alignment predictor is implemented in C. The elimination of multiple nested routine calls, excessive argument copying, type conversions and integrity checks leads to a software implementation that is 240 times faster than the one provided in the DEST library. Moreover, the structure of the new face alignment predictor is appropriate for hardware implementation on a Field Programmable Gate Array (FPGA) for further acceleration (footnote 1: CPSoSaware project, "Cross-layer cognitive optimization tools & methods for the lifecycle support of dependable CPSoS").
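The regression-tree step at the core of this kind of face alignment can be sketched in Python as follows. The layout is an assumption: each tree is stored as flat arrays, splits compare the difference of two sampled pixel intensities against a threshold (as in the well-known ensemble-of-regression-trees formulation), and each leaf holds an offset that is added to the current shape estimate. It is not the data layout of DEST or of the authors' C code.

```python
def leaf_index(splits, features, depth):
    """Walk a complete binary tree stored as a flat list of split nodes.

    Each split is (index_a, index_b, threshold). The left child
    (2n + 1) is taken when features[a] - features[b] exceeds the
    threshold, otherwise the right child (2n + 2).
    """
    node = 0
    for _ in range(depth):
        a, b, threshold = splits[node]
        if features[a] - features[b] > threshold:
            node = 2 * node + 1
        else:
            node = 2 * node + 2
    # Leaves are numbered after the split nodes in the flat layout.
    return node - len(splits)


def apply_trees(shape, trees, features, depth):
    """Add the selected leaf offset of every tree to the shape estimate.

    shape is a flat list of coordinates; each tree is a (splits, leaves)
    pair, where leaves[i] is an offset list of the same length as shape.
    """
    shape = list(shape)
    for splits, leaves in trees:
        offset = leaves[leaf_index(splits, features, depth)]
        for i, delta in enumerate(offset):
            shape[i] += delta
    return shape
```

Because the traversal consists only of array lookups, comparisons and additions, with no nested calls or type conversions, a structure of this kind maps naturally to plain C loops and from there to hardware.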