Xiebing Wang scite author profile

Kiwus

et al. 2018

Lane detection is a cardinal functionality in state-of-the-art Advanced Driver Assistant Systems (ADAS). However, it is still not straightforward to meet the real-time performance demands of processing High Definition (HD) images with high robustness and scalability. To address this problem, we propose an improved lane detection algorithm based on top-view image transformation and two-stage RANdom SAmple Consensus (RANSAC) model fitting. By adapting an affine homography matrix off-line to bound an adaptive Region Of Interest (ROI) for the subsequent on-line Warp Perspective Mapping (WPM) transformation, the algorithm can analyze arbitrary on-road videos and generate an adaptive ROI without prior knowledge of the camera parameters. To ensure scalability, we present a comprehensive parallel design of the application in a heterogeneous system consisting of a multi-core CPU, a GPU and an FPGA. We show in detail how the potentially parallel task loads are implemented and optimized so that they can be mapped to the most suitable processor to achieve optimal performance. Experimental results reveal that our improved algorithm can robustly process the video streams with a higher accuracy. Moreover, the heterogeneous executions are capable of processing HD 1920×1080 images with runtime performance of 81.6 fps and 47.9 fps, respectively, on an AMD FirePro W7100 GPU and a Terasic Arria 10 FPGA.
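For readers unfamiliar with RANSAC model fitting, the following is a minimal sketch of a plain, single-stage RANSAC fit of a straight line to 2D points. It is not the two-stage procedure from the paper, whose details the abstract does not give. The function name `ransac_line`, the parameters `n_iters` and `tol`, and the sample points are assumptions made up for illustration.

```python
import random


def ransac_line(points, n_iters=200, tol=1.0, seed=0):
    """Fit y = m*x + b with plain RANSAC; returns (m, b, inliers).

    Illustrative only: a generic single-stage RANSAC, not the paper's
    two-stage variant. Vertical candidate lines are skipped.
    """
    rng = random.Random(seed)
    best = (0.0, 0.0, [])
    for _ in range(n_iters):
        # Draw a minimal sample: two distinct points define a candidate line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # slope undefined in this simplified model
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        # Consensus set: points whose vertical residual is within tol.
        inliers = [(x, y) for x, y in points if abs(y - (m * x + b)) <= tol]
        if len(inliers) > len(best[2]):
            best = (m, b, inliers)
    return best


def demo():
    # Hypothetical data: 20 points on y = 2x + 1 plus 4 outliers.
    lane = [(float(x), 2.0 * x + 1.0) for x in range(20)]
    clutter = [(3.0, 40.0), (7.0, -15.0), (12.0, 90.0), (15.0, 0.0)]
    return ransac_line(lane + clutter)
```

Calling `demo()` returns the slope, the intercept and the consensus set of the best candidate. The outliers do not pull the fit because only the size of the consensus set decides which candidate wins.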

A Hybrid Framework for Fast and Accurate GPU Performance Estimation through Source-Level Analysis and Trace-Based Simulation

Huang

Knoll

et al. 2019

This paper proposes a hybrid framework for fast and accurate performance estimation of OpenCL kernels running on GPUs. The kernel execution flow is statically analyzed and thereupon the execution trace is generated via a loop-based bidirectional branch search. Then the trace is dynamically simulated to perform a dummy execution of the kernel to obtain the estimated time. The framework does not rely on profiling or measurement results, which are used in conventional performance estimation techniques. Moreover, the lightweight trace-based simulation consumes much less time than a fine-grained GPU simulator. Our framework can accurately grasp the variation trend of the execution time in the design space and robustly predict the performance of the kernels across two generations of recent Nvidia GPU architectures. Experiments on four Commercial Off-The-Shelf (COTS) GPUs show that our framework can predict the runtime performance with an average Mean Absolute Percentage Error (MAPE) of 17.04% and a time consumption of a few seconds. We also demonstrate the practicability of our framework with a real-world application.
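The MAPE figure quoted above follows the standard definition, MAPE = (100 / n) · Σ |predicted − measured| / |measured|. A minimal sketch of that computation is shown here. The function name `mape` and the runtimes in the usage note are hypothetical and are not taken from the paper.

```python
def mape(measured, predicted):
    """Mean Absolute Percentage Error, in percent.

    measured and predicted are equal-length sequences of runtimes;
    measured values must be non-zero.
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(p - m) / abs(m) for m, p in zip(measured, predicted))
    return 100.0 * total / len(measured)
```

For example, with hypothetical measured runtimes of 100 ms and 200 ms and predictions of 110 ms and 180 ms, each prediction is off by 10% of its measurement, so `mape([100.0, 200.0], [110.0, 180.0])` is 10 (up to floating-point rounding).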

Performance Optimisation of Parallelized ADAS Applications in FPGA-GPU Heterogeneous Systems: A Case Study With Lane Detection

IEEE Trans. Intell. Veh.

Huang

Knoll

2019

The explosive growth of massive data captured by various sensors on modern vehicles has impelled the deployment of Commercial Off-The-Shelf (COTS) accelerators for the research and development of Advanced Driver Assistance Systems (ADAS). Although the advent of cross-platform programming frameworks such as Open Computing Language (OpenCL) facilitates the programmability of ADAS applications on heterogeneous devices, performance portability is still vulnerable and subject to the different hardware implementations of the heterogeneous manufacturers. With this issue in mind, in this article we propose a detailed procedure that helps guide the performance optimisation of parallelized ADAS applications in an FPGA-GPU combined heterogeneous system. Taking two different lane detection applications as case studies, we provide one intra-accelerator and two inter-accelerator optimisation methods, as well as both FPGA-specific and application-oriented optimisation strategies, to boost the program runtime performance. Experimental results on a heterogeneous platform with COTS FPGA and GPU components reveal that the optimal designs generated from the procedure can improve the runtime performance of the two applications by an average of 109.21% and 83.48% over the native parallel implementations, respectively.

Exploring FPGA-GPU Heterogeneous Architecture for ADAS: Towards Performance and Energy

Liu

Huang

et al. 2017

This paper investigates the feasibility of using heterogeneous computing for future advanced driver assistance systems (ADAS) applications. In particular, we take a lane detection algorithm (LDA) as a test case. The algorithm is customized into FPGA-GPU heterogeneous implementations which can be executed in either a workload-constant or a workload-balanced scheme. The heterogeneous executions are then evaluated in terms of performance and energy consumption, and further compared with the single-accelerator run. Experiments show that the heterogeneous execution alleviates both the performance and energy bottlenecks that arise when only a single accelerator is used. Moreover, compared with the single FPGA execution, the workload-balanced scheme increases the performance by 236.9% and 42.9% on our two tested platforms respectively, while keeping the energy cost low.
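To put the percentage gains above in more familiar terms, the sketch below converts a relative performance increase into a speedup factor. It assumes the usual reading in which an increase of p% means the new throughput is (1 + p/100) times the baseline; the abstract does not state the exact metric, and the function name is made up for illustration.

```python
def speedup_from_percent_gain(percent_gain):
    """Convert a relative performance gain in percent to a speedup factor.

    Assumes gain = (new - old) / old * 100, so speedup = new / old.
    """
    return 1.0 + percent_gain / 100.0
```

Under that assumption, the reported gains of 236.9% and 42.9% correspond to roughly 3.37x and 1.43x the throughput of the single FPGA execution.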

Peak Temperature Minimization for Hard Real-Time Systems Using DVS and DPM

Zhou

Cheng

Dell'antonio

et al. 2019

J CIRCUIT SYST COMP

With increasing power densities, managing the on-chip temperature has become an important design challenge, especially for hard real-time systems. This paper addresses the problem of minimizing the peak temperature under hard real-time constraints using a combination of dynamic voltage scaling and dynamic power management. We derive a closed-form formulation for the peak temperature and provide a genetic-algorithm-based approach to solve the problem. Our approach is evaluated with both simulations and real measurements on an Intel i5 processor. The evaluation results demonstrate the effectiveness of the proposed approach compared to related work in the literature.