The success of research using convolutional neural network (CNN)-based camera sensor processing for autonomous driving has accelerated the development of autonomous vehicles. Since autonomous driving algorithms require high-performance computing for fast and accurate perception, heterogeneous embedded platforms consisting of a graphics processing unit (GPU) and a power-efficient dedicated deep learning accelerator (DLA) have been developed to efficiently run deep learning algorithms in resource-constrained hardware environments. However, because the hardware utilization of these platforms remains low, the performance differences, such as processing speed and power efficiency, between the heterogeneous platform and an embedded platform with only GPUs remain insignificant. To address this problem, this paper proposes an optimization technique that fully utilizes the available hardware resources in heterogeneous embedded platforms through parallel processing on the DLA and GPU. Our power-efficient network inference method improves processing speed without sacrificing accuracy, based on an analysis of the problems encountered when dividing networks between the DLA and GPU for parallel processing. Moreover, the broad compatibility of the proposed method is demonstrated by applying it to various CNN-based object detectors. The experimental results show that the proposed method increases the processing speed by 77.8%, 75.6%, and 55.2% and improves the power efficiency by 84%, 75.9%, and 62.3% on the YOLOv3, SSD, and YOLOv5 networks, respectively, without an accuracy penalty.

INDEX TERMS Autonomous vehicle, convolutional neural network, embedded platform, low-power design, parallel processing, real-time system.
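The core idea of dividing a network between two accelerators so that both stay busy can be illustrated with a minimal pipeline sketch. This is not the paper's implementation (which targets a real DLA/GPU platform); the `dla_stage` and `gpu_stage` functions below are hypothetical placeholders for the two network partitions, and threads with a bounded queue stand in for the hardware's concurrent execution streams.

```python
import threading
import queue

def dla_stage(frame):
    # Placeholder for the front portion of the network assigned to the DLA.
    return [v * 2 for v in frame]

def gpu_stage(features):
    # Placeholder for the back portion of the network assigned to the GPU.
    return sum(features)

def pipelined_inference(frames):
    """Process frames through two stages running in parallel, so the
    'DLA' works on frame k+1 while the 'GPU' finishes frame k."""
    buf = queue.Queue(maxsize=2)  # bounded buffer between accelerators
    results = []

    def producer():
        for f in frames:
            buf.put(dla_stage(f))
        buf.put(None)  # sentinel: no more frames

    def consumer():
        while True:
            item = buf.get()
            if item is None:
                break
            results.append(gpu_stage(item))

    t_dla = threading.Thread(target=producer)
    t_gpu = threading.Thread(target=consumer)
    t_dla.start(); t_gpu.start()
    t_dla.join(); t_gpu.join()
    return results
```

The bounded queue models the key property the paper exploits: with the network split across two devices, successive frames overlap in time instead of serializing on a single accelerator.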
Several unmanned retail stores have been introduced with the development of sensor, wireless communication, and computer vision technologies. A vision-based kiosk equipped with only a vision sensor has significant advantages such as compactness and low implementation cost. Using convolutional neural network (CNN)-based object detectors, the kiosk recognizes an object when a customer picks up a product. In retail object recognition, the key challenges are the limited number of detections and high interclass similarity. In this study, these challenges are addressed by utilizing the "view-specific" features of an object; specifically, an object class is divided into multiple "view-based" subclasses, and the object detectors are trained on these data. Further, a "view-aware feature" is defined by aggregating subclass detection results from multiple cameras. A superclass classifier predicts the superclass by utilizing informative subclass detection results that distinguish the target object from other similar-looking objects. To verify the effectiveness of the proposed approach, a prototype of the vision-based unmanned kiosk system is implemented. Experimental results indicate that the proposed method outperforms the conventional method, even on a state-of-the-art detection network. The dataset used in this study has been made available on IEEE DataPort for reproducibility.
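The aggregation step described above can be sketched in a few lines. This is a simplified stand-in for the paper's superclass classifier: the subclass names, the mapping table, and the sum-then-argmax decision rule are all illustrative assumptions, not the authors' actual model. It only shows the general shape of the idea: per-camera subclass detections are pooled into per-superclass scores, and the superclass with the strongest aggregated evidence wins.

```python
from collections import defaultdict

# Hypothetical mapping from view-based subclasses to product superclasses.
SUBCLASS_TO_SUPER = {
    "cola_front": "cola",
    "cola_top": "cola",
    "cider_front": "cider",
    "cider_top": "cider",
}

def aggregate_views(detections):
    """Aggregate (subclass, confidence) detections from multiple cameras
    into per-superclass scores - a toy 'view-aware feature'."""
    scores = defaultdict(float)
    for subclass, conf in detections:
        scores[SUBCLASS_TO_SUPER[subclass]] += conf
    return dict(scores)

def predict_superclass(detections):
    """Pick the superclass with the highest aggregated score."""
    scores = aggregate_views(detections)
    return max(scores, key=scores.get)
```

For example, if one camera sees `cola_front` at 0.9, another sees `cola_top` at 0.8, and a third weakly detects `cider_front` at 0.3, the aggregated scores favor `cola`; a distinctive view seen by any single camera can disambiguate two similar-looking products.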