Darknet on OpenCL: A Multi-platform Tool for Object Detection and Classification

Sowa, Piotr; Izydorczyk, Jacek

doi:10.20944/preprints202007.0506.v1

Cited by 4 publications

(9 citation statements)

References 29 publications


“…To implement our proposed method, we modified the Darknet-OpenCL framework [43], which is an open-source framework ported from Darknet [44] using CUDA [45] to OpenCL [46]. To accelerate matrix multiplication, which takes up most of the time in deep learning operations, we modified the basic linear algebra subprograms (BLAS) library of Darknet-OpenCL, OpenBLAS [47] for CPU, and CLBlast [48] for GPU.…”

Section: Methods (mentioning)

confidence: 99%


Accelerating On-Device Learning with Layer-Wise Processor Selection Method on Unified Memory

Kim, Moon, et al. 2021. Sensors

Recent studies have applied the superior performance of deep learning to mobile devices, and these studies have enabled the running of the deep learning model on a mobile device with limited computing power. However, there is performance degradation of the deep learning model when it is deployed in mobile devices, due to the different sensors of each device. To solve this issue, it is necessary to train a network model specific to each mobile device. Therefore, herein, we propose an acceleration method for on-device learning to mitigate the device heterogeneity. The proposed method efficiently utilizes unified memory for reducing the latency of data transfer during network model training. In addition, we propose the layer-wise processor selection method to consider the latency generated by the difference in the processor performing the forward propagation step and the backpropagation step in the same layer. The experiments were performed on an ODROID-XU4 with the ResNet-18 model, and the experimental results indicate that the proposed method reduces the latency by at most 28.4% compared to the central processing unit (CPU) and at most 21.8% compared to the graphics processing unit (GPU). Through experiments using various batch sizes to measure the average power consumption, we confirmed that device heterogeneity is alleviated by performing on-device learning using the proposed method.
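
The citation statement from this paper notes that matrix multiplication takes up most of the time in deep learning operations, which is why the BLAS layer is the natural place to optimize. As a rough illustration of the operation being accelerated, here is a minimal sketch of a row-major general matrix multiply (GEMM). The function name `gemm_nn`, its parameter names, and the flat-list layout are assumptions chosen for illustration; this is not the code of Darknet-OpenCL, OpenBLAS, or CLBlast.

```python
def gemm_nn(M, N, K, alpha, A, lda, B, ldb, C, ldc):
    """Accumulate C += alpha * A @ B for row-major flat lists.

    A is M x K with row stride lda, B is K x N with row stride ldb,
    and C is M x N with row stride ldc. Illustrative sketch only.
    """
    for i in range(M):
        for k in range(K):
            # Hoist the scaled A element out of the innermost loop.
            a_part = alpha * A[i * lda + k]
            for j in range(N):
                C[i * ldc + j] += a_part * B[k * ldb + j]
    return C


if __name__ == "__main__":
    A = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]      # 2 x 3
    B = [7.0, 8.0, 9.0, 10.0, 11.0, 12.0]   # 3 x 2
    C = [0.0] * 4                           # 2 x 2 accumulator
    print(gemm_nn(2, 2, 3, 1.0, A, 3, B, 2, C, 2))
```

The triple loop performs M x N x K multiply-adds, which suggests why an optimized CPU or GPU BLAS routine is preferred over a plain loop for large layers.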


“…The number of epochs was set to 300. Our experiments were conducted using the Darknet-OpenCL [43] framework.…”

Section: Methods (mentioning)

confidence: 99%

Accelerating On-Device Learning with Layer-Wise Processor Selection Method on Unified Memory

Kim, Moon, et al. 2021. Sensors

“…[Figure 2: Multi-GPU computing monitor state example] [Figure 3: YOLO2 training process step example] It works very well and improves execution in the most nested loop by a number of tuning value that is computed dynamically by dividing the "filters" variable by "4". The last important information is that parameter "t" cannot be correctly checked in the conditions or printed out.…”

Section: Listing 2: Cpu Run Time Assert Protection Example (mentioning)

confidence: 99%

Darknet on OpenCL: A multiplatform tool for object detection and classification

Sowa, Izydorczyk 2022. Concurrency and Computation (self-citation)

The goal of this article is to overview the challenges and problems on the way from state-of-the-art CUDA-accelerated neural network code to multi-GPU code. For this purpose, the authors describe the journey of porting an existing, fully featured CUDA-accelerated Darknet engine, available on GitHub, to OpenCL. This article presents the lessons learned and the techniques that were put in place for this porting. There are few other implementations on GitHub that leverage the OpenCL standard, and a few have tried to port Darknet as well. Darknet is a well-known convolutional neural network (CNN) framework. The authors of this article investigated all aspects of porting and achieved a fully featured Darknet engine on OpenCL. The effort was not focused only on classification using the YOLO1, YOLO2, YOLO3, and YOLO4 CNN models. Other aspects were also covered, such as training neural networks and benchmarks to identify weak points in the implementation. Compared with the standard CPU version, the GPU computing code substantially improves the Darknet computing time by using underutilized hardware in existing systems. If the system is OpenCL-based, it is practically hardware-independent. The authors also improved the CUDA version as Darknet-vNext.


“…Therefore, an OpenCL-based GPU-accelerated library needs to be linked for deep learning frameworks to be efficiently executed in embedded systems. OpenCL Caffe [40], DeepCL [41], TensorFlow Lite [42], and Darknet on OpenCL [43] are deep learning frameworks that support OpenCL-based GPU-accelerated libraries at present.…”

Section: Deep Learning Framework (mentioning)

confidence: 99%

CitiusSynapse: A Deep Learning Framework for Embedded Systems

2021


As embedded systems, such as smartphones with limited resources, have become increasingly popular, active research has recently been conducted on performing on-device deep learning in such systems. Therefore, in this study, we propose a deep learning framework that is specialized for embedded systems with limited resources, the operation processing structure of which differs from that of standard PCs. The proposed framework supports an OpenCL-based accelerator engine for accelerating deep learning operations in various embedded systems. Moreover, the parallel processing performance of OpenCL is maximized through an OpenCL kernel that is optimized for embedded GPUs and for the structural characteristics of embedded systems, such as unified memory. Furthermore, an on-device optimizer for optimizing the performance in on-device environments, and model converters for compatibility with conventional frameworks, are provided. The results of a performance evaluation show that the proposed on-device framework outperformed conventional methods.

