2021
DOI: 10.1016/j.sysarc.2020.101896

Benchmarking vision kernels and neural network inference accelerators on embedded platforms

Abstract: Developing efficient embedded vision applications requires exploring various algorithmic optimization trade-offs and a broad spectrum of hardware architecture choices. This makes navigating the solution space and finding the design points with optimal performance trade-offs a challenge for developers. To help provide a fair baseline comparison, we conducted comprehensive benchmarks of accuracy, run-time, and energy efficiency of a wide range of vision kernels and neural networks on multiple embedded platforms:…

Citations: cited by 30 publications (10 citation statements)
References: 17 publications

“…Therefore, it is necessary to use the model and pre-trained weights together with the most appropriate software to work with the GPU. Studies were conducted on software environments such as TensorFlow, TensorFlow Lite, NVIDIA TensorRT, and OpenCV DNN Module [35]. These types of software frameworks allow real-time operation of models with 8-bit integer and 16-bit floating-point optimizations.…”
Section: Methods (mentioning)
confidence: 99%
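
The 8-bit integer and 16-bit floating-point optimizations mentioned in the statement above can be illustrated with a minimal sketch. It assumes simple symmetric per-tensor quantization and a plain half-precision round trip; the function names and sample weights are hypothetical, and the frameworks named in the quote use more elaborate schemes (calibration, per-channel scales) that are not reproduced here.

```python
import struct


def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from int8 codes."""
    return [q * scale for q in quantized]


def to_float16(value):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("e", struct.pack("e", value))[0]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.02, 1.0]  # hypothetical sample weights
    codes, scale = quantize_int8(weights)
    print("int8 codes:", codes)
    print("dequantized:", dequantize(codes, scale))
    print("float16 round trip of 0.1:", to_float16(0.1))
```

The int8 path stores one byte per value plus a shared scale, so the rounding error per value is bounded by half the scale; the float16 path keeps a floating-point format but with fewer mantissa bits.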
“…Table I shows the calculation results of different models, where ResNet-18, a common backbone network in computer vision, is used as a benchmark for comparison. Qasaimeh et al. [28] measured the performance of ResNet-18 on embedded platforms and showed that ResNet-18 could achieve 5.17 frames/s on ARM Cortex A57 CPU and 145 frames/s on Jetson TX2 GPU. Compared to ResNet-18, the AMagPoseNet has only 24% of its NPs and 0.98% of its computation (FLOPs).…”
Section: Table I NPs and FLOPs for Different Models (mentioning)
confidence: 99%
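
The two ResNet-18 throughput figures quoted above can be put side by side with a short worked calculation. This is a sketch under one stated assumption: frames are processed one at a time, so average per-frame latency is simply the reciprocal of the frame rate. The constant and function names are illustrative only.

```python
# Throughput figures quoted in the citation statement above (ResNet-18).
CPU_FPS = 5.17   # ARM Cortex A57 CPU
GPU_FPS = 145.0  # Jetson TX2 GPU


def latency_ms(fps: float) -> float:
    """Average time per frame in milliseconds, assuming latency = 1 / throughput."""
    return 1000.0 / fps


def speedup(fast_fps: float, slow_fps: float) -> float:
    """Throughput ratio of the faster platform over the slower one."""
    return fast_fps / slow_fps


if __name__ == "__main__":
    print(f"CPU latency: {latency_ms(CPU_FPS):.1f} ms/frame")
    print(f"GPU latency: {latency_ms(GPU_FPS):.1f} ms/frame")
    print(f"GPU speedup: {speedup(GPU_FPS, CPU_FPS):.1f}x")
```

Under that assumption the GPU figure corresponds to roughly 28 times the CPU throughput, or about 6.9 ms per frame against about 193 ms per frame.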
“…Reference [20] investigates the on-the-edge inference of DNNs in terms of latency, energy consumption, and temperature, on five different hardware platforms; unlike the proposed method, this work does not take advantage of the optimization frameworks we have investigated. In [21], an in-depth benchmark analysis of three embedded platforms is performed for image vision applications including MobileNet and InceptionV2; in [22], EDLAB is delivered, an end-to-end benchmark to evaluate the overall performance of three image classification and one object detection models across Intel NCS2, Edge TPU and Jetson Xavier NX. In [23], a performance analysis of the edge TPU board is provided for object classification.…”
Section: Related Work (mentioning)
confidence: 99%