2020
DOI: 10.1109/jiot.2020.2981684

PreVIous: A Methodology for Prediction of Visual Inference Performance on IoT Devices

Abstract: This paper presents PreVIous, a methodology to predict the performance of convolutional neural networks (CNNs) in terms of throughput and energy consumption on vision-enabled devices for the Internet of Things. CNNs typically constitute a massive computational load for such devices, which are characterized by scarce hardware resources to be shared among multiple concurrent tasks. Therefore, it is critical to select the optimal CNN architecture for a particular hardware platform according to prescribed applicat…
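The abstract above describes predicting a CNN's throughput and energy consumption on a target device. As a rough, hypothetical illustration only (the full methodology is not reproduced in this excerpt), the sketch below assumes per-layer time and energy estimates are already available and simply aggregates them into network-level figures; all names and numbers are made up.

def predict_network_performance(layer_estimates):
    """layer_estimates: list of (time_s, energy_j) tuples, one per CNN layer."""
    total_time_s = sum(t for t, _ in layer_estimates)
    total_energy_j = sum(e for _, e in layer_estimates)
    throughput_fps = 1.0 / total_time_s
    return throughput_fps, total_energy_j

# Hypothetical per-layer estimates for a small three-layer network:
estimates = [(0.012, 0.030), (0.025, 0.060), (0.004, 0.010)]
fps, joules = predict_network_performance(estimates)
print(f"predicted throughput: {fps:.1f} fps, energy per inference: {joules:.3f} J")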


Cited by 24 publications (13 citation statements)
References 59 publications
“…FastDeepIoT [18] uses execution time models based on linear model trees to predict the layer execution time on the devices Nexus 5 and Galaxy Nexus, and finally compresses VGGNet for both devices, reducing the neural network execution time by 48% to 78% and energy consumption by 37% to 69% compared with state-of-the-art compression algorithms. In PreVIous [19], the execution time models are based on linear regression, and for the devices Raspberry Pi 3 and Odroid-XU4 it reaches about 96% average accuracy for the layer-wise estimation. These results lead us to believe that the task of estimating layer execution times for task-optimized computing architectures is significantly more challenging than for CPUs.…”
Section: Related Work
Mentioning confidence: 99%
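As a concrete illustration of the layer-wise linear-regression modelling mentioned in the statement above, the following sketch fits a per-layer execution-time model from layer descriptors. The actual feature set, model structure, and measurements used by PreVIous are not reproduced here; everything below is synthetic.

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic per-layer characterization data for one target device.
# Columns: multiply-accumulate operations, input activations, output activations, parameters.
X = np.array([
    [1.2e8, 150_528, 802_816, 1_792],
    [9.2e8, 802_816, 401_408, 36_928],
    [3.7e9, 401_408, 200_704, 147_584],
])
y_ms = np.array([4.1, 18.7, 55.2])  # measured per-layer execution times (milliseconds)

# In practice one such model would be fitted per layer type (conv, fully connected, pooling, ...).
model = LinearRegression().fit(X, y_ms)

# Predict the execution time of an unseen layer from its descriptors.
new_layer = np.array([[1.8e9, 200_704, 200_704, 73_856]])
print(f"predicted layer time: {model.predict(new_layer)[0]:.1f} ms")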
“…As a result, there have been some recent attempts to predict network latency and performance on different hardware platforms. However, most of the work targets either server Graphics Processing Units (GPUs) [16], [17] or embedded Central Processing Units (CPUs) [18], [19], leaving out a wide range of hardware accelerators such as Field Programmable Gate Arrays (FPGAs) and hardware specifically designed for AI tasks, e.g., the Xilinx ZCU102 and Intel NCS2.…”
Section: Introduction
Mentioning confidence: 99%
“…However, it relies on the Nvidia System Management Interface (SMI), which is not available on Jetson platforms. PreVIous [12] presents a similar approach using linear regression models. It targets embedded CPU platforms such as a Raspberry Pi 3 and an Odroid-XU4 and reports an average error of 3.24% for the tested networks.…”
Section: Related Work
Mentioning confidence: 99%
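For illustration only: the 3.24% figure quoted above could correspond to a mean absolute percentage error over the tested networks. The exact metric is not specified in this excerpt, so the snippet below is an assumption, and the latency numbers are made up.

def mean_absolute_percentage_error(predicted_ms, measured_ms):
    """Average of |predicted - measured| / measured, expressed in percent."""
    errors = [abs(p - m) / m for p, m in zip(predicted_ms, measured_ms)]
    return 100.0 * sum(errors) / len(errors)

# Hypothetical predicted vs. measured whole-network latencies (milliseconds):
predicted = [812.0, 143.5, 95.2]
measured = [800.0, 148.0, 93.0]
print(f"average error: {mean_absolute_percentage_error(predicted, measured):.2f}%")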
“…To skip the time-consuming compiling step, DNN latency prediction techniques based on analytical or statistical models have been put forward. They target either large desktop-grade GPUs [10], [11] or embedded Central Processing Units (CPUs) [12], but not more powerful embedded devices. Methods like [10], designed for desktop GPUs, rely on the Nvidia System Management Interface (Nvidia SMI), which is not available on mobile GPUs.…”
Section: Introduction
Mentioning confidence: 99%