Proceedings of the ACM MobiHoc Workshop on Pervasive Systems in the IoT Era 2019
DOI: 10.1145/3331052.3332477
Distributed Deep Neural Network Deployment for Smart Devices from the Edge to the Cloud

Cited by 16 publications (17 citation statements)
References 13 publications
“…It comprehensively considers large-scale model partition and migration plans, reduces inference latency, and optimizes DNN real-time query performance. Chang-You Lin et al. [62] study the deployment of distributed DNNs under a limited completion time, addressing the deployment problem with respect to both response time and inference throughput. Dey et al. [63] realize a deep learning inference system on a robot vehicle built around a Raspberry Pi 3 and an Intel hardware accelerator, reducing inference latency and improving task efficiency.…”
Section: Total Inference Latency Minimization
confidence: 99%
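To illustrate the kind of trade-off described above for [62], jointly bounding response time and maximizing inference throughput, the following is a deliberately simplified sketch (made-up per-layer timings, not the cited paper's algorithm): it enumerates candidate split points of a layered DNN between device and cloud and keeps the feasible split with the best pipelined throughput.

```python
# Hypothetical per-layer timings (ms); none of these numbers come from the cited works.
device_ms = [4.0, 6.0, 9.0, 12.0]        # latency of each layer on the device
cloud_ms  = [1.0, 1.5, 2.0, 3.0]         # latency of each layer in the cloud
xfer_ms   = [8.0, 5.0, 3.0, 2.0, 0.5]    # cost of shipping the activation after layer i
BUDGET_MS = 30.0                         # completion-time (response-time) limit

best = None
for split in range(len(device_ms) + 1):  # split = number of layers kept on the device
    t_dev = sum(device_ms[:split])
    t_cloud = sum(cloud_ms[split:])
    response = t_dev + xfer_ms[split] + t_cloud
    # With pipelining, throughput is bounded by the slowest of the three stages.
    bottleneck = max(t_dev, xfer_ms[split], t_cloud)
    throughput = 1000.0 / bottleneck
    if response <= BUDGET_MS and (best is None or throughput > best[1]):
        best = (split, throughput, response)

print(best)  # (split point, inferences per second, response time in ms)
```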
“…Those benefits have been translated into significant performance improvements [43,67]. 8-bit precision inference is a de facto standard on embedded systems, since it can frequently match floating-point accuracy and is the lowest precision natively supported in computation [21,47]. Further, microcontrollers often do not contain floating-point units (FPUs), making floating-point computation prohibitively expensive [43].…”
Section: Accessibility of Edge ML
confidence: 99%
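As a concrete illustration of the 8-bit inference mentioned above, here is a minimal sketch using PyTorch post-training dynamic quantization; the framework choice and the toy model are assumptions for illustration, and the cited works may rely on different toolchains or on static or quantization-aware schemes.

```python
import torch
import torch.nn as nn

# Toy floating-point model standing in for an edge-deployed DNN.
model_fp32 = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)
model_fp32.eval()

# Post-training dynamic quantization: weights of the listed module types are
# stored as int8, and activations are quantized on the fly at inference time.
model_int8 = torch.quantization.quantize_dynamic(
    model_fp32, {nn.Linear}, dtype=torch.qint8
)

# Inference now uses int8 weight storage (roughly 4x smaller than float32),
# often with little accuracy loss, which matches the "de facto standard" point.
x = torch.randn(1, 128)
with torch.no_grad():
    y = model_int8(x)
print(y.shape)  # torch.Size([1, 10])
```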
“…While all the above knobs are readily available to machine learning researchers, it is not obvious how they interact with hardware configurations under a specific set of constraints, e.g., cost, latency, size, and user experience. While approaches like neural architecture search (NAS) can automate the search for feasible solutions, they are often targeted at larger models [20], constrained in scope [47,60], and rarely optimize the cost of the overall system. As a result, deploying efficient ML models on edge devices in a cost-aware fashion currently requires significant expertise, which puts it out of reach of a vast pool of potential developers.…”
Section: Introduction
confidence: 99%
“…This requires special training of the neural network; therefore, it cannot be used for pre-trained networks such as those considered in this article. Lin et al. [17] also consider a three-tier network and a DNN partitioned into stages.…”
Section: Related Work
confidence: 99%
“…Accordingly, several solutions have been recently proposed for task offloading [8][9][10][11], especially for accelerating deep neural network (DNN) inference (Section VI). A few of them operate only locally [15]; some split DNN computations between the local (or edge) network and the cloud [3,7]; and others leverage devices in a tiered network architecture [16,17]. In this context, the main challenge is deciding how to collaboratively partition and distribute computations under dynamic network conditions.…”
Section: Introduction
confidence: 99%
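To make the split-computation idea in the excerpt above concrete, the following is a minimal sketch of partitioning a model so that its first layers run on the device and the rest runs on a cloud worker; the model, the split point, and the function names are hypothetical and not taken from the cited systems, and a real deployment would move the intermediate tensor over the network (RPC, sockets, etc.) rather than through a local function call.

```python
import torch
import torch.nn as nn

# Toy model standing in for a DNN to be split across device and cloud.
full_model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),
)
full_model.eval()

SPLIT = 4                    # hypothetical partition point (layer index)
head = full_model[:SPLIT]    # runs on the device / edge node
tail = full_model[SPLIT:]    # runs on the cloud worker

def device_side(x: torch.Tensor) -> torch.Tensor:
    # Compute the head locally; the size of the returned intermediate tensor
    # drives the communication cost of this particular split.
    with torch.no_grad():
        return head(x)

def cloud_side(intermediate: torch.Tensor) -> torch.Tensor:
    # In a real system this runs on another machine, fed via RPC or sockets.
    with torch.no_grad():
        return tail(intermediate)

x = torch.randn(1, 3, 32, 32)
z = device_side(x)           # stand-in for "send z over the network"
y = cloud_side(z)
print(z.shape, y.shape)      # torch.Size([1, 32, 32, 32]) torch.Size([1, 10])
```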