Proceedings of the ACM MobiHoc Workshop on Pervasive Systems in the IoT Era 2019
DOI: 10.1145/3331052.3332477
Distributed Deep Neural Network Deployment for Smart Devices from the Edge to the Cloud

Cited by 16 publications (17 citation statements)
References 13 publications
“…It comprehensively considers large-scale model partition and migration plans, reduces inference latency, and optimizes DNN real-time query performance. Chang-You Lin et al. [62] study the deployment of distributed DNNs under a limited completion time, addressing the deployment problem with respect to both response time and inference throughput. Dey et al. [63] realize a deep learning inference system on a robot vehicle built around a Raspberry Pi 3 and an Intel hardware accelerator, reducing inference latency and improving task efficiency.…”
Section: Total Inference Latency Minimization
confidence: 99%
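To illustrate the kind of trade-off described above for [62], jointly bounding response time and maximizing inference throughput, the following is a deliberately simplified sketch (made-up per-layer timings, not the cited paper's algorithm): it enumerates candidate split points of a layered DNN between device and cloud and keeps the feasible split with the best pipelined throughput.

```python
# Hypothetical per-layer timings (ms); none of these numbers come from the cited works.
device_ms = [4.0, 6.0, 9.0, 12.0]        # latency of each layer on the device
cloud_ms  = [1.0, 1.5, 2.0, 3.0]         # latency of each layer in the cloud
xfer_ms   = [8.0, 5.0, 3.0, 2.0, 0.5]    # cost of shipping the activation after layer i
BUDGET_MS = 30.0                         # completion-time (response-time) limit

best = None
for split in range(len(device_ms) + 1):  # split = number of layers kept on the device
    t_dev = sum(device_ms[:split])
    t_cloud = sum(cloud_ms[split:])
    response = t_dev + xfer_ms[split] + t_cloud
    # With pipelining, throughput is bounded by the slowest of the three stages.
    bottleneck = max(t_dev, xfer_ms[split], t_cloud)
    throughput = 1000.0 / bottleneck
    if response <= BUDGET_MS and (best is None or throughput > best[1]):
        best = (split, throughput, response)

print(best)  # (split point, inferences per second, response time in ms)
```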
“…Those benefits have been translated into significant performance improvements [43,67]. 8-bit precision inference is a de facto standard on embedded systems, since it can frequently match floating-point accuracy and is the lowest precision natively supported in computation [21,47]. Further, microcontrollers often do not contain floating-point units (FPUs), making floating-point computation prohibitively expensive [43].…”
Section: Accessibility of Edge ML
confidence: 99%
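As a concrete illustration of the 8-bit inference mentioned above, here is a minimal sketch using PyTorch post-training dynamic quantization; the framework choice and the toy model are assumptions for illustration, and the cited works may rely on different toolchains or on static or quantization-aware schemes.

```python
import torch
import torch.nn as nn

# Toy floating-point model standing in for an edge-deployed DNN.
model_fp32 = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)
model_fp32.eval()

# Post-training dynamic quantization: weights of the listed module types are
# stored as int8, and activations are quantized on the fly at inference time.
model_int8 = torch.quantization.quantize_dynamic(
    model_fp32, {nn.Linear}, dtype=torch.qint8
)

# Inference now uses int8 weight storage (roughly 4x smaller than float32),
# often with little accuracy loss, which matches the "de facto standard" point.
x = torch.randn(1, 128)
with torch.no_grad():
    y = model_int8(x)
print(y.shape)  # torch.Size([1, 10])
```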
“…While all the above knobs are readily available to machine learning researchers, it is not obvious how they interact with hardware configurations under a specific set of constraints, e.g., cost, latency, size, and user experience. While approaches like neural architecture search (NAS) can automate the search for feasible solutions, they are often targeted at larger models [20], constrained in scope [47,60], and rarely optimize the cost of the overall system. As a result, deploying efficient ML models on edge devices in a cost-aware fashion currently requires significant expertise, which puts it out of reach of a vast pool of potential developers.…”
Section: Introduction
confidence: 99%
“…This requires special training of the neural network; therefore, it cannot be used for pre-trained networks such as those considered in this article. Lin et al. [17] also consider a three-tier network and a DNN partitioned into stages.…”
Section: Related Work
confidence: 99%
“…Accordingly, several solutions have been recently proposed for task offloading [8][9][10][11], especially for accelerating deep neural network (DNN) inference (Section VI). A few of them operate only locally [15]; some split DNN computations between the local (or edge) network and the cloud [3,7]; and others leverage devices in a tiered network architecture [16,17]. In this context, the main challenge is deciding how to collaboratively partition and distribute computations under dynamic network conditions.…”
Section: Introduction
confidence: 99%
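To make the split-computation idea in the excerpt above concrete, the following is a minimal sketch of partitioning a model so that its first layers run on the device and the rest runs on a cloud worker; the model, the split point, and the function names are hypothetical and not taken from the cited systems, and a real deployment would move the intermediate tensor over the network (RPC, sockets, etc.) rather than through a local function call.

```python
import torch
import torch.nn as nn

# Toy model standing in for a DNN to be split across device and cloud.
full_model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),
)
full_model.eval()

SPLIT = 4                    # hypothetical partition point (layer index)
head = full_model[:SPLIT]    # runs on the device / edge node
tail = full_model[SPLIT:]    # runs on the cloud worker

def device_side(x: torch.Tensor) -> torch.Tensor:
    # Compute the head locally; the size of the returned intermediate tensor
    # drives the communication cost of this particular split.
    with torch.no_grad():
        return head(x)

def cloud_side(intermediate: torch.Tensor) -> torch.Tensor:
    # In a real system this runs on another machine, fed via RPC or sockets.
    with torch.no_grad():
        return tail(intermediate)

x = torch.randn(1, 3, 32, 32)
z = device_side(x)           # stand-in for "send z over the network"
y = cloud_side(z)
print(z.shape, y.shape)      # torch.Size([1, 32, 32, 32]) torch.Size([1, 10])
```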