2018
DOI: 10.1007/978-3-030-01418-6_40
Auto-tuning Neural Network Quantization Framework for Collaborative Inference Between the Cloud and Edge

Abstract: Recently, deep neural networks (DNNs) have been widely applied in mobile intelligent applications. Inference for these DNNs is usually performed in the cloud. However, this leads to a large overhead of transmitting data over the wireless network. In this paper, we demonstrate the advantages of cloud-edge collaborative inference with quantization. By analyzing the characteristics of the layers in DNNs, an auto-tuning neural network quantization framework for collaborative inference is proposed. We study the effectiv…
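The collaborative scheme the abstract outlines computes part of the network on the edge, quantizes the intermediate feature map, and sends it to the cloud to cut the wireless transmission overhead. A minimal sketch of that idea, assuming uniform 8-bit quantization with NumPy (the tensor shape, function names, and bit-width here are illustrative, not taken from the paper):

```python
import numpy as np

def quantize_uint8(x):
    """Linearly map a float tensor onto [0, 255] for cheap transmission."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, scale, lo  # parameters needed to invert the mapping

def dequantize(q, scale, lo):
    """Recover an approximate float tensor on the cloud side."""
    return q.astype(np.float32) * scale + lo

# A feature map produced at the partition point on the edge device.
feat = np.random.randn(1, 64, 28, 28).astype(np.float32)
q, scale, lo = quantize_uint8(feat)      # 4x fewer bytes than float32
restored = dequantize(q, scale, lo)      # cloud side resumes inference
print(q.nbytes, feat.nbytes)             # 50176 vs. 200704 bytes
```

Here a float32 feature map shrinks 4x on the wire; per the abstract, the paper's auto-tuning framework goes further and chooses the quantization based on its analysis of layer characteristics.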

Cited by 63 publications (29 citation statements)
References 12 publications
“…However, the accuracy degradation of such an approach is not evaluated, and insufficient details are provided in the paper to replicate the framework. Li et al. [28] propose a framework that applies a partitioning and quantization strategy to a neural network to reduce the total delay. However, the compression gain is limited, as aggressive quantization will inevitably degrade accuracy.…”
Section: Related Work
confidence: 99%
“…Building on the work of Kang et al. [10], recent contributions propose DNN splitting methods [2,3,8,11,16,24,26]. Most of these studies, however, (I) do not evaluate models using their proposed lossy compression techniques [2], (II) lack motivation to split the models, as the size of the input data is exceedingly small, e.g., 32 × 32 pixel RGB images in [8,24,26], (III) specifically select models and network conditions in which their proposed method is advantageous [11], and/or (IV) assess the proposed models on simple classification tasks such as the miniImageNet, Caltech 101, and CIFAR-10 and -100 datasets [3,8,16,24].…”
Section: Split Computing
confidence: 99%
“…Recently, an intermediate option, namely split Deep Neural Network (DNN) inference or split computing, has been attracting considerable interest [2,3,8,10,11,16,24,26]. Many such methods literally split DNN models into head and tail portions, which are executed by the mobile device and the edge computer, respectively.…”
Section: Introduction
confidence: 99%
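A minimal sketch of the head/tail split this statement describes, assuming a PyTorch sequential model and an arbitrary partition index (the toy model and split point are illustrative; the cited studies split real networks at points chosen by a partitioning policy):

```python
import torch
import torch.nn as nn

# Toy stand-in for a DNN; any sequential stack illustrates the idea.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
)

split = 4  # illustrative partition point chosen by the splitting policy
head = nn.Sequential(*list(model.children())[:split])  # runs on the mobile device
tail = nn.Sequential(*list(model.children())[split:])  # runs on the edge computer

x = torch.randn(1, 3, 32, 32)
intermediate = head(x)    # this tensor is what gets transmitted
out = tail(intermediate)  # the edge side completes the inference
assert torch.allclose(out, model(x))  # the split is functionally identical
```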
“…Some works also explore model compression based on partitioned DNNs. For example, [55] proposes an auto-tuning neural network quantization framework for collaborative inference between the edge and the cloud. First, the DNN is partitioned.…”
Section: AI on Edge
confidence: 99%
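The statement above summarizes [55] as first partitioning the DNN and then tuning the quantization applied to the transmitted data. A rough sketch of what such an auto-tuning loop might look like, assuming a search over bit-widths under an accuracy-drop budget on a validation set; the procedure and all names here are an illustration, not the paper's actual algorithm:

```python
import numpy as np

def quantize(x, bits):
    """Simulate uniform quantization of a tensor to the given bit-width."""
    levels = 2 ** bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    return np.round((x - lo) / scale) * scale + lo

def auto_tune(run_head, run_tail, val_inputs, val_labels, acc_budget=0.01):
    """Pick the smallest bit-width whose accuracy drop stays within budget.

    run_head / run_tail stand in for the two halves of the partitioned DNN;
    the real framework would also search over candidate partition points.
    """
    def accuracy(bits=None):
        correct = 0
        for x, y in zip(val_inputs, val_labels):
            feat = run_head(x)
            if bits is not None:
                feat = quantize(feat, bits)
            correct += int(run_tail(feat).argmax() == y)
        return correct / len(val_labels)

    baseline = accuracy()
    for bits in (2, 4, 8, 16):  # try the most aggressive setting first
        if baseline - accuracy(bits) <= acc_budget:
            return bits
    return None  # nothing met the budget; fall back to full precision
```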