2018
DOI: 10.1007/978-3-030-01418-6_40
Auto-tuning Neural Network Quantization Framework for Collaborative Inference Between the Cloud and Edge

Abstract: Recently, deep neural networks (DNNs) have been widely applied in mobile intelligent applications. Inference for these DNNs is usually performed in the cloud. However, this leads to a large overhead of transmitting data over the wireless network. In this paper, we demonstrate the advantages of cloud-edge collaborative inference with quantization. By analyzing the characteristics of the layers in DNNs, an auto-tuning neural network quantization framework for collaborative inference is proposed. We study the effectiv…
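The collaborative scheme the abstract outlines computes part of the network on the edge, quantizes the intermediate feature map, and sends it to the cloud to cut the wireless transmission overhead. A minimal sketch of that idea, assuming uniform 8-bit quantization with NumPy (the tensor shape, function names, and bit-width here are illustrative, not taken from the paper):

```python
import numpy as np

def quantize_uint8(x):
    """Linearly map a float tensor onto [0, 255] for cheap transmission."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, scale, lo  # parameters needed to invert the mapping

def dequantize(q, scale, lo):
    """Recover an approximate float tensor on the cloud side."""
    return q.astype(np.float32) * scale + lo

# A feature map produced at the partition point on the edge device.
feat = np.random.randn(1, 64, 28, 28).astype(np.float32)
q, scale, lo = quantize_uint8(feat)      # 4x fewer bytes than float32
restored = dequantize(q, scale, lo)      # cloud side resumes inference
print(q.nbytes, feat.nbytes)             # 50176 vs. 200704 bytes
```

Here a float32 feature map shrinks 4x on the wire; per the abstract, the paper's auto-tuning framework goes further and chooses the quantization based on its analysis of layer characteristics.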

Cited by 63 publications (29 citation statements)
References 12 publications
“…However, the accuracy degradation of such an approach is not evaluated, and insufficient details are provided in the paper to replicate the framework. Li et al. [28] propose a framework that applies a partitioning and quantization strategy to a neural network to reduce the total delay. However, the compression gain is limited, as aggressive quantization will inevitably degrade accuracy.…”
Section: Related Work
confidence: 99%
“…Building on the work of Kang et al. [10], recent contributions propose DNN splitting methods [2,3,8,11,16,24,26]. Most of these studies, however, (I) do not evaluate models using their proposed lossy compression techniques [2], (II) lack motivation to split the models, as the size of the input data is exceedingly small, e.g., 32 × 32 pixel RGB images in [8,24,26], (III) specifically select models and network conditions in which their proposed method is advantageous [11], and/or (IV) assess the proposed models on simple classification tasks such as the miniImageNet, Caltech 101, and CIFAR-10 and -100 datasets [3,8,16,24].…”
Section: Split Computing
confidence: 99%
“…Recently, an intermediate option, namely split Deep Neural Network (DNN) inference or split computing, has been attracting considerable interest [2,3,8,10,11,16,24,26]. Many such methods literally split DNN models into head and tail portions, which are executed by the mobile device and the edge computer, respectively.…”
Section: Introduction
confidence: 99%
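A minimal sketch of the head/tail split this statement describes, assuming a PyTorch sequential model and an arbitrary partition index (the toy model and split point are illustrative; the cited studies split real networks at points chosen by a partitioning policy):

```python
import torch
import torch.nn as nn

# Toy stand-in for a DNN; any sequential stack illustrates the idea.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
)

split = 4  # illustrative partition point chosen by the splitting policy
head = nn.Sequential(*list(model.children())[:split])  # runs on the mobile device
tail = nn.Sequential(*list(model.children())[split:])  # runs on the edge computer

x = torch.randn(1, 3, 32, 32)
intermediate = head(x)    # this tensor is what gets transmitted
out = tail(intermediate)  # the edge side completes the inference
assert torch.allclose(out, model(x))  # the split is functionally identical
```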
“…Some works also explore model compression based on partitioned DNNs. For example, [55] proposes an auto-tuning neural network quantization framework for collaborative inference between the edge and the cloud. First, the DNN is partitioned.…”
Section: AI on Edge
confidence: 99%
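The statement above summarizes [55] as first partitioning the DNN and then tuning the quantization applied to the transmitted data. A rough sketch of what such an auto-tuning loop might look like, assuming a search over bit-widths under an accuracy-drop budget on a validation set; the procedure and all names here are an illustration, not the paper's actual algorithm:

```python
import numpy as np

def quantize(x, bits):
    """Simulate uniform quantization of a tensor to the given bit-width."""
    levels = 2 ** bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    return np.round((x - lo) / scale) * scale + lo

def auto_tune(run_head, run_tail, val_inputs, val_labels, acc_budget=0.01):
    """Pick the smallest bit-width whose accuracy drop stays within budget.

    run_head / run_tail stand in for the two halves of the partitioned DNN;
    the real framework would also search over candidate partition points.
    """
    def accuracy(bits=None):
        correct = 0
        for x, y in zip(val_inputs, val_labels):
            feat = run_head(x)
            if bits is not None:
                feat = quantize(feat, bits)
            correct += int(run_tail(feat).argmax() == y)
        return correct / len(val_labels)

    baseline = accuracy()
    for bits in (2, 4, 8, 16):  # try the most aggressive setting first
        if baseline - accuracy(bits) <= acc_budget:
            return bits
    return None  # nothing met the budget; fall back to full precision
```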