2020
DOI: 10.1109/access.2020.3039714
Head Network Distillation: Splitting Distilled Deep Neural Networks for Resource-Constrained Edge Computing Systems

Abstract: As the complexity of Deep Neural Network (DNN) models increases, their deployment on mobile devices becomes increasingly challenging, especially in complex vision tasks such as image classification. Many recent contributions aim either to produce compact models matching the limited computing capabilities of mobile devices or to offload the execution of such burdensome models to a compute-capable device at the network edge, i.e., the edge server. In this paper, we propose to modify the structure and training proce…
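The abstract above describes splitting a DNN between a mobile device and an edge server. As a rough illustration of that split-computing setup (not the paper's exact architecture or split point), the sketch below divides a stock torchvision ResNet-50 into a head executed on the device and a tail executed at the edge server; the layer grouping and split index are assumptions for illustration only.

```python
# A minimal split-computing sketch, assuming a stock torchvision ResNet-50 and
# an arbitrary split index; the grouping below is illustrative, not the paper's
# exact head/tail partition.
import torch
import torch.nn as nn
from torchvision.models import resnet50

model = resnet50(weights=None)          # torchvision >= 0.13 API
layers = list(model.children())         # conv1, bn1, relu, maxpool, layer1..4, avgpool, fc

head = nn.Sequential(*layers[:5])       # runs on the mobile device (conv1 .. layer1)
tail = nn.Sequential(*layers[5:-1],     # runs on the edge server (layer2 .. avgpool)
                     nn.Flatten(),
                     layers[-1])        # final fully connected classifier

x = torch.randn(1, 3, 224, 224)         # input image captured on the device
with torch.no_grad():
    z = head(x)                         # intermediate tensor transmitted to the server
    y = tail(z)                         # classification finished at the edge
print(z.shape, y.shape)                 # torch.Size([1, 256, 56, 56]) torch.Size([1, 1000])
```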

Cited by 66 publications (57 citation statements). References: 34 publications.
“…Unlike the research on vertical DI [26], [27], we have focused on the horizontal DI, because the sharing of the raw input in the vertical DI includes a critical privacy risk. In the horizontal DI literature, some works have addressed the achievement of low communication latency by optimizing the division point [5], [14], [15], leveraging multiple sink nodes [13], pruning the DNN model [14], quantizing the message [15]-[17], dimensional reduction of the message [18]-[20], and combining multiple inference tasks into a single one [16]. However, these works assumed a reliable communication link and aimed to reduce the communication payload size.…”
Section: Related Work (mentioning)
confidence: 99%
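Among the payload-reduction techniques listed in the excerpt above, quantizing the intermediate "message" is one concrete option. The sketch below is a generic min-max 8-bit quantizer, offered as an assumption-laden illustration of the idea, not any specific method from the cited works.

```python
# A generic min-max 8-bit quantizer for the intermediate "message", offered as
# an illustrative sketch of the payload-reduction idea, not any specific cited
# scheme. Shapes and values are made up.
import numpy as np

def quantize(z: np.ndarray):
    """Map a float32 tensor to uint8 plus the parameters needed to invert it."""
    z_min, z_max = float(z.min()), float(z.max())
    scale = max((z_max - z_min) / 255.0, 1e-8)
    q = np.round((z - z_min) / scale).astype(np.uint8)
    return q, scale, z_min

def dequantize(q: np.ndarray, scale: float, z_min: float) -> np.ndarray:
    return q.astype(np.float32) * scale + z_min

z = np.random.randn(1, 256, 56, 56).astype(np.float32)  # head output (example shape)
q, scale, z_min = quantize(z)                            # payload is 4x smaller than float32
z_hat = dequantize(q, scale, z_min)
print(q.nbytes / z.nbytes)                               # 0.25
```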
“…Most of the existing studies train the altered models from scratch [16], [18], [19]. Others reuse pretrained parameters in available architectures for the tail model, while re-designing and retraining the head portion to introduce a bottleneck [17], [21], [22]. These latter contributions introduce the notion of Head Network Distillation (HND) and Generalized HND (GHND), that use knowledge distillation in the training process.…”
Section: Related Work (mentioning)
confidence: 99%
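The excerpt above summarizes head network distillation: a re-designed head containing a bottleneck is retrained to mimic the intermediate output of the original pretrained head, so the unchanged tail can still consume it. The sketch below shows that idea with toy layer sizes, random data, and an MSE distillation loss; all dimensions and the loss choice are illustrative assumptions, not the exact HND/GHND recipe.

```python
# A toy head-network-distillation loop: a frozen "teacher" head provides the
# distillation target, and a narrower "student" head with a bottleneck is
# trained to reproduce it so the pretrained tail can be reused unchanged.
# All layer sizes, the MSE loss, and the training data are assumptions.
import torch
import torch.nn as nn

teacher_head = nn.Sequential(           # stand-in for the original pretrained head
    nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 256, 3, stride=2, padding=1), nn.ReLU(),
).eval()
for p in teacher_head.parameters():
    p.requires_grad_(False)

student_head = nn.Sequential(           # bottleneck head: far fewer channels to transmit
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 12, 3, stride=2, padding=1), nn.ReLU(),  # 12-channel bottleneck tensor
    nn.Conv2d(12, 256, 1),              # expansion back to the tail's expected input size
)                                       # (in practice the expansion can live server-side)

opt = torch.optim.Adam(student_head.parameters(), lr=1e-3)
for step in range(100):                 # toy loop on random images
    x = torch.randn(8, 3, 224, 224)
    with torch.no_grad():
        target = teacher_head(x)        # distillation target: teacher's intermediate output
    loss = nn.functional.mse_loss(student_head(x), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```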
“…• We apply BottleFit on cutting-edge CNNs such as DenseNet-169, DenseNet-201 and ResNet-152 on the ImageNet dataset, and compare the accuracy obtained by BottleFit with state-of-the-art local computing [6] and split computing approaches [16]-[19], [21], [24]. Our training campaign concludes that BottleFit achieves up to 77.1% data compression (with respect to JPEG) with only up to 0.6% loss in accuracy, while existing mobile and split computing approaches incur considerable accuracy loss of up to 6% and 3.6%, respectively.…”
Section: Introduction (mentioning)
confidence: 99%
“…create an ecosystem for achieving inference models with excellent performance. In this regard, many attempts have been reported to optimize the DNN models at edge devices [61], [62]. While communication load, communication overhead, cost, memory, processing speed, network bandwidth, jitter, and complexity are a few performance parameters, much of the preliminary research has focused on low-latency and energy-efficient computations.…”
Section: Task Parallelization (mentioning)
confidence: 99%