2020
DOI: 10.1109/access.2020.3039714
Head Network Distillation: Splitting Distilled Deep Neural Networks for Resource-Constrained Edge Computing Systems

Abstract: As the complexity of Deep Neural Network (DNN) models increases, their deployment on mobile devices becomes increasingly challenging, especially in complex vision tasks such as image classification. Many recent contributions aim either to produce compact models matching the limited computing capabilities of mobile devices or to offload the execution of such burdensome models to a compute-capable device at the network edge, i.e., the edge server. In this paper, we propose to modify the structure and training proce…
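The abstract above describes splitting a DNN between a mobile device and an edge server. As a rough illustration of that split-computing setup (not the paper's exact architecture or split point), the sketch below divides a stock torchvision ResNet-50 into a head executed on the device and a tail executed at the edge server; the layer grouping and split index are assumptions for illustration only.

```python
# A minimal split-computing sketch, assuming a stock torchvision ResNet-50 and
# an arbitrary split index; the grouping below is illustrative, not the paper's
# exact head/tail partition.
import torch
import torch.nn as nn
from torchvision.models import resnet50

model = resnet50(weights=None)          # torchvision >= 0.13 API
layers = list(model.children())         # conv1, bn1, relu, maxpool, layer1..4, avgpool, fc

head = nn.Sequential(*layers[:5])       # runs on the mobile device (conv1 .. layer1)
tail = nn.Sequential(*layers[5:-1],     # runs on the edge server (layer2 .. avgpool)
                     nn.Flatten(),
                     layers[-1])        # final fully connected classifier

x = torch.randn(1, 3, 224, 224)         # input image captured on the device
with torch.no_grad():
    z = head(x)                         # intermediate tensor transmitted to the server
    y = tail(z)                         # classification finished at the edge
print(z.shape, y.shape)                 # torch.Size([1, 256, 56, 56]) torch.Size([1, 1000])
```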

Cited by 66 publications (57 citation statements). References: 34 publications.
“…Unlike the research on vertical DI [26], [27], we have focused on the horizontal DI, because the sharing of the raw input in the vertical DI includes a critical privacy risk. In the horizontal DI literature, some works have addressed the achievement of low communication latency by optimizing the division point [5], [14], [15], leveraging multiple sink nodes [13], pruning the DNN model [14], quantizing the message [15]-[17], dimensional reduction of the message [18]-[20], and combining multiple inference tasks into a single one [16]. However, these works assumed a reliable communication link and aimed to reduce the communication payload size.…”
Section: Related Work (mentioning)
confidence: 99%
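Among the payload-reduction techniques listed in the excerpt above, quantizing the intermediate "message" is one concrete option. The sketch below is a generic min-max 8-bit quantizer, offered as an assumption-laden illustration of the idea, not any specific method from the cited works.

```python
# A generic min-max 8-bit quantizer for the intermediate "message", offered as
# an illustrative sketch of the payload-reduction idea, not any specific cited
# scheme. Shapes and values are made up.
import numpy as np

def quantize(z: np.ndarray):
    """Map a float32 tensor to uint8 plus the parameters needed to invert it."""
    z_min, z_max = float(z.min()), float(z.max())
    scale = max((z_max - z_min) / 255.0, 1e-8)
    q = np.round((z - z_min) / scale).astype(np.uint8)
    return q, scale, z_min

def dequantize(q: np.ndarray, scale: float, z_min: float) -> np.ndarray:
    return q.astype(np.float32) * scale + z_min

z = np.random.randn(1, 256, 56, 56).astype(np.float32)  # head output (example shape)
q, scale, z_min = quantize(z)                            # payload is 4x smaller than float32
z_hat = dequantize(q, scale, z_min)
print(q.nbytes / z.nbytes)                               # 0.25
```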
“…Most of the existing studies train the altered models from scratch [16], [18], [19]. Others reuse pretrained parameters in available architectures for the tail model, while re-designing and retraining the head portion to introduce a bottleneck [17], [21], [22]. These latter contributions introduce the notion of Head Network Distillation (HND) and Generalized HND (GHND), that use knowledge distillation in the training process.…”
Section: Related Work (mentioning)
confidence: 99%
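The excerpt above summarizes head network distillation: a re-designed head containing a bottleneck is retrained to mimic the intermediate output of the original pretrained head, so the unchanged tail can still consume it. The sketch below shows that idea with toy layer sizes, random data, and an MSE distillation loss; all dimensions and the loss choice are illustrative assumptions, not the exact HND/GHND recipe.

```python
# A toy head-network-distillation loop: a frozen "teacher" head provides the
# distillation target, and a narrower "student" head with a bottleneck is
# trained to reproduce it so the pretrained tail can be reused unchanged.
# All layer sizes, the MSE loss, and the training data are assumptions.
import torch
import torch.nn as nn

teacher_head = nn.Sequential(           # stand-in for the original pretrained head
    nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 256, 3, stride=2, padding=1), nn.ReLU(),
).eval()
for p in teacher_head.parameters():
    p.requires_grad_(False)

student_head = nn.Sequential(           # bottleneck head: far fewer channels to transmit
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 12, 3, stride=2, padding=1), nn.ReLU(),  # 12-channel bottleneck tensor
    nn.Conv2d(12, 256, 1),              # expansion back to the tail's expected input size
)                                       # (in practice the expansion can live server-side)

opt = torch.optim.Adam(student_head.parameters(), lr=1e-3)
for step in range(100):                 # toy loop on random images
    x = torch.randn(8, 3, 224, 224)
    with torch.no_grad():
        target = teacher_head(x)        # distillation target: teacher's intermediate output
    loss = nn.functional.mse_loss(student_head(x), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```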
“…• We apply BottleFit on cutting-edge CNNs such as DenseNet-169, DenseNet-201 and ResNet-152 on the ImageNet dataset, and compare the accuracy obtained by BottleFit with state-of-the-art local computing [6] and split computing approaches [16]-[19], [21], [24]. Our training campaign concludes that BottleFit achieves up to 77.1% data compression (with respect to JPEG) with only up to 0.6% loss in accuracy, while existing mobile and split computing approaches incur considerable accuracy loss of up to 6% and 3.6%, respectively.…”
Section: Introduction (mentioning)
confidence: 99%
“…create an ecosystem for achieving inference models with excellent performance. In this regard, many attempts have been reported to optimize the DNN models at edge devices [61], [62]. While communication load, communication overhead, cost, memory, processing speed, network bandwidth, jitter, and complexity are a few performance parameters, much of the preliminary research has focused on low-latency and energy-efficient computations.…”
Section: Task Parallelization (mentioning)
confidence: 99%