2018
DOI: 10.1109/tcad.2018.2858384
DeepThings: Distributed Adaptive Deep Learning Inference on Resource-Constrained IoT Edge Clusters

Cited by 387 publications (223 citation statements). References 12 publications.
“…The follow-up work Deep Compression [71], which blends the advantages of pruning, weight sharing, and Huffman coding to compress DNNs, further pushes the compression ratio to 35-49x. However, for energy-constrained end devices, the above magnitude-based weight pruning method may not be directly applicable, since empirical measurements show that reducing the number of weights does not necessarily translate into significant energy savings [72]. This is because, for DNNs such as AlexNet, the convolutional layers dominate the total energy cost, while the fully-connected layers contribute most of the total number of weights in the DNN. This suggests that the number of weights may not be a good indicator of energy, and that weight pruning should be directly energy-aware for end devices. An accompanying table summarizes the enabling techniques:

- Model Partition: computation offloading to the edge server or mobile devices; latency- and energy-oriented optimization [10], [78]-[86]
- Model Early-Exit: partial DNN model inference; accuracy-aware [10], [15], [78], [87]-[91]
- Edge Caching: fast response by reusing previous results of the same task [92]-[96]
- Input Filtering: detecting differences between inputs to avoid redundant computation [97]-[101]
- Model Selection: input-oriented optimization; accuracy-aware [102]-[106]
- Support for Multi-Tenancy: scheduling multiple DNN-based tasks; resource-efficient [38], [104], [107]-[111]
- Application-specific Optimization: optimizations for a specific DNN-based application; resource-efficient [104], [112]…”
Section: Enabling Technologies
confidence: 99%
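To make the pruning under discussion concrete, here is a minimal sketch of magnitude-based weight pruning in NumPy. The function name, threshold rule, and layer size are illustrative assumptions, not the exact procedure of [71] or the energy model of [72].

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries until roughly
    `sparsity` of the tensor is zero (illustrative threshold rule)."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

# An AlexNet-like FC layer holds most of the parameters, so pruning it
# cuts the weight count sharply -- yet, as the passage notes, the conv
# layers still dominate energy, so the energy saving may not follow.
fc = np.random.randn(4096, 4096).astype(np.float32)
pruned = magnitude_prune(fc, sparsity=0.9)
print("nonzero fraction:", np.count_nonzero(pruned) / pruned.size)
```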
“…The table compares compression and distribution approaches against this work:
- Quantization [8, 12]: × ×
- Pruning [4, 9, 13, 23]: × ×
- Separable convolutions [3, 19, 26]: × ×
- KD [1, 6, 25]: × × ×
- SplitNet [10]: × ×
- MoDNN, DeepThings [14, 28]: ×
- Proposed NoNN: … in per-node energy w.r.t. teacher.…”
Section: Area Model Communication-Distributed Complements Compression
confidence: 99%
“…Finally, Zhao, Barijough, and Gerstlauer [32] proposed DeepThings, a framework that distributes inference to resource-constrained IoT edge devices by partitioning along the neural network data flow. However, they used a small number of devices with large amounts of memory, avoiding more constrained devices such as the ones used in this work.…”
Section: Machine Learning and IoT Tools
confidence: 99%
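As a rough illustration of the data-flow partitioning that DeepThings is described as performing, the sketch below splits an input feature map into grid tiles with overlapping halos so each edge device can run the early convolutional layers on its tile independently. The grid size, halo width, and function name are illustrative assumptions, not the paper's actual Fused Tile Partitioning implementation.

```python
def tile_with_halo(height, width, grid, halo):
    """Split a height x width feature map into grid x grid tiles,
    each enlarged by `halo` pixels per side so a stack of conv
    layers can be evaluated on the tile without neighbor data.
    In DeepThings-style partitioning the halo would be derived
    from the fused layers' kernel sizes and strides; here it is
    simply a parameter (an assumption of this sketch)."""
    tile_h, tile_w = height // grid, width // grid
    regions = []
    for i in range(grid):
        for j in range(grid):
            top = max(i * tile_h - halo, 0)
            bottom = min((i + 1) * tile_h + halo, height)
            left = max(j * tile_w - halo, 0)
            right = min((j + 1) * tile_w + halo, width)
            regions.append((top, bottom, left, right))
    return regions

# Example: a 224x224 input split 2x2 with an 8-pixel halo; each
# region could be dispatched to a different IoT edge device.
for region in tile_with_halo(224, 224, grid=2, halo=8):
    print(region)
```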
“…Tool comparison:
- DeepThings [32]: No / No / No; partitioned along the neural network layers; ML, IoT
- Multifidelity [10]: Yes / Yes / No; N/A; ML, IoT
- Benedetto et al. [30]: No / No / Yes; per neuron; IoT
(* Not applicable. ** To use implemented functions.)…”
Section: Machine Learning and IoT Tools
confidence: 99%