2018
DOI: 10.1109/tcad.2018.2858384
DeepThings: Distributed Adaptive Deep Learning Inference on Resource-Constrained IoT Edge Clusters

Cited by 387 publications (223 citation statements). References 12 publications.
“…The follow-up work Deep Compression [71], which blends the advantages of pruning, weight sharing, and Huffman coding to compress DNNs, further pushes the compression ratio to 35-49x. However, for energy-constrained end devices, the above magnitude-based weight pruning method may not be directly applicable, since empirical measurements show that reducing the number of weights does not necessarily translate into significant energy savings [72]. This is because, for DNNs such as AlexNet, the convolutional layers dominate the total energy cost, while the fully-connected layers contribute most of the total number of weights in the DNN. This suggests that the number of weights may not be a good indicator of energy, and that weight pruning should be directly energy-aware for end devices. An accompanying table summarizes the enabling techniques:

- Model Partition: computation offloading to the edge server or mobile devices; latency- and energy-oriented optimization [10], [78]-[86]
- Model Early-Exit: partial DNN model inference; accuracy-aware [10], [15], [78], [87]-[91]
- Edge Caching: fast response by reusing previous results of the same task [92]-[96]
- Input Filtering: detecting differences between inputs to avoid redundant computation [97]-[101]
- Model Selection: input-oriented optimization; accuracy-aware [102]-[106]
- Support for Multi-Tenancy: scheduling multiple DNN-based tasks; resource-efficient [38], [104], [107]-[111]
- Application-specific Optimization: optimizations for a specific DNN-based application; resource-efficient [104], [112]…”
Section: Enabling Technologies
confidence: 99%
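To make the pruning under discussion concrete, here is a minimal sketch of magnitude-based weight pruning in NumPy. The function name, threshold rule, and layer size are illustrative assumptions, not the exact procedure of [71] or the energy model of [72].

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries until roughly
    `sparsity` of the tensor is zero (illustrative threshold rule)."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

# An AlexNet-like FC layer holds most of the parameters, so pruning it
# cuts the weight count sharply -- yet, as the passage notes, the conv
# layers still dominate energy, so the energy saving may not follow.
fc = np.random.randn(4096, 4096).astype(np.float32)
pruned = magnitude_prune(fc, sparsity=0.9)
print("nonzero fraction:", np.count_nonzero(pruned) / pruned.size)
```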
“…The table compares compression and distribution approaches against this work:
- Quantization [8, 12]: × ×
- Pruning [4, 9, 13, 23]: × ×
- Separable convolutions [3, 19, 26]: × ×
- KD [1, 6, 25]: × × ×
- SplitNet [10]: × ×
- MoDNN, DeepThings [14, 28]: ×
- Proposed NoNN: … in per-node energy w.r.t. teacher.…”
Section: Area Model Communication-Distributed Complements Compression
confidence: 99%
“…Finally, Zhao, Barijough, and Gerstlauer [32] proposed DeepThings, a framework that distributes inference to resource-constrained IoT edge devices by partitioning along the neural network data flow. However, they used a small number of devices with large amounts of memory, avoiding more constrained devices such as the ones used in this work.…”
Section: Machine Learning and IoT Tools
confidence: 99%
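As a rough illustration of the data-flow partitioning that DeepThings is described as performing, the sketch below splits an input feature map into grid tiles with overlapping halos so each edge device can run the early convolutional layers on its tile independently. The grid size, halo width, and function name are illustrative assumptions, not the paper's actual Fused Tile Partitioning implementation.

```python
def tile_with_halo(height, width, grid, halo):
    """Split a height x width feature map into grid x grid tiles,
    each enlarged by `halo` pixels per side so a stack of conv
    layers can be evaluated on the tile without neighbor data.
    In DeepThings-style partitioning the halo would be derived
    from the fused layers' kernel sizes and strides; here it is
    simply a parameter (an assumption of this sketch)."""
    tile_h, tile_w = height // grid, width // grid
    regions = []
    for i in range(grid):
        for j in range(grid):
            top = max(i * tile_h - halo, 0)
            bottom = min((i + 1) * tile_h + halo, height)
            left = max(j * tile_w - halo, 0)
            right = min((j + 1) * tile_w + halo, width)
            regions.append((top, bottom, left, right))
    return regions

# Example: a 224x224 input split 2x2 with an 8-pixel halo; each
# region could be dispatched to a different IoT edge device.
for region in tile_with_halo(224, 224, grid=2, halo=8):
    print(region)
```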
“…Tool comparison:
- DeepThings [32]: No / No / No; partitioned along the neural network layers; ML, IoT
- Multifidelity [10]: Yes / Yes / No; N/A; ML, IoT
- Benedetto et al. [30]: No / No / Yes; per neuron; IoT
(* Not applicable. ** To use implemented functions.)…”
Section: Machine Learning and IoT Tools
confidence: 99%