2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS)
DOI: 10.1109/icdcs47774.2020.00101
Context-Aware Deep Model Compression for Edge Cloud Computing

Cited by 18 publications (4 citation statements)
References 18 publications
“…These authors propose an iterative algorithm that optimizes the partitioning and compression of a base DNN model. By dynamically adjusting strategies based on rewards, the approach efficiently maximizes performance while meeting resource constraints [71]. This study also validates that pruned models demonstrate accelerated inference and reduced memory usage by introducing GNN-RL, a pruning method that combines graph neural networks (GNNs) and reinforcement learning for topology-aware compression [72].…”
Section: Model Compression
confidence: 55%
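The reward-driven loop described in the statement above can be sketched as a search over (partition point, compression ratio) configurations, keeping the candidate with the best reward that stays within the resource budget. The following Python sketch is illustrative only: the reward function, configurations, and accuracy/latency numbers are placeholders, not values from the paper.

```python
# Hypothetical sketch of reward-driven partition/compression selection.
# All accuracy and latency figures below are illustrative placeholders.

def reward(accuracy, latency, latency_budget):
    """Reward favors accuracy; configurations over the latency budget score zero."""
    return accuracy if latency <= latency_budget else 0.0

# Illustrative profile: (partition_layer, pruning_ratio) -> (accuracy, latency_ms)
PROFILE = {
    (2, 0.0): (0.92, 120.0),
    (2, 0.3): (0.90, 90.0),
    (4, 0.3): (0.89, 70.0),
    (4, 0.5): (0.85, 55.0),
}

def search(latency_budget):
    """Pick the configuration with the highest reward under the budget."""
    best, best_r = None, -1.0
    for config, (acc, lat) in PROFILE.items():
        r = reward(acc, lat, latency_budget)
        if r > best_r:
            best, best_r = config, r
    return best, best_r

config, r = search(latency_budget=80.0)
print(config, r)  # best configuration within the 80 ms budget
```

In practice the profile would be measured or predicted at run time rather than enumerated, and the search would adjust one knob per iteration based on the observed reward.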
“…As an important research direction for improving QoS in EI, latency optimization has attracted the attention of many researchers. While model compression [23], [24] and model early exit [18], [25] can accelerate DNN inference, these methods incur a loss of accuracy and are not suitable for intelligent services with high accuracy requirements. Therefore, model partitioning, which has no effect on accuracy, is a good choice.…”
Section: Related Work
confidence: 99%
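The early-exit idea contrasted with partitioning above can be sketched as attaching side classifiers to intermediate layers and stopping as soon as one is confident enough; uncertain inputs fall through to the final classifier. This is a minimal illustration, with hypothetical confidence values rather than a real network.

```python
# Minimal sketch of confidence-threshold early exit.
# The per-exit confidences are illustrative placeholders.

def early_exit_inference(exit_confidences, threshold=0.9):
    """Return (exit_index, confidence) of the first exit whose confidence
    meets the threshold, falling back to the final exit otherwise."""
    for i, conf in enumerate(exit_confidences):
        if conf >= threshold:
            return i, conf
    return len(exit_confidences) - 1, exit_confidences[-1]

print(early_exit_inference([0.6, 0.95, 0.99]))  # exits at the second branch
```

The accuracy loss noted in the statement comes from easy-looking inputs that exit early with a wrong prediction, which is why accuracy-critical services may prefer partitioning.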
“…This step involves identifying the new metadata needed to maintain the required level of performance for the DNN under the operational conditions of the edge-cloud system. To adapt quickly to run-time changes, the optimal partition point may be identified using an estimation-based approach that predicts the latency of individual layers of the DNN [18], or by using a real-time benchmarking approach [6].…”
Section: A Baseline Approach
confidence: 99%
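The estimation-based approach mentioned above can be sketched as follows: predict each layer's latency on the device and in the cloud, add the cost of transferring the intermediate activation at each candidate split, and choose the split with the lowest end-to-end latency. The profile numbers, bandwidth, and function names below are illustrative assumptions, not details from the cited work.

```python
# Hedged sketch of estimation-based DNN partition-point selection.
# device_ms[i], cloud_ms[i]: predicted latency of layer i on each side.
# act_kb[k]: data transferred when splitting after k layers (act_kb[0] is
# the raw input; act_kb[n] is 0.0 when everything runs on the device).

def best_partition(device_ms, cloud_ms, act_kb, bandwidth_kbps):
    """Return (k, latency_ms): run the first k layers on the device,
    the rest in the cloud, minimizing predicted end-to-end latency."""
    n = len(device_ms)
    best_k, best_lat = 0, float("inf")
    for k in range(n + 1):
        transfer_ms = act_kb[k] / bandwidth_kbps * 1000.0
        lat = sum(device_ms[:k]) + transfer_ms + sum(cloud_ms[k:])
        if lat < best_lat:
            best_k, best_lat = k, lat
    return best_k, best_lat

# Illustrative 3-layer profile under a 100 kbps uplink.
device = [10.0, 20.0, 300.0]        # edge-device latency per layer (ms)
cloud = [5.0, 6.0, 7.0]             # cloud latency per layer (ms)
acts = [200.0, 50.0, 20.0, 0.0]     # KB to transfer at each split point
print(best_partition(device, cloud, acts, bandwidth_kbps=100.0))
```

A real-time benchmarking approach would replace the predicted `device_ms`/`cloud_ms` profiles with measured values, but the split-selection loop stays the same.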