2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS)
DOI: 10.1109/icdcs47774.2020.00101
Context-Aware Deep Model Compression for Edge Cloud Computing

Cited by 18 publications (4 citation statements)
References 18 publications
“…These authors propose an iterative algorithm that optimizes the partitioning and compression of a base DNN model. By dynamically adjusting strategies based on rewards, the approach efficiently maximizes performance while meeting resource constraints [71]. This study also validates that pruned models demonstrate accelerated inference and reduced memory usage by introducing GNN-RL, a pruning method that combines graph neural networks (GNNs) and reinforcement learning for topology-aware compression [72].…”
Section: Model Compression
confidence: 55%
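The reward-driven loop described in the statement above can be sketched as a search over (partition point, compression ratio) configurations, keeping the candidate with the best reward that stays within the resource budget. The following Python sketch is illustrative only: the reward function, configurations, and accuracy/latency numbers are placeholders, not values from the paper.

```python
# Hypothetical sketch of reward-driven partition/compression selection.
# All accuracy and latency figures below are illustrative placeholders.

def reward(accuracy, latency, latency_budget):
    """Reward favors accuracy; configurations over the latency budget score zero."""
    return accuracy if latency <= latency_budget else 0.0

# Illustrative profile: (partition_layer, pruning_ratio) -> (accuracy, latency_ms)
PROFILE = {
    (2, 0.0): (0.92, 120.0),
    (2, 0.3): (0.90, 90.0),
    (4, 0.3): (0.89, 70.0),
    (4, 0.5): (0.85, 55.0),
}

def search(latency_budget):
    """Pick the configuration with the highest reward under the budget."""
    best, best_r = None, -1.0
    for config, (acc, lat) in PROFILE.items():
        r = reward(acc, lat, latency_budget)
        if r > best_r:
            best, best_r = config, r
    return best, best_r

config, r = search(latency_budget=80.0)
print(config, r)  # best configuration within the 80 ms budget
```

In practice the profile would be measured or predicted at run time rather than enumerated, and the search would adjust one knob per iteration based on the observed reward.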
“…As an important research direction for improving QoS in EI, latency optimization has attracted the attention of many researchers. While model compression [23], [24] and model early exit [18], [25] can accelerate DNN inference, these methods incur a loss of accuracy and are not suitable for intelligent services with high accuracy requirements. Therefore, model partitioning, which has no effect on accuracy, is a good choice.…”
Section: Related Work
confidence: 99%
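The early-exit idea contrasted with partitioning above can be sketched as attaching side classifiers to intermediate layers and stopping as soon as one is confident enough; uncertain inputs fall through to the final classifier. This is a minimal illustration, with hypothetical confidence values rather than a real network.

```python
# Minimal sketch of confidence-threshold early exit.
# The per-exit confidences are illustrative placeholders.

def early_exit_inference(exit_confidences, threshold=0.9):
    """Return (exit_index, confidence) of the first exit whose confidence
    meets the threshold, falling back to the final exit otherwise."""
    for i, conf in enumerate(exit_confidences):
        if conf >= threshold:
            return i, conf
    return len(exit_confidences) - 1, exit_confidences[-1]

print(early_exit_inference([0.6, 0.95, 0.99]))  # exits at the second branch
```

The accuracy loss noted in the statement comes from easy-looking inputs that exit early with a wrong prediction, which is why accuracy-critical services may prefer partitioning.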
“…This step involves identifying the new metadata needed to maintain the required level of performance for the DNN under the operational conditions of the edge-cloud system. To adapt quickly to run-time changes, the optimal partition point may be identified using an estimation-based approach that predicts the latency of individual layers of the DNN [18], or by using a real-time benchmarking approach [6].…”
Section: A Baseline Approach
confidence: 99%
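The estimation-based approach mentioned above can be sketched as follows: predict each layer's latency on the device and in the cloud, add the cost of transferring the intermediate activation at each candidate split, and choose the split with the lowest end-to-end latency. The profile numbers, bandwidth, and function names below are illustrative assumptions, not details from the cited work.

```python
# Hedged sketch of estimation-based DNN partition-point selection.
# device_ms[i], cloud_ms[i]: predicted latency of layer i on each side.
# act_kb[k]: data transferred when splitting after k layers (act_kb[0] is
# the raw input; act_kb[n] is 0.0 when everything runs on the device).

def best_partition(device_ms, cloud_ms, act_kb, bandwidth_kbps):
    """Return (k, latency_ms): run the first k layers on the device,
    the rest in the cloud, minimizing predicted end-to-end latency."""
    n = len(device_ms)
    best_k, best_lat = 0, float("inf")
    for k in range(n + 1):
        transfer_ms = act_kb[k] / bandwidth_kbps * 1000.0
        lat = sum(device_ms[:k]) + transfer_ms + sum(cloud_ms[k:])
        if lat < best_lat:
            best_k, best_lat = k, lat
    return best_k, best_lat

# Illustrative 3-layer profile under a 100 kbps uplink.
device = [10.0, 20.0, 300.0]        # edge-device latency per layer (ms)
cloud = [5.0, 6.0, 7.0]             # cloud latency per layer (ms)
acts = [200.0, 50.0, 20.0, 0.0]     # KB to transfer at each split point
print(best_partition(device, cloud, acts, bandwidth_kbps=100.0))
```

A real-time benchmarking approach would replace the predicted `device_ms`/`cloud_ms` profiles with measured values, but the split-selection loop stays the same.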