2020
DOI: 10.48550/arxiv.2007.01793
Preprint

CacheNet: A Model Caching Framework for Deep Learning Inference on the Edge

Abstract: The success of deep neural networks (DNN) in machine perception applications such as image classification and speech recognition comes at the cost of high computation and storage complexity. Inference with uncompressed, large-scale DNN models can only run in the cloud, with extra communication latency back and forth between the cloud and end devices, while compressed DNN models achieve real-time inference on end devices at the price of lower predictive accuracy. In order to have the best of both worlds (latency and ac…
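
The abstract describes a split in which compressed models are cached on end devices while the uncompressed, large-scale model stays in the cloud. Below is a minimal, illustrative sketch of that split, assuming a confidence-threshold fallback policy; the function names, threshold value, and stand-in models are placeholders and are not taken from the paper.

import random

CONFIDENCE_THRESHOLD = 0.8  # illustrative threshold, not from the paper

def compressed_model_on_device(x):
    # Stand-in for a small, compressed DNN cached on the end device:
    # fast, but with lower predictive accuracy.
    label = hash(x) % 10
    confidence = random.uniform(0.5, 1.0)
    return label, confidence

def full_model_in_cloud(x):
    # Stand-in for the uncompressed, large-scale DNN served remotely:
    # accurate, but adds round-trip communication latency.
    return hash(x) % 10

def infer(x):
    # Try the cached on-device model first; fall back to the cloud
    # only when the device model is not confident enough.
    label, confidence = compressed_model_on_device(x)
    if confidence >= CONFIDENCE_THRESHOLD:
        return label, "device"
    return full_model_in_cloud(x), "cloud"

if __name__ == "__main__":
    for sample in ["frame-0", "frame-1", "frame-2"]:
        print(sample, infer(sample))

The design point is that most requests are answered locally by the cached compressed model, and only low-confidence inputs pay the round trip to the remote model.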

Cited by 5 publications (6 citation statements). References 19 publications.
“…To achieve the same accuracy level of 90% with traditional training, we had to increase the resolution and scaling, resulting in a configuration of NN⟨r=1, s=2⟩ that was 6.6× bigger (35.6 MB). Furthermore, compared to image classification tasks based on CIFAR-10 deployed on the target device as described in [11] and [12], our approach achieved approximately 1.6× and 1.9× better frame rate, respectively, with comparable (90% vs. 93% for [12]) or better accuracy (90% vs. 82.9% for [11]).…”
Section: Multi-objective Solutions (mentioning)
confidence: 96%
“…Kumar [21] observed that caching intermediate layer outputs can help avoid running all the layers of a DNN for a sizeable fraction of inference requests [22]. They proposed approximate caching (in different domains [23]) at each intermediate layer.…”
Section: Related Work (mentioning)
confidence: 99%
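
The excerpt above describes caching intermediate layer outputs so that the remaining layers can be skipped for a sizeable fraction of requests. Below is a minimal sketch of that idea, assuming a coarse quantization of an intermediate activation serves as the approximate cache key; the quantization step, stand-in layers, and lookup policy are illustrative and not taken from the cited work.

import numpy as np

cache = {}  # quantized-activation key -> cached final prediction

def early_layers(x):
    # Stand-in for the first few DNN layers.
    return np.tanh(x @ np.ones((4, 8)) * 0.1)

def remaining_layers(h):
    # Stand-in for the (expensive) remaining layers.
    return int(np.argmax(h.sum(axis=0)))

def quantize(h, step=0.25):
    # Coarse key: similar activations collide, enabling approximate hits.
    return tuple(np.round(h.ravel() / step).astype(int))

def infer(x):
    h = early_layers(x)
    key = quantize(h)
    if key in cache:                 # approximate hit: skip the later layers
        return cache[key], "cache-hit"
    y = remaining_layers(h)          # miss: run the rest of the network
    cache[key] = y
    return y, "cache-miss"

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(3, 4))
    print(infer(x))          # first request: miss
    print(infer(x + 1e-3))   # near-duplicate request: likely a hit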
“…For device-level caching, researchers investigate the spatio-temporal locality of users within an area to cache repeatedly requested computation results on the edge server [47]–[49], [130]–[133]. In addition, some works propose caching multiple deep learning models on the edge server for specialized missions to improve the quality of service [134]–[137]. Finally, applications based on computation caching, such as eye-gaze tracking [138] and voice assistants [139], [140], are introduced.…”
Section: A. Edge Caching (mentioning)
confidence: 99%
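
One line of work mentioned above caches multiple deep learning models on an edge server for specialized missions. A toy sketch of such a model cache with LRU eviction follows; the capacity, eviction policy, and loading stub are assumptions for illustration, not details from the cited surveys.

from collections import OrderedDict

class EdgeModelCache:
    # Toy LRU cache of loaded models on an edge server (illustrative only).

    def __init__(self, capacity=2):
        self.capacity = capacity
        self._models = OrderedDict()  # model name -> loaded model object

    def _load_from_storage(self, name):
        # Stand-in for loading a specialized model from disk or a registry.
        return f"<weights of {name}>"

    def get(self, name):
        if name in self._models:                # hit: refresh recency
            self._models.move_to_end(name)
            return self._models[name], "hit"
        model = self._load_from_storage(name)   # miss: load, maybe evict
        self._models[name] = model
        if len(self._models) > self.capacity:
            self._models.popitem(last=False)    # evict least recently used
        return model, "miss"

if __name__ == "__main__":
    cache = EdgeModelCache(capacity=2)
    for request in ["detector", "classifier", "detector", "segmenter", "classifier"]:
        _, status = cache.get(request)
        print(request, status)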
“…The aforementioned solutions require users to upload data to the edge server for processing, which leads to relatively high latency (though much lower than cloud-based solutions). Fang et al. propose a caching scheme that jointly considers latency and accuracy [137]. A complex model is partitioned and distributed between edge devices and the cloud server.…”
Section: B. Computation Caching (mentioning)
confidence: 99%
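
The excerpt above describes partitioning a complex model between edge devices and the cloud server. The sketch below illustrates a simple head/tail split, where the device computes a compact intermediate feature and the server finishes the forward pass; the split point, weights, and transfer step are placeholders rather than the actual partitioning scheme of [137].

import numpy as np

rng = np.random.default_rng(1)
W_HEAD = rng.normal(size=(4, 6))   # device-side layers
W_TAIL = rng.normal(size=(6, 3))   # server-side layers

def device_head(x):
    # Runs on the end device: produces a compact intermediate feature.
    return np.maximum(x @ W_HEAD, 0.0)

def send_to_server(features):
    # Placeholder for uplink transfer of the intermediate feature only,
    # instead of the raw input.
    return features

def server_tail(features):
    # Runs on the edge/cloud server: finishes the forward pass.
    logits = features @ W_TAIL
    return int(np.argmax(logits, axis=1)[0])

if __name__ == "__main__":
    x = rng.normal(size=(1, 4))
    prediction = server_tail(send_to_server(device_head(x)))
    print("prediction:", prediction)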