The 25th Annual International Conference on Mobile Computing and Networking 2019
DOI: 10.1145/3300061.3345447
Occlumency: Privacy-preserving Remote Deep-learning Inference Using SGX

Cited by 76 publications (12 citation statements) · References 16 publications
“…Origami achieves an 11× performance improvement over a pure SGX approach and a 15.1× improvement over Slalom [142]. To address the memory limitation and page swapping of the SGX enclave, Lee et al. [84] developed on-demand weight loading, memory-efficient inference, and parallel processing pipelines in their proposed Occlumency, which delivers a 3.6× speedup over a pure TEE-based method at a 72% latency overhead relative to a pure GPU approach. In [125], Alexander et al. presented the eNNclave tool-chain, which cuts TensorFlow models at any layer and splits them into public and enclave layers, with the GPU executing the public layers for acceleration.…”
Section: Research Status of SOI Due to the Similarity of Computation ...
Mentioning confidence: 99%
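The on-demand weight loading summarized in this excerpt can be made concrete with a short sketch: rather than keeping the whole model inside the memory-constrained enclave, each layer's weights are fetched from untrusted storage just before use and discarded afterwards. This is a minimal Python/NumPy illustration, not Occlumency's actual implementation; `load_layer_weights`, `infer`, and `ENCLAVE_BUDGET` are hypothetical names, and the budget value is an assumption.

```python
import numpy as np

ENCLAVE_BUDGET = 64 * 1024 * 1024  # assumed usable enclave memory, in bytes

def load_layer_weights(path):
    """Copy one layer's weights from untrusted storage into the enclave.

    An ordinary file read stands in for the OCALL plus decryption and
    integrity check a real enclave would perform.
    """
    return np.load(path)

def infer(x, layer_paths):
    """Run inference while holding at most one layer's weights at a time."""
    for path in layer_paths:
        w = load_layer_weights(path)
        if w.nbytes > ENCLAVE_BUDGET:
            raise MemoryError("layer exceeds the enclave budget")
        x = np.maximum(x @ w, 0.0)  # toy fully connected layer + ReLU
        del w  # release this layer's weights before loading the next one
    return x
```

Peak weight residency is thus bounded by the largest single layer rather than by total model size, which is the property that lets inference fit inside a small enclave.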
“…Model Execution with Memory Budget. To accommodate tight memory budgets, two popular solutions have been explored in the literature: model compression [10], [11], [12], [13], [14], [15], [16], [52] and offloading [17], [18], [19], [20], [21], [22]. Model compression techniques reduce the model size by removing redundant parameters such as layers, filters, and channels [10], lowering the parameter precision [15], searching for efficient model architectures [16], etc.…”
Section: Related Work
Mentioning confidence: 99%
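One compression lever named in the excerpt, lowering parameter precision, is easy to illustrate. Below is a minimal post-training 8-bit quantization sketch in NumPy; it is a generic illustration rather than the method of any cited paper, and `quantize_int8`/`dequantize` are hypothetical helpers.

```python
import numpy as np

def quantize_int8(w):
    """Map float32 weights to int8 plus a per-tensor scale (4x smaller)."""
    scale = float(np.abs(w).max()) / 127.0 or 1.0  # avoid a zero scale
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
print(w.nbytes, "->", q.nbytes, "bytes")              # 262144 -> 65536
print("max error:", np.abs(w - dequantize(q, scale)).max())
```

The 4× size reduction comes at the cost of a small per-weight rounding error, which is exactly the accuracy/size trade-off the following excerpt criticizes.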
“…However, when a model is compressed, its accuracy or robustness is often compromised, which is unacceptable in mission-critical applications such as self-driving. Research on offloading dynamically adjusts the model partition point [22], optimizes offloading patterns to reduce delay [53], improves inference privacy [20], etc. Offloading does not harm model accuracy, but it requires a network connection and is therefore vulnerable to network fluctuations.…”
Section: Related Work
Mentioning confidence: 99%
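The partition-point tuning mentioned in the excerpt reduces to a small optimization: pick the layer index that minimizes on-device compute time, plus the time to transfer that layer's activation, plus server compute time for the remaining layers. A hedged sketch under a simple additive-latency assumption (all names and numbers are hypothetical):

```python
def choose_partition(local_cost, remote_cost, act_bytes, bandwidth):
    """Pick the split index k that minimizes estimated end-to-end latency.

    local_cost[i]/remote_cost[i]: per-layer runtimes (s) on device/server.
    act_bytes[k]: size of the activation at the k-th boundary, where
    act_bytes[0] is the raw input and act_bytes[n] the final output.
    bandwidth: uplink throughput in bytes per second.
    """
    n = len(local_cost)
    best_k, best_t = 0, float("inf")
    for k in range(n + 1):  # k layers on the device, n - k on the server
        t = sum(local_cost[:k]) + act_bytes[k] / bandwidth + sum(remote_cost[k:])
        if t < best_t:
            best_k, best_t = k, t
    return best_k

# Toy numbers: a large input, activations that shrink with depth, and a
# server roughly 10x faster than the device -> splitting after layer 2 wins.
print(choose_partition(
    local_cost=[0.05, 0.05, 0.5],
    remote_cost=[0.005, 0.005, 0.05],
    act_bytes=[10_000_000, 400_000, 50_000, 4_000],
    bandwidth=1_000_000,  # 1 MB/s uplink
))
```

Because the activation sizes and bandwidth vary at run time, the best k changes with network conditions, which is why the cited work adjusts the partition point dynamically.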
“…Inference latency has undoubtedly become the most severe obstacle to building SGX-based deep-learning systems in the cloud [8,9]. Lee et al. [10] found that deep-learning inference inside an enclave is up to 6.4× slower than running outside the enclave. The main reason for the performance degradation is a hardware design limitation.…”
Section: Introduction
Mentioning confidence: 99%
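The hardware limitation referred to in this last excerpt is the small Enclave Page Cache (EPC): once the working set of an inference exceeds it, SGX must encrypt and evict 4 KiB pages, and those swaps dominate latency. The back-of-envelope sketch below illustrates the effect; the usable EPC size and per-swap cost are assumptions chosen for illustration, not figures from the cited papers.

```python
PAGE_BYTES = 4096           # SGX manages enclave memory in 4 KiB pages
EPC_USABLE = 93 * 2**20     # assumed usable EPC on SGX1 (~93 MiB)
SWAP_COST_S = 10e-6         # assumed cost of one encrypted page swap (s)

def paging_overhead_s(working_set_bytes):
    """Crude lower bound: one swap per page that does not fit in the EPC."""
    overflow = max(0, working_set_bytes - EPC_USABLE)
    return (overflow // PAGE_BYTES) * SWAP_COST_S

# e.g. a 500 MiB working set (weights + activations) for one inference:
print(f"{paging_overhead_s(500 * 2**20):.3f} s of swap overhead")
```

Even under these optimistic assumptions the overflow pages alone add on the order of a second per inference, which is why techniques like Occlumency's on-demand loading aim to keep the working set inside the EPC in the first place.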