Proceedings of the 36th ACM International Conference on Supercomputing 2022
DOI: 10.1145/3524059.3532366

PAME

Abstract: In emerging DNN serving systems, queries are usually batched to fully leverage hardware resources, and all queries in a batch run through the complete model and return at the same time. According to our findings, some queries need to pass through only a portion of the DNN model to attain sufficient precision. These queries can have shorter latencies if they return early, partway through the model. Therefore, we propose precision-aware multi-exit inference serving, PAME, to achieve the abo…
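
The early-return idea in the abstract can be illustrated with a generic multi-exit loop. The following is a minimal sketch under stated assumptions, not PAME's actual design: the names `serve_batch`, `stages`, `exit_heads`, and `threshold` are hypothetical, the "model" is a list of toy functions over a single float, and a simple confidence threshold stands in for whatever precision criterion the paper uses.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types for illustration only:
# a stage maps a hidden state to the next hidden state,
# an exit head maps a hidden state to (prediction, confidence).
Stage = Callable[[float], float]
ExitHead = Callable[[float], Tuple[int, float]]


def serve_batch(
    queries: Sequence[float],
    stages: Sequence[Stage],
    exit_heads: Sequence[ExitHead],
    threshold: float,
) -> Dict[int, Tuple[int, int]]:
    """Run a batch through a multi-exit model.

    Returns {query_index: (prediction, exit_depth)}. A query leaves the
    batch at the first exit whose confidence reaches `threshold`; any
    query still active at the last stage returns there.
    """
    if len(stages) != len(exit_heads):
        raise ValueError("each stage needs exactly one exit head")

    active: List[Tuple[int, float]] = list(enumerate(queries))
    results: Dict[int, Tuple[int, int]] = {}
    last_depth = len(stages) - 1

    for depth, (stage, head) in enumerate(zip(stages, exit_heads)):
        still_active: List[Tuple[int, float]] = []
        for idx, hidden in active:
            hidden = stage(hidden)
            prediction, confidence = head(hidden)
            if confidence >= threshold or depth == last_depth:
                results[idx] = (prediction, depth)  # early (or final) return
            else:
                still_active.append((idx, hidden))
        active = still_active
        if not active:
            break  # every query has returned; skip the remaining stages
    return results


# Toy model: three identical stages and one shared toy exit head.
stages = [lambda h: h + 1.0] * 3
exit_heads = [lambda h: (int(h), min(1.0, h / 3.0))] * 3

results = serve_batch([0.0, 1.0, 2.5], stages, exit_heads, threshold=0.6)
```

In this toy run, the second and third queries return at the first exit, while the first query needs one more stage. A real serving system would operate on tensors and would have to decide how to keep the hardware busy as the active batch shrinks, which this sketch does not address.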

Cited by 2 publications
References 33 publications