2022
DOI: 10.48550/arxiv.2202.11243
Preprint

MLProxy: SLA-Aware Reverse Proxy for Machine Learning Inference Serving on Serverless Computing Platforms

Abstract: Serving machine learning inference workloads in the cloud remains a challenging task at the production level. Optimally configuring an inference workload to meet SLA requirements while minimizing infrastructure costs is highly complicated due to the complex interaction between batch configuration, resource configuration, and the variable arrival process. Serverless computing has emerged in recent years to automate most infrastructure management tasks. Workload batching has revealed the potential to improve…
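
The core mechanism the title describes, SLA-aware batching inside a reverse proxy, can be illustrated with a minimal sketch. This is not the paper's actual algorithm: the SLA budget, batch-size bound, and linear latency model below are assumptions chosen only for the example.

```python
# Minimal sketch of SLA-aware request batching in a reverse proxy.
# Illustrative only; SLA_MS, MAX_BATCH, and the latency model are assumptions.
import time
from collections import deque

SLA_MS = 200.0   # assumed end-to-end SLA per request, in milliseconds
MAX_BATCH = 8    # assumed upper bound on batch size

def est_batch_latency_ms(batch_size: int) -> float:
    """Assumed linear latency model: fixed overhead plus per-item cost."""
    return 20.0 + 5.0 * batch_size

queue: deque = deque()  # items are (arrival_time, request)

def enqueue(request) -> None:
    """Record each incoming request with its arrival timestamp."""
    queue.append((time.monotonic(), request))

def maybe_dispatch():
    """Dispatch the pending batch if it is full, or if waiting any longer
    would push the oldest request past its SLA deadline."""
    if not queue:
        return None
    oldest_arrival, _ = queue[0]
    waited_ms = (time.monotonic() - oldest_arrival) * 1000.0
    slack_ms = SLA_MS - waited_ms - est_batch_latency_ms(len(queue))
    if len(queue) >= MAX_BATCH or slack_ms <= 0:
        batch = [req for _, req in queue]
        queue.clear()
        return batch  # hand off to the inference backend
    return None
```

The key decision is when to stop waiting: dispatching once the oldest request's remaining SLA slack is exhausted trades per-request latency for batch throughput, which is the tension the abstract describes between batch configuration and the variable arrival process.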

Cited by 1 publication (2 citation statements) | References 16 publications

“…In this direction, several works with different approaches have been conducted to address the problem of efficient resource orchestration and optimization for ML inference serving systems. Adaptive and pre-defined batching techniques [30]-[33] have been introduced to support ML inference, while auto-scaling approaches have also been considered [32], [34], [35], [36]. Furthermore, serverless approaches have been explored to support efficient ML inference [33], [37], while ML-based and predictive solutions for request load and resource utilization, such as reinforcement learning-based solutions [43], have been widely utilized [13], [14], [38]-[43], [44].…”
Section: Related Work (mentioning)
confidence: 99%
“…Adaptive and pre-defined batching techniques [30]-[33] have been introduced to support ML inference, while auto-scaling approaches have also been considered [32], [34], [35], [36]. Furthermore, serverless approaches have been explored to support efficient ML inference [33], [37], while ML-based and predictive solutions for request load and resource utilization, such as reinforcement learning-based solutions [43], have been widely utilized [13], [14], [38]-[43], [44]. Aiming at lightweight decision making, model-less approaches have also been considered as an alternative [27], [45].…”
Section: Related Work (mentioning)
confidence: 99%
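
The adaptive batching surveyed in these statements can be as simple as a feedback loop on observed latency. A toy sketch follows, assuming an AIMD-style policy (additive increase, multiplicative decrease); the cited works [30]-[33] each define their own policies, and this heuristic is an assumption made for illustration.

```python
# Toy adaptive-batching controller: grow the batch size while observed
# latency stays under the SLA, halve it on a violation. Illustrative only.
class AdaptiveBatcher:
    def __init__(self, sla_ms: float, max_batch: int = 32):
        self.sla_ms = sla_ms
        self.max_batch = max_batch
        self.batch_size = 1

    def observe(self, latency_ms: float) -> None:
        """Update the target batch size from one completed batch."""
        if latency_ms <= self.sla_ms:
            # Under budget: probe a larger batch for better throughput.
            self.batch_size = min(self.batch_size + 1, self.max_batch)
        else:
            # SLA violation: back off multiplicatively.
            self.batch_size = max(self.batch_size // 2, 1)

batcher = AdaptiveBatcher(sla_ms=200.0)
batcher.observe(latency_ms=120.0)  # under SLA: batch_size grows to 2
batcher.observe(latency_ms=250.0)  # violation: batch_size halves to 1
```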