2019 IEEE 12th Conference on Service-Oriented Computing and Applications (SOCA)
DOI: 10.1109/soca.2019.00016
Serving Machine Learning Workloads in Resource Constrained Environments: a Serverless Deployment Example

Abstract: Deployed AI platforms typically ship with bulky system architectures which present bottlenecks and a high risk of failure. A serverless deployment can mitigate these factors and provide a cost-effective, automatically scalable (up or down) and elastic real-time on-demand AI solution. However, deploying high complexity production workloads into serverless environments is far from trivial, e.g., due to factors such as minimal allowance for physical codebase size, low amount of runtime memory, lack of GPU support…
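As a rough illustration of the constraints listed in the abstract, the sketch below shows what a minimal serverless inference handler might look like: a small, CPU-only function that lazily pulls a serialized model from object storage on a cold start. This is an assumed AWS Lambda-style setup for illustration only; the MODEL_BUCKET variable, the pickle model format, and the payload shape are hypothetical and not taken from the paper.

```python
# Hedged sketch, not the paper's implementation: a minimal AWS Lambda-style
# handler serving a small CPU-only model under serverless constraints.
# MODEL_BUCKET, the pickle format, and the payload shape are assumptions.
import json
import os
import pickle

import boto3  # assumes an AWS environment with S3 access

s3 = boto3.client("s3")
_model = None  # cached across warm invocations to amortize the cold-start load


def _load_model():
    """Download the serialized model into /tmp on the first (cold) invocation."""
    global _model
    if _model is None:
        local_path = "/tmp/model.pkl"  # /tmp is the only writable path in Lambda
        s3.download_file(os.environ["MODEL_BUCKET"], "model.pkl", local_path)
        with open(local_path, "rb") as f:
            _model = pickle.load(f)
    return _model


def handler(event, context):
    """Entry point: parse the request, run inference, return the prediction."""
    model = _load_model()
    features = json.loads(event["body"])["features"]
    prediction = model.predict([features])[0]  # scikit-learn-style model assumed
    return {"statusCode": 200, "body": json.dumps({"prediction": float(prediction)})}
```

Caching the model in a module-level variable is the usual way to keep warm invocations fast while staying within the package-size and memory limits the abstract mentions.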

Cited by 17 publications (10 citation statements)
References 20 publications
“…We note that this paper builds and extends the work that appeared in the IEEE Conference on Service Oriented Computing and Applications (SOCA) 2019 [6]. More specifically, the present paper includes a more elaborate treatment of the data handling aspects (lookup and storage), reports on additional experiments and associated evaluation, takes a closer look at related work and the more general context of infrastructures for serving AI workloads in the cloud, although this piece of work is fairly novel.…”
Section: Introduction
confidence: 79%
“…Some recent works have shown that the large-scale parallelism and autoscaling features provided by serverless platforms make them well-suited for burst-parallel fine-grained tasks and parallel computation workflows [12]. In essence, the FaaS model is apt for embarrassingly parallel computing use cases such as linear algebra [31], optimization algorithms [10], data analytics [30], and real-time machine learning classifications [9].…”
Section: A Serverless and Its Challenges
confidence: 99%
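The fan-out pattern this citation statement describes can be made concrete with a short sketch: each data chunk is classified by an independent function invocation, and the platform's autoscaling absorbs the burst. The function name "classify-fn" and the payload/response shapes are illustrative assumptions, not details from the cited works.

```python
# Hedged sketch of the embarrassingly parallel fan-out pattern: one
# serverless invocation per data chunk. "classify-fn" and the payload
# shape are hypothetical names used for illustration.
import json
from concurrent.futures import ThreadPoolExecutor

import boto3

lam = boto3.client("lambda")


def invoke_one(chunk):
    """Synchronously invoke one classification function over a single chunk."""
    resp = lam.invoke(
        FunctionName="classify-fn",
        Payload=json.dumps({"features": chunk}).encode(),
    )
    return json.loads(resp["Payload"].read())


def classify_parallel(chunks, max_workers=64):
    """Fan out one invocation per chunk; the platform scales the backend."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(invoke_one, chunks))
```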
“…While cloud services offer scalable computation resources, embedded systems have hard constraints. ML-specific options are, e.g., to optimize towards the target hardware [142] regarding CPU and GPU availability, to optimize towards the target operating system (demonstrated for Android and iOS by [143]), or to optimize the ML workload for a specific platform [144]. Monitoring and maintenance (Section 3.6) need to be considered in the overall architecture.…”
Section: Deployment
confidence: 99%
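As one concrete (and hedged) example of the target-platform optimization this statement refers to, post-training quantization with TensorFlow Lite shrinks a trained model for embedded or mobile deployment; the saved-model path and output file name below are placeholders, not artifacts from the cited survey.

```python
# Hedged sketch of platform-targeted optimization using TensorFlow Lite
# post-training quantization; the saved-model path is a placeholder.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("models/classifier")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable weight quantization
tflite_model = converter.convert()  # returns a compact flatbuffer as bytes

with open("classifier.tflite", "wb") as f:
    f.write(tflite_model)
```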