Performance Modeling of Serverless Computing Platforms

Mahmoudi, Nima; Khazaei, Hamzeh

doi:10.1109/tcc.2020.3033373

Cited by 58 publications

(39 citation statements)

References 51 publications


“…Performance and cost predictions under different resource configurations in serverless settings are explored in [2] and [36]. Mahmoudi et al. also propose an analytical model to help developers to extract performance metrics for their applications before the actual deployment [84]. In particular, their model enables the calculation of the cold start probability, average response time and the required average number of function instances, under stable conditions.…”

Section: Elements Of Resource Management (mentioning)

confidence: 99%

A Holistic View on Resource Management in Serverless Computing Environments: Taxonomy and Future Directions

2022


Serverless computing has emerged as an attractive deployment option for cloud applications in recent times. The unique features of this computing model include rapid auto-scaling, strong isolation, fine-grained billing options and access to a massive service ecosystem which autonomously handles resource management decisions. This model is increasingly being explored for deployments in geographically distributed edge and fog computing networks as well, due to these characteristics. Effective management of computing resources has always gained a lot of attention among researchers. The need to automate the entire process of resource provisioning, allocation, scheduling, monitoring and scaling, has resulted in the need for specialized focus on resource management under the serverless model. In this article, we identify the major aspects covering the broader concept of resource management in serverless environments and propose a taxonomy of elements which influence these aspects, encompassing characteristics of system design, workload attributes and stakeholder expectations. We take a holistic view on serverless environments deployed across edge, fog and cloud computing networks. We also analyse existing works discussing aspects of serverless resource management using this taxonomy. This article further identifies gaps in literature and highlights future research directions for improving capabilities of this computing model.


“…Using our proposed platform, one can benefit the scale-to-zero capabilities of serverless computing while still having the ability to serve high-traffic workloads. In previous studies, we have developed and evaluated steady-state and transient performance models along with simulators for serverless computing platforms [17][18][19] with homogeneous workloads. However, the unique characteristics and challenges in machine learning inference workloads, along with the ever-lasting need for adaptive methods for optimization components, led to the development of MLProxy.…”

Section: Related Work (mentioning)

confidence: 99%

MLProxy: SLA-Aware Reverse Proxy for Machine Learning Inference Serving on Serverless Computing Platforms

Mahmoudi, Khazaei

2022

Preprint

Self Cite


Serving machine learning inference workloads on the cloud is still a challenging task on the production level. Optimal configuration of the inference workload to meet SLA requirements while optimizing the infrastructure costs is highly complicated due to the complex interaction between batch configuration, resource configurations, and variable arrival process. Serverless computing has emerged in recent years to automate most infrastructure management tasks. Workload batching has revealed the potential to improve the response time and cost-effectiveness of machine learning serving workloads. However, it has not yet been supported out of the box by serverless computing platforms. Our experiments have shown that for various machine learning workloads, batching can hugely improve the system's efficiency by reducing the processing overhead per request. In this work, we present MLProxy, an adaptive reverse proxy to support efficient machine learning serving workloads on serverless computing systems. MLProxy supports adaptive batching to ensure SLA compliance while optimizing serverless costs. We performed rigorous experiments on Knative to demonstrate the effectiveness of MLProxy. We showed that MLProxy could reduce the cost of serverless deployment by up to 92% while reducing SLA violations by up to 99% that can be generalized across state-of-the-art model serving frameworks.


“…Modern auto-scaling mechanisms are extremely reactive in the sense that they adapt capacity relying on fresh observations of the system state rather than historical data. This especially holds true in serverless computing platforms, or Function-as-a-Service, which nowadays provide the convenient solution to deploy any type of application or backend service [22].…”

Section: Introduction (mentioning)

confidence: 99%

“…Here, auto-scaling mechanisms are extremely reactive and the decisions of turning servers on or off are based on instantaneous observations of the current system state rather than on the long-run equilibrium behavior or historical data. Therefore, the timescale separation assumption above becomes arguable [22] because it would mean to assume that job dynamics achieve stochastic equilibrium between consecutive changes of N, i.e., in milliseconds.…”

Section: Introduction (mentioning)

confidence: 99%

“…While there is a large body of literature investigating load balancing and auto-scaling separately, very little has been done when both are applied jointly within the same timescale. Most of existing works focused on synchronous and centralized architectures where all servers share a common queue [15,22], where synchronous means that scale-up and dispatching decisions are taken simultaneously. In typical cloud architectures however, no central queue is maintained as it would affect scalability.…”

Section: Introduction (mentioning)

confidence: 99%


Asynchronous Load Balancing and Auto-scaling: Mean-Field Limit and Optimal Design

Anselmi

2022

Preprint


We introduce a Markovian framework for load balancing where classical algorithms such as Power-of-d are combined with asynchronous auto-scaling features. These allow the net service capacity to scale up or down in response to the current load within the same timescale of job dynamics. This is inspired by serverless frameworks such as Knative, used among others by Google Cloud Run, where servers are software functions that can be flexibly instantiated in milliseconds according to user-defined scaling rules. In this context, load balancing and auto-scaling are employed together to optimize both user-perceived delay performance and energy consumption. In the literature, these mechanisms are synchronous or rely on a central queue. The architectural novelty of our work is to consider an asynchronous and decentralized system, as in Knative, which takes scalability to the next level. Under a general assumption on the auto-scaling process, we prove a mean-field limit theorem that provides an accurate and tractable approximation for the system dynamics when the mean demand and nominal service capacity grow large in proportion. We characterize the fixed points of the mean-field limit model and provide a simple condition telling whether or not all the available servers need to be turned on to handle the incoming demand. Then, we investigate how to design optimal auto-scaling rules and find a general condition able to drive the mean-field dynamics to delay and relative energy optimality, a situation where the user-perceived delay and the relative energy wastage induced by idle servers vanish. The proposed optimality condition suggests to scale up capacity if and only if the mean demand exceeds the overall rate at which servers become idle-on, i.e., idle and active. This yields the definition of tractable optimization frameworks to trade off between energy and performance, which we show as an application of our work.
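For readers unfamiliar with the Power-of-d rule named in the abstract above, the following is a minimal illustrative sketch of the classical dispatching step only: sample d servers at random and send the job to the one with the shortest queue. It is not taken from the cited paper and does not model its asynchronous auto-scaling. The function name `power_of_d`, the list-of-queue-lengths representation, and sampling without replacement are assumptions made for illustration.

```python
import random


def power_of_d(queue_lengths, d, rng=random):
    """Pick a server index using the classical Power-of-d rule.

    Assumption for this sketch: the d candidates are sampled uniformly
    without replacement, and ties go to the first sampled candidate.
    """
    candidates = rng.sample(range(len(queue_lengths)), d)
    # Dispatch to the least-loaded server among the sampled candidates.
    return min(candidates, key=lambda i: queue_lengths[i])


# Hypothetical queue lengths for four servers.
queues = [3, 0, 5, 2]
chosen = power_of_d(queues, d=2, rng=random.Random(0))
queues[chosen] += 1  # the arriving job joins the chosen queue
```

With d equal to the number of servers, the rule reduces to join-the-shortest-queue; with d = 1 it is purely random dispatching.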

