2019 IEEE 12th Conference on Service-Oriented Computing and Applications (SOCA)
DOI: 10.1109/soca.2019.00016
Serving Machine Learning Workloads in Resource Constrained Environments: a Serverless Deployment Example

Abstract: Deployed AI platforms typically ship with bulky system architectures which present bottlenecks and a high risk of failure. A serverless deployment can mitigate these factors and provide a cost-effective, automatically scalable (up or down) and elastic real-time on-demand AI solution. However, deploying high complexity production workloads into serverless environments is far from trivial, e.g., due to factors such as minimal allowance for physical codebase size, low amount of runtime memory, lack of GPU support…
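As a rough illustration of the constraints listed in the abstract, the sketch below shows what a minimal serverless inference handler might look like: a small, CPU-only function that lazily pulls a serialized model from object storage on a cold start. This is an assumed AWS Lambda-style setup for illustration only; the MODEL_BUCKET variable, the pickle model format, and the payload shape are hypothetical and not taken from the paper.

```python
# Hedged sketch, not the paper's implementation: a minimal AWS Lambda-style
# handler serving a small CPU-only model under serverless constraints.
# MODEL_BUCKET, the pickle format, and the payload shape are assumptions.
import json
import os
import pickle

import boto3  # assumes an AWS environment with S3 access

s3 = boto3.client("s3")
_model = None  # cached across warm invocations to amortize the cold-start load


def _load_model():
    """Download the serialized model into /tmp on the first (cold) invocation."""
    global _model
    if _model is None:
        local_path = "/tmp/model.pkl"  # /tmp is the only writable path in Lambda
        s3.download_file(os.environ["MODEL_BUCKET"], "model.pkl", local_path)
        with open(local_path, "rb") as f:
            _model = pickle.load(f)
    return _model


def handler(event, context):
    """Entry point: parse the request, run inference, return the prediction."""
    model = _load_model()
    features = json.loads(event["body"])["features"]
    prediction = model.predict([features])[0]  # scikit-learn-style model assumed
    return {"statusCode": 200, "body": json.dumps({"prediction": float(prediction)})}
```

Caching the model in a module-level variable is the usual way to keep warm invocations fast while staying within the package-size and memory limits the abstract mentions.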

Cited by 17 publications (10 citation statements)
References 20 publications
“…We note that this paper builds and extends the work that appeared in the IEEE Conference on Service Oriented Computing and Applications (SOCA) 2019 [6]. More specifically, the present paper includes a more elaborate treatment of the data handling aspects (lookup and storage), reports on additional experiments and associated evaluation, takes a closer look at related work and the more general context of infrastructures for serving AI workloads in the cloud, although this piece of work is fairly novel.…”
Section: Introduction
confidence: 79%
“…Some recent works have shown that the large-scale parallelism and autoscaling features provided by serverless platforms make them well-suited for burst-parallel fine-grained tasks and parallel computation workflows [12]. In essence, the FaaS model is apt for embarrassingly parallel computing use cases such as linear algebra [31], optimization algorithms [10], data analytics [30], and real-time machine learning classifications [9].…”
Section: A Serverless and Its Challenges
confidence: 99%
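The fan-out pattern this citation statement describes can be made concrete with a short sketch: each data chunk is classified by an independent function invocation, and the platform's autoscaling absorbs the burst. The function name "classify-fn" and the payload/response shapes are illustrative assumptions, not details from the cited works.

```python
# Hedged sketch of the embarrassingly parallel fan-out pattern: one
# serverless invocation per data chunk. "classify-fn" and the payload
# shape are hypothetical names used for illustration.
import json
from concurrent.futures import ThreadPoolExecutor

import boto3

lam = boto3.client("lambda")


def invoke_one(chunk):
    """Synchronously invoke one classification function over a single chunk."""
    resp = lam.invoke(
        FunctionName="classify-fn",
        Payload=json.dumps({"features": chunk}).encode(),
    )
    return json.loads(resp["Payload"].read())


def classify_parallel(chunks, max_workers=64):
    """Fan out one invocation per chunk; the platform scales the backend."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(invoke_one, chunks))
```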
“…While cloud services offer scalable computation resources, embedded systems have hard constraints. ML-specific options are, e.g., to optimize towards the target hardware [142] regarding CPU and GPU availability, to optimize towards the target operating system (demonstrated for Android and iOS by [143]), or to optimize the ML workload for a specific platform [144]. Monitoring and maintenance (Section 3.6) need to be considered in the overall architecture.…”
Section: Deployment
confidence: 99%
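As one concrete (and hedged) example of the target-platform optimization this statement refers to, post-training quantization with TensorFlow Lite shrinks a trained model for embedded or mobile deployment; the saved-model path and output file name below are placeholders, not artifacts from the cited survey.

```python
# Hedged sketch of platform-targeted optimization using TensorFlow Lite
# post-training quantization; the saved-model path is a placeholder.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("models/classifier")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable weight quantization
tflite_model = converter.convert()  # returns a compact flatbuffer as bytes

with open("classifier.tflite", "wb") as f:
    f.write(tflite_model)
```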