We present OptEx, a closed-form model of job execution on Apache Spark, a
popular parallel processing engine. To the best of our knowledge, OptEx is the
first work that analytically models job completion time on Spark. The model can
be used to estimate the completion time of a given Spark job on a cloud, with
respect to the size of the input dataset, the number of iterations, and the
number of nodes comprising the underlying cluster. Experimental results demonstrate
that OptEx yields a mean relative error of 6% in estimating the job completion
time. Furthermore, the model can be used to estimate the cost-optimal
cluster composition for running a given Spark job on a cloud under a completion
deadline specified in the Service Level Objective (SLO). We show
experimentally that OptEx correctly estimates the cost-optimal
cluster composition for running a given Spark job under an SLO deadline with an
accuracy of 98%.

Comment: 10 pages, IEEE CCGrid 201
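
To illustrate how a completion-time estimate could drive the choice of a cost-optimal cluster under an SLO deadline, the following is a minimal sketch. It is not the OptEx model: `toy_time` is a made-up stand-in for a completion-time estimator, and the function names, the per-node hourly price, and the exhaustive search over node counts are all assumptions made for illustration.

```python
from typing import Callable, Optional, Tuple


def cheapest_cluster(
    estimate_time: Callable[[int], float],
    deadline_s: float,
    price_per_node_hour: float,
    max_nodes: int = 64,
) -> Optional[Tuple[int, float]]:
    """Return (node_count, cost) of the cheapest cluster meeting the deadline.

    estimate_time is any function mapping a node count to an estimated
    completion time in seconds (hypothetical; a fitted model would go here).
    Returns None when no node count up to max_nodes meets the deadline.
    """
    best: Optional[Tuple[int, float]] = None
    for n in range(1, max_nodes + 1):
        t = estimate_time(n)
        if t > deadline_s:
            continue  # this cluster size would violate the SLO deadline
        # Assumed pricing: every node is billed for the full job duration.
        cost = n * price_per_node_hour * t / 3600.0
        if best is None or cost < best[1]:
            best = (n, cost)
    return best


def toy_time(n: int) -> float:
    """Hypothetical estimator, NOT the OptEx model: a fixed start-up cost,
    a perfectly parallel part, and a per-node coordination overhead."""
    return 60.0 + 3600.0 / n + 5.0 * n


if __name__ == "__main__":
    print(cheapest_cluster(toy_time, deadline_s=600.0, price_per_node_hour=0.5))
```

With this toy estimator the cost grows with the node count, so the search settles on the smallest cluster that still meets the deadline; a different estimator or pricing scheme could change that outcome.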