Optimizing Speculative Execution in Spark Heterogeneous Environments

Fu, Zhongming

doi:10.1109/tcc.2019.2947674

Cited by 11 publications

(3 citation statements)

References 28 publications


“…A considerable portion of both analytical and learning-based approaches have reported the employment of sampling and micro-benchmarking. In several researches [20], [39], [40], [41], linear regression of selected sample executions are considered as the predictor for the actual-size performance of a Hadoop application. Based on a similar sampling approach, more sophisticated learning techniques have been adopted such as deep reinforcement learning [12] or combining multiple regression models each for a single stage of the whole application [13].…”

Section: Related Work (mentioning)

confidence: 99%

Fixed-Point Iteration Approach to Spark Scalable Performance Modeling and Evaluation

Karimian-Aliabadi

Aseman-Manzar

Entezari‐Maleki

et al. 2023

IEEE Trans. Cloud Comput.


Companies depend on mining data to grow their business more than ever. To achieve optimal performance of Big Data analytics workloads, a careful configuration of the cluster and the employed software framework is required. The lack of flexible and accurate performance models, however, renders this a challenging task. This paper fills this gap by presenting accurate performance prediction models based on Stochastic Activity Networks (SANs). In contrast to existing work, the presented models consider multiple work queues, a critical feature to achieve high accuracy in realistic usage scenarios. We first introduce a monolithic analytical model for a multi-queue YARN cluster running DAG-based Big Data applications that models each queue individually. To overcome the limited scalability of the monolithic model, we then present a fixed-point model that iteratively computes the throughput of a single queue with respect to the rest of the system until a fixed point is reached. The models are evaluated on a real-world cluster running the widely used Apache Spark framework and the YARN scheduler. Experiments with the common transaction-based TPC-DS benchmark show that the proposed models achieve an average error of only 5.6% in predicting the execution time of the Spark jobs. The presented models enable businesses to optimize their cluster configuration for a given workload and thus to reduce their expenses and minimize service level agreement (SLA) violations. Makespan minimization and per-stage analysis are examined as representative efforts to further assess the applicability of our proposition.
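
The fixed-point idea in the abstract above can be illustrated with a minimal Python sketch. It is not the paper's SAN-based model: the two-queue setup, the `DEMAND` values, and the `INTERFERENCE` coefficient are hypothetical and chosen only so that the iteration converges. Each pass recomputes every queue's throughput given the current throughput of the other queues, and the loop stops once the values no longer change.

```python
from typing import Callable, List


def fixed_point(update: Callable[[List[float]], List[float]],
                start: List[float],
                tol: float = 1e-9,
                max_iter: int = 1000) -> List[float]:
    """Iterate x <- update(x) until successive iterates differ by less than tol."""
    x = list(start)
    for _ in range(max_iter):
        nxt = update(x)
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    raise RuntimeError("no fixed point reached within max_iter iterations")


# Hypothetical toy model (an assumption, not taken from the paper):
# each queue's throughput is its standalone demand, scaled down in
# proportion to the throughput currently achieved by the other queues.
DEMAND = [80.0, 60.0]      # standalone throughput of each queue
INTERFERENCE = 1.0 / 200.0  # slowdown per unit of competing throughput


def toy_update(throughput: List[float]) -> List[float]:
    total = sum(throughput)
    return [
        demand * (1.0 - INTERFERENCE * (total - own))
        for demand, own in zip(DEMAND, throughput)
    ]


if __name__ == "__main__":
    print(fixed_point(toy_update, [0.0, 0.0]))
```

The appeal of this structure is that each step only evaluates one queue against an aggregate of the rest, rather than solving a single joint model of all queues at once.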


“…Since every node's capability may vary, it is essential to have an appropriate metric to measure the performance of heterogeneity nodes. Therefore, the capability of a node can be obtained through the amount of tasks completed and total tasks processed as in (15) [33]:…”

Section: Backup Straggler Task on Proper Node (mentioning)

confidence: 99%

“…In this section, the performance of the proposed framework is assessed on a spark cluster with a diverse set of nodes. Also, the proposed framework is compared with Spark-Default, Spark-Speculation and the work in [33] (marked as Spark-ETWR) in various benchmarks at different input sizes. The performance is evaluated in terms of the job execution time that refers to the elapsed time from the beginning to the end of the job in seconds.…”

Section: Performance Evaluation (mentioning)

confidence: 99%

An Optimized Straggler Mitigation Framework for Large-Scale Distributed Computing Systems

et al. 2022


Big Data has become a research focus in industry, banking, social networks, and other fields. In addition, the explosive growth of data and information requires efficient processing solutions. Spark is therefore considered a promising large-scale distributed computing system for big data processing. One primary challenge is the straggler problem, which arises from heterogeneity: a machine takes an unusually long time to finish a task, which decreases system throughput. To mitigate straggler tasks, Spark adopts a speculative execution mechanism, in which the scheduler launches additional backup tasks to avoid slow task processing and achieve acceleration. In this paper, a new Optimized Straggler Mitigation Framework is proposed. The proposed framework uses a dynamic criterion to determine the closest straggler tasks. This criterion is based on multiple coefficients to achieve a reliable straggler decision. It also integrates historical data analysis and online adaptation for intelligent straggler judgment, which makes speculative tasks more effective and improves cluster performance. Experimental results on various benchmarks and applications show that the proposed framework achieves 23.5% to 30.7% reductions in execution time and 25.4% to 46.3% increases in cluster throughput compared with the Spark engine.
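
To make the notion of a straggler decision concrete, here is a minimal Python sketch of a static, median-based threshold rule of the kind that speculative execution is commonly built on. It is not the dynamic multi-coefficient criterion proposed in the paper, and it is not Spark's scheduler code: the function name `find_stragglers` and the default values of `quantile` and `multiplier` are assumptions made for illustration.

```python
from statistics import median
from typing import Dict, List


def find_stragglers(finished: List[float],
                    running: Dict[str, float],
                    total_tasks: int,
                    quantile: float = 0.75,
                    multiplier: float = 1.5) -> List[str]:
    """Return IDs of running tasks that look like stragglers.

    finished    -- durations (seconds) of tasks that already completed
    running     -- elapsed time (seconds) per still-running task ID
    total_tasks -- number of tasks in the stage
    quantile    -- fraction of tasks that must finish before judging (assumed)
    multiplier  -- how many times slower than the median counts as slow (assumed)
    """
    # Do not judge until enough tasks have finished to give a stable median.
    if total_tasks <= 0 or len(finished) < quantile * total_tasks:
        return []
    threshold = multiplier * median(finished)
    # A running task is flagged once its elapsed time exceeds the threshold.
    return sorted(task for task, elapsed in running.items() if elapsed > threshold)


if __name__ == "__main__":
    done = [10.0, 12.0, 11.0, 9.0, 10.0, 11.0]
    in_flight = {"t7": 14.0, "t8": 40.0}
    print(find_stragglers(done, in_flight, total_tasks=8))
```

A fixed multiplier like this treats all nodes alike, which is the limitation that motivates adaptive criteria in heterogeneous clusters.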


TS-REPLICA: A novel replica placement algorithm based on the entropy weight TOPSIS method in spark for multimedia data analysis

Liu

Xie

Chen

et al. 2023

Information Sciences

