Mining performance data for metascheduling decision support in the Grid

Li, Hui; Groep, David; Wolters, Lex

doi:10.1016/j.future.2006.04.009

Cited by 28 publications

(33 citation statements)

References 10 publications


“…Predictions of performance metrics, such as application run times and queue wait times on clusters, serve as important information for scheduling decision making at the Grid level. The main patterns that we identify for data-intensive clusters, namely periodicity, long range dependence, and temporal locality, suggest that prediction techniques based on historical data modeling would most likely work on real production systems [21,15]. The Grid-level scheduling strategies can also take advantages of specific VO job arrival patterns.…”

Section: Modeling and Predictions (mentioning)

confidence: 91%

Towards A Better Understanding of Workload Dynamics on Data-Intensive Clusters and Grids

Wolters

2007

2007 IEEE International Parallel and Distributed Processing Symposium

Self Cite


This paper presents a comprehensive statistical analysis of workloads collected on data-intensive clusters and Grids. The analysis is conducted at different levels, including Virtual Organization (VO) and user behavior. The aggregation procedure and scaling analysis are applied to job arrival processes, leading to the identification of several basic patterns, namely, pseudo-periodicity, long range dependence (LRD), and (multi)fractals. It is shown that statistical measures based on interarrivals are of limited usefulness and count based measures should be trusted instead when it comes to correlations. We also study workload characteristics like job run time, memory consumption, and cross correlations between these characteristics. A "bag-of-tasks" behavior is empirically proved, strongly indicating temporal locality. We argue that pseudo-periodicity, LRD, and "bag-of-tasks" behavior are important workload properties on data-intensive clusters and Grids, which are not present in traditional parallel workloads. This study has important implications on workload modeling and performance predictions in data-intensive Grid environments.
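The abstract's point about count-based measures can be illustrated with a minimal sketch: arrival timestamps are aggregated into counts per fixed-width interval, and correlations are then computed on the count series rather than on interarrival times. The function names, the bin width, and the plain sample autocorrelation used here are illustrative assumptions, not the estimators used in the paper.

```python
from collections import Counter


def arrival_counts(timestamps, bin_width):
    """Aggregate arrival timestamps (in seconds) into counts per fixed-width bin.

    Hypothetical helper: bins start at the earliest timestamp.
    """
    if not timestamps:
        return []
    start = min(timestamps)
    binned = Counter(int((t - start) // bin_width) for t in timestamps)
    n_bins = max(binned) + 1
    # Bins with no arrivals must appear as explicit zeros.
    return [binned.get(i, 0) for i in range(n_bins)]


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    if n == 0 or lag >= n:
        return 0.0
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var


if __name__ == "__main__":
    counts = arrival_counts([0, 1, 2, 10, 11, 25], bin_width=10)
    print(counts)
    print(autocorrelation([1, 0, 1, 0, 1, 0], lag=2))
```

A count series that alternates with period two shows a positive autocorrelation at lag 2 and a negative one at lag 1, which is the kind of pseudo-periodic signature the analysis looks for at larger time scales.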


“…Li et al [28] present an Instance Based Learning technique to forecast response times of jobs in grids by means of historical performance data mining. This approach is based on the definition of similarity between jobs.…”

Section: B Multi-stage Predictor Evaluation (mentioning)

confidence: 99%

Grid Global Behavior Prediction

Montes

Sánchez

Pérez

2011

2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing


Complexity has always been one of the most important issues in distributed computing. From the first clusters to grid and now cloud computing, dealing correctly and efficiently with system complexity is the key to taking technology a step further. In this sense, global behavior modeling is an innovative methodology aimed at understanding grid behavior. The main objective of this methodology is to synthesize the grid's vast, heterogeneous nature into a simple but powerful behavior model, represented in the form of a single, abstract entity with a global state. Global behavior modeling has proved to be very useful in effectively managing grid complexity but, in many cases, deeper knowledge is needed. It generates a descriptive model that could be greatly improved if extended not only to explain behavior, but also to predict it. In this paper we present a prediction methodology whose objective is to define the techniques needed to create global behavior prediction models for grid systems. This global behavior prediction can benefit grid management, especially in areas such as fault tolerance or job scheduling. The paper presents experimental results obtained in real scenarios in order to validate this approach.


“…Weighted Average (WA) and Locally Weighted Linear Regression (LLWR) are used as the candidate induction models for predictions. We refer to [9] for details and formulations of the basic prediction algorithm.…”

Section: The Basic Prediction Algorithm (mentioning)

confidence: 99%
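The two induction models named in the statement above can be sketched as follows. This is a minimal illustration, not the formulation from the cited paper: the Gaussian kernel, the `bandwidth` parameter, the function names, and the restriction to a single numeric attribute for the regression are all assumptions made for the example.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Kernel weight that decays with distance (assumed Gaussian kernel)."""
    return math.exp(-(distance / bandwidth) ** 2)


def weighted_average(neighbors, bandwidth=1.0):
    """Predict as the kernel-weighted mean of neighbor values.

    neighbors: list of (distance, observed_value) pairs.
    """
    weights = [gaussian_weight(d, bandwidth) for d, _ in neighbors]
    total = sum(weights)
    if total == 0:
        # All weights underflowed: fall back to the plain mean.
        return sum(v for _, v in neighbors) / len(neighbors)
    return sum(w * v for w, (_, v) in zip(weights, neighbors)) / total


def locally_weighted_linear(points, query_x, bandwidth=1.0):
    """Fit a weighted least-squares line around query_x and evaluate it there.

    points: list of (x, y) pairs with one numeric attribute x.
    """
    weights = [gaussian_weight(abs(x - query_x), bandwidth) for x, _ in points]
    sw = sum(weights)
    if sw == 0:
        return sum(y for _, y in points) / len(points)
    mx = sum(w * x for w, (x, _) in zip(weights, points)) / sw
    my = sum(w * y for w, (_, y) in zip(weights, points)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, (x, _) in zip(weights, points))
    if sxx == 0:
        # No spread in x: the local model degenerates to the weighted mean.
        return my
    sxy = sum(w * (x - mx) * (y - my) for w, (x, y) in zip(weights, points))
    return my + (sxy / sxx) * (query_x - mx)


if __name__ == "__main__":
    print(weighted_average([(0.5, 120.0), (1.5, 300.0)]))
    print(locally_weighted_linear([(0, 1.0), (1, 3.0), (2, 5.0)], 1.5))
```

The weighted average needs only neighbor distances and observed values, while the local regression also uses the attribute values themselves, so it can follow a local trend at the cost of a small fit per query.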

“…The sequential search is relatively slow as it has to calculate distances with all entries in the history base. Since it involves resource state attributes, the distance calculations for queue wait times are much more expensive and it cannot employ caching like run times without compromising accuracy [9]. To improve performance a different access structure is needed and we investigate M-Tree in this context.…”

Section: Nearest Neighbor Search (mentioning)

confidence: 99%
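The sequential search described in the statement above can be sketched as a linear scan that computes a distance from the query to every entry in the history base and keeps the k closest. The distance function (normalized squared differences for numeric attributes, 0/1 mismatch for nominal ones), the record layout, and all names are assumptions for illustration; the cited papers define their own distance and attribute set.

```python
import heapq


def job_distance(a, b, numeric_ranges):
    """Distance between two attribute dicts that share the same keys.

    Numeric attributes (those listed in numeric_ranges) contribute a
    range-normalized squared difference; other attributes contribute 0 on
    a match and 1 on a mismatch. This metric is an assumption.
    """
    total = 0.0
    for key, value in a.items():
        other = b[key]
        if key in numeric_ranges:
            span = numeric_ranges[key] or 1.0
            total += ((value - other) / span) ** 2
        else:
            total += 0.0 if value == other else 1.0
    return total ** 0.5


def sequential_knn(query, history, k, numeric_ranges):
    """Return the k nearest history entries as (distance, entry) pairs.

    Every entry is visited once, so each query costs O(n) distance
    computations in the size of the history base.
    """
    scored = ((job_distance(query, entry["attrs"], numeric_ranges), i)
              for i, entry in enumerate(history))
    return [(d, history[i]) for d, i in heapq.nsmallest(k, scored)]


if __name__ == "__main__":
    history = [
        {"attrs": {"user": "alice", "nodes": 4}, "wait": 100.0},
        {"attrs": {"user": "bob", "nodes": 4}, "wait": 900.0},
        {"attrs": {"user": "alice", "nodes": 64}, "wait": 5000.0},
    ]
    query = {"user": "alice", "nodes": 8}
    for dist, entry in sequential_knn(query, history, 2, {"nodes": 60.0}):
        print(round(dist, 3), entry["wait"])
```

Because the scan touches every entry for every query, its cost grows linearly with the history base, which is the motivation the statement gives for looking at a metric index structure instead.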

Improving a Local Learning Technique for Queue Wait Time Predictions

Chen

Tang

et al. 2006

Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06)


Local learning has been proposed as a common framework to predict both application run times and queue wait times based on workload traces. The queue wait time is shown to be more difficult and expensive to predict because its distance calculations typically involve not only job attributes but also resource states. In this paper methods and algorithms are investigated to improve prediction accuracy and prediction performance for queue wait times. Firstly, the so-called "local tuning" is adopted to tune parameters for each training subset divided by a pivot attribute (e.g., group or queue name). Bias-variance analysis of error is conducted on local tuning and its global counterpart, which tunes parameters on the whole training set. A method is then developed to select the tuning type adaptively based on the generalization error and bias-variance decomposition. Secondly, an efficient search tree structure called "M-Tree" is integrated into our algorithm to speed up k-nearest neighbor search. Experimental studies are conducted to evaluate the proposed methods and algorithms using real-world workload traces, which are collected from the NIKHEF production cluster on the LHC Computing Grid and Blue Horizon in the San Diego Supercomputer Center (SDSC). The results show that adaptive tuning can reduce the average prediction error by 3 to 10 percent compared to global tuning, and that the M-Tree nearest neighbor search is up to 8 times faster than the original sequential search.
