Abstract-In market-based Grid systems, a main aim is to execute jobs with the required quality of service within a user-defined budget. Because a grid has heterogeneous resources with unpredictable faults, the user's cost constraints and expected service requirements may not be met. A resource scheduling approach that reduces faults is therefore necessary. This paper presents a predictive approach to fault tolerance mechanisms for faultless job scheduling on market-based grids. The Case-Based Reasoning technique is used to select fault-tolerant nodes. The approach applies a specific structure to provide fault tolerance among provider nodes, keeping the system in a safe state with minimal data transfer. The algorithm increases confidence in fault tolerance and, as a result, improves grid performance.

Index Terms-Market-based grid, fault tolerance, case-based reasoning, job scheduling.
I. INTRODUCTION

Grid computing is an infrastructure for solving problems that require heavy computation with very long execution times [1]. It is a cooperation of different computers on a specific task, so that the user obtains better performance for that task. In this environment the resources are geographically distributed, but logically they appear as a single virtual resource with high performance [2]. Grid computing allows a group of computers to share the system securely and optimizes their collective resources to meet required workloads by using the OGSA (Open Grid Services Architecture) open standard [3]. The Grid allows jobs to be executed on different nodes. Job scheduling and resource management at the Grid level are usually performed by a resource scheduler or a meta-scheduler. A scheduler is fundamental in any large-scale Grid environment. The task of a Grid resource scheduler is to dynamically identify and characterize the available resources, and to select the most appropriate resources for submitted jobs. In grid scheduling, selecting the best nodes with respect to economic and fault tolerance criteria is an important consideration [4]. Choosing a suitable fault-tolerant resource for a user job so as to meet predefined constraints such as deadline, speedup, and cost of execution is an important problem in grids. Our approach addresses some of these problems.

Grid scheduling consists of three steps. The first step is resource discovery and filtering, the second is selecting nodes and scheduling jobs onto them, and the last is submitting and monitoring jobs. Step 2 is vital because some nodes consistently behave well, while others often fail and perform poorly [5].

In the scheduling phase, grid schedulers usually use information about resources' attributes (CPU speed and load, memory) to do the scheduling.
The information used by the schedulers is usually provided by an information service that is responsible for gathering data about all resources that compose the grid. ...
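
As a rough illustration of the attribute-based scheduling described above, the following Python sketch walks through the three steps using the attributes mentioned in the text (CPU speed, CPU load, memory). It is not the approach proposed in this paper: the class names, field names, and the scoring rule (spare CPU capacity) are assumptions made only for this example, and step 3 is reduced to a stub.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    # Attributes an information service might report (all names hypothetical).
    name: str
    cpu_speed_ghz: float
    cpu_load: float        # fraction of CPU in use, 0.0 to 1.0
    free_memory_mb: int


@dataclass
class Job:
    name: str
    min_memory_mb: int     # hypothetical hard requirement of the job


def discover(resources, job):
    """Step 1: keep only resources that satisfy the job's memory requirement."""
    return [r for r in resources if r.free_memory_mb >= job.min_memory_mb]


def select(candidates):
    """Step 2: pick the candidate with the most spare CPU capacity.

    The score (speed times idle fraction) is an assumed rule for this example.
    """
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.cpu_speed_ghz * (1.0 - r.cpu_load))


def schedule(resources, job):
    """Run steps 1 and 2 and return the chosen resource name, or None."""
    chosen = select(discover(resources, job))
    if chosen is None:
        return None
    # Step 3 (submitting and monitoring the job) is omitted in this sketch.
    return chosen.name


nodes = [
    Resource("n1", cpu_speed_ghz=3.0, cpu_load=0.9, free_memory_mb=4096),
    Resource("n2", cpu_speed_ghz=2.0, cpu_load=0.1, free_memory_mb=8192),
    Resource("n3", cpu_speed_ghz=3.5, cpu_load=0.0, free_memory_mb=512),
]
print(schedule(nodes, Job("render", min_memory_mb=1024)))  # prints "n2"
```

Here n3 is filtered out in step 1 for lack of memory, and n2 is preferred over the faster but heavily loaded n1. A selection rule of this kind looks only at current attributes and ignores how often a node has failed in the past.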