A hybrid fault tolerance technique in grid computing system

Qureshi, Kalim; Khan, Fiaz Gul; Manuel, Paul; Nazir, Babar

doi:10.1007/s11227-009-0345-y

Cited by 23 publications

(15 citation statements)

References 15 publications


“…A classification of techniques has been proposed by [106] which distinguishes between two classes of failure handling techniques, namely, task level failure handling and workflow level failure handling. The recovery techniques that can be performed at the task level for masking the fault effects are called task level techniques.…”

Section: Task Resubmission (mentioning)

confidence: 99%

“…In other words, workflow level FTTs change the flow of execution on failure according to the knowledge of task execution context. They can also be classified into four different types: alternate task, redundancy, user defined exception handling and rescue workflow [106]. The only difference between alternate task and retry technique is that alternate task exchanges a task with a different implementation of the same task with different execution characteristics on the failure of the first one.…”

Section: Task Resubmission (mentioning)

confidence: 99%
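The distinction drawn in the statement above, between retrying a task and swapping in an alternate task, can be illustrated with a minimal Python sketch. The function names `run_with_retry` and `run_with_alternates` are hypothetical and come from neither the cited paper nor any grid middleware; the sketch only assumes that a task is a callable that raises an exception on failure.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def run_with_retry(task: Callable[[], T], max_attempts: int = 3) -> T:
    """Retry: re-run the same implementation after each failure."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return task()
        except Exception as exc:  # any exception counts as a task failure here
            last_error = exc
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


def run_with_alternates(implementations: Sequence[Callable[[], T]]) -> T:
    """Alternate task: on failure, switch to a different implementation
    of the same task instead of re-running the one that failed."""
    last_error: Optional[Exception] = None
    for implementation in implementations:
        try:
            return implementation()
        except Exception as exc:
            last_error = exc
    raise RuntimeError("all alternate implementations failed") from last_error


if __name__ == "__main__":
    def unreliable_solver() -> int:
        # Hypothetical primary implementation that always fails.
        raise ValueError("primary implementation failed")

    def fallback_solver() -> int:
        # Hypothetical second implementation with different characteristics.
        return 42

    print(run_with_alternates([unreliable_solver, fallback_solver]))
```

The two helpers differ only in what is executed after a failure: the same callable again, or the next implementation in the list.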


Reliability and high availability in cloud computing environments: a reference roadmap

Mesbahi

Rahmani

Hosseinzadeh

2018

Hum. Cent. Comput. Inf. Sci.


Reliability and high availability have always been a major concern in distributed systems. Providing highly available and reliable services in cloud computing is essential for maintaining customer confidence and satisfaction and preventing revenue losses. Although various solutions have been proposed for cloud availability and reliability, there are no comprehensive studies that completely cover all different aspects of the problem. This paper presented a ‘Reference Roadmap’ of reliability and high availability in cloud computing environments. A big picture was proposed which was divided into four steps specified through four pivotal questions starting with the keywords ‘Where?’, ‘Which?’, ‘When?’ and ‘How?’. The desirable result of having a highly available and reliable cloud system could be gained by answering these questions. Each step of this reference roadmap addressed a specific concern of a particular portion of the issue. Two main research gaps were identified by this reference roadmap.


“…To attain high levels of availability and reliability, the infrastructure of grid must be fault tolerant (Qureshi et al 2011). Avizienis et al (2004) presented a dependability taxonomy that has been extended by incorporating more factors extracted from the literature.…”

Section: Challenges In Grid Dependability (mentioning)

confidence: 99%

“…Similarly, the design goals have also been identified that can lead us to more reliable, available, and secure grid environments. Previously identified and published research (Nazir et al 2012; Haider and Ansari 2012; Haider et al 2011; Qureshi et al 2011; Malik et al 2012; Nazir et al 2009; Khan et al 2010) regarding fault tolerance pertaining to different types of errors, failures, and faults and the corresponding subtypes are also part of this survey, which discloses a very wide range of problems expected in the grid computing environments.…”

Section: Challenges In Grid Dependability (mentioning)

confidence: 99%

Fault tolerance in computational grids: perspectives, challenges, and issues

Haider

Nazir

2016

SpringerPlus

Self Cite


Computational grids are established with the intention of providing shared access to hardware and software based resources with special reference to increased computational capabilities. Fault tolerance is one of the most important issues faced by computational grids. The main contribution of this survey is the creation of an extended classification of problems that occur in computational grid environments. The proposed classification will help researchers, developers, and maintainers of grids to understand the types of issues to be anticipated. Moreover, different types of problems, such as omission, interaction, and timing related problems, have been identified that need to be handled on various layers of the computational grid. In this survey, fault tolerance and fault detection mechanisms are also analyzed and examined. Our conclusion is that a dependable and reliable grid can only be established when more emphasis is placed on fault identification. Moreover, our survey reveals that adaptive and intelligent fault identification and tolerance techniques can improve the dependability of grid working environments.


“…It has been widely used in solving challenging problems in the real world, such as protein folding [1,2], hydrology modelling [3], and natural disasters simulation [4]. The main reason for deploying grid computing is to introduce a system that is scalable, simple to use, autonomic, and able to deal with faults [5]. Grid computing emerged from meta-computing back in 1990s to support diverse online processing and data intensive application [6,7,8].…”

Section: Introduction (mentioning)

confidence: 99%