Reliability-aware resource management for computational grid/cluster environments

Limaye, K.; Leangsuksun, Box; Liu, Yudan; Greenwood, Z. D.; Scott, Stephen L.; Libby, R.; Chanchio, Kasidit

doi:10.1109/grid.2005.1542744

Cited by 13 publications

(5 citation statements)

References 7 publications


Section: Fault-tolerance Replication Model (mentioning)

confidence: 97%

“…SOAP request with small number of parameters or data), and not tasks that would involve time-intensive replication (such as duplication of large data sets) and whose handling would influence the overall performance of the proposed scheme. Other research works that are based on the same approach of a negligible replication overhead are presented in [27][28][29]. Furthermore, by producing task replicas, the failure probability of each task can be significantly lowered; however, the number of tasks that are finally assigned to the mobile Grid increases, increasing, respectively, the total workload that is assigned to the Grid for execution.…”

Section: Fault-tolerance Replication Model (mentioning)

confidence: 99%
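The trade-off described in the citation statement above (replicas lower a task's failure probability but raise the total workload) can be illustrated with a small sketch. It assumes that replicas fail independently and with the same probability, which the quoted text does not state; the function names and the numbers used are hypothetical and are not taken from the cited paper.

```python
def task_failure_probability(p: float, replicas: int) -> float:
    """Probability that all replicas of one task fail.

    Assumption (not from the source): each replica fails independently
    with the same probability p, so the task fails only if every replica fails.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return p ** replicas


def total_workload(tasks: int, replicas: int) -> int:
    """Number of task instances assigned to the Grid once every task is replicated."""
    if tasks < 0 or replicas < 1:
        raise ValueError("tasks must be >= 0 and replicas >= 1")
    return tasks * replicas


if __name__ == "__main__":
    # Hypothetical figures: 100 submitted tasks, 10% failure chance per replica.
    for r in (1, 2, 3):
        print(r, task_failure_probability(0.1, r), total_workload(100, r))
```

Under these assumptions the failure probability falls geometrically with the number of replicas, while the workload grows only linearly, which is the tension the quoted passage points to.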


Fault tolerant and prioritized scheduling in OGSA‐based mobile Grids

Λίτκε

Halkos

Tserpes

et al. 2008

Concurrency and Computation


Summary: Grids and mobile Grids can form the basis and the enabling technology for pervasive and utility computing due to their ability to being open, highly heterogeneous and scalable. In this paper we present a scheme for advancing quality of service (QoS) attributes, such as fault tolerance and prioritized scheduling, in OGSA-based mobile Grids. The fault tolerance is achieved by producing and managing sufficient replicas of tasks submitted for execution on the mobile Grid resources. We design a simple and efficient prioritization scheme, which allows the scheduling of the tasks submitted by the Grid users as distinguished priorities that can be managed and exploited as a QoS parameter by the Grid infrastructure operator. The results that are presented show the efficiency of the proposed scheme in being simple and additionally enriching with reliability and QoS features the applications that are built on the concept of mobile Grids.


“…Single points of failure can be avoided through implementation of redundancy, such as the backup generator in Fig. 1 [26]. • State control: Each smart grid operator object (e.g., voltage stabilizer in Fig.…”

Section: Smart Grid Object Requirements (mentioning)

confidence: 99%

Representativeness models of systems: smart grid example

Schneidewind

2011

Innovations Syst Softw Eng


Given the great emphasis being placed on energy efficiency in contemporary society, in which the smart grid plays a prominent role, this is an opportune time to explore methodologies for appropriately representing system attributes. We suggest this is important for effective system development because the primary factor in correctly mapping between requirements and implementation is how representative the system design is of requirements. Since representativeness is an abstract term, it is imperative to identify ways to quantify it. We use several metrics. Among these is the priority of system elements (e.g., electric generator) in the set of elements, based on importance to system success. Secondly, fault tree analysis is employed to identify elements that operate in an unsafe state and the probabilities of reaching these unsafe states. Thirdly, state transition analysis provides traces of which elements are on the routes to unsafe states. These analyses provide the information needed to reduce element faults and failures on a priority basis.


“…Clusters [9][10][11] are deployed to improve reliability and availability in safety-critical systems, such as Google Linux Cluster. A cluster system consists of a group of independent computers running an independent operating system and working together as a single system to provide a powerful computing environment and high availability of services, particularly for computation intensive tasks.…”

Section: Introduction (mentioning)

confidence: 99%