Barry Linnert scite author profile

Abstract. Resource reservations in advance are a mature concept for the allocation of various resources, particularly in grid environments. Common grid toolkits such as Globus support advance reservations and assign jobs to resources at admission time. While allocation mechanisms for advance reservations are available in current grid management systems, the advance reservation perspective demands failure-handling strategies that go beyond recovering the jobs or applications that are active when a resource failure occurs. Applications that have been admitted but not yet started are also affected by the failure and hence need to be dealt with in an appropriate manner. In this paper, we discuss the properties of advance reservations with respect to failure recovery and outline a number of strategies applicable in such cases in order to reduce the impact of resource failures and outages. It can be shown that it pays to also remap affected but not yet started jobs to alternative resources, if available. Like reserving in advance, this can be considered remapping in advance. In particular, a remapping strategy that prefers requests that were allocated a long time ago provides high fairness for clients, as it implements functionality similar to advance reservations, while achieving the same performance as the other strategies.
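The idea of remapping in advance with a preference for requests admitted long ago can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Reservation` fields, the node-count capacity model, the `expected_repair` estimate, and the first-fit choice of alternative resource are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    job_id: str
    admitted_at: int  # time at which the reservation was admitted
    start: int        # reserved start time
    end: int          # reserved end time (exclusive)
    nodes: int        # number of nodes requested (assumed capacity model)
    resource: str     # resource the job is currently mapped to


def peak_load(reservations, resource, start, end):
    """Highest number of nodes reserved on `resource` during [start, end)."""
    relevant = [r for r in reservations
                if r.resource == resource and r.start < end and start < r.end]
    # The load can only rise at `start` or where an overlapping job begins.
    points = [start] + [r.start for r in relevant if r.start > start]
    return max(sum(r.nodes for r in relevant if r.start <= t < r.end)
               for t in points)


def remap_in_advance(reservations, capacities, failed, now, expected_repair):
    """Remap admitted but not yet started jobs away from a failed resource.

    Jobs are handled in order of admission time, oldest first, so requests
    that were allocated long ago get the first chance at free capacity.
    Returns (remapped_ids, rejected_ids).
    """
    affected = [r for r in reservations
                if r.resource == failed and now <= r.start < expected_repair]
    affected.sort(key=lambda r: r.admitted_at)

    remapped, rejected = [], []
    for job in affected:
        for candidate, capacity in capacities.items():
            if candidate == failed:
                continue
            load = peak_load(reservations, candidate, job.start, job.end)
            if load + job.nodes <= capacity:
                job.resource = candidate  # first alternative that fits
                remapped.append(job.job_id)
                break
        else:
            rejected.append(job.job_id)  # no alternative resource has room
    return remapped, rejected
```

In this sketch, if two affected jobs compete for the same spare capacity, the one admitted earlier is remapped and the later one is rejected, which mirrors the first-come guarantee that advance reservations give at admission time.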


Rerouting Strategies for Networks with Advance Reservations

Burchard

Linnert

Schneider


A distributed load-based failure recovery mechanism for advance reservation environments

Burchard

Linnert

Schneider

2005


Abstract. Resource reservations in advance are a mature concept for the allocation of various resources, particularly in grid environments. Common grid toolkits support advance reservations and assign jobs to resources at admission time. In such a distributed environment, it is necessary to develop carefully tailored failure recovery mechanisms that provide seamless, transparent migration of jobs from one resource to another. As the migration of running jobs is difficult, an important issue in advance reservation (i.e., planning-based) management infrastructures is to determine the duration of a failure in order to remap jobs that are already allocated to a currently failed resource but not yet active. As shown in previous work, underestimating the failure duration, and consequently remapping too few jobs, results in an increased number of job terminations. In order to overcome this drawback, in this paper we propose a load-based computation of the jobs to be remapped. A centralized and a distributed version of the strategy are presented, showing that it is not necessary to have knowledge beyond the local allocation on the failed resource. The load-based strategy remaps jobs effectively while avoiding estimations of the failure duration, which are inevitably inaccurate.


Distributed dynamic processor allocation for multicomputers

2007



Barry Linnert

The virtual resource manager: an architecture for SLA-aware resource management

Failure Recovery in Distributed Environments with Advance Reservation Management Systems

Rerouting Strategies for Networks with Advance Reservations

A distributed load-based failure recovery mechanism for advance reservation environments

Distributed dynamic processor allocation for multicomputers
