Israel Hsu scite author profile

Israel Hsu

3 Publications

13 Citation Statements Received

55 Citation Statements Given

Affiliations

Concurrent Technologies Corporation, University of California, Los Angeles

Publications

Resilient Virtual Clusters

Hsu

Tamir

2011

Abstract: Clusters of computers can provide, in aggregate, reliable services despite the failure of individual computers. System-level virtualization is widely used to consolidate the workload of multiple physical systems as multiple virtual machines (VMs) on a single physical computer. A single physical computer thus forms a virtual cluster of VMs. A key difficulty with virtualization is that the failure of the virtualization infrastructure (VI) often leads to the failure of multiple VMs. This is likely to overload "cluster computing" resiliency mechanisms, typically designed to tolerate the failure of only a single node at a time. By supporting recovery from failure of key VI components, we have enhanced the resiliency of a VI (Xen), thus enabling the use of existing "cluster computing" techniques to provide resilient virtual clusters. In the overwhelming majority of cases, these enhancements allow recovery from errors in the VI to be accomplished without the failure of more than a single VM. The resulting resiliency of the virtual cluster is demonstrated by running two existing "cluster computing" systems while subjecting the VI to injected faults.

Using Virtualization to Validate Fault-Tolerant Distributed Systems

Hsu

Gallagher

et al. 2010

Asynchronous events and complex system state distributed across independent nodes make exposure and diagnosis of flaws in distributed systems a challenge. The difficulties are exacerbated when the goal is to validate fault tolerance mechanisms that are activated only by the occurrence of errors, which are, by nature, rare. Validation of fault tolerance mechanisms is often done by injecting faults that emulate the actual faults and "stress" the functionality of the resilience mechanisms. Validation campaigns lasting days and involving thousands of fault injections are often necessary. We present an infrastructure that combines virtualization and software-implemented fault injection to automate validation campaigns and support the analysis of the behavior of a distributed system under test. Virtualization enables: 1) a flexible fault injector capable of emulating a wide variety of faults, and 2) a mechanism for autonomously recovering faulty nodes so that the campaign can continue running on a target system that is fully functional. As a case study we use this infrastructure to validate a Byzantine-fault-tolerant cluster manager. Over 1280 hours of fault injections yielded the exposure of 11 unique flaws in the cluster manager.

Design and validation of portable communication infrastructure for fault-tolerant cluster middleware

Tao

Goldberg

Hsu

et al.
