Abstract-Clusters of computers can provide, in aggregate, reliable services despite the failure of individual computers. System-level virtualization is widely used to consolidate the workload of multiple physical systems as multiple virtual machines (VMs) on a single physical computer. A single physical computer thus forms a virtual cluster of VMs. A key difficulty with virtualization is that the failure of the virtualization infrastructure (VI) often leads to the failure of multiple VMs. This is likely to overload "cluster computing" resiliency mechanisms, typically designed to tolerate the failure of only a single node at a time. By supporting recovery from failure of key VI components, we have enhanced the resiliency of a VI (Xen), thus enabling the use of existing "cluster computing" techniques to provide resilient virtual clusters. In the overwhelming majority of cases, these enhancements allow recovery from errors in the VI to be accomplished without the failure of more than a single VM. The resulting resiliency of the virtual cluster is demonstrated by running two existing "cluster computing" systems while subjecting the VI to injected faults.