Organizations that use grid computing must cope with events such as a machine being turned off or a component failing. Some of these events can bring down an entire grid. We propose a mechanism that maximizes resource usage by monitoring grid middleware components and enabling them to recover from failures.