This paper presents analytical and simulation models for evaluating the operation of a VLSI processor (in a uniprocessor configuration) which utilizes a time-redundant approach (such as recomputation with shifted operands) for fault-tolerant computing. In the proposed approach, all incoming jobs to the uniprocessor are duplicated, so two versions of each job must be processed. A discrepancy between the results of the two versions of the same job indicates that a fault may have occurred. Several methods for appropriately scheduling the primary and secondary versions of the jobs are proposed and analyzed.
Introduction. To meet the increasing demand for system reliability and availability, fault-tolerant techniques have been widely employed in today's computers [1]. One of the arrangements commonly employed for achieving fault tolerance consists of providing some form of redundancy in the system. Redundancy can be applied either in the hardware (for example, by using a duplex or a higher degree of module replication) or in the software (as in the N-version programming redundancy of [2]). This type of redundancy is referred to as space redundancy. Because a redundant system requires at least a twofold replication of the hardware or software components, the cost of manufacturing fault-tolerant computing systems in VLSI can be expected to increase substantially [1]. Another interesting method for attaining fault-tolerant computing is time redundancy [1]; an example of this type of approach is recomputation with shifted operands (RESO) [3], in which fault tolerance is achieved by limiting the modular replication of space redundancy, at the cost of at least doubling the time required for the basic computation step.

In a multi-processor system, the processors provide some natural form of space redundancy. Studies have shown that demand for system resources varies in a stochastic manner; in most cases, a so-called spare capacity exists that can be used to implement some form of space redundancy [5].
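As an illustration of the time-redundant idea behind recomputation with shifted operands mentioned above, the following is a minimal sketch and not the model analyzed in this paper. It assumes a simulated 16-bit adder with a single stuck-at-0 output bit; the names `WIDTH`, `faulty_add`, and `reso_add` are hypothetical and introduced only for this example. The same operation is executed twice on the same unit, the second time with the operands shifted left, and a mismatch between the two results signals a possible fault.

```python
WIDTH = 16  # assumed word width of the simulated adder (illustrative only)


def faulty_add(a, b, stuck_bit=None):
    """Add two integers on a WIDTH-bit adder.

    If stuck_bit is given, that output bit is forced to 0, which models
    a stuck-at-0 fault in one bit slice of the adder.
    """
    result = (a + b) & ((1 << WIDTH) - 1)
    if stuck_bit is not None:
        result &= ~(1 << stuck_bit)
    return result


def reso_add(a, b, shift=1, stuck_bit=None):
    """Run the addition twice on the same (possibly faulty) adder.

    The primary version uses the operands as given. The secondary
    version shifts both operands left, adds them, and shifts the sum
    back, so it exercises different bit slices of the adder.
    Returns (primary_result, fault_detected).
    Assumes the shifted sum still fits in WIDTH bits.
    """
    primary = faulty_add(a, b, stuck_bit)
    secondary = faulty_add(a << shift, b << shift, stuck_bit) >> shift
    return primary, primary != secondary


if __name__ == "__main__":
    print(reso_add(5, 6))               # fault-free adder
    print(reso_add(5, 6, stuck_bit=0))  # adder with bit 0 stuck at 0
```

In this sketch the second pass costs roughly as much time as the first, which mirrors the doubling of computation time that the text attributes to time redundancy. A fault that corrupts a given bit slice affects different result bits in the two passes, so the comparison exposes it.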