Fault injection is important for evaluating the dependability of computer systems. Researchers and engineers have created many novel methods to inject faults, which can be implemented in both hardware and software. Dependability evaluation involves the study of failures and errors. The destructive nature of a crash and long error latency make it difficult to identify the causes of failures in the operational environment, and it is particularly hard to recreate a failure scenario for a large, complex system. To identify and understand potential failures, we use an experiment-based approach to studying the dependability of a system. Such an approach is applied not only during the conception and design phases but also during the prototype and operational phases. To take an experiment-based approach, we must first understand a system's architecture, structure, and behavior. Specifically, we need to know its tolerance for faults and failures, including its built-in detection and recovery mechanisms [3], and we need specific instruments and tools to inject faults, create failures or errors, and monitor their effects.

DIFFERENT PHASES, DIFFERENT TECHNIQUES

Engineers most often use low-cost, simulation-based fault injection to evaluate the dependability of a system that is in the conceptual and design phases. At this point, the system under study is only a series of high-level abstractions; implementation details have yet to be determined. Thus, the system is simulated on the basis of simplified assumptions. Simulation-based fault injection, which assumes that errors or failures occur according to predetermined distributions, is useful for evaluating the effectiveness of fault-tolerant mechanisms and a [...], which are difficult to supply [...] measurements. Testing a prototype, on the other hand, allows us to evaluate the system without any assumptions about system design, which yields more accurate results.
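As a minimal sketch of simulation-based fault injection, the code here assumes faults arrive as a Poisson process (exponentially distributed inter-arrival times, one possible "predetermined distribution") and that each fault is caught by the detection and recovery mechanisms with a fixed probability, the coverage. The function names and parameters (`simulate_mission`, `estimate_reliability`, `fault_rate`, `coverage`, `mission_time`) are hypothetical; the result is simply the fraction of simulated missions that end without an undetected fault.

```python
import random


def simulate_mission(fault_rate, coverage, mission_time, rng):
    """Simulate one mission; return True if the system survives.

    Assumption: fault inter-arrival times are exponential with rate
    `fault_rate` (a Poisson fault process). Each fault is detected and
    recovered from with probability `coverage`; an undetected fault is
    treated as a system failure.
    """
    t = rng.expovariate(fault_rate)  # time of the first fault
    while t < mission_time:
        if rng.random() >= coverage:
            return False  # fault escaped detection -> system failure
        t += rng.expovariate(fault_rate)  # time of the next fault
    return True


def estimate_reliability(fault_rate, coverage, mission_time, runs=10_000, seed=1):
    """Fraction of `runs` simulated missions that survive (seeded for repeatability)."""
    rng = random.Random(seed)
    survived = sum(
        simulate_mission(fault_rate, coverage, mission_time, rng) for _ in range(runs)
    )
    return survived / runs
```

In this model the undetected faults themselves form a Poisson process with rate λ(1 - c), so the exact survival probability is exp(-λ(1 - c)T); for example, `estimate_reliability(0.2, 0.9, 10.0)` should land near exp(-0.2) ≈ 0.82. The closed form is only available because of the simplifying distributional assumption, which is exactly what simulation at the design phase relies on.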
In prototype-based fault injection, we inject faults into the system to
• identify dependability bottlenecks,
• study system behavior in the presence of faults,
• determine the coverage of error detection and recovery mechanisms, and
• evaluate the effectiveness of fault tolerance mechanisms (such as reconfiguration schemes) and the resulting performance loss.
To do prototype-based fault injection, faults are injected either at the hardware level (logical or electrical faults) or at the software level (code or data corruption), and the effects are monitored. The system used for evaluation can be either a prototype or a fully operational system. Injecting faults into an operational system can provide information about the failure process. However, fault injection is suitable for studying emulated faults only. It also fails to provide dependability measures such as mean time between failures and availability.
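Coverage, one of the goals listed for prototype-based injection, is commonly estimated as the fraction of activated faults that the detection mechanisms catch. A minimal sketch follows, assuming each injection's outcome has already been classified as `detected`, `undetected`, or `no_effect` (hypothetical labels; faults with no effect are excluded because they were never activated). It also reports a Wilson score interval, one standard choice of binomial confidence interval, since a coverage figure from a finite campaign is itself uncertain.

```python
import math
from collections import Counter


def coverage_with_ci(outcomes, z=1.96):
    """Estimate detection coverage from classified injection outcomes.

    Coverage = detected / (detected + undetected); 'no_effect' runs are
    ignored. Returns (coverage, (low, high)) using a Wilson score interval
    (z = 1.96 gives roughly 95% confidence).
    """
    counts = Counter(outcomes)
    detected = counts["detected"]
    activated = detected + counts["undetected"]
    if activated == 0:
        raise ValueError("no activated faults; coverage is undefined")
    p = detected / activated
    denom = 1 + z * z / activated
    center = (p + z * z / (2 * activated)) / denom
    half = z * math.sqrt(p * (1 - p) / activated + z * z / (4 * activated**2)) / denom
    return p, (center - half, center + half)
```

For instance, 90 detected and 10 undetected errors (plus any number of non-activated faults) give a point estimate of 0.90 with an interval of roughly 0.83 to 0.94, which shows how wide the uncertainty remains even after 100 activated faults.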
This paper presents a dependability study of high-speed, switched local area networks (LANs) using Myrinet (theoretical speed of 2.56 Gbps) as an example testbed. The study uses the results of two fault injection methods, simulated fault injection and software-implemented fault injection (SWIFI), to analyze the application-level impact of transient faults injected into the network interface hardware. The observed errors include dropped or corrupted messages, host interface or host resets, and local or remote host interface hangs. The paper presents the study in two parts. First, the results from the SWIFI method on the real system are used as a basis to validate the simulation and to identify the major factors leading to differences between the methods. A comparison between the two injection methods shows that they agree for 83% of the fault injections; the results, however, vary greatly depending on the fault type considered. Second, the study analyzes the effects of varying workload intensity, host platform, and the interface function targeted by the injection. For example, this analysis shows that the targeted function has a significant impact on the fault activation rate. Finally, the study identifies two mechanisms by which faults may propagate from the interface to other parts of the network: one caused the interface's host computer to reboot, and the other caused a remote interface in the network to hang.
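To make an agreement figure such as the 83% above concrete, this sketch shows one way it could be tallied, both overall and per fault type. It assumes each injection is recorded as a (fault type, SWIFI outcome, simulation outcome) triple and counts a run as agreeing when both methods assign the same outcome class; the function name, outcome labels, and sample records are hypothetical and are not data from the study.

```python
from collections import defaultdict


def agreement(records):
    """Compare outcome classifications from two injection methods.

    records: iterable of (fault_type, swifi_outcome, sim_outcome).
    Returns (overall_rate, {fault_type: rate}); a record agrees when
    both methods produced the same outcome label.
    """
    total = matched = 0
    per_type = defaultdict(lambda: [0, 0])  # fault_type -> [matched, total]
    for fault_type, swifi, sim in records:
        agree = swifi == sim
        total += 1
        matched += agree  # bool counts as 0 or 1
        per_type[fault_type][0] += agree
        per_type[fault_type][1] += 1
    overall = matched / total if total else 0.0
    return overall, {t: m / n for t, (m, n) in per_type.items()}
```

Reporting the per-type breakdown alongside the overall rate matters here because, as the abstract notes, an aggregate agreement figure can hide large differences between fault types.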
This paper presents an injection-based approach to analyzing the dependability of high-speed networks, using Myrinet as an example testbed. Instead of injecting faults related to network protocols, we injected faults into the host interface component, which performs the actual send and receive operations. The fault model was a temporary single-bit flip in an instruction executing on the host interface's custom processor, corresponding to a transient fault in the processor itself. Results show that more than 25% of the injected faults resulted in interface failures. Furthermore, we observed fault propagation from an interface to its host computer or to another interface to which it sent a message. These findings suggest two important issues for high-speed networking in critical applications: protecting the host computer from errant or malicious interface components, and implementing thorough message acceptance tests to prevent errant messages from propagating faults between interfaces.
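The fault model described above, a temporary single-bit flip in an executing instruction, can be sketched as follows. This is a minimal illustration and not the injector used in the study: the 32-bit instruction width, the `TransientBitFlip` class, `random_fault`, and the modeling of an instruction fetch as a list lookup are all assumptions. The key property is that the stored program is never modified and the flip fires on only one fetch, so the fault is transient rather than permanent.

```python
import random

WORD_BITS = 32  # assumed instruction width (hypothetical; not given in the text)


def flip_bit(word: int, bit: int) -> int:
    """Return `word` with exactly one bit inverted."""
    if not 0 <= bit < WORD_BITS:
        raise ValueError(f"bit must be in [0, {WORD_BITS})")
    return (word ^ (1 << bit)) & ((1 << WORD_BITS) - 1)


class TransientBitFlip:
    """Corrupt a single fetch of the instruction at `target_pc`, then disarm."""

    def __init__(self, target_pc: int, bit: int):
        self.target_pc = target_pc
        self.bit = bit
        self.fired = False

    def fetch(self, program, pc):
        word = program[pc]  # the stored program is never modified
        if not self.fired and pc == self.target_pc:
            self.fired = True  # one-shot: later fetches see the clean word
            return flip_bit(word, self.bit)
        return word


def random_fault(program, rng: random.Random) -> TransientBitFlip:
    """Pick the target instruction and bit position uniformly at random."""
    return TransientBitFlip(rng.randrange(len(program)), rng.randrange(WORD_BITS))
```

In a campaign, each run would draw a fresh `random_fault`, execute the workload through `fetch`, and then classify the outcome (for example, interface failure or no effect) for later analysis.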