Dynamic fault trees (dft) are widely adopted in industry to assess the dependability of safety-critical equipment. Since many systems are too large to be studied numerically, the dependability of dfts is often analysed using Monte Carlo simulation. A bottleneck here is that many simulation samples are required in the case of rare events, e.g. in highly reliable systems where components seldom fail. Rare event simulation (res) provides techniques to reduce the number of samples in the case of rare events. We present a res technique based on importance splitting to study failures in highly reliable dfts. Whereas res usually requires meta-information from an expert, our method is fully automatic: by cleverly exploiting the fault tree structure, we extract the so-called importance function. We handle dfts with Markovian and non-Markovian failure and repair distributions (for which no numerical methods exist) and show the efficiency of our approach on several case studies.
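The rare-event bottleneck and the idea behind importance splitting can be illustrated on a toy model. The Python sketch below is not the method of this paper, which derives the importance function automatically from the dft structure. It assumes a hypothetical system whose state is just the number k of failed components: at each step another component fails with probability p_fail, otherwise one failed component is repaired. The rare event is reaching n failed components before all are repaired. The importance function is simply k, and the estimator is fixed-effort multilevel splitting, one of several importance splitting variants. All names (`step`, `crude_mc_samples_needed`, `fixed_effort_splitting`, `exact_probability`) are illustrative only.

```python
import random


def step(k, p_fail, rng):
    """Hypothetical toy model: one transition from k > 0 failed components.
    With probability p_fail another component fails, otherwise one is repaired."""
    return k + 1 if rng.random() < p_fail else k - 1


def crude_mc_samples_needed(p, rel_err, z=1.96):
    """Samples N that crude Monte Carlo needs so that the CLT confidence
    interval half-width is at most rel_err * p: N >= z^2 (1 - p) / (p rel_err^2)."""
    return z * z * (1.0 - p) / (p * rel_err * rel_err)


def fixed_effort_splitting(n, p_fail, effort, rng):
    """Fixed-effort multilevel splitting estimate of
    P(reach n failed components before all are repaired), starting from k = 1.
    Importance function (an assumption for this toy model): k itself,
    with thresholds 2, 3, ..., n."""
    estimate = 1.0
    restart_states = [1]  # states from which the current stage's trajectories start
    for level in range(2, n + 1):
        hits = []
        for _ in range(effort):
            # Split: restart from a state that reached the previous threshold.
            k = rng.choice(restart_states)
            # Simulate until the next threshold is reached or the system is fully repaired.
            while 0 < k < level:
                k = step(k, p_fail, rng)
            if k == level:
                hits.append(k)
        if not hits:
            return 0.0  # no trajectory reached this threshold
        # Multiply by the estimated conditional probability of this stage.
        estimate *= len(hits) / effort
        restart_states = hits
    return estimate


def exact_probability(n, p_fail):
    """Gambler's-ruin formula for this toy chain, used as a reference value."""
    r = (1.0 - p_fail) / p_fail
    return (1.0 - r) / (1.0 - r ** n)


if __name__ == "__main__":
    n, p_fail, effort = 6, 0.1, 2000
    rng = random.Random(42)
    exact = exact_probability(n, p_fail)
    est = fixed_effort_splitting(n, p_fail, effort, rng)
    print(f"exact probability:   {exact:.3e}")
    print(f"splitting estimate:  {est:.3e} using {(n - 1) * effort} trajectories")
    print(f"crude MC would need: {crude_mc_samples_needed(exact, 0.1):.1e} samples")
```

With n = 6 and p_fail = 0.1, the exact probability is 8/531440, about 1.5·10⁻⁵, so crude Monte Carlo would need roughly 2.6·10⁷ samples for a 10% relative error at 95% confidence, whereas the splitting run uses 5 · 2000 = 10⁴ short trajectories. The gain hinges on the importance function: in this toy model it is trivially given, which is precisely the kind of information an expert normally has to supply.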
Fault trees. A fault tree (ft) describes how component failures occur and propagate through the system, eventually leading to system failures. Technically, an ft is a directed acyclic graph whose leaves model component failures, and whose other nodes (called gates) model failure propagation. Using fault trees, one can compute dependability metrics that quantify how a system fares w.r.t. certain performance indicators. Two common metrics are system reliability (the probability that there are no system failures during a given mission time) and system availability (the average percentage of time that a system is operational).

This work was partially funded by NWO, NS, and ProRail project 15474 (SE-…)

Static fault trees (aka standard fts) contain a few basic gates, like AND and OR gates. This makes them easy to design and analyse, but also limits their expressivity. Dynamic fault trees (dfts [21,52]) are a common and widely applied extension of standard fts, catering for more complex dependability patterns, like spare management and causal dependencies. To model these patterns, dfts come with additional gates, for instance SPARE, PAND, and FDEP.

Such gates make dfts more difficult to analyse. In static fts it only matters whether or not a component has failed, so they can be analysed with Boolean methods, such as binary decision diagrams [32]. Dynamic fault trees, on the other hand, crucially depend on the failure order, so Boolean methods are insufficient. Moreover, on top of these two classes, repairable fault trees (rft [7]) permit components to be repaired after they have failed. This is crucial to model fault-tolerant systems more realistically. Yet repairs make analyses even harder: it does not suffice to know which components failed, or in which order; one must also know whether they are simultaneously failed. The general rule is that the more complex the formalism, the more realistic the model, and the harder the analyses. Fig. 2 is an rft with a top AND gate, a SPARE (Rcab), and three leaves.

Fault tree analysis. The reliability/availability of a fault tree can be computed via...