Error propagation analysis is a consolidated practice to gain insights into error modes and effects that pertain to the activation of faults in software systems. A variety of approaches, such as architecture-based, source code instrumentation and variable tracing, have been proposed so far to address software error propagation analysis. Although valuable, existing approaches entail a substantial degree of system internals' knowledge, visibility and code manipulation that is not well-suited for real-life production environments. This paper proposes an empirical analysis of error propagation. We specifically address the challenges in using fault data and error events in the logs, which are a convenient byproduct of the system's execution. The approach puts forth the construction of error reporting graphs. We apply the approach to 2,042 failure data points from two real-world critical systems from the Air Traffic Control domain by a top industry provider. The approach contributes to develop a deep understanding on error modes and propagation paths, which can be leveraged by practitioners to make informed decisions on the placement of error detection mechanisms.
Keywords Error analysis • Error propagation • Critical systems • Monitoring
IntroductionError propagation analysis is a consolidated practice to gain insights into the dependability of software systems. It allows to infer error modes, intermediate paths and effects