This paper analyzes the root causes of safety-related software errors in safety-critical, embedded systems. The results show that software errors identi ed a s potentially hazardous to the system tend to be p r oduced by di erent error mechanisms than non-safetyrelated software errors. Safety-related software errors are shown to arise most commonly from (1) discrepancies between the documented r equirements specications and the requirements needed for correct functioning of the system and (2) misunderstandings of the software's interface with the rest of the system. The paper uses these results to identify methods by which requirements errors can be p r evented. The goal is to reduce safety-related software e r r ors and to enhance the safety of complex, embedded systems.
Abstract-Analysis of anomalies that occur during operations is an important means of improving the quality of current and future software. Although the benefits of anomaly analysis of operational software are widely recognized, there has been relatively little research on anomaly analysis of safety-critical systems. In particular, patterns of software anomaly data for operational, safety-critical systems are not well understood. This paper presents the results of a pilot study using Orthogonal Defect Classification (ODC) to analyze nearly two hundred such anomalies on seven spacecraft systems. These data show several unexpected classification patterns such as the causal role of difficulties accessing or delivering data, of hardware degradation, and of rare events. The anomalies often revealed latent software requirements that were essential for robust, correct operation of the system. The anomalies also caused changes to documentation and to operational procedures to prevent the same anomalous situations from recurring. Feedback from operational anomaly reports helped measure the accuracy of assumptions about operational profiles, identified unexpected dependencies among embedded software and their systems and environment, and indicated needed improvements to the software, the development process, and the operational procedures. The results indicate that, for long-lived, critical systems, analysis of the most severe anomalies can be a useful mechanism both for maintaining safer, deployed systems and for building safer, similar systems in the future.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.