Over the past two decades, questions have surfaced about the effectiveness and contribution of intelligent systems to decision makers in a variety of settings. This paper focuses on the evaluation challenges associated with intelligent real-time software systems that are embedded in larger host systems. With the proliferation of such systems in operational settings such as aerospace, medical, manufacturing, and transportation systems, increased attention to evaluations of such systems, and to resulting software safety, is warranted. This paper describes one such evaluation and proposes a set of evaluation criteria for embedded intelligent real-time systems (EIRTS). Implications of the evaluation and the evaluation criteria are discussed.

The authors would like to thank the masters and crew of the Exxon Benicia and other trans-Alaskan Pipeline System (TAPS) trade tankers who participated in this research. The authors would also like to thank Al Wallace, two anonymous reviewers, and the associate editor, who contributed greatly to this manuscript.
Evaluation of Embedded Intelligent Real-Time Systems

support systems, knowledge-based systems, and expert systems. These systems were initially conceived as stand-alone information technology, designed to provide decision support to their users in a timely and appropriate fashion. Recently, such systems have evolved from stand-alone systems and integrated systems, which share resources and information with other systems, to embedded intelligent real-time systems (EIRTS), which are knowledge-based systems deployed in larger host systems with real-time response requirements. These systems are essential to the functioning of many safety-critical large-scale systems, such as air traffic control systems, worldwide financial and telecommunications systems, and real-time command and control networks.

The urge to develop embedded intelligent real-time systems to guide users' actions in complex large-scale systems has been described as "almost irresistible," despite questions about the effectiveness or contribution of such systems to decision makers:

Given decades of research on expert systems, advisory systems, and operator aids, there are few operational success stories. Empirical studies show that advice-giving systems frequently fail to enhance overall system performance. Sometimes operators fail to request the proffered advice. Sometimes operators do not like the "tone" of the advice-giving agent and thus refuse to follow its advice. Sometimes the operators do not feel the advice is worth seeking. Other aids are effective for novices, but as operators acquire skill, the aid no longer has measurable effect (Mitchell & Sundstrom, 1997, p. 269).

These observations suggest that assessments of EIRTS are important, particularly in safety-critical large-scale systems such as aviation and aerospace where software safety (Leveson, 1995) is important.
In these systems, humans and technology are often jointly responsible for executing tasks and, thus, both technical and organizational evaluation measures are important....