We present a novel augmented reality (AR) interface that provides an effective means to diagnose a robot's erroneous behaviors, endow it with new skills, and patch its knowledge structure represented by an And-Or-Graph (AOG). Specifically, an AOG representation of opening medicine bottles is learned from human demonstration and yields a hierarchical structure that captures the spatiotemporal compositional nature of the given task, which is highly interpretable to users. Through a series of psychological experiments, we demonstrate that the explanations of a robotic system, inherited from and produced by the AOG, foster human trust better than other forms of explanations. Moreover, by visualizing the knowledge structure and robot states, the AR interface allows human users to intuitively understand what the robot knows, supervise the robot's task planner, and interactively teach the robot new actions. Together, these capabilities let users quickly identify the reasons for failures and conveniently patch the current knowledge structure to prevent future errors, demonstrating the interpretability of our knowledge representation and the new forms of interaction afforded by the proposed AR interface.

Keywords: augmented reality (AR), explainable artificial intelligence (XAI), robot learning
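To give a concrete sense of the kind of hierarchical, compositional structure an And-Or-Graph encodes, the following is a minimal sketch in Python. The node class, the `parse_graph` helper, and the particular decomposition of "open bottle" into sub-tasks are illustrative assumptions, not the learned model or implementation described in this paper.

```python
# Minimal sketch of an And-Or-Graph (AOG) node structure. An AND node
# decomposes a task into an ordered sequence of sub-tasks (temporal
# composition); an OR node selects one of several alternative decompositions;
# terminal nodes are primitive actions. The decomposition below is hypothetical.
from dataclasses import dataclass, field
from typing import List


@dataclass
class AOGNode:
    name: str
    node_type: str               # "and", "or", or "terminal" (primitive action)
    children: List["AOGNode"] = field(default_factory=list)


def parse_graph(node: AOGNode, depth: int = 0) -> None:
    """Print one parse of the graph: expand AND children in order and
    pick the first alternative at each OR node."""
    indent = "  " * depth
    if node.node_type == "terminal":
        print(f"{indent}action: {node.name}")
    elif node.node_type == "and":
        print(f"{indent}{node.name} (AND)")
        for child in node.children:
            parse_graph(child, depth + 1)
    else:  # "or": choose one alternative (here, simply the first)
        print(f"{indent}{node.name} (OR)")
        parse_graph(node.children[0], depth + 1)


# Hypothetical decomposition of opening a child-safe medicine bottle.
open_bottle = AOGNode("open_bottle", "and", [
    AOGNode("grasp_lid", "terminal"),
    AOGNode("unlock_lid", "or", [
        AOGNode("push_and_twist", "and", [
            AOGNode("push_down", "terminal"),
            AOGNode("twist", "terminal"),
        ]),
        AOGNode("pinch_and_twist", "and", [
            AOGNode("pinch", "terminal"),
            AOGNode("twist", "terminal"),
        ]),
    ]),
    AOGNode("pull_off_lid", "terminal"),
])

parse_graph(open_bottle)
```

Because each OR node exposes its alternatives explicitly, such a structure lends itself to visualization and to pinpointing which branch of the task decomposition failed.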
| INTRODUCTION

The ever-growing amount of data and rapidly increasing computing power have recently boosted the data-driven machine learning paradigm. Despite promising progress in model performance, such purely data-driven approaches, especially methods using deep neural networks, have one well-known limitation: the lack of interpretability. Various attempts have therefore been made to alleviate this shortcoming, such as visualizing filter responses,1-3 developing communication protocols,4-7 and generating text descriptions for images8,9 or robot behaviors.10,11 However, these explanation mechanisms only operate through two-dimensional (2D) interactions (ie, computer screens) and fall short when interacting with physical robots, especially in mission-critical settings that require supervising multiple robots. Therefore, an interpretable knowledge representation for robots and an effective explanation interface beyond 2D are needed to provide better situational awareness, richer spatial information, and in situ explanations during human-robot interaction. Of note, for physical robots, less attention has been paid to introducing