As robots become increasingly capable and autonomous, there is a growing need for humans to understand what robots do and think. In this paper, we investigate what such understanding means and includes, and how robots can be designed to support it. After an in-depth survey of related earlier work, we discuss examples showing that understanding covers not only the intentions of the robot, but also its desires, knowledge, beliefs, emotions, perceptions, capabilities, and limitations. The term understanding is formally defined, and the term communicative actions is defined to denote the various ways in which a robot may support a human's understanding of the robot. A novel model of interaction for understanding is presented. The model describes how both human and robot may utilize a first- or higher-order theory of mind to understand each other and perform communicative actions to support the other's understanding. It also describes simpler cases in which the robot performs static communicative actions to support the human's understanding of the robot. In general, communicative actions performed by the robot aim at reducing the mismatch between the mind of the robot and the robot's inferred model of the human's model of the mind of the robot. Based on the proposed model, a set of questions is formulated to serve as support when developing and implementing the model in real interacting robots.
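To make the central quantity concrete, the sketch below illustrates a robot's mental state, its inferred model of the human's model of that state, and a simple set-based mismatch measure that communicative actions could target. The state keys, values, and mismatch function are illustrative assumptions, not the paper's formal definitions.

```python
# A minimal sketch (not the paper's formalization) of the mismatch that
# communicative actions aim to reduce: the difference between the robot's
# own mental state and the robot's inferred model of how the human models it.
# All names, values, and the distance measure are illustrative assumptions.

ROBOT_MIND = {
    "intention": "deliver_package",
    "belief_battery_low": True,
    "capability_climb_stairs": False,
}

# Robot's inferred model of the human's model of the robot's mind.
INFERRED_HUMAN_MODEL = {
    "intention": "deliver_package",
    "belief_battery_low": False,       # human is unaware of the low battery
    "capability_climb_stairs": True,   # human overestimates the robot
}

def mismatch(robot_mind, inferred_model):
    """Return the keys on which the two models disagree."""
    return {k for k in robot_mind if robot_mind[k] != inferred_model.get(k)}

def select_communicative_actions(diverging_keys):
    """Pick one (hypothetical) communicative action per diverging item."""
    return [f"communicate({key})" for key in sorted(diverging_keys)]

if __name__ == "__main__":
    diverging = mismatch(ROBOT_MIND, INFERRED_HUMAN_MODEL)
    print(select_communicative_actions(diverging))
    # -> ['communicate(belief_battery_low)', 'communicate(capability_climb_stairs)']
```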
In many complex robotic systems, interaction takes place in all directions between human, robot, and environment. The performance of such a system depends on this interaction, and a proper evaluation of the system must build on a proper model of the interaction, a relevant set of performance metrics, and a methodology to combine the metrics into a single performance value. In this paper, existing models of human-robot interaction are adapted to fit complex scenarios with one or several humans and robots. The interaction and the evaluation process are formalized, and a general method to fuse performance values over time and across several performance metrics is presented. The resulting value, denoted interaction quality, adds a dimension to ordinary performance metrics by being explicit about the interplay between them, and thereby provides a formal framework to understand, model, and address complex aspects of the evaluation of human-robot interaction.
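As a rough illustration of the kind of fusion described above, the following sketch combines per-metric performance values recorded over time into a single scalar. The weighted-mean fusion and the example metrics and weights are assumptions for illustration only, not the method proposed in the paper.

```python
import numpy as np

# A minimal sketch of fusing per-metric performance values over time into a
# single interaction-quality score. The weighted-mean fusion and the example
# metrics/weights below are illustrative assumptions, not the paper's method.

# Rows: time steps; columns: performance metrics (e.g. task progress,
# responsiveness, operator comfort), each normalized to [0, 1].
performance = np.array([
    [0.90, 0.7, 0.6],
    [0.80, 0.8, 0.7],
    [0.95, 0.6, 0.8],
])

metric_weights = np.array([0.5, 0.3, 0.2])   # relative importance per metric
time_weights = np.array([0.2, 0.3, 0.5])     # e.g. emphasize recent behavior

def interaction_quality(values, w_metric, w_time):
    """Fuse per-time, per-metric values into one scalar in [0, 1]."""
    per_step = values @ (w_metric / w_metric.sum())   # fuse across metrics
    return float(per_step @ (w_time / w_time.sum()))  # fuse across time

print(round(interaction_quality(performance, metric_weights, time_weights), 3))
```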
Wizard-of-Oz experiments play a vital role in Human-Robot Interaction (HRI), as they allow for quick and simple hypothesis testing. Still, no general, publicly available tool for conducting such experiments currently exists in the research community, and researchers often develop and implement their own tools, customized for each individual experiment. Besides being inefficient in terms of programming effort, this also makes it harder for non-technical researchers to conduct Wizard-of-Oz experiments. In this paper, we present a general and easy-to-use tool for the Pepper robot, one of the most commonly used robots in this context. While we provide the concrete interface for Pepper robots only, the system architecture is independent of the type of robot and can be adapted for other robots. A configuration file, which stores experiment-specific parameters, enables a quick setup for reproducible and repeatable Wizard-of-Oz experiments. A central server provides a graphical interface via a browser while handling the mapping of user input to actions on the robot. In our interface, keyboard shortcuts may be assigned to phrases, gestures, and composite behaviors to simplify and speed up control of the robot. The interface is lightweight and independent of the operating system. Our initial tests confirm that the system is functional, flexible, and easy to use. The interface, including source code, is made publicly available, and we hope that it will be useful for researchers from any background who want to conduct HRI experiments.
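The sketch below illustrates the kind of configuration-driven mapping from keyboard shortcuts to phrases, gestures, and composite behaviors described above. The file format, key names, and action strings are hypothetical and do not reflect the tool's actual schema.

```python
import json

# A minimal sketch of a configuration-driven key-to-action mapping for a
# Wizard-of-Oz interface. The JSON schema, keys, and action names below are
# illustrative assumptions, not the published tool's configuration format.

EXAMPLE_CONFIG = """
{
  "robot_ip": "192.168.1.10",
  "shortcuts": {
    "g": {"type": "phrase",    "payload": "Hello, nice to meet you!"},
    "w": {"type": "gesture",   "payload": "wave"},
    "x": {"type": "composite", "payload": ["wave", "Hello, nice to meet you!"]}
  }
}
"""

def dispatch(key, config):
    """Map a wizard's key press to the list of actions to send to the robot."""
    action = config["shortcuts"].get(key)
    if action is None:
        return []
    payload = action["payload"]
    return payload if isinstance(payload, list) else [payload]

if __name__ == "__main__":
    config = json.loads(EXAMPLE_CONFIG)
    print(dispatch("x", config))   # -> ['wave', 'Hello, nice to meet you!']
```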
We introduce and investigate stack transducers, which are one-way stack automata with an output tape. A one-way stack automaton is a classical pushdown automaton with the additional ability to move its stack head inside the stack without altering the contents. For stack transducers, we distinguish between a digging and a non-digging mode. In digging mode, the stack transducer can write to the output tape while its stack head is inside the stack, whereas in non-digging mode, it is only allowed to emit symbols when its stack head is at the top of the stack. Stack transducers are motivated by natural-language interface applications, as they capture long-distance dependencies in syntactic, semantic, and discourse structures. We study the computational capacity of deterministic digging and non-digging stack transducers, as well as of their non-erasing and checking variants. Finally, we show that even for the strongest variant of stack transducers, the stack languages are regular.
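For readers unfamiliar with the model, the following sketch illustrates the operations a stack transducer adds to a pushdown automaton, including the non-digging restriction on emitting output. The class and its method names are illustrative assumptions rather than the paper's formal definition.

```python
# A minimal sketch of the machinery a stack transducer adds to a pushdown
# automaton: a stack head that may move inside the stack without altering it,
# plus an output tape. Only the operations and the non-digging restriction
# (emit only at the top) are illustrated; this is an informal assumption-based
# sketch, not the paper's formal definition.

class StackTransducerSketch:
    def __init__(self, digging=False):
        self.stack = []
        self.head = -1          # index of the stack head; -1 on empty stack
        self.output = []
        self.digging = digging  # digging mode may emit inside the stack

    def push(self, symbol):
        # Pushing is only possible when the head is at the top of the stack.
        assert self.head == len(self.stack) - 1
        self.stack.append(symbol)
        self.head += 1

    def pop(self):
        assert self.head == len(self.stack) - 1
        self.head -= 1
        return self.stack.pop()

    def move_down(self):
        # Move the head inside the stack without altering its contents.
        if self.head > 0:
            self.head -= 1

    def move_up(self):
        if self.head < len(self.stack) - 1:
            self.head += 1

    def emit(self, symbol):
        # Non-digging mode: output is only allowed at the top of the stack.
        at_top = self.head == len(self.stack) - 1
        if self.digging or at_top:
            self.output.append(symbol)
        else:
            raise RuntimeError("non-digging mode: emit only at the stack top")
```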
Semantic embeddings have advanced the state of the art in countless natural language processing tasks, and various extensions to multimodal domains, such as visual-semantic embeddings, have been proposed. While the power of visual-semantic embeddings comes from the distillation and enrichment of information through machine learning, their inner workings are poorly understood and there is a shortage of analysis tools. To address this problem, we generalize the notion of probing tasks to the visual-semantic case. To this end, we (i) discuss the formalization of probing tasks for embeddings of image-caption pairs, (ii) define three concrete probing tasks within our general framework, (iii) train classifiers to probe for those properties, and (iv) compare various state-of-the-art embeddings under the lens of the proposed probing tasks. Our experiments reveal an up to 12% increase in accuracy for visual-semantic embeddings compared to the corresponding unimodal embeddings, which suggests that the text and image dimensions represented in the former complement each other.
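A probing task in this sense reduces to training a simple classifier on frozen embeddings and reading its accuracy as evidence of what the embeddings encode. The sketch below shows this general recipe with synthetic data standing in for real visual-semantic and unimodal embeddings; it is not the paper's experimental setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# A minimal sketch of a probing task: train a simple classifier to predict a
# property (here a synthetic binary label) from fixed, pre-trained embeddings.
# The random data below stands in for real embeddings of image-caption pairs;
# it is an illustrative assumption, not the paper's setup.

rng = np.random.default_rng(0)
n_pairs, dim = 1000, 256
embeddings = rng.normal(size=(n_pairs, dim))        # stand-in embeddings
labels = rng.integers(0, 2, size=n_pairs)           # stand-in probed property

X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, test_size=0.2, random_state=0
)

probe = LogisticRegression(max_iter=1000)           # deliberately simple probe
probe.fit(X_train, y_train)
print("probe accuracy:", accuracy_score(y_test, probe.predict(X_test)))
```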