Understanding the recommendations of an artificial intelligence (AI) assistant is especially important for decision-making in high-risk tasks, such as deciding whether a mushroom is edible or poisonous. To foster user understanding and appropriate trust in such systems, we tested the effects of explainable artificial intelligence (XAI) methods and an educational intervention on AI-assisted decision-making behavior in a 2×2 between-subjects online experiment with N = 410 participants. We developed a novel use case in which users go on a virtual mushroom hunt and are tasked with picking only edible mushrooms while leaving poisonous ones. Users were additionally provided with an AI-based app that shows classification results for mushroom images. To manipulate explainability, one subgroup additionally received attribution-based and example-based explanations of the AI's predictions; for the educational intervention, one subgroup received additional information on how the AI works. We found that the group with explanations outperformed the group without explanations and showed more appropriate trust levels. Contrary to our expectations, we found no effects of the educational intervention, domain-specific knowledge, or AI knowledge on performance. We discuss practical implications and introduce the mushroom-picking task as a promising use case for XAI research.
There is a confidence crisis in many scientific disciplines, particularly those researching human behavior, as many effects of original experiments have not been replicated successfully in large-scale replication studies. While human-robot interaction (HRI) is an interdisciplinary research field, the study of human behavior, cognition, and emotion also plays a vital part in HRI. Are HRI user studies facing the same problems as other fields, and if so, what can be done to overcome them? In this article, we first give a short overview of the replicability crisis in the behavioral sciences and its causes. In a second step, we estimate the replicability of HRI user studies mainly by (1) structurally comparing HRI research processes and practices with those of other disciplines with replicability issues, (2) systematically reviewing meta-analyses of HRI user studies to identify parameters known to affect replicability, and (3) summarizing first replication studies in HRI as direct evidence. Our findings suggest that HRI user studies often exhibit the same problems that caused the replicability crisis in many behavioral sciences, such as small sample sizes, lack of theory, and missing information in reported data. To improve the stability of future HRI research, we propose several statistical, methodological, and social reforms. This article aims to provide a basis for further discussion and a potential outline for improvements in the field.
The “Computers are social actors” (CASA) assumption (Nass and Moon in J Soc Issues 56:81–103, 2000. https://doi.org/10.1111/0022-4537.00153) states that humans apply social norms and expectations to technical devices. One such norm is to distort one’s own responses in a socially desirable direction during interviews. However, findings for such an effect are mixed in the literature. We therefore conducted a new study on the effect of social desirability bias in human–robot evaluation, aiming for a conceptual replication of previous findings. In a between-subjects laboratory experiment, N = 107 participants evaluated the robot and the interaction quality after a short conversation in one of three groups: the evaluation was conducted via (1) the same robot as in the previous interaction, (2) a different robot, or (3) a tablet computer. According to the CASA assumption, we expected evaluations of likability and interaction quality to be higher in the condition in which the same robot conducted the evaluation than with a different robot or a tablet computer, because robots are treated as social actors and humans therefore distort ratings in a socially desirable direction. Based on previous findings, we expected robots to evoke greater anthropomorphism and stronger feelings of social presence than the tablet computer as a potential explanation. However, the data did not support these hypotheses. Low sample size, low statistical power, lack of measurement validation, and other problems that could lead to an overestimation of effect sizes, in this study and in the literature in general, are discussed in light of the replicability crisis.