Social robots have a recognizable physical appearance, a distinct voice, and interact with users in specific contexts. Previous research has proposed a 'matching hypothesis', which seeks to explain how people judge a robot's appropriateness for a task by its appearance, and later work has extended this to combinations of robot voices and appearances. In this paper, we examine the missing link between robot voice, robot appearance, and deployment context. To do so, we asked participants to match a robot image to a voice within a defined interaction context. We selected widely available social robots, identified the task contexts in which they are used, and manipulated the voices in terms of gender, naturalness, and accent. We found that the task context mediates the 'matching hypothesis': people consistently selected one robot for a given vocal feature in a certain context, and a different robot for the same vocal feature in another context. We suggest that robot voice design should take advantage of current technology that enables the creation and tuning of custom voices. Such voices are a flexible tool for increasing perceived appropriateness, which in turn has a positive influence on Human-Robot Interaction.

I. INTRODUCTION

Spoken communication is the primary form of interaction between a growing number of social robots and their users. However, the overall effort invested in designing robot voices is considerably less than the amount of work that goes into designing their physical appearance [1]-[4]. Research in Human-Robot Interaction (HRI) could exploit the flexibility afforded by widely available Text-to-Speech systems, voice banks, and custom recordings, enabling designers to gain greater control over how robots are perceived. Voices contribute to the impressions formed of newly met individuals in human-human interactions [5] and shape how those impressions develop over time [6]. Beyond linguistic content, voices carry a wide variety of information, ranging from indexical characteristics of the speaker, such as gender, age, and place of origin, to temporary states such as mood, emotion, or health [7], [8].
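As a concrete illustration of this flexibility, the sketch below shows how a factorial set of voice stimuli varying in gender, naturalness, and accent might be specified as SSML requests to a Text-to-Speech system. This is a minimal, hypothetical example: the voice-naming scheme and the greeting text are placeholders of my own, not the stimulus pipeline used in the study, and would need to be mapped onto a real TTS voice catalogue.

```python
from itertools import product

# Hypothetical naming scheme for TTS voices; real catalogues
# (e.g. Google Cloud TTS, Amazon Polly) use their own identifiers,
# so these names must be mapped onto an actual voice list.
def voice_name(gender: str, naturalness: str, accent: str) -> str:
    style = "Neural" if naturalness == "natural" else "Standard"
    return f"{accent}-{style}-{gender.capitalize()}"

def ssml_stimulus(text: str, gender: str, naturalness: str, accent: str) -> str:
    """Wrap `text` in standard SSML, selecting one voice per design cell."""
    return (f'<speak><voice name="{voice_name(gender, naturalness, accent)}">'
            f"{text}</voice></speak>")

# Full factorial stimulus grid: gender x naturalness x accent.
GENDERS = ["female", "male"]
NATURALNESS = ["natural", "synthetic"]
ACCENTS = ["en-GB", "en-US"]

for g, n, a in product(GENDERS, NATURALNESS, ACCENTS):
    print(ssml_stimulus("Hello, how can I help you today?", g, n, a))
```

Keeping the design factors separate from the rendering step makes it straightforward to swap in a different TTS back end or extend the grid with additional vocal features.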
A class of Master of Science students and a group of preschool children codesigned new digital musical instruments through workshop interviews involving vocal sketching, a method for imitating and portraying sounds. The aim of the study was to explore how the students and children would approach vocal sketching as one of several design methods. The children described musical instruments to the students using vocal sketching and other modalities (verbal descriptions, drawing, gestures). The resulting instruments built by the students were showcased at the Swedish Museum of Performing Arts in Stockholm. Although all the children tried vocal sketching during preparatory tasks, few employed the method during the workshop itself. Nevertheless, the instruments seemed to meet the children’s expectations. Consequently, even though the vocal sketching method alone provided few design directives in the given context, we suggest that, under favorable circumstances, it can be an engaging component that complements other modalities in codesign involving children.
This paper presents three studies probing aesthetic strategies for sound produced by movement sonification of a Pepper robot, mapping its movements to sound models. We developed two sets of sound models. The first set comprised two models, one based on sawtooth waves and another based on feedback chains, to investigate how the perception of synthesized robot sounds depends on design complexity. The second set probed the “materiality” of sound made by a robot in motion and consisted of an engine-like synthesis highlighting the robot’s internal mechanisms, a metallic synthesis highlighting the robot’s typical appearance, and a whoosh synthesis highlighting the movement itself. The first study explored, through an online survey, how the first set of sound models influences the perception of expressive gestures of a Pepper robot. In the second study, we carried out an experiment in a museum installation with a Pepper robot presented in two scenarios: (1) welcoming patrons into a restaurant and (2) providing information to visitors in a shopping center. In the third study, we conducted an online survey with stimuli similar to those used in the second study. Our findings suggest that participants preferred more complex sound models for the sonification of robot movements. Concerning materiality, participants preferred subtle sounds that blend well with the ambient sound (i.e., are less distracting) and soundscapes in which sound sources can be identified. Sound preferences also varied with the context in which participants experienced the robot-generated sounds (e.g., a live museum installation vs. an online display).
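To make the movement-to-sound mapping concrete, here is a minimal sketch of parameter-mapping sonification in the spirit of the sawtooth-based model described above. It assumes a normalized joint-velocity trace is available as input; the pitch and loudness ranges are illustrative placeholders, not the values used in the studies.

```python
import numpy as np

SR = 44_100  # audio sample rate (Hz)

def sonify_velocity(velocity, duration, f_lo=110.0, f_hi=440.0):
    """Parameter-mapping sonification: map a normalized joint-velocity
    trace (values in [0, 1]) onto the pitch and loudness of a sawtooth
    oscillator. The mapping ranges here are illustrative placeholders."""
    n = int(SR * duration)
    # Resample the control trace from gesture rate to audio rate.
    v = np.interp(np.linspace(0, 1, n),
                  np.linspace(0, 1, len(velocity)), velocity)
    freq = f_lo + (f_hi - f_lo) * v      # faster motion -> higher pitch
    phase = np.cumsum(freq) / SR         # integrate frequency into phase (cycles)
    saw = 2.0 * (phase % 1.0) - 1.0      # naive sawtooth in [-1, 1]
    return 0.2 * (0.3 + 0.7 * v) * saw   # faster motion -> louder

# Example: a gesture that accelerates, holds, then decelerates.
trace = np.concatenate([np.linspace(0, 1, 50), np.ones(50), np.linspace(1, 0, 50)])
audio = sonify_velocity(trace, duration=2.0)
```

Writing `audio` to a WAV file (e.g., with the standard-library `wave` module) lets the resulting sweep be auditioned; richer models such as the feedback-chain or engine-like syntheses would replace the oscillator while keeping the same mapping structure.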