This paper presents a user interface for a service robot that fetches objects requested by the user. A speech-based interface is appropriate for this application, but speech alone is not sufficient. The system also needs a vision-based interface to recognize gestures. Moreover, it needs vision capabilities to obtain real-world information about the objects mentioned in the user's speech. For example, the robot must find the target object specified by speech in order to carry out the task. In this sense, vision assists speech. However, vision sometimes fails to detect objects, and there are objects for which vision cannot be expected to work well. In these cases, the robot reports its current status to the user so that the user can give it advice by speech. In this sense, speech assists vision through the user. This paper describes how this mutual assistance between speech and vision works and demonstrates promising results through experiments.