The physical appearance and behavior of a robot are important assets in terms of Human-Computer Interaction. Multimodality is also fundamental, as humans usually expect to interact in a natural way through voice, gestures, and other modalities; people approach complex interaction devices with stances similar to those they adopt toward other people. In this paper we describe a robot head, currently under development, that aims to be a multimodal (vision, voice, gestures, ...) perceptual user interface. We describe modules for face detection, tracking, facial movement, action selection, and sound localization. Preliminary results indicate that the robot head can achieve the goals we are interested in, namely human interaction and assistance.