Emphasis, by means of either pitch accents or beat gestures (rhythmic co-verbal gestures with no semantic meaning), has been shown to serve two main purposes in human communication: syntactic disambiguation and salience. To use beat gestures in this role, interlocutors must be able to integrate them with the speech they accompany. Whether such integration is possible when the multi-modal communication information is produced by a humanoid robot, and whether it is as efficient as for human communicators, are questions that need to be answered to further understanding of the efficacy of humanoid robots for naturalistic human-like communication.Here, we present an experiment which, using a fully within subjects design, shows that there is a marked difference in speech and gesture integration between human and robot communicators, being significantly less effective for the robot. In contrast to beat gestures, the effects of speech emphasis are the same whether that speech is played through a robot or as part of a video of a human. Thus, while integration of speech emphasis and verbal information do occur for robot communicators, integration of non-informative beat gestures and verbal information does not, despite comparable timing and motion profiles to human gestures.