Abstract: This paper presents a three-stage model of language acquisition that integrates the phonological, semantic and syntactic aspects of language learning. Assuming that these three stages arise roughly in sequence, we test the model using the experimental methodology of cognitive robotics, which emphasises situating the robot in a realistic, interactive environment. The first, phonological stage consists of learning sound patterns that are likely to correspond to words. The second stage concerns word-denotation association, which relies not only on sensory input but also on the learner's own speech output in 'dialogue'. The data thus gathered allow us to invoke semantic bootstrapping in the third, grammar induction stage, where sets of words are mapped to simple logical types. We have started implementing the model and report here the initial results of the human-robot interaction experiments we conducted.