Conversational interfaces that interact with humans need to continuously establish, maintain and repair common ground in task-oriented dialogues. Uncertainty, repairs and acknowledgements are expressed in user behaviour in the continuous efforts of the conversational partners to maintain mutual understanding. Users change their behaviour when interacting with systems in different forms of embodiment, which affects the abilities of these interfaces to observe users’ recurrent social signals. Additionally, humans are intellectually biased towards social activity when facing anthropomorphic agents or when presented with subtle social cues. Two studies are presented in this paper examining how humans interact in a referential communication task with wizarded interfaces in different forms of embodiment. In study 1 (N = 30), we test whether humans respond the same way to agents, in different forms of embodiment and social behaviour. In study 2 (N = 44), we replicate the same task and agents but introduce conversational failures disrupting the process of grounding. Findings indicate that it is not always favourable for agents to be anthropomorphised or to communicate with non-verbal cues, as human grounding behaviours change when embodiment and failures are manipulated.