Human speech is an unusually flexible form of primate communication. A limited repertoire of phonemes is assembled into morphemes whose combinations are linked to referents by social learning in culturally specific ways. Non-human primate vocal utterances, in contrast, are drastically less flexible at every level, i.e., acoustic diversity, combinatoriality and referential function. Here, we were interested in whether primate vocal behaviour, despite its inflexible nature, is still amenable to social learning, a well-documented, powerful mechanism in many other aspects of primate behaviour. We provided wild vervet monkeys with social learning opportunities to operate a food-dispensing apparatus by uttering specific vocalisations, 'move-grunts', that were contextually inappropriate. In two separate experiments, subjects were exposed to another group member producing move-grunts whilst operating a food dispenser. In experiment 1, subjects heard only playbacks of the calls, followed by food release; in experiment 2, they witnessed a complete video demonstration by a conspecific, including the calls, followed by food release. None of the subjects learned the task in experiment 1, but 1 of 39 subjects, a juvenile female, succeeded in experiment 2 by producing move-grunts at the food dispenser to obtain food rewards. We discuss our results and behavioural observations in relation to current theory and their possible implications for reconstructing the evolutionary pathway to human speech.