We present a model of social learning of both language and skills, while assuming, insofar as possible, strict autonomy, virtual embodiment, and situatedness. This model is built by integrating various previous models of language development and social learning, and it is this integration that, under these assumptions, poses novel challenges. The aim of the article is to investigate what sociocognitive mechanisms agents should have in order to transmit language from one generation to the next so that it can be used as a medium to transmit internalized rules that represent skill knowledge. We have performed experiments in which this knowledge solves the familiar poisonous-food problem. Simulations reveal the conditions on population structure under which agents can successfully solve this problem. In addition to issues relating to perspective taking and mutual exclusivity, we show that agents need to coordinate interactions so that they can establish joint attention in order to form a scaffold for language learning, which in turn forms a scaffold for the learning of rule-based skills. Based on these findings, we conclude by hypothesizing that social learning at one level forms a scaffold for social learning at another, higher level, thus contributing to the accumulation of cultural knowledge.