The present study examines how young children and their caregivers establish reference by jointly developing stable patterns of bodily, perceptual, and interactive coordination. Our longitudinal investigation focuses on two mother–child dyads engaged in picture-book reading and play. The dyads were videotaped at home every 6 weeks as the children grew from 9 to 24 months of age. Inspired by conversation analysis and multimodal analysis, our developmental approach builds on the insight that the situated and embodied production of reference is fundamentally an interactive achievement. To examine the acquisition of reference, we developed a descriptive instrument that takes into account not only the dyad's joint accomplishment but also each participant's contribution to it. The instrument is based on a sequential reconstruction of the jobs that both participants must accomplish jointly in order to achieve reference: establishing visual perception as a relevant resource, constituting a domain of scrutiny, locating a target, and construing the (meaning of the) referent. Methodologically, these jobs serve as a tertium comparationis for the longitudinal comparison of the adult's and the child's contributions to establishing reference. We used this instrument to examine (1) which bodily and verbal resources the participants employed and (2) how their contributions to accomplishing the jobs changed over time. The findings showed that the acquisition of reference was closely related to the child's increasing ability to recognize, fulfill, and set up conditional relevancies. We conclude that the adult's dynamic and contextualized use of conditional relevancies, recipient design, and observability is a crucial driving force in the acquisition of reference.