In multimodal documents, different types of cohesive or cross-reference links (i.e., signaling) are used in the text to connect verbally coded content with the graphical material. In this study, we identify three types of reference within the framework of previous work on cohesion (Halliday & Hasan, 1976): directive signaling, descriptive signaling, and no signaling in the text to introduce the figure. Using eye tracking in an experiment, we investigate how these three reference types influence human processing of the material. The results reveal differences between the reference types in terms of both eye movement parameters and retention of the material.