In the last decade, a growing body of work has convincingly demonstrated that languages embed a certain degree of non-arbitrariness (mostly in the form of iconicity, namely the presence of imagistic links between linguistic form and meaning). Most of this previous work has been limited to assessing the degree (and role) of non-arbitrariness in the speech (for spoken languages) or manual components of signs (for sign languages). When approached in this way, non-arbitrariness is acknowledged but still considered to have little presence and purpose, showing a diachronic movement towards more arbitrary forms. However, this perspective is limited as it does not take into account the situated nature of language use in face-to-face interactions, where language comprises categorical components of speech and signs, but also multimodal cues such as prosody, gestures, eye gaze etc. We review work concerning the role of context-dependent iconic and indexical cues in language acquisition and processing to demonstrate the pervasiveness of non-arbitrary multimodal cues in language use and we discuss their function. We then move to argue that the online omnipresence of multimodal non-arbitrary cues supports children and adults in dynamically developing situational models.