This thesis investigates how participants in scenic opera rehearsals, documented in 20 hours of video recordings, make use of depictions (Clark, 2016), a communicative strategy based on iconicity, to create performance bodies, i.e. what the performers should do on stage to music. The method is grounded in ethnomethodologically informed conversation analysis and interactional linguistics (EMCAIL). The thesis aims to reveal how participants in opera rehearsals construct and respond to depictions, what interactional and semiotic functions depictions carry, and the nature of the relationship between depictions and descriptions and, by extension, between language and the body in social interaction.
The thesis comprises three individual articles. Article I focuses on how performers and the director deploy depictions in proposals for performance bodies. It argues that depictions reference both themselves, as the current state of the artwork, and prototypes of mundane behaviour (distal scenes). The article compares the self-referentiality, or introversive semiosis, of depictions with the way interactional practices in general develop over time. Article II focuses specifically on how performers make proposals with depictions. It concludes that depictions are multimodal gestalts whose interpersonal coordination reflects, in a visuospatial way that extends beyond the adjacency pair, the distribution of deontic rights during the rehearsals. Article III focuses on changes in turn design, and in the relative deployment of depictions and descriptions, over the micro-histories of joint decision-making. It shows how proposals move from descriptive to increasingly depictive states as the participants ensure that epistemic access to, and alignment and agreement with, the proposed performance bodies have been displayed. The use of language early in the process secures conditionally relevant responses to the proposed ideas and thereby successful outcomes of proposals. The article reveals the essentially joint nature of the decision-making process on performance bodies.
The thesis uncovers the temporally heterogeneous nature of depictions. They are achieved in a stepwise manner, both in their moment-by-moment realization in turns and in their development over interactional histories. They are dialogically achieved in both the local and the historical sense: their successful realization depends on cooperation from co-present participants, who are also intrinsically involved in their development over time. Further, it is argued that depictions are both an interactional practice for creating opera performances and those very performances in their current states. The thesis contributes to a holistic and integrated view of social interaction in which no resources, whether traditionally conceived of as linguistic or not, are considered more important than others for the local constitution of social action.